pedramcode/myvm

Rust Virtual Machine / Assembler and Compiler

Tutorials

A tutorial for MyVM is available here: Tutorial


🏗 Architecture

This project is a stack-based virtual machine (VM) with a separate call stack for managing function calls.

What is a Stack-Based VM?

A stack-based VM executes instructions primarily by manipulating a stack data structure.
Instead of operating directly on registers, most instructions push values onto the stack and pop them when needed.

For example, consider the following program:

PUSH 10
PUSH 20
ADD
  • PUSH 10 pushes the value 10 onto the stack.
  • PUSH 20 pushes the value 20.
  • ADD pops the top two values (20 and 10), adds them, and pushes the result (30) back onto the stack.

This simple model makes compilers and interpreters easier to implement, because generated code does not have to deal with register allocation.
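As a rough illustration of the push/pop model described above, here is a minimal Rust sketch of a two-instruction stack machine. The `Instr` enum and `run` function are hypothetical names made up for this example and are not taken from the MyVM source; wrapping addition is also an assumption.

```rust
/// A two-instruction subset, just enough for the PUSH/PUSH/ADD example.
/// Hypothetical type: not the VM's real opcode representation.
#[derive(Debug, Clone, Copy)]
enum Instr {
    Push(u32),
    Add,
}

/// Runs the program and returns the final stack (bottom element first).
fn run(program: &[Instr]) -> Result<Vec<u32>, String> {
    let mut stack: Vec<u32> = Vec::new();
    for instr in program {
        match *instr {
            Instr::Push(value) => stack.push(value),
            Instr::Add => {
                // The first pop yields the top of the stack (20 in the example).
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                // Assumption: addition wraps on 32-bit overflow.
                stack.push(lhs.wrapping_add(rhs));
            }
        }
    }
    Ok(stack)
}
```

Calling `run(&[Instr::Push(10), Instr::Push(20), Instr::Add])` leaves a single value, 30, on the stack.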


Instruction Set: RISC Design

The VM uses a Reduced Instruction Set Computing (RISC) design.
RISC focuses on having a small, well-defined set of simple instructions, each performing a single operation efficiently.

Each instruction in the VM is represented as an Opcode.

What is an Opcode?

An Opcode (Operation Code) is the numeric or symbolic representation of an instruction that tells the VM what operation to perform.
Opcodes may also take parameters (operands).

For example:

PUSH 42 ; Opcode: PUSH, Operand: 42
PUSH 32 ; Opcode: PUSH, Operand: 32
ADD ; Opcode: ADD (no operands)

Some Opcodes have variants.
For example, the Jump instruction has multiple variants that depend on conditions:

  • jmp → Unconditional jump
  • jnz → Jump if not zero
  • jz → Jump if zero
  • jg → Jump if greater
  • jl → Jump if less
  • jge → Jump if greater or equal
  • jle → Jump if less or equal

Main VM Components

The VM core is composed of three primary components:

1. Memory

  • The main storage that contains:
    • Uploaded program code
    • Variables and data
    • Execution stack
  • The stack is located at the end of memory and grows backwards (toward lower addresses) to avoid collisions with program data.
  • If the stack grows beyond its allocated capacity, a stack overflow occurs.
  • Each memory cell is 32 bits wide.
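The memory layout above can be sketched as follows. This is an illustrative model only: the `Memory` struct, its field names, and the error strings are assumptions for this example, not the VM's actual implementation.

```rust
/// Memory of 32-bit cells whose last `stack_cells` cells are reserved for the stack.
/// Hypothetical model: names and layout details are assumptions.
struct Memory {
    cells: Vec<u32>,
    /// Index of the current top of stack; equals `cells.len()` when the stack is empty.
    sp: usize,
    /// Lowest index the stack is allowed to use.
    stack_limit: usize,
}

impl Memory {
    fn new(total_cells: usize, stack_cells: usize) -> Self {
        assert!(stack_cells <= total_cells);
        Memory {
            cells: vec![0; total_cells],
            sp: total_cells,
            stack_limit: total_cells - stack_cells,
        }
    }

    /// Pushing moves the stack pointer toward lower addresses.
    fn push(&mut self, value: u32) -> Result<(), String> {
        if self.sp == self.stack_limit {
            return Err("stack overflow".to_string());
        }
        self.sp -= 1;
        self.cells[self.sp] = value;
        Ok(())
    }

    /// Popping moves the stack pointer back toward the end of memory.
    fn pop(&mut self) -> Result<u32, String> {
        if self.sp == self.cells.len() {
            return Err("stack underflow".to_string());
        }
        let value = self.cells[self.sp];
        self.sp += 1;
        Ok(value)
    }
}
```

With `Memory::new(8, 2)`, two pushes succeed (using cells 7 and 6) and a third reports a stack overflow.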

2. Registers

  • Registers are small, fast storage locations inside the VM that hold data during execution.
  • This VM has:
    • 8 general-purpose registers: r0 ... r7
    • Program Counter (PC) register: stores the address of the next instruction to execute

Registers allow the VM to perform operations more quickly than relying solely on memory.

3. Flags

  • Flags are single-bit indicators that store the outcome of operations.
  • They are used by conditional instructions (like jumps) to determine execution flow.
  • This VM defines the following flags:
    • Zero (Z) → Set when the result of an operation is 0
    • Overflow (O) → Set when an arithmetic operation produces a value outside the representable range
    • Negative (N) → Set when the result of an operation is negative
    • Carry (C) → Set when an arithmetic operation generates a carry/borrow bit beyond the register size
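To make the flag definitions concrete, here is a small Rust sketch that derives all four flags from a 32-bit addition. It assumes the usual two's-complement conventions (Overflow for signed overflow, Carry for unsigned overflow); the `Flags` struct and function names are hypothetical, and how the VM itself computes flags or evaluates the comparison jumps is not specified here.

```rust
/// Hypothetical flag container; field names are assumptions.
#[derive(Debug, Default, PartialEq, Eq, Clone, Copy)]
struct Flags {
    zero: bool,
    overflow: bool,
    negative: bool,
    carry: bool,
}

/// Adds two 32-bit cells and derives the four flags from the result.
fn add_with_flags(a: u32, b: u32) -> (u32, Flags) {
    // Unsigned overflow out of 32 bits is the carry.
    let (result, carry) = a.overflowing_add(b);
    // Signed overflow: reinterpret the same bits as i32 and add again.
    let (_, overflow) = (a as i32).overflowing_add(b as i32);
    let flags = Flags {
        zero: result == 0,
        overflow,
        negative: (result as i32) < 0,
        carry,
    };
    (result, flags)
}

/// JZ only needs the Zero flag.
fn jz_taken(flags: Flags) -> bool {
    flags.zero
}

/// JNZ is the opposite condition.
fn jnz_taken(flags: Flags) -> bool {
    !flags.zero
}
```

For example, adding 1 to 0xFFFFFFFF wraps to 0 and sets Zero and Carry, while adding 1 to 0x7FFFFFFF sets Overflow and Negative.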

📝 VM Assembly Syntax and Commands

The VM uses a custom assembly language for programming, supporting constants, memory addresses, registers, labels, macros, and interrupts.


Program Entry Point

Every program must have exactly one .start label that serves as the entry point. This is where execution begins.

  • No .start label: results in a compilation error
  • Multiple .start labels: results in a compilation error

Example:

[text]
.start
    PUSH 10
    INT 0 0
    TERM
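The entry-point rule can be sketched as a simple check over the source text. This is only an illustration of the rule stated above: `check_entry_point` is a hypothetical helper, the error messages are invented, and the real compiler may detect the label differently.

```rust
/// Returns the 1-based line number of the single `.start` label,
/// or an error when there are zero or several of them.
/// Hypothetical helper: not part of the MyVM compiler API.
fn check_entry_point(source: &str) -> Result<usize, String> {
    let starts: Vec<usize> = source
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            // Ignore anything after a `;` comment marker.
            let code = line.split(';').next().unwrap_or("").trim();
            code == ".start"
        })
        .map(|(index, _)| index + 1)
        .collect();

    match starts.as_slice() {
        [] => Err("no .start label found".to_string()),
        [line] => Ok(*line),
        _ => Err(format!(".start defined {} times", starts.len())),
    }
}
```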

Sections

Programs must be organized into sections using [section_name] tags. There are two mandatory sections:

  • [text] - Contains executable code
  • [data] - Contains data definitions and constants

Writing code in the [data] section or defining data in the [text] section will result in an error.

Example:

[data]
$name dw "Pedram" 0
$scores w 0x2301 0x1212 19 20

[text]
.start
    PUSH $name
    INT 0 3
    TERM

Data Definitions

Data can be defined in the [data] section using the following syntax:

$[identifier] [data type] [... data separated with space ...]

Supported data types:

  • b - Byte: each value occupies 1 byte
  • w - Word: each value occupies 2 bytes
  • dw - Double Word: each value occupies 4 bytes

Examples:

[data]
$name b "Pedram"
$scores w 0x2301 0x1212 19 20
$data dw 0xaaaaaaaa 0xbbbbbbbb

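Note on memory storage: Each memory cell is 32 bits (4 bytes) wide. When using the b or w types, values are packed back to back into memory cells instead of occupying one cell each, and the unused trailing bytes of the last cell are zero. For example: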

  • b 0xaa 0xbb 0xcc stores as: 0xaabbcc00
  • w 0xaabb 0xcc stores as: 0xaabbcc00
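The byte case can be sketched in Rust as below. The sketch covers only the `b` example (0xaa 0xbb 0xcc becoming 0xaabbcc00) and assumes that the first byte lands in the most significant position, as that example suggests; `pack_bytes` is a hypothetical name and not the compiler's actual function.

```rust
/// Packs bytes into 32-bit cells, most significant byte first,
/// padding the last cell with zero bytes.
/// Hypothetical helper that only models the `b` data type.
fn pack_bytes(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks(4)
        .map(|chunk| {
            // Start from an all-zero cell so missing bytes stay zero.
            let mut cell = [0u8; 4];
            cell[..chunk.len()].copy_from_slice(chunk);
            u32::from_be_bytes(cell)
        })
        .collect()
}
```

Under these assumptions, `pack_bytes(&[0xaa, 0xbb, 0xcc])` yields the single cell 0xaabbcc00, and the six bytes of "Pedram" occupy two cells.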

Enhanced Memory Access Syntax

The following syntax is supported for accessing data:

; Push operations
push $name        ; pushes the address of $name onto the stack
push [$name]      ; pushes the value stored at $name onto the stack
push [$name + 4]  ; pushes the value at $name plus a constant offset
push [$name + r0] ; pushes the value at $name plus the offset held in r0

; Move operations
move r0 $name        ; moves the address of $name into r0
move r0 [$name]      ; moves the value stored at $name into r0
move r0 [$name + 2]  ; moves the value at $name plus a constant offset into r0
move r0 [$name + r1] ; moves the value at $name plus the offset held in r1 into r0
move r0 &r1          ; moves the value at the address held in r1 into r0

Numbers

  • Decimal: 42
  • Hexadecimal: 0x2A
  • Binary: 0b101010
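A minimal sketch of how the three literal forms could be parsed is shown below. `parse_number` is a hypothetical helper; it assumes lowercase `0x` and `0b` prefixes and values that fit in 32 bits, which may differ from what the actual assembler accepts.

```rust
/// Parses a decimal, hexadecimal (0x...) or binary (0b...) literal.
/// Hypothetical helper: prefix handling and the u32 range are assumptions.
fn parse_number(token: &str) -> Option<u32> {
    if let Some(hex) = token.strip_prefix("0x") {
        u32::from_str_radix(hex, 16).ok()
    } else if let Some(bin) = token.strip_prefix("0b") {
        u32::from_str_radix(bin, 2).ok()
    } else {
        token.parse::<u32>().ok()
    }
}
```

All three literals listed above (42, 0x2A, 0b101010) parse to the same value.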

Memory Addresses

  • Use &[number] to reference memory addresses:
    • &0x1010, &321, &0b101010
  • They point to the value stored at the specified memory address.

Metas

Metas are special commands that control how the VM executes or prepares code.
They start with @ and may have parameters:

  • @ORG x → Sets the origin address in memory for the following code.
  • @INCLUDE "./file.asm" → Includes another assembly file into the current file.

Comments

  • Start with ;
; This is a comment

Registers

The VM supports registers:

  • General-purpose: r0, r1, r2, r3, r4, r5, r6, r7

  • Special-purpose: pc (Program Counter)

Opcodes (commands)

| Command | Parameters | Description |
|---|---|---|
| PUSH 10 | number | Pushes a constant number onto the stack |
| PUSH r0 | register | Pushes the value of a register onto the stack |
| PUSH &10 | address | Pushes the value at a memory address onto the stack |
| PUSH $name | data label | Pushes the address of a data label onto the stack |
| PUSH [$name] | data label | Pushes the value of a data label onto the stack |
| PUSH [$name + 1] | data label + offset | Pushes the value of a data label with a constant offset onto the stack |
| PUSH [$name + r0] | data label + offset | Pushes the value of a data label with a register offset onto the stack |
| POP r1 | register | Pops a value from the stack into a register |
| POP &32 | address | Pops a value from the stack into a memory address |
| ADD | - | Pops two values, adds them, pushes the result |
| SUB | - | Pops two values, subtracts, pushes the result |
| MUL | - | Pops two values, multiplies, pushes the result |
| DIV | - | Pops two values, divides, pushes the result, and puts the remainder in r3 |
| DROP | - | Drops the top item of the stack |
| SWAP | - | Swaps the top two items on the stack |
| INC r0 | register | Increases the register by 1 |
| DEC r0 | register | Decreases the register by 1 |
| MOVE r0 10 | register, value | Moves a constant into a register |
| MOVE r0 r1 | register, register | Moves a value from one register to another |
| MOVE r0 &12 | register, address | Moves the value at a memory address into a register |
| MOVE r0 $name | register, data | Moves the address of a data label into a register |
| MOVE r0 [$name] | register, data | Moves the value of a data label into a register |
| MOVE r0 [$name + 2] | register, data | Moves the value of a data label with a constant offset into a register |
| MOVE r0 [$name + r1] | register, data | Moves the value of a data label with a register offset into a register |
| MOVE r0 &r1 | register, register | Moves the value at the address held in a register into a register |
| STORE 1010 32 | address, value | Stores a constant into memory |
| STORE 1010 r3 | address, register | Stores a register value into memory |
| JMP .label | label | Unconditional jump |
| JNZ .label | label | Jump if not zero |
| JZ .label | label | Jump if zero |
| JG .label | label | Jump if greater |
| JGE .label | label | Jump if greater or equal |
| JL .label | label | Jump if less |
| JLE .label | label | Jump if less or equal |
| AND | - | Pops two values, bitwise AND, pushes the result |
| OR | - | Pops two values, bitwise OR, pushes the result |
| XOR | - | Pops two values, bitwise XOR, pushes the result |
| NOT | - | Pops one value, bitwise NOT, pushes the result |
| SHR 10 | value | Pops a value, shifts it right by a constant, pushes the result |
| SHR r3 | register | Pops a value, shifts it right by the register value, pushes the result |
| SHL 10 | value | Pops a value, shifts it left by a constant, pushes the result |
| SHL r3 | register | Pops a value, shifts it left by the register value, pushes the result |
| CALL 32 | address | Calls the procedure at an address |
| CALL .label | label | Calls the procedure at a label |
| CALL r0 | register | Calls the procedure at the address held in a register |
| CALL &323 | address | Calls the procedure at a memory address |
| SAFECALL 32 | address | Calls the procedure at an address and preserves machine state (registers and flags) |
| SAFECALL .label | label | Calls the procedure at a label and preserves machine state (registers and flags) |
| SAFECALL r0 | register | Calls the procedure at the address held in a register and preserves machine state (registers and flags) |
| SAFECALL &323 | address | Calls the procedure at a memory address and preserves machine state (registers and flags) |
| RET | - | Returns from a procedure |
| DUP | - | Duplicates the top stack item |
| DUP 10 | number | Duplicates the top stack item the given number of times |
| DUP r3 | register | Duplicates the top stack item r3 times |
| INT 0 2 | module, function | Calls an interrupt (see below) |
| TERM | - | Terminates code execution |

Labels

  • Labels mark code positions and procedures for jumps and calls.

  • Start with . and contain only alphanumeric characters (no spaces):

.sayhello
    ; code here
RET

Interrupts

An interrupt is a pre-defined function in a VM module that performs operations outside normal instructions. Interrupts allow modular, system-level functionality like I/O.

Syntax

INT module_number function_number

Example: Module 0 = IO

| Function | Description |
|---|---|
| 0 | Pops the top of the stack and prints it |
| 1 | Pops a number n from the stack, then pops n items and prints them |
| 2 | Pops a stop value, then continuously pops and prints until reaching the stop value |
| 3 | Pops the address of a string in [data] and prints it until reaching 0 |
| 4 | Pops a number from the stack and prints it as a string |

Interrupts provide a bridge between VM code and system-level functions without complicating the instruction set.
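As a rough sketch of how IO functions 0, 1 and 2 could behave, the Rust function below pops from a plain `Vec<u32>` (top of stack at the end) and writes into a `String` instead of stdout. Several points are assumptions rather than documented behavior: that values are printed as characters, that the stop value of function 2 is consumed but not printed, and the name `io_interrupt` itself.

```rust
/// Sketch of IO module functions 0, 1 and 2.
/// Hypothetical signature: the real VM dispatches interrupts differently.
fn io_interrupt(function: u32, stack: &mut Vec<u32>, out: &mut String) -> Result<(), String> {
    // ASSUMPTION: each cell is printed as a single character.
    fn emit(value: u32, out: &mut String) {
        out.push(char::from_u32(value).unwrap_or('?'));
    }
    fn pop(stack: &mut Vec<u32>) -> Result<u32, String> {
        stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    match function {
        // Function 0: print the top of the stack.
        0 => emit(pop(stack)?, out),
        // Function 1: pop a count n, then print the next n items.
        1 => {
            let count = pop(stack)?;
            for _ in 0..count {
                emit(pop(stack)?, out);
            }
        }
        // Function 2: pop a stop value, then print until it shows up again.
        2 => {
            let stop = pop(stack)?;
            loop {
                let value = pop(stack)?;
                if value == stop {
                    // ASSUMPTION: the stop value is consumed and not printed.
                    break;
                }
                emit(value, out);
            }
        }
        other => return Err(format!("unsupported IO function {}", other)),
    }
    Ok(())
}
```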


Example Usage

@ORG 0x100

.start
CALL .f
TERM

.f
PUSH 13
INT 0 0
RET

Hello World! (Using data section)

[data]

$hello dw "Hello World!" 10 13 0

[text]

.start
push $hello
int 0 3
term

Reverse print number

@org 0

.start
push 123
call .split
TERM

.split
    push 10
    swap
    div
    push r3
    call .printdigit
    dup
    push 10
    swap
    sub
    drop
    jge .split
    call .printdigit
    call .newline
    ret

; push digit as parameter
.printdigit
    push 48
    add
    int 0 0
    ret

.newline
    push 10
    push 13
    push 10
    int 0 2
    ret

💻 Command-Line Interface (CLI)

This project includes a CLI tool, built with the Rust clap crate, that compiles assembly code into binary and executes binary files on the VM.

The CLI provides two main commands: compile and exec.


Installation

After cloning the repository, build the CLI with Cargo:

cargo build --release

The compiled executable will be in target/release/. You can run it directly:

./myvm [COMMAND] [OPTIONS]

Commands

1. Compile

Compiles an assembly source file into a binary that can be executed on the VM.

Usage:

./myvm compile -p source.asm -o output.bin

Options:

| Option | Description |
|---|---|
| -p, --path | Path to the source assembly file |
| -o, --output | Path to the output binary file |

How it works:

  • Reads the assembly file at the given path
  • Compiles it using the VM compiler
  • Writes a binary file with:
    • Header: origin address (u32)
    • Body: compiled bytecode (u32 per instruction)
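The file layout described above (a u32 origin followed by u32 words) can be sketched as a pair of functions. The byte order is not stated in this README, so little-endian is an assumption here, and `encode_image` / `decode_image` are hypothetical names.

```rust
/// Serializes an image as: origin (u32) followed by one u32 per bytecode word.
/// ASSUMPTION: little-endian byte order; the actual format may differ.
fn encode_image(origin: u32, bytecode: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(4 * (bytecode.len() + 1));
    bytes.extend_from_slice(&origin.to_le_bytes());
    for word in bytecode {
        bytes.extend_from_slice(&word.to_le_bytes());
    }
    bytes
}

/// Splits an image back into its origin header and bytecode body.
fn decode_image(bytes: &[u8]) -> Result<(u32, Vec<u32>), String> {
    if bytes.len() < 4 || bytes.len() % 4 != 0 {
        return Err("image length must be a non-zero multiple of 4 bytes".to_string());
    }
    let mut words = bytes
        .chunks_exact(4)
        .map(|chunk| u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    // The first word is the header; everything after it is the body.
    let origin = words.next().ok_or("missing header")?;
    Ok((origin, words.collect()))
}
```

A round trip through both functions returns the original origin and bytecode unchanged.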

2. Exec

Executes a compiled binary file on the VM.

Usage:

./myvm exec -p output.bin --cells 2048 --stack 256

Options:

| Option | Description | Default |
|---|---|---|
| -p, --path | Path to the binary file | - |
| -c, --cells | Number of memory cells in the VM | 2048 |
| -s, --stack | Number of cells allocated for the stack | 256 |
| -d, --dump | Dumps the VM's memory layout to stdout | false |

How it works:

  • Reads the binary file
  • Parses the origin address from the header
  • Loads the bytecode into VM memory
  • Configures the VM memory and stack size
  • Sets the program counter to the origin
  • Executes instructions sequentially until TERM or an error occurs
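The loading steps above can be sketched roughly as follows. The `Vm` struct, the `load` function, and the bounds check against the stack region are assumptions made for illustration; the actual VM may lay out and validate memory differently.

```rust
/// Hypothetical, stripped-down VM state used only for this sketch.
struct Vm {
    memory: Vec<u32>,
    pc: usize,
}

/// Copies the bytecode into memory at `origin` and points the PC at it.
fn load(origin: u32, bytecode: &[u32], cells: usize, stack: usize) -> Result<Vm, String> {
    let origin = origin as usize;
    let end = origin
        .checked_add(bytecode.len())
        .ok_or("address overflow")?;
    // ASSUMPTION: the program must not reach into the stack region
    // reserved at the end of memory.
    let stack_base = cells
        .checked_sub(stack)
        .ok_or("stack is larger than memory")?;
    if end > stack_base {
        return Err("program does not fit below the stack".to_string());
    }
    let mut memory = vec![0u32; cells];
    memory[origin..end].copy_from_slice(bytecode);
    Ok(Vm { memory, pc: origin })
}
```

With the default sizes (2048 cells, 256 for the stack), a three-word program with origin 0x100 ends up in cells 0x100 to 0x102 and the PC starts at 0x100.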

🛠️ Developer TODO / Roadmap

This is a hobby project, but it is fully open to contributions. Here are some key areas to work on:

  • Error Handling

    • Detect and handle stack overflows, invalid memory access, and illegal opcodes
    • Provide descriptive runtime error messages
  • Heap and Memory Management

    • Implement dynamic memory allocation
    • Add garbage collection or memory reuse strategies
  • IO Interrupt Module

    • Expand module 0 functionality
    • Support reading input, printing formatted output, and file operations
  • Network Interrupt Module

    • Add network communication interrupts for sending/receiving data
    • Enable TCP/UDP support for simple network programs
  • More unit tests

    • Write unit tests for all modules and functions
  • Code docs

    • Write better code docs
  • Create a REPL

    • REPL in CLI
