A custom 16-bit instruction set architecture optimized for embedded matrix operations, with a complete assembler, encoder/decoder, and 5-stage pipeline simulator.
The Application-Specific Processor (ASP) is designed for efficient 4×4 and 8×8 matrix operations. Key design decisions:
- MAC instruction — single-cycle multiply-accumulate reduces dot product instruction count by 33% vs. separate MUL+ADD
- 16-bit encoding — instructions fit in one memory word, simplifying fetch/decode
- 8 general-purpose registers — sufficient for small matrix kernels
- 6-bit signed immediate — covers common offsets (-32 to 31)
graph LR
subgraph IF["Fetch"]
PC["Program Counter"] --> IMEM["Instruction Memory"]
IMEM --> IFID["IF/ID Register"]
end
subgraph ID["Decode"]
IFID --> RF["Register File R0-R7"]
IFID --> DEC["Decoder"]
RF --> IDEX["ID/EX Register"]
DEC --> IDEX
end
subgraph EX["Execute"]
IDEX --> ALU["ALU / MAC Unit"]
FW["Forwarding Unit"] --> ALU
IDEX --> EXMEM["EX/MEM Register"]
ALU --> EXMEM
end
subgraph MEM["Memory"]
EXMEM --> DMEM["Data Memory (64KB)"]
DMEM --> MEMWB["MEM/WB Register"]
end
subgraph WB["Write Back"]
MEMWB --> RF
end
HD["Hazard Detection Unit"] -.->|"stall/flush"| IFID
HD -.-> IDEX
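
The hazard detection and forwarding units in the diagram can be illustrated with the textbook conditions for a 5-stage pipeline. The following is a minimal sketch under stated assumptions, not the logic in `src/pipeline_simulator.cpp`: the struct and function names (`IfId`, `IdEx`, `must_stall`, `forward_for`) are hypothetical, and it assumes a loaded value only becomes available after the Memory stage.

```cpp
#include <cstdint>

// Hypothetical snapshots of the pipeline registers. Field names are
// illustrative and are not taken from the simulator source.
struct IfId  { std::uint8_t rs1, rs2; };              // sources of the instruction in Decode
struct IdEx  { bool is_load; std::uint8_t dest; };    // instruction currently in Execute
struct ExMem { bool writes_reg; std::uint8_t dest; }; // instruction currently in Memory
struct MemWb { bool writes_reg; std::uint8_t dest; }; // instruction currently in Write Back

// Load-use hazard: the instruction in Execute is a load whose destination
// is read by the instruction in Decode. Forwarding cannot help yet, so the
// pipeline has to stall for one cycle.
inline bool must_stall(const IfId& id, const IdEx& ex) {
    return ex.is_load && (ex.dest == id.rs1 || ex.dest == id.rs2);
}

enum class Forward { None, FromExMem, FromMemWb };

// Choose the newest in-flight value for one ALU source register.
// Assumption: no register is hard-wired to zero, because the MAC example
// in this README uses R0 as an accumulator.
inline Forward forward_for(std::uint8_t src, const ExMem& mem, const MemWb& wb) {
    if (mem.writes_reg && mem.dest == src) return Forward::FromExMem;
    if (wb.writes_reg && wb.dest == src) return Forward::FromMemWb;
    return Forward::None;
}
```

Because MAC also reads its destination register as the accumulator, a real implementation would presumably run the same check for Rd as a third source operand.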
graph LR
subgraph R_TYPE["R-Type Format (16-bit)"]
R_OP["Opcode 4 bit"] --- R_RD["Rd 3 bit"]
R_RD --- R_RS1["Rs1 3 bit"]
R_RS1 --- R_RS2["Rs2 3 bit"]
R_RS2 --- R_FN["funct 3 bit"]
end
subgraph I_TYPE["I-Type Format (16-bit)"]
I_OP["Opcode 4 bit"] --- I_RT["Rt 3 bit"]
I_RT --- I_RS["Rs 3 bit"]
I_RS --- I_IMM["Imm6 6 bit signed"]
end
| Format | Bits | Fields |
|---|---|---|
| R-type | 16 | Opcode(4), Rd(3), Rs1(3), Rs2(3), funct(3) |
| I-type | 16 | Opcode(4), Rt(3), Rs(3), Imm6(6) |
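
To make the R-type layout concrete, here is a rough sketch of packing and unpacking one instruction word. It assumes the fields are placed most-significant-bit first in the order the table lists them (opcode in bits 15-12, funct in bits 2-0); the authoritative bit positions are in `include/isa.hpp`. The names `RType`, `encode_r`, and `decode_r` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory form of an R-type instruction.
struct RType {
    std::uint8_t opcode;  // 4 bits
    std::uint8_t rd;      // 3 bits
    std::uint8_t rs1;     // 3 bits
    std::uint8_t rs2;     // 3 bits
    std::uint8_t funct;   // 3 bits
};

// Pack the fields into one 16-bit word (assumed MSB-first field order).
inline std::uint16_t encode_r(const RType& i) {
    assert(i.opcode < 16 && i.rd < 8 && i.rs1 < 8 && i.rs2 < 8 && i.funct < 8);
    return static_cast<std::uint16_t>((i.opcode << 12) | (i.rd << 9) |
                                      (i.rs1 << 6) | (i.rs2 << 3) | i.funct);
}

// Unpack a 16-bit word back into its fields by shifting and masking.
inline RType decode_r(std::uint16_t word) {
    RType i{};
    i.opcode = static_cast<std::uint8_t>((word >> 12) & 0xF);
    i.rd     = static_cast<std::uint8_t>((word >> 9) & 0x7);
    i.rs1    = static_cast<std::uint8_t>((word >> 6) & 0x7);
    i.rs2    = static_cast<std::uint8_t>((word >> 3) & 0x7);
    i.funct  = static_cast<std::uint8_t>(word & 0x7);
    return i;
}
```

Under these assumptions, and with a funct value of 0 chosen purely for illustration, `encode_r(RType{0x4, 0, 4, 5, 0})` (that is, `MAC R0, R4, R5`) yields `0x4128`.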
| Instruction | Opcode | Format | Description |
|---|---|---|---|
| LOAD | 0x0 | I-type | Load from memory |
| STORE | 0x1 | I-type | Store to memory |
| ADD | 0x2 | R-type | Register addition |
| MUL | 0x3 | R-type | Register multiplication |
| MAC | 0x4 | R-type | Multiply-accumulate |
| SCAL | 0x5 | I-type | Scalar multiply with immediate |
| BEQ | 0x6 | I-type | Branch if equal |
| NOP | 0xF | - | No operation |
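
A decoder typically starts by mapping the 4-bit opcode to a mnemonic and a format. The sketch below mirrors the table above; the names `Format`, `OpInfo`, and `lookup` are hypothetical, and treating opcodes 0x7 through 0xE as invalid is an assumption, since the table does not assign them. It needs C++17 for `std::optional` and `std::string_view`.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

enum class Format { R, I, None };

struct OpInfo {
    std::string_view mnemonic;
    Format format;
};

// Map a 4-bit opcode to its mnemonic and encoding format.
// Returns std::nullopt for opcodes the table does not define.
inline std::optional<OpInfo> lookup(std::uint8_t opcode) {
    switch (opcode) {
        case 0x0: return OpInfo{"LOAD",  Format::I};
        case 0x1: return OpInfo{"STORE", Format::I};
        case 0x2: return OpInfo{"ADD",   Format::R};
        case 0x3: return OpInfo{"MUL",   Format::R};
        case 0x4: return OpInfo{"MAC",   Format::R};
        case 0x5: return OpInfo{"SCAL",  Format::I};
        case 0x6: return OpInfo{"BEQ",   Format::I};
        case 0xF: return OpInfo{"NOP",   Format::None};
        default:  return std::nullopt;  // unassigned in the table
    }
}
```

The returned format tells the decoder whether the low 12 bits hold three registers plus funct (R-type) or two registers plus a 6-bit immediate (I-type).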
├── include/isa.hpp # ISA definitions, encoder, decoder (171 LOC)
├── src/
│ ├── assembler.cpp # Assembler + single-cycle simulator (507 LOC)
│ └── pipeline_simulator.cpp # 5-stage pipelined model with hazard handling (648 LOC)
├── programs/
│ ├── matrix_add.asm # Matrix addition example
│ ├── matrix_mul.asm # Matrix multiplication (single element)
│ ├── dot_product.asm # Vector dot product
│ └── scalar_multiply.asm # Scalar multiplication
├── isa-specification.md # Full ISA specification (encoding, semantics)
├── MICROARCHITECTURE.md # Pipeline datapath, hazards, performance analysis
├── Makefile # Build system
└── LICENSE
make # Build assembler and simulator
make run # Run demo (encode programs, execute, print results)
make test # Run validation tests

The MAC instruction is the key differentiator:
; Without MAC (2 instructions, 1 temp register):
MUL R6, R4, R5
ADD R0, R0, R6
; With MAC (1 instruction, no temp):
MAC R0, R4, R5

For a 4×4 matrix multiply (64 MAC operations), this means 33% fewer instructions and no intermediate register pressure.
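
The equivalence of the two sequences can be checked with a small register-file model. This is a sketch under assumptions: registers are modeled as unsigned 16-bit values that wrap on overflow, which the README does not specify, and the helper names (`Regs`, `mul`, `add`, `mac`) are hypothetical rather than taken from the simulator.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Regs = std::array<std::uint16_t, 8>;  // R0-R7

// Multiply in 32 bits, then truncate to 16 bits (assumed wrap-around).
inline std::uint16_t wrap_mul(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(a) * b);
}

// MUL Rd, Rs1, Rs2: Rd = Rs1 * Rs2
inline void mul(Regs& r, std::size_t rd, std::size_t rs1, std::size_t rs2) {
    r[rd] = wrap_mul(r[rs1], r[rs2]);
}

// ADD Rd, Rs1, Rs2: Rd = Rs1 + Rs2
inline void add(Regs& r, std::size_t rd, std::size_t rs1, std::size_t rs2) {
    r[rd] = static_cast<std::uint16_t>(r[rs1] + r[rs2]);
}

// MAC Rd, Rs1, Rs2: Rd = Rd + Rs1 * Rs2, with no temporary register
inline void mac(Regs& r, std::size_t rd, std::size_t rs1, std::size_t rs2) {
    r[rd] = static_cast<std::uint16_t>(r[rd] + wrap_mul(r[rs1], r[rs2]));
}
```

Starting from the same register state, `mul(r, 6, 4, 5); add(r, 0, 0, 6);` and `mac(r, 0, 4, 5);` leave the same value in R0; only the first sequence overwrites R6.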
Built-in validation covers:
- Round-trip encode/decode for all instruction types
- Immediate value boundary tests (-32 to 31)
- Register range validation (R0-R7)
- Format verification (R-type vs I-type)
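
As an illustration of the round-trip and immediate boundary checks, the sketch below encodes and decodes an I-type word for every immediate from -32 to 31. It is not the project's test suite: the names `IType`, `encode_i`, `decode_i`, and `round_trips_all_immediates` are hypothetical, and the bit positions assume the fields are packed most-significant-bit first in the order Opcode, Rt, Rs, Imm6.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory form of an I-type instruction.
struct IType {
    std::uint8_t opcode;  // 4 bits
    std::uint8_t rt;      // 3 bits
    std::uint8_t rs;      // 3 bits
    std::int8_t  imm;     // 6-bit signed, -32..31
};

// Pack the fields; the immediate is stored as 6-bit two's complement.
inline std::uint16_t encode_i(const IType& i) {
    assert(i.opcode < 16 && i.rt < 8 && i.rs < 8);
    assert(i.imm >= -32 && i.imm <= 31);
    const unsigned imm6 = static_cast<std::uint8_t>(i.imm) & 0x3Fu;
    return static_cast<std::uint16_t>((i.opcode << 12) | (i.rt << 9) |
                                      (i.rs << 6) | imm6);
}

// Unpack the fields and sign-extend the 6-bit immediate.
inline IType decode_i(std::uint16_t word) {
    const int imm6 = word & 0x3F;
    IType i{};
    i.opcode = static_cast<std::uint8_t>((word >> 12) & 0xF);
    i.rt     = static_cast<std::uint8_t>((word >> 9) & 0x7);
    i.rs     = static_cast<std::uint8_t>((word >> 6) & 0x7);
    i.imm    = static_cast<std::int8_t>((imm6 ^ 0x20) - 0x20);  // bit 5 is the sign
    return i;
}

// Returns true if every immediate in [-32, 31] survives encode + decode.
inline bool round_trips_all_immediates() {
    for (int imm = -32; imm <= 31; ++imm) {
        // Opcode 0x0 (LOAD) with Rt = 1 and Rs = 2, chosen arbitrarily.
        const IType in{0x0, 1, 2, static_cast<std::int8_t>(imm)};
        const IType out = decode_i(encode_i(in));
        if (out.opcode != in.opcode || out.rt != in.rt ||
            out.rs != in.rs || out.imm != in.imm) {
            return false;
        }
    }
    return true;
}
```

The expression `(imm6 ^ 0x20) - 0x20` is a common branch-free way to sign-extend a 6-bit field: it maps 0x3F to -1 and 0x20 to -32 while leaving 0 through 31 unchanged.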
MIT