This project implements a 64-bit RISC-V processor in Verilog.
The processor is developed in two stages:
- Sequential (Single-Cycle) Processor
- 5-Stage Pipelined Processor
The design supports a subset of the RISC-V ISA including R-type, I-type, load/store, and branch instructions.
The processor datapath and control logic are based on the architecture described in Computer Organization and Design – RISC-V Edition by Patterson and Hennessy.
- Designed and implemented a 64-bit RISC-V processor with both single-cycle and 5-stage pipelined architectures (IF, ID, EX, MEM, WB) supporting R-type, I-type, `ld`, `sd`, and `beq` instructions in Verilog.
- Individually implemented the core datapath modules, hazard detection unit, and forwarding logic to resolve data hazards and reduce pipeline stalls.
- Moved branch resolution to the ID stage, reducing the branch misprediction penalty to a single flushed cycle with a static always-not-taken predictor.
- Developed Verilog testbenches to verify pipeline execution, hazard handling, and memory operations.
In the single-cycle architecture, every instruction completes all stages of execution in a single clock cycle.
Stages executed within the cycle:
- Instruction Fetch
- Instruction Decode
- Execute
- Memory Access
- Write Back
Advantages:
- Simple design
- Easier verification
Limitations:
- Long critical path
- Lower performance
The pipelined processor divides instruction execution into five stages:
| Stage | Description |
|---|---|
| IF | Instruction Fetch |
| ID | Instruction Decode |
| EX | Execute |
| MEM | Memory Access |
| WB | Write Back |
Multiple instructions execute simultaneously across stages, improving overall throughput.
The pipeline design includes:
- Hazard Detection Unit
- Forwarding Units
- Branch forwarding logic
- Pipeline registers
- Stall and flush control
- Load-store forwarding unit
The ALU performs arithmetic and logical operations on two 64-bit operands according to the control signals generated by the ALU Control unit.
The ALU result is used for:
- arithmetic operations
- memory address calculation
- branch comparisons
Inputs:

| Signal | Width (bits) | Description |
|---|---|---|
| Operand A | 64 | First operand from register file |
| Operand B | 64 | Second operand from register file or immediate |
| ALU Control | 4 | Operation selector |

Outputs:

| Signal | Width (bits) | Description |
|---|---|---|
| ALU Result | 64 | Result of operation |
| Zero Flag | 1 | Indicates if result equals zero |

Supported operations: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU
The ALU uses a single shared 64-bit adder instead of multiple adders to reduce hardware complexity.
The zero_flag is generated using a NOR reduction of the result bus and is used for branch instructions such as BEQ.
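
The shared-adder idea and the NOR-reduced zero flag can be sketched as follows. This is a minimal illustration, not the project's source: the module name `alu_sketch`, the port names, and the 4-bit encodings are assumptions, and XOR and the shift operations are omitted for brevity.

```verilog
// Illustrative ALU sketch; module name, ports, and encodings are assumptions.
module alu_sketch (
    input  wire [63:0] a,
    input  wire [63:0] b,
    input  wire [3:0]  alu_ctrl,
    output reg  [63:0] result,
    output wire        zero_flag
);
    localparam ALU_AND  = 4'b0000,
               ALU_OR   = 4'b0001,
               ALU_ADD  = 4'b0010,
               ALU_SUB  = 4'b0110,
               ALU_SLT  = 4'b0111,
               ALU_SLTU = 4'b1000;

    // One adder serves ADD, SUB, SLT and SLTU: subtraction is a + ~b + 1.
    wire        subtract = (alu_ctrl != ALU_ADD);
    wire [63:0] b_eff    = subtract ? ~b : b;
    wire [64:0] sum      = {1'b0, a} + {1'b0, b_eff} + {64'b0, subtract};

    // Signed overflow of a - b: operand signs differ and the result
    // sign differs from the sign of a.
    wire overflow    = (a[63] != b[63]) && (sum[63] != a[63]);
    wire lt_signed   = sum[63] ^ overflow;
    wire lt_unsigned = ~sum[64];  // no carry out of a + ~b + 1 means a < b

    always @(*) begin
        case (alu_ctrl)
            ALU_AND:  result = a & b;
            ALU_OR:   result = a | b;
            ALU_ADD,
            ALU_SUB:  result = sum[63:0];
            ALU_SLT:  result = {63'b0, lt_signed};
            ALU_SLTU: result = {63'b0, lt_unsigned};
            default:  result = 64'd0;
        endcase
    end

    // NOR reduction: high only when every result bit is zero.
    assign zero_flag = ~|result;
endmodule
```

For `beq`, the ALU subtracts the two operands, so `zero_flag` is high exactly when they are equal.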
The Control Unit decodes the instruction opcode and generates control signals that guide datapath components such as the ALU, register file, memory, and multiplexers.
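
A main decoder of this kind might look like the sketch below. The signal names follow the textbook's datapath, but the module name and the `2'b11` `ALUOp` code for I-type instructions are assumptions made for illustration and may differ from the project's encoding.

```verilog
// Illustrative main control decoder; names and the I-type ALUOp code are assumptions.
module control_sketch (
    input  wire [6:0] opcode,
    output reg        alu_src,
    output reg        mem_to_reg,
    output reg        reg_write,
    output reg        mem_read,
    output reg        mem_write,
    output reg        branch,
    output reg  [1:0] alu_op
);
    always @(*) begin
        // Default to a NOP: nothing is written anywhere.
        alu_src    = 1'b0;
        mem_to_reg = 1'b0;
        reg_write  = 1'b0;
        mem_read   = 1'b0;
        mem_write  = 1'b0;
        branch     = 1'b0;
        alu_op     = 2'b00;
        case (opcode)
            7'b0110011: begin  // R-type
                reg_write = 1'b1;
                alu_op    = 2'b10;
            end
            7'b0010011: begin  // I-type ALU (hypothetical ALUOp code)
                alu_src   = 1'b1;
                reg_write = 1'b1;
                alu_op    = 2'b11;
            end
            7'b0000011: begin  // ld
                alu_src    = 1'b1;
                mem_to_reg = 1'b1;
                reg_write  = 1'b1;
                mem_read   = 1'b1;
            end
            7'b0100011: begin  // sd
                alu_src   = 1'b1;
                mem_write = 1'b1;
            end
            7'b1100011: begin  // beq
                branch = 1'b1;
                alu_op = 2'b01;
            end
            default: ;         // unknown opcode: keep the NOP defaults
        endcase
    end
endmodule
```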
The ALU Control module determines the exact ALU operation based on:
- `ALUOp` from the main control unit
- instruction field `funct3`
- instruction field `funct7`
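
The two-level decode described above can be sketched as below. The `ALUOp` codes `00`, `01`, and `10` follow the textbook convention; the `11` code for I-type instructions, the output encodings, and all names are assumptions. Only bit 5 of `funct7` (instruction bit 30) is needed to tell `add` from `sub`, and only a few operations are shown.

```verilog
// Illustrative ALU control sketch; encodings and names are assumptions.
module alu_control_sketch (
    input  wire [1:0] alu_op,
    input  wire [2:0] funct3,
    input  wire       funct7_5,  // instruction bit 30
    output reg  [3:0] alu_ctrl
);
    localparam ALU_AND = 4'b0000,
               ALU_OR  = 4'b0001,
               ALU_ADD = 4'b0010,
               ALU_SUB = 4'b0110;

    // Bit 30 selects sub only for R-type; for I-type it is part of the immediate.
    wire is_r_type = (alu_op == 2'b10);

    always @(*) begin
        case (alu_op)
            2'b00: alu_ctrl = ALU_ADD;  // ld/sd: address = base + offset
            2'b01: alu_ctrl = ALU_SUB;  // beq: compare by subtraction
            default: begin              // 2'b10 R-type, 2'b11 I-type
                case (funct3)
                    3'b000:  alu_ctrl = (is_r_type && funct7_5) ? ALU_SUB : ALU_ADD;
                    3'b110:  alu_ctrl = ALU_OR;
                    3'b111:  alu_ctrl = ALU_AND;
                    default: alu_ctrl = ALU_ADD;  // remaining ops omitted in this sketch
                endcase
            end
        endcase
    end
endmodule
```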
The Register File contains 32 general purpose registers (x0 – x31).
Features:
- Two simultaneous reads
- One write per clock cycle
- Register `x0` always returns zero
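
A register file with these properties could be sketched as follows. The module and port names are hypothetical, and the combinational (asynchronous) read is an assumption about the read timing.

```verilog
// Illustrative register file sketch; names and read timing are assumptions.
module regfile_sketch (
    input  wire        clk,
    input  wire        reg_write,
    input  wire [4:0]  rs1,
    input  wire [4:0]  rs2,
    input  wire [4:0]  rd,
    input  wire [63:0] write_data,
    output wire [63:0] read_data1,
    output wire [63:0] read_data2
);
    reg [63:0] regs [0:31];

    // Two independent read ports; x0 is hard-wired to zero.
    assign read_data1 = (rs1 == 5'd0) ? 64'd0 : regs[rs1];
    assign read_data2 = (rs2 == 5'd0) ? 64'd0 : regs[rs2];

    // Single write port; writes to x0 are discarded.
    always @(posedge clk) begin
        if (reg_write && rd != 5'd0)
            regs[rd] <= write_data;
    end
endmodule
```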
The Immediate Generator extracts immediate values from instructions and sign-extends them to 64 bits for use in arithmetic operations.
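
As a sketch of the extraction and sign extension, the module below handles the I, S, and B immediate formats of the base RISC-V encoding. The module name is hypothetical, and emitting the branch offset with its low zero bit already appended (rather than shifting later) is an assumption.

```verilog
// Illustrative immediate generator sketch; module name is an assumption.
module imm_gen_sketch (
    input  wire [31:0] instr,
    output reg  [63:0] imm
);
    always @(*) begin
        case (instr[6:0])
            // I-type (loads and ALU immediates): imm[11:0] = instr[31:20]
            7'b0000011,
            7'b0010011: imm = {{52{instr[31]}}, instr[31:20]};
            // S-type (stores): imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
            7'b0100011: imm = {{52{instr[31]}}, instr[31:25], instr[11:7]};
            // B-type (branches): 13-bit byte offset whose bit 0 is always zero
            7'b1100011: imm = {{51{instr[31]}}, instr[31], instr[7],
                               instr[30:25], instr[11:8], 1'b0};
            default:    imm = 64'd0;
        endcase
    end
endmodule
```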
The Program Counter stores the address of the next instruction to fetch.
- Normally increments by 4
- Updated with branch target address for branch instructions
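
The update rule above can be sketched as a small register. The names are hypothetical, and the `pc_write` enable (used to freeze the PC during a stall) and the reset address of zero are assumptions.

```verilog
// Illustrative program counter sketch; names and reset address are assumptions.
module pc_sketch (
    input  wire        clk,
    input  wire        reset,
    input  wire        pc_write,       // low freezes the PC (stall)
    input  wire        branch_taken,
    input  wire [63:0] branch_target,
    output reg  [63:0] pc
);
    always @(posedge clk) begin
        if (reset)
            pc <= 64'd0;
        else if (pc_write)
            pc <= branch_taken ? branch_target : pc + 64'd4;
    end
endmodule
```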
Instruction memory stores the program instructions.
Features:
- 4096 byte memory
- Byte-addressable
- Instructions loaded from `instructions.txt`
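
One way to model such a memory is sketched below. It assumes `instructions.txt` holds one hexadecimal byte per entry with the most significant byte of each instruction first; the module and port names are hypothetical.

```verilog
// Illustrative instruction memory sketch; file layout and names are assumptions.
module instr_mem_sketch (
    input  wire [63:0] addr,
    output wire [31:0] instr
);
    reg [7:0] mem [0:4095];  // 4096 bytes, one byte per entry

    // Load the program image at the start of simulation.
    initial $readmemh("instructions.txt", mem);

    // Only the low 12 address bits index the 4096-byte array.
    wire [11:0] a = addr[11:0];

    // Big-endian assembly: the byte at the lowest address is the most significant.
    assign instr = {mem[a], mem[a + 12'd1], mem[a + 12'd2], mem[a + 12'd3]};
endmodule
```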
Data memory stores runtime data used by load and store instructions.
Features:
- Byte-addressable memory
- Supports both read and write operations
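
A byte-addressable data memory serving 64-bit `ld` and `sd` accesses might be sketched like this. The 1024-byte size, the little-endian byte order, and all names are assumptions; the project's memory may differ on each point.

```verilog
// Illustrative data memory sketch; size, byte order, and names are assumptions.
module data_mem_sketch (
    input  wire        clk,
    input  wire        mem_read,
    input  wire        mem_write,
    input  wire [63:0] addr,
    input  wire [63:0] write_data,
    output wire [63:0] read_data
);
    reg [7:0] mem [0:1023];
    wire [9:0] a = addr[9:0];
    integer i;

    // sd: store eight bytes, least significant byte at the lowest address.
    always @(posedge clk) begin
        if (mem_write)
            for (i = 0; i < 8; i = i + 1)
                mem[a + i] <= write_data[8*i +: 8];
    end

    // ld: gather eight consecutive bytes into one doubleword.
    assign read_data = mem_read
        ? {mem[a + 7], mem[a + 6], mem[a + 5], mem[a + 4],
           mem[a + 3], mem[a + 2], mem[a + 1], mem[a]}
        : 64'd0;
endmodule
```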
Multiplexers select between different datapath inputs such as:
- register values
- immediate values
- ALU outputs
- memory outputs
Pipeline registers separate the pipeline stages and store intermediate results between clock cycles.
| Register | Between Stages |
|---|---|
| IF/ID | Instruction Fetch → Decode |
| ID/EX | Decode → Execute |
| EX/MEM | Execute → Memory |
| MEM/WB | Memory → Write Back |
The Hazard Detection Unit detects situations where an instruction depends on data that has not yet been produced.
When a hazard is detected:
- PC is frozen
- IF/ID pipeline register is frozen
- A NOP bubble is inserted
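
The three actions above can be sketched for the classic load-use case, where the instruction in ID reads a register that a load in EX has not yet fetched. Signal and module names are hypothetical, and a design that resolves branches in ID typically needs additional stall conditions that are not shown here.

```verilog
// Illustrative load-use hazard detection sketch; names are assumptions.
module hazard_unit_sketch (
    input  wire       id_ex_mem_read,  // instruction in EX is a load
    input  wire [4:0] id_ex_rd,
    input  wire [4:0] if_id_rs1,
    input  wire [4:0] if_id_rs2,
    output wire       pc_write,        // low freezes the PC
    output wire       if_id_write,     // low freezes the IF/ID register
    output wire       id_ex_bubble     // high zeroes ID/EX control signals (NOP)
);
    wire stall = id_ex_mem_read && (id_ex_rd != 5'd0) &&
                 ((id_ex_rd == if_id_rs1) || (id_ex_rd == if_id_rs2));

    assign pc_write     = ~stall;
    assign if_id_write  = ~stall;
    assign id_ex_bubble = stall;
endmodule
```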
The Forwarding Unit resolves Read-After-Write (RAW) hazards by forwarding results from later pipeline stages directly to the ALU inputs.
This reduces pipeline stalls.
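
A forwarding unit following the textbook conditions can be sketched as below. The select encoding (`00` register file, `10` EX/MEM result, `01` MEM/WB result) matches the textbook; the module and signal names are assumptions.

```verilog
// Illustrative forwarding unit sketch; names are assumptions.
module forwarding_unit_sketch (
    input  wire       ex_mem_reg_write,
    input  wire       mem_wb_reg_write,
    input  wire [4:0] ex_mem_rd,
    input  wire [4:0] mem_wb_rd,
    input  wire [4:0] id_ex_rs1,
    input  wire [4:0] id_ex_rs2,
    output reg  [1:0] forward_a,
    output reg  [1:0] forward_b
);
    always @(*) begin
        // Default: take both operands from the register file.
        forward_a = 2'b00;
        forward_b = 2'b00;

        // The EX/MEM result is newer, so it takes priority over MEM/WB.
        if (ex_mem_reg_write && ex_mem_rd != 5'd0 && ex_mem_rd == id_ex_rs1)
            forward_a = 2'b10;
        else if (mem_wb_reg_write && mem_wb_rd != 5'd0 && mem_wb_rd == id_ex_rs1)
            forward_a = 2'b01;

        if (ex_mem_reg_write && ex_mem_rd != 5'd0 && ex_mem_rd == id_ex_rs2)
            forward_b = 2'b10;
        else if (mem_wb_reg_write && mem_wb_rd != 5'd0 && mem_wb_rd == id_ex_rs2)
            forward_b = 2'b01;
    end
endmodule
```

Excluding `x0` as a destination keeps a write to the zero register from being forwarded as a nonzero value.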
The Branch Forwarding Unit forwards updated register values to the branch comparison logic in the ID stage, allowing branches to be resolved earlier.
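
A minimal sketch of an ID-stage comparison with forwarded operands is shown below. It assumes only the EX/MEM ALU result is forwarded and that the forward-select signals are computed elsewhere; all names are hypothetical.

```verilog
// Illustrative ID-stage branch comparison sketch; names are assumptions.
module branch_compare_sketch (
    input  wire        branch,             // instruction in ID is beq
    input  wire        fwd_rs1,            // use the forwarded value for rs1
    input  wire        fwd_rs2,            // use the forwarded value for rs2
    input  wire [63:0] rs1_data,
    input  wire [63:0] rs2_data,
    input  wire [63:0] ex_mem_alu_result,
    output wire        branch_taken
);
    wire [63:0] op1 = fwd_rs1 ? ex_mem_alu_result : rs1_data;
    wire [63:0] op2 = fwd_rs2 ? ex_mem_alu_result : rs2_data;

    // beq is taken when the (possibly forwarded) operands are equal.
    assign branch_taken = branch && (op1 == op2);
endmodule
```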
The Extra Forwarding Unit handles hazards between load and store instructions by forwarding the loaded value directly to the store operation.
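
For a sequence such as `ld x5, 0(x1)` followed by `sd x5, 0(x2)`, the store reaches MEM while the load is in WB. A sketch of the forwarding mux for that case follows; the signal and module names are assumptions.

```verilog
// Illustrative load-to-store forwarding sketch; names are assumptions.
module ld_sd_forward_sketch (
    input  wire        mem_wb_mem_to_reg,  // instruction in WB is a load
    input  wire        ex_mem_mem_write,   // instruction in MEM is a store
    input  wire [4:0]  mem_wb_rd,
    input  wire [4:0]  ex_mem_rs2,         // register holding the store data
    input  wire [63:0] mem_wb_load_data,
    input  wire [63:0] ex_mem_store_data,
    output wire [63:0] store_data
);
    wire forward = mem_wb_mem_to_reg && ex_mem_mem_write &&
                   (mem_wb_rd != 5'd0) && (mem_wb_rd == ex_mem_rs2);

    // Use the freshly loaded value instead of the stale register read.
    assign store_data = forward ? mem_wb_load_data : ex_mem_store_data;
endmodule
```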
Two control signals ensure correct pipeline execution:
- Stall: freezes the early pipeline stages when data is not ready.
- Flush: removes incorrectly fetched instructions when a branch is taken.
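
How a pipeline register might respond to both signals is sketched below for IF/ID. The names are hypothetical, and giving flush priority over stall is a design assumption. The flushed slot is filled with `addi x0, x0, 0`, the canonical RISC-V NOP.

```verilog
// Illustrative IF/ID register with stall and flush; names are assumptions.
module if_id_reg_sketch (
    input  wire        clk,
    input  wire        reset,
    input  wire        stall,   // hold the current contents
    input  wire        flush,   // replace the contents with a NOP
    input  wire [63:0] pc_in,
    input  wire [31:0] instr_in,
    output reg  [63:0] pc_out,
    output reg  [31:0] instr_out
);
    localparam NOP = 32'h00000013;  // addi x0, x0, 0

    always @(posedge clk) begin
        if (reset || flush) begin
            pc_out    <= 64'd0;
            instr_out <= NOP;
        end else if (!stall) begin
            pc_out    <= pc_in;
            instr_out <= instr_in;
        end
        // When stall is high and flush is low, the register keeps its value.
    end
endmodule
```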
To test the processor implementation, follow these steps:
- Download the repository and ensure the folder structure is preserved (both the `SEQ` and `Pipeline` folders should remain intact).
- Write your program in RISC-V assembly.
- Use the provided assembler to convert the assembly program into big-endian hexadecimal instruction format.
- Copy the generated instruction bytes and paste them into the file `instructions.txt` located in the respective implementation folder (`SEQ` or `Pipeline`).
- Run the corresponding testbench file using your Verilog simulator.
  - Sequential processor: `seq_tb.v`
  - Pipelined processor: `pipe_tb.v`

The simulator will execute the instructions and display the register and pipeline outputs, allowing verification of correct processor behavior.
For a detailed explanation of the sequential processor design and pipelined processor implementation, please refer to the reports present in the respective folders:
- `SEQ/Sequential_Report.pdf`
- `Pipeline/RISC_V_Processor_Pipeline_Report.pdf`