A cycle-accurate RISC-V simulator modeling a multi-compute-unit architecture with a shared fetch unit, two-level cache hierarchy, synchronization primitives, and scratchpad memory. Designed to study variable-latency memory behavior, parallel execution, and cache vs scratchpad performance trade-offs.

AnirudhArrepu/RISC-V-Simulator


RISC-V Simulator with Cache Hierarchy, Synchronization, and Scratchpad Memory

This repository contains a cycle-accurate RISC-V processor simulator developed as part of the CS209P Computer Architecture course project. The simulator models a multi-compute-unit architecture with realistic pipeline behavior, a two-level cache hierarchy, synchronization primitives, and programmer-managed scratchpad memory.

The goal of this project is to study memory system behavior, parallel execution, and performance trade-offs in modern processor designs.

Architecture Overview

  • Four Compute Units (CUs)

    • Shared instruction fetch unit
    • Independent Decode, Execute, Memory, and Writeback stages per CU
    • Shared instruction and data memory
    • Each CU has a unique CID (Compute ID) register
  • Single Fetch Unit

    • All compute units fetch the same instruction
    • Execution is selectively enabled or disabled based on CID
    • Enables SIMD-like execution with control divergence
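
The shared-fetch behaviour can be illustrated with a small sketch. It is not taken from the simulator's source: the `ComputeUnit` class, the `broadcast` helper, and the `only_cid` field are hypothetical names, and the real gating mechanism is not described in this README. The sketch only shows one fetched instruction being offered to all four CUs, with execution enabled or disabled per CID.

```python
from dataclasses import dataclass, field

NUM_CUS = 4  # the README states four compute units


@dataclass
class ComputeUnit:
    cid: int                                  # unique Compute ID of this CU
    regs: dict = field(default_factory=dict)  # simplified register file


def broadcast(instruction, cus):
    """Offer one fetched instruction to every CU and return the CIDs that ran it."""
    executed_by = []
    for cu in cus:
        # Hypothetical encoding: only_cid=None means "every CU executes".
        only = instruction.get("only_cid")
        if only is None or only == cu.cid:
            instruction["op"](cu)
            executed_by.append(cu.cid)
    return executed_by


cus = [ComputeUnit(cid=i) for i in range(NUM_CUS)]
# All CUs execute the first instruction: each stores its own CID in x5.
broadcast({"op": lambda cu: cu.regs.update(x5=cu.cid)}, cus)
# Only CU 2 executes the second instruction; the others are disabled for it.
ran = broadcast({"op": lambda cu: cu.regs.update(x6=1), "only_cid": 2}, cus)
```

Here `ran` lists the CIDs that executed the second instruction, which is how control divergence shows up in this simplified model.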

Memory Hierarchy

  • L1 Caches

    • L1 Instruction Cache (L1I)
    • L1 Data Cache (L1D)
    • Configurable parameters:
      • Cache size
      • Block size
      • Associativity
      • Access latency
    • Instruction fetch is treated as a cacheable memory access
    • Cache blocks of 64 bytes hold up to 16 instructions
  • L2 Cache

    • Unified L2 cache shared by instructions and data
    • Configurable size, associativity, and latency
    • Accessed on L1 cache misses
  • Main Memory

    • Accessed on L2 cache misses
    • Configurable main memory latency
    • Variable-latency memory operations introduce pipeline stalls
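
As a rough illustration of how a variable-latency access could be costed, the sketch below adds up the latency of each level that is probed. The function name, the default latency values, and the assumption that latencies accumulate serially (an L1 miss pays the L1 latency plus the L2 latency, and so on) are illustrative choices, not the simulator's documented behaviour.

```python
def access_latency(l1_hit, l2_hit, l1_lat=1, l2_lat=10, mem_lat=100):
    """Cycles for one memory access, assuming levels are probed one after another.

    The default latencies are placeholders; in the simulator they are
    configurable parameters.
    """
    latency = l1_lat                 # L1 is always probed first
    if not l1_hit:
        latency += l2_lat            # L1 miss: go to the unified L2
        if not l2_hit:
            latency += mem_lat       # L2 miss: go to main memory
    return latency


l1_hit_cost = access_latency(l1_hit=True, l2_hit=True)
l2_hit_cost = access_latency(l1_hit=False, l2_hit=True)
memory_cost = access_latency(l1_hit=False, l2_hit=False)
```

Under this model, every cycle beyond the first that an access takes would stall the pipeline stage waiting on it.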

Cache Replacement Policies

  • LRU (Least Recently Used)
  • One additional configurable replacement policy

The simulator tracks cache hits, misses, and stall cycles introduced by memory access delays.
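
A minimal sketch of LRU replacement for a single cache set, with hit and miss counters, is shown below. The `LRUSet` class is a hypothetical illustration built on `collections.OrderedDict`; the simulator's own data structures may differ.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement (illustrative, not the simulator's class)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tags ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        """Return True on a hit, False on a miss (filling the line on a miss)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used tag
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
results = [s.access(tag) for tag in ["A", "B", "A", "C", "B"]]
# results == [False, False, True, False, False]
```

In the trace above, the access to "C" evicts "B" (the least recently used tag at that point), so the final access to "B" misses again.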

Scratchpad Memory (SPM)

In addition to the cache hierarchy, the simulator includes a programmer-managed scratchpad memory:

  • Same size and access latency as L1D cache
  • No automatic replacement or tag lookup
  • Entirely controlled by software

Custom Instructions

  • lw_spm rd, offset(rs1): loads a word from scratchpad memory into register rd

  • sw_spm rs2, offset(rs1): stores a word from register rs2 into scratchpad memory

The SPM is used to compare cache-based and software-managed memory systems for strided access patterns.
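
The semantics of the two custom instructions can be sketched as follows. The `Scratchpad` class, its byte-addressed and word-aligned addressing, and the dictionary register file are assumptions made for illustration; the simulator may index its scratchpad differently.

```python
class Scratchpad:
    """Software-managed memory: no tags, no replacement, fixed latency."""

    def __init__(self, size_bytes, latency):
        self.words = [0] * (size_bytes // 4)  # 4-byte words
        self.latency = latency                # same latency as L1D per the README

    def _index(self, base, offset):
        addr = base + offset
        # Assumption: byte addresses that must be word-aligned and in range.
        if addr % 4 != 0 or not 0 <= addr < 4 * len(self.words):
            raise ValueError(f"bad scratchpad address {addr}")
        return addr // 4

    def lw_spm(self, regs, rd, offset, rs1):
        """lw_spm rd, offset(rs1): load a word into rd; returns cycles taken."""
        regs[rd] = self.words[self._index(regs[rs1], offset)]
        return self.latency

    def sw_spm(self, regs, rs2, offset, rs1):
        """sw_spm rs2, offset(rs1): store rs2 into the scratchpad; returns cycles taken."""
        self.words[self._index(regs[rs1], offset)] = regs[rs2]
        return self.latency


spm = Scratchpad(size_bytes=256, latency=2)
regs = {"x1": 8, "x2": 42, "x3": 0}
store_cycles = spm.sw_spm(regs, "x2", 4, "x1")  # writes 42 at byte address 12
load_cycles = spm.lw_spm(regs, "x3", 4, "x1")   # reads it back into x3
```

Because there is no tag lookup, every access costs the same number of cycles, which is what makes the scratchpad predictable for strided access patterns.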

Synchronization Support

SYNC Instruction

  • Acts as a barrier synchronization primitive
  • A compute unit stalls at SYNC until all compute units reach the same instruction
  • Implemented as a hardware-modeled no-op
  • Ensures correctness for parallel workloads such as reductions
  • This mechanism prevents premature reads of shared data before all compute units have completed their updates.
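
The barrier behaviour can be sketched as a counter of arrived CUs that releases everyone once the last CU arrives. The `SyncBarrier` class and the arrival cycles below are hypothetical; they only illustrate the stall-until-all-arrive rule described above.

```python
class SyncBarrier:
    """Barrier for a fixed number of compute units (illustrative sketch)."""

    def __init__(self, num_cus=4):
        self.num_cus = num_cus
        self.arrived = set()  # CIDs currently waiting at SYNC

    def arrive(self, cid):
        self.arrived.add(cid)

    def try_release(self):
        """Return True and reset once every CU has arrived; otherwise keep stalling."""
        if len(self.arrived) == self.num_cus:
            self.arrived.clear()
            return True
        return False


barrier = SyncBarrier()
arrival_cycle = {0: 0, 1: 2, 2: 1, 3: 3}  # made-up cycle at which each CU reaches SYNC
stall_cycles = 0
release_cycle = None
for cycle in range(5):
    for cid, when in arrival_cycle.items():
        if when == cycle:
            barrier.arrive(cid)
    if barrier.try_release():
        release_cycle = cycle
        break
    stall_cycles += len(barrier.arrived)  # every waiting CU stalls this cycle
```

With these made-up arrival times, the barrier releases in the cycle the last CU arrives, and the earlier CUs account for all of the stall cycles.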

Performance Metrics

At the end of execution, the simulator reports:

  • Total number of stall cycles
  • Cache miss rate
  • IPC (Instructions Per Cycle)

These metrics are used to evaluate different cache configurations and memory access strategies.
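
The reported metrics follow from simple counters. In the sketch below, the `summarize` function and the numbers passed to it are purely illustrative and are not measurements from the simulator.

```python
def summarize(instructions, cycles, hits, misses, stall_cycles):
    """Compute the end-of-run metrics from raw counters."""
    accesses = hits + misses
    return {
        "stall_cycles": stall_cycles,
        # Miss rate = misses / total cache accesses
        "miss_rate": misses / accesses if accesses else 0.0,
        # IPC = retired instructions / elapsed cycles
        "ipc": instructions / cycles if cycles else 0.0,
    }


# Made-up counter values, for illustration only.
report = summarize(instructions=120, cycles=200, hits=45, misses=5, stall_cycles=80)
```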

Supported Workloads

  • Parallel array addition using per-CU partial sums
  • Strided array access benchmarks
  • Cache vs scratchpad memory performance comparison
  • Barrier synchronization using SYNC

The simulator supports evaluating both direct-mapped and fully associative cache configurations.
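
Direct-mapped and fully associative caches are the two extremes of the associativity parameter. The sketch below shows how the number of sets and the set index follow from cache size, block size, and associativity. The function names and the 1024-byte cache are illustrative assumptions.

```python
def num_sets(cache_size, block_size, associativity):
    """Number of sets for the given geometry (sizes in bytes)."""
    num_blocks = cache_size // block_size
    if num_blocks % associativity != 0:
        raise ValueError("associativity must divide the number of blocks")
    return num_blocks // associativity


def set_index(addr, cache_size, block_size, associativity):
    """Set that a byte address maps to."""
    return (addr // block_size) % num_sets(cache_size, block_size, associativity)


CACHE, BLOCK = 1024, 64                       # 16 blocks in total (example values)
direct_mapped_sets = num_sets(CACHE, BLOCK, associativity=1)
fully_assoc_sets = num_sets(CACHE, BLOCK, associativity=CACHE // BLOCK)

# Addresses 0 and 1024 collide in the direct-mapped cache (same set, one way),
# while a fully associative cache can hold both blocks in its single set.
conflict = set_index(0, CACHE, BLOCK, 1) == set_index(1024, CACHE, BLOCK, 1)
```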

Developers

Minutes of Meetings (MoMs)

    • Date: 10-03-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra implemented the GUI and Anirudh connected it to the backend using Flask.
    • Date: 08-03-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Anirudh finished implementing the shared IF unit and shared memory, and worked on the special-purpose registers.
    • Date: 06-03-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra started the GUI; Anirudh finished implementing latencies and worked on the shared IF unit.
    • Date: 04-03-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Anirudh implemented pipelining with data forwarding and Raghavendra experimented with latencies.
    • Date: 02-03-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra and Anirudh worked on stall detection and the correctness of the stall count, and completed the stall count implementation.
    • Date: 28-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra tried implementing pipelining without forwarding, while Anirudh worked on detecting RAW hazards and completed the code for it, along with forwarding.
    • Date: 25-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra and Anirudh discussed how to implement pipelining and decided on an architecture.
    • Date: 19-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Raghavendra completed the GUI using HTML, CSS, and JavaScript, while Anirudh worked on integrating the GUI with the Python backend. Anirudh decided to use Flask for this integration.
    • Date: 17-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: The team decided to implement a GUI for the simulator. Initially, Raghavendra developed a basic GUI using Tkinter (import tkinter as tk; from tkinter import messagebox). However, it was not visually appealing, so we decided to build the GUI using HTML, CSS, and JavaScript instead.
    • Date: 15-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Anirudh tested the code with various programs and fixed several bugs using the data segment format. We verified the correct addressing of arrays and successfully obtained the correct output for sum-of-elements problems.
    • Date: 13-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: The team collaboratively implemented the Bubble Sort algorithm. We also added a data segment (word) to the code by creating an array to store input data in the format: arr: .word 0x4 ...
    • Date: 11-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Anirudh realised that array indexing advances by 1, but because addi also performs ordinary arithmetic, logical and pointer arithmetic cannot be told apart. He therefore decided to allocate memory as 4*x entries, with each index belonging to the core whose ID equals the index modulo 4.
    • Date: 09-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: The team divided responsibilities: 1. Raghavendra was assigned to implement arithmetic operations. 2. Anirudh was responsible for memory operations. 3. We discussed defining unique instructions that differ from the RISC-V instruction set.
    • Date: 07-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: 1. Anirudh was assigned to complete the Software Design by 10-02-2025. 2. Raghavendra was tasked with reviewing relevant topics and enhancing his Python knowledge.
    • Date: 06-02-2025
    • Members: Anirudh A, Raghavendra P
    • Decision: Decided to build the GPU simulator in Python because: 1. Python has a simpler syntax than C/C++, making it easier to implement and understand complex GPU architectures. 2. Python has good visualization tools such as Matplotlib and Seaborn, which help analyze performance metrics.

Note:

  • special register: x31
  • instructions implemented: add addi sub la lw sw bne ble beq jal jr slt j li
  • implemented .word in data segment
  • code should have a .data and a .text segment to work
  • each label should be written as a standalone statement on its own line, followed by the instruction it refers to
  • memory starts being used from the end for storing .data segment values
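
To make the expected program layout concrete, the sketch below splits an assembly source into its .data and .text segments and reads .word arrays. It is not the simulator's parser: the function names, the `#` comment syntax, and the operand separators in the sample program are assumptions, and only instructions from the list above are used.

```python
def split_segments(source):
    """Group the lines of an assembly source under its .data and .text headers."""
    segments = {".data": [], ".text": []}
    current = None
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()   # assumption: '#' starts a comment
        if not line:
            continue
        if line in segments:
            current = line                    # switch to the named segment
        elif current is None:
            raise ValueError("statement found before .data/.text")
        else:
            segments[current].append(line)
    return segments


def parse_words(data_lines):
    """Turn 'label: .word v1 v2 ...' lines into {label: [values]}."""
    arrays = {}
    for line in data_lines:
        label, _, rest = line.partition(":")
        directive, *values = rest.replace(",", " ").split()
        if directive != ".word":
            raise ValueError(f"unsupported directive {directive}")
        arrays[label.strip()] = [int(v, 0) for v in values]  # base 0 accepts 0x.. and decimal
    return arrays


program = """
.data
arr: .word 0x4 0x2 0x7
.text
la x1, arr
lw x2, 0(x1)
loop:
addi x2, x2, 1
"""
segments = split_segments(program)
arrays = parse_words(segments[".data"])
```

In the sample program, the label `loop:` sits on its own line and is followed by the instruction it marks, matching the label convention in the notes above.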

To Execute:

  • GUI
cd Codes
cd Simulator/Phase2
pip install -r requirements.txt
python main.py
Open 127.0.0.1:5000 in a browser
  • File reading: edit assembly.asm, then run
cd Codes
cd Simulator/Phase2
pip install -r requirements.txt
python main.py
