This repo provides the tools and models we used to experiment with applying supervised learning to cache admission. A simple yes/no admission label depends heavily on the current cache state, so this work instead aims to turn the online caching problem into an offline one by predicting a label that represents the next access time of each piece of data. This project was completed as a course project for Statistical Machine Learning (CSE-575) at Arizona State University.
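The offline information a model would try to approximate can be computed directly from a trace. The sketch below returns, for each request, how many logical steps (requests) pass before the same object is requested again. This is the quantity an offline, Belady-style policy relies on. It is a minimal illustration assuming logical time; the repo's actual `ReuseScore` may be normalized or derived differently, and `next_access_distance` is a hypothetical helper name.

```python
import math
from typing import Hashable, List, Sequence


def next_access_distance(trace: Sequence[Hashable]) -> List[float]:
    """For each request i, return the number of requests until the same object
    is requested again, or math.inf if it is never requested again."""
    next_seen: dict = {}  # object id -> index of its next occurrence
    distances: List[float] = [math.inf] * len(trace)
    # Scan backwards so each object's next occurrence is already known.
    for i in range(len(trace) - 1, -1, -1):
        obj = trace[i]
        if obj in next_seen:
            distances[i] = next_seen[obj] - i
        next_seen[obj] = i
    return distances


print(next_access_distance(["a", "b", "a", "c", "b", "a"]))  # [2, 3, 3, inf, inf, inf]
```

A single backward pass keeps this O(n) in the trace length, which matters for traces with millions of requests.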
We use existing cache datasets and the Python bindings for libCacheSim to run model inference on each cache miss. We train two models, using LSTM and GRU as base architectures, on the following features and label:
| Type | Name | Description |
|---|---|---|
| Feature | ObjectID | Unique object identifier |
| Feature | ObjectSize | Object size in bytes |
| Feature | TimeSinceLastAccess | Logical time since last data access |
| Feature | AccessFrequency | Object access frequency over a window of time |
| Label | ReuseScore | Score representing the object's next access time |
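The features in the table can be derived in a single pass over a trace. The sketch below assumes requests arrive as `(obj_id, obj_size)` pairs and measures both `TimeSinceLastAccess` and the `AccessFrequency` window in logical time (request counts). The repo's window may instead be defined in wall-clock time, and `extract_features` and its `window` parameter are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, Hashable, List, Sequence, Tuple


def extract_features(
    requests: Sequence[Tuple[Hashable, int]], window: int = 1000
) -> List[Dict[str, object]]:
    """Compute per-request features from (obj_id, obj_size) pairs.

    AccessFrequency counts accesses to the object within the last `window`
    requests, including the current one. TimeSinceLastAccess is -1 for a
    first-time access.
    """
    last_access: Dict[Hashable, int] = {}
    recent: deque = deque()      # object ids of the last `window` requests
    counts: Counter = Counter()  # per-object access counts inside the window
    rows: List[Dict[str, object]] = []
    for vtime, (obj_id, size) in enumerate(requests):
        # Slide the window forward by one request.
        recent.append(obj_id)
        counts[obj_id] += 1
        if len(recent) > window:
            counts[recent.popleft()] -= 1
        tsla = vtime - last_access[obj_id] if obj_id in last_access else -1
        last_access[obj_id] = vtime
        rows.append({
            "ObjectID": obj_id,
            "ObjectSize": size,
            "TimeSinceLastAccess": tsla,
            "AccessFrequency": counts[obj_id],
        })
    return rows


features = extract_features([("a", 10), ("b", 20), ("a", 10), ("a", 10)], window=2)
```

A deque plus a counter keeps each update O(1), so the window can be large without rescanning past requests.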
The predicted label is then compared against a threshold to decide whether the object should be admitted.
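To show where the threshold sits, here is a toy object-count LRU simulator that consults a predictor only on misses and bypasses objects whose score falls below the threshold. It is a pure-Python sketch, not the libCacheSim integration. `predict_score` stands in for the trained LSTM/GRU model, and the convention that a higher score means "admit" is an assumption made for illustration.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Sequence


def simulate_lru_with_admission(
    trace: Sequence[Hashable],
    capacity: int,
    predict_score: Callable[[Hashable], float],
    threshold: float,
) -> float:
    """Return the miss ratio of an LRU cache holding `capacity` objects that
    admits a missed object only if predict_score(obj) >= threshold."""
    cache: OrderedDict = OrderedDict()
    misses = 0
    for obj in trace:
        if obj in cache:
            cache.move_to_end(obj)  # hit: mark as most recently used
            continue
        misses += 1
        # Inference runs only on a miss, mirroring the admission step.
        if predict_score(obj) < threshold:
            continue  # bypass: do not admit this object
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used object
        cache[obj] = None
    return misses / len(trace) if trace else 0.0


trace = ["a", "b", "x", "a", "b", "y", "a", "b"]
score = lambda obj: 0.0 if obj in {"x", "y"} else 1.0  # hypothetical predictor
print(simulate_lru_with_admission(trace, 2, score, threshold=0.5))  # 0.5
print(simulate_lru_with_admission(trace, 2, score, threshold=0.0))  # 1.0
```

With the threshold at 0.5 the one-off objects `x` and `y` are bypassed and never evict `a` or `b`; admitting everything turns every request into a miss on this trace.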
We compare each of our models, integrated into libCacheSim as an admission policy, against existing admission policies and related works that perform cache admission or early eviction. The bar plots below show the gap between each solution's miss ratio and that of an optimal solution (assuming a cache sized at 100% of the working set size, or WSS, which achieves the minimum possible miss ratio):
- Note: A smaller bar is better, indicating the solution is closer to the minimum achievable miss ratio.
Comparison against simple admission schemes:
Comparison against SOTA admission and early eviction strategies:
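When the cache holds the entire working set, the only misses are compulsory ones (the first request to each object), so the optimal miss ratio is the number of unique objects divided by the number of requests. The sketch below computes that bound and the gap to it, assuming the plotted bars are the absolute difference in miss ratio. The helper names are hypothetical.

```python
from typing import Hashable, Sequence


def compulsory_miss_ratio(trace: Sequence[Hashable]) -> float:
    """Miss ratio of a cache sized at 100% WSS: only first accesses miss."""
    return len(set(trace)) / len(trace)


def gap_to_optimal(policy_miss_ratio: float, trace: Sequence[Hashable]) -> float:
    """Absolute difference between a policy's miss ratio and the 100%-WSS bound."""
    return policy_miss_ratio - compulsory_miss_ratio(trace)


trace = ["a", "b", "x", "a", "b", "y", "a", "b"]
print(compulsory_miss_ratio(trace))  # 0.5 (4 unique objects / 8 requests)
print(gap_to_optimal(0.75, trace))   # 0.25
```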
| Directory | Content Description |
|---|---|
| Models | Contains trained model weights |
| Util | Contains our Jupyter notebooks for exploring datasets and features, as well as our training loops |
| Util/Parse | Parsing utilities for pulling and parsing oracleGeneral format traces into convenient pd.DataFrame and CSV formats |
| Util/Simulation | Contains our scripts for running simulations in bulk, both existing solutions and our solutions |
| Visuals | Contains simple scripts which we use to generate visual aids for writing |
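The `Util/Parse` entry above refers to oracleGeneral traces. As a rough sketch of that parsing step, the code below decodes raw bytes with a NumPy structured dtype. It assumes the layout libCacheSim documents for oracleGeneral (24-byte packed little-endian records of timestamp, object id, object size and next access virtual time) and an already-decompressed buffer. The field names and `parse_oracle_general` are hypothetical and may differ from the repo's own utilities.

```python
import struct

import numpy as np
import pandas as pd

# Assumed oracleGeneral record layout (packed, little-endian, 24 bytes).
ORACLE_GENERAL_DTYPE = np.dtype([
    ("timestamp", "<u4"),
    ("obj_id", "<u8"),
    ("obj_size", "<u4"),
    ("next_access_vtime", "<i8"),
])


def parse_oracle_general(raw: bytes) -> pd.DataFrame:
    """Decode uncompressed oracleGeneral bytes into one DataFrame row per request."""
    if len(raw) % ORACLE_GENERAL_DTYPE.itemsize != 0:
        raise ValueError("buffer is not a whole number of 24-byte records")
    records = np.frombuffer(raw, dtype=ORACLE_GENERAL_DTYPE)
    return pd.DataFrame(records)


# Usage: build two synthetic records in the assumed layout and decode them.
sample = struct.pack("<IQIq", 1, 42, 4096, 3) + struct.pack("<IQIq", 2, 7, 512, -1)
df = parse_oracle_general(sample)
df.to_csv("trace.csv", index=False)  # CSV export, as Util/Parse also provides
```

`np.frombuffer` avoids a per-record Python loop, which keeps parsing fast for multi-gigabyte traces.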

