
AlexSutila/Learning-Based-Cache-Admission


Learning Based Cache Admission

This repo provides the tools and models we used to experiment with applying supervised learning to cache admission. Because a simple yes/no admission label depends heavily on the current cache state, this work instead aims to transform the online caching problem into an offline one by predicting a label that represents the next access time of each piece of data. This work was done as a course project for Statistical Machine Learning (CSE-575) at Arizona State University.

Methodology Overview

We use existing cache datasets and the Python bindings for libCacheSim to perform model inference on each cache miss. We train two models, using LSTM and GRU as base models, on the following features and label:

| Type | Name | Description |
|---|---|---|
| Feature | ObjectID | Unique object identifier |
| Feature | ObjectSize | Object size in bytes |
| Feature | TimeSinceLastAccess | Logical time since last data access |
| Feature | AccessFrequency | Object access frequency over a window of time |
| Label | ReuseScore | $1 / \log(\text{NextAccessTime})$ |
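To make the table concrete, here is a minimal sketch of how one (features, label) row per request could be derived from a trace. It is not the repository's code and rests on several assumptions: logical time is the request index, NextAccessTime is read as the logical distance (in requests) to the object's next access, the log is natural, and `window` is a hypothetical parameter for AccessFrequency. Since 1/log(d) grows without bound as d approaches 1 and tends to 0 as d grows, the sketch maps an immediate re-access (d = 1) to `inf` and a never-reused object to `0.0`; both edge-case choices are ours.

```python
import math
from collections import defaultdict, deque


def reuse_score(distance):
    """ReuseScore = 1 / log(NextAccessTime), NextAccessTime assumed to be a logical distance >= 1.

    Assumed edge cases: distance == 1 maps to +inf (limit of 1/log(d) as d -> 1 from above),
    and None (no future access) maps to 0.0 (limit as d -> infinity).
    """
    if distance is None:
        return 0.0
    if distance == 1:
        return math.inf
    return 1.0 / math.log(distance)


def extract_rows(trace, window=100):
    """Build one (features, label) pair per request in a trace of (obj_id, obj_size) tuples.

    AccessFrequency counts accesses to the object in the last `window` requests,
    including the current one (`window` is a hypothetical parameter).
    """
    # Backward pass: logical distance from each request to the object's next access.
    next_distance = [None] * len(trace)
    seen_next = {}
    for t in range(len(trace) - 1, -1, -1):
        obj_id = trace[t][0]
        if obj_id in seen_next:
            next_distance[t] = seen_next[obj_id] - t
        seen_next[obj_id] = t

    # Forward pass: features that depend only on past requests.
    last_access = {}
    recent = defaultdict(deque)  # obj_id -> access times inside the window
    rows = []
    for t, (obj_id, obj_size) in enumerate(trace):
        hist = recent[obj_id]
        while hist and hist[0] <= t - window:
            hist.popleft()  # drop accesses that fell out of the window
        hist.append(t)
        since_last = t - last_access[obj_id] if obj_id in last_access else None
        last_access[obj_id] = t
        features = {
            "ObjectID": obj_id,
            "ObjectSize": obj_size,
            "TimeSinceLastAccess": since_last,
            "AccessFrequency": len(hist),
        }
        rows.append((features, reuse_score(next_distance[t])))
    return rows
```

For example, `extract_rows([("a", 10), ("b", 20), ("a", 10)])` labels the first request with 1/ln 2 ≈ 1.44 (its next access is two requests later) and the other two requests with 0.0, since neither object is requested again.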

The predicted label is then compared against a threshold to decide whether the data should be admitted.
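The admission step can be pictured as an ordinary cache with a filter on the miss path. The sketch below is a stand-alone simulation, not libCacheSim's API: `predict` is a hypothetical callable standing in for the trained LSTM/GRU model, and using LRU with a byte capacity as the underlying eviction policy is our assumption.

```python
from collections import OrderedDict


def simulate_lru_with_admission(trace, capacity_bytes, predict, threshold):
    """Replay (obj_id, obj_size, features) requests through a byte-capacity LRU cache.

    On every miss, predict(features) stands in for LSTM/GRU inference; the object is
    admitted only if its predicted reuse score is >= threshold and it fits in the cache.
    Returns the request miss ratio.
    """
    cache = OrderedDict()  # obj_id -> obj_size, least recently used first
    used = 0
    misses = 0
    for obj_id, obj_size, features in trace:
        if obj_id in cache:
            cache.move_to_end(obj_id)  # hit: mark as most recently used
            continue
        misses += 1
        score = predict(features)  # inference runs only on the miss path
        if score < threshold or obj_size > capacity_bytes:
            continue  # rejected: serve the request without caching it
        while used + obj_size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU object
            used -= evicted_size
        cache[obj_id] = obj_size
        used += obj_size
    return misses / len(trace) if trace else 0.0
```

With `predict=lambda f: f` (the score stored directly as the features), the stream `[("a", 1, 0.9), ("b", 1, 0.1), ("a", 1, 0.9), ("b", 1, 0.1)]` at threshold 0.5 gives a miss ratio of 0.75: "b" is never admitted, so both of its requests miss. Raising the threshold keeps low-reuse objects from displacing useful ones, at the cost of extra misses whenever the model underestimates reuse.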

Results Summary

We integrate each of our models into libCacheSim as an admission policy and compare them against existing admission policies and related work on cache admission and early eviction. The bar plots below show the difference in miss ratio between each solution and an optimal solution, i.e. a cache sized at 100% of the working set size (WSS), which achieves the minimum possible miss ratio:

  • Note: A smaller bar is better, indicating that the solution is closer to the minimum achievable miss ratio.
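With a cache sized at 100% of the WSS, every object fits and nothing is ever evicted, so the only misses are compulsory misses on each object's first request; the optimal miss ratio is the number of distinct objects divided by the number of requests. A minimal sketch of the bar height, assuming it is the plain difference between a policy's request miss ratio and that optimum (a byte miss ratio would instead weight each miss by object size):

```python
def optimal_miss_ratio(obj_ids):
    """Miss ratio of a cache as large as the working set: only first accesses miss."""
    if not obj_ids:
        return 0.0
    return len(set(obj_ids)) / len(obj_ids)


def miss_ratio_gap(policy_miss_ratio, obj_ids):
    """Assumed bar height: how far a policy's miss ratio sits above the optimum."""
    return policy_miss_ratio - optimal_miss_ratio(obj_ids)
```

For the request stream `["a", "b", "a", "c", "a", "b"]`, the optimum is 3/6 = 0.5, so a policy with a 0.75 miss ratio has `miss_ratio_gap(0.75, ...)` equal to 0.25.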

Comparison against simple admission schemes:

[Figure: bar plot]

Comparison against SOTA admission and early eviction strategies:

[Figure: bar plot]

Repository Structure

| Directory | Description |
|---|---|
| Models | Trained model weights |
| Util | Jupyter notebooks for exploring datasets and features, plus our training loops |
| Util/Parse | Utilities for fetching oracleGeneral-format traces and parsing them into pd.DataFrame objects and CSV files |
| Util/Simulation | Scripts for running simulations in bulk, covering both existing solutions and ours |
| Visuals | Simple scripts for generating the figures used in our write-up |
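Util/Parse works with oracleGeneral traces. As a hedged sketch (not the repository's parser), the reader below assumes the record layout documented for libCacheSim's oracleGeneral format: packed little-endian 24-byte records holding a uint32 timestamp, a uint64 object ID, a uint32 object size, and an int64 next-access virtual time. Published traces are typically zstd-compressed, so the file is assumed to be decompressed before reading.

```python
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Assumed oracleGeneral record layout (little-endian, packed, 24 bytes):
# uint32 timestamp, uint64 obj_id, uint32 obj_size, int64 next_access_vtime
RECORD = struct.Struct("<IQIq")


class Request(NamedTuple):
    timestamp: int
    obj_id: int
    obj_size: int
    next_access_vtime: int


def read_oracle_general(f: BinaryIO) -> Iterator[Request]:
    """Yield one Request per 24-byte record from an uncompressed oracleGeneral trace."""
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) != RECORD.size:
            raise ValueError("truncated record at end of trace")
        yield Request(*RECORD.unpack(chunk))
```

Open the trace with `open(path, "rb")`. With pandas installed, `pd.DataFrame(list(read_oracle_general(fh)))` yields one column per `Request` field and `DataFrame.to_csv` covers the CSV output; `next_access_vtime` could be used to derive the NextAccessTime label.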