Open
171 commits
50580e2
updated configuration, namespaces, and gitignore to ignore experiements
Anodyine Feb 18, 2026
d1a7c9d
added code to monitor fragmentation amount over time
Anodyine Feb 19, 2026
1f9b5d4
added reproducible dockerfile setup
Anodyine Mar 19, 2026
c69f4d7
added docker multiuser setup and documentation
Anodyine Mar 20, 2026
bc70a0f
added more docker multiuser docs
Anodyine Mar 21, 2026
78e2cef
Merge pull request #1 from Anodyine/docker-setup-for-multiuser
Anodyine Mar 21, 2026
9d8159d
updated bootstrap scripts
Anodyine Mar 22, 2026
9f44f24
updated enter container script
Anodyine Mar 22, 2026
40c69b0
updated docker multiuser md
Anodyine Mar 22, 2026
08ad424
added plan for metrics gathering
Anodyine Mar 22, 2026
1ad0dd9
added member 3 and 4 plans
Anodyine Mar 23, 2026
3780a49
wrote plan for implementing mla
Anodyine Mar 23, 2026
f58b285
Add cache architecture and resident cache size helpers to ModelConfig
Anodyine Mar 23, 2026
1d3e7d5
added testing framework
Anodyine Mar 24, 2026
81e728c
Refactor vAttention cache sizing to use cache architecture helpers
Anodyine Mar 24, 2026
67f6f30
Centralize cache block byte sizing in ModelConfig~
Anodyine Mar 24, 2026
41a04e7
Add shared cache layout descriptor for vAttention sizing
Anodyine Mar 24, 2026
7ebbe0e
Add shared vAttention cache spec for allocator sizing
Anodyine Mar 24, 2026
320fad5
Add shared vAttention cache spec for allocator sizing
Anodyine Mar 24, 2026
30187f1
Add explicit MLA attention spec to ModelConfig
Anodyine Mar 24, 2026
e875fa9
Add resident cache component specs for dense KV and MLA
Anodyine Mar 24, 2026
650b01c
Add structured extension exports for vAttention specs
Anodyine Mar 24, 2026
d2f4f00
Add resident cache components to vAttention cache spec
Anodyine Mar 24, 2026
ed4b106
Validate vAttention cache and init specs with invariants
Anodyine Mar 24, 2026
073bfdb
Add explicit extension init modes for dense KV and MLA
Anodyine Mar 24, 2026
3796e58
Add explicit extension init requests for vAttention specs
Anodyine Mar 24, 2026
d8eb7f8
Add vAttention init dispatcher for extension request modes
Anodyine Mar 24, 2026
01a77e2
updated plan
Anodyine Mar 24, 2026
98f6940
updated plan
Anodyine Mar 24, 2026
85b95c5
Add tensor-parallel attention specs for dense KV and MLA
Anodyine Mar 24, 2026
423e242
Add tensor-parallel metadata to vAttention cache spec
Anodyine Mar 24, 2026
becc1f8
added detailed dev log for 3-23
Anodyine Mar 24, 2026
3714bb5
Add component-spec MLA vAttention init path
Anodyine Mar 24, 2026
ab8e664
Refactor vAttention allocator sizing for resident MLA cache
Anodyine Mar 24, 2026
4242433
Register DeepSeek-V2 config and model scaffolding
Anodyine Mar 24, 2026
d9f075a
Add DeepSeek-V2 MLA model scaffold and shape tests
Anodyine Mar 24, 2026
35f7862
Add DeepSeek-V2 MLA cache and reconstruction helpers
Anodyine Mar 24, 2026
91f2843
Add contiguous DeepSeek MLA reference attention path
Anodyine Mar 24, 2026
c1e95fa
Add DeepSeek MLA projection path from hidden states
Anodyine Mar 24, 2026
e75f007
Wire DeepSeek MLA reference attention through the model stack
Anodyine Mar 24, 2026
98ba203
Add batched DeepSeek MLA reference attention checks
Anodyine Mar 24, 2026
e9d8537
Add DeepSeek MLA attention backend bridge
Anodyine Mar 24, 2026
626ff7a
Bridge DeepSeek MLA into the Sarathi attention wrapper
Anodyine Mar 24, 2026
7027412
Add DeepSeek MLA layer-cache objects for wrapper execution
Anodyine Mar 24, 2026
f37968f
Add base MLA execution hook for attention wrappers
Anodyine Mar 24, 2026
1ea559e
Add MLA execution path to the vAttention FlashAttention wrapper
Anodyine Mar 24, 2026
81b9721
Let MLA wrappers reuse resident cache from layer-cache objects
Anodyine Mar 24, 2026
d422a94
Add componentized MLA runtime cache support to the FA wrapper
Anodyine Mar 24, 2026
a3b69db
Format MLA component caches for the vAttention runtime path
Anodyine Mar 24, 2026
b492575
Add DeepSeek MLA runtime-cache factory and integration checks
Anodyine Mar 24, 2026
de15503
Add paged-vs-contiguous MLA parity checks for FA wrapper
Anodyine Mar 24, 2026
40989d9
Add model-stack MLA parity checks for paged wrapper path
Anodyine Mar 24, 2026
1090b11
Add MLA-aware runner dispatch for wrapper execution
Anodyine Mar 24, 2026
6eb7416
Thread MLA runner kwargs through BaseWorker execution
Anodyine Mar 24, 2026
9a00235
Add worker MLA wrapper execution entrypoint
Anodyine Mar 24, 2026
7af825e
Add MLA resident-cache accounting stats to the worker path
Anodyine Mar 24, 2026
960289a
Track live MLA cache state in vATTN accounting stats
Anodyine Mar 24, 2026
c0abf97
Track MLA step scheduling state in vATTN cache stats
Anodyine Mar 24, 2026
fceff01
Record MLA cache transition history in the worker path
Anodyine Mar 24, 2026
8888784
updated plan
Anodyine Mar 24, 2026
787ca05
Add MLA cache transition deltas for runtime accounting
Anodyine Mar 24, 2026
588dc73
Add MLA cache sweep summaries for runtime accounting
Anodyine Mar 24, 2026
3599cdd
Add MLA sweep aggregation for runtime cache accounting
Anodyine Mar 24, 2026
6b6e2b3
Add MLA sweep-family matrix summaries for runtime accounting
Anodyine Mar 24, 2026
896d556
Add MLA sweep-matrix validation gates for runtime accounting
Anodyine Mar 24, 2026
14a3d12
Add MLA validation-suite gates for bounded accounting sweeps
Anodyine Mar 24, 2026
b52bce9
Add named profile checks for bounded MLA accounting suites
Anodyine Mar 24, 2026
e52e8e6
Add named MLA validation profile registry and worker checks
Anodyine Mar 24, 2026
21f3b56
Add multi-profile MLA validation selection for runtime suites
Anodyine Mar 24, 2026
797ad26
Add MLA readiness recommendations for bounded validation suites
Anodyine Mar 24, 2026
f360727
Add non-MoE MLP path to the DeepSeek model scaffold
Anodyine Mar 24, 2026
d0f9c71
Add DeepSeek token embedding and logits scaffold path
Anodyine Mar 24, 2026
a360107
Add installed scaffold-weight execution for DeepSeek
Anodyine Mar 24, 2026
022bb14
Add structured scaffold weight loading for DeepSeek
Anodyine Mar 24, 2026
9b96192
Add runner and worker support for installed DeepSeek scaffold weights
Anodyine Mar 24, 2026
b2084c3
Add loaded DeepSeek scaffold execution through the runner
Anodyine Mar 24, 2026
1d3e1a8
Make DeepSeek scaffold loading aware of pipeline layer offsets
Anodyine Mar 24, 2026
336e7d7
Add pipeline-aware DeepSeek scaffold runner integration checks
Anodyine Mar 24, 2026
59008a2
Add partitioned DeepSeek scaffold checks at the worker seam
Anodyine Mar 24, 2026
1f2d643
Add scaffold norm modules and norm-weight loading for DeepSeek
Anodyine Mar 24, 2026
fb33ac6
updated plan and generated dev log
Anodyine Mar 24, 2026
d8f8771
updated plan and generated dev log
Anodyine Mar 24, 2026
78dfd49
Add DeepSeek scaffold prefill/decode helpers
Anodyine Mar 24, 2026
3fcda15
Add runner and worker token-step scaffold APIs
Anodyine Mar 24, 2026
7ce0f44
Add greedy scaffold generation helper
Anodyine Mar 24, 2026
84eb4b6
Add runner and worker greedy scaffold generation
Anodyine Mar 24, 2026
2b3ea36
Add DeepSeek scaffold smoke harness
Anodyine Mar 24, 2026
b18b7f7
Add scaffold smoke parity comparison
Anodyine Mar 24, 2026
96d841c
Broaden DeepSeek scaffold load aliases
Anodyine Mar 24, 2026
abaf7b7
Add scaffold smoke blocker reporting
Anodyine Mar 24, 2026
cbca5d2
Fix paged MLA FlashAttention value layout
Anodyine Mar 24, 2026
459cb87
Support DeepSeek MLA projection aliases
Anodyine Mar 24, 2026
e5a60cf
Use DeepSeek-style aliases in smoke harness
Anodyine Mar 24, 2026
d611ad0
Load DeepSeek scaffold checkpoints from files
Anodyine Mar 24, 2026
8a4a4a5
Use checkpoint files in smoke harness
Anodyine Mar 24, 2026
5f6a748
Add safetensors scaffold smoke coverage
Anodyine Mar 24, 2026
55fd7fa
Add DeepSeek q-lora query scaffold support
Anodyine Mar 24, 2026
d9d9ee1
Exercise q-lora DeepSeek smoke generation
Anodyine Mar 24, 2026
3fc2cb3
Support DeepSeek KV latent layernorm aliases
Anodyine Mar 24, 2026
1e13da3
Add HF-style DeepSeek scaffold checkpoint smoke
Anodyine Mar 24, 2026
80259c1
Detect unsupported DeepSeek MoE checkpoints
Anodyine Mar 24, 2026
6cc0059
Add DeepSeek checkpoint compatibility probe
Anodyine Mar 24, 2026
f6499a1
Add bounded DeepSeek MoE scaffold helpers
Anodyine Mar 24, 2026
1c8e8c9
Wire bounded MoE through DeepSeek model path
Anodyine Mar 24, 2026
ca43d94
Load bounded DeepSeek MoE scaffold weights
Anodyine Mar 24, 2026
5924d5f
Extend DeepSeek smoke path for bounded MoE
Anodyine Mar 24, 2026
41e0d78
Probe real DeepSeek scaffold loadability
Anodyine Mar 24, 2026
c3acd43
Report actual DeepSeek smoke artifact format
Anodyine Mar 24, 2026
954c1f5
Emit DeepSeek KV latent layernorm in smoke checkpoints
Anodyine Mar 24, 2026
1e06970
Validate DeepSeek checkpoint config consistency
Anodyine Mar 24, 2026
61ffcfa
Emit more realistic DeepSeek scaffold config
Anodyine Mar 24, 2026
138f38f
added dev log
Anodyine Mar 24, 2026
bb40a4e
Emit HF-style DeepSeek scaffold weight names
Anodyine Mar 24, 2026
69dfc40
Use realistic DeepSeek lm_head key in HF smoke
Anodyine Mar 24, 2026
c4b1781
Persist scaffold artifacts and report missing probes cleanly
Anodyine Mar 24, 2026
45d11c2
Align DeepSeek loader with server weight contract
Anodyine Mar 24, 2026
cbcaaf3
Exercise DeepSeek through generic model loader
Anodyine Mar 24, 2026
7d226a0
Validate paged DeepSeek loader runtime parity
Anodyine Mar 24, 2026
fe64125
Slice DeepSeek attention tensors by TP rank
Anodyine Mar 24, 2026
c04eb4e
Slice DeepSeek MLP and MoE tensors by TP rank
Anodyine Mar 24, 2026
e686a86
Align DeepSeek smoke with TP-aware checkpoint loads
Anodyine Mar 24, 2026
34b4536
Lazy-load DeepSeek checkpoint weight utils
Anodyine Mar 24, 2026
f861855
Use TP attention shape in MLA cache formatting
Anodyine Mar 24, 2026
b7dda37
Exercise DeepSeek through ModelRunner paged path
Anodyine Mar 24, 2026
1ae2622
Bootstrap DeepSeek engine startup from scaffold checkpoint
Anodyine Mar 24, 2026
19bc15b
Stabilize DeepSeek vATTN server startup path
Anodyine Mar 24, 2026
409305e
Finish DeepSeek scaffold request loop on server path
Anodyine Mar 24, 2026
aba6a9d
Handle DeepSeek shared-expert TP widths
Anodyine Mar 25, 2026
012d470
updated config
Anodyine Mar 25, 2026
fa9c0f5
Add DeepSeek V2 Lite server wrapper
Anodyine Mar 25, 2026
6d8007f
Scale DeepSeek wrapper context with TP
Anodyine Mar 25, 2026
55470ab
Raise DeepSeek TP4 default context
Anodyine Mar 25, 2026
dd7b546
Fix DeepSeek MLA multi-block virtual reservation
Anodyine Mar 25, 2026
28a00e4
update config and deepseek server start script
Anodyine Mar 25, 2026
808c67a
Default DeepSeek wrapper to validated TP4 settings
Anodyine Mar 25, 2026
54d7a93
Split vAttention fragmentation metrics by meaning
Anodyine Mar 25, 2026
d2a6b8f
Keep vAttention logging off the Python runtime path
Anodyine Mar 25, 2026
e5635a0
added mla optimization plan
Anodyine Mar 25, 2026
11f4680
Merge pull request #2 from Anodyine/josh-metrics-with-plan
Anodyine Mar 25, 2026
4190ebc
Merge pull request #3 from Anodyine/implementing-mla
Anodyine Mar 25, 2026
f6882ec
added user creation script
Anodyine Mar 27, 2026
a96d2e7
updated plan for metrics recording
Anodyine Mar 29, 2026
1cc1914
updated plans
Anodyine Mar 29, 2026
a6671a7
updated plans
Anodyine Mar 29, 2026
357c8b8
updated plans
Anodyine Mar 29, 2026
23e4ea3
updated plans
Anodyine Mar 29, 2026
09aa10a
updated plans
Anodyine Mar 29, 2026
2811b0e
updated plans
Anodyine Mar 31, 2026
fad9a95
added presentation breakdown
Anodyine Mar 31, 2026
77507f2
Add vAttention fragmentation request metrics
JoshFran Apr 3, 2026
129793d
Merge pull request #4 from JoshFran/josh
Anodyine Apr 3, 2026
6c4d3fc
removed print statements and fixed incorrect number of blocks that we…
Anodyine Apr 3, 2026
7ecb458
removed manual flush endpoint
Anodyine Apr 3, 2026
7c859fa
skipped saving metrics on requests that are ignored by the server
Anodyine Apr 3, 2026
2c09238
adding build directory delete to build script
Anodyine Apr 3, 2026
2b78f52
updated plan for michel
Anodyine Apr 3, 2026
e64745b
Merge pull request #7 from Anodyine/issue-3-does-not-skip-ignored-req…
Anodyine Apr 5, 2026
5cee8e1
Merge pull request #6 from Anodyine/issue-2-metrics-can-unintentional…
Anodyine Apr 5, 2026
4372f02
Merge pull request #5 from Anodyine/issue-1-incorrect-number-of-mappe…
Anodyine Apr 5, 2026
5a370df
Merge pull request #8 from Anodyine/josh
Anodyine Apr 5, 2026
e8b181c
updated gitignore
Anodyine Apr 5, 2026
1461f9a
added request sweep script
Anodyine Apr 5, 2026
636ad74
added client request sweep
Anodyine Apr 5, 2026
6244a77
added basic plotting
Anodyine Apr 5, 2026
bd84559
added MLA vs MHA comparison
Anodyine Apr 5, 2026
db51bb7
generated some graphs
Anodyine Apr 5, 2026
04cc61d
Merge pull request #9 from Anodyine/adding-client-request-loop
Anodyine Apr 5, 2026
3a36006
added 4 models graph
Anodyine Apr 5, 2026
3b3151d
added dev log
Anodyine Apr 6, 2026
5ffafbf
Merge pull request #10 from Anodyine/mistral-gqa-to-mla-conversion
Anodyine Apr 6, 2026
8ba680b
Add context-length sweep script for fragmentation analysis
Apr 9, 2026
13 changes: 13 additions & 0 deletions .dockerignore
@@ -0,0 +1,13 @@
.git
.github
.venv
__pycache__
*.pyc
build
dist
*.egg-info
experiments
sarathi-lean/build
pod_attn/build
vattention/build

10 changes: 10 additions & 0 deletions .gitignore
@@ -8,3 +8,13 @@ vllm/*.pdf
vattention/dist
vattention/*egg-info
sarathi-lean/*egg-info

experiments/**
server-output/
tmp/

vAttention container command history.md
vAttention Container Setup.md
vAttention host cmd history.md

.codex