
Quantize KV Cache of TabPFN-3 run with fit_mode="fit_with_cache" #983

Merged
bejaeger merged 8 commits into main from ben/kv-cache-quantization
May 29, 2026

Conversation

Collaborator

@bejaeger bejaeger commented May 28, 2026

No description provided.

Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@bejaeger bejaeger requested a review from a team as a code owner May 28, 2026 09:16
@bejaeger bejaeger requested review from eliott-kalfon and removed request for a team and eliott-kalfon May 28, 2026 09:16
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces per-tensor symmetric int8 quantization for the KV cache in TabPFN-3 models to reduce memory footprint during inference with minimal accuracy loss. The feedback suggests using a fully symmetric range of [-127, 127] for int8 quantization to prevent asymmetry, and updating the type annotations in the attention layer's forward method signature to include QuantizedKVCacheEntry to ensure static type checking safety.
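The scheme summarized above can be illustrated with a minimal sketch of per-tensor symmetric int8 quantization using the [-127, 127] range the review suggests. It uses plain Python lists in place of tensors, and `QuantizedEntry`, `quantize_symmetric_int8`, and `dequantize` are hypothetical names chosen for illustration; this is not the code in src/tabpfn/architectures/kv_cache.py.

```python
from dataclasses import dataclass


@dataclass
class QuantizedEntry:
    """Hypothetical container: int8 codes plus one scale for the whole tensor."""

    codes: list
    scale: float


def quantize_symmetric_int8(values):
    """Per-tensor symmetric quantization onto the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    # Clamp to [-127, 127]: leaving -128 unused keeps the range symmetric,
    # so +x and -x always map to codes of equal magnitude.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return QuantizedEntry(codes=codes, scale=scale)


def dequantize(entry):
    """Recover approximate float values from the codes and the shared scale."""
    return [c * entry.scale for c in entry.codes]
```

Storing one scale per tensor keeps the overhead to a single float, at the cost of coarser resolution when a few large values dominate the tensor. The round-trip error per element is bounded by half the scale.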

Comment thread src/tabpfn/architectures/kv_cache.py
Comment thread src/tabpfn/architectures/tabpfn_v3.py
@bejaeger bejaeger requested a review from priorphil May 28, 2026 09:19
Collaborator

@priorphil priorphil left a comment

Nice!

Comment thread src/tabpfn/architectures/kv_cache.py
Comment thread tests/test_classifier_interface.py Outdated
Comment thread tests/test_regressor_interface.py Outdated
Collaborator

@priorphil priorphil left a comment

Just to double-check: is there no native PyTorch quantized tensor that would dequantize on the fly?

@bejaeger bejaeger added this pull request to the merge queue May 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
@bejaeger bejaeger added this pull request to the merge queue May 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
@bejaeger bejaeger added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit 4dca8d7 May 29, 2026
15 of 16 checks passed

2 participants