Quantize KV Cache of TabPFN-3 run with fit_mode="fit_with_cache" by bejaeger · Pull Request #983 · PriorLabs/TabPFN

bejaeger · 2026-05-28T09:16:37Z

No description provided.

bejaeger · 2026-05-28T09:16:38Z

This change is part of the following stack:

Quantize KV Cache of TabPFN-3 run with fit_mode="fit_with_cache" #983 ◀

Change managed by git-spice.

gemini-code-assist

Code Review

This pull request introduces per-tensor symmetric int8 quantization for the KV cache in TabPFN-3 models to reduce memory footprint during inference with minimal accuracy loss. The feedback suggests using a fully symmetric range of [-127, 127] for int8 quantization to prevent asymmetry, and updating the type annotations in the attention layer's forward method signature to include QuantizedKVCacheEntry to ensure static type checking safety.
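The two review suggestions above can be illustrated with a minimal sketch. It is not the code from this PR: NumPy arrays stand in for PyTorch tensors so the example runs on its own, the fields of `QuantizedKVCacheEntry` are assumed, and `quantize_per_tensor`, `load_keys`, and the `KVCacheEntry` alias are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union

import numpy as np

# Fully symmetric int8 range [-127, 127]; -128 is deliberately left unused
# so that positive and negative values share the same magnitude limit.
INT8_MAX = 127


@dataclass
class QuantizedKVCacheEntry:
    """Assumed layout: int8 payload plus a single float scale per tensor."""

    values: np.ndarray  # dtype int8
    scale: float

    def dequantize(self) -> np.ndarray:
        # Recover an approximation of the original float tensor.
        return self.values.astype(np.float32) * self.scale


def quantize_per_tensor(x: np.ndarray) -> QuantizedKVCacheEntry:
    """Per-tensor symmetric quantization (hypothetical helper)."""
    max_abs = float(np.max(np.abs(x))) if x.size else 0.0
    # Guard against an all-zero tensor, which would give a zero scale.
    scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
    q = np.clip(np.rint(x / scale), -INT8_MAX, INT8_MAX).astype(np.int8)
    return QuantizedKVCacheEntry(values=q, scale=scale)


# A cache slot may hold either a plain float tensor or a quantized entry.
# Spelling this out in the signature lets a static type checker see both cases.
KVCacheEntry = Union[np.ndarray, QuantizedKVCacheEntry]


def load_keys(entry: KVCacheEntry) -> np.ndarray:
    """Return float keys regardless of how the entry is stored."""
    if isinstance(entry, QuantizedKVCacheEntry):
        return entry.dequantize()
    return entry
```

With a single scale per tensor, the round-trip error of each element is bounded by about half the scale, which is why the tests in a change like this typically compare cached and uncached predictions with a tolerance rather than exactly.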

priorphil

Nice!

priorphil

Just to double-check: is there no native PyTorch quantized tensor that would dequantize on the fly?

Quantize KV Cache of TabPFN-3 run with fit_mode="fit_with_cache"

01d30d4

bejaeger requested a review from a team as a code owner May 28, 2026 09:16

bejaeger requested review from eliott-kalfon and removed request for a team and eliott-kalfon May 28, 2026 09:16

rename changelog

971812f

gemini-code-assist Bot reviewed May 28, 2026

Comment thread src/tabpfn/architectures/kv_cache.py

Comment thread src/tabpfn/architectures/tabpfn_v3.py

fix comment

13d3412

bejaeger requested a review from priorphil May 28, 2026 09:19

priorphil approved these changes May 28, 2026

Comment thread src/tabpfn/architectures/kv_cache.py

Comment thread tests/test_classifier_interface.py Outdated

Comment thread tests/test_regressor_interface.py Outdated

priorphil reviewed May 28, 2026

bejaeger added 2 commits May 28, 2026 15:23

revision

88a2001

larger tolerance

a95e5c3

bejaeger added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

increase tolerance more because of flaky test

2cf750c

bejaeger added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

bejaeger added 2 commits May 29, 2026 17:27

bump up test tolerance

11a44dd

bump up test tolerance

83eb761

bejaeger added this pull request to the merge queue May 29, 2026

Merged via the queue into main with commit 4dca8d7 May 29, 2026
15 of 16 checks passed

bejaeger merged 8 commits into main from ben/kv-cache-quantization

2 participants
