increasing precision tolerance #3060
Greptile Summary
This PR relaxes the
Confidence Score: 5/5
Safe to merge: only a test tolerance constant and its explanatory comment are changed; no production or library code is touched. The change is limited to widening a single fp16 tolerance pair from 1e-2 to 1.5e-2 in a test helper. The new value (0.015) gives ~40% headroom above the observed worst-case difference (~0.0107) on sm80, the comment clearly traces the reasoning, and all other dtype/backend/head-dim branches are unaffected. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[get_tols called] --> B{module == TransformerLayer?}
B -- No --> C[DotProductAttention tols\nfp16: 1e-3 / bf16: 1e-2]
B -- Yes --> D{head_dim_qk <= 128?}
D -- Yes --> E[fp16: 5e-3\nbf16: 3.5e-2]
D -- No --> F{backend == UnfusedAttention?}
F -- Yes --> G[fp16: 1.6e-2\nbf16: 1.2e-1]
F -- No --> H["fp16: 1.5e-2 ← widened from 1e-2\nbf16: 8e-2\n(FlashAttention / FusedAttention, head_dim=256 on sm80)"]
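The branching in the flowchart above can be sketched in Python as follows. This is a hypothetical mirror of the diagram, not the repository's actual `get_tols`: the name `get_tols_sketch`, the string-valued `module` and `backend` arguments, and the `Tols` container are assumptions made for illustration, and each tolerance is treated as a single number even though the summary describes a tolerance pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tols:
    """Hypothetical container: one tolerance value per dtype."""
    fp16: float
    bf16: float


def get_tols_sketch(module: str, backend: str, head_dim_qk: int) -> Tols:
    """Mirror the flowchart's branches (assumed signature, illustration only)."""
    if module != "TransformerLayer":
        # DotProductAttention tolerances
        return Tols(fp16=1e-3, bf16=1e-2)
    if head_dim_qk <= 128:
        return Tols(fp16=5e-3, bf16=3.5e-2)
    if backend == "UnfusedAttention":
        return Tols(fp16=1.6e-2, bf16=1.2e-1)
    # FlashAttention / FusedAttention with head_dim > 128:
    # fp16 widened from 1e-2 to 1.5e-2 by this PR, bf16 unchanged.
    return Tols(fp16=1.5e-2, bf16=8e-2)


# Simplified check of the reported worst case against the bare tolerance.
# The real test may also apply a relative term; that is not modeled here.
observed = 0.0107  # absolute difference reported in the PR description
old = 1e-2
new = get_tols_sketch("TransformerLayer", "FlashAttention", 256).fp16

print(observed <= old)               # False: the old threshold is exceeded
print(observed <= new)               # True: the widened threshold passes
print(round(new / observed - 1, 2))  # 0.4, i.e. roughly 40% headroom
```

Under these assumptions, only the last branch changes, which matches the claim that all other dtype, backend, and head-dim combinations are unaffected.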
Reviews (1): Last reviewed commit: "increasing precision tolerance"
/te-ci pytorch L0
Description
test_kv_cache compares full-sequence attention against incremental KV-cache decoding. In the TransformerLayer configuration where head_dim > 128 in fp16, these two execution paths use different kernels and masking strategies (e.g., causal vs. padding_causal_bottom_right, and full-matrix vs. single-query-row kernels). As a result, their outputs diverge slightly due to accumulated fp16 rounding differences.
On Ampere, this divergence can reach the current tolerance threshold in rare cases, producing a spurious failure. In one observed instance, a single element out of 4096 showed an absolute difference of ~0.0107, which narrowly exceeds the existing 1e-2 tolerance.
This change slightly relaxes the fp16 tolerance for the affected configuration to make the test robust across architectures. No runtime or library code is modified.
Fixes: N/A (spurious test_kv_cache failure on sm80 / fp16 / head_dim=256)
Type of change
Checklist
Notes
bfloat16 tolerance in this branch is unchanged, as it was not failing.
Diff summary