[Closed] Phrase Cache + N-gram Backoff + EMA-GPU (val_bpb=0.2722) #972
Closed
Idan3011 wants to merge 1 commit into openai:main from
Conversation
Force-pushed from 68dfd02 to a999142 (Compare)
Force-pushed from a999142 to d55045e (Compare)
AnirudhRahul pushed a commit to AnirudhRahul/parameter-golf that referenced this pull request on Mar 27, 2026:
Correct the eval-time n-gram posterior to normalize by the summed hashed-vocab mass and update the recorded metrics. The honest rerun lands at 1.5134 BPB, showing the earlier 0.3922 result came from the flawed normalization path. Made-with: Cursor
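The normalization fix in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; the function name, array shapes, and the uniform fallback are assumptions. The key point is that the posterior is divided by the summed hashed-vocab mass over the candidate set, so it sums to 1, rather than by a raw context count that collisions can distort.

```python
import numpy as np

def ngram_posterior(bucket_counts, candidate_ids):
    """Posterior over candidate tokens from hashed n-gram counts.

    Flawed path: dividing by the raw context count ignores mass that
    hash collisions move between buckets, inflating pseudo-probabilities.
    Fixed path (below): normalize by the summed hashed-vocab mass of the
    candidates, so the posterior is a proper distribution over them.
    """
    mass = bucket_counts[candidate_ids].sum()
    if mass == 0:
        # No evidence for any candidate: fall back to uniform (assumed).
        return np.full(len(candidate_ids), 1.0 / len(candidate_ids))
    return bucket_counts[candidate_ids] / mass
```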
#978
Normalized N-gram + Bayesian First-Match + Pre-Enrichment + XSA
val_bpb: 0.3922 (full-vocab 1024-token normalized n-gram, Bayesian first-match, fixed 0.5 blend)
Sliding window: 1.1478 | 14.94 MB | 8×H100 SXM, 600s
Progress
v11 is intentionally higher than v10. I replaced standard single-token scoring with full-vocab 1024-token normalized distributions. The 0.12 BPB increase measures the collision premium: the portion of n-gram gain that came from inflated pseudo-probabilities rather than genuine statistical signal.
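A toy demonstration of the collision premium, with made-up sizes and counts: hashing a 1024-token vocab into fewer buckets merges counts from distinct tokens, so the target token's bucket-level pseudo-probability can only be at or above its honest share of the full-vocab distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, buckets = 1024, 64           # hypothetical sizes
counts = rng.integers(1, 10, size=vocab).astype(float)
target = 7

# Honest probability: target's count over the full-vocab mass.
p_full = counts[target] / counts.sum()

# Hashed pseudo-probability: the target's bucket absorbs counts from
# every token that collides with it (toy hash: index mod buckets).
bucket_of = np.arange(vocab) % buckets
bucket_counts = np.bincount(bucket_of, weights=counts, minlength=buckets)
p_hashed = bucket_counts[bucket_of[target]] / counts.sum()

assert p_hashed >= p_full  # collisions only ever add mass to the bucket
```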
Key Contributions
- `pair_count / ctx_count` ratio for only the target token.
- `[chunk, 1024]` gather per order: GPU stays saturated.
- `p_local = (raw_correct + beta * p_neural) / (ctx_count + beta)` with `beta=2.0`. Neural prior contributes 2 pseudo-counts; low-evidence contexts are smoothed toward the neural prediction rather than overfit to sparse counts.
Collision Premium Analysis
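The count/neural blend above can be sketched as a one-liner; the function name is hypothetical, and the formula is taken directly from the description. With `beta=2.0`, the neural prior acts as two pseudo-counts, so a zero-evidence context returns exactly the neural prediction.

```python
def blend(raw_correct, ctx_count, p_neural, beta=2.0):
    """Smoothed local probability: count evidence blended with a
    neural prior that contributes `beta` pseudo-counts."""
    return (raw_correct + beta * p_neural) / (ctx_count + beta)

# High-evidence context: counts dominate the estimate.
assert abs(blend(90, 100, 0.5) - (90 + 1.0) / 102) < 1e-12
# Zero-evidence context: falls back exactly to the neural prior.
assert blend(0, 0, 0.5) == 0.5
```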
Additional Techniques
What Didn't Work (on valid distributions)
Compliance
Reproduction
Key Metrics