
Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465)#758

Open

hypery11 wants to merge 1 commit into openai:main from hypery11:submission/2026-03-25_11L_XSA_ngram

Conversation

@hypery11

Results

| Seed | val_bpb |
|------|---------|
| 42   | 1.0467  |
| 1337 | 1.0470  |
| 2024 | 1.0457  |
| **Mean** | **1.0465** |
| **Std**  | **0.0007** |
  • Artifact: 13.99 MB
  • Train: 600s on 8xH100 SXM
  • Eval: ~116s
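The reported mean and std follow directly from the per-seed numbers above (the std matches the sample standard deviation, i.e. the n-1 denominator):

```python
import statistics

# Per-seed validation bits-per-byte from the results table.
seeds_bpb = [1.0467, 1.0470, 1.0457]

mean = statistics.mean(seeds_bpb)
std = statistics.stdev(seeds_bpb)  # sample std (n-1 denominator)

print(round(mean, 4))  # → 1.0465
print(round(std, 4))   # → 0.0007
```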

Method

11-layer transformer with XSA-all (Exclusive Self-Attention on all layers), squared LeakyReLU(0.5) activations, Value Residual, Gated Attention, BigramHash(10240), and SmearGate. Compression: GPTQ-lite int6 + zstd-22. Training: EMA(0.997) + tight SWA + late QAT.
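Of these components, the LeakyReLU(0.5)^2 activation is simple to sketch. A minimal hypothetical version (the PR's actual implementation may differ in details such as where the slope is applied):

```python
import numpy as np

def leaky_relu_sq(x: np.ndarray, slope: float = 0.5) -> np.ndarray:
    """LeakyReLU(0.5)^2: apply a LeakyReLU with negative slope `slope`,
    then square the result. The output is quadratic on both sides of
    zero, with the negative branch scaled by slope**2 (here 0.25)."""
    y = np.where(x >= 0, x, slope * x)
    return y * y
```

For example, `leaky_relu_sq` maps 2.0 to 4.0 and -2.0 to 1.0, so large negative pre-activations still contribute (unlike plain ReLU^2) but at a quarter of the positive-side magnitude.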

7-gram backward-looking eval cache (alpha=0.40, 4M hash buckets). Score-first and deterministic, with no test-time training (TTT).
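A minimal sketch of how such a cache could work, assuming alpha is an interpolation weight between the model's distribution and the cache's empirical next-token counts (function names and the bucket-hashing scheme are illustrative, not the PR's actual code):

```python
import math
from collections import defaultdict

def eval_with_ngram_cache(tokens, model_prob, n=7, alpha=0.40,
                          n_buckets=4_000_000):
    # Hypothetical sketch of a backward-looking n-gram eval cache.
    # Each position's context = the previous n-1 tokens, hashed into a
    # fixed number of buckets that hold next-token counts. The model's
    # probability for the true token is blended with the cache's
    # empirical estimate. "Score-first": each token is scored BEFORE the
    # cache is updated, so no position ever sees its own or a later
    # token -- deterministic, and no test-time training.
    cache = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for t, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, t - (n - 1)):t])
        counts = cache[hash(ctx) % n_buckets]
        p = model_prob(t, tok)          # model's prob of the true token
        if counts:                       # blend only if the bucket is warm
            z = sum(counts.values())
            p = (1 - alpha) * p + alpha * counts[tok] / z
        total_bits += -math.log2(max(p, 1e-12))
        counts[tok] += 1                 # update only after scoring
    # Bits per token (the PR reports bits per byte; same idea).
    return total_bits / len(tokens)

# On a highly repetitive stream with a uniform 1-bit model, the cache
# pulls the score well below the 1.0 bits/token baseline.
bpt = eval_with_ngram_cache([7] * 200, lambda t, tok: 0.5)
```

Because the cache is keyed only on already-seen context and updated after scoring, re-running the eval yields identical numbers, which is what "deterministic, no TTT" requires.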

Architecture builds on community techniques from PRs #609, #549.

  • 8xH100 SXM, train ≤600s
  • Eval ≤600s (116s)
  • Artifact ≤16MB (13.99MB)
  • 3-seed validation (std 0.0007)

