
11L AttnRes + Gated Attention + Looped Blocks + EMA + Cosine + QAT #607

Draft
Neopolita wants to merge 1 commit into openai:main from Neopolita:submission/11L-attnres-gated-loop-ema-cos-qat

Conversation

@Neopolita

Summary

  • Block Attention Residuals (arXiv 2603.15031) replacing fixed skip_weights with learned depth routing
  • Per-head gated attention preventing attention-sink pathology
  • Looped middle blocks (layers 4-7 x2) for zero-param compute depth
  • EMA (0.995), cosine LR decay, QAT (last 15%)
  • 11L, 3x MLP, 26.5M params, ~13.7MB artifact
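The first and third bullets can be sketched together: 11 parameter blocks expanded into 15 applied layers by looping blocks 4-7 twice, with each step reading a learned softmax mixture over all earlier residual-stream states instead of a fixed `skip_weights` vector. This is a minimal NumPy sketch under assumed shapes; the function names, the tanh block body, and the routing parameterization are illustrative guesses, not the PR's actual code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def schedule(n_layers=11, loop=range(4, 8), repeats=2):
    """Per-step block order: blocks 4-7 applied twice, so 11 parameter
    sets yield 15 layers of applied compute with zero extra params."""
    head = list(range(loop.start))
    tail = list(range(loop.stop, n_layers))
    return head + list(loop) * repeats + tail

def forward(x, blocks, route_logits, order):
    """Each step mixes all previous stream states with learned softmax
    weights (learned depth routing) before applying its block."""
    hist = [x]
    for step, b in enumerate(order):
        w = softmax(route_logits[step][: len(hist)])
        mixed = sum(wi * h for wi, h in zip(w, hist))
        hist.append(mixed + np.tanh(mixed @ blocks[b]))
    return hist[-1]

rng = np.random.default_rng(0)
d = 16
order = schedule()  # 15 steps drawn from 11 blocks
blocks = [rng.standard_normal((d, d)) * 0.02 for _ in range(11)]
route_logits = rng.standard_normal((len(order), len(order) + 1))
y = forward(rng.standard_normal(d), blocks, route_logits, order)
```

With a real transformer block in place of the tanh toy, the routing logits are the only new parameters, which is consistent with the "zero-param compute depth" claim for the loop itself.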

Status

H100 run pending — requesting compute credits. Local M4 Max validation (500 steps): val_bpb 1.475 (float).

Test plan

  • Run on 8x H100 for 10 minutes
  • Update submission.json with final val_bpb and bytes_total
  • Add training log
  • Mark PR as ready for review

… Cosine + QAT

Architecture: 11 layers, 3x MLP, Block Attention Residuals (replacing skip_weights),
per-head gated attention, looped middle blocks (layers 4-7 x2).
Training: EMA (0.995), cosine LR decay, QAT (last 15%).
26.5M params, ~13.7MB artifact. H100 run pending.
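The three training-side pieces named above (cosine LR decay, EMA at 0.995, QAT over the last 15% of steps) compose as simple schedules. A hedged stand-alone sketch, assuming a decay-to-zero cosine and a step-count cutoff for QAT; `base_lr` and all names here are placeholders, not values from the run.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-3):
    # Cosine decay from base_lr down to 0 over the full run.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

def qat_active(step, total_steps, final_frac=0.15):
    # Fake-quantization switched on only for the final 15% of steps.
    return step >= math.ceil(total_steps * (1.0 - final_frac))

class EMA:
    """Exponential moving average of parameters (decay 0.995); the EMA
    weights, not the raw ones, are what would be evaluated/exported."""
    def __init__(self, params, decay=0.995):
        self.decay = decay
        self.shadow = {k: float(v) for k, v in params.items()}

    def update(self, params):
        d = self.decay
        for k, v in params.items():
            self.shadow[k] = d * self.shadow[k] + (1.0 - d) * float(v)
```

In the training loop this would look like: set the optimizer LR from `cosine_lr(step, total)`, call `ema.update(model_params)` after each optimizer step, and wrap the forward pass in fake-quant ops once `qat_active(step, total)` flips to true.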
@Tqela

Tqela commented Mar 24, 2026

Use FlashAttention 4 or 3.
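In PyTorch the usual route to FlashAttention kernels is `F.scaled_dot_product_attention`, which dispatches to a Flash backend when hardware and dtype allow (e.g. fp16/bf16 on H100) and falls back otherwise. A minimal sketch with illustrative shapes; the per-head sigmoid output gate at the end is a guess at how this PR's gated attention might compose with a fused kernel, not the submission's actual code.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Fused causal attention; PyTorch selects a FlashAttention kernel on
# supported hardware/dtypes, else another fused/math backend.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Hypothetical learned per-head output gate, applied after the fused
# kernel so gating does not block the Flash dispatch.
gate_logits = torch.zeros(8)
out = out * torch.sigmoid(gate_logits).view(1, -1, 1, 1)
```

Applying the gate outside the kernel keeps the attention call itself in the fused fast path.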

