
Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request)#1166

Open
Christopher-Lee-McClendon wants to merge 2 commits into openai:main from Christopher-Lee-McClendon:submission/10L-e2e-ttt-flow-refiner

Conversation


@Christopher-Lee-McClendon Christopher-Lee-McClendon commented Mar 31, 2026

Non-Record Submission: 10L E2E TTT-Linear + FlowRefiner (E2E was a README request)

val_bpb: 1.1347 (int6 sliding window, stride=64, seed=42) | 15.20 MB artifact | 2×A100 PCIe 40GB

Summary

10-layer transformer with end-to-end TTT-Linear refinement and a 1-step FlowRefiner, compressed to fit under the 16 MB artifact cap. The lightweight FlowRefiner is inspired in part by the FLOWR paper (arXiv:2504.10564), which uses learned flow-matching vector fields with Euler-style transport updates for efficient refinement; here we adapt that idea into a tiny hidden-state refiner rather than a pocket-conditioned 3D ligand generator.
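As a rough illustration of the one-step Euler-style transport idea, the sketch below applies a single learned-velocity update to a batch of hidden states. The velocity field here is a random linear stand-in, and the shapes and step size are assumptions for illustration; the submission's actual FlowRefiner is a trained ~98K-parameter module.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512                                  # hidden width used in this submission

# Stand-in for the learned vector field; the real refiner is a trained module.
W = rng.normal(0.0, 0.02, (D, D))

def flow_refine(h, step=1.0):
    """One Euler transport step on the hidden state: h' = h + step * v(h)."""
    v = h @ W                            # predicted velocity at h
    return h + step * v

h = rng.normal(size=(4, D))              # batch of hidden states
h_refined = flow_refine(h)
print(h_refined.shape)                   # (4, 512)
```

A single Euler step keeps the refiner cheap: one extra matmul per refined state, which is what makes a sub-100K-parameter refiner plausible under a 16 MB artifact cap.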

Key Results

| Metric | Value |
| --- | --- |
| Int6 sliding-window BPB (stride=64) | 1.13472408 |
| Total artifact size | 15,199,107 bytes (~800 KB headroom) |
| Model params | 25.7M base + 1.18M refiners |
| Training | 7,185 steps, ~2.2 hours, 2×A100 PCIe 40GB |
| Quantization | int6/int8 per-row + lzma-6 |
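A rough sketch of the per-row quantize-then-compress recipe. The symmetric scaling, rounding mode, and scale storage are assumptions, and a real packer would likely bit-pack the int6 values rather than store them in int8 bytes as this sketch does; only the "per-row scales + lzma preset 6" shape of the pipeline is taken from the table above.

```python
import lzma
import numpy as np

def quantize_per_row(w, bits=6):
    """Symmetric per-row quantization: each row gets its own float scale."""
    qmax = 2 ** (bits - 1) - 1                  # 31 for int6
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                     # guard all-zero rows
    q = np.round(w / scale).astype(np.int8)     # values land in [-31, 31]
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)
q, scale = quantize_per_row(w, bits=6)
blob = lzma.compress(q.tobytes(), preset=6)     # "lzma-6"
w_hat = q.astype(np.float32) * scale            # dequantized weights
print(q.dtype, len(blob), q.nbytes)
```

Per-row scales bound the rounding error of each row by half its own scale, which is why rows with small weights don't inherit the quantization noise of large-magnitude rows.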

Three-Variant Comparison (supplementary)

| Variant | Layers | val_bpb (sw) | Total size | Status |
| --- | --- | --- | --- | --- |
| A: 11L + 60% warmdown | 11 | 1.1236 | 16.68 MB | Over budget |
| B: 10L (this submission) | 10 | 1.1347 | 15.20 MB | Legal |
| C: 11L + int5 MLP | 11 | 1.1507 | 14.30 MB | Legal |
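All of the val_bpb numbers above use sliding-window scoring with stride 64. A minimal sketch of that protocol is below; the window size, the 1-token-per-byte conversion, and the `nll_fn` interface are illustrative assumptions, not the repo's actual evaluation harness.

```python
import math

def sliding_window_bpb(nll_fn, tokens, window=1024, stride=64):
    """Score each position with up to `window` tokens of left context,
    advancing `stride` new tokens per step; return bits per byte."""
    total_nll, scored, pos = 0.0, 0, 0
    while pos < len(tokens):
        start = max(0, pos + stride - window)
        chunk = tokens[start:pos + stride]
        n_new = min(stride, len(tokens) - pos)   # only score the new tokens
        total_nll += nll_fn(chunk, n_new)        # NLL (nats) of last n_new tokens
        scored += n_new
        pos += stride
    # nats -> bits; assume 1 token = 1 byte for the bpb conversion
    return total_nll / math.log(2) / scored

# Sanity check with a uniform-over-256-bytes "model": bpb should be 8.0.
uniform = lambda chunk, n_new: n_new * math.log(256)
print(round(sliding_window_bpb(uniform, list(range(500))), 6))  # 8.0
```

A smaller stride scores each token with more left context at higher eval cost, so stride=64 is a cost/fidelity trade-off rather than a modeling choice.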

Prior 11L Ablations on the Same Refiner Pair

These are earlier supporting runs on the same E2E-TTT / FlowRefiner pair from experiments_pr549/ rather than fresh 10-layer ablations for the legal submission:

| Prior 11L run | Sliding BPB | Δ vs 11L baseline |
| --- | --- | --- |
| Baseline | 1.12440473 | |
| + E2E-TTT only | 1.12414225 | -0.00026 |
| + Flow only | 1.12531495 | +0.00091 |
| + Both (combined) | 1.12344104 | -0.00096 |

Synergy Note

In that earlier 11-layer study, FlowRefiner alone regressed after quantization, while the combined E2E-TTT + Flow model was the best of the four runs. The additive expectation from the isolated deltas is 1.12505247 BPB; the actual combined run reached 1.12344104, a 0.00161 BPB improvement over the additive expectation. We treat this as evidence that FlowRefiner is most useful when paired with TTT, while noting that the same four-way ablation has not yet been rerun for the present 10-layer legal artifact.
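The synergy arithmetic can be reproduced directly from the ablation table:

```python
baseline  = 1.12440473
ttt_only  = 1.12414225
flow_only = 1.12531495
combined  = 1.12344104

# Additive expectation: baseline plus both isolated deltas.
expected = baseline + (ttt_only - baseline) + (flow_only - baseline)
print(round(expected, 8))               # 1.12505247
print(round(expected - combined, 8))    # 0.00161143
```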

Architecture

  • 10 layers, 512D, 8H/4KV (GQA), 3×MLP LeakyReLU(0.5)²
  • E2E TTT-Linear (1.08M params): per-head inner-loop SGD during train+eval
  • 1-step FlowRefiner (98K params): latent-space flow matching
  • BigramHash(1536), XSA, U-Net skips, VE128, Partial RoPE, SmearGate
  • EMA + SWA + Late QAT
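The "inner-loop SGD during train+eval" bullet for TTT-Linear can be illustrated with a toy fast-weight loop. The self-supervised reconstruction loss, learning rate, and absence of per-head structure here are all simplifying assumptions for the sketch; only the idea of updating a linear fast weight by SGD as each token is processed, at evaluation time as well as training time, is taken from the description above.

```python
import numpy as np

def ttt_linear_forward(x, lr=0.01):
    """Toy TTT-Linear: a fast weight W is updated online by one SGD step per
    token on the inner loss ||W x_t - x_t||^2, then used for the readout."""
    T, d = x.shape
    W = np.zeros((d, d))                 # fast weight, reset per sequence
    out = np.empty_like(x)
    for t in range(T):
        xt = x[t]
        err = W @ xt - xt                # self-supervised reconstruction error
        W -= lr * np.outer(err, xt)      # inner-loop SGD step (small lr for stability)
        out[t] = W @ xt                  # read out with the updated fast weight
    return out

rng = np.random.default_rng(0)
y = ttt_linear_forward(rng.normal(size=(16, 64)))
print(y.shape)                           # (16, 64)
```

Because the update runs inside the forward pass, the same loop executes during evaluation, which is what "per-head inner-loop SGD during train+eval" refers to.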

Credits

Built on PR #549 (abaybektursun) and contributions from PR #65 (aquariouseworkman), PR #69 (TevBenji), PR #187 (Idan3011), PR #265 / PR #374 (unnir), PR #315 (jfprincz), PR #77 (samacqua), PR #50 (mattqlf), PR #76 (unixmadtoonslab), and the modded-nanogpt baseline. The flow-inspired framing for the hidden-state refiner was also informed by FLOWR (Cremer et al., arXiv:2504.10564).

See README.md for the detailed writeup, provenance paths to the prior 11-layer ablation logs, and supplementary variant comparison.

- 10-layer, 512D, E2E TTT-Linear + 1-step FlowRefiner
- val_bpb 1.13472408 (int6 sliding window, stride=64, seed=42)
- Artifact: 15,199,107 bytes (800K headroom under 16MB cap)
- BigramHash(1536), LeakyReLU(0.5)², mixed int6/int8 + lzma
- Includes three-variant size-quality comparison (11L/10L/int5)
- Trained on 2×A100 PCIe 40GB, 7185 steps, ~2.2 hours
@Christopher-Lee-McClendon Christopher-Lee-McClendon changed the title Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) Mar 31, 2026
