Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) #1166
Open
Christopher-Lee-McClendon wants to merge 2 commits into openai:main
- 10-layer, 512D, E2E TTT-Linear + 1-step FlowRefiner
- val_bpb 1.13472408 (int6 sliding window, stride=64, seed=42)
- Artifact: 15,199,107 bytes (800K headroom under 16MB cap)
- BigramHash(1536), LeakyReLU(0.5)², mixed int6/int8 + lzma
- Includes three-variant size-quality comparison (11L/10L/int5)
- Trained on 2×A100 PCIe 40GB, 7185 steps, ~2.2 hours
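The int6 sliding-window quantization with lzma mentioned above can be sketched roughly as follows. This is a minimal illustration under assumed conventions (per-window symmetric scales, signed levels in [-31, 31]); the function names and packing details are hypothetical, not the PR's actual pipeline.

```python
import lzma
import numpy as np

def quantize_int6_sliding(weights, stride=64):
    """Hypothetical per-window symmetric int6 quantization.

    Splits a flat float32 weight vector into windows of `stride` values,
    stores one float32 scale per window, and maps each value to a signed
    6-bit level in [-31, 31] (stored here in an int8 container).
    """
    w = weights.reshape(-1, stride)
    scales = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scales[scales == 0] = 1.0            # avoid divide-by-zero on all-zero windows
    q = np.clip(np.round(w / scales), -31, 31).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(42)
w = rng.standard_normal(4096).astype(np.float32)
q, scales = quantize_int6_sliding(w, stride=64)

# lzma over the quantized bytes, as the artifact stage does
packed = lzma.compress(q.tobytes(), preset=9)
err = np.abs(dequantize(q, scales) - w).max()
```

Per-window scales bound the rounding error at half a quantization step, and the low-entropy 6-bit levels compress well under lzma relative to raw float32.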
Non-Record Submission: 10L E2E TTT-Linear + FlowRefiner (E2E was a README request)
val_bpb: 1.1347 (int6 sliding window, stride=64, seed=42) | 15.20 MB artifact | 2×A100 PCIe 40GB
Summary
10-layer transformer with end-to-end TTT-Linear refinement and a 1-step FlowRefiner, compressed to fit under the 16 MB artifact cap. The lightweight FlowRefiner is inspired in part by the FLOWR paper (arXiv:2504.10564), which uses learned flow-matching vector fields with Euler-style transport updates for efficient refinement; here we adapt that idea into a tiny hidden-state refiner rather than a pocket-conditioned 3D ligand generator.
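The "1-step Euler-style transport" idea above can be sketched as a single learned velocity field applied once to the hidden states. Everything below (class name, a plain linear field, dim=512, dt=1.0) is illustrative, not the PR's actual refiner:

```python
import numpy as np

class FlowRefiner:
    """Hypothetical 1-step hidden-state flow refiner (Euler update).

    A small vector field v(h) is applied once:
        h_refined = h + dt * v(h)
    Here v is a single linear map standing in for a trained field.
    """
    def __init__(self, dim=512, dt=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # small-magnitude init as a stand-in for learned weights
        self.W = (0.02 * rng.standard_normal((dim, dim))).astype(np.float32)
        self.b = np.zeros(dim, dtype=np.float32)
        self.dt = dt

    def velocity(self, h):
        return h @ self.W + self.b

    def refine(self, h):
        # single Euler transport step over a batch of hidden states
        return h + self.dt * self.velocity(h)

refiner = FlowRefiner(dim=512)
h = np.ones((4, 512), dtype=np.float32)   # batch of hidden states
h_refined = refiner.refine(h)
```

One step keeps the refiner's cost to a single extra matmul per layer output, which is why it fits inside the 16 MB / runtime budget far more easily than a multi-step ODE-style sampler would.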
Key Results
Three-Variant Comparison (supplementary)
Prior 11L Ablations on the Same Refiner Pair
These are earlier supporting runs on the same E2E-TTT / FlowRefiner pair from experiments_pr549/, rather than fresh 10-layer ablations for the legal submission.
Synergy Note
In that earlier 11-layer study, FlowRefiner alone regressed after quantization, while the combined E2E-TTT + Flow model was best. The additive expectation from the isolated deltas is 1.12505247 BPB, whereas the actual combined run reached 1.12344104, a 0.00161 BPB improvement over additive expectation. We treat this as evidence that FlowRefiner is most useful when paired with TTT, while avoiding the claim that the same four-way ablation has already been rerun for the present 10-layer legal artifact.
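The synergy arithmetic above checks out directly from the two quoted BPB values:

```python
# Reproduce the synergy gap from the 11-layer study (values from the PR text)
additive_expectation = 1.12505247  # BPB expected if the isolated deltas were additive
combined_actual = 1.12344104       # BPB of the combined E2E-TTT + Flow run

synergy = additive_expectation - combined_actual
print(f"synergy: {synergy:.5f} BPB")  # prints "synergy: 0.00161 BPB"
```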
Architecture
Credits
Built on PR #549 (abaybektursun) and contributions from PR #65 (aquariouseworkman), PR #69 (TevBenji), PR #187 (Idan3011), PR #265 / PR #374 (unnir), PR #315 (jfprincz), PR #77 (samacqua), PR #50 (mattqlf), PR #76 (unixmadtoonslab), and the modded-nanogpt baseline. The flow-inspired framing for the hidden-state refiner was also informed by FLOWR (Cremer et al., arXiv:2504.10564).
See README.md for the detailed writeup, provenance paths to the prior 11-layer ablation logs, and the supplementary variant comparison.