GolfStudent v2 14L: d=352, Value Residuals, GPTQ-lite, Schedule-Free, Muon+EMA by whitestone1121-web · Pull Request #604 · openai/parameter-golf

whitestone1121-web · 2026-03-24T07:42:35Z

GolfStudent v2 — 16MB Hybrid LM

Architecture: d=352, L=14 (10x GatedMLP + 4x Attention every 3rd layer), vocab=1024, weight-tied embedding/lm_head, SwiGLU FFN (3x expansion), RoPE on attention layers, orthogonal weight init
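As a quick sanity check on the layer counts above, here is a minimal sketch of how the 14-layer stack could be laid out. The helper name `layer_layout` and the 1-based "every 3rd layer" convention are assumptions for illustration, not taken from the submission code.

```python
# Hypothetical layout helper; the name and the 1-based "every 3rd" rule are assumptions.
def layer_layout(n_layers=14, attn_every=3):
    """Return block types, placing attention at every `attn_every`-th layer."""
    return [
        "attention" if (i + 1) % attn_every == 0 else "gated_mlp"
        for i in range(n_layers)
    ]

layout = layer_layout()
print(layout.count("gated_mlp"), layout.count("attention"))  # prints "10 4"
```

Under this convention attention lands on layers 3, 6, 9, and 12, which matches the stated 10 GatedMLP + 4 Attention split.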

v2 improvements over v1:

d=352 (from 288) — +65% model capacity, ~15MB INT8+zlib (94% of 16MB budget)
Value Residuals — learned scalar skip gates (init=0, tanh-gated) every 3 blocks
GPTQ-lite — 5 clip percentile candidates per row, min reconstruction MSE INT8
Schedule-Free final 120s — constant LR floor (10%) + faster EMA (decay=0.97) instead of LR→0 warmdown
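The Value Residuals item can be illustrated with a minimal sketch of a tanh-gated scalar skip. Which tensor feeds the skip path is not stated in the description, so `skip` here is a placeholder, and `ValueResidualGate` is a hypothetical name. Plain Python lists stand in for tensors.

```python
import math

class ValueResidualGate:
    """Scalar skip gate: out = x + tanh(gate) * skip, with gate initialised to 0."""

    def __init__(self):
        # A learned scalar parameter in a real model; a plain float in this sketch.
        self.gate = 0.0

    def __call__(self, x, skip):
        g = math.tanh(self.gate)
        return [xi + g * si for xi, si in zip(x, skip)]

gate = ValueResidualGate()
x, skip = [1.0, -2.0], [0.5, 0.5]
at_init = gate(x, skip)   # tanh(0) = 0, so the block starts as an identity
gate.gate = 10.0
opened = gate(x, skip)    # tanh(10) is close to 1, so this approaches x + skip
```

Initialising the gate at zero means the skip path contributes nothing at the start of training, and the tanh keeps its weight bounded in (-1, 1).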

Training:

Muon optimizer (momentum 0.85→0.99 warmup over 1500 steps, WD=0.04) for matrix params
Adam (fused) for embeddings/scalars
EMA decay=0.997, updated every step
Wallclock-aware schedule: cosine + schedule-free final 120s
Grad clip=0.3
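The schedules listed above could be sketched roughly as follows. The linear shape of the momentum warmup, the cosine landing exactly on the 10% floor, and the `budget_s` wallclock argument are all assumptions; the description gives only the endpoints and the 120s window.

```python
import math

def muon_momentum(step, start=0.85, end=0.99, warmup_steps=1500):
    """Warm momentum from `start` to `end` over `warmup_steps` (linear shape assumed)."""
    frac = min(step / warmup_steps, 1.0)
    return start + frac * (end - start)

def lr_multiplier(elapsed_s, budget_s, final_s=120.0, floor=0.10):
    """Cosine decay from 1.0 to `floor`, then hold `floor` for the last `final_s` seconds."""
    decay_s = budget_s - final_s
    if elapsed_s >= decay_s:
        return floor  # constant LR floor instead of warming down to zero
    cos = 0.5 * (1.0 + math.cos(math.pi * elapsed_s / decay_s))
    return floor + (1.0 - floor) * cos

def ema_decay(elapsed_s, budget_s, final_s=120.0, base=0.997, fast=0.97):
    """Switch to the faster EMA decay inside the final window."""
    return fast if elapsed_s >= budget_s - final_s else base

def ema_update(ema, weights, decay):
    """One EMA step over a flat list of weights, applied every training step."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]
```

For example, with a placeholder budget of 600 seconds, `lr_multiplier(500.0, 600.0)` returns the 0.10 floor and `ema_decay(500.0, 600.0)` returns 0.97, while earlier in the run the cosine and the 0.997 decay apply.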

Quantization: Per-row INT8 GPTQ-lite (5 clip percentiles, min MSE) + zlib level=9
Size: ~15.06MB / 16MB (94.1% budget)
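A minimal sketch of the per-row clip search followed by zlib, written in pure Python for readability. The candidate percentiles in `CLIP_PERCENTILES` are hypothetical (the description says five candidates but does not list them), and the function names are illustrative rather than the submission's own.

```python
import zlib

# Hypothetical candidate set: the source specifies five clip percentiles, not their values.
CLIP_PERCENTILES = (1.0, 0.9999, 0.999, 0.995, 0.99)

def quantize_row(row, percentiles=CLIP_PERCENTILES):
    """Try each clip candidate; keep the INT8 row with the lowest reconstruction MSE."""
    mags = sorted(abs(v) for v in row)
    best = None
    for p in percentiles:
        clip = mags[min(len(mags) - 1, int(p * (len(mags) - 1)))]
        scale = clip / 127.0 if clip > 0 else 1.0
        q = [max(-127, min(127, round(v / scale))) for v in row]
        mse = sum((v - qi * scale) ** 2 for v, qi in zip(row, q)) / len(row)
        if best is None or mse < best[0]:
            best = (mse, scale, q)
    return best[1], best[2]

def quantize_matrix(rows):
    """Quantize each row independently, then zlib-compress the INT8 payload at level 9."""
    scales, payload = [], bytearray()
    for row in rows:
        scale, q = quantize_row(row)
        scales.append(scale)
        payload.extend(v & 0xFF for v in q)  # store as two's-complement int8 bytes
    return scales, zlib.compress(bytes(payload), 9)

rows = [[0.01 * i for i in range(-32, 32)], [0.5, -0.25, 0.125, 3.0]]
scales, blob = quantize_matrix(rows)
print(len(blob), "compressed bytes for", sum(len(r) for r in rows), "weights")
```

Clipping below the row maximum trades saturation of a few outliers for finer resolution on the rest; picking the candidate by reconstruction MSE lets each row decide which side of that trade is cheaper.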

whitestone1121-web · 2026-03-25T01:07:16Z

Updated for v2: Architecture is now d=352 (from 288), adding Value Residuals (learned tanh-gated skip connections every 3 blocks, init=0) and GPTQ-lite INT8 (5 clip percentile candidates per row, min reconstruction MSE). No distillation: training uses pure cross-entropy (CE) on FineWeb binary shards. Quantization + zlib happens after the wallclock timer exits, matching the standard contest format. Dry-run confirms ~15.06MB (94.1% of 16MB budget).

whitestone1121-web added 4 commits March 24, 2026 00:30

Add GolfStudent: 14L Hybrid (GatedMLP+Attention), d=288, vocab=1024, Muon+EMA, INT8+zlib

233f760

feat: v2 — d=304, value residuals, GPTQ-lite, schedule-free 120s

c444acc

feat: v2 d=352 VR GPTQ-lite schedule-free

8ffc714

chore: sync docstring to v2 config

2b2b9a6

whitestone1121-web changed the title ~~GolfStudent 14L: Hybrid GatedMLP+Attn, d=288, vocab=1024, Muon+EMA, INT8+zlib~~ GolfStudent v2 14L: d=352, Value Residuals, GPTQ-lite, Schedule-Free, Muon+EMA Mar 25, 2026

whitestone1121-web mentioned this pull request Mar 25, 2026

Add SignalBrain Evolver to Machine Learning section vinta/awesome-python#2954

Closed

whitestone1121-web added 8 commits March 24, 2026 22:27

fix: strip hidden/bidi Unicode from README and submission.json

7390475

add run_h100.sh for cloud execution

818f83d

fix: add huggingface_hub and datasets to pip install

c0df7c5

fix: remove torch from pip upgrade to preserve RunPod host driver compatibility

8bdeef0

fix: drop 1 layer to reliably hit the 16MB size limit

f9a824e

feat: expertly refine GolfStudent with INT6, LeakyReLU2, and Top-3 schedule tuning

bc96740

fix: resolve INT6 degradation with Late-Stage QAT noise, 2048 sliding context, and zstd compression

9c3b5bd

build: finalize elite parameter-golf tuning stack (Partial RoPE + FP16 Embeddings)

ea41d5b

whitestone1121-web wants to merge 12 commits into openai:main from whitestone1121-web:feat/alan-samaha-golf
