Submission: 12L Int5-MLP BigramHash10K EMA (1.1476 BPB) #592

Open
Skytuhua wants to merge 1 commit into openai:main from Skytuhua:submission/12L-int5mlp-bigramhash10k-ema-1.1476

Conversation


@Skytuhua Skytuhua commented Mar 24, 2026

Summary

  • 12-layer GPT with mixed Int5/Int6 quantization (MLP=Int5, Attn=Int6)
  • BigramHash(10240) expanded embeddings
  • EMA(0.997) + SWA + GPTQ-lite clip search + late QAT
  • SmearGate, OrthoInit, XSA in the last 4 layers, partial RoPE (16 dims)
  • Sliding window eval stride=64
  • Built on PR#414 SOTA stack
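
The EMA(0.997) weight averaging listed above can be sketched as follows. This is a minimal illustration, not the PR's actual training code; the function and parameter names are assumptions, and only the 0.997 decay comes from the submission.

```python
# Minimal sketch of EMA weight averaging with decay 0.997 (the decay
# value is from the PR; everything else here is illustrative).
def ema_update(ema_params, model_params, decay=0.997):
    """Blend the current model weights into the EMA copy, in place."""
    for name, w in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * w
    return ema_params
```

At evaluation time the EMA copy (rather than the raw trained weights) is what gets quantized and scored, which is the usual reason to maintain it alongside SWA.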

Results

  Metric                          Value
  val_bpb (sliding window s64)    1.14760365
  val_loss                        1.93767556
  Artifact size                   15,497,769 bytes
  Training steps                  4973 (600s wallclock cap)
  Hardware                        8x H100 SXM
  Seed                            1337
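
The "sliding window s64" qualifier on val_bpb refers to stride-64 sliding-window evaluation, where each token is scored once with near-maximal left context. A sketch of the window bookkeeping, under assumed window/stride parameters (only stride=64 is stated in the PR):

```python
# Sketch of stride-based sliding-window evaluation spans. Each yielded
# triple is (begin, end, trg_len): the model sees tokens [begin, end)
# but only the final trg_len tokens contribute to the loss, so every
# token is scored exactly once. Window size here is illustrative.
def sliding_windows(n_tokens, window=1024, stride=64):
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        trg_len = end - prev_end  # tokens newly scored by this window
        yield begin, end, trg_len
        prev_end = end
        if end == n_tokens:
            break
```

Summing the per-window losses over their `trg_len` targets and dividing by total bytes gives the reported BPB; a smaller stride costs more forward passes but gives each token more context.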

Test plan

  • Full training run on 8xH100 SXM (600s wallclock)
  • Int6+zstd quantization roundtrip verified
  • Sliding window evaluation (stride=64)
  • Artifact under 16MB limit (15.5MB)
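
The quantization roundtrip check above can be illustrated with a symmetric signed n-bit scheme. This is a generic sketch, not the PR's GPTQ-lite implementation: per-tensor absmax scaling stands in for the clip search, and the zstd compression step is omitted.

```python
# Sketch of symmetric per-tensor quantization to signed n-bit values
# (int6 -> range [-31, 31], int5 -> [-15, 15]). Illustrative only; the
# PR uses a GPTQ-lite clip search rather than plain absmax scaling.
def quantize_int_n(weights, bits=6):
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]
```

A roundtrip test then asserts the reconstruction error stays within half a quantization step, which is the bound absmax scaling guarantees.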

12-layer GPT with mixed Int5/Int6 quantization, BigramHash(10240),
EMA(0.997), GPTQ-lite clip search, late QAT, SmearGate, XSA in last 4 layers,
partial RoPE, sliding window eval stride=64. Built on PR#414 SOTA stack.

val_bpb: 1.14760365 (sliding window s64)
artifact: 15,497,769 bytes (under 16MB)
trained on 8xH100 SXM, seed 1337
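
The BigramHash(10240) expanded embeddings named in the summary can be sketched as hashing each (previous, current) token-id pair into one of 10,240 buckets whose learned embedding is added to the ordinary token embedding. Only the bucket count is from the PR; the mixing constant and function name below are illustrative.

```python
# Sketch of hashed bigram embedding lookup: map a (prev, cur) token-id
# pair into one of 10240 buckets. The multiplier is an arbitrary odd
# mixing constant, not taken from the PR.
def bigram_bucket(prev_tok, cur_tok, n_buckets=10240):
    return ((prev_tok * 1000003) ^ cur_tok) % n_buckets
```

The appeal is parameter efficiency: 10,240 bucket embeddings capture frequent bigram context for far less than a full vocab-squared bigram table, at the cost of hash collisions between rare pairs.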
