Skip to content

Record: LeakyReLU(0.5)² + Per-Document LoRA TTT (mean val_bpb=0.9443, 3 seeds)#620

Closed
robinojw wants to merge 1 commit intoopenai:mainfrom
robinojw:submission/leakyrelu-lora-ttt
Closed

Record: LeakyReLU(0.5)² + Per-Document LoRA TTT (mean val_bpb=0.9443, 3 seeds)#620
robinojw wants to merge 1 commit intoopenai:mainfrom
robinojw:submission/leakyrelu-lora-ttt

Conversation

@robinojw
Copy link

Summary

  • Mean val_bpb: 0.9443 (3 seeds, std: 0.0023)
  • 11L U-Net (512dim, 27M params) with LeakyReLU(0.5)² activation
  • Per-document LoRA TTT (rank-8, 3 epochs, chunk-256, docs ≥512 tokens)
  • SmearGate, BigramHash, depth-scaled residuals, SWA, Muon+WD, int8+zstd-22
  • Artifact: 15,430,887 B (96.4% of 16MB limit)

Results

Seed val_bpb Steps Step Avg
1337 0.9461 7008 85.62ms
42 0.9450 7008 85.62ms
2025 0.9417 6984 85.98ms
Mean 0.9443

Key Technique

LeakyReLU(0.5)²: Single-line activation swap — F.leaky_relu(x, 0.5) replacing torch.relu(x). Preserves negative gradient flow, prevents dead neurons in the squared activation.

Known Issue

TTT scoring currently only records on the final epoch. Epochs 1-2 train without scoring, so the LoRA has seen tokens before scoring them in epoch 3. Fix is a 1-line change (remove if epoch == args.ttt_epochs - 1: guard). See README for details.

Hardware

8× H100 SXM 80GB, PyTorch 2.9.1+cu128, RunPod Secure Cloud. ~17 min/seed.

Environment Variables

TTT_EPOCHS=3 TTT_MIN_DOC_LEN=512

@valerio-oai
Copy link
Contributor

As you mentioned in the issue thread, I think this approach is disallowed, so closing this PR for now such that people don't copy the TTT section of this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants