Record: 11L XSA + Mixed INT6 + Adaptive N-gram Cache (2->7 backoff) - val_bpb=0.9631, 3-seed #993

Closed
aerosta wants to merge 1 commit into openai:main from aerosta:my-submission

aerosta commented Mar 28, 2026

11L XSA + Mixed INT6 + Adaptive N-gram Cache (2->7 backoff)

val_bpb: 0.96308303 (3-seed mean, std 0.00035576) | 15,882,569 bytes mean | 8xH100 SXM, 600s

Results (8xH100 SXM, 600s)

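| Seed | Steps | Sliding val_bpb | Final val_bpb | Artifact bytes |
|------|-------|-----------------|---------------|----------------|
| 1337 | 6,892 | 1.12124241 | 0.96314788 | 15,879,364 |
| 42 | 6,894 | 1.12125743 | 0.96340191 | 15,884,280 |
| 2024 | 6,897 | 1.12043283 | 0.96269931 | 15,884,064 |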

Mean val_bpb: 0.96308303. Inter-seed std: 0.00035576.
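
The summary figures can be re-derived from the per-seed rows. A minimal check, assuming the reported std is the sample standard deviation (n - 1 denominator), which is the convention that reproduces 0.00035576 from these three values:

```python
import statistics

# Final val_bpb and artifact size per seed, copied from the results table.
final_bpb = {1337: 0.96314788, 42: 0.96340191, 2024: 0.96269931}
artifact_bytes = {1337: 15_879_364, 42: 15_884_280, 2024: 15_884_064}

mean_bpb = statistics.mean(list(final_bpb.values()))
std_bpb = statistics.stdev(list(final_bpb.values()))  # sample std (n - 1)
mean_bytes = statistics.mean(list(artifact_bytes.values()))

print(f"mean val_bpb  = {mean_bpb:.8f}")
print(f"std val_bpb   = {std_bpb:.8f}")
print(f"mean artifact = {mean_bytes:,.0f} bytes")
```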

Architecture

| Component | Setting |
|-----------|---------|
| Layers | 11 (512d, 8Q, 4KV) |
| MLP | 3x with relu² |
| XSA | All 11 layers |
| Embeddings | Tied |
| Weight averaging | EMA + late SWA |
| Quantization | Post-training mixed INT6 + LZMA |
| Eval | Sliding window, stride 64 |
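
Sliding-window evaluation with a stride is commonly set up so that every token is scored exactly once while seeing as much left context as the window allows. The sketch below shows the indexing only. The helper name and `seq_len` are assumptions for illustration; the window length used by the submission is not stated in this description.

```python
def sliding_windows(num_tokens: int, seq_len: int, stride: int = 64):
    """Yield (start, end, score_from) index triples.

    The model would see tokens[start:end]; only tokens[score_from:end]
    contribute to the loss, so each token is counted exactly once.
    """
    scored_up_to = 0
    for end in range(min(seq_len, num_tokens), num_tokens + stride, stride):
        end = min(end, num_tokens)        # clamp the final, partial step
        start = max(0, end - seq_len)     # keep at most seq_len tokens of context
        yield start, end, scored_up_to
        scored_up_to = end
        if end == num_tokens:
            break


# Example: 200 tokens, a hypothetical 128-token window, stride 64.
windows = list(sliding_windows(200, 128, 64))
for start, end, score_from in windows:
    print(f"context [{start}, {end})  scored [{score_from}, {end})")
```

After the first window, each step scores only the newest `stride` tokens, which is why a smaller stride gives each scored token more context at the cost of more forward passes.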

Adaptive N-gram Cache

Score-first adaptive n-gram cache with backoff orders 2->7.
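
As a rough illustration of what backoff over orders 7 down to 2 with hashed buckets might look like, here is a minimal count-based sketch. The class name, the hashing scheme, and the probability estimate are assumptions made for illustration, not the submission's code; only the order range, bucket count, and min count are taken from this PR.

```python
from collections import defaultdict


class BackoffNgramCache:
    """Toy hashed n-gram cache: try the longest context first, then back off."""

    def __init__(self, min_order=2, max_order=7, buckets=4_194_304, min_count=2):
        assert min_order >= 2, "an order-k n-gram needs k - 1 >= 1 context tokens"
        self.orders = range(max_order, min_order - 1, -1)  # 7, 6, ..., 2
        self.buckets = buckets
        self.min_count = min_count
        self.ctx_counts = defaultdict(int)   # (order, ctx bucket) -> count
        self.pair_counts = defaultdict(int)  # (order, ctx bucket, token) -> count

    def _bucket(self, context):
        # Python's tuple hash stands in for whatever hash the real code uses.
        return hash(context) % self.buckets

    def update(self, history, token):
        """Record `token` as the continuation of each available context length."""
        for order in self.orders:
            n = order - 1
            if len(history) >= n:
                b = self._bucket(tuple(history[-n:]))
                self.ctx_counts[(order, b)] += 1
                self.pair_counts[(order, b, token)] += 1

    def prob(self, history, token):
        """Return (order, p) from the highest order seen often enough, else None."""
        for order in self.orders:
            n = order - 1
            if len(history) < n:
                continue
            b = self._bucket(tuple(history[-n:]))
            total = self.ctx_counts.get((order, b), 0)
            if total >= self.min_count:
                return order, self.pair_counts.get((order, b, token), 0) / total
        return None


# Usage: feed a repeating sequence one token at a time, then query it.
seq = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cache = BackoffNgramCache()
for i, tok in enumerate(seq):
    cache.update(seq[:i], tok)
result = cache.prob(seq, 1)
```

In this toy run the three longest contexts have each been seen only once, so the lookup backs off until it reaches a context that meets the min count.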

Backward-looking evaluation order:

  1. Score each window under torch.inference_mode().
  2. Add only already-scored tokens to the cache.
  3. Apply the cache only to later positions and later windows.
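
The ordering constraint above can be sketched without any model at all: each token is scored using only what the cache held before that token was seen, and the cache is updated afterwards. `score_first_eval` and the add-one-smoothed unigram cache are hypothetical stand-ins. The real run scores whole windows with the network under `torch.inference_mode()`.

```python
import math
from collections import Counter


def score_first_eval(tokens, vocab_size):
    """Return the total cost in bits under a strictly backward-looking cache."""
    cache = Counter()   # counts of tokens that were ALREADY scored
    seen = 0
    total_bits = 0.0
    for tok in tokens:
        # 1. Score first: the probability uses only previously scored tokens
        #    (add-one smoothing keeps unseen tokens at non-zero probability).
        p = (cache[tok] + 1) / (seen + vocab_size)
        total_bits += -math.log2(p)
        # 2. Only now does the scored token enter the cache.
        cache[tok] += 1
        seen += 1
    return total_bits


one = score_first_eval([0], vocab_size=4)
two = score_first_eval([0, 0], vocab_size=4)
```

The first token is always scored at the uniform rate, which is a quick way to confirm that a token never contributes to its own prediction.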

No training data is accessed during evaluation.

| Parameter | Value |
|-----------|-------|
| Orders | 2->7 |
| Adaptive mode | sigmoid_raw_entropy |
| Alpha range | [0.05, 0.60] |
| Hash buckets | 4,194,304 |
| Min count | 2 |
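
One plausible reading of `sigmoid_raw_entropy` is that the cache's mixing weight grows with the model's predictive entropy, squashed by a sigmoid into the alpha range. The sketch below is a guess at that shape, not the submission's code: the `center` and `scale` parameters and the linear interpolation of probabilities are assumptions.

```python
import math

ALPHA_MIN, ALPHA_MAX = 0.05, 0.60  # alpha range from the parameter table


def entropy_nats(probs):
    """Shannon entropy of a probability vector, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_alpha(entropy, center=2.0, scale=1.0):
    """Map entropy to [ALPHA_MIN, ALPHA_MAX]; center and scale are hypothetical."""
    gate = 1.0 / (1.0 + math.exp(-(entropy - center) / scale))
    return ALPHA_MIN + (ALPHA_MAX - ALPHA_MIN) * gate


def mix(p_model, p_cache, alpha):
    """Interpolate the model and cache probabilities of the target token."""
    return (1.0 - alpha) * p_model + alpha * p_cache


# A confident model (low entropy) leans on the cache less than an unsure one.
confident = adaptive_alpha(entropy_nats([0.97, 0.01, 0.01, 0.01]))
unsure = adaptive_alpha(entropy_nats([0.25, 0.25, 0.25, 0.25]))
```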

Ablation (seed 1337)

| Configuration | val_bpb |
|---------------|---------|
| Post-EMA, pre-quant | 1.1369 |
| + INT6 quantization | 1.14466175 |
| + Sliding window (stride 64) | 1.12124241 |
| + Adaptive n-gram cache | 0.96314788 |
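
Differencing consecutive rows makes the per-step contributions explicit: quantization costs about 0.008 bpb, sliding-window evaluation recovers about 0.023, and the n-gram cache accounts for about 0.158. A small script to reproduce those deltas from the table values:

```python
# Ablation rows for seed 1337, copied from the table.
stages = [
    ("Post-EMA, pre-quant", 1.1369),
    ("+ INT6 quantization", 1.14466175),
    ("+ Sliding window (stride 64)", 1.12124241),
    ("+ Adaptive n-gram cache", 0.96314788),
]

# Change in val_bpb contributed by each step relative to the previous row.
deltas = [(name, cur - prev) for (_, prev), (name, cur) in zip(stages, stages[1:])]
for name, delta in deltas:
    print(f"{name:32s} {delta:+.4f} bpb")
```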

Reproducibility

From the records folder:

torchrun --standalone --nproc_per_node=8 train_gpt.py
SEED=42 torchrun --standalone --nproc_per_node=8 train_gpt.py
SEED=2024 torchrun --standalone --nproc_per_node=8 train_gpt.py

Credits

@valerio-oai
Contributor

Hashed n-gram models in this way are disallowed, so this submission is illegal, apologies.

@aerosta
Author

aerosta commented Mar 28, 2026

@valerio-oai Noted. Thanks for the review.
