[Non-Record] Extended Compute Scaling Analysis: 1.0853 BPB at 50K steps (11.5 hours) on 4×A100 MIG #1005

Open
OnlyJundong wants to merge 1 commit into openai:main from OnlyJundong:nonrecord/extended-compute-scaling-50k

@OnlyJundong
Summary

This is a non-record submission. It studies how the current record-track SOTA (PR #549 by @abaybektursun) scales under extended compute once the 10-minute wall-clock constraint is removed. The same architecture and code are trained for 20K to 50K steps (5.5 to 11.5 hours of training) on 4×A100 MIG instances, which are approximately 10× slower per step than 8×H100 SXM.

Results

Best run: 50K steps and 11.5 hours (4×A100 MIG, seed 1337)

| Phase | val_loss | val_bpb | Artifact (bytes) |
|---|---|---|---|
| Pre-TTT (EMA) | 1.8469 | 1.0939 | 14,348,646 |
| Int6 roundtrip | 1.8963 | 1.1231 | 14,348,646 |
| Sliding window (s=64) | 1.8566 | 1.0996 | 14,348,646 |
| Legal TTT | 1.8325 | 1.0853 | 14,348,646 |
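
As a quick consistency check on the 50K-step numbers, the sketch below recomputes the TTT gain and the ratio val_bpb / val_loss for each phase. It assumes the common definition of bits per byte, val_bpb = val_loss / ln(2) × (tokens per byte), with val_loss in nats per token; the submission does not spell this out, so treat the implied bytes-per-token figure as an inference rather than a reported value. All variable names are illustrative.

```python
import math

# (val_loss, val_bpb) per phase, copied from the 50K-step results.
phases = {
    "Pre-TTT (EMA)": (1.8469, 1.0939),
    "Int6 roundtrip": (1.8963, 1.1231),
    "Sliding window (s=64)": (1.8566, 1.0996),
    "Legal TTT": (1.8325, 1.0853),
}

# Assumption: val_bpb = val_loss / ln(2) * tokens_per_byte.
# If that holds, val_bpb / val_loss is the same constant for every row.
ratios = {name: bpb / loss for name, (loss, bpb) in phases.items()}
bytes_per_token = {name: 1.0 / (r * math.log(2)) for name, r in ratios.items()}

for name, r in ratios.items():
    print(f"{name:22s} bpb/loss = {r:.4f}  implied bytes/token = {bytes_per_token[name]:.3f}")

# TTT gain: post-TTT bpb minus pre-TTT bpb (negative means improvement).
ttt_gain = phases["Legal TTT"][1] - phases["Pre-TTT (EMA)"][1]
print(f"TTT gain at 50K steps: {ttt_gain:+.4f} bpb")
```

Under that assumption the four ratios agree to within rounding, and the TTT gain at 50K steps works out to -0.0086 bpb.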

20K steps and 5.5 hours (4×A100 MIG, 2-seed comparison)

| Seed | step_avg | steps | Pre-TTT bpb | Post-TTT bpb | TTT gain | Artifact (bytes) |
|---|---|---|---|---|---|---|
| 1337 | 828.7ms | 20,000 | 1.1018 | 1.0957 | -0.0061 | 15,077,933 |
| 42 | 828.8ms | 20,000 | 1.1020 | 1.0962 | -0.0058 | 15,137,145 |
| Mean | 828.8ms | 20,000 | 1.1019 | 1.0960 (std 0.0004) | -0.0060 | 15,107,539 |
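
The Mean row can be reproduced from the two per-seed rows. The sketch below is one way to do it; it assumes the reported std is the sample standard deviation (n - 1), which is a guess about how the figure was computed, and the dictionary layout is purely illustrative.

```python
from statistics import mean, stdev

# Per-seed 20K-step results: seed -> (pre_ttt_bpb, post_ttt_bpb, artifact_bytes)
runs = {
    1337: (1.1018, 1.0957, 15_077_933),
    42: (1.1020, 1.0962, 15_137_145),
}

pre = [r[0] for r in runs.values()]
post = [r[1] for r in runs.values()]
gains = [p_post - p_pre for p_pre, p_post, _ in runs.values()]
sizes = [r[2] for r in runs.values()]

summary = {
    "pre_ttt_mean": mean(pre),
    "post_ttt_mean": mean(post),
    "post_ttt_std": stdev(post),   # assumption: sample std with n - 1
    "ttt_gain_mean": mean(gains),
    "artifact_mean": mean(sizes),
}

for key, value in summary.items():
    print(f"{key:15s} {value:.4f}" if value < 100 else f"{key:15s} {value:,.0f}")
```

With only two seeds the std is a rough indicator at best; the sample estimate rounds to the reported 0.0004.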

Plots

BPB vs Steps (ASCII plot)

BPB
4.10 |*
     |
     |
     |
2.50 |
     |
1.26 | *
1.23 |  *
1.22 |   *
1.20 |    *
1.19 |     * * * * *
1.18 |             * *
1.17 |               *
1.16 |                *
1.15 |                 *
1.13 |                  *
1.12 |                   *
1.09 |                    *
     +----+----+----+----+----+-> steps (K)
     0   10   20   30   40   50

     |<early >|<--- plateau --->|<warmdown>|
      (rapid)                    (sharp drop)

Artifact Size vs Steps (ASCII plot)

MB
17.2 |         * * * * * * * * * * * *
16.8 |                               *
16.4 |                                *
16.0 |------------------------------------*--------  16MB limit
15.7 |                                    *
15.1 |                                     *
14.7 |     *                                *
14.1 |  *                                    *
13.1 | *
 4.6 |*
     +----+----+----+----+----+-> steps (K)
     0   10   20   30   40   50

     |<-fits->|<--- OVER 16MB ------>|<fits->|
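
For the final artifacts reported in the results tables, the headroom under the size limit can be checked with a few lines. This is a minimal sketch that assumes "16MB" means 16,000,000 bytes (decimal megabytes); if the limit is instead defined in MiB, the constant would need to change. `LIMIT_BYTES` and `headroom` are hypothetical names, not part of the submission's code.

```python
LIMIT_BYTES = 16_000_000  # assumption: "16MB" read as decimal megabytes

# Final artifact sizes in bytes, copied from the results tables.
artifacts = {
    "50K steps, seed 1337": 14_348_646,
    "20K steps, seed 1337": 15_077_933,
    "20K steps, seed 42": 15_137_145,
}

def headroom(size_bytes: int, limit: int = LIMIT_BYTES) -> int:
    """Return the bytes remaining under the limit (negative means over)."""
    return limit - size_bytes

for name, size in artifacts.items():
    spare = headroom(size)
    status = "fits" if spare >= 0 else "OVER"
    print(f"{name:22s} {size / 1e6:6.2f} MB  {status}  headroom {spare:,} bytes")
```

These are end-of-run sizes only; the plot above tracks the size at intermediate steps, where the artifact exceeds the limit for much of training.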
