
Pre-training performance issue #148

@Peekaboo-ai

Description

I used run_mlm.py with the training data you provided for pre-training. After 8 epochs with a per-device batch size of 32 across 32 GPUs (64 GB of VRAM each), the model's performance remains quite low even as training approaches convergence. Could you kindly share the detailed pre-training script?
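For reference, the throughput numbers in the training log below are consistent with that setup. A quick sanity check (assuming a per-device batch size of 32 on 32 GPUs with no gradient accumulation, which is my reading of the configuration):

```python
# Effective global batch size implied by the setup described above.
per_device_batch = 32
num_gpus = 32
global_batch = per_device_batch * num_gpus  # 1024 samples per optimizer step

# Cross-check against the logged throughput:
# train_samples_per_second / train_steps_per_second ≈ samples per step.
samples_per_step = 616.824 / 0.602
print(global_batch, samples_per_step)  # ~1024 in both cases
```

So the run itself appears to be consuming data as expected; the question is whether the hyperparameters (learning rate schedule, masking ratio, etc.) match the ones used for the released checkpoint.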

{'eval_loss': 5.036761283874512, 'eval_accuracy': 0.2256266404065416, 'eval_f1': 0.2653012523859082, 'eval_mcc': 0.22392924231341627, 'eval_runtime': 16.2194, 'eval_samples_per_second': 1190.672, 'eval_steps_per_second': 1.171, 'epoch': 6.88}
{'loss': 5.0858, 'grad_norm': 0.2914751172065735, 'learning_rate': 6.51907022425929e-06, 'epoch': 6.96}
{'loss': 5.0805, 'grad_norm': 0.2778473198413849, 'learning_rate': 6.007529873956458e-06, 'epoch': 7.04}
{'loss': 5.0823, 'grad_norm': 0.28457939624786377, 'learning_rate': 5.495989523653626e-06, 'epoch': 7.12}
{'loss': 5.082, 'grad_norm': 0.282247930765152, 'learning_rate': 4.984449173350794e-06, 'epoch': 7.2}
{'eval_loss': 5.031910419464111, 'eval_accuracy': 0.22645610628951326, 'eval_f1': 0.2671266363782628, 'eval_mcc': 0.22476370202238263, 'eval_runtime': 16.2135, 'eval_samples_per_second': 1191.109, 'eval_steps_per_second': 1.172, 'epoch': 7.2}
{'loss': 5.0789, 'grad_norm': 0.279897004365921, 'learning_rate': 4.472908823047963e-06, 'epoch': 7.28}
{'loss': 5.0803, 'grad_norm': 0.2841487526893616, 'learning_rate': 3.9613684727451305e-06, 'epoch': 7.37}
{'loss': 5.0788, 'grad_norm': 0.32013562321662903, 'learning_rate': 3.4498281224422987e-06, 'epoch': 7.45}
{'loss': 5.076, 'grad_norm': 0.28553149104118347, 'learning_rate': 2.9382877721394665e-06, 'epoch': 7.53}
{'eval_loss': 5.032684803009033, 'eval_accuracy': 0.22627068077954807, 'eval_f1': 0.266603782932162, 'eval_mcc': 0.22457673959967936, 'eval_runtime': 16.3434, 'eval_samples_per_second': 1181.637, 'eval_steps_per_second': 1.163, 'epoch': 7.53}
{'loss': 5.0784, 'grad_norm': 0.29609060287475586, 'learning_rate': 2.4267474218366343e-06, 'epoch': 7.61}
{'loss': 5.079, 'grad_norm': 0.29126057028770447, 'learning_rate': 1.9152070715338025e-06, 'epoch': 7.69}
{'loss': 5.0747, 'grad_norm': 0.27743813395500183, 'learning_rate': 1.4036667212309707e-06, 'epoch': 7.78}
{'loss': 5.0772, 'grad_norm': 0.2757411301136017, 'learning_rate': 8.921263709281388e-07, 'epoch': 7.86}
{'eval_loss': 5.037484169006348, 'eval_accuracy': 0.22560653778113435, 'eval_f1': 0.2660859160040803, 'eval_mcc': 0.22390840831026426, 'eval_runtime': 16.2032, 'eval_samples_per_second': 1191.866, 'eval_steps_per_second': 1.173, 'epoch': 7.86}
{'loss': 5.0776, 'grad_norm': 0.2676142752170563, 'learning_rate': 3.8058602062530694e-07, 'epoch': 7.94}
{'train_runtime': 81132.2624, 'train_samples_per_second': 616.824, 'train_steps_per_second': 0.602, 'train_loss': 5.26640695736439, 'epoch': 8.0}
07/29/2025 23:01:43 - INFO - main - *** Evaluate ***
***** train metrics *****
epoch = 8.0
total_flos = 32624038352GF
train_loss = 5.2664
train_runtime = 22:32:12.26
train_samples = 6255545
train_samples_per_second = 616.824
train_steps_per_second = 0.602
07/29/2025 23:01:44 - INFO - main - *** Evaluate ***
***** eval metrics *****
epoch = 8.0
eval_accuracy = 0.2252
eval_f1 = 0.2643
eval_loss = 5.0414
eval_mcc = 0.2235
eval_runtime = 0:00:15.66
eval_samples = 19312
eval_samples_per_second = 1233.169
eval_steps_per_second = 1.213
perplexity = 154.6872
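The reported perplexity is just the exponential of the final eval loss (this is how the Hugging Face run_mlm.py example computes it), which reproduces the number above:

```python
import math

# Perplexity from the final masked-LM eval loss reported above.
eval_loss = 5.0414
perplexity = math.exp(eval_loss)
print(f"{perplexity:.4f}")  # ≈ 154.69, matching the logged 154.6872
```

A perplexity near 155 with ~22% token accuracy is indeed far from a well-trained MLM, which supports the suspicion that some pre-training hyperparameter differs from the original script.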
