Improve eval metrics logging#3840

Open
aireenmei wants to merge 1 commit into main from aireen/eval_metrics

Conversation


aireenmei (Collaborator) commented May 7, 2026

Description

b/509626795

Summary of issue:

When running eval after every train step (eval_interval=1), the reported train step time increases significantly, from ~9.5s to ~11.5s.

Root cause analysis:

  • Major: in the current code, the train step time boundary mistakenly includes the eval loop, so the train step recorded right before an eval includes the eval time, inflating that step's time. Most users set a larger eval_interval, so the inflated step appears only once every many steps and the issue goes unnoticed.
  • Minor: train step metrics are always reported one step late to avoid blocking the data loading of the next step. Eval metrics, however, are computed immediately after the last eval step (the only eval step in this case), blocking the data loading of the following training step.
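The major issue above can be illustrated with a minimal timing sketch (not the actual MaxText code; `train_step` and `eval_step` are hypothetical placeholders): the fix is to close the train-step timer before entering the eval loop, so eval time is never attributed to the preceding train step.

```python
import time


def train_loop(num_steps, eval_interval, train_step, eval_step):
  """Hypothetical loop sketching the corrected step-time boundary."""
  step_times, eval_times = [], []
  for step in range(num_steps):
    start = time.perf_counter()
    train_step(step)
    # Fix: record the train step's end time here, BEFORE any eval work,
    # so the eval loop below cannot inflate the train step time.
    step_times.append(time.perf_counter() - start)
    if eval_interval > 0 and (step + 1) % eval_interval == 0:
      eval_start = time.perf_counter()
      eval_step(step)
      # Eval time is tracked and reported separately.
      eval_times.append(time.perf_counter() - eval_start)
  return step_times, eval_times
```

With the buggy boundary (timer closed after the `if` block), every step preceding an eval would include the eval duration, which matches the ~9.5s → ~11.5s inflation observed with eval_interval=1.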

Tests

Tested on v5e-32, default 1B model, per_device_batch_size=1:

  • With the old code, step time ~1.6s, screenshot
  • With the new code, step times are reported separately: train ~1s + eval ~0.5s, screenshot. There is a slight reduction in the total time, likely due to fixing the minor blocking issue stated above.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov Bot commented May 7, 2026


github-actions Bot commented May 7, 2026

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


github-actions Bot left a comment


## 📋 Review Summary

This Pull Request successfully introduces a buffering mechanism for metrics to overlap I/O and fixes a significant bug where evaluation time was mistakenly included in training step time measurements. The core logic in metric_logger.py is sound, but there are a few critical inconsistencies and potential data quality issues that should be addressed.

🔍 General Feedback

  • Inconsistent Inflation Fix: While the training step time inflation fix was correctly implemented in the main train.py loop, it was missed in the RL and deprecated SFT trainers.
  • TensorBoard Data Quality: The new "running" eval metrics are logged using the evaluation loop index as the step number, which will lead to overlapping and confusing data in TensorBoard.
  • Robustness: A small adjustment to the buffering order in metric_logger.py can prevent the loss of the final training step's metrics when training is stopped due to reaching the target loss.
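The buffering mechanism and the flush-ordering concern described above could be sketched as follows (a hypothetical simplification, not the actual `metric_logger.py` code; `write_fn` stands in for whatever backend writer is used): metrics for the current step are buffered and only the previous step's metrics are materialized, so host-side metric work overlaps with the next step's data loading, and an explicit flush prevents losing the final step's metrics when training stops early.

```python
class BufferedMetricLogger:
  """Sketch of one-step-delayed metric reporting (assumed interface)."""

  def __init__(self, write_fn):
    self._write = write_fn   # e.g. a TensorBoard writer; an assumption here
    self._pending = None     # (step, metrics) buffered but not yet written

  def record(self, step, metrics):
    # Write the PREVIOUS step's metrics first, then buffer the current ones.
    # This keeps the expensive write off the critical path of the step that
    # just finished, overlapping it with the next step's data loading.
    if self._pending is not None:
      self._write(*self._pending)
    self._pending = (step, metrics)

  def flush(self):
    # Must be called when training stops (e.g. on reaching the target loss)
    # so the final step's buffered metrics are not silently dropped.
    if self._pending is not None:
      self._write(*self._pending)
      self._pending = None
```

The reviewer's robustness point corresponds to ordering the buffer swap so that an early exit still leaves the last step's metrics recoverable via `flush()`.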

Comment thread src/maxtext/common/metric_logger.py

github-actions Bot left a comment


## 📋 Review Summary (Addendum)

Adding the missing inline comments regarding the training step time inflation fix in RL/SFT trainers and the buffering order in metric_logger.py.


github-actions Bot left a comment


## 📋 Review Summary (Final Addendum)

Adding the missing inline comments regarding the training step time inflation fix in RL/SFT trainers and the buffering order in metric_logger.py.

Comment thread src/maxtext/common/metric_logger.py Outdated
Comment thread src/maxtext/experimental/rl/grpo_trainer.py
Comment thread src/maxtext/trainers/post_train/sft/train_sft_deprecated.py
@aireenmei aireenmei force-pushed the aireen/eval_metrics branch from e4edcdc to 0b39494 Compare May 7, 2026 22:00
@aireenmei aireenmei force-pushed the aireen/eval_metrics branch 3 times, most recently from 26d1570 to c81f165 Compare May 8, 2026 06:09
@aireenmei aireenmei force-pushed the aireen/eval_metrics branch from c81f165 to 8f9b3a9 Compare May 8, 2026 06:51