[training] fix flops computation #3947
Review -- fix flops computation
The two fixes are directionally correct:
1. Existing unit tests will break (blocking)
2. W >= T inconsistency (see inline comment)
3. Missing test for non-silu GLU (the motivating fix)
Suggested test cases
Force-pushed from 99b97b1 to ad80a0a
The factor 3 applies to any GLU, not just SWIGLU.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@nvidia.com>
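One plausible reading of the factor 3 is the number of projection matrices in a gated MLP (gate, up, down), which does not depend on the activation function. The sketch below illustrates that reading only. The function name and the `gated` flag are hypothetical and are not taken from the MBridge code.

```python
def mlp_forward_flops_per_token(hidden_size: int, ffn_hidden_size: int, gated: bool) -> int:
    """Forward matmul FLOPs per token for one MLP block (illustrative sketch).

    Assumption: a gated MLP (SwiGLU, GeGLU, ...) has three projections
    (gate, up, down), a non-gated MLP has two (up, down). Each projection
    costs 2 * hidden_size * ffn_hidden_size FLOPs per token
    (one multiply and one add per weight).
    """
    num_projections = 3 if gated else 2
    return 2 * hidden_size * ffn_hidden_size * num_projections


# The count depends only on whether the MLP is gated, not on which
# activation (SiLU, GELU, ...) is applied to the gate branch.
gated_flops = mlp_forward_flops_per_token(4096, 14336, gated=True)
plain_flops = mlp_forward_flops_per_token(4096, 14336, gated=False)
print(gated_flops / plain_flops)
```

Under this reading, keying the factor on "is the activation SiLU" rather than "is the MLP gated" would undercount any GLU variant that uses a different activation.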
For models like GPT-OSS that use SWA with a window size smaller than the sequence length, the division by 2 in the current MBridge formula is wrong and undercounts FLOPs. Removing the division makes the approximation wrong for models with a large window size, e.g. Gemma2. To make things work correctly in all cases, expand the full formula.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@nvidia.com>
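The effect described in this commit message can be checked with a small counting sketch. It assumes causal sliding-window attention in which each query attends to itself and at most `window - 1` preceding tokens; the actual window convention and the formula used in MBridge may differ. All function names here are hypothetical.

```python
def swa_pairs_exact(seq_len: int, window: int) -> int:
    """Closed-form count of (query, key) pairs under causal SWA.

    Assumption: query i (0-based) attends to keys max(0, i - window + 1)..i.
    """
    w = min(window, seq_len)  # a window covering the whole sequence is plain causal attention
    # The first w queries see 1, 2, ..., w keys; the remaining seq_len - w see w keys each.
    return w * (w + 1) // 2 + (seq_len - w) * w


def swa_pairs_brute_force(seq_len: int, window: int) -> int:
    """Reference count obtained by summing the keys visible to each query."""
    return sum(min(i + 1, window) for i in range(seq_len))


def swa_pairs_halved(seq_len: int, window: int) -> float:
    """Approximation seq_len * window / 2 (the 'division by 2' variant)."""
    return seq_len * min(window, seq_len) / 2


def swa_pairs_unhalved(seq_len: int, window: int) -> int:
    """Approximation seq_len * window (division removed)."""
    return seq_len * min(window, seq_len)


# Small window (W << T): the halved form undercounts by roughly 2x.
print(swa_pairs_exact(4096, 128), swa_pairs_halved(4096, 128))
# Window covering the sequence (W >= T): the unhalved form overcounts by roughly 2x.
print(swa_pairs_exact(4096, 4096), swa_pairs_unhalved(4096, 4096))
```

Multiplying the pair count by the per-pair attention cost gives the attention FLOPs. The point of the sketch is only that neither single-term approximation is accurate across both regimes, while the expanded expression matches the brute-force count in both.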
Signed-off-by: Jeremi Piotrowski <jpiotrowski@nvidia.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski@nvidia.com>
Force-pushed from ad80a0a to 964c3c7
Did all 3.
@claude do you want to review again
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 8729946
What does this PR do?
This PR fixes two discrepancies found when analyzing GPT-OSS FLOPs numbers coming from MBridge: