
perf: Update moe_token_dispatcher_type default to alltoall#2004

Open

parthmannan wants to merge 2 commits into NVIDIA-NeMo:main from parthmannan:pmannan/update_moe_default

Conversation

@parthmannan (Contributor) commented Feb 22, 2026

What does this PR do ?

Updates the default moe_token_dispatcher_type for Megatron MoE policy configurations from "allgather" to "alltoall" across the example configs, tests, and documentation.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
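The fragment below is an illustrative sketch of the setting this PR changes, not a verbatim excerpt: the key nesting under a Megatron policy section is assumed from the affected example configs, and only moe_token_dispatcher_type itself comes from this PR.

```yaml
# Illustrative config fragment (key nesting assumed, modeled on the
# examples/configs/*_megatron.yaml files touched by this PR).
policy:
  megatron_cfg:
    # New default after this PR; was "allgather" previously.
    moe_token_dispatcher_type: "alltoall"
```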

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Chores
    • Updated MoE token dispatcher configuration from "allgather" to "alltoall" across multiple example configurations.
    • Updated corresponding test configurations and documentation to reflect the new dispatcher default.

@parthmannan parthmannan requested review from a team as code owners February 22, 2026 21:19
@parthmannan parthmannan requested review from guyueh1 and removed request for a team February 22, 2026 21:19
coderabbitai bot (Contributor) commented Feb 22, 2026

📝 Walkthrough

The PR updates Megatron MoE token dispatcher configuration from "allgather" to "alltoall" across example configuration files, test files, and documentation, changing the distributed token routing strategy.

Changes

Cohort / File(s) — Summary

Example Configurations
examples/configs/distillation_math.yaml, examples/configs/distillation_math_megatron.yaml, examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
Updated moe_token_dispatcher_type from "allgather" to "alltoall" in the Megatron policy configurations of all example YAML files.

Documentation
nemo_rl/models/policy/__init__.py
Updated the documentation comment for MegatronConfig.moe_token_dispatcher_type to reflect the new default value of "alltoall".

Test Configurations
tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/megatron/test_megatron_setup.py, tests/unit/models/policy/test_megatron_worker.py
Updated Megatron test configurations to expect a moe_token_dispatcher_type value of "alltoall" instead of "allgather".
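As a back-of-envelope illustration of why this default matters (our own sketch, not Megatron code: the function names and the uniform-routing assumption are hypothetical), an allgather dispatcher replicates every rank's tokens to every rank, while an alltoall dispatcher sends each token only to the rank hosting its routed expert:

```python
# Toy model of communication volume for the two MoE token-dispatch strategies.
# Not Megatron code; uniform token routing across ranks is assumed.

def allgather_volume(tokens_per_rank: int, num_ranks: int) -> int:
    """Allgather: every rank receives every rank's tokens, so total traffic
    scales with the square of the number of ranks."""
    return tokens_per_rank * num_ranks * num_ranks

def alltoall_volume(tokens_per_rank: int, num_ranks: int) -> int:
    """Alltoall: each token travels only to the single rank hosting its
    routed expert, so total traffic scales linearly with the token count."""
    return tokens_per_rank * num_ranks

if __name__ == "__main__":
    print(allgather_volume(1024, 8))  # 65536 tokens moved in total
    print(alltoall_volume(1024, 8))   # 8192 tokens moved in total
```

Under these assumptions the alltoall dispatcher moves num_ranks times less data, which is the performance motivation the "perf:" prefix points at.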

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested labels

Performance

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning
Explanation: The PR changes the moe_token_dispatcher_type default from "allgather" to "alltoall" across multiple configs, a significant distributed-training change affecting performance and convergence, but the PR description lacks test results or regression validation.
Resolution: Add test results and performance metrics demonstrating no regressions from the dispatcher type change, including convergence validation and before/after performance numbers.

✅ Passed checks (3 passed)

Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed: The title clearly and specifically describes the main change: updating the default value of moe_token_dispatcher_type from "allgather" to "alltoall" across multiple configuration files.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.


@parthmannan parthmannan changed the title from "Update moe_token_dispatcher_type default to alltoall" to "perf: Update moe_token_dispatcher_type default to alltoall" on Feb 22, 2026
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
coderabbitai bot (Contributor) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/models/policy/__init__.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The file was modified but the header still shows 2025; update it to the current year.

🔧 Suggested fix
-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION.  All rights reserved.

As per coding guidelines: Add the NVIDIA copyright header (with current year) to all Python files and shell scripts, excluding tests (files under tests/ or test-only scripts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/policy/__init__.py` at line 1, Replace the outdated copyright
year in the top-of-file header comment from 2025 to 2026; locate the header
comment at the beginning of the module (the copyright comment line in
nemo_rl/models/policy/__init__.py) and update the year to "2026" so the NVIDIA
copyright header is current.

@parthmannan parthmannan force-pushed the pmannan/update_moe_default branch from 90a737d to 493ddef Compare February 22, 2026 21:49
@parthmannan parthmannan added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Feb 23, 2026
