Skip to content

[training] feat: Add Bridge support for standalone MTP pipeline stage#3928

Open
yaoyu-33 wants to merge 2 commits into
mainfrom
casey/mtp-standalone-mcore2136
Open

[training] feat: Add Bridge support for standalone MTP pipeline stage#3928
yaoyu-33 wants to merge 2 commits into
mainfrom
casey/mtp-standalone-mcore2136

Conversation

@yaoyu-33
Copy link
Copy Markdown
Contributor

Summary

  • Adds an opt-in DeepSeek-V3 layout helper mode that places MTP layers in a standalone penultimate PP/VPP stage and leaves loss in the final stage.
  • Exports the DeepSeek-V3 layout helper and documents standalone MTP usage with pipeline_model_parallel_layout / m stages.
  • Adds targeted unit coverage for default colocated MTP placement and standalone layout validation against Megatron Core layout parsing.

Context

Validation

  • git diff --check
  • uv run --active --no-sync ruff check src/megatron/bridge/recipes/deepseek/__init__.py src/megatron/bridge/recipes/deepseek/deepseek_v3.py tests/unit_tests/recipes/test_deepseek_recipes.py
  • uv run --active --no-sync ruff format --check src/megatron/bridge/recipes/deepseek/__init__.py src/megatron/bridge/recipes/deepseek/deepseek_v3.py tests/unit_tests/recipes/test_deepseek_recipes.py
  • uv run --active --no-sync pre-commit run --all-files

Unit pytest and training smoke were not run on this workstation.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test ad50a53

getattr(_deepseek_module, name)
for name in getattr(_deepseek_module, "__all__", [])
if callable(getattr(_deepseek_module, name, None))
if "_config" in name and callable(getattr(_deepseek_module, name, None))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this filter works today but is fragile — if a future export contains _config in its name but isn't a recipe builder (e.g., get_default_config_overrides), it would silently enter the parametrized test and likely fail in a confusing way. Consider matching on a suffix like name.endswith("_config") or name.endswith("_config_32nodes"), or adding a _RECIPE_NAMES set for explicit control.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Light Code Review

Clean PR — the new mtp_standalone path is well-isolated, existing behavior is unchanged, and the two unit tests validate the happy path against Megatron Core's PipelineParallelLayerLayout parser.

Minor — test filter fragility (inline comment posted on test_deepseek_recipes.py:38)
The _config in name filter that replaced the old callable(...) filter could silently pick up future non-recipe exports whose names happen to contain _config. An explicit name set or stricter suffix match would be more robust.

Missing test coverage worth adding

  • _build_standalone_mtp_layout with total_stages < 3 — verify the ValueError is raised.
  • _build_standalone_mtp_layout with mtp_layers=0 — verify the ValueError is raised.
  • mtp_standalone=True with mtp_num_layers > 1 (e.g., 3) — confirm multiple MTP entries appear in the penultimate stage.
  • layout explicitly provided alongside mtp_standalone=True — confirm layout takes precedence and mtp_standalone is ignored.

Suggested test cases

No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33 yaoyu-33 force-pushed the casey/mtp-standalone-mcore2136 branch from ad50a53 to 47acb24 Compare May 21, 2026 22:35
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 47acb24

@yaoyu-33 yaoyu-33 added area:recipe Training recipes and launch configs feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer labels May 21, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 74dd4f7

last_layer = ["mtp"] * mtp_layers + ["loss"]
pp_size = model_cfg.pipeline_model_parallel_size or 1
vp_size = model_cfg.virtual_pipeline_model_parallel_size or 1
num_decoder_layers = getattr(model_cfg, "num_layers", DEEPSEEK_V3_NUM_DECODER_LAYERS)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEEPSEEK_V3_NUM_DECODER_LAYERS I don't think we need this global variable, you should always be able to num layers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:recipe Training recipes and launch configs feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant