[training] feat: Add Bridge support for standalone MTP pipeline stage#3928
[training] feat: Add Bridge support for standalone MTP pipeline stage#3928yaoyu-33 wants to merge 2 commits into
Conversation
|
/ok to test ad50a53 |
| getattr(_deepseek_module, name) | ||
| for name in getattr(_deepseek_module, "__all__", []) | ||
| if callable(getattr(_deepseek_module, name, None)) | ||
| if "_config" in name and callable(getattr(_deepseek_module, name, None)) |
There was a problem hiding this comment.
Nit: this filter works today but is fragile — if a future export contains _config in its name but isn't a recipe builder (e.g., get_default_config_overrides), it would silently enter the parametrized test and likely fail in a confusing way. Consider matching on a suffix like name.endswith("_config") or name.endswith("_config_32nodes"), or adding a _RECIPE_NAMES set for explicit control.
Light Code ReviewClean PR — the new mtp_standalone path is well-isolated, existing behavior is unchanged, and the two unit tests validate the happy path against Megatron Core's PipelineParallelLayerLayout parser. Minor — test filter fragility (inline comment posted on test_deepseek_recipes.py:38) Missing test coverage worth adding
Suggested test cases No perf tests impacted. |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
ad50a53 to
47acb24
Compare
|
/ok to test 47acb24 |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 74dd4f7 |
| last_layer = ["mtp"] * mtp_layers + ["loss"] | ||
| pp_size = model_cfg.pipeline_model_parallel_size or 1 | ||
| vp_size = model_cfg.virtual_pipeline_model_parallel_size or 1 | ||
| num_decoder_layers = getattr(model_cfg, "num_layers", DEEPSEEK_V3_NUM_DECODER_LAYERS) |
There was a problem hiding this comment.
DEEPSEEK_V3_NUM_DECODER_LAYERS I don't think we need this global variable, you should always be able to num layers
Summary
pipeline_model_parallel_layout/mstages.Context
mtp_standalone,LayerType.mtp,mtp_on_this_rank,get_mtp_ranks, andmlayout docs.Validation
git diff --checkuv run --active --no-sync ruff check src/megatron/bridge/recipes/deepseek/__init__.py src/megatron/bridge/recipes/deepseek/deepseek_v3.py tests/unit_tests/recipes/test_deepseek_recipes.pyuv run --active --no-sync ruff format --check src/megatron/bridge/recipes/deepseek/__init__.py src/megatron/bridge/recipes/deepseek/deepseek_v3.py tests/unit_tests/recipes/test_deepseek_recipes.pyuv run --active --no-sync pre-commit run --all-filesUnit pytest and training smoke were not run on this workstation.