The addition of `packed_seq_params` to the `_get_embeddings` call is correct and necessary for MTP to properly handle sequence packing (THD format). This ensures that the internal `roll_tensor` call correctly handles sequence boundaries.

**Critical Issue:** Note that this patched `forward` method returns a 3-tuple `(hidden_states, input_ids, position_ids)` at line 470. However, the caller in `GPTModel._postprocess` (at `src/mcore_bridge/model/gpt_model.py:398`) expects a single return value: `hidden_states = self.mtp(...)`. This mismatch will cause a `TypeError` or `ValueError` in subsequent operations (like `torch.chunk` at line 413 of `gpt_model.py`), because `hidden_states` will be a tuple instead of a tensor.

Since the caller handles its own label/mask shifting and logging, you should consider updating the return statement at line 470 to return only `hidden_states`, maintaining compatibility with the standard Megatron-Core API and the existing caller logic.
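A minimal sketch of the mismatch, using stand-in functions (the names `mtp_forward_patched` / `mtp_forward_fixed` are illustrative, not the actual `mcore_bridge` code): a caller that passes the result straight into `torch.chunk` fails with a `TypeError` when the forward returns a tuple, and works when it returns only the tensor.

```python
import torch

def mtp_forward_patched(hidden_states, input_ids, position_ids):
    # Hypothetical stand-in for the patched forward: returns a 3-tuple,
    # which is what breaks the caller described above.
    return hidden_states, input_ids, position_ids

def mtp_forward_fixed(hidden_states, input_ids, position_ids):
    # Suggested shape of the fix: return only hidden_states, matching
    # `hidden_states = self.mtp(...)` in the caller.
    return hidden_states

h = torch.randn(4, 8)
ids = torch.arange(4)

# A tuple result breaks torch.chunk, which requires a Tensor argument:
try:
    torch.chunk(mtp_forward_patched(h, ids, ids), 2)
except TypeError:
    print("torch.chunk raises TypeError on a tuple")

# A tensor result behaves as the caller expects:
chunks = torch.chunk(mtp_forward_fixed(h, ids, ids), 2)
print(len(chunks))  # 2 chunks of shape (2, 8)
```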