Code Review
This pull request updates the package version to 1.1.0.dev0 and introduces changes to the GPT model and patching logic, including support for multimodal context parallelism and a refactor of the Multi-Token Prediction (MTP) loss calculation. Review feedback identifies critical issues in the MTP logic where the removal of a null check on labels could cause crashes and a lack of zero-division guards for token counts could lead to runtime errors. Additionally, it is recommended to split position_ids alongside input_ids during multimodal context parallel processing to avoid shape mismatches.
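The two MTP guards mentioned above could look roughly like the following. This is a minimal sketch in plain Python rather than the PR's actual tensor code; `masked_mean_loss`, `per_token_loss`, and `loss_mask` are hypothetical names chosen for illustration, and only `labels` comes from the review itself.

```python
from typing import Optional, Sequence


def masked_mean_loss(
    per_token_loss: Sequence[float],
    loss_mask: Sequence[int],
    labels: Optional[Sequence[int]],
) -> Optional[float]:
    """Average the per-token loss over unmasked tokens (hypothetical helper)."""
    # Guard 1: without labels (for example at inference time) there is no
    # loss to compute, so return early instead of touching `labels`.
    if labels is None:
        return None

    # Sum the loss only over positions the mask keeps.
    total = sum(loss * mask for loss, mask in zip(per_token_loss, loss_mask))
    num_tokens = sum(loss_mask)

    # Guard 2: a fully masked batch has zero tokens, so clamp the divisor
    # to 1 to avoid a ZeroDivisionError.
    return total / max(num_tokens, 1)
```

Clamping the divisor yields a loss of 0.0 for a fully masked batch; whether that or skipping the batch is the right behavior depends on how the training loop aggregates losses.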
/gemini review
Code Review
This pull request updates the MCore-Bridge version to 1.1.0.dev0, improves documentation in README files, and introduces several functional enhancements. Key changes include adding support for keyword arguments in rotary positional embedding functions, implementing input splitting for multimodal models with context parallelism, and refactoring the Multi-Token Prediction (MTP) loss calculation logic. Regarding the review feedback, while the reviewer suggested a safer check for packed_seq_params, the current implementation using getattr with a default value is functionally equivalent and safe for the existing structure.
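To illustrate the equivalence claimed for `packed_seq_params`, here is a small sketch comparing `getattr` with a default against an explicit `hasattr` check. The container object and both helper names are assumptions made for this example; the PR's real call site may differ.

```python
from types import SimpleNamespace
from typing import Any


def read_with_getattr(obj: Any) -> Any:
    # getattr with a default returns None when the attribute is missing.
    return getattr(obj, "packed_seq_params", None)


def read_with_hasattr(obj: Any) -> Any:
    # Explicit form: check for the attribute first, then read it.
    if hasattr(obj, "packed_seq_params"):
        return obj.packed_seq_params
    return None


# Hypothetical stand-ins for the object that may carry the attribute.
missing = SimpleNamespace()
unset = SimpleNamespace(packed_seq_params=None)
present = SimpleNamespace(packed_seq_params={"qkv_format": "thd"})

for candidate in (missing, unset, present):
    # Both helpers agree in all three cases.
    assert read_with_getattr(candidate) == read_with_hasattr(candidate)
```

Both forms only absorb a missing attribute; any other exception raised by a property getter would propagate in either version.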
No description provided.