Conversation
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
This PR introduces a significant refactor to the Learn-To-Init (LTI) mechanism, making it model-agnostic and more flexible by using dynamic NNX module augmentation. While the overall design improvement is positive and aligns with the goal of generalizing LTI, there are several critical and high-severity issues that need to be addressed, including syntax errors in assertions, missing f-string prefixes, and logical inconsistencies in layer collection.
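As a generic illustration of the f-string issue flagged above (not the PR's actual code), a missing `f` prefix leaves an assertion message uninterpolated:

```python
layer_name = "dense_layers_0"

# Bug: without the `f` prefix, the placeholder is never interpolated,
# so an assertion failure would print the literal "{layer_name}".
msg_buggy = "unexpected layer prefix: {layer_name}"
assert msg_buggy == "unexpected layer prefix: {layer_name}"

# Fix: the `f` prefix interpolates the variable into the message.
msg_fixed = f"unexpected layer prefix: {layer_name}"
assert msg_fixed == "unexpected layer prefix: dense_layers_0"
```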
🔍 General Feedback
- Regex Support: The move to regex-based weight sharing and copying is a great addition that improves the flexibility of the distillation pipeline.
- Model Agnostic LTI: Decoupling LTI from specific model architectures (like Llama2) is a good architectural move.
- Testing: New tests were added for the generic augmentation, but they currently have structural issues (missing indentation) and incorrect mock patch paths that will prevent them from running correctly.
- Inconsistencies: There is some inconsistency in how different layer prefixes (e.g., `dense_layers_` and `moe_layers_`) are handled between initialization and the final weight update.
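The regex-based weight copying mentioned above can be sketched as follows. The parameter names and the `copy_by_regex` helper are hypothetical illustrations, not the PR's actual implementation; real MaxText/NNX parameters live in nested pytrees, while this uses a flat name-to-value dict for clarity.

```python
import re

# Hypothetical flat parameter tree (name -> value) standing in for a
# teacher model's weights.
teacher_params = {
    "decoder/dense_layers_0/mlp/kernel": 1.0,
    "decoder/moe_layers_0/gate/kernel": 2.0,
    "decoder/norm/scale": 3.0,
}

def copy_by_regex(src, dst, patterns):
    """Copy entries whose full name matches any of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    for name, value in src.items():
        if any(rx.fullmatch(name) for rx in compiled):
            dst[name] = value
    return dst

# Copy only the dense-layer weights into the student.
student_params = copy_by_regex(teacher_params, {}, [r".*dense_layers_\d+.*"])
```

One consequence of this design: if the patterns cover `dense_layers_` but not `moe_layers_` (or vice versa), some layers are silently skipped, which is exactly the kind of prefix inconsistency the feedback points out.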
gagika reviewed on May 1, 2026
JamesDeng42 reviewed on May 1, 2026
JamesDeng42 reviewed on May 1, 2026
gagika approved these changes on May 1, 2026
Layer-wise LTI
Generalization of Learn-To-Init (LTI) Mechanism
The rest of the description includes relevant details, context, and examples:
Shortcomings:
Tests
learn_to_init_test.py and train_distill_test.py were refactored to validate the new generic LTI augmentation functionality and the regex-based weight preparation logic.
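On the incorrect mock patch paths noted in the review: `unittest.mock.patch` must target the module where a name is looked up, not where it is defined. The sketch below patches `json.dumps` purely as a stand-in to show the mechanics; the module names in the comment are hypothetical, not the PR's actual paths.

```python
from unittest import mock
import json

# If a module under test does `from some_pkg import augment`, the
# correct patch target is "module_under_test.augment", not
# "some_pkg.augment", because patching replaces the attribute at the
# lookup site. (Names here are hypothetical.)
with mock.patch("json.dumps", return_value="patched") as fake_dumps:
    # Inside the context, the patched attribute is what gets called.
    assert json.dumps({"a": 1}) == "patched"

# Outside the context, the original function is restored.
fake_dumps.assert_called_once()
```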
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.