@kiw8 commented on Jun 21, 2025

This code builds a GemmaForCausalLM model on top of Google's pre-trained Gemma-2B language model, enhanced with the Infini-attention mechanism. The model safely inherits the pre-trained weights and then runs training and text-generation tests on a variety of input texts, as sketched below.
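A minimal sketch of the weight-inheritance step, assuming a hypothetical `InfiniGemmaForCausalLM` subclass whose attention layers add Infini-attention state on top of the stock Gemma-2B architecture. Only the HuggingFace loading APIs below are standard; note that `google/gemma-2b` is a gated checkpoint that requires accepting its license on the Hub.

```python
from transformers import AutoModelForCausalLM, GemmaForCausalLM

# Hypothetical stand-in for the PR's Infini-attention variant; in the PR,
# the attention modules would carry extra compressive-memory parameters.
class InfiniGemmaForCausalLM(GemmaForCausalLM):
    pass

pretrained = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
model = InfiniGemmaForCausalLM(pretrained.config)

# strict=False loads every weight shared with Gemma-2B while leaving any
# new Infini-attention parameters at their random initialization.
missing, unexpected = model.load_state_dict(pretrained.state_dict(), strict=False)
print(f"newly initialized: {len(missing)}, ignored: {len(unexpected)}")
```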

In particular, it verifies the model's trainability by computing the loss and running backpropagation on both short and long input texts. It also exercises two styles of text generation: a manual step-by-step sampling loop and automated generation with HuggingFace's .generate() method; minimal sketches of both follow.
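A minimal sketch of the trainability check, reusing `model` from the sketch above; the texts, learning rate, and loop structure are illustrative. Passing `labels=input_ids` makes the HuggingFace forward pass compute the causal-LM loss internally.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

short_text = "The quick brown fox jumps over the lazy dog."
long_text = " ".join(["Infini-attention compresses distant context into memory."] * 200)

model.train()
for text in (short_text, long_text):
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])  # loss computed internally
    outputs.loss.backward()                              # gradients must flow end to end
    optimizer.step()
    optimizer.zero_grad()
    print(f"{batch['input_ids'].shape[1]} tokens -> loss {outputs.loss.item():.3f}")
```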

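A minimal sketch of the two generation paths, again reusing `model`, `tokenizer`, and the `torch` import from above; the prompt, output length, and temperature are illustrative.

```python
model.eval()
prompt = tokenizer("Once upon a time", return_tensors="pt")

# 1) Step-by-step sampling: run a forward pass, sample one token from the
#    last position's distribution, append it, and repeat.
ids = prompt["input_ids"]
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids=ids).logits[:, -1, :]
        probs = torch.softmax(logits / 0.8, dim=-1)  # temperature 0.8
        ids = torch.cat([ids, torch.multinomial(probs, num_samples=1)], dim=-1)
print(tokenizer.decode(ids[0], skip_special_tokens=True))

# 2) Automated generation through HuggingFace's .generate().
out = model.generate(**prompt, max_new_tokens=20, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```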