Conversation
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Pull request overview
Adds runtime handling for the gemma4 model type by patching Gemma4 decoder layers to avoid shape mismatches during auto-round block-wise quantization.
Changes:
- Added `gemma4` to the special model list and introduced a Gemma4-specific patch routine.
- Hooked the patch into `_handle_special_model` when `model.config.model_type == "gemma4"`.
- Removed a couple of stray whitespace-only lines near ignore-layer registrations.
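The dispatch described in the overview can be sketched in a few lines. This is a minimal illustration only, not the code from this PR: `PATCHES`, `patch_gemma4_model`, and `handle_special_model` are hypothetical names, and the toy patch merely sets a flag where the real routine would rewrite the decoder layers.

```python
from types import SimpleNamespace


def patch_gemma4_model(model):
    """Hypothetical stand-in for the Gemma4 patch routine; it only sets a flag."""
    model.gemma4_patched = True
    return model


# Hypothetical registry mapping a config.model_type value to its patch routine.
PATCHES = {"gemma4": patch_gemma4_model}


def handle_special_model(model):
    """Apply the registered patch when the model's config.model_type matches."""
    model_type = getattr(model.config, "model_type", None)
    patch = PATCHES.get(model_type)
    if patch is not None:
        model = patch(model)
    return model


# Toy models: only the config.model_type attribute matters for the dispatch.
gemma = handle_special_model(SimpleNamespace(config=SimpleNamespace(model_type="gemma4")))
other = handle_special_model(SimpleNamespace(config=SimpleNamespace(model_type="llama")))
```

Keeping the lookup keyed on `model_type` means models without a registered patch pass through unchanged.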
for more information, see https://pre-commit.ci
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…into support_gemma4
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
…port_for_gemma4 # Conflicts: # auto_round/compressors/base.py # auto_round/special_model_handler.py # auto_round/utils/common.py
…emma4 # Conflicts: # auto_round/compressors/base.py # auto_round/utils/model.py
…/auto-round into hengguo/support_for_gemma4
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
Hi @n1ck-guo, I encountered a crash while testing this PR with …

The Issue: …

Root Cause: For Gemma-4 E2B/E4B, the `shared_kv_states` dictionary is required to manage the shared KV cache between anchor and sharer layers. When …

Suggested Fix:

    # In _patch_gemma4_model
    shared_kv_states_global = {}

    # In patched_layer_forward signature
    def patched_layer_forward(self, hidden_states, per_layer_input=None, shared_kv_states=None, ...):
        if shared_kv_states is None:
            shared_kv_states = shared_kv_states_global
        ...
        return orig_fwd(..., shared_kv_states=shared_kv_states, ...)

This fix resolved the issue in my local tests and allowed quantization to complete.
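For readers who want to try the fallback pattern from the suggested fix in isolation, here is a self-contained toy sketch. It uses no real Gemma-4 or auto-round code: `ToyLayer` and `patch_layers` are hypothetical, and the "KV state" is just a string. It only illustrates how a wrapped `forward` can substitute one shared dictionary when a caller passes `shared_kv_states=None`.

```python
import types


class ToyLayer:
    """Toy layer: an anchor writes its KV state, a sharer reads the anchor's."""

    def __init__(self, idx, source_idx=None):
        self.idx = idx
        self.source_idx = source_idx  # None marks this layer as an anchor

    def forward(self, hidden_states, per_layer_input=None, shared_kv_states=None):
        if self.source_idx is None:
            # Anchor: fails with TypeError if shared_kv_states is None.
            shared_kv_states[self.idx] = hidden_states
            return hidden_states
        # Sharer: reuse the state stored by its anchor layer.
        return shared_kv_states[self.source_idx]


def patch_layers(layers):
    """Wrap each layer's forward so a missing dict falls back to one shared dict."""
    shared_kv_states_global = {}
    for layer in layers:
        orig_fwd = layer.forward  # bound method of this specific layer

        # _orig is bound as a default argument to avoid late-binding in the loop.
        def patched_layer_forward(self, hidden_states, per_layer_input=None,
                                  shared_kv_states=None, _orig=orig_fwd, **kwargs):
            if shared_kv_states is None:
                shared_kv_states = shared_kv_states_global
            return _orig(hidden_states, per_layer_input=per_layer_input,
                         shared_kv_states=shared_kv_states, **kwargs)

        layer.forward = types.MethodType(patched_layer_forward, layer)
    return shared_kv_states_global


layers = [ToyLayer(0), ToyLayer(1, source_idx=0)]
store = patch_layers(layers)
anchor_out = layers[0].forward("kv0")  # called block-wise, without the dict
sharer_out = layers[1].forward("x")    # still finds the anchor's state
```

An explicitly passed dictionary still takes precedence over the fallback, so the wrapper should not change behaviour for callers that already supply `shared_kv_states`.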