add support for gemma4 model #1655

Open
n1ck-guo wants to merge 26 commits into main from hengguo/support_for_gemma4
Conversation

Contributor

@n1ck-guo n1ck-guo commented Apr 3, 2026

Description

Please briefly describe your main changes and the motivation behind them.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Copilot AI review requested due to automatic review settings April 3, 2026 07:25
Contributor

Copilot AI left a comment

Pull request overview

Adds runtime handling for the gemma4 model type by patching Gemma4 decoder layers to avoid shape mismatches during auto-round block-wise quantization.

Changes:

  • Added gemma4 to the special model list and introduced a Gemma4-specific patch routine.
  • Hooked the patch into _handle_special_model when model.config.model_type == "gemma4".
  • Removed a couple of stray whitespace-only lines near ignore-layer registrations.

Comment thread auto_round/special_model_handler.py
Comment thread auto_round/special_model_handler.py Outdated
wenhuach21 and others added 10 commits April 3, 2026 15:46
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
wenhuach21 and others added 14 commits April 3, 2026 17:26
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
…port_for_gemma4

# Conflicts:
#	auto_round/compressors/base.py
#	auto_round/special_model_handler.py
#	auto_round/utils/common.py
Signed-off-by: n1ck-guo <heng.guo@intel.com>
…emma4

# Conflicts:
#	auto_round/compressors/base.py
#	auto_round/utils/model.py
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@JGSphaela

Hi @n1ck-guo , I encountered a crash while testing this PR with google/gemma-4-E2B-it using iters 0 (RTN).

The Issue:
The quantization crashes with TypeError: 'NoneType' object does not support item assignment at transformers/models/gemma4/modeling_gemma4.py:1226. This happens because shared_kv_states is passed as None to the attention module.

Root Cause:
The patched_layer_forward signature in auto_round/special_model_handler.py is missing the shared_kv_states argument, which is the 3rd positional argument in the latest Gemma4DecoderLayer.forward implementation.

For Gemma-4 E2B/E4B, this dictionary is required to manage the shared KV cache between anchor and sharer layers. When auto-round patches the layer, it accidentally drops this argument.

Suggested Fix:
Update the signature in _patch_gemma4_model to include shared_kv_states and ensure it is propagated to orig_fwd. Additionally, since auto-round processes blocks individually, we should initialize a shared dictionary in the patching closure to maintain the state across layers:

# In _patch_gemma4_model: closure-level state shared across patched layers
shared_kv_states_global = {}

# In patched_layer_forward: accept the argument auto-round currently drops
def patched_layer_forward(self, hidden_states, per_layer_input=None, shared_kv_states=None, **kwargs):
    # Fall back to the shared dict so anchor and sharer layers see the same
    # state even though auto-round calls each block individually.
    if shared_kv_states is None:
        shared_kv_states = shared_kv_states_global
    return orig_fwd(hidden_states, per_layer_input=per_layer_input, shared_kv_states=shared_kv_states, **kwargs)

This fix resolved the issue in my local tests and allowed quantization to complete.
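The closure pattern described above can be demonstrated in isolation with a minimal sketch. Note that `DummyLayer` and `patch_layer` below are illustrative stand-ins, not names from auto_round or transformers; the point is only to show how a closure-level dict keeps a dropped keyword argument alive across per-block calls:

```python
# Hypothetical stand-ins: DummyLayer and patch_layer are illustrative,
# not names from auto_round or transformers.

class DummyLayer:
    """Minimal stand-in for a decoder layer whose forward requires a state dict."""

    def forward(self, hidden_states, shared_kv_states):
        # The real layer would raise TypeError here if shared_kv_states were None.
        shared_kv_states["last_input"] = hidden_states
        return hidden_states * 2


def patch_layer(layer):
    orig_fwd = layer.forward
    shared_kv_states_global = {}  # closure state shared across per-block calls

    def patched_forward(hidden_states, shared_kv_states=None):
        # Substitute the shared dict when the caller omits the argument,
        # mirroring the suggested fix for _patch_gemma4_model.
        if shared_kv_states is None:
            shared_kv_states = shared_kv_states_global
        return orig_fwd(hidden_states, shared_kv_states)

    layer.forward = patched_forward
    return layer


layer = patch_layer(DummyLayer())
out = layer.forward(3)  # no shared_kv_states passed, yet no crash
```

Because `shared_kv_states_global` lives in the `patch_layer` closure, every call through the patched forward that omits the argument reuses the same dict, which is what lets state carry over between individually quantized blocks.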



5 participants