AttributeError: 'InputCollector' object has no attribute 'attention_type' during GPTQ quantization with nvfp format #20

@547435524

Description

I encountered an issue while running the model quantization script with GPTQ and the nvfp format. The error occurs during the calibration forward pass: the traceback shows that the InputCollector wrapper has no attribute attention_type, which the Qwen3 modeling code in transformers reads from each decoder layer.

Steps to Reproduce:

  1. Clone or download the project repository.
  2. Ensure all dependencies are installed (PyTorch, transformers, etc.).
  3. Run the following bash script with the given arguments:
#!/bin/bash
export OMP_NUM_THREADS=8
export CUDA_VISIBLE_DEVICES=4 

MODEL="/share/global/models/Qwen3-8B"
SAVE_PATH="quantized_models/Qwen3-8B-MR-GPTQ-NVFP4"

python3 model_quant.py \
 --model_name_or_path ${MODEL} \
 --format nvfp \
 --w_bits 4 \
 --a_bits 4 \
 --gptq \
 --transform_class hadamard \
 --hadamard_group_size 128 \
 --dataset_name_or_path c4 \
 --num_sequences 128 \
 --sequence_length 2048 \
 --w_observer minmax \
 --quantization_order default \
 --save_path ${SAVE_PATH} \
 --export_quantized_model realquant \
 --fuse_global_scale \
 --amp \
 --dtype bfloat16

Actual Behavior:
The script throws the following error during the forward pass:

Traceback (most recent call last):
  File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/model_quant.py", line 482, in <module>
    main()
  File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/model_quant.py", line 394, in main
    quantized_state_dict, non_quantized_state_dict = gptq_quantization(model, calibration_data, args, device)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/src/quantization/gptq.py", line 534, in gptq_quantization
    model(sample.to(device=device))
  File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/transformers/utils/generic.py", line 918, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 480, in forward
    outputs: BaseModelOutputWithPast = self.model(
                                       ^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/transformers/utils/generic.py", line 1072, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 412, in forward
    attention_mask=causal_mask_mapping[decoder_layer.attention_type],
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
    raise AttributeError(
AttributeError: 'InputCollector' object has no attribute 'attention_type'

Environment:

  • Python version: 3.12
  • CUDA version: 12.8
  • Model: Qwen3-8B
  • Torch: 2.8.0+cu12
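
Possible Workaround:

Recent transformers versions select the causal mask per layer via decoder_layer.attention_type (see the modeling_qwen3.py frame in the traceback above), so any module that wraps a decoder layer for calibration has to expose the wrapped layer's plain attributes. Below is a minimal sketch of that delegation, not the project's actual code: it assumes InputCollector keeps the wrapped layer in self.module (attribute name guessed, adjust to match src/quantization/gptq.py), and the stop-forward exception is only illustrative.

import torch.nn as nn

class InputCollector(nn.Module):
    """Wraps a decoder layer and records the inputs it receives."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module  # the wrapped decoder layer (name assumed)
        self.inputs = []

    def forward(self, *args, **kwargs):
        # Record calibration inputs instead of executing the layer.
        self.inputs.append((args, kwargs))
        raise ValueError("inputs collected")  # illustrative stop-forward

    def __getattr__(self, name):
        # nn.Module.__getattr__ only resolves parameters, buffers, and
        # submodules; fall back to the wrapped layer for plain attributes
        # such as attention_type, which Qwen3's forward reads per layer.
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(super().__getattr__("module"), name)

A smaller stopgap is to copy the attribute onto the wrapper when it is created (collector.attention_type = layer.attention_type, guarded by hasattr for models that lack it), but attribute delegation also covers any other per-layer attributes the modeling code may read.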
