Description:
I encountered an issue while running the model quantization script for GPTQ with the nvfp format. The error occurs during the calibration forward pass: the traceback shows that the InputCollector object has no attribute attention_type.
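For context, the traceback below shows transformers indexing causal_mask_mapping[decoder_layer.attention_type] inside the Qwen3 forward, so whatever module the GPTQ calibration substitutes for a decoder layer must still expose attention_type. Here is a minimal Python sketch of that failure mode (hypothetical names; the real InputCollector in src/quantization/gptq.py may be structured differently):

import torch.nn as nn

class Collector(nn.Module):
    # Stand-in for InputCollector: wraps a decoder layer but does not
    # forward attribute lookups to the wrapped module.
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

class DecoderLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # transformers sets this per layer from config.layer_types
        self.attention_type = "full_attention"

wrapped = Collector(DecoderLayer())
wrapped.attention_type  # raises AttributeError, same failure as below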
Steps to Reproduce:
- Clone or download the project repository.
- Ensure all dependencies are installed (Torch, transformers, etc.).
- Run the following bash script with the given arguments:
#!/bin/bash
export OMP_NUM_THREADS=8
export CUDA_VISIBLE_DEVICES=4
MODEL="/share/global/models/Qwen3-8B"
SAVE_PATH="quantized_models/Qwen3-8B-MR-GPTQ-NVFP4"
python3 model_quant.py \
--model_name_or_path ${MODEL} \
--format nvfp \
--w_bits 4 \
--a_bits 4 \
--gptq \
--transform_class hadamard \
--hadamard_group_size 128 \
--dataset_name_or_path c4 \
--num_sequences 128 \
--sequence_length 2048 \
--w_observer minmax \
--quantization_order default \
--save_path ${SAVE_PATH} \
--export_quantized_model realquant \
--fuse_global_scale \
--amp \
--dtype bfloat16
Actual Behavior:
The script throws the following error during the forward pass:
Traceback (most recent call last):
File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/model_quant.py", line 482, in <module>
main()
File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/model_quant.py", line 394, in main
quantized_state_dict, non_quantized_state_dict = gptq_quantization(model, calibration_data, args, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/share/global/xiaonan.zhang/workspace/FP-Quant_Back/FP-Quant/src/quantization/gptq.py", line 534, in gptq_quantization
model(sample.to(device=device))
File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/transformers/utils/generic.py", line 918, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 480, in forward
outputs: BaseModelOutputWithPast = self.model(
^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/transformers/utils/generic.py", line 1072, in wrapper
outputs = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 412, in forward
attention_mask=causal_mask_mapping[decoder_layer.attention_type],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
raise AttributeError(
AttributeError: 'InputCollector' object has no attribute 'attention_type'
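A possible workaround, sketched below under the assumption that InputCollector registers the wrapped decoder layer as a submodule: delegate unknown attribute lookups to the wrapped layer so plain attributes like attention_type stay visible. This is an untested suggestion, not the project's actual implementation, and the attribute name "module" is a guess.

import torch.nn as nn

class InputCollector(nn.Module):
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module   # the wrapped Qwen3 decoder layer (assumed name)
        self.inputs = []       # recorded calibration inputs

    def forward(self, *args, **kwargs):
        # Record the calibration batch; the real collector may also stop
        # the forward pass here instead of running the wrapped layer.
        self.inputs.append((args, kwargs))
        return self.module(*args, **kwargs)

    def __getattr__(self, name):
        # nn.Module.__getattr__ resolves parameters/buffers/submodules;
        # fall back to the wrapped module for plain attributes such as
        # attention_type.
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(super().__getattr__("module"), name)

If the project's collector stores the layer under a different attribute, the fallback in __getattr__ would need to use that name instead.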
Environment:
- Python version: 3.12
- CUDA version: 12.8
- Model: Qwen3-8B
- Torch: 2.8.0+cu12