Add FP8 support for the ONNX backend #4072

Open
andrey-churkin wants to merge 6 commits into openvinotoolkit:develop from andrey-churkin:ac/fp8_onnx
@andrey-churkin andrey-churkin commented May 15, 2026

Contributor

Changes

  • Add support for nncf.CompressWeightsMode.FP8_E4M3 mode in the nncf.compress_weights() method for the ONNX backend.
  • Add support for quantization using nncf.QuantizationMode.FP8_E4M3 and nncf.QuantizationMode.FP8_E5M2 modes in the nncf.quantize() method for the ONNX backend.
  • Add an example: examples/llm_compression/onnx/smollm2_360m_fp8.

Reason for changes

Add support for FP8 quantization and weight compression in the ONNX backend.

Related tickets

Tests

Weight compression - success

Test examples - failure

@andrey-churkin andrey-churkin requested a review from a team as a code owner May 15, 2026 08:24
@github-actions github-actions Bot added the NNCF ONNX Pull requests that updates NNCF ONNX label May 15, 2026

@daniil-lyakhov daniil-lyakhov left a comment

Collaborator

No major comments. Please add some tests.

Comment thread src/nncf/quantization/algorithms/min_max/onnx_backend.py
Comment on lines +367 to +371
    if weight_dtype == onnx.TensorProto.FLOAT8E4M3FN:
        np_dtype = helper.tensor_dtype_to_np_dtype(weight_dtype)
        vals = onnx.numpy_helper.saturate_cast(np.asarray(quantized_weights), np_dtype).flatten()
    else:
        vals = quantized_weights

Collaborator

Two similar code blocks, maybe worth a private method?

Contributor Author

I've rewritten it slightly. Given that it's only two lines, I don't think introducing a separate method provides much value.
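
For readers following the thread, here is a rough sketch of what a saturating cast to FP8 E4M3FN does to a single value. This is not the PR's code and not how `onnx.numpy_helper.saturate_cast` is implemented. The function name `round_to_e4m3fn_saturating` and the constants are hypothetical and chosen for illustration. The sketch assumes the usual E4M3FN layout (3 mantissa bits, smallest normal exponent -6, largest finite magnitude 448) and round-to-nearest-even.

```python
import math

# Assumed FP8 E4M3FN parameters (illustrative constants, not taken from the PR).
E4M3_MAX = 448.0          # largest finite magnitude
E4M3_MIN_NORMAL_EXP = -6  # exponent of the smallest normal value, 2**-6
MANTISSA_BITS = 3


def round_to_e4m3fn_saturating(x: float) -> float:
    """Round x to the nearest E4M3FN-representable value, clamping overflow."""
    if math.isnan(x) or x == 0.0:
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Saturation: anything at or beyond the largest finite value (including
    # infinity) is clamped instead of becoming NaN.
    if mag >= E4M3_MAX:
        return sign * E4M3_MAX
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so the binade is e - 1.
    _, e = math.frexp(mag)
    # Below the smallest normal exponent the grid spacing stops shrinking
    # (subnormal range).
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)
    # Python's round() rounds halves to the even integer.
    return sign * round(mag / step) * step


if __name__ == "__main__":
    for value in (0.3, 1.06, 300.0, 1000.0, -1000.0):
        print(value, "->", round_to_e4m3fn_saturating(value))
```

The point of saturation is that out-of-range weights map to ±448 rather than to NaN, which is presumably why the FP8 branch in the quoted lines goes through a saturating cast instead of a plain dtype conversion.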

@daniil-lyakhov daniil-lyakhov left a comment

Collaborator

LGTM

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 25, 2026
Labels

documentation: Improvements or additions to documentation
NNCF ONNX: Pull requests that updates NNCF ONNX

3 participants