Fridah/static fp4 export #858
base: asma/refactor-scale-sweep
Conversation
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Force-pushed 9f69993 to df4e6a9
| "algorithm": "max", | ||
| } | ||
|
|
||
| NVFP4_WEIGHT_MSE_FP8_SWEEP_CFG = { |
Activation is quantized too; can we make that more evident in the name?

Suggested change:

```diff
-NVFP4_WEIGHT_MSE_FP8_SWEEP_CFG = {
+NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG = {
```
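For illustration, a hedged sketch of what making the activation side explicit might look like; the `quant_cfg` structure, field names, and values below are assumptions for this example, not the PR's actual config:

```python
# Hypothetical sketch -- field names and values are assumptions, not the PR's config.
NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG = {
    "quant_cfg": {
        # Weights: FP4 (E2M1) with FP8 (E4M3) per-block scales.
        "*weight_quantizer": {"num_bits": (2, 1), "block_sizes": {-1: 16, "scale_bits": (4, 3)}},
        # Activations are quantized too -- the "W4A4" in the name makes that explicit.
        "*input_quantizer": {"num_bits": (2, 1), "block_sizes": {-1: 16, "scale_bits": (4, 3)}},
    },
    "algorithm": "max",
}
```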
```python
weight_scaling_factor_2 = global_amax / (6.0 * 448.0)
per_block_scale = per_block_amax / (6.0 * weight_scaling_factor_2.to(per_block_amax.device))
per_block_scale[per_block_scale == 0] = 1.0

# Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
num_blocks_per_row = weight.shape[-1] // block_size
expected_shape = (*weight.shape[:-1], num_blocks_per_row)
per_block_scale = per_block_scale.view(expected_shape)

return per_block_scale.to(torch.float8_e4m3fn)
```
nit: This is the way I think about FP8 quantization of the scale; I find the following more intuitive:

Suggested change:

```diff
-weight_scaling_factor_2 = global_amax / (6.0 * 448.0)
-per_block_scale = per_block_amax / (6.0 * weight_scaling_factor_2.to(per_block_amax.device))
-per_block_scale[per_block_scale == 0] = 1.0
-# Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
-num_blocks_per_row = weight.shape[-1] // block_size
-expected_shape = (*weight.shape[:-1], num_blocks_per_row)
-per_block_scale = per_block_scale.view(expected_shape)
-return per_block_scale.to(torch.float8_e4m3fn)
+per_block_scale_max = global_amax.float() / 6.0  # important: compute the scale in float
+per_block_scale = per_block_amax.float() / 6.0
+per_block_scale[per_block_scale == 0] = 1.0
+# Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
+num_blocks_per_row = weight.shape[-1] // block_size
+expected_shape = (*weight.shape[:-1], num_blocks_per_row)
+per_block_scale = per_block_scale.view(expected_shape)
+per_block_scale_fp8 = (per_block_scale * 448.0 / per_block_scale_max).to(torch.float8_e4m3fn)
+return per_block_scale_fp8
```
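Note the two formulations agree algebraically: both reduce to `per_block_amax * 448.0 / global_amax` before the FP8 cast. For readers following along, here is a self-contained sketch of the per-block scale computation as written in the diff above. The function name and the assumptions (a 2-D weight whose last dimension is divisible by `block_size`, a PyTorch build with `torch.float8_e4m3fn`, i.e. >= 2.1) are mine, not the PR's:

```python
import torch

def nvfp4_per_block_scale(weight: torch.Tensor, block_size: int = 16) -> torch.Tensor:
    """Sketch of NVFP4 two-level weight scaling, mirroring the diff above."""
    # Global (level-2) scale: 6.0 is the FP4 E2M1 max, 448.0 the FP8 E4M3 max.
    global_amax = weight.abs().amax().float()
    weight_scaling_factor_2 = global_amax / (6.0 * 448.0)

    # Per-block amax over contiguous blocks along the last dimension;
    # reshaping first yields (rows, num_blocks_per_row) directly.
    blocks = weight.reshape(*weight.shape[:-1], -1, block_size)
    per_block_amax = blocks.abs().amax(dim=-1).float()

    per_block_scale = per_block_amax / (6.0 * weight_scaling_factor_2)
    per_block_scale[per_block_scale == 0] = 1.0  # guard all-zero blocks
    return per_block_scale.to(torch.float8_e4m3fn)

w = torch.randn(4, 64)
print(nvfp4_per_block_scale(w).shape)  # torch.Size([4, 4])
```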
```python
assert (
    hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
)
global_amax = weight_quantizer._global_amax.float()
```
Can we use the public accessor here?

Suggested change:

```diff
-global_amax = weight_quantizer._global_amax.float()
+global_amax = weight_quantizer.global_amax.float()
```
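A minimal sketch of the design point behind this suggestion: a public `global_amax` property keeps the underscore attribute an implementation detail. The class below is a hypothetical stand-in, not the PR's actual quantizer:

```python
from typing import Optional

import torch

class WeightQuantizer:
    """Hypothetical stand-in for the PR's quantizer class."""

    def __init__(self) -> None:
        self._global_amax: Optional[torch.Tensor] = None

    @property
    def global_amax(self) -> Optional[torch.Tensor]:
        # Call sites read the public property instead of reaching into _global_amax.
        return self._global_amax

quantizer = WeightQuantizer()
quantizer._global_amax = torch.tensor(3.5)
assert quantizer.global_amax is not None
global_amax = quantizer.global_amax.float()
```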
```python
assert (
    hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
)
```
Suggested change:

```diff
-assert (
-    hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
-)
+assert weight_quantizer.global_amax is not None
```
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
What does this PR do?
Type of change: ?
Overview: ?
Usage
```python
# Add a code snippet demonstrating how to use this
```

Testing
Before your PR is "Ready for review"
Additional Information