Conversation

@Fridah-nv
Contributor

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv force-pushed the fridah/static-fp4-export branch from 9f69993 to df4e6a9 on February 5, 2026 at 19:02
"algorithm": "max",
}

NVFP4_WEIGHT_MSE_FP8_SWEEP_CFG = {
Contributor

Activation is also quantized; can we make that more evident in the name?

Suggested change
- NVFP4_WEIGHT_MSE_FP8_SWEEP_CFG = {
+ NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG = {

Comment on lines +316 to +325
weight_scaling_factor_2 = global_amax / (6.0 * 448.0)
per_block_scale = per_block_amax / (6.0 * weight_scaling_factor_2.to(per_block_amax.device))
per_block_scale[per_block_scale == 0] = 1.0

# Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
num_blocks_per_row = weight.shape[-1] // block_size
expected_shape = (*weight.shape[:-1], num_blocks_per_row)
per_block_scale = per_block_scale.view(expected_shape)

return per_block_scale.to(torch.float8_e4m3fn)
Contributor

nit: This is how I think about FP8 quantization of the scale; I find the following more intuitive:

Suggested change
- weight_scaling_factor_2 = global_amax / (6.0 * 448.0)
- per_block_scale = per_block_amax / (6.0 * weight_scaling_factor_2.to(per_block_amax.device))
- per_block_scale[per_block_scale == 0] = 1.0
- # Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
- num_blocks_per_row = weight.shape[-1] // block_size
- expected_shape = (*weight.shape[:-1], num_blocks_per_row)
- per_block_scale = per_block_scale.view(expected_shape)
- return per_block_scale.to(torch.float8_e4m3fn)
+ per_block_scale_max = global_amax.float() / 6.0  # important: do the scaling in float
+ per_block_scale = per_block_amax.float() / 6.0
+ per_block_scale[per_block_scale == 0] = 1.0
+ # Reshape per_block_scale to match weight's block structure: (rows, num_blocks_per_row)
+ num_blocks_per_row = weight.shape[-1] // block_size
+ expected_shape = (*weight.shape[:-1], num_blocks_per_row)
+ per_block_scale = per_block_scale.view(expected_shape)
+ per_block_scale_fp8 = (per_block_scale * 448.0 / per_block_scale_max).to(torch.float8_e4m3fn)
+ return per_block_scale_fp8
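
For reference, both formulations reduce to per_block_amax * 448 / global_amax before the FP8 cast, so they should produce the same scales for nonzero blocks. A minimal standalone sketch checking this with made-up shapes (not the module's actual tensors):

import torch

global_amax = torch.tensor(3.2)
per_block_amax = torch.rand(4, 8) * 2.0 + 0.1  # strictly positive block amaxes

# Original formulation: fold 448 into weight_scaling_factor_2 up front.
weight_scaling_factor_2 = global_amax / (6.0 * 448.0)
scale_a = per_block_amax / (6.0 * weight_scaling_factor_2)

# Suggested formulation: keep float scales, map into the FP8 range at the end.
per_block_scale_max = global_amax.float() / 6.0
per_block_scale = per_block_amax.float() / 6.0
scale_b = per_block_scale * 448.0 / per_block_scale_max

# Both equal per_block_amax * 448 / global_amax up to float rounding, so the
# final .to(torch.float8_e4m3fn) cast yields the same FP8 per-block scales.
assert torch.allclose(scale_a, scale_b)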

assert (
hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
)
global_amax = weight_quantizer._global_amax.float()
Contributor

Can we use this instead?

Suggested change
- global_amax = weight_quantizer._global_amax.float()
+ global_amax = weight_quantizer.global_amax.float()

Comment on lines +309 to +311
assert (
hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
)
Contributor

Suggested change
- assert (
-     hasattr(weight_quantizer, "_global_amax") and weight_quantizer._global_amax is not None
- )
+ assert weight_quantizer.global_amax is not None
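
A tiny illustration of why the simpler assert suffices, assuming global_amax is exposed as a property on the quantizer class and returns None until calibration populates it (class and values here are made up, not the actual modelopt quantizer):

class QuantizerSketch:
    """Stand-in for a quantizer exposing _global_amax through a public property."""

    def __init__(self):
        self._global_amax = None  # populated during calibration

    @property
    def global_amax(self):
        return self._global_amax


q = QuantizerSketch()
q._global_amax = 3.2  # pretend calibration has run
# The property always exists on the class, so hasattr() is redundant;
# a plain None check expresses the same precondition.
assert q.global_amax is not None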

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>