
Add Automated QDQ placement example - Part 4.1#841

Open
willg-nv wants to merge 3 commits into NVIDIA:main from
willg-nv:dev-willg-integrate-auto-qdq-placement-part4.1

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 3, 2026

What does this PR do?

Type of change: ?
This change implements a simple example illustrating the usage of the Automated QDQ placement tool.

Overview: ?

Usage

python3 -m modelopt.onnx.quantization.autotune \
    --model resnet50.bs128.onnx \
    --output ./resnet50_autotuned \
    --qdq-baseline resnet50_quantized.onnx \
    --schemes-per-region 50

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for QDQ placement optimization with setup and usage examples.
  • New Features

    • Added tool to modify ONNX model batch size configuration.
  • Improvements

    • Enhanced logging output with timestamps for better traceability.

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv requested review from a team as code owners February 3, 2026 02:43
@willg-nv willg-nv requested review from ChenhanYu and galagam February 3, 2026 02:43
@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds comprehensive documentation and a batch size modification utility for the QDQ placement optimization example, along with a minor logging format enhancement. These changes introduce resources for users to set up and run QDQ quantization workflows on ONNX models.

Changes

QDQ Placement Example (examples/qdq_placement/README.md, examples/qdq_placement/set_batch_size.py):
  Introduces documentation for the QDQ quantization workflow and a utility script that modifies ONNX models to set fixed batch dimensions, including shape inference and model verification logic.

Logging Configuration (modelopt/onnx/logging_config.py):
  Updates the log format string to prepend an ISO-formatted timestamp to each log message.
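The logging change can be sketched with the standard library's logging module; the exact format string used in logging_config.py may differ, so this is only an illustrative configuration:

```python
# Illustrative sketch: prepend an ISO 8601 timestamp to each log message.
# The actual format in modelopt/onnx/logging_config.py may differ.
import logging

formatter = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",  # ISO-formatted timestamp
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("modelopt.onnx")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("quantization started")
```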

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and specifically references the main change: adding an Automated QDQ placement example as part 4.1 of a multi-part implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 1b8d896 to f36aa99 Compare February 3, 2026 02:45
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@examples/qdq_placement/README.md`:
- Around line 191-192: Fix the typo in the README sentence about TensorRT remote
autotuning: change "autouning" to "autotuning" in the line that reads "TensorRT
10.16 support remote autotuning, pass remoteAutoTuningConfig to trtexec to
benchmark with remote autouning." to correctly spell "autotuning" and ensure the
sentence still reads clearly (e.g., "TensorRT 10.16 supports remote autotuning;
pass remoteAutoTuningConfig to trtexec to benchmark with remote autotuning.").
- Around line 128-129: The downloaded filename in the curl command is
misleading: the URL fetches resnet101-v2-7.onnx but the saved name and
subsequent commands use resnet101_Opset17.onnx; update the saved filename and
downstream usage to a consistent, accurate name (e.g., use resnet101-v2-7.onnx
in the curl -o and in the python3 set_batch_size.py command) or add a one-line
clarifying comment above the commands explaining that resnet101_Opset17.onnx is
an alias for resnet101-v2-7.onnx so readers know which model variant is being
used.
🧹 Nitpick comments (3)
examples/qdq_placement/set_batch_size.py (3)

46-48: Consider validating that the model has inputs.

If the model has no graph inputs, accessing graph.input[0] will raise an IndexError. While unlikely for typical models, adding a guard improves robustness.

🛡️ Proposed defensive check
     # Get the input tensor
     graph = model.graph
+    if not graph.input:
+        raise ValueError(f"Model {model_path} has no graph inputs")
     input_tensor = graph.input[0]

60-64: Output batch dimension assumption may not hold for all models.

This code assumes the first dimension of every output is the batch dimension. While true for ResNet50 and most classification models, some models may have scalar outputs or outputs where batch isn't the first dimension. Consider adding a note in the docstring about this assumption, or making output modification opt-in.


78-84: Use the repository's utility functions for saving and checking the model to handle large files consistently.

The codebase provides save_onnx() and check_model() utilities in modelopt/onnx/utils.py that handle models larger than 2GB by using external data. Replace the standard onnx.save() (line 80) and onnx.checker.check_model() (line 84) with calls to modelopt.onnx.utils.save_onnx() and modelopt.onnx.utils.check_model(). While ResNet50 won't encounter this limitation, using the existing utilities ensures consistency across the codebase and prevents issues when the script is applied to larger models.

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from f36aa99 to 7257a12 Compare February 3, 2026 03:42
Signed-off-by: Will Guo <willg@nvidia.com>