
Integrate Automated QDQ placement tool - part 3.3 #839

Open

willg-nv wants to merge 1 commit into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part3.3

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 2, 2026

What does this PR do?

This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:

  python -m modelopt.onnx.quantization.autotune \
      --onnx_path model.onnx --schemes_per_region 50 \
      --pattern_cache cache.yaml --qdq_baseline baseline.onnx \
      --quant_type int8 --verbose
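
A minimal sketch of how a parser covering these flags might be assembled is below. The flag names come from the usage example above; the help strings, defaults, and accepted quant-type values are assumptions, not the actual _get_autotune_parser() implementation.

  import argparse


  def build_parser() -> argparse.ArgumentParser:
      """Hypothetical parser mirroring the CLI flags shown in the usage above."""
      parser = argparse.ArgumentParser(
          prog="python -m modelopt.onnx.quantization.autotune",
          description="Autotune Q/DQ placement for an ONNX model.",
      )
      parser.add_argument("--onnx_path", required=True, help="Input ONNX model path.")
      parser.add_argument("--schemes_per_region", type=int, default=50,
                          help="Max quantization schemes to try per region.")
      parser.add_argument("--pattern_cache", help="YAML cache of known region patterns.")
      parser.add_argument("--qdq_baseline", help="Baseline Q/DQ model for comparison.")
      parser.add_argument("--quant_type", default="int8",
                          help="Quantization type, e.g. int8 (other values are assumptions).")
      parser.add_argument("--verbose", action="store_true", help="Verbose logging.")
      return parser


  if __name__ == "__main__":
      print(build_parser().parse_args())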

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

Overview: ?

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Documentation will be added in part 4.
  • Did you update Changelog?: The changelog will be updated in part 4.

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a command-line interface for ONNX quantization autotuning with configurable parameters for models, output paths, quantization strategies, and TensorRT benchmarking.
    • Introduced an automated workflow for pattern-based region optimization with state management, baseline comparison, and benchmarking capabilities.


@willg-nv willg-nv requested a review from a team as a code owner on February 2, 2026 at 03:04
@copy-pr-bot
copy-pr-bot bot commented Feb 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor
coderabbitai bot commented Feb 2, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.

Changes

CLI Entry Point (modelopt/onnx/quantization/autotune/__main__.py)
Implements run_autotune() with argument parsing via _get_autotune_parser(), input validation, TensorRT benchmark initialization, and orchestration of the region-pattern autotuning workflow. Includes error handling for keyboard interruption and general exceptions, plus logging of the benchmark configuration. A hedged sketch of this flow appears below.
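
A sketch of this control flow, with the parser and the benchmark/workflow steps stubbed out since their exact signatures are not visible from this page; only run_autotune() and the behaviors named in the summary are taken from the PR:

  import argparse
  import sys
  from pathlib import Path


  def _parse_args(argv):
      # Stand-in for _get_autotune_parser().parse_args(); only one flag shown.
      parser = argparse.ArgumentParser()
      parser.add_argument("--onnx_path", required=True)
      return parser.parse_args(argv)


  def run_autotune(argv=None) -> int:
      args = _parse_args(argv)
      if not Path(args.onnx_path).is_file():
          print(f"error: ONNX model not found: {args.onnx_path}", file=sys.stderr)
          return 1
      try:
          # 1. initialize the TensorRT benchmark instance (stubbed here)
          # 2. run the region-pattern autotuning workflow (stubbed here)
          pass
      except KeyboardInterrupt:
          print("interrupted; partial state may be checkpointed", file=sys.stderr)
          return 130
      except Exception as exc:  # broad handler, mirroring the summary above
          print(f"autotuning failed: {exc}", file=sys.stderr)
          return 1
      return 0


  if __name__ == "__main__":
      sys.exit(run_autotune())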
Workflow Core (modelopt/onnx/quantization/autotune/workflows.py)
Provides benchmark_onnx_model() for latency measurement, init_benchmark_instance() for TensorRT benchmark setup, and region_pattern_autotuning_workflow() for automated Q/DQ optimization via region discovery, pattern filtering, per-region scheme iteration, model export, and state checkpointing. A structural sketch of the scheme loop appears below.
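
A structural sketch of the per-region scheme loop described here and in the sequence diagram below; every name is a placeholder for illustration, not the real modelopt API:

  from __future__ import annotations

  from dataclasses import dataclass


  @dataclass
  class RegionResult:
      """Best scheme found for one region (placeholder type)."""
      region_id: int
      best_scheme: str | None = None
      best_latency_ms: float = float("inf")


  def autotune_regions(regions, schemes_per_region, benchmark):
      """Try up to schemes_per_region Q/DQ schemes per region; keep the fastest."""
      results = []
      for region_id, candidate_schemes in regions:
          result = RegionResult(region_id)
          for scheme in candidate_schemes[:schemes_per_region]:
              latency_ms = benchmark(region_id, scheme)  # placeholder benchmark call
              if latency_ms < result.best_latency_ms:
                  result.best_scheme, result.best_latency_ms = scheme, latency_ms
          results.append(result)
          # a real workflow would checkpoint state here so interrupted runs resume
      return results


  if __name__ == "__main__":
      fake_regions = [(0, ["s0", "s1"]), (1, ["s0"])]
      fake_benchmark = lambda _region_id, scheme: {"s0": 2.0, "s1": 1.5}[scheme]
      for r in autotune_regions(fake_regions, 50, fake_benchmark):
          print(r)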

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export

    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Title check ✅ Passed
The PR title 'Integrate Automated QDQ placement tool - part 3.3' clearly describes the main change: adding a QDQ autotuner CLI. It directly relates to the changeset, which introduces new modules for the ONNX Q/DQ autotuning workflow and a command-line entry point.

Docstring Coverage ✅ Passed
Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

Description Check ✅ Passed
Check skipped - CodeRabbit’s high-level summary is enabled.


@coderabbitai
Contributor

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around lines 107-116: init_benchmark_instance() can return None on failure, but the current flow continues anyway. Update the caller (the block after log_benchmark_config) to check the return value of init_benchmark_instance (called with use_trtexec=args.use_trtexec, plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache, warmup_runs=args.warmup_runs, timing_runs=args.timing_runs, trtexec_args=trtexec_args); if it returns None, log an error and exit early (e.g., sys.exit(1)) so the script fails fast instead of producing misleading infinite benchmark results. A sketch of this check follows.
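
One way to implement the requested fail-fast check; the keyword arguments are taken from the comment above, and init_benchmark_instance is passed in as a parameter so the sketch stays self-contained rather than guessing the real import path:

  import logging
  import sys

  logger = logging.getLogger(__name__)


  def init_benchmark_or_exit(init_benchmark_instance, args, trtexec_args):
      """Initialize the benchmark, exiting immediately if setup failed."""
      benchmark = init_benchmark_instance(
          use_trtexec=args.use_trtexec,
          plugin_libraries=args.plugin_libraries,
          timing_cache_file=args.timing_cache,
          warmup_runs=args.warmup_runs,
          timing_runs=args.timing_runs,
          trtexec_args=trtexec_args,
      )
      if benchmark is None:
          logger.error("Benchmark initialization failed; aborting autotune.")
          sys.exit(1)  # fail fast instead of reporting infinite latencies
      return benchmark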

In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around lines 239-246: The Config instantiation hardcodes verbose=True, which forces noisy logging. Change the Config(...) call in this file to accept a verbose parameter (e.g., verbose=verbose) and thread that boolean through from the CLI invocation that starts the autotuner (i.e., update the CLI call site to pass args.verbose into the function that triggers this code). The logger.info calls stay unchanged; only Config should use the provided verbose flag instead of a hardcoded True. A sketch follows this item.
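
A sketch of threading the flag through; Config here is a stand-in dataclass for illustration, not the real Config in workflows.py:

  import argparse
  from dataclasses import dataclass


  @dataclass
  class Config:
      """Stand-in for the real Config used in workflows.py."""
      verbose: bool = False


  def make_config(verbose: bool = False) -> Config:
      # was (per the review comment): Config(verbose=True)
      return Config(verbose=verbose)


  if __name__ == "__main__":
      parser = argparse.ArgumentParser()
      parser.add_argument("--verbose", action="store_true")
      args = parser.parse_args()
      print(make_config(verbose=args.verbose))  # CLI flag reaches Config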

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.3 branch from a87482f to 09e136a on February 2, 2026 at 14:03
