
Integrate Automated QDQ placement tool - part 3.3 #839

Open

willg-nv wants to merge 1 commit into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part3.3

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 2, 2026

What does this PR do?

This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:

  python -m modelopt.onnx.quantization.autotune \
      --onnx_path model.onnx --schemes_per_region 50 \
      --pattern_cache cache.yaml --qdq_baseline baseline.onnx \
      --quant_type int8 --verbose
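
A minimal sketch of how a parser covering these flags might be assembled is below. The flag names come from the usage example above; the help strings, defaults, and accepted quant-type values are assumptions, not the actual _get_autotune_parser() implementation.

  import argparse


  def build_parser() -> argparse.ArgumentParser:
      """Hypothetical parser mirroring the CLI flags shown in the usage above."""
      parser = argparse.ArgumentParser(
          prog="python -m modelopt.onnx.quantization.autotune",
          description="Autotune Q/DQ placement for an ONNX model.",
      )
      parser.add_argument("--onnx_path", required=True, help="Input ONNX model path.")
      parser.add_argument("--schemes_per_region", type=int, default=50,
                          help="Max quantization schemes to try per region.")
      parser.add_argument("--pattern_cache", help="YAML cache of known region patterns.")
      parser.add_argument("--qdq_baseline", help="Baseline Q/DQ model for comparison.")
      parser.add_argument("--quant_type", default="int8",
                          help="Quantization type, e.g. int8 (other values are assumptions).")
      parser.add_argument("--verbose", action="store_true", help="Verbose logging.")
      return parser


  if __name__ == "__main__":
      print(build_parser().parse_args())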

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

Overview: ?

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Documentation will be added in part 4.
  • Did you update Changelog?: The changelog will be updated in part 4.

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a command-line interface for ONNX quantization autotuning with configurable parameters for models, output paths, quantization strategies, and TensorRT benchmarking.
    • Introduced an automated workflow for pattern-based region optimization with state management, baseline comparison, and benchmarking capabilities.


@willg-nv willg-nv requested a review from a team as a code owner on February 2, 2026 at 03:04
@copy-pr-bot
copy-pr-bot bot commented Feb 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor
coderabbitai bot commented Feb 2, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.

Changes

CLI Entry Point (modelopt/onnx/quantization/autotune/__main__.py)
Implements run_autotune() with argument parsing via _get_autotune_parser(), input validation, TensorRT benchmark initialization, and orchestration of the region-pattern autotuning workflow. Includes error handling for keyboard interruption and general exceptions, plus logging of the benchmark configuration. A hedged sketch of this flow appears below.
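
A sketch of this control flow, with the parser and the benchmark/workflow steps stubbed out since their exact signatures are not visible from this page; only run_autotune() and the behaviors named in the summary are taken from the PR:

  import argparse
  import sys
  from pathlib import Path


  def _parse_args(argv):
      # Stand-in for _get_autotune_parser().parse_args(); only one flag shown.
      parser = argparse.ArgumentParser()
      parser.add_argument("--onnx_path", required=True)
      return parser.parse_args(argv)


  def run_autotune(argv=None) -> int:
      args = _parse_args(argv)
      if not Path(args.onnx_path).is_file():
          print(f"error: ONNX model not found: {args.onnx_path}", file=sys.stderr)
          return 1
      try:
          # 1. initialize the TensorRT benchmark instance (stubbed here)
          # 2. run the region-pattern autotuning workflow (stubbed here)
          pass
      except KeyboardInterrupt:
          print("interrupted; partial state may be checkpointed", file=sys.stderr)
          return 130
      except Exception as exc:  # broad handler, mirroring the summary above
          print(f"autotuning failed: {exc}", file=sys.stderr)
          return 1
      return 0


  if __name__ == "__main__":
      sys.exit(run_autotune())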
Workflow Core (modelopt/onnx/quantization/autotune/workflows.py)
Provides benchmark_onnx_model() for latency measurement, init_benchmark_instance() for TensorRT benchmark setup, and region_pattern_autotuning_workflow() for automated Q/DQ optimization via region discovery, pattern filtering, per-region scheme iteration, model export, and state checkpointing. A structural sketch of the scheme loop appears below.
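
A structural sketch of the per-region scheme loop described here and in the sequence diagram below; every name is a placeholder for illustration, not the real modelopt API:

  from __future__ import annotations

  from dataclasses import dataclass


  @dataclass
  class RegionResult:
      """Best scheme found for one region (placeholder type)."""
      region_id: int
      best_scheme: str | None = None
      best_latency_ms: float = float("inf")


  def autotune_regions(regions, schemes_per_region, benchmark):
      """Try up to schemes_per_region Q/DQ schemes per region; keep the fastest."""
      results = []
      for region_id, candidate_schemes in regions:
          result = RegionResult(region_id)
          for scheme in candidate_schemes[:schemes_per_region]:
              latency_ms = benchmark(region_id, scheme)  # placeholder benchmark call
              if latency_ms < result.best_latency_ms:
                  result.best_scheme, result.best_latency_ms = scheme, latency_ms
          results.append(result)
          # a real workflow would checkpoint state here so interrupted runs resume
      return results


  if __name__ == "__main__":
      fake_regions = [(0, ["s0", "s1"]), (1, ["s0"])]
      fake_benchmark = lambda _region_id, scheme: {"s0": 2.0, "s1": 1.5}[scheme]
      for r in autotune_regions(fake_regions, 50, fake_benchmark):
          print(r)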

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export

    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Title check ✅ Passed
The PR title 'Integrate Automated QDQ placement tool - part 3.3' clearly describes the main change: adding a QDQ autotuner CLI. It directly relates to the changeset, which introduces new modules for the ONNX Q/DQ autotuning workflow and a command-line entry point.

Docstring Coverage ✅ Passed
Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

Description Check ✅ Passed
Check skipped - CodeRabbit’s high-level summary is enabled.


@coderabbitai
Contributor

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around lines 107-116: init_benchmark_instance() can return None on failure, but the current flow continues anyway. Update the caller (the block after log_benchmark_config) to check the return value of init_benchmark_instance (called with use_trtexec=args.use_trtexec, plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache, warmup_runs=args.warmup_runs, timing_runs=args.timing_runs, trtexec_args=trtexec_args); if it returns None, log an error and exit early (e.g., sys.exit(1)) so the script fails fast instead of producing misleading infinite benchmark results. A sketch of this check follows.
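
One way to implement the requested fail-fast check; the keyword arguments are taken from the comment above, and init_benchmark_instance is passed in as a parameter so the sketch stays self-contained rather than guessing the real import path:

  import logging
  import sys

  logger = logging.getLogger(__name__)


  def init_benchmark_or_exit(init_benchmark_instance, args, trtexec_args):
      """Initialize the benchmark, exiting immediately if setup failed."""
      benchmark = init_benchmark_instance(
          use_trtexec=args.use_trtexec,
          plugin_libraries=args.plugin_libraries,
          timing_cache_file=args.timing_cache,
          warmup_runs=args.warmup_runs,
          timing_runs=args.timing_runs,
          trtexec_args=trtexec_args,
      )
      if benchmark is None:
          logger.error("Benchmark initialization failed; aborting autotune.")
          sys.exit(1)  # fail fast instead of reporting infinite latencies
      return benchmark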

In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around lines 239-246: The Config instantiation hardcodes verbose=True, which forces noisy logging. Change the Config(...) call in this file to accept a verbose parameter (e.g., verbose=verbose) and thread that boolean through from the CLI invocation that starts the autotuner (i.e., update the CLI call site to pass args.verbose into the function that triggers this code). The logger.info calls stay unchanged; only Config should use the provided verbose flag instead of a hardcoded True. A sketch follows this item.
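
A sketch of threading the flag through; Config here is a stand-in dataclass for illustration, not the real Config in workflows.py:

  import argparse
  from dataclasses import dataclass


  @dataclass
  class Config:
      """Stand-in for the real Config used in workflows.py."""
      verbose: bool = False


  def make_config(verbose: bool = False) -> Config:
      # was (per the review comment): Config(verbose=True)
      return Config(verbose=verbose)


  if __name__ == "__main__":
      parser = argparse.ArgumentParser()
      parser.add_argument("--verbose", action="store_true")
      args = parser.parse_args()
      print(make_config(verbose=args.verbose))  # CLI flag reaches Config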

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.3 branch from a87482f to 09e136a on February 2, 2026 at 14:03
