Integrate Automated QDQ placement tool - part 4.3 #843
willg-nv wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

New documentation file added detailing automated Q/DQ placement optimization for ONNX models, including quick start guides, configuration options, advanced usage patterns, CLI usage, troubleshooting, and API references.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Signed-off-by: Will Guo <willg@nvidia.com>
76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
457-457: Clarify scheme selection guidance.

The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation—specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
if autotuner.current_profile_pattern_schemes is None:
    print(" Already profiled, skipping")
    continue
Potentially confusing logic in code example.
The code checks if autotuner.current_profile_pattern_schemes is None: and then prints "Already profiled, skipping". This logic appears counter-intuitive—typically, checking if something is None suggests it doesn't exist or needs initialization, not that it's already complete. Without API documentation context, this example may confuse users about when regions are skipped.
Consider adding a clarifying comment explaining the API behavior, or verify this logic is correct:
# Check if already profiled (API returns None when region is complete)
if autotuner.current_profile_pattern_schemes is None:
print(" Already profiled, skipping")
    continue
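To make the semantics concrete, here is a tiny, self-contained illustration of the "None means already profiled" behavior that the suggested comment documents. The FakeAutotuner class below is a stand-in written only for this sketch, not the real modelopt API, so the actual behavior still needs to be confirmed against the guide.

# Stand-in class (NOT the real modelopt API) illustrating the skip-when-None pattern.
class FakeAutotuner:
    def __init__(self, already_profiled: bool):
        self._already_profiled = already_profiled

    @property
    def current_profile_pattern_schemes(self):
        # Mirrors the behavior as this review understands it: None once the current
        # region's pattern has already been profiled, otherwise the remaining schemes.
        if self._already_profiled:
            return None
        return ["scheme_a", "scheme_b"]

for autotuner in (FakeAutotuner(already_profiled=True), FakeAutotuner(already_profiled=False)):
    if autotuner.current_profile_pattern_schemes is None:
        print(" Already profiled, skipping")
        continue
    print(" Profiling schemes:", autotuner.current_profile_pattern_schemes)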
python -m modelopt.onnx.quantization.autotune \
    --model bert_large.onnx \
    --output ./bert_large_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Inconsistent filename reference.
Line 516 references ./bert_base_run/pattern_cache.yaml, but based on the output structure shown in lines 60 and 150, the pattern cache filename should be autotuner_state_pattern_cache.yaml.
📝 Proposed fix
- --pattern-cache ./bert_base_run/pattern_cache.yaml
+ --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
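With that fix applied, the full invocation from the hunk above would read as follows (same example paths as the diff; only the cache filename changes):

python -m modelopt.onnx.quantization.autotune \
    --model bert_large.onnx \
    --output ./bert_large_run \
    --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml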
What does this PR do?
This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better performance.
Overview: ?
Usage
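A sketch of what the usage snippet could look like, based on the CLI shown in the review diff above; the module path and flags (--model, --output, --pattern-cache) and the cache filename are taken from that diff and the review comments, while the model name and output directories here are placeholders:

# Optimize Q/DQ placement for an ONNX model (paths are illustrative)
python -m modelopt.onnx.quantization.autotune \
    --model my_model.onnx \
    --output ./my_model_run

# Reuse the pattern cache from an earlier run on a similar model
python -m modelopt.onnx.quantization.autotune \
    --model my_model.onnx \
    --output ./my_model_run_v2 \
    --pattern-cache ./my_model_run/autotuner_state_pattern_cache.yaml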
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit