Integrate Automated QDQ autotuner - part 3.2 #838

willg-nv wants to merge 1 commit into NVIDIA:main
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
📝 Walkthrough

Introduces a new ONNX quantization autotuning module that enables automatic Q/DQ (Quantize/Dequantize) node insertion and optimization using pattern-based region analysis. Provides a comprehensive framework for discovering optimal insertion points, profiling schemes, and exporting quantized models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant Autotuner as QDQAutotuner
participant RegionSearch as CombinedRegionSearch
participant Profiler as Profiling System
participant Inserter as Q/DQ Insertion
participant Exporter as ONNX Exporter
User->>Autotuner: initialize(config, pattern_cache)
Autotuner->>Autotuner: Load model & init state
User->>RegionSearch: discover regions
RegionSearch-->>Autotuner: return regions
loop For each region
User->>Autotuner: set_profile_region(region)
Autotuner->>Autotuner: Commit profiling outcomes
Autotuner->>Profiler: Prepare region-pattern pairs
loop Generate candidates
User->>Autotuner: generate()
Autotuner->>Inserter: Build insertion scheme
Inserter->>Inserter: Insert Q/DQ nodes
User->>Autotuner: submit(latency_ms)
Autotuner->>Autotuner: Track performance metrics
end
end
User->>Autotuner: export_onnx(best=True)
Autotuner->>Inserter: Apply best scheme
Inserter->>Exporter: Finalize Q/DQ graph
Exporter-->>User: return quantized ONNX bytes
```
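For orientation, a minimal driver loop in the shape the diagram describes might look like the sketch below. Only the class and method names (`QDQAutotuner`, `set_profile_region`, `generate`, `submit`, `export_onnx`) come from this PR; the constructor arguments, the region source, the candidate count, and the `measure_latency_ms` helper are assumptions:

```python
# Hedged sketch of the autotuning loop from the sequence diagram above.
# Only the class/method names are taken from this PR; arguments are assumptions.
from modelopt.onnx.quantization.autotune.autotuner import QDQAutotuner

autotuner = QDQAutotuner(config, pattern_cache)  # loads model & initializes state

for region in regions:  # regions discovered via CombinedRegionSearch
    autotuner.set_profile_region(region)  # commits prior outcomes, prepares pairs
    for _ in range(num_candidates):
        autotuner.generate()  # builds an insertion scheme and inserts Q/DQ nodes
        latency_ms = measure_latency_ms(autotuner)  # user-supplied benchmark step
        autotuner.submit(latency_ms)  # records the measurement for this candidate

onnx_bytes = autotuner.export_onnx(best=True)  # applies best scheme, returns ONNX bytes
```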
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around lines 1024-1029: The try/except around `graph.cleanup().toposort()` swallows all exceptions (`except Exception as e`) and merely logs a warning, which can hide serious graph corruption. Update the handler to either catch only expected exception types (e.g., specific cleanup/toposort exceptions) or log the error and re-raise it so execution stops on unexpected failures: locate the `graph.cleanup().toposort()` call and replace the broad except with a narrowed except for known recoverable exceptions, or add a `raise` after the warning log to propagate the error. A sketch of the re-raise variant follows this list.
- Line 622: Remove the redundant local `from datetime import datetime` in autotuner.py; the module already imports `datetime` at the top of the file, so delete the local `from datetime import datetime` statement inside the function or block to avoid duplication and potential shadowing. An illustrative before/after follows this list.
- Around lines 912-918: The zero-point arrays `q_zp_values` (and the corresponding `dq_zp_values`) are created with a hardcoded `dtype=np.int8`, which can mismatch the QuantizeLinear/DequantizeLinear output type when `quant_type` is "uint8" or another type. Update their construction to use the same dtype as the computed `quant_dtype` instead of `np.int8`, so `q_zp_values` and `dq_zp_values` match the quantized output element type used when building `q_inputs` and `dq_inputs` (refer to `q_scale_values`, `q_zp_values`, `q_inputs` and the corresponding `dq_*` variables to locate where to change the dtype). A dtype sketch follows this list.
- Around lines 1013-1021: The import of `get_tensor_consumer_node_indices` is wrong and causes an import error; replace it with `get_tensor_consumer_nodes` and update any usage names accordingly (the code that uses `tensor_users_map` already expects a `defaultdict(list)`, so no KeyError handling is needed). Specifically, change the symbol imported from `modelopt.onnx.quantization.graph_utils` from `get_tensor_consumer_node_indices` to `get_tensor_consumer_nodes` and ensure `tensor_users_map` is assigned from `get_tensor_consumer_nodes(...)` where used in the autotuner (references: `get_tensor_consumer_node_indices`, `get_tensor_consumer_nodes`, `tensor_users_map`). The corrected import is sketched after this list.
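Hedged sketches for the four items above; these are illustrations under stated assumptions, not the PR's code.

For the broad except around `graph.cleanup().toposort()`, the re-raise variant (the wrapper function and `gs.Graph` type are assumptions; the autotuner's actual logger and call site live in `autotuner.py`):

```python
import logging

import onnx_graphsurgeon as gs  # graph type assumed from the analysis below

logger = logging.getLogger(__name__)

def cleanup_graph(graph: gs.Graph) -> None:
    try:
        graph.cleanup().toposort()
    except Exception as e:
        # Log for context, but propagate rather than continuing with a
        # potentially corrupted graph.
        logger.warning(f"Graph cleanup/toposort failed: {e}")
        raise
```

For the duplicate datetime import (the enclosing helper is hypothetical; only the duplicated import comes from the review):

```python
from datetime import datetime  # module-level import, already present in autotuner.py

def _timestamped_filename(prefix: str) -> str:  # hypothetical helper for illustration
    # from datetime import datetime             # redundant local import: delete this line
    return f"{prefix}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.onnx"
```

For the zero-point dtype (variable names come from the review; the dtype mapping, array shape, and example `quant_type` value are illustrative):

```python
import numpy as np

quant_type = "uint8"  # example value; in the PR this comes from the insertion scheme

# Derive the zero-point dtype from the requested quantization type instead of
# hardcoding int8, so Q/DQ zero-points match the quantized element type.
quant_dtype = np.uint8 if quant_type == "uint8" else np.int8

q_zp_values = np.zeros((1,), dtype=quant_dtype)   # was: dtype=np.int8
dq_zp_values = np.zeros((1,), dtype=quant_dtype)  # keep DQ consistent with Q
```

For the broken import (`graph` here stands for the autotuner's loaded graph, per the usage described in the analysis below):

```python
# Broken: this symbol does not exist in graph_utils
# from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Fixed: import the function that actually exists
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_nodes

# Returns a defaultdict(list) mapping tensor name -> consumer nodes, so a
# missing key yields an empty list instead of raising KeyError.
tensor_users_map = get_tensor_consumer_nodes(graph)
```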
🧹 Nitpick comments (4)
modelopt/onnx/quantization/autotune/autotuner.py (4)
229-229: Consider defining config attributes explicitly.

Using `getattr(self.config, "maximum_generation_attempts", 100)` with defaults (also seen at lines 718-719 and 744) suggests these attributes may not be formally defined on the `Config` class. This pattern makes it harder to discover available configuration options.

💡 Suggestion

Consider adding these attributes to the `Config` class with documented defaults rather than relying on `getattr` fallbacks:

```python
# In Config class
maximum_generation_attempts: int = 100
top_percent_to_mutate: float = 0.1
minimum_schemes_to_mutate: int = 1
maximum_mutations: int = 3
```
333-335: Replace assertions with explicit checks for runtime validation.

Assertions on lines 333-335 (and similarly at line 314) are used for validating runtime conditions. Since assertions can be disabled with `python -O`, these should be explicit checks for production code.

🛡️ Proposed fix

```diff
- full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
- assert full_insertion_scheme is not None
- all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
- assert isinstance(all_region_ips, set)
+ full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
+ if full_insertion_scheme is None:
+     logger.warning(f"Failed to get full insertion scheme for region {region.id}")
+     continue
+ all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
+ if not isinstance(all_region_ips, set):
+     raise TypeError(f"Expected set from pattern.matches, got {type(all_region_ips)}")
```
972-985: Assertions used for critical runtime validation.

These assertions validate critical invariants (node index bounds, input index bounds, tensor name matching) but can be disabled with `python -O`. Consider using explicit checks with `ValueError`/`IndexError` for production safety.

🛡️ Proposed fix

```diff
  if node_index is not None:
-     assert node_index < len(graph.nodes), "Node index out of range"
+     if node_index >= len(graph.nodes):
+         raise IndexError(f"Node index {node_index} out of range (max: {len(graph.nodes) - 1})")
      target_node = graph.nodes[node_index]
-     assert input_index is not None, "Input index must be set when node index is set"
-     assert input_index < len(target_node.inputs), (
-         f"Input index out of range for node {target_node.name}"
-     )
+     if input_index is None:
+         raise ValueError("Input index must be set when node index is set")
+     if input_index >= len(target_node.inputs):
+         raise IndexError(f"Input index {input_index} out of range for node {target_node.name}")
      original_tensor = target_node.inputs[input_index]
-     assert tensor_name == original_tensor.name, (
-         f"Tensor name mismatch for node {target_node.name} input {input_index}"
-     )
+     if tensor_name != original_tensor.name:
+         raise ValueError(f"Tensor name mismatch: expected '{tensor_name}', got '{original_tensor.name}'")
  else:
-     assert tensor_name in tensor_map, f"Tensor {tensor_name} not found in tensor map"
-     assert input_index is None, "Input index must be None when node index is None"
+     if tensor_name not in tensor_map:
+         raise KeyError(f"Tensor {tensor_name} not found in tensor map")
+     if input_index is not None:
+         raise ValueError("Input index must be None when node index is None")
```
1042-1049: Consider an iterative approach for deep region hierarchies.

`_visit_region_recursively` uses recursion, which could hit Python's stack limit for very deep region hierarchies. While this is unlikely for typical ONNX models, an iterative approach would be more robust.

♻️ Iterative alternative

```python
def _visit_region_recursively(self, region: Region) -> list[Region]:
    """Iteratively traverse region hierarchy and collect all regions."""
    regions = []
    stack = [region]
    while stack:
        current = stack.pop()
        regions.append(current)
        stack.extend(current.get_children())
    return regions
```
Quoted hunk from the diff view:

```python
else:
    users = tensor_users_map[tensor_name]
    for user_index in users:
        user_node = graph.nodes[user_index]
        for i, input_tensor in enumerate(user_node.inputs):
            if hasattr(input_tensor, "name") and input_tensor.name == tensor_name:
                user_node.inputs[i] = dq_node.outputs[0]
                break
logger.debug(f"  Q/DQ inserted: tensor '{tensor_name}' → {len(users)} users")
```
🧩 Analysis chain

🏁 Scripts executed (repository: NVIDIA/Model-Optimizer; per-script output lengths omitted):

```shell
# First, let's read the file around line 1014 to understand the context
sed -n '990,1030p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n

# Look for how tensor_users_map is created and populated
rg -n -B 5 -A 10 'tensor_users_map' modelopt/onnx/quantization/autotune/autotuner.py | head -100

# Search for get_tensor_consumer_node_indices to understand the mapping logic
rg -n -A 20 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/autotune/autotuner.py

# Search for get_tensor_consumer_node_indices in the entire codebase
rg -n 'def get_tensor_consumer_node_indices' --type py

# Also look for the import of get_tensor_consumer_node_indices in autotuner.py
rg -n 'get_tensor_consumer_node_indices|from.*import' modelopt/onnx/quantization/autotune/autotuner.py | head -20

# Find the definition of get_tensor_consumer_node_indices
rg -n -A 20 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Look for the graph_utils file
fd 'graph_utils.py' --type f

# Now read the function definition from graph_utils.py
sed -n '1,50p' modelopt/onnx/quantization/graph_utils.py | cat -n

# Search for get_tensor_consumer_node_indices in graph_utils.py
rg -n -A 25 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Search for get_tensor_consumer_node_indices without the specific line number restriction
rg 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Let's get the line number and then read that section
rg -n 'get_tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Search for get_tensor_consumer_node_indices in the entire codebase
rg -l 'def get_tensor_consumer_node_indices' --type py

# Also search the common.py file since it was imported
rg -n -A 25 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/autotune/common.py

# Let's find the exact line number and read that section of graph_utils.py
wc -l modelopt/onnx/quantization/graph_utils.py

# Search for the function with grep to get line numbers
grep -n 'def get_tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# List the imports in autotuner.py to see what is imported from common
sed -n '30,45p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n

# Search for get_tensor_consumer_node_indices more broadly with a case-insensitive search
grep -i 'tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Let's try with rg using a different pattern
rg 'tensor_consumer_node_indices' modelopt/onnx/quantization/graph_utils.py

# Maybe it's defined with a different name, let's search for "consumer" in the file
rg -n 'consumer' modelopt/onnx/quantization/graph_utils.py | head -20

# Search for get_tensor_consumer_node_indices across entire repo
rg 'def get_tensor_consumer_node_indices' --type py

# Let's check if it's an alias or imported from somewhere else
rg 'get_tensor_consumer_node_indices' modelopt/ --type py

# Let's search more carefully in graph_utils.py - maybe it's a wrapper or has different indentation
grep -n 'consumer_node_indices\|def.*consumer' modelopt/onnx/quantization/graph_utils.py

# Let's look at the entire graph_utils.py file to find all function definitions
rg -n '^def ' modelopt/onnx/quantization/graph_utils.py | head -30

# Maybe the function is aliased or it's in a different form
grep -n 'tensor_users_map\|def.*tensor' modelopt/onnx/quantization/graph_utils.py | head -20

# Read the get_tensor_consumer_nodes function to understand what it returns
sed -n '283,310p' modelopt/onnx/quantization/graph_utils.py | cat -n

# Let's also check if there's an __all__ export or if the function is imported from somewhere else in graph_utils
head -50 modelopt/onnx/quantization/graph_utils.py | cat -n

# Search for any assignment or alias that could be get_tensor_consumer_node_indices
rg 'get_tensor_consumer_node_indices\s*=' modelopt/onnx/quantization/graph_utils.py

# Check if get_tensor_consumer_node_indices is imported from gs (onnx_graphsurgeon)
rg -n 'from.*import.*get_tensor_consumer_node_indices' modelopt/

# Let's check the end of graph_utils.py to see if there's any aliasing
tail -50 modelopt/onnx/quantization/graph_utils.py | cat -n

# Search in the middle sections of graph_utils.py
sed -n '1450,1550p' modelopt/onnx/quantization/graph_utils.py | cat -n

# Let's check if get_tensor_consumer_node_indices exists anywhere in the repo
find . -name "*.py" -type f -exec grep -l "def get_tensor_consumer_node_indices" {} \;

# Check the exact import line and what it should be importing
sed -n '40,50p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n

# Let's also check if the function exists by searching more broadly
rg 'get_tensor_consumer' --type py | head -20

# Check if there's any definition of get_tensor_consumer_node_indices anywhere
rg 'def get_tensor_consumer_node_indices|get_tensor_consumer_node_indices\s*=' --type py

# Let's look at what get_tensor_consumer_nodes returns and understand the data structure
sed -n '283,302p' modelopt/onnx/quantization/graph_utils.py | cat -n

# Let's check the line in autotuner.py where tensor_users_map is used after assignment
sed -n '960,975p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n

# The import says get_tensor_consumer_node_indices but only get_tensor_consumer_nodes exists
# Let's search for any wrapper or check if it might be defined somewhere we haven't looked
find . -name "*.py" -type f -exec grep -l "get_tensor_consumer_node_indices" {} \;

# Let's check if there's a conftest or __init__ that might have this function
find . -name "__init__.py" -exec grep -l "get_tensor_consumer_node_indices" {} \;

# Let's check the graph_utils.py file around where get_tensor_consumer_nodes is defined for any related functions
sed -n '270,350p' modelopt/onnx/quantization/graph_utils.py | cat -n

# Let's check the graph object used in autotuner.py - it might be using onnx_graphsurgeon which has different API
sed -n '50,70p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n

# Let's look at what Graph object is being used
rg -n 'from.*import.*Graph|import.*Graph' modelopt/onnx/quantization/autotune/autotuner.py | head -5

# Check the actual usage context - what graph object is passed to get_tensor_consumer_node_indices
sed -n '950,965p' modelopt/onnx/quantization/autotune/autotuner.py | cat -n
```
Import error: `get_tensor_consumer_node_indices` does not exist.

The function `get_tensor_consumer_node_indices` imported at line 47 is not defined in `modelopt.onnx.quantization.graph_utils`. The available function is `get_tensor_consumer_nodes`, which returns a `defaultdict(list)`.

If `get_tensor_consumer_nodes` is the intended function, the KeyError concern is invalid: `defaultdict` returns an empty list for missing keys, so line 1014 would not raise an exception. However, the import itself must be corrected to resolve the module-level import error.
🤖 Prompt for AI Agents
In `@modelopt/onnx/quantization/autotune/autotuner.py` around lines 1013-1021:
the import of `get_tensor_consumer_node_indices` is wrong and causes an import
error; replace that import with `get_tensor_consumer_nodes` and update any usage
names accordingly (the code that uses `tensor_users_map` already expects a
`defaultdict(list)`, so no KeyError handling is needed). Specifically, change the
symbol imported from `modelopt.onnx.quantization.graph_utils` from
`get_tensor_consumer_node_indices` to `get_tensor_consumer_nodes` and ensure
`tensor_users_map` is assigned from `get_tensor_consumer_nodes(...)` where used in
the autotuner (references: `get_tensor_consumer_node_indices`,
`get_tensor_consumer_nodes`, `tensor_users_map`).
d01535e to b5032ed (Compare)
Signed-off-by: Will Guo <willg@nvidia.com>
b5032ed to 1ffcf7f (Compare)
What does this PR do?
This PR implements the QDQAutotuner class, which drives the main autotuner workflow.

The workflow is: initialize the autotuner on a model, discover candidate regions, generate and profile Q/DQ insertion schemes for each region, and export the best-performing quantized model (see the sequence diagram above).

This PR is part 3.2 of #703.

PR 3.1: #837
PR 3.2: #838 (this PR)
PR 3.3: #839
Overview: ?
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit