PHX/HPT: quark_quantization benchmark falls back to CPU, but the same model runs on NPU with predict.py

## Summary

On PHX/HPT, `CNN-examples/quark_quantization/quark_quantize.py` falls back to CPU, while the same quantized model runs on the NPU when loaded through `CNN-examples/getting_started_resnet/int8/predict.py`.

I reproduced this with a quantized CNN model that runs on the NPU through `predict.py` on the same PHX/HPT system.

## Details

`quark_quantize.py` uses Vitis AI EP provider options that differ from the working `predict.py` path:

- `quark_quantize.py` uses `cacheDir` / `cacheKey`
- `predict.py` uses `cache_dir` / `cache_key`
- `predict.py` also sets `xlnx_enable_py3_round = 0` for PHX/HPT
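
The key-name mismatch listed above can be spotted mechanically before a session is created. The following is a minimal sketch, assuming (as this report suggests, not as confirmed by the Vitis AI EP documentation) that the snake_case names used by `predict.py` are the ones honored on PHX/HPT. `to_snake_case` and `find_camel_case_keys` are hypothetical helpers, not part of either script.

```python
import re


def to_snake_case(name):
    """Convert a camelCase option name such as 'cacheDir' to 'cache_dir'."""
    # Insert an underscore before each uppercase letter that follows a
    # lowercase letter or digit, and lowercase that letter.
    return re.sub(r'(?<=[a-z0-9])([A-Z])',
                  lambda m: '_' + m.group(1).lower(),
                  name)


def find_camel_case_keys(options):
    """Return {original_key: snake_case_key} for keys that are not snake_case.

    Hypothetical check: it only compares spellings, it does not know which
    option names the execution provider actually accepts.
    """
    return {key: to_snake_case(key)
            for key in options
            if to_snake_case(key) != key}


# Options as currently written in quark_quantize.py
current_options = {
    'cacheDir': '/path/to/cache',
    'cacheKey': 'modelcachekey',
    'enable_cache_file_io_in_mem': '0',
}
suspect = find_camel_case_keys(current_options)
print(suspect)
```

For the options currently in `quark_quantize.py`, this flags `cacheDir` and `cacheKey` and leaves `enable_cache_file_io_in_mem` alone.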

After changing `quark_quantize.py` to:

- use `cache_dir` instead of `cacheDir`
- use `cache_key` instead of `cacheKey`
- add `xlnx_enable_py3_round = 0` for PHX/HPT

the same benchmark path ran on the NPU on my PHX/HPT system.
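
One way to keep the two scripts from drifting apart again is to build the provider options in a single helper. This is only a sketch of that idea: `build_provider_options` and its `xclbin_path` parameter are hypothetical, and the option names and values are copied from this report rather than from the Vitis AI EP documentation.

```python
from pathlib import Path


def build_provider_options(cache_dir, npu_device, xclbin_path=None):
    """Build Vitis AI EP provider options using the key names from predict.py.

    Hypothetical helper. In quark_quantize.py the xclbin path comes from
    get_xclbin(npu_device); here it is passed in so the sketch stays
    self-contained.
    """
    options = {
        'cache_dir': str(cache_dir),        # was 'cacheDir'
        'cache_key': 'modelcachekey',       # was 'cacheKey'
        'enable_cache_file_io_in_mem': '0',
    }
    if npu_device == 'PHX/HPT':
        options['target'] = 'X1'
        options['xlnx_enable_py3_round'] = 0  # set by predict.py for PHX/HPT
        if xclbin_path is not None:
            options['xclbin'] = str(xclbin_path)
    # onnxruntime expects one options dict per provider in the providers list
    return [options]


provider_options = build_provider_options(
    Path('cache'), 'PHX/HPT', xclbin_path='example.xclbin')
```

The returned list can be passed as `provider_options` to `ort.InferenceSession` alongside `providers=['VitisAIExecutionProvider']`, as both scripts already do.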

## Suggested fix

Please align `CNN-examples/quark_quantization/quark_quantize.py` with the working PHX/HPT Vitis AI EP initialization used in `CNN-examples/getting_started_resnet/int8/predict.py`.

```diff
@@ -109,15 +109,16 @@ def main(args):
         quant_model = onnx.load(output_model_path)
         provider = ['VitisAIExecutionProvider']
         cache_dir = Path(__file__).parent.resolve()
-        provider_options = [
-            {
-                'cacheDir': str(cache_dir),
-                'cacheKey': 'modelcachekey',
-                'enable_cache_file_io_in_mem':'0'
-            }
-        ]
+        provider_options = [{
+            'cache_dir': str(cache_dir),
+            'cache_key': 'modelcachekey',
+            'enable_cache_file_io_in_mem': '0'
+        }]
+
         # Create session options
         session_options = ort.SessionOptions()
         session_options.log_severity_level = 1  # 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal

         # For PHX/HPT, xclbin is required
         if npu_device == 'PHX/HPT':
             provider_options[0]['target'] = 'X1'
+            provider_options[0]['xlnx_enable_py3_round'] = 0
             provider_options[0]['xclbin'] = get_xclbin(npu_device)

         session = ort.InferenceSession(
```
