PHX/HPT: quark_quantization benchmark falls back to CPU, but the same model runs on NPU with predict.py #376

@TJonH

Summary

On PHX/HPT, CNN-examples/quark_quantization/quark_quantize.py falls back to CPU, while the same quantized model runs on the NPU when loaded through CNN-examples/getting_started_resnet/int8/predict.py.

I reproduced this with a quantized CNN model that runs on the NPU on PHX/HPT when loaded through predict.py.

Details

quark_quantize.py uses Vitis AI EP provider options that differ from the working predict.py path:

  • quark_quantize.py uses cacheDir / cacheKey
  • predict.py uses cache_dir / cache_key
  • predict.py also sets xlnx_enable_py3_round = 0 for PHX/HPT

After changing quark_quantize.py to:

  • use cache_dir instead of cacheDir
  • use cache_key instead of cacheKey
  • add xlnx_enable_py3_round = 0 for PHX/HPT

the same benchmark path ran on the NPU on my PHX/HPT system.
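
For reference, the three changes above can be sketched as a small helper that builds the provider options in one place. This is only an illustration: the option keys and values are the ones quoted in this issue, while the function name `build_provider_options`, its parameters, and the `path/to/phx.xclbin` placeholder are hypothetical and do not exist in either script.

```python
from pathlib import Path
from typing import Optional


def build_provider_options(cache_dir: Path, cache_key: str, npu_device: str,
                           xclbin: Optional[str] = None) -> list:
    """Build Vitis AI EP provider options with the snake_case keys used by predict.py.

    Hypothetical helper for illustration; not part of quark_quantize.py.
    """
    options = {
        "cache_dir": str(cache_dir),   # was 'cacheDir' in quark_quantize.py
        "cache_key": cache_key,        # was 'cacheKey' in quark_quantize.py
        "enable_cache_file_io_in_mem": "0",
    }
    if npu_device == "PHX/HPT":
        options["target"] = "X1"
        # Extra option that predict.py sets for PHX/HPT
        options["xlnx_enable_py3_round"] = 0
        if xclbin is not None:
            options["xclbin"] = xclbin
    return [options]


# Example call; the xclbin path is a placeholder, not a real file location.
opts = build_provider_options(Path("."), "modelcachekey", "PHX/HPT",
                              xclbin="path/to/phx.xclbin")
```

The returned list has the same shape as the `provider_options` list in the diff below, so it could be passed to `ort.InferenceSession` the same way.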

Suggested fix

Please align CNN-examples/quark_quantization/quark_quantize.py with the working PHX/HPT Vitis AI EP initialization used in CNN-examples/getting_started_resnet/int8/predict.py.

@@ -109,15 +109,16 @@ def main(args):
         quant_model = onnx.load(output_model_path)
         provider = ['VitisAIExecutionProvider']
         cache_dir = Path(__file__).parent.resolve()
-        provider_options = [
-            {
-                'cacheDir': str(cache_dir),
-                'cacheKey': 'modelcachekey',
-                'enable_cache_file_io_in_mem':'0'
-            }
-        ]
+        provider_options = [{
+            'cache_dir': str(cache_dir),
+            'cache_key': 'modelcachekey',
+            'enable_cache_file_io_in_mem': '0'
+        }]
+
         # Create session options
         session_options = ort.SessionOptions()
         session_options.log_severity_level = 1  # 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal

         # For PHX/HPT, xclbin is required
         if npu_device == 'PHX/HPT':
             provider_options[0]['target'] = 'X1'
+            provider_options[0]['xlnx_enable_py3_round'] = 0
             provider_options[0]['xclbin'] = get_xclbin(npu_device)

         session = ort.InferenceSession(
