Summary
On PHX/HPT, CNN-examples/quark_quantization/quark_quantize.py falls back to CPU, while the same quantized model runs on the NPU when loaded through CNN-examples/getting_started_resnet/int8/predict.py.
I reproduced this with a quantized CNN model that is clearly NPU-runnable on PHX/HPT.
Details
quark_quantize.py uses Vitis AI EP provider options that differ from the working predict.py path:
- quark_quantize.py uses cacheDir / cacheKey
- predict.py uses cache_dir / cache_key
- predict.py also sets xlnx_enable_py3_round = 0 for PHX/HPT
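To make the mismatch explicit, here is a minimal sketch that compares the option keys of the two scripts. It only includes the keys named in this issue, the values are placeholders, and `diff_option_keys` is a hypothetical helper, not something from either script.

```python
# Keys as listed in this issue; values are placeholders for illustration only.
# The real scripts may set additional options that are not shown here.
quark_quantize_opts = {"cacheDir": "<cache dir>", "cacheKey": "modelcachekey"}
predict_opts = {
    "cache_dir": "<cache dir>",
    "cache_key": "modelcachekey",
    "xlnx_enable_py3_round": 0,
}


def diff_option_keys(a, b):
    """Return (keys only in a, keys only in b), each sorted."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


only_in_quark, only_in_predict = diff_option_keys(quark_quantize_opts, predict_opts)
print("only in quark_quantize.py:", only_in_quark)
print("only in predict.py:", only_in_predict)
```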
After changing quark_quantize.py to:
- use cache_dir instead of cacheDir
- use cache_key instead of cacheKey
- add xlnx_enable_py3_round = 0 for PHX/HPT
the same benchmark path ran correctly on the NPU on my PHX/HPT system.
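The three changes above could be collected in one place, roughly like the sketch below. `build_provider_options` is a hypothetical helper name, and passing the xclbin path in as an argument is my assumption; the script itself calls get_xclbin(npu_device).

```python
from pathlib import Path


def build_provider_options(cache_dir, npu_device, xclbin=None):
    """Build Vitis AI EP provider options using the snake_case keys.

    Hypothetical helper: it mirrors the option names from this issue
    but is not part of quark_quantize.py or predict.py.
    """
    options = {
        "cache_dir": str(cache_dir),
        "cache_key": "modelcachekey",
        "enable_cache_file_io_in_mem": "0",
    }
    if npu_device == "PHX/HPT":
        options["target"] = "X1"
        options["xlnx_enable_py3_round"] = 0
        if xclbin is not None:
            # Assumption: the caller resolves the xclbin path beforehand.
            options["xclbin"] = str(xclbin)
    return [options]


phx_options = build_provider_options(Path("."), "PHX/HPT", xclbin="example.xclbin")
other_options = build_provider_options(Path("."), "other-device")
```

The returned list has the same shape as the provider_options list that the script passes to ort.InferenceSession.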
Suggested fix
Please align CNN-examples/quark_quantization/quark_quantize.py with the working PHX/HPT Vitis AI EP initialization used in CNN-examples/getting_started_resnet/int8/predict.py.
@@ -109,15 +109,16 @@ def main(args):
     quant_model = onnx.load(output_model_path)
     provider = ['VitisAIExecutionProvider']
     cache_dir = Path(__file__).parent.resolve()
-    provider_options = [
-        {
-            'cacheDir': str(cache_dir),
-            'cacheKey': 'modelcachekey',
-            'enable_cache_file_io_in_mem':'0'
-        }
-    ]
+    provider_options = [{
+        'cache_dir': str(cache_dir),
+        'cache_key': 'modelcachekey',
+        'enable_cache_file_io_in_mem': '0'
+    }]
+
     # Create session options
     session_options = ort.SessionOptions()
     session_options.log_severity_level = 1 # 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal
     # For PHX/HPT, xclbin is required
     if npu_device == 'PHX/HPT':
         provider_options[0]['target'] = 'X1'
+        provider_options[0]['xlnx_enable_py3_round'] = 0
         provider_options[0]['xclbin'] = get_xclbin(npu_device)
     session = ort.InferenceSession(