qwen3.5-9b-cuda-gpu cannot be loaded - config/runtime version mismatch (+ a related bin/ copy bug) #450

@davrous

Foundry Toolkit: qwen3.5-9b-cuda-gpu cannot be loaded — config/runtime version mismatch (+ a related bin/ copy bug)

This report covers two issues found while running local models with the Foundry Toolkit (formerly AI Toolkit) on Windows + CUDA. Issue 1 is the blocker.


Issue 1 (BLOCKER): Shipped qwen3.5-9b-cuda-gpu model is incompatible with the bundled onnxruntime-genai runtime

Summary

The Toolkit's catalog offers qwen3.5-9b-cuda-gpu, but the bundled onnxruntime-genai runtime cannot load it. The model's genai_config.json uses a schema/architecture newer than the runtime shipped with the extension. This reproduces on a clean re-download with the latest extension (1.4.2), so it is not a corrupted-download or stale-cache problem — the catalog and the runtime are simply out of sync.

Errors (two symptoms, same root cause)

First, a strict JSON parse error on an unknown vision field:

Failed loading model qwen3.5-9b-cuda-gpu:2.
Error encountered while parsing
'C:\Users\<user>\.aitk\models\microsoft\qwen3.5-9b-cuda-gpu-2\v2\genai_config.json'
JSON Error: model:vision: Unknown value "patch_size" at line 68 index 23

Removing patch_size gets past the parser, but loading then fails on the real blocker, because the runtime has no implementation for this architecture:

Failed loading model qwen3.5-9b-cuda-gpu:2.
Unsupported model_type in config.json: qwen3_5

Root cause

  • genai_config.json declares "type": "qwen3_5" and a vision block containing patch_size.
  • The bundled runtime (libonnxruntime_cuda_windows 0.0.7) does not recognize the qwen3_5 model type, and its strict parser rejects the unknown vision.patch_size key.
  • The patch_size parse error is just the first symptom; Unsupported model_type: qwen3_5 is the definitive blocker. No config edit can fix this — the architecture support must exist in the runtime.
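The two errors arrive in a fixed order, which is why editing the config only moves the failure. The sketch below is an illustrative TypeScript model of that ordering, not the runtime's actual loader; KNOWN_VISION_KEYS, SUPPORTED_MODEL_TYPES and loadStrict are hypothetical stand-ins, not the lists compiled into libonnxruntime_cuda_windows 0.0.7.

```typescript
// Illustrative model of the two failure stages; the key and type lists are
// hypothetical, not taken from the real runtime.
const KNOWN_VISION_KEYS = new Set(["filename", "spatial_merge_size", "tokens_per_second"]);
const SUPPORTED_MODEL_TYPES = new Set(["qwen2", "phi3"]);

interface GenAiConfig {
  model: { type: string; vision?: Record<string, unknown> };
}

function loadStrict(config: GenAiConfig): string {
  // Stage 1: a strict schema parse rejects any vision key it does not know.
  for (const key of Object.keys(config.model.vision ?? {})) {
    if (!KNOWN_VISION_KEYS.has(key)) {
      throw new Error(`model:vision: Unknown value "${key}"`);
    }
  }
  // Stage 2: only a config that parses cleanly reaches model_type dispatch.
  if (!SUPPORTED_MODEL_TYPES.has(config.model.type)) {
    throw new Error(`Unsupported model_type in config.json: ${config.model.type}`);
  }
  return config.model.type;
}
```

Fed the shipped config, the first stage throws on patch_size; with that key deleted, the same config fails the model_type check instead, mirroring the two symptoms above.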

Repro steps

  1. Install Foundry Toolkit 1.4.2 (latest).
  2. From the catalog, download qwen3.5-9b-cuda-gpu (CUDA execution provider).
  3. Load the model.
  4. Loading fails with the patch_size parse error; if that field is removed, it fails with Unsupported model_type in config.json: qwen3_5.
  5. Deleting and re-downloading the model does not help — same result.

Relevant config snippet (genai_config.json)

"model": {
  "type": "qwen3_5",          // <-- runtime 0.0.7 does not support this model_type
  "vision": {
    "filename": "vision.onnx",
    "spatial_merge_size": 2,
    "tokens_per_second": 2.0,
    "patch_size": 16,         // <-- line 68: strict parser rejects unknown key
    ...
  }
}

Impact

  • A model advertised in the catalog is impossible to run with the runtime shipped in the same extension version. Users hit this only after a multi-GB download.

Suggested fixes (product side)

  • Sync the catalog with runtime capability: don't list/allow a model whose model_type (e.g. qwen3_5) and config schema aren't supported by the runtime bundled in that extension build.
  • Bump the bundled onnxruntime-genai to a version that implements qwen3_5 and the current vision schema (incl. patch_size), and pair it with the model download.
  • Fail fast with a clear message: e.g. "This model requires onnxruntime-genai ≥ X.Y.Z; your runtime is 0.0.7" instead of a late-stage JSON parse / unsupported-type error.
  • Consider forward-compatible parsing: tolerate unknown optional config keys (warn instead of hard-fail) so a schema addition like patch_size doesn't block loading on its own.
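A minimal sketch of the fail-fast, catalog-sync and forward-compatible suggestions, assuming the extension can know which model types its bundled runtime supports. RuntimeInfo, preflight, loadableModels and the requiredVersion catalog field are hypothetical names; the actual minimum onnxruntime-genai version for qwen3_5 is not known here, so the message falls back to generic wording when it is absent.

```typescript
// Hypothetical preflight, run before download/load. All names are illustrative.
interface RuntimeInfo {
  version: string;
  supportedModelTypes: ReadonlySet<string>;
}

type Preflight =
  | { ok: true; warnings: string[] }
  | { ok: false; message: string };

function preflight(
  modelType: string,
  visionKeys: readonly string[],
  knownVisionKeys: ReadonlySet<string>,
  runtime: RuntimeInfo,
  requiredVersion?: string, // from catalog metadata, if the catalog records it
): Preflight {
  // Fail fast with a version-oriented message instead of a late parse error.
  if (!runtime.supportedModelTypes.has(modelType)) {
    const need = requiredVersion ? `onnxruntime-genai >= ${requiredVersion}` : "a newer onnxruntime-genai";
    return {
      ok: false,
      message: `This model (model_type ${modelType}) requires ${need}; your runtime is ${runtime.version}.`,
    };
  }
  // Forward-compatible parsing: unknown optional keys become warnings, not errors.
  const warnings = visionKeys
    .filter((k) => !knownVisionKeys.has(k))
    .map((k) => `Ignoring unknown config key model.vision.${k}`);
  return { ok: true, warnings };
}

// Catalog sync: only list models the bundled runtime can actually load.
function loadableModels<T extends { modelType: string }>(catalog: readonly T[], runtime: RuntimeInfo): T[] {
  return catalog.filter((m) => runtime.supportedModelTypes.has(m.modelType));
}
```

Running the check before the multi-GB download, rather than at load time, is what addresses the "users hit this only after the download" impact.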

Workarounds (for users, limited)

  • Editing genai_config.json (removing patch_size) only advances to the Unsupported model_type: qwen3_5 error — not a real fix.
  • Until the runtime supports qwen3_5, use a model the current runtime can run (e.g. a Qwen2.5 / Phi CUDA build) to keep working.

Environment

  • OS: Windows
  • Extension: Foundry Toolkit ms-windows-ai-studio.windows-ai-studio 1.4.2 (win32-x64)
  • Execution provider: CUDA
  • Bundled runtime: libonnxruntime_cuda_windows 0.0.7
  • Model: qwen3.5-9b-cuda-gpu (model_type: qwen3_5, multimodal/vision)

Issue 2 (separate, lower severity): CUDA DLL copy fails with ENOENT when extension bin/ folder is missing

Summary

When switching a local model to the CUDA execution provider, the Toolkit fails to copy onnxruntime-genai-cuda.dll into the extension's bin/ folder. The thrown ENOENT points at the source DLL, which is misleading — the source exists and is fully downloaded. The real cause is that the destination bin/ directory does not exist, and Node's fs.copyFile throws ENOENT when the target directory is missing.

Error

Sorry, your request failed. Please try again.

Reason: ENOENT: no such file or directory, copyfile
'C:\Users\<user>\.aitk\bin\libonnxruntime_cuda_windows\0.0.7\onnxruntime-genai-cuda.dll'
-> 'c:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin\onnxruntime-genai-cuda.dll'

Root cause

  • The source file exists and is complete:
    ...\.aitk\bin\libonnxruntime_cuda_windows\0.0.7\onnxruntime-genai-cuda.dll (~85 MB) ✅
  • The destination directory does not exist:
    ...\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin\
  • fs.copyFile(src, dest) raises ENOENT when the parent directory of dest is missing. The error string only names the two file paths, so it reads as if the source is missing — sending people to debug the wrong path.
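The misleading ENOENT is easy to reproduce outside the extension. This sketch, assuming Node.js with @types/node, copies a stand-in file (not a real DLL) into a temp directory whose bin/ subfolder does not exist yet; the copy fails with ENOENT even though the source is present, and succeeds once the parent directory is created.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Repro: the source exists, only the destination's parent directory is missing.
function reproCopyEnoent(): { firstError: string | undefined; copiedAfterMkdir: boolean } {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "copy-enoent-"));
  try {
    const src = path.join(root, "onnxruntime-genai-cuda.dll"); // stand-in file
    fs.writeFileSync(src, "not a real DLL");
    const dest = path.join(root, "extension", "bin", "onnxruntime-genai-cuda.dll");

    let firstError: string | undefined;
    try {
      fs.copyFileSync(src, dest); // fails: <root>/extension/bin does not exist
    } catch (err) {
      firstError = (err as NodeJS.ErrnoException).code;
    }

    fs.mkdirSync(path.dirname(dest), { recursive: true }); // create the missing folder
    fs.copyFileSync(src, dest);
    return { firstError, copiedAfterMkdir: fs.existsSync(dest) };
  } finally {
    fs.rmSync(root, { recursive: true, force: true }); // clean up the temp dir
  }
}

const result = reproCopyEnoent();
console.log(result);
```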

Repro steps

  1. Fresh install of the extension where the bin/ folder was not created (or was removed/cleaned).
  2. Load a local model and select the CUDA execution provider.
  3. Toolkit downloads libonnxruntime_cuda_windows (0.0.7) to ~/.aitk/bin/....
  4. Toolkit attempts to copy the CUDA DLLs into <extension>\bin\.
  5. Copy fails with the ENOENT above.

Workaround

Manually create the missing folder, then retry:

New-Item -ItemType Directory -Force `
  "$env:USERPROFILE\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin"

Proposed fix

Ensure the destination directory exists before copying:

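import * as fs from 'fs';
import * as path from 'path';

// Illustrative helper: create dest's parent directory (no-op if it exists),
// then copy, so a missing bin/ folder no longer causes ENOENT.
async function copyEnsuringDir(src: string, dest: string): Promise<void> {
  await fs.promises.mkdir(path.dirname(dest), { recursive: true });
  await fs.promises.copyFile(src, dest);
}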

Additional improvements:

  • Wrap the copy in a try/catch that distinguishes source-missing vs dest-dir-missing, and surface an actionable message (e.g. "runtime download incomplete" vs "extension bin folder missing").
  • Optionally verify the source DLL size/hash before copy to catch interrupted downloads.
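A sketch of the suggested try/catch, assuming Node.js: on ENOENT it checks which path is actually absent and returns an actionable message, with an optional size pre-check. copyRuntimeDll, CopyOutcome, exists and the expectedBytes parameter are hypothetical; a real size or hash would have to come from the runtime package's download metadata, which this report does not describe.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical outcome type: callers turn `message` into the user-facing error.
type CopyOutcome =
  | { ok: true }
  | { ok: false; kind: "source-missing" | "source-incomplete" | "dest-dir-missing" | "other"; message: string };

async function exists(p: string): Promise<boolean> {
  return fs.promises.access(p).then(() => true, () => false);
}

async function copyRuntimeDll(src: string, dest: string, expectedBytes?: number): Promise<CopyOutcome> {
  // Optional pre-check: catch interrupted downloads before copying.
  if (expectedBytes !== undefined && (await exists(src))) {
    const { size } = await fs.promises.stat(src);
    if (size !== expectedBytes) {
      return { ok: false, kind: "source-incomplete", message: `Runtime download incomplete: ${src} is ${size} bytes, expected ${expectedBytes}.` };
    }
  }
  try {
    await fs.promises.copyFile(src, dest);
    return { ok: true };
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    // ENOENT names both paths, so check which one is actually absent.
    if (code === "ENOENT" && !(await exists(src))) {
      return { ok: false, kind: "source-missing", message: `Runtime download incomplete: ${src} not found.` };
    }
    if (code === "ENOENT" && !(await exists(path.dirname(dest)))) {
      return { ok: false, kind: "dest-dir-missing", message: `Extension bin folder missing: ${path.dirname(dest)}.` };
    }
    return { ok: false, kind: "other", message: String(err) };
  }
}
```

This can sit alongside the mkdir fix; with the directory created up front, the classification mainly matters for the source-missing and size cases.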

Environment

  • OS: Windows
  • Extension: ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64
  • Execution provider: CUDA
  • Runtime package: libonnxruntime_cuda_windows 0.0.7

Metadata

Assignees: No one assigned
Labels: needs attention (The issue needs contributor's attention)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests