`qwen3.5-9b-cuda-gpu` cannot be loaded - config/runtime version mismatch (+ a related `bin/` copy bug) #450

# Foundry Toolkit: `qwen3.5-9b-cuda-gpu` cannot be loaded — config/runtime version mismatch (+ a related `bin/` copy bug)

This report covers **two issues** found while running local models with the Foundry Toolkit (formerly AI Toolkit) on Windows + CUDA. **Issue 1 is the blocker.**

---

## Issue 1 (BLOCKER): Shipped `qwen3.5-9b-cuda-gpu` model is incompatible with the bundled onnxruntime-genai runtime

### Summary
The Toolkit's catalog offers **`qwen3.5-9b-cuda-gpu`**, but the **bundled onnxruntime-genai runtime cannot load it**. The model's `genai_config.json` uses a schema/architecture newer than the runtime shipped with the extension. This reproduces on a **clean re-download** with the **latest extension (1.4.2)**, so it is not a corrupted-download or stale-cache problem — the catalog and the runtime are simply out of sync.

### Errors (two symptoms, same root cause)

First, a strict JSON parse error on an unknown vision field:
```
Failed loading model qwen3.5-9b-cuda-gpu:2.
Error encountered while parsing
'C:\Users\<user>\.aitk\models\microsoft\qwen3.5-9b-cuda-gpu-2\v2\genai_config.json'
JSON Error: model:vision: Unknown value "patch_size" at line 68 index 23
```

After `patch_size` is removed to get past the parser, the real blocker appears: the runtime has no implementation for the architecture.
```
Failed loading model qwen3.5-9b-cuda-gpu:2.
Unsupported model_type in config.json: qwen3_5
```

### Root cause
- `genai_config.json` declares `"type": "qwen3_5"` and a `vision` block containing `patch_size`.
- The bundled runtime (`libonnxruntime_cuda_windows` **0.0.7**) does **not** recognize the `qwen3_5` model type, and its strict parser rejects the unknown `vision.patch_size` key.
- The `patch_size` parse error is just the first symptom; `Unsupported model_type: qwen3_5` is the definitive blocker. **No config edit can fix this** — the architecture support must exist in the runtime.

### Repro steps
1. Install Foundry Toolkit **1.4.2** (latest).
2. From the catalog, download **`qwen3.5-9b-cuda-gpu`** (CUDA execution provider).
3. Load the model.
4. Loading fails with the `patch_size` parse error; if that field is removed, it fails with `Unsupported model_type in config.json: qwen3_5`.
5. Deleting and re-downloading the model does **not** help — same result.

### Relevant config snippet (`genai_config.json`)
```jsonc
"model": {
  "type": "qwen3_5",          // <-- runtime 0.0.7 does not support this model_type
  "vision": {
    "filename": "vision.onnx",
    "spatial_merge_size": 2,
    "tokens_per_second": 2.0,
    "patch_size": 16,         // <-- line 68: strict parser rejects unknown key
    ...
  }
}
```

### Impact
- A model advertised in the catalog is **impossible to run** with the runtime shipped in the same extension version. Users hit this only after a multi-GB download.

### Suggested fixes (product side)
- **Sync the catalog with runtime capability**: don't list/allow a model whose `model_type` (e.g. `qwen3_5`) and config schema aren't supported by the runtime bundled in that extension build.
- **Bump the bundled onnxruntime-genai** to a version that implements `qwen3_5` and the current `vision` schema (incl. `patch_size`), and pair it with the model download.
- **Fail fast with a clear message**: e.g. "This model requires onnxruntime-genai ≥ X.Y.Z; your runtime is 0.0.7" instead of a late-stage JSON parse / unsupported-type error.
- Consider **forward-compatible parsing**: tolerate unknown optional config keys (warn instead of hard-fail) so a schema addition like `patch_size` doesn't block loading on its own.
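
The fail-fast and forward-compatible suggestions above could be combined into a preflight check that runs before the runtime is asked to load the model. The following is a minimal TypeScript sketch, not the Toolkit's or onnxruntime-genai's actual code: `checkGenaiConfig`, `CompatibilityReport`, and the two allow-lists are hypothetical names, and the sets passed in the usage lines are placeholders rather than the runtime's real capability list.

```ts
// Hypothetical preflight check. None of these names come from the Toolkit
// or from onnxruntime-genai; they only illustrate the suggestion.
interface CompatibilityReport {
  loadable: boolean;   // false when the runtime cannot run the model at all
  errors: string[];    // hard failures, e.g. an unsupported model type
  warnings: string[];  // tolerated issues, e.g. unknown optional keys
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function checkGenaiConfig(
  config: unknown,
  supportedModelTypes: ReadonlySet<string>,
  knownVisionKeys: ReadonlySet<string>,
  runtimeVersion: string,
): CompatibilityReport {
  const report: CompatibilityReport = { loadable: true, errors: [], warnings: [] };
  const model = isRecord(config) ? config.model : undefined;
  if (!isRecord(model)) {
    report.loadable = false;
    report.errors.push('genai_config.json has no "model" object.');
    return report;
  }
  // Hard failure: the architecture must be implemented by the runtime.
  const modelType = model.type;
  if (typeof modelType !== 'string' || !supportedModelTypes.has(modelType)) {
    report.loadable = false;
    report.errors.push(
      `Model type "${String(modelType)}" is not supported by runtime ${runtimeVersion}.`,
    );
  }
  // Soft failure: unknown optional keys are reported but do not block loading.
  const vision = model.vision;
  if (isRecord(vision)) {
    for (const key of Object.keys(vision)) {
      if (!knownVisionKeys.has(key)) {
        report.warnings.push(`Ignoring unknown key "model.vision.${key}".`);
      }
    }
  }
  return report;
}

// Usage with fields taken from the config snippet above. Both sets are placeholders.
const exampleReport = checkGenaiConfig(
  { model: { type: 'qwen3_5', vision: { filename: 'vision.onnx', patch_size: 16 } } },
  new Set(['placeholder_type']),
  new Set(['filename', 'spatial_merge_size', 'tokens_per_second']),
  '0.0.7',
);
console.log(exampleReport);
```

Separating hard errors from warnings mirrors the two symptoms in this report: an unknown `vision` key alone would only produce a warning, while an unsupported model type stops the load before any multi-GB download or late parse failure.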

### Workarounds (for users, limited)
- Editing `genai_config.json` (removing `patch_size`) only advances to the `Unsupported model_type: qwen3_5` error — **not a real fix**.
- Until the runtime supports `qwen3_5`, use a model the current runtime can run (e.g. a Qwen2.5 / Phi CUDA build) to keep working.

### Environment
- OS: Windows
- Extension: Foundry Toolkit `ms-windows-ai-studio.windows-ai-studio` **1.4.2** (win32-x64)
- Execution provider: CUDA
- Bundled runtime: `libonnxruntime_cuda_windows` **0.0.7**
- Model: `qwen3.5-9b-cuda-gpu` (`model_type: qwen3_5`, multimodal/vision)

---

## Issue 2 (separate, lower severity): CUDA DLL copy fails with `ENOENT` when extension `bin/` folder is missing

### Summary
When switching a local model to the **CUDA** execution provider, the Toolkit fails to copy `onnxruntime-genai-cuda.dll` into the extension's `bin/` folder. The `ENOENT` message names the *source* DLL first, which is misleading: the source exists and is fully downloaded. The real cause is that the **destination `bin/` directory does not exist**, and Node's `fs.copyFile` throws `ENOENT` when the parent directory of the destination is missing.

### Error
```
Sorry, your request failed. Please try again.

Reason: ENOENT: no such file or directory, copyfile
'C:\Users\<user>\.aitk\bin\libonnxruntime_cuda_windows\0.0.7\onnxruntime-genai-cuda.dll'
-> 'c:\Users\<user>\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin\onnxruntime-genai-cuda.dll'
```

### Root cause
- The **source** file exists and is complete:
  `...\.aitk\bin\libonnxruntime_cuda_windows\0.0.7\onnxruntime-genai-cuda.dll` (~85 MB) ✅
- The **destination directory** does **not** exist:
  `...\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin\` ❌
- `fs.copyFile(src, dest)` raises `ENOENT` when the parent directory of `dest` is missing. The error string only names the two file paths, so it reads as if the *source* is missing — sending people to debug the wrong path.
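
The behavior described above can be reproduced outside the extension. The following is a minimal sketch that assumes nothing about the Toolkit's internals: `reproduceMissingDestDir` is a hypothetical helper, and the file and folder names are stand-ins created in a temporary directory.

```ts
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Copies an existing file into a directory that was never created and
// returns the error code Node reports (undefined if the copy succeeded).
function reproduceMissingDestDir(): string | undefined {
  const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'copy-repro-'));
  const src = path.join(tmp, 'source.dll');         // stand-in for the downloaded DLL
  const dest = path.join(tmp, 'bin', 'source.dll'); // 'bin' is deliberately missing
  fs.writeFileSync(src, 'placeholder');
  try {
    fs.copyFileSync(src, dest);
    return undefined;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code;
  } finally {
    // Clean up the temporary directory regardless of the outcome.
    fs.rmSync(tmp, { recursive: true, force: true });
  }
}

console.log(reproduceMissingDestDir());
```

The source file exists for the whole duration of the copy, so any `ENOENT` returned here can only come from the missing destination directory.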

### Repro steps
1. Fresh install of the extension where the `bin/` folder was not created (or was removed/cleaned).
2. Load a local model and select the **CUDA** execution provider.
3. Toolkit downloads `libonnxruntime_cuda_windows` (0.0.7) to `~/.aitk/bin/...`.
4. Toolkit attempts to copy the CUDA DLLs into `<extension>\bin\`.
5. Copy fails with the `ENOENT` above.

### Workaround
Manually create the missing folder, then retry:
```powershell
New-Item -ItemType Directory -Force `
  "$env:USERPROFILE\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\bin"
```

### Proposed fix
Ensure the destination directory exists before copying:
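```ts
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical helper name; src and dest are the two paths from the error above.
async function copyRuntimeDll(src: string, dest: string): Promise<void> {
  // Create the destination directory (e.g. <extension>\bin) if it is missing.
  await fs.promises.mkdir(path.dirname(dest), { recursive: true });
  await fs.promises.copyFile(src, dest);
}
```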
Additional improvements:
- Wrap the copy in a try/catch that distinguishes **source-missing** vs **dest-dir-missing**, and surface an actionable message (e.g. "runtime download incomplete" vs "extension bin folder missing").
- Optionally verify the source DLL size/hash before copy to catch interrupted downloads.
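
The first improvement above could look roughly like the following. This is a sketch under stated assumptions, not the Toolkit's code: `classifyCopyFailure`, `copyWithDiagnostics`, and the message wording are hypothetical, and the `exists` parameter is injectable only so the classification can be exercised without touching the disk.

```ts
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical names throughout; this is not the Toolkit's code.
type CopyFailure = 'source-missing' | 'dest-dir-missing' | 'other';

// Decide which side of the copy is actually missing.
function classifyCopyFailure(
  src: string,
  dest: string,
  exists: (p: string) => boolean = fs.existsSync,
): CopyFailure {
  if (!exists(src)) return 'source-missing';
  if (!exists(path.dirname(dest))) return 'dest-dir-missing';
  return 'other';
}

async function copyWithDiagnostics(src: string, dest: string): Promise<void> {
  try {
    await fs.promises.copyFile(src, dest);
  } catch (err) {
    // Only ENOENT is ambiguous; every other error is passed through unchanged.
    if ((err as NodeJS.ErrnoException).code !== 'ENOENT') throw err;
    switch (classifyCopyFailure(src, dest)) {
      case 'source-missing':
        throw new Error(`Runtime download incomplete: ${src} was not found.`);
      case 'dest-dir-missing':
        throw new Error(`Extension bin folder missing: ${path.dirname(dest)} does not exist.`);
      default:
        throw err;
    }
  }
}
```

If the `mkdir` call from the proposed fix runs first, the `dest-dir-missing` branch becomes a safety net rather than the common path, and the remaining `ENOENT` cases point at an incomplete runtime download.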

### Environment
- OS: Windows
- Extension: `ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64`
- Execution provider: CUDA
- Runtime package: `libonnxruntime_cuda_windows` 0.0.7

