Description
Describe the bug
I am trying to serve the unsloth/Qwen3.5-9B-GGUF model on a baremetal Windows host using OVMS v2026.0. The server fails to start, throwing a "gguf_tensor_to_f16 failed" error during LLM node initialization. I suspect the GGUF parser does not yet support the tensor structure of Qwen3.5.
Since there were similar issues with other new architectures like Qwen3-VL, I would like to ask if there is a plan or timeline to support the Qwen3.5 GGUF model structure.
To Reproduce
Steps to reproduce the behavior:
- Download Qwen3.5-9B-Q4_K_M.gguf from Hugging Face (unsloth/Qwen3.5-9B-GGUF).
- Place the file in the local directory: C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
- Run the following OVMS launch command on a Windows baremetal host:
.\ovms.exe --source_model "unsloth/Qwen3.5-9B-GGUF" --model_repository_path \models --model_name unsloth/Qwen3.5-9B-GGUF --task text_generation --gguf_filename Qwen3.5-9B-Q4_K_M.gguf --target_device GPU --port 8000 --rest_port 9000
- See error during startup.
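As a quick sanity check that the downloaded file itself is intact (and to report its GGUF version to maintainers), here is a minimal stdlib-only sketch that parses the fixed GGUF file header. This is my own diagnostic helper, not part of OVMS; the header layout (4-byte "GGUF" magic, uint32 version, uint64 tensor count, uint64 key/value count, little-endian) follows the public GGUF specification:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: magic, version, tensor count, KV count."""
    if len(data) < 24:
        raise ValueError("file too short to be a GGUF file")
    magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", data[:24])
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

# Usage with the local path from the repro steps above:
# with open(r"C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\Qwen3.5-9B-Q4_K_M.gguf", "rb") as f:
#     print(read_gguf_header(f.read(24)))
```

If the magic and version parse correctly, the file is at least structurally a valid GGUF, which points the failure at parser support for this architecture rather than a corrupted download.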
Expected behavior
The model should load successfully, and the OVMS server should start listening on the specified gRPC and REST ports without crashing.
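For reference, this is the check I use to confirm the server is actually listening once startup succeeds (the hostname and ports match the launch command above; this is a plain TCP connectivity probe, not anything OVMS-specific):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After a successful start, both should report True:
# print(port_open("localhost", 8000))  # gRPC port (--port)
# print(port_open("localhost", 9000))  # REST port (--rest_port)
```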
Logs
[2026-03-08 14:18:36.179][22220][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: C:\ovms\\models\unsloth\Qwen3.5-9B-GGUF\./Qwen3.5-9B-Q4_K_M.gguf exception: Check 'data != nullptr' failed at src\cpp\src\gguf_utils\gguf.cpp:96:
[load_gguf] gguf_tensor_to_f16 failed
[2026-03-08 14:18:36.179][22220][modelmanager][error][servable_initializer.cpp:425] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2026-03-08 14:18:36.179][22220][serving][error][mediapipegraphdefinition.cpp:474] Failed to process LLM node graph unsloth/Qwen3.5-9B-GGUF
[2026-03-08 14:18:36.180][22220][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager
Configuration
- OVMS version: v2026.0 (OpenVINO Model Server 2026.0.0.4d3933c5, OpenVINO backend 2026.0.0)
- OVMS config.json file: N/A (using command-line parameters)
- CPU, accelerator versions: target device is GPU (Arc B390, with Core X7 Ultra 358H), baremetal Windows host.
- Model repository directory structure:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
└── Qwen3.5-9B-Q4_K_M.gguf
- Model: unsloth/Qwen3.5-9B-GGUF from Hugging Face.
Additional context
I am running this directly on Windows (baremetal), not in a Docker container. I noticed in other issues that support for newer model structures is sometimes added in later patches. Let me know if there are any workarounds for GGUF loading in the meantime.