
gguf_tensor_to_f16 failed when loading Qwen3.5-9B GGUF model #4046

@huhanwj

Description


Describe the bug
I am trying to serve the unsloth/Qwen3.5-9B-GGUF model on a baremetal Windows host using OVMS v2026.0. The server fails to start and throws a gguf_tensor_to_f16 failed error during the LLM node initialization. I suspect the GGUF parser does not yet support the tensor structure of Qwen3.5.

Since there were similar issues with other new architectures like Qwen3-VL, I would like to ask if there is a plan or timeline to support the Qwen3.5 GGUF model structure.

To Reproduce
Steps to reproduce the behavior:

  1. Download Qwen3.5-9B-Q4_K_M.gguf from Hugging Face (unsloth/Qwen3.5-9B-GGUF).
  2. Place the file in the local directory: C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
  3. Run the following OVMS launch command on a Windows baremetal host:
.\ovms.exe --source_model "unsloth/Qwen3.5-9B-GGUF" --model_repository_path \models --model_name unsloth/Qwen3.5-9B-GGUF --task text_generation --gguf_filename Qwen3.5-9B-Q4_K_M.gguf --target_device GPU --port 8000 --rest_port 9000
  4. See error during startup.

Expected behavior
The model should load successfully, and the OVMS server should start listening on the specified gRPC and REST ports without crashing.

Logs

[2026-03-08 14:18:36.179][22220][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: C:\ovms\\models\unsloth\Qwen3.5-9B-GGUF\./Qwen3.5-9B-Q4_K_M.gguf exception: Check 'data != nullptr' failed at src\cpp\src\gguf_utils\gguf.cpp:96:
[load_gguf] gguf_tensor_to_f16 failed

[2026-03-08 14:18:36.179][22220][modelmanager][error][servable_initializer.cpp:425] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2026-03-08 14:18:36.179][22220][serving][error][mediapipegraphdefinition.cpp:474] Failed to process LLM node graph unsloth/Qwen3.5-9B-GGUF
[2026-03-08 14:18:36.180][22220][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager

Configuration

  1. OVMS version: v2026.0 (OpenVINO Model Server 2026.0.0.4d3933c5, OpenVINO backend 2026.0.0)
  2. OVMS config.json file: N/A (Using command-line parameters)
  3. CPU, accelerator's versions: Target device is GPU, Arc B390 with Core X7 Ultra 358H. Baremetal Windows host.
  4. Model repository directory structure:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
└── Qwen3.5-9B-Q4_K_M.gguf

  5. Model: unsloth/Qwen3.5-9B-GGUF from Hugging Face.

Additional context
I am running this directly on Windows (baremetal), not in a Docker container. I noticed in other issues that support for newer model structures is sometimes added in later patches. Let me know if there are any workarounds for GGUF loading in the meantime.
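While waiting for an answer, one cheap check that may help rule out a truncated or corrupted download (as opposed to a genuine parser gap) is validating the fixed 24-byte GGUF header of the downloaded file. A minimal sketch in Python, assuming the documented GGUF layout (magic bytes "GGUF", then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count); this script is not part of OVMS:

```python
import struct

def check_gguf_header(data: bytes) -> dict:
    """Validate the fixed 24-byte GGUF header and return its fields."""
    # GGUF files start with the magic b"GGUF", followed by a little-endian
    # uint32 version, uint64 tensor count, and uint64 metadata KV count.
    if len(data) < 24 or data[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad or truncated header)")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}

# In practice, read the first 24 bytes of the downloaded model, e.g.:
# with open(r"C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\Qwen3.5-9B-Q4_K_M.gguf", "rb") as f:
#     print(check_gguf_header(f.read(24)))
```

If the magic bytes or counts look wrong, re-downloading the file is worth trying before filing this against the parser.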
