
Wire QwenNetworkLoader into CLI for proper Qwen3 inference #46

@michalharakal

Description

Context

The CLI (Main.kt) always routes GGUF models through the Llama ingestion path (LlamaRuntime), which works for Llama-architecture models. Qwen3 models load successfully (the tensor names match), but they produce garbage output because LlamaRuntime doesn't handle Qwen3-specific features:

  • QK-norm (query/key normalization via attn_q_norm.weight / attn_k_norm.weight)
  • RoPE base frequency (1,000,000 vs Llama's 10,000)
  • BOS token differences

The correct loader (QwenNetworkLoader in llm-inference:qwen) exists but isn't wired into the CLI.
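As a hedged illustration of the first bullet above: Qwen3's QK-norm RMS-normalizes each query/key head with a learned per-dimension weight (the `attn_q_norm.weight` / `attn_k_norm.weight` tensors) before RoPE is applied. The sketch below is illustrative only; the function name and signature are assumptions, not the project's actual API.

```kotlin
import kotlin.math.sqrt

// Illustrative QK-norm step (not the real kllama API): RMS-normalize one
// attention head's query or key vector, scaled by the learned norm weight.
fun rmsNorm(x: FloatArray, weight: FloatArray, eps: Float = 1e-6f): FloatArray {
    val meanSq = x.fold(0f) { acc, v -> acc + v * v } / x.size
    val scale = 1f / sqrt(meanSq + eps)
    return FloatArray(x.size) { i -> x[i] * scale * weight[i] }
}

fun main() {
    // With identity weights the output has unit RMS regardless of input scale.
    val q = floatArrayOf(3f, 4f)
    val w = FloatArray(q.size) { 1f }
    println(rmsNorm(q, w).joinToString())
}
```

Skipping this step (as LlamaRuntime does) leaves q/k activations at the wrong scale for Qwen3 checkpoints, which is consistent with the "loads fine, outputs garbage" symptom.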

Scope

  • Add :llm-inference:qwen dependency to :llm-runtime:kllama
  • Detect qwen* architecture from GGUF metadata in Main.kt
  • Route to QwenNetworkLoader.fromGguf() for Qwen models
  • Wire the Qwen DSL network module into a runtime compatible with AgentLoop
  • Validate end-to-end with Qwen3-1.7B-Q8_0.gguf --demo
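The detection step in the scope above could look roughly like the following. This is a minimal sketch assuming the GGUF metadata is available as a string map; the map access and the returned loader names are placeholders for the real wiring, not the project's actual API.

```kotlin
// Hypothetical routing sketch for Main.kt: dispatch on the standard GGUF
// "general.architecture" metadata key, sending qwen* models to the Qwen loader.
fun loaderFor(metadata: Map<String, String>): String {
    val arch = metadata["general.architecture"] ?: "llama"
    return if (arch.startsWith("qwen")) "QwenNetworkLoader.fromGguf" else "LlamaRuntime"
}

fun main() {
    println(loaderFor(mapOf("general.architecture" to "qwen3")))
    println(loaderFor(mapOf("general.architecture" to "llama")))
}
```

Defaulting to the Llama path when the key is absent keeps existing Llama models working unchanged, which matches the last acceptance criterion.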

Acceptance Criteria

  • Qwen3-1.7B-Q8_0.gguf --demo produces coherent output
  • Tool calling works through the Qwen chat template
  • Llama models continue working unchanged
