Context
The CLI (`Main.kt`) always routes GGUF models through `LlamaIngestion` → `LlamaRuntime`, which works for Llama-architecture models. Qwen3 models load successfully (same tensor names) but produce garbage output because `LlamaRuntime` doesn't handle Qwen3-specific features:
- QK-norm (query/key normalization via `attn_q_norm.weight` / `attn_k_norm.weight`)
- RoPE base frequency (1,000,000 vs Llama's 10,000)
- BOS token differences
The correct loader (`QwenNetworkLoader` in `llm-inference:qwen`) exists but isn't wired into the CLI.
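A minimal sketch of the first two differences, assuming conventional RMSNorm/RoPE formulas; `rmsNorm` and `ropeInvFreqs` are illustrative helpers, not the real `LlamaRuntime` or `QwenNetworkLoader` API:

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// QK-norm: Qwen3 applies RMSNorm (with attn_q_norm.weight / attn_k_norm.weight)
// to each head's query/key vector before RoPE; Llama skips this step entirely.
fun rmsNorm(x: FloatArray, weight: FloatArray, eps: Float = 1e-6f): FloatArray {
    var sumSq = 0.0
    for (v in x) sumSq += v.toDouble() * v
    val scale = (1.0 / sqrt(sumSq / x.size + eps)).toFloat()
    return FloatArray(x.size) { i -> x[i] * scale * weight[i] }
}

// RoPE inverse frequencies: the only parameter that differs here is the base
// (theta) -- 10,000 for Llama vs 1,000,000 for Qwen3. Reusing Llama's table
// for Qwen3 weights rotates positions at the wrong rate, hence garbage output.
fun ropeInvFreqs(headDim: Int, theta: Double): FloatArray =
    FloatArray(headDim / 2) { i -> (1.0 / theta.pow(2.0 * i / headDim)).toFloat() }

fun main() {
    val llama = ropeInvFreqs(128, 10_000.0)
    val qwen = ropeInvFreqs(128, 1_000_000.0)
    // Same shape, very different high-dimension rotation speeds.
    println("llama slowest freq: ${llama.last()}")
    println("qwen  slowest freq: ${qwen.last()}")
}
```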
Scope
- Add `:llm-inference:qwen` dependency to `:llm-runtime:kllama`
- Detect `qwen*` architecture from GGUF metadata in `Main.kt`
- Route to `QwenNetworkLoader.fromGguf()` for Qwen models
- Wire the Qwen DSL network module into a runtime compatible with `AgentLoop`
- Validate end-to-end with `Qwen3-1.7B-Q8_0.gguf --demo`
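The detection step above could look roughly like this; only the GGUF metadata key `general.architecture` and the `qwen*` prefix check come from the spec, while `ModelRoute` and `routeFor` are hypothetical names for the wiring in `Main.kt`:

```kotlin
sealed interface ModelRoute {
    data object Llama : ModelRoute  // existing LlamaIngestion -> LlamaRuntime path
    data object Qwen : ModelRoute   // new QwenNetworkLoader.fromGguf() path
}

// GGUF files store the model architecture under the metadata key
// "general.architecture" (e.g. "llama", "qwen2", "qwen3"). Matching on the
// "qwen" prefix covers the whole qwen* family named in the scope.
fun routeFor(metadata: Map<String, String>): ModelRoute {
    val arch = metadata["general.architecture"] ?: "llama"
    return if (arch.startsWith("qwen")) ModelRoute.Qwen else ModelRoute.Llama
}

fun main() {
    println(routeFor(mapOf("general.architecture" to "qwen3"))) // Qwen
    println(routeFor(mapOf("general.architecture" to "llama"))) // Llama
}
```

Keeping the route a closed sealed type means the `when` in `Main.kt` stays exhaustive if further architectures are added later.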
Related
Acceptance Criteria
- `Qwen3-1.7B-Q8_0.gguf --demo` produces coherent output