Skip to content

[Feature]: Support Gemma 4 Jinja Templates (Fixes MissingTemplateException) #1375

@zsogitbe

Description

@zsogitbe

Background & Description

Description

Currently, attempting to load a Gemma 4 GGUF model and evaluate its prompt template using LLamaTemplate (or high-level wrappers like ChatSession) results in a crash. LLamaSharp throws a MissingTemplateException because the native llama_chat_apply_template function returns -1.

The Error:

LLama.Exceptions.MissingTemplateException: llama_chat_apply_template failed: 
template not found for '{%- macro format_parameters(properties, required) -%}
{%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
{%- set ns = namespace(found_first=false) -%} ...

Root Cause Analysis:
Gemma 4 introduced a highly complex Jinja template to handle multimodality and native function/tool calling. It expects to iterate over complex objects like properties, required, and tools.

The failure occurs because the upstream llama.cpp core C API (llama_chat_message struct) currently only accepts role and content. LLamaSharp's P/Invoke interop struct perfectly mirrors this:

_nativeChatMessages[i] = new LLamaChatMessage
{
    role = (byte*)r.Pointer,
    content = (byte*)c.Pointer
};

Because the native minijinja engine inside llama.cpp is starved of the tool-calling variables the Gemma 4 template expects, the template evaluation fails internally. The C++ engine falls back to heuristic matching, fails, and returns -1, which LLamaSharp bubbles up as a MissingTemplateException.

Crucially, the upstream llama.cpp repository currently bypasses llama_chat_apply_template for Gemma 4. They rely on hardcoded C++ workarounds (e.g., specialized template functions) to manually format Gemma 4 prompts because their own Jinja parser cannot handle it yet without struct updates.

Proposed Action Plan:
Since LLamaSharp cannot marshal tool definitions until llama.cpp updates its llama_chat_message struct upstream, we need a two-phased approach to support Gemma 4:

1. Short-Term Fix (Actionable PR): Implement a C# equivalent of llama.cpp's specialized template bypass. If a Gemma 4 model is detected (or its specific Jinja signature is read), LLamaSharp should intercept it and automatically apply the safe "gemma" fallback template internally, rather than passing the un-parsable Jinja string to the C++ backend and crashing.

2. Long-Term Fix (Upstream Dependency):
Once llama.cpp overhauls the llama_chat_apply_template API to accept tool/JSON schemas, update LLamaSharp's LLamaChatMessage P/Invoke struct and the LLamaTemplate.Apply() method to marshal C# tool definitions across the boundary.

Current Workaround
Currently, developers can bypass the exception by explicitly overriding the Jinja template extraction and forcing the backend to use its internal C++ defaults via:

var template = new LLamaTemplate("gemma"); 

While this prevents the crash and allows standard text generation, it strips out Gemma 4's native tool-calling and multimodality formatting. Furthermore, developers using high-level wrappers like ChatSession often hit this crash before they realize they need to override the template.

Additional context
This issue will become a widespread blocker as more developers adopt Gemma 4. Implementing the short-term bypass will prevent the immediate crashes while we wait for the upstream C API to accommodate native tool calling.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions