Background & Description
Description
Currently, attempting to load a Gemma 4 GGUF model and evaluate its prompt template using LLamaTemplate (or high-level wrappers like ChatSession) results in a crash. LLamaSharp throws a MissingTemplateException because the native llama_chat_apply_template function returns -1.
The Error:
LLama.Exceptions.MissingTemplateException: llama_chat_apply_template failed:
template not found for '{%- macro format_parameters(properties, required) -%}
{%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
{%- set ns = namespace(found_first=false) -%} ...
Root Cause Analysis:
Gemma 4 introduced a highly complex Jinja template to handle multimodality and native function/tool calling. It expects to iterate over complex objects like properties, required, and tools.
The failure occurs because the upstream llama.cpp core C API (llama_chat_message struct) currently only accepts role and content. LLamaSharp's P/Invoke interop struct perfectly mirrors this:
_nativeChatMessages[i] = new LLamaChatMessage
{
role = (byte*)r.Pointer,
content = (byte*)c.Pointer
};
Because the native minijinja engine inside llama.cpp is starved of the tool-calling variables the Gemma 4 template expects, the template evaluation fails internally. The C++ engine falls back to heuristic matching, fails, and returns -1, which LLamaSharp bubbles up as a MissingTemplateException.
Crucially, the upstream llama.cpp repository currently bypasses llama_chat_apply_template for Gemma 4. They rely on hardcoded C++ workarounds (e.g., specialized template functions) to manually format Gemma 4 prompts because their own Jinja parser cannot handle it yet without struct updates.
Proposed Action Plan:
Since LLamaSharp cannot marshal tool definitions until llama.cpp updates its llama_chat_message struct upstream, we need a two-phased approach to support Gemma 4:
1. Short-Term Fix (Actionable PR): Implement a C# equivalent of llama.cpp's specialized template bypass. If a Gemma 4 model is detected (or its specific Jinja signature is read), LLamaSharp should intercept it and automatically apply the safe "gemma" fallback template internally, rather than passing the un-parsable Jinja string to the C++ backend and crashing.
2. Long-Term Fix (Upstream Dependency):
Once llama.cpp overhauls the llama_chat_apply_template API to accept tool/JSON schemas, update LLamaSharp's LLamaChatMessage P/Invoke struct and the LLamaTemplate.Apply() method to marshal C# tool definitions across the boundary.
Current Workaround
Currently, developers can bypass the exception by explicitly overriding the Jinja template extraction and forcing the backend to use its internal C++ defaults via:
var template = new LLamaTemplate("gemma");
While this prevents the crash and allows standard text generation, it strips out Gemma 4's native tool-calling and multimodality formatting. Furthermore, developers using high-level wrappers like ChatSession often hit this crash before they realize they need to override the template.
Additional context
This issue will become a widespread blocker as more developers adopt Gemma 4. Implementing the short-term bypass will prevent the immediate crashes while we wait for the upstream C API to accommodate native tool calling.
Background & Description
Description
Currently, attempting to load a Gemma 4 GGUF model and evaluate its prompt template using
LLamaTemplate(or high-level wrappers likeChatSession) results in a crash. LLamaSharp throws aMissingTemplateExceptionbecause the nativellama_chat_apply_templatefunction returns-1.The Error:
Root Cause Analysis:
Gemma 4 introduced a highly complex Jinja template to handle multimodality and native function/tool calling. It expects to iterate over complex objects like
properties,required, andtools.The failure occurs because the upstream
llama.cppcore C API (llama_chat_messagestruct) currently only acceptsroleandcontent. LLamaSharp's P/Invoke interop struct perfectly mirrors this:Because the native
minijinjaengine insidellama.cppis starved of the tool-calling variables the Gemma 4 template expects, the template evaluation fails internally. The C++ engine falls back to heuristic matching, fails, and returns-1, which LLamaSharp bubbles up as aMissingTemplateException.Crucially, the upstream
llama.cpprepository currently bypassesllama_chat_apply_templatefor Gemma 4. They rely on hardcoded C++ workarounds (e.g., specialized template functions) to manually format Gemma 4 prompts because their own Jinja parser cannot handle it yet without struct updates.Proposed Action Plan:
Since LLamaSharp cannot marshal tool definitions until
llama.cppupdates itsllama_chat_messagestruct upstream, we need a two-phased approach to support Gemma 4:1. Short-Term Fix (Actionable PR): Implement a C# equivalent of
llama.cpp's specialized template bypass. If a Gemma 4 model is detected (or its specific Jinja signature is read), LLamaSharp should intercept it and automatically apply the safe"gemma"fallback template internally, rather than passing the un-parsable Jinja string to the C++ backend and crashing.2. Long-Term Fix (Upstream Dependency):
Once
llama.cppoverhauls thellama_chat_apply_templateAPI to accept tool/JSON schemas, update LLamaSharp'sLLamaChatMessageP/Invoke struct and theLLamaTemplate.Apply()method to marshal C# tool definitions across the boundary.Current Workaround
Currently, developers can bypass the exception by explicitly overriding the Jinja template extraction and forcing the backend to use its internal C++ defaults via:
While this prevents the crash and allows standard text generation, it strips out Gemma 4's native tool-calling and multimodality formatting. Furthermore, developers using high-level wrappers like
ChatSessionoften hit this crash before they realize they need to override the template.Additional context
This issue will become a widespread blocker as more developers adopt Gemma 4. Implementing the short-term bypass will prevent the immediate crashes while we wait for the upstream C API to accommodate native tool calling.