
How to set token budget for Gemma4 with Ollama #651

@somthing3000

Description


The Gemma4 documentation from Google suggests adjusting the token budget to control the resolution at which images are processed, for more accurate OCR:

https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget

However, I do not see any configuration option to set it when calling AsyncClient.chat(). Adding max_soft_tokens to options does not seem to do anything:

response = await AsyncClient(host=OLLAMA_HOST).chat(
    model=MODEL,
    messages=messages,
    format=self.result_model.model_json_schema(),
    options={
        "num_ctx": 32 * 1024,
        "temperature": 0.0,
        "max_soft_tokens": 560,  # has no effect
    },
)
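As far as I can tell, the client forwards the options dict as-is and the server simply drops keys it does not recognize. A hypothetical pre-flight helper (not part of the ollama client; the KNOWN_OPTIONS set is a partial list of the runtime parameters documented for Ollama Modelfiles) would flag the ignored key:

```python
# Hypothetical helper: report option keys that Ollama would not recognize.
# KNOWN_OPTIONS is a subset of Ollama's documented runtime parameters;
# "max_soft_tokens" is a transformers-side setting and is not among them.
KNOWN_OPTIONS = {
    "num_ctx", "num_predict", "temperature", "top_k", "top_p",
    "min_p", "repeat_penalty", "repeat_last_n", "seed", "stop",
}

def check_options(options: dict) -> list[str]:
    """Return the option keys the Ollama server would likely ignore."""
    return sorted(k for k in options if k not in KNOWN_OPTIONS)

unknown = check_options({
    "num_ctx": 32 * 1024,
    "temperature": 0.0,
    "max_soft_tokens": 560,
})
print(unknown)  # ['max_soft_tokens']
```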

By default, Gemma4 models run with a 280-token budget, which is not enough for my OCR task; I am testing with Gemma4-e4b.

To verify, I uploaded a vehicle license plate image to my Ollama endpoint; the result was missing the last letter. I then used the plain transformers library to load an unquantized gemma4-e4b, which produced the same result. However, the transformers AutoProcessor accepts a max_soft_tokens option; setting it to 560 produced the correct, expected result.

// ollama default (q4_k_m) and unquantized gemma4
{
    "license_plate_number": "YRSGNB",    // expected YRSGNBY
    "license_plate_state": "California"
}

// unquantized gemma4 with max_soft_tokens=560
{
    "license_plate_number": "YRSGNBY",    // expected YRSGNBY
    "license_plate_state": "California"
}
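For completeness, the snippet above omits how the image was attached to the request. With the ollama Python client an image can be passed via the "images" field of a message; the file name and prompt text here are hypothetical placeholders, not from my actual code:

```python
from pathlib import Path

# Hypothetical reconstruction of the request payload: the ollama Python
# client accepts image file paths (or raw bytes) in a message's "images" list.
image_path = Path("plate.jpg")  # placeholder filename

messages = [
    {
        "role": "user",
        "content": "Read the license plate number and state from this image.",
        "images": [image_path],
    }
]
```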
