Google's Gemma4 documentation suggests raising the image token budget to increase image resolution for more accurate OCR:
https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget
However, I do not see any configuration option to set it when calling AsyncClient.chat(); adding max_soft_tokens to options does not seem to do anything.
response = await AsyncClient(host=OLLAMA_HOST).chat(
    model=MODEL,
    messages=messages,
    format=self.result_model.model_json_schema(),
    options={
        "num_ctx": 32 * 1024,
        "temperature": 0.0,
        "max_soft_tokens": 560,  # does nothing
    },
)
By default, Gemma4 models run with a 280-token budget, which is not enough for my OCR task; I am testing with Gemma4-e4b.
To verify, I uploaded a vehicle license plate image to my Ollama endpoint, and the result was missing the last letter. I then switched to the plain transformers library with an unquantized gemma4-e4b, which produced the same (wrong) result. However, the transformers AutoProcessor accepts a max_soft_tokens option; setting it to 560 produced the correct, expected result.
// ollama default (q4_k_m) and unquantized gemma4
{
    "license_plate_number": "YRSGNB",  // expected YRSGNBY
    "license_plate_state": "California"
}

// unquantized gemma4 with max_soft_tokens=560
{
    "license_plate_number": "YRSGNBY",  // expected YRSGNBY
    "license_plate_state": "California"
}