feat(google-gemini): update model YAMLs [bot] #902
Conversation
/test-models
Gateway test results
Failures (2)
Error: Code snippet

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-gemini/gemini-robotics-er-1.6-preview",
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
```
Error: Code snippet

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-gemini/gemini-robotics-er-1.6-preview",
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)
_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
```
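Both failing snippets apply the same reasoning-detection checks to slightly different objects. As an illustrative sketch only (the helper name `detect_reasoning` and the `SimpleNamespace` stand-ins are not part of the gateway's test harness), the shared logic can be factored into one function:

```python
from types import SimpleNamespace


def detect_reasoning(usage=None, part=None):
    """Return True if a usage object or a message/delta carries a reasoning signal.

    Mirrors the checks in the gateway tests: usage.completion_tokens_details.reasoning_tokens,
    usage.reasoning, and reasoning_content / reasoning on the message or stream delta.
    """
    if usage is not None:
        details = getattr(usage, "completion_tokens_details", None)
        if details and getattr(details, "reasoning_tokens", 0) > 0:
            return True
        if getattr(usage, "reasoning", None) is not None:
            return True
    if part is not None:
        if getattr(part, "reasoning_content", None) is not None:
            return True
        if getattr(part, "reasoning", None) is not None:
            return True
    return False


# Quick check with stand-in objects in place of real SDK responses:
usage = SimpleNamespace(completion_tokens_details=SimpleNamespace(reasoning_tokens=42))
print(detect_reasoning(usage=usage))                                    # True
print(detect_reasoning(part=SimpleNamespace(reasoning_content="...")))  # True
print(detect_reasoning())                                               # False
```

The failure above means every one of these probes came back empty for `gemini-robotics-er-1.6-preview`, in both non-streaming and streaming modes.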
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 619d9b9.
```diff
@@ -1,5 +1,6 @@
 costs:
 - input_cost_per_token: 0.000001
+  input_cost_per_token_batches: 5e-7
```
Missing output batch cost in TTS model pricing
Medium Severity
Adding input_cost_per_token_batches without a corresponding output_cost_per_token_batches is inconsistent with the other TTS models. Both gemini-2.5-pro-preview-tts.yaml (which has identical per-token pricing) and gemini-2.5-flash-preview-tts.yaml include output_cost_per_token_batches at 50% of their output_cost_per_audio_token. This model's output_cost_per_audio_token is 0.00002, so the expected batch output cost would be 0.00001.
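Based on the reviewer's numbers, the consistent fix would presumably add the matching batch output price alongside the line this PR introduces. A sketch of the expected fields (the surrounding file layout is assumed, not shown in the diff; only the last line is the suggested addition):

```yaml
costs:
- input_cost_per_token: 0.000001
  input_cost_per_token_batches: 5e-7      # added by this PR; 50% of input_cost_per_token
  output_cost_per_audio_token: 0.00002
  output_cost_per_token_batches: 0.00001  # suggested: 50% of output_cost_per_audio_token, matching the sibling TTS YAMLs
```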


Auto-generated by poc-agent for provider google-gemini.

Note
Medium Risk
Medium risk because it changes model configuration used for pricing calculations and enforces a significantly smaller context_window/max_input_tokens for gemini-robotics-er-1.6-preview, which could impact workloads relying on prior limits.

Overview
Updates Google Gemini model metadata YAMLs.
Adds batched input pricing (input_cost_per_token_batches) to gemini-3.1-flash-tts-preview.
Reduces gemini-robotics-er-1.6-preview limits by lowering context_window and max_input_tokens from 1048576 to 131072.

Reviewed by Cursor Bugbot for commit 619d9b9. Bugbot is set up for automated code reviews on this repo.
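For reference, the limits change in the overview would presumably correspond to a YAML edit of this shape in the gemini-robotics-er-1.6-preview file (illustrative; the actual file contents and surrounding keys are not shown in this excerpt):

```yaml
context_window: 131072    # was 1048576
max_input_tokens: 131072  # was 1048576
```

Note the interaction with the gateway test failures above: the same model also failed reasoning validation, so the tighter limits are not the only open question for this model's metadata.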