fix: Gemini structured output, rate limits, and inspection runtime issue fixed. #32
Open
sharvsave1023 wants to merge 1 commit into
Record end-to-end LangGraph wall time as runtime_ms and pass it through the API, persisted message metadata, and the React mappers so inspection shows milliseconds instead of "not reported." Gemini: use response_json_schema with a stripped JSON schema to avoid additionalProperties errors; add optional RPM throttling via env.
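The schema stripping mentioned above could look roughly like the sketch below. It is not the code from this PR: the helper name `strip_schema_keywords` and the `STRIPPED_KEYWORDS` set are hypothetical, and only `additionalProperties` is listed because that is the one keyword the description names. With Pydantic v2 the input dict would typically come from `Model.model_json_schema()`; a hand-written dict is used here so the example runs with the standard library only.

```python
from typing import Any

# Assumption: the PR only names `additionalProperties`; extend this set if
# other keywords turn out to be rejected as well.
STRIPPED_KEYWORDS = frozenset({"additionalProperties"})


def strip_schema_keywords(node: Any, stripped: frozenset = STRIPPED_KEYWORDS) -> Any:
    """Return a copy of a JSON schema with the given keywords removed at every level."""
    if isinstance(node, list):
        return [strip_schema_keywords(item, stripped) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, booleans, None pass through unchanged
    cleaned = {}
    for key, value in node.items():
        if key in stripped:
            continue
        if key in ("properties", "$defs") and isinstance(value, dict):
            # Keys in these maps are user-chosen names, not schema keywords,
            # so keep every name and only clean the sub-schemas.
            cleaned[key] = {
                name: strip_schema_keywords(sub, stripped)
                for name, sub in value.items()
            }
        else:
            cleaned[key] = strip_schema_keywords(value, stripped)
    return cleaned


# Hypothetical schema, shaped like what a strict Pydantic model might emit.
schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "sql": {"type": "string"},
        "filters": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "properties": {"column": {"type": "string"}},
            },
        },
    },
    "required": ["sql"],
}

cleaned = strip_schema_keywords(schema)
```

The function builds a new dict instead of mutating its input, so the original Pydantic-generated schema stays available for local validation.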
Summary
Gemini: Structured output now uses `response_json_schema` with Pydantic-generated schemas that have Gemini-incompatible keywords removed, so the API no longer errors on `additionalProperties`. Optional per-minute throttling (`GEMINI_MAX_REQUESTS_PER_MINUTE` / `GEMINI_RATE_LIMIT_WINDOW_SECONDS`) helps stay within free-tier RPM. `.env.example` documents the new variables.
Runtime: End-to-end LangGraph wall time is recorded as `runtime_ms`, returned from `POST /chat`, included in assistant metadata for thread reload, and shown in the inspection UI so Runtime is a millisecond value instead of “Not reported.”

How to test
Runtime: Run a real analysis, open Inspect SQL or Open execution detail, and confirm Runtime shows `N ms` in the stat row and under Results.
Gemini: Set `LLM_PROVIDER=gemini` and valid `GEMINI_*` keys, run a prompt, and confirm no `additionalProperties` / schema errors; optionally tune `GEMINI_MAX_REQUESTS_PER_MINUTE` and watch logs for throttle waits.

Closes #30
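For reviewers who want a mental model of the optional throttle, here is a minimal sliding-window sketch. It is an illustration under assumptions, not the implementation in this PR: the class `SlidingWindowThrottle`, the helper `throttle_from_env`, and the 60-second default window are hypothetical. Only the two environment variable names come from the description above.

```python
import os
import time
from collections import deque
from typing import Callable, Deque, Mapping, Optional


class SlidingWindowThrottle:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests do not need real time
        self._sleep = sleep
        self._sent: Deque[float] = deque()

    def acquire(self) -> float:
        """Block until a request slot is free; return the seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return waited
            # Sleep until the oldest request leaves the window, then re-check.
            delay = self.window_seconds - (now - self._sent[0])
            self._sleep(delay)
            waited += delay


def throttle_from_env(
    env: Mapping[str, str] = os.environ,
) -> Optional[SlidingWindowThrottle]:
    """Build a throttle from the env vars, or return None when throttling is off."""
    raw_limit = env.get("GEMINI_MAX_REQUESTS_PER_MINUTE")
    if not raw_limit:
        return None  # throttling is optional
    # Assumption: the window defaults to 60 seconds when the variable is unset.
    window = float(env.get("GEMINI_RATE_LIMIT_WINDOW_SECONDS", "60"))
    return SlidingWindowThrottle(int(raw_limit), window)
```

A caller would invoke `acquire()` right before each Gemini request and could log the returned wait whenever it is non-zero, which is one way the "throttle waits" mentioned in the test steps might surface in logs.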