v0.3.0: enum-aware prompts, execution-guided self-repair, tighter generation rules #16
Merged
… in the prompt
Introspection now reads col_description() and pg_enum labels for every column; format_schema renders them, so the generator filters on real states instead of guessing them. New generation rules pin down answer shape: exactly the columns asked for, no speculative filters, INNER JOIN by default, status columns over timestamp inference.
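A minimal sketch of what enum-aware rendering could look like, assuming column metadata has already been fetched. The `Column` dataclass, `render_table`, the output layout, and the sample table are hypothetical illustrations, not the project's `format_schema`. The catalog query is likewise only illustrative; it relies on the standard `pg_attribute`, `pg_enum`, `format_type()` and `col_description()` catalog objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative catalog query (an assumption, not the project's SQL): for each
# column of one table, fetch its type, its comment, and its enum labels.
ENUM_AND_COMMENT_SQL = """
SELECT a.attname,
       format_type(a.atttypid, a.atttypmod)  AS data_type,
       col_description(a.attrelid, a.attnum) AS comment,
       (SELECT array_agg(e.enumlabel::text ORDER BY e.enumsortorder)
          FROM pg_enum e
         WHERE e.enumtypid = a.atttypid)     AS enum_labels
  FROM pg_attribute a
 WHERE a.attrelid = %s::regclass
   AND a.attnum > 0
   AND NOT a.attisdropped
 ORDER BY a.attnum
"""


@dataclass
class Column:
    """Hypothetical container for one introspected column."""
    name: str
    data_type: str
    comment: str | None = None
    enum_labels: list[str] = field(default_factory=list)


def render_table(table: str, columns: list[Column]) -> str:
    """Render one table as prompt text, listing legal enum values inline."""
    lines = [f"TABLE {table}"]
    for col in columns:
        line = f"  {col.name} {col.data_type}"
        if col.enum_labels:
            labels = ", ".join(f"'{label}'" for label in col.enum_labels)
            line += f"  -- one of: {labels}"
        if col.comment:
            # Append the comment to the enum note, or start a new note.
            line += ("; " if col.enum_labels else "  -- ") + col.comment
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    invoices = [
        Column("id", "bigint"),
        Column("status", "invoice_status", "billing state",
               ["draft", "open", "past_due", "paid"]),
    ]
    print(render_table("invoices", invoices))
```

With the labels spelled out next to the column, a filter such as `status = 'overdue'` has a visible correct alternative (`'past_due'`) in the prompt itself.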
When the database rejects a query, feed the SQL plus the database's own error back to the model for a bounded number of corrected attempts. Repaired SQL is re-validated by the sqlglot guard and re-confirmed in the REPL before it runs. Empty results never trigger repair — empty is often the right answer.
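The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `run_with_repair`, `QueryRejected`, and the injected `generate`, `is_safe`, `execute` and `confirm` callables are all hypothetical names standing in for the model call, the sqlglot guard, the read-only executor and the REPL confirm prompt.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


class QueryRejected(Exception):
    """Stand-in for the error a database driver raises on a bad query."""


def run_with_repair(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    is_safe: Callable[[str], bool],
    execute: Callable[[str], Sequence[tuple]],
    confirm: Callable[[str], bool] = lambda sql: True,
    max_repair: int = 1,
) -> Optional[Sequence[tuple]]:
    """Run a generated query, retrying at most `max_repair` times on DB errors.

    Returns the rows, or None if a candidate was unsafe or declined.
    """
    feedback: Optional[str] = None
    for attempt in range(max_repair + 1):
        sql = generate(question, feedback)
        # Every candidate, original or repaired, passes the same gates.
        if not is_safe(sql) or not confirm(sql):
            return None
        try:
            # An empty result is returned as-is; it is not treated as an error.
            return execute(sql)
        except QueryRejected as err:
            if attempt == max_repair:
                raise  # repair budget exhausted, surface the database error
            # Hand the failed SQL and the database's own message back.
            feedback = f"SQL:\n{sql}\n\nDatabase error:\n{err}"
    return None
```

Two properties are worth noting in this sketch: `max_repair=0` degenerates to a single attempt with no retry, and the only path into a retry is an exception from `execute`, so an empty row set can never start one.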
Retrieval has been measuring at 98–100% recall on a 211-table benchmark for a while now — when a query comes back wrong, it's the SQL itself, not the table selection. I went through all 42 misses of the last benchmark run question by question, and three error classes account for most of them. This release targets those three.
What's in it
Enum-aware schema prompts. Of the 7 hard database errors in that run, 6 were invented enum values: the model wrote `status = 'overdue'` when the enum says `past_due`, `'churned'` when it's `cancelled`, and so on. Another ~12 silent wrong results came from the model dodging a status column it couldn't read into timestamp guesses (`delivered_at IS NOT NULL` instead of `status = 'delivered'`). Root cause is the same: the prompt never showed the legal values. Introspection now pulls every enum's labels and every column comment from `pg_catalog`, and `format_schema` renders them. On the benchmark schema that's 50 enum columns the model can now actually see.

Execution-guided self-repair. We already execute read-only, so when Postgres rejects a query we have the best error message there is. Now it goes back to the model for one corrected attempt (`--max-repair`, default 1, `0` disables). Repaired SQL goes through the same sqlglot validator, and in the REPL through the same confirm prompt, before it runs. Empty results deliberately don't trigger repair: empty is often the correct answer, and "fixing" it risks swapping a right answer for a wrong one.

Tighter generation rules. The remaining big class was answer-shape drift: extra id/timestamp columns nobody asked for, `LEFT JOIN` where the question implies a match, speculative filters like `deleted_at IS NULL`. The system prompt now pins these down explicitly.

Also fixes bare `o4-*` model names not inferring the OpenAI provider (`o1`/`o3` already did), and reconciles stale v0.1-era docs (embeddings "queued for v0.2", old test counts).

Tests
48 → 66, all pure Python. New coverage: the repair loop (including unsafe repairs never execute and declined repairs never run), enum serialization round-trips, provider inference. `ruff` clean.

Numbers
Benchmark run on this branch is next — results land in the PR that promotes staging to main, with conditions stated. No claims until then.
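For the `o4-*` provider fix listed under "What's in it", one way such inference could be written is shown below. This is a hedged sketch only: `infer_provider`, the regular expression, and the `gpt-` prefix rule are assumptions for illustration and are not taken from the project's code.

```python
import re
from typing import Optional

# Hypothetical pattern: bare OpenAI o-series names such as o1, o3-mini, o4-mini.
# Matching "o" + digits (rather than listing o1/o3/o4) avoids repeating the
# same omission when the next numbered family appears.
_O_SERIES = re.compile(r"^o\d+(-|$)")


def infer_provider(model: str) -> Optional[str]:
    """Guess the provider from a bare model name; None means 'unknown'."""
    name = model.strip().lower()
    if name.startswith("gpt-") or _O_SERIES.match(name):
        return "openai"
    return None
```

The trailing `(-|$)` keeps the pattern from swallowing unrelated names that merely start with the letter "o".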