v0.3.0: enum-aware prompts, execution-guided self-repair, tighter generation rules #16

Merged
Cyberfilo merged 4 commits into staging from feat/v0.3-generation-quality on Jun 10, 2026

@Cyberfilo

Owner

Retrieval has been measuring at 98–100% recall on a 211-table benchmark for a while now — when a query comes back wrong, it's the SQL itself, not the table selection. I went through all 42 misses of the last benchmark run question by question, and three error classes account for most of them. This release targets those three.

What's in it

Enum-aware schema prompts. Of the 7 hard database errors in that run, 6 were invented enum values: the model wrote status = 'overdue' when the enum says past_due, 'churned' when it's cancelled, and so on. Another ~12 silent wrong results came from the model avoiding a status column whose values it couldn't see and falling back on timestamp guesses (delivered_at IS NOT NULL instead of status = 'delivered'). The root cause is the same: the prompt never showed the legal values. Introspection now pulls every enum's labels and every column comment from pg_catalog, and format_schema renders them. On the benchmark schema, that's 50 enum columns the model can now actually see.
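To illustrate the idea, here is a minimal sketch of rendering enum labels and column comments into prompt text. It is not the project's actual format_schema: the `Column` dataclass, `render_table`, the `invoices` table, and every enum label other than past_due are hypothetical, and the output layout is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    # Hypothetical container; the real introspection types are not shown here.
    name: str
    type_name: str
    enum_labels: List[str] = field(default_factory=list)  # e.g. read from pg_enum
    comment: Optional[str] = None  # e.g. read via col_description()


def render_table(table: str, columns: List[Column]) -> str:
    """Render one table as prompt text, listing legal enum values inline."""
    lines = [f"TABLE {table}"]
    for col in columns:
        line = f"  {col.name} {col.type_name}"
        if col.enum_labels:
            values = ", ".join(f"'{v}'" for v in col.enum_labels)
            line += f"  -- one of: {values}"
        if col.comment:
            line += f"  -- {col.comment}"
        lines.append(line)
    return "\n".join(lines)


# Illustrative schema only; not taken from the benchmark.
invoices = [
    Column("id", "bigint"),
    Column("status", "invoice_status", ["draft", "open", "past_due", "paid"]),
    Column("due_at", "timestamptz", comment="payment deadline"),
]
print(render_table("invoices", invoices))
```

With the legal values on the same line as the column, the generator has no reason to guess a label like 'overdue' or to fall back on a timestamp check.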

Execution-guided self-repair. We already execute read-only, so when Postgres rejects a query we have the best error message there is — now it goes back to the model for one corrected attempt (--max-repair, default 1, 0 disables). Repaired SQL goes through the same sqlglot validator, and in the REPL through the same confirm prompt, before it runs. Empty results deliberately don't trigger repair: empty is often the correct answer, and "fixing" it risks swapping a right answer for a wrong one.
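The control flow described above can be sketched roughly as follows. This is an assumption-laden outline, not the code in this PR: `run_with_repair`, `QueryRejected`, and the injected `generate` / `is_safe` / `execute` callables are hypothetical stand-ins for the model call, the sqlglot validator, and read-only execution, and the REPL confirm step is left out.

```python
from typing import Callable, List, Optional, Tuple

Rows = List[tuple]


class QueryRejected(Exception):
    """Stand-in for the database error raised when a query is rejected."""


def run_with_repair(
    question: str,
    generate: Callable[[str, Optional[Tuple[str, str]]], str],
    is_safe: Callable[[str], bool],
    execute: Callable[[str], Rows],
    max_repair: int = 1,
) -> Rows:
    """Execute generated SQL, feeding database errors back for bounded repairs."""
    sql = generate(question, None)
    attempts_left = max_repair
    while True:
        if not is_safe(sql):
            # Applies to repaired SQL as well as to the first attempt.
            raise ValueError("SQL rejected by validator; not executed")
        try:
            # An empty list is returned as-is: empty results never trigger repair.
            return execute(sql)
        except QueryRejected as err:
            if attempts_left == 0:
                raise
            attempts_left -= 1
            # Hand the failed SQL and the database's own message back to the model.
            sql = generate(question, (sql, str(err)))


# Fakes for demonstration: the first attempt invents an enum value.
def fake_generate(question, failure):
    if failure is None:
        return "SELECT id FROM invoices WHERE status = 'overdue'"
    return "SELECT id FROM invoices WHERE status = 'past_due'"


executed = []


def fake_execute(sql):
    executed.append(sql)
    if "'overdue'" in sql:
        raise QueryRejected('invalid input value for enum invoice_status: "overdue"')
    return [(1,), (2,)]


rows = run_with_repair(
    "Which invoices are overdue?",
    fake_generate,
    lambda s: s.startswith("SELECT"),
    fake_execute,
)
print(rows, len(executed))
```

Passing `max_repair=0` in this sketch re-raises the first database error unchanged, which mirrors the "0 disables" behaviour of --max-repair.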

Tighter generation rules. The remaining big class was answer-shape drift: extra id/timestamp columns nobody asked for, LEFT JOIN where the question implies a match, speculative filters like deleted_at IS NULL. The system prompt now pins these down explicitly.

Also fixes bare o4-* model names not inferring the OpenAI provider (o1/o3 already did), and reconciles stale v0.1-era docs (embeddings "queued for v0.2", old test counts).
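For the provider fix, a hedged sketch of what prefix-based inference for bare o-series names might look like. `infer_provider` and the regular expression are hypothetical, and matching any `o<digits>` prefix is an assumption that goes slightly beyond the o1/o3/o4 families named above.

```python
import re
from typing import Optional

# Bare o-series names such as "o1", "o3-mini", "o4-mini" (assumed pattern).
_O_SERIES = re.compile(r"^o\d+(-|$)")


def infer_provider(model: str) -> Optional[str]:
    """Guess the provider from a bare model name; None if unrecognized."""
    if _O_SERIES.match(model):
        return "openai"
    return None


print(infer_provider("o4-mini"))
print(infer_provider("ollama-local"))
```

Anchoring on digits followed by a hyphen or end of string keeps unrelated names that merely start with "o" from being routed to OpenAI.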

Tests

48 → 66, all pure Python. New coverage: the repair loop (including checks that unsafe repairs never execute and declined repairs never run), enum serialization round-trips, provider inference. ruff clean.

Numbers

Benchmark run on this branch is next — results land in the PR that promotes staging to main, with conditions stated. No claims until then.

… in the prompt

Introspection now reads col_description() and pg_enum labels for every column;
format_schema renders them, so the generator filters on real states instead of
guessing them. New generation rules pin down answer shape: exactly the columns
asked for, no speculative filters, INNER JOIN by default, status columns over
timestamp inference.
When the database rejects a query, feed the SQL plus the database's own error
back to the model for a bounded number of corrected attempts. Repaired SQL is
re-validated by the sqlglot guard and re-confirmed in the REPL before it runs.
Empty results never trigger repair — empty is often the right answer.
@Cyberfilo Cyberfilo merged commit a82440e into staging Jun 10, 2026
5 checks passed
@Cyberfilo Cyberfilo deleted the feat/v0.3-generation-quality branch June 10, 2026 07:17