v0.3.0: enum-aware prompts, execution-guided self-repair, tighter generation rules by Cyberfilo · Pull Request #16 · Cyberfilo/PromptQuery

Cyberfilo · 2026-06-10T07:16:03Z

Retrieval recall has held at 98–100% on a 211-table benchmark for a while now, so when a query comes back wrong, the SQL itself is at fault, not the table selection. I went through all 42 misses of the last benchmark run question by question, and three error classes account for most of them. This release targets those three.

What's in it

Enum-aware schema prompts. Of the 7 hard database errors in that run, 6 were invented enum values: the model wrote status = 'overdue' when the enum says past_due, 'churned' when it says cancelled, and so on. Another ~12 silent wrong results came from the model sidestepping a status column whose values it couldn't see and guessing from timestamps instead (delivered_at IS NOT NULL instead of status = 'delivered'). The root cause is the same in both cases: the prompt never showed the legal values. Introspection now pulls every enum's labels and every column comment from pg_catalog, and format_schema renders them. On the benchmark schema, that's 50 enum columns whose values the model can now actually see.
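A minimal sketch of the rendering side, assuming introspection has already produced per-column enum labels and comments. The Column/Table dataclasses and this format_schema signature are hypothetical (only the function name comes from this PR), and the labels in any example data are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    # Hypothetical shape of one introspected column.
    name: str
    data_type: str
    enum_labels: list[str] = field(default_factory=list)  # pg_enum labels, declared order
    comment: str | None = None  # column comment, if any


@dataclass
class Table:
    name: str
    columns: list[Column]


def format_schema(tables: list[Table]) -> str:
    """Render tables as prompt text, spelling out column comments and legal enum values."""
    lines: list[str] = []
    for table in tables:
        lines.append(f"TABLE {table.name}")
        for col in table.columns:
            notes: list[str] = []
            if col.comment:
                notes.append(col.comment)
            if col.enum_labels:
                # Show the exact labels so the model filters on real states
                # instead of inventing values like 'overdue'.
                quoted = ", ".join(f"'{label}'" for label in col.enum_labels)
                notes.append(f"one of: {quoted}")
            line = f"  {col.name} {col.data_type}"
            if notes:
                line += "  -- " + "; ".join(notes)
            lines.append(line)
    return "\n".join(lines)
```

For an uncommented status column this renders a line like `  status invoice_status  -- one of: 'open', 'past_due'`, so a value such as 'overdue' is visibly not an option.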

Execution-guided self-repair. We already execute queries read-only, so when Postgres rejects one, we have the best error message there is. Now the failing SQL and that error go back to the model for up to --max-repair corrected attempts (default 1; 0 disables). Repaired SQL goes through the same sqlglot validator, and in the REPL through the same confirm prompt, before it runs. Empty results deliberately don't trigger repair: empty is often the correct answer, and "fixing" it risks swapping a right answer for a wrong one.
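A minimal sketch of a bounded repair loop that follows these rules. Every name here (run_with_repair, QueryError, and the execute/repair/is_safe/confirm callables) is hypothetical; in PromptQuery the safety check would be the sqlglot validator and confirm the REPL prompt. Raising on an unsafe repair is one possible choice, not necessarily what the project does.

```python
from __future__ import annotations

from collections.abc import Callable

Rows = list[tuple]


class QueryError(Exception):
    """Hypothetical: raised by the executor when Postgres rejects a query."""


def run_with_repair(
    sql: str,
    execute: Callable[[str], Rows],
    repair: Callable[[str, str], str],
    is_safe: Callable[[str], bool],
    confirm: Callable[[str], bool],
    max_repair: int = 1,
) -> Rows | None:
    """Run sql; on a database error, ask the model for up to max_repair fixes.

    Returns the rows (possibly empty), or None if the user declines a repair.
    Re-raises the last QueryError once repair attempts are used up.
    """
    attempts = 0
    while True:
        try:
            # An empty result is a normal return value: it never triggers repair.
            return execute(sql)
        except QueryError as err:
            if attempts >= max_repair:  # max_repair=0 disables repair entirely
                raise
            attempts += 1
            # Send the failing SQL plus the database's own error back to the model.
            candidate = repair(sql, str(err))
        # The repaired SQL passes the same guard and confirmation as the
        # original before anything executes.
        if not is_safe(candidate):
            raise ValueError(f"repaired SQL failed validation: {candidate!r}")
        if not confirm(candidate):
            return None
        sql = candidate
```

Because an empty list is an ordinary return value, only a database error ever reaches the repair path.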

Tighter generation rules. The remaining big class was answer-shape drift: extra id/timestamp columns nobody asked for, LEFT JOIN where the question implies a match, speculative filters like deleted_at IS NULL. The system prompt now pins these down explicitly.
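One way to pin these rules down is a fixed, numbered list in the system prompt. The rule wording and the GENERATION_RULES/build_system_prompt names below are hypothetical; this PR doesn't quote its actual prompt.

```python
# Hypothetical rule wording, paraphrasing the rules described in this PR.
GENERATION_RULES: tuple[str, ...] = (
    "Select exactly the columns the question asks for; no extra ids or timestamps.",
    "Use INNER JOIN unless the question asks to keep rows that have no match.",
    "Add no filters the question does not imply (for example deleted_at IS NULL).",
    "Filter on a status column's listed values instead of inferring state from timestamps.",
)


def build_system_prompt(schema_text: str) -> str:
    """Assemble a system prompt: task line, numbered rules, then the rendered schema."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(GENERATION_RULES, start=1))
    return (
        "Write one read-only PostgreSQL query that answers the question.\n\n"
        f"Rules:\n{rules}\n\n"
        f"Schema:\n{schema_text}"
    )
```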

Also fixes provider inference for bare o4-* model names, which weren't mapped to OpenAI the way o1/o3 names already were, and reconciles stale v0.1-era docs (embeddings "queued for v0.2", old test counts).
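A sketch of the kind of check involved, assuming provider inference works by pattern-matching bare model names. infer_provider and the patterns are hypothetical, not the project's code.

```python
from __future__ import annotations

import re

# Hypothetical patterns for bare OpenAI model names. The o-series entry is the
# one this fix touches: "o4" has to sit alongside "o1" and "o3".
_OPENAI_PATTERNS = (
    re.compile(r"^gpt-"),
    re.compile(r"^o[134](?:-|$)"),  # o1, o3-mini, o4, o4-mini, ...
)


def infer_provider(model: str) -> str | None:
    """Guess the provider for a bare model name, or None if unknown."""
    name = model.strip().lower()
    if any(p.search(name) for p in _OPENAI_PATTERNS):
        return "openai"
    return None
```

Matching an explicit o1/o3/o4 list keeps unknown names from being misrouted, at the cost of needing an update for each new series, which is exactly the gap this fix closes.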

Tests

48 → 66 tests, all pure Python. New coverage: the repair loop (including checks that unsafe repairs never execute and declined repairs never run), enum serialization round-trips, and provider inference. ruff is clean.

Numbers

Benchmark run on this branch is next — results land in the PR that promotes staging to main, with conditions stated. No claims until then.

… in the prompt

Introspection now reads col_description() and pg_enum labels for every column; format_schema renders them, so the generator filters on real states instead of guessing them. New generation rules pin down answer shape: exactly the columns asked for, no speculative filters, INNER JOIN by default, status columns over timestamp inference.
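A minimal sketch of the two catalog reads described above, assuming a psycopg-style driver (pyformat %(schema)s placeholders). The query text and the ENUM_LABELS_SQL, COLUMN_COMMENTS_SQL, and group_enum_labels names are hypothetical; pg_enum, pg_attribute, and col_description() are standard PostgreSQL.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable

# Enum labels for every enum-typed column of ordinary and partitioned tables,
# in the enum's declared order (enumsortorder).
ENUM_LABELS_SQL = """
SELECT c.relname, a.attname, e.enumlabel
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_enum e ON e.enumtypid = a.atttypid
WHERE n.nspname = %(schema)s
  AND c.relkind IN ('r', 'p')
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY c.relname, a.attnum, e.enumsortorder
"""

# Column comments, read with col_description(table_oid, column_number).
COLUMN_COMMENTS_SQL = """
SELECT c.relname, a.attname, col_description(c.oid, a.attnum)
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = %(schema)s
  AND c.relkind IN ('r', 'p')
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND col_description(c.oid, a.attnum) IS NOT NULL
"""


def group_enum_labels(
    rows: Iterable[tuple[str, str, str]],
) -> dict[tuple[str, str], list[str]]:
    """Collapse (table, column, label) rows into one ordered label list per column."""
    labels: dict[tuple[str, str], list[str]] = defaultdict(list)
    for table, column, label in rows:
        labels[(table, column)].append(label)
    return dict(labels)
```

With psycopg, something like `cur.execute(ENUM_LABELS_SQL, {"schema": "public"})` followed by `group_enum_labels(cur.fetchall())` yields one ordered label list per (table, column). Columns typed as an array of an enum are not covered by this join, since their atttypid is the array type rather than the enum itself.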

When the database rejects a query, feed the SQL plus the database's own error back to the model for a bounded number of corrected attempts. Repaired SQL is re-validated by the sqlglot guard and re-confirmed in the REPL before it runs. Empty results never trigger repair — empty is often the right answer.

Cyberfilo added 4 commits June 10, 2026 09:15

fix: infer OpenAI provider for bare o4-* model names

a2a99a5

chore: 0.3.0 — changelog, pipeline docs, reconcile stale v0.1-era claims

29fa4a6

Cyberfilo merged commit a82440e into staging Jun 10, 2026
5 checks passed

Cyberfilo deleted the feat/v0.3-generation-quality branch June 10, 2026 07:17

Cyberfilo mentioned this pull request Jun 10, 2026

Release v0.3.0 — EX 58% → 72% on the 100-question benchmark, zero hard errors #17

Merged

Cyberfilo merged 4 commits into staging from feat/v0.3-generation-quality
