Releases: Cyberfilo/PromptQuery

v0.3.0 — enum-aware prompts, execution-guided self-repair

10 Jun 08:46
0ecab61

Generation-quality release. Retrieval already measured 98–100% recall on a 211-table benchmark; the misses were in the SQL itself. 0.3.0 attacks the three error classes that benchmarking surfaced, then re-measures.

Measured result

Same 100-question suite, same 211-table Postgres schema, same model (gpt-4o, temperature 0, single-state EX@1) — only the package version changed:

| Metric | 0.2.2 | 0.3.0 |
|---|---|---|
| Execution accuracy (EX@1) | 58.0% | 72.0% |
| Hard DB errors | 7/100 | 0/100 |
| Soft-F1 | 60.2 | 73.9 |
| Set-Recall | 98.0% | 99.0% |
| Tokens/query | 4,257 | 4,689 |

What's new

  • Enum-aware schema prompts — the prompt now carries every enum column's legal values and column comments, read from pg_catalog. The model filters on real states (status = 'delivered') instead of guessing them or inferring state from timestamps. Largest single win: lexical-gap questions went 36 → 73.
  • Execution-guided self-repair (--max-repair, default 1) — when Postgres rejects a query, the SQL plus the database's own error go back to the model for one corrected attempt. Repaired SQL is re-validated by the sqlglot guard (and re-confirmed in the REPL) before it runs. Empty results never trigger repair — empty is often the right answer.
  • Tighter generation rules — exactly the columns asked for, no speculative filters, INNER JOIN unless the question implies otherwise.
  • Fix — bare o4-* model names now infer the OpenAI provider.
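
The self-repair loop described above can be sketched roughly as follows. This is a minimal illustration and not PromptQuery's actual code: `run_with_repair`, `is_select_only`, and `fake_model` are hypothetical names, SQLite stands in for Postgres so the example runs without a server, and a trivial prefix check stands in for the sqlglot guard.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical signature: (question, failed_sql, db_error) -> new SQL string.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def is_select_only(sql: str) -> bool:
    """Toy stand-in for a real SQL guard: accept only SELECT statements."""
    return sql.lstrip().lower().startswith("select")


def run_with_repair(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    is_safe: Callable[[str], bool],
    max_repair: int = 1,
) -> list:
    """Execute generated SQL; on a database error, ask for a corrected query."""
    sql = generate(question, None, None)
    for attempt in range(max_repair + 1):
        # Every candidate, including repaired ones, passes the guard first.
        if not is_safe(sql):
            raise ValueError(f"guard rejected SQL: {sql!r}")
        try:
            # An empty result is returned as-is and never triggers repair.
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_repair:
                raise
            # Feed the failed SQL and the database's own error back to the model.
            sql = generate(question, sql, str(exc))
    raise ValueError("max_repair must be >= 0")


def fake_model(question: str, failed_sql: Optional[str], db_error: Optional[str]) -> str:
    """Pretend model: the first attempt has a typo, the repair attempt fixes it."""
    if failed_sql is None:
        return "SELECT idd FROM orders WHERE status = 'delivered'"
    return "SELECT id FROM orders WHERE status = 'delivered'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "delivered"), (2, "pending")])

rows = run_with_repair(conn, "Which orders were delivered?", fake_model, is_select_only)
print(rows)
```

Only a raised database error re-enters the loop, which mirrors the rule that empty results are left alone. Setting `max_repair=0` in this sketch disables the retry entirely.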

Honest negatives, same run: analytical questions (window functions) stay at 20%, multi-join dipped 58 → 50 on a 12-question bucket. Both are composition limits, queued for next.

Full details in the CHANGELOG.

PromptQuery v0.2.0

27 May 12:36

Headline

On Odoo's real 675-table production schema, gpt-4o (SQL) + gpt-4o-mini (selector):

| Pipeline | Pass rate | Tokens/query | Avg latency |
|---|---|---|---|
| Naive (full schema in prompt) | 84.0% | ~50,000 | 3.4 s |
| PromptQuery v0.1 (TF-IDF only) | 76.0% | ~2,000 | 2.0 s |
| PromptQuery v0.2 (TF-IDF + LLM selector) | 100.0% | ~5,000 | 5.6 s |

16 pp more accurate (100.0% vs. 84.0%) and ~10× cheaper per query (~5,000 vs. ~50,000 tokens) than dumping the full schema.

What changed

  • LLM table selector — TF-IDF (now stemmed) narrows to ~50 candidates, then a cheap model picks the ~15 actually relevant tables. Handles semantic gaps that TF-IDF cannot (e.g. invoice → account_move, shipment → stock_picking).
  • CLI flags: --selector-model, --select, --no-selector.
  • Reasoning-class OpenAI models (gpt-5.x, o-series) now use max_completion_tokens correctly.
  • End-to-end + parsing benches, committed Odoo schema fixture, Docker compose for reproducible Postgres.
  • 37 tests, all passing.
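
The two-stage selection described in the first bullet can be sketched as below. This is an illustrative approximation, not the package's implementation: `tfidf_rank`, `select_tables`, and `stub_selector` are hypothetical names, the TF-IDF scoring is a simplified unstemmed variant, the toy schema is invented, and the stub replaces the call to a cheap model.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, so stock_picking -> stock, picking."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(question: str, tables: dict[str, str], top_k: int) -> list[str]:
    """Stage 1: rank tables by TF-IDF overlap with the question, keep top_k."""
    docs = {name: Counter(tokenize(f"{name} {cols}")) for name, cols in tables.items()}
    n = len(docs)
    # Document frequency: in how many tables each token appears.
    df = Counter(tok for counts in docs.values() for tok in counts)
    q_tokens = set(tokenize(question))

    def score(counts: Counter) -> float:
        return sum(
            counts[t] * math.log((1 + n) / (1 + df[t])) for t in q_tokens if t in counts
        )

    ranked = sorted(docs, key=lambda name: score(docs[name]), reverse=True)
    return ranked[:top_k]


def select_tables(
    question: str,
    tables: dict[str, str],
    selector: Callable[[str, list[str]], list[str]],
    n_candidates: int = 50,
    n_final: int = 15,
) -> list[str]:
    """Stage 2: let a selector pick the relevant tables among the candidates."""
    candidates = tfidf_rank(question, tables, n_candidates)
    chosen = selector(question, candidates)
    # Drop any name the selector returns that was never a candidate.
    return [t for t in chosen if t in candidates][:n_final]


def stub_selector(question: str, candidates: list[str]) -> list[str]:
    """Stand-in for the cheap LLM: bridges word-to-table gaps with a fixed map."""
    hints = {"invoice": "account_move", "shipment": "stock_picking"}
    q = question.lower()
    return [table for word, table in hints.items() if word in q and table in candidates]


# Invented toy schema: table name -> space-separated column names.
schema = {
    "account_move": "id partner_id move_type state amount_total",
    "stock_picking": "id partner_id scheduled_date state",
    "res_partner": "id name email",
    "sale_order": "id partner_id date_order amount_total",
}

question = "Total amount of invoices per partner"
top2 = tfidf_rank(question, schema, top_k=2)
picked = select_tables(question, schema, stub_selector, n_candidates=3, n_final=2)
print(top2, picked)
```

In this toy run TF-IDF alone cannot tell `account_move` from `sale_order`, since both match only on `amount` and `total`; the selector stage is what ties the word "invoices" to `account_move`.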

Reproduce

python -m eval.parsing_bench \
  --fixture eval/fixtures/odoo.schema.json \
  --questions eval.questions.odoo \
  --model gpt-4o --selector-model gpt-4o-mini