Releases: Cyberfilo/PromptQuery
v0.3.0 — enum-aware prompts, execution-guided self-repair
Generation-quality release. Retrieval was already measuring at 98–100% recall on a 211-table benchmark; the misses were in the SQL itself. 0.3.0 attacks the three error classes that benchmarking surfaced — and re-measures.
Measured result
Same 100-question suite, same 211-table Postgres schema, same model (gpt-4o, temperature 0, single-state EX@1) — only the package version changed:
| Metric | 0.2.2 | 0.3.0 |
|---|---|---|
| Execution accuracy (EX@1) | 58.0% | 72.0% |
| Hard DB errors | 7/100 | 0/100 |
| Soft-F1 | 60.2 | 73.9 |
| Set-Recall | 98.0% | 99.0% |
| Tokens/query | 4,257 | 4,689 |
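As a rough reading aid, the trade-off in the table can be worked out in a few lines. This is only arithmetic on the figures above; "tokens per correct answer" is a derived quantity introduced here for illustration, not a metric reported by the benchmark.

```python
# Figures copied from the table above; everything computed below is derived.
results = {
    "0.2.2": {"ex_at_1": 58.0, "tokens_per_query": 4257},
    "0.3.0": {"ex_at_1": 72.0, "tokens_per_query": 4689},
}

old, new = results["0.2.2"], results["0.3.0"]

# Absolute accuracy gain in percentage points.
accuracy_gain_pp = new["ex_at_1"] - old["ex_at_1"]

# Relative increase in tokens per query.
token_overhead_pct = 100 * (new["tokens_per_query"] / old["tokens_per_query"] - 1)


def tokens_per_correct(row):
    """Tokens spent per correctly answered question (derived, not measured)."""
    return row["tokens_per_query"] / (row["ex_at_1"] / 100)


print(f"EX@1 gain: {accuracy_gain_pp:+.1f} pp")
print(f"Token overhead: {token_overhead_pct:+.1f}%")
print(
    "Tokens per correct answer: "
    f"{tokens_per_correct(old):.0f} -> {tokens_per_correct(new):.0f}"
)
```

Read this way, the extra tokens per query are more than offset by the higher accuracy: fewer tokens are spent per correct answer in 0.3.0 than in 0.2.2.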
What's new
- Enum-aware schema prompts: the prompt now carries every enum column's legal values and column comments, read from `pg_catalog`. The model filters on real states (`status = 'delivered'`) instead of guessing them or inferring state from timestamps. Largest single win: lexical-gap questions went 36 → 73.
- Execution-guided self-repair (`--max-repair`, default 1): when Postgres rejects a query, the SQL plus the database's own error go back to the model for one corrected attempt. Repaired SQL is re-validated by the sqlglot guard (and re-confirmed in the REPL) before it runs. Empty results never trigger repair, because empty is often the right answer.
- Tighter generation rules: exactly the columns asked for, no speculative filters, INNER JOIN unless the question implies otherwise.
- Fix: bare `o4-*` model names now infer the OpenAI provider.
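The self-repair behaviour described above can be sketched as a small retry loop. This is a minimal illustration, not PromptQuery's actual code: `run_with_repair`, `QueryRejected`, and the three callables are hypothetical names, and the validator stands in for the sqlglot guard.

```python
from typing import Callable, Optional


class QueryRejected(Exception):
    """Hypothetical stand-in for the error raised when the database rejects a query."""


def run_with_repair(
    question: str,
    generate_sql: Callable[[str, Optional[str], Optional[str]], str],
    validate_sql: Callable[[str], None],
    execute_sql: Callable[[str], list],
    max_repair: int = 1,
) -> list:
    """Run generated SQL, feeding database errors back for up to max_repair retries."""
    sql = generate_sql(question, None, None)
    attempts_left = max_repair
    while True:
        # The guard runs before every execution, repaired or not.
        # A guard failure propagates; only database rejections are repaired.
        validate_sql(sql)
        try:
            # An empty list is returned as-is: empty results never trigger repair.
            return execute_sql(sql)
        except QueryRejected as err:
            if attempts_left == 0:
                raise
            attempts_left -= 1
            # Send the failed SQL and the database's own message back to the model.
            sql = generate_sql(question, sql, str(err))


# Tiny demonstration with fake callables in place of a model and a database.
def fake_generate(question, failed_sql, error):
    return "SELECT 1" if error else "SELEC 1"


def fake_execute(sql):
    if not sql.startswith("SELECT"):
        raise QueryRejected('syntax error at or near "SELEC"')
    return [(1,)]


rows = run_with_repair("one?", fake_generate, lambda sql: None, fake_execute)
```

In the demonstration the first attempt is rejected, the error text goes back to the fake generator, and the corrected query returns its rows. Passing the callables in keeps the loop independent of any particular model client or database driver.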
Honest negatives from the same run: analytical questions (window functions) stay at 20%, and multi-join dipped 58 → 50 on a 12-question bucket. Both are composition limits, queued for the next release.
Full details in the CHANGELOG.
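For readers curious how the enum-aware prompt in this release might be assembled, here is a minimal sketch. The catalog query uses standard `pg_catalog` tables and `col_description`, but it is an assumption about the approach, not the query PromptQuery runs; `render_enum_hints`, the `orders` table, and the output format are hypothetical.

```python
# Catalog query listing enum-typed columns with their labels and comments.
# Assumption: illustrative only, not necessarily what PromptQuery executes.
ENUM_COLUMNS_SQL = """
SELECT c.relname AS table_name,
       a.attname AS column_name,
       array_agg(e.enumlabel ORDER BY e.enumsortorder) AS labels,
       col_description(a.attrelid, a.attnum) AS comment
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_type t ON t.oid = a.atttypid
JOIN pg_catalog.pg_enum e ON e.enumtypid = t.oid
WHERE c.relkind = 'r'
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY c.relname, a.attname, a.attrelid, a.attnum
ORDER BY c.relname, a.attnum
"""


def render_enum_hints(rows):
    """Turn (table, column, labels, comment) rows into prompt lines."""
    lines = []
    for table, column, labels, comment in rows:
        values = ", ".join(f"'{label}'" for label in labels)
        line = f"{table}.{column} allowed values: {values}"
        if comment:
            line += f"  -- {comment}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical rows shaped like the query's result set.
sample_rows = [
    ("orders", "status", ["pending", "shipped", "delivered"], "Fulfilment state"),
]
hints = render_enum_hints(sample_rows)
print(hints)
```

With the legal values spelled out in the prompt, the model has no reason to invent a state name or to infer delivery from a timestamp column.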
PromptQuery v0.2.0
Headline
On Odoo's real 675-table production schema, gpt-4o (SQL) + gpt-4o-mini (selector):
| Pipeline | Pass rate | Tokens/query | Avg latency |
|---|---|---|---|
| Naive (full schema in prompt) | 84.0% | ~50,000 | 3.4 s |
| PromptQuery v0.1 (TF-IDF only) | 76.0% | ~2,000 | 2.0 s |
| PromptQuery v0.2 (TF-IDF + LLM selector) | 100.0% | ~5,000 | 5.6 s |
v0.2 is 16 pp more accurate than the naive full-schema prompt and uses ~10× fewer tokens per query.
What changed
- LLM table selector: TF-IDF (now stemmed) narrows to ~50 candidates, then a cheap model picks the ~15 actually relevant tables. Handles semantic gaps that TF-IDF cannot (e.g. invoice → `account_move`, shipment → `stock_picking`).
- CLI flags: `--selector-model`, `--select`, `--no-selector`.
- Reasoning-class OpenAI models (gpt-5.x, o-series) now use `max_completion_tokens` correctly.
- End-to-end + parsing benches, committed Odoo schema fixture, Docker compose for reproducible Postgres.
- 37 tests, all passing.
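The two-stage selection described in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: `tfidf_rank`, `select_tables`, and `llm_select` are hypothetical names, stemming is omitted for brevity, and the final filter against the candidate set is a design choice of this sketch.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric runs, so "account_move" yields ["account", "move"].
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(question: str, table_docs: Dict[str, str], top_k: int) -> List[str]:
    """Rank tables by the summed TF-IDF weight of the question terms they contain."""
    counts = {name: Counter(tokenize(doc)) for name, doc in table_docs.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    query_terms = set(tokenize(question))

    def score(c: Counter) -> float:
        total = sum(c.values()) or 1
        return sum(
            (c[t] / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t in query_terms
            if t in c
        )

    return sorted(counts, key=lambda name: score(counts[name]), reverse=True)[:top_k]


def select_tables(
    question: str,
    table_docs: Dict[str, str],
    llm_select: Callable[[str, List[str]], List[str]],
    n_candidates: int = 50,
    n_final: int = 15,
) -> List[str]:
    """Stage 1: TF-IDF shortlist. Stage 2: a cheap model picks from the shortlist."""
    candidates = tfidf_rank(question, table_docs, n_candidates)
    chosen = llm_select(question, candidates)
    # Keep only names the first stage offered, so an invented table cannot slip in.
    allowed = set(candidates)
    return [name for name in chosen if name in allowed][:n_final]


# Toy schema: each table is described by its name and column names.
table_docs = {
    "account_move": "account move partner id amount total state",
    "stock_picking": "stock picking partner id scheduled date",
    "res_partner": "res partner name email",
}
question = "total amount invoiced per partner"

# Fake selector standing in for the cheap model; it also returns a bogus name.
fake_llm = lambda q, cands: ["account_move", "made_up_table"]
selected = select_tables(question, table_docs, fake_llm, n_candidates=2, n_final=15)
```

The lexical stage keeps the selector's prompt small, while the model stage is where a mapping such as invoice → `account_move` would be made when no shared vocabulary exists.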
Reproduce
    python -m eval.parsing_bench \
      --fixture eval/fixtures/odoo.schema.json \
      --questions eval.questions.odoo \
      --model gpt-4o --selector-model gpt-4o-mini