Releases · Cyberfilo/PromptQuery

v0.3.0 — enum-aware prompts, execution-guided self-repair

Generation-quality release. Retrieval already measured 98–100% recall on a 211-table benchmark; the remaining misses were in the generated SQL itself. 0.3.0 targets the three error classes that benchmarking surfaced, then re-measures.

Measured result

Same 100-question suite, same 211-table Postgres schema, same model (gpt-4o, temperature 0, single-state EX@1) — only the package version changed:

Metric	0.2.2	0.3.0
Execution accuracy (EX@1)	58.0%	72.0%
Hard DB errors	7/100	0/100
Soft-F1	60.2	73.9
Set-Recall	98.0%	99.0%
Tokens/query	4,257	4,689
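
For readers unfamiliar with the headline metric, here is a minimal sketch of execution accuracy as it is commonly defined: a question counts as a hit when the generated SQL returns the same rows as a reference query. The release notes do not spell out the project's exact comparison, so the multiset comparison, the function names, and the use of in-memory `sqlite3` as a stand-in for Postgres are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def rows_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A hard database error counts as a miss.
        return False
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    hits = sum(rows_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


# Toy database standing in for the benchmark schema (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 'delivered'), (2, 'pending'), (3, 'delivered');
    """
)
pairs = [
    # Different SQL text, same rows: counts as a hit.
    ("SELECT id FROM orders WHERE status = 'delivered' ORDER BY id DESC",
     "SELECT id FROM orders WHERE status = 'delivered'"),
    # Wrong filter value: executes, but the rows differ.
    ("SELECT id FROM orders WHERE status = 'shipped'",
     "SELECT id FROM orders WHERE status = 'delivered'"),
    # Unknown column: hard database error.
    ("SELECT idd FROM orders", "SELECT id FROM orders"),
]
accuracy = execution_accuracy(conn, pairs)
```

The third pair illustrates why the "Hard DB errors" row matters: a query that never executes cannot score, regardless of how close its intent was.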

What's new

Enum-aware schema prompts — the prompt now carries every enum column's legal values and column comments, read from pg_catalog. The model filters on real states (status = 'delivered') instead of guessing them or inferring state from timestamps. Largest single win: lexical-gap questions went 36 → 73.
Execution-guided self-repair (--max-repair, default 1) — when Postgres rejects a query, the SQL plus the database's own error go back to the model for one corrected attempt. Repaired SQL is re-validated by the sqlglot guard (and re-confirmed in the REPL) before it runs. Empty results never trigger repair — empty is often the right answer.
Tighter generation rules — exactly the columns asked for, no speculative filters, INNER JOIN unless the question implies otherwise.
Fix — bare o4-* model names now infer the OpenAI provider.
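
As a rough illustration of the enum-aware prompt idea, the sketch below reads enum labels from the Postgres system catalogs and turns them into prompt lines. It is not PromptQuery's actual code: the query text, the `enum_prompt_lines` helper, and the wording of the prompt line are assumptions, and column comments are left out for brevity.

```python
from collections import defaultdict

# Catalog query: one row per (table, column, enum label), in declared order.
ENUM_VALUES_SQL = """
SELECT c.relname, a.attname, e.enumlabel
FROM pg_catalog.pg_attribute AS a
JOIN pg_catalog.pg_class AS c ON c.oid = a.attrelid
JOIN pg_catalog.pg_enum AS e ON e.enumtypid = a.atttypid
WHERE c.relkind = 'r' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY c.relname, a.attnum, e.enumsortorder
"""


def enum_prompt_lines(rows):
    """Group (table, column, label) rows into one prompt line per column."""
    values = defaultdict(list)
    for table, column, label in rows:
        values[(table, column)].append(label)
    return [
        f"{table}.{column} is an enum; legal values: "
        + ", ".join(f"'{v}'" for v in labels)
        for (table, column), labels in values.items()
    ]


# Hypothetical rows, shaped like the result of ENUM_VALUES_SQL.
sample_rows = [
    ("orders", "status", "pending"),
    ("orders", "status", "shipped"),
    ("orders", "status", "delivered"),
]
lines = enum_prompt_lines(sample_rows)
```

With a DB-API cursor, the rows would come from `cursor.execute(ENUM_VALUES_SQL)` followed by `cursor.fetchall()`. Listing the labels verbatim is what lets the model write `status = 'delivered'` rather than guess a value.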

Honest negatives from the same run: analytical questions (window functions) stay at 20%, and multi-join dipped from 58% to 50% on a 12-question bucket. Both are composition limits, queued for the next release.
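
The execution-guided self-repair listed under What's new can be sketched as a small retry loop. This is an illustration under stated assumptions, not the package's implementation: `answer_with_repair` and its three callables are hypothetical names, `is_safe` stands in for the sqlglot guard, and the REPL confirmation step is omitted. The `max_repair=1` default mirrors the `--max-repair` default given in the notes.

```python
def answer_with_repair(question, generate, is_safe, execute, max_repair=1):
    """Generate SQL, run it, and on a database error retry up to max_repair times.

    generate(question, failed_sql, error) -> SQL string
    is_safe(sql) -> bool   (stand-in for the validation guard)
    execute(sql) -> rows   (raises on a database error)
    """
    sql = generate(question, None, None)
    for attempt in range(max_repair + 1):
        # Every attempt, including a repaired one, passes the guard first.
        if not is_safe(sql):
            raise ValueError(f"guard rejected SQL: {sql}")
        try:
            # Any result, including an empty one, is returned as-is:
            # only a database error triggers a repair.
            return sql, execute(sql)
        except Exception as exc:
            if attempt == max_repair:
                raise
            # Feed the failed SQL and the database's error text back to the model.
            sql = generate(question, sql, str(exc))


# Fakes standing in for the model and the database.
def fake_generate(question, failed_sql, error):
    if failed_sql is None:
        return "SELECT idd FROM orders"  # first attempt has a typo
    return "SELECT id FROM orders"       # corrected after seeing the error


def fake_execute(sql):
    if "idd" in sql:
        raise RuntimeError('column "idd" does not exist')
    return []  # empty result: accepted, no repair


final_sql, rows = answer_with_repair(
    "list order ids",
    fake_generate,
    lambda s: s.upper().startswith("SELECT"),
    fake_execute,
)
```

Keeping the guard inside the loop means a repaired query gets the same scrutiny as the first one, which matches the re-validation the notes describe.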

Full details in the CHANGELOG.

PromptQuery v0.2.0

Headline

On Odoo's real 675-table production schema, gpt-4o (SQL) + gpt-4o-mini (selector):

Pipeline	Pass rate	Tokens/query	Avg latency
Naive (full schema in prompt)	84.0%	~50,000	3.4 s
PromptQuery v0.1 (TF-IDF only)	76.0%	~2,000	2.0 s
PromptQuery v0.2 (TF-IDF + LLM selector)	100.0%	~5,000	5.6 s

+16pp more accurate and ~10× cheaper per query than dumping the full schema.

What changed

LLM table selector — TF-IDF (now stemmed) narrows to ~50 candidates, then a cheap model picks the ~15 actually relevant tables. Handles semantic gaps that TF-IDF cannot (e.g. invoice → account_move, shipment → stock_picking).

CLI flags — --selector-model, --select, --no-selector.

Reasoning-class OpenAI models (gpt-5.x, o-series) now use max_completion_tokens correctly.

End-to-end + parsing benches, committed Odoo schema fixture, Docker compose for reproducible Postgres.

37 tests, all passing.
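
The two-stage table selection described under What changed might look roughly like the sketch below: a TF-IDF pass builds a shortlist, then a model picks from it. Everything here is illustrative. The function names, the scoring formula, and the `llm_select` callable (a stand-in for the cheap selector model) are assumptions, stemming is omitted, and the defaults of 50 and 15 simply mirror the approximate figures quoted above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_candidates(question, table_docs, k):
    """Rank tables by the summed TF-IDF weight of the question's terms."""
    n = len(table_docs)
    term_counts = {name: Counter(tokenize(doc)) for name, doc in table_docs.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    scores = {}
    for name, counts in term_counts.items():
        total = sum(counts.values()) or 1
        scores[name] = sum(
            (counts[term] / total) * math.log(n / doc_freq[term])
            for term in set(tokenize(question))
            if term in counts
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def select_tables(question, table_docs, llm_select, k_candidates=50, k_final=15):
    """Stage 1: TF-IDF shortlist. Stage 2: a model picks from the shortlist."""
    shortlist = tfidf_candidates(question, table_docs, k_candidates)
    picked = llm_select(question, shortlist)
    # Keep only names that were actually offered, in case the model invents one.
    return [t for t in picked if t in shortlist][:k_final]


# Hypothetical table descriptions: table name followed by its column names.
table_docs = {
    "account_move": "account_move id partner_id invoice_date amount_total state",
    "stock_picking": "stock_picking id partner_id scheduled_date state",
    "res_partner": "res_partner id name email",
}
question = "total amount per partner"
top = tfidf_candidates(question, table_docs, k=2)
chosen = select_tables(
    question,
    table_docs,
    llm_select=lambda q, names: ["account_move", "made_up_table"],
)
```

The filter in `select_tables` is a defensive choice: it prevents a hallucinated table name from reaching the SQL-generation prompt.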
