Skip to content

fix: sql --commit executed DML twice and reported COMMITTED on aborted transactions (PGC-104)#17

Merged
jackschultz merged 2 commits into
mainfrom
commit-fix
Jun 10, 2026
Merged

fix: sql --commit executed DML twice and reported COMMITTED on aborted transactions (PGC-104)#17
jackschultz merged 2 commits into
mainfrom
commit-fix

Conversation

@jackschultz

Copy link
Copy Markdown
Owner

Critical fix to the guarded gateway, found via live dogfooding (solitaire data migration): the single-DML sample path executed the user statement twice in one transaction (wrapped for samples + bare for count), and an aborted transaction's COMMIT was silently downgraded by Postgres to ROLLBACK while pgcrate printed COMMITTED.

  • Exactly-once execution: single writable-CTE pass yields count + samples together; verified adversarially with AFTER-INSERT trigger counters and nextval() side effects, plus ON CONFLICT, UPDATE...FROM, RETURNING, multi-statement, and non-transactional paths.
  • Honest failure: any in-transaction error propagates as "ROLLED BACK — nothing changed", exit 10, ok:false in JSON. SET CONSTRAINTS ALL IMMEDIATE before COMMIT defeats the silent COMMIT→ROLLBACK downgrade for deferred constraints (probed live).
  • Sample column ordering preserved (row_to_json + explicit key list; review fix).
  • SKILL.md updated to the honest-abort contract.
  • 768 tests green; all new regression tests are deliberately non-idempotent (INSERT vs UNIQUE) — idempotent UPDATEs are what masked the original bug.

Not in any published release (crates.io 0.6.0 predates the gateway). Blocks 0.7.0.

…ED (PGC-104)

Root cause: the single-DML-no-RETURNING write path executed the user
statement twice in one transaction — once bare for the affected count,
once RETURNING-wrapped for the sample. Both effects committed (2 rows for
a 1-row INSERT). When the second execution hit a unique violation the
transaction aborted; the wrap error was silently swallowed (Err(_) =>
Ok(None)) and COMMIT — which Postgres downgrades to ROLLBACK on an aborted
tx without raising — reported success. Net: false "COMMITTED" with zero
rows written, the worst failure for a trust tool.

Fix:
- Execute each user statement exactly once. Single DML without RETURNING
  now uses one writable-CTE query that yields both the count and the
  sample (count(*) + json_agg over the CTE; the CTE materializes once).
- No swallowed errors: any DB error inside the transaction rolls back and
  propagates, framed "ROLLED BACK — nothing changed" → exit 10 (human +
  JSON), never COMMITTED.
- Harden COMMIT against deferred-constraint silent-rollback: run
  SET CONSTRAINTS ALL IMMEDIATE before COMMIT and propagate any violation.

- Built: rewritten execute_in_tx / new execute_dml_with_sample +
  sample_from_json + rolled_back helpers in src/commands/sql_cmd.rs;
  7 non-idempotent regression tests (INSERT vs UNIQUE) in tests/commands/sql.rs.
- Validation: cargo fmt --check clean, clippy clean on changed files,
  cargo test --no-fail-fast 767 passed / 0 failed (baseline 760 + 7).
  Live verified against PG18 mirroring the solitaire repro: single
  execution, honest ROLLED BACK + exit 10 on abort, data unchanged.
- Notes: every user-SQL path now executes once (EXPLAIN cost-gate plans
  only, no ANALYZE). Board status update left to orchestrator.
- Fixed: dry-run/commit sample columns were alphabetized (to_jsonb +
  sorted-map key inference). Switched the CTE wrap to row_to_json (text,
  order-preserving) and an explicit ordered column-name array so previews
  show columns in statement/table-definition order.
- Fixed: SKILL.md didn't reflect the new behavior — a dry-run of a
  constraint-violating statement now fails (exit 10, ROLLED BACK) rather
  than reporting a clean DRY RUN. Documented exactly-once + honest abort
  (including deferred constraints) in the writes section.
- Added: regression test asserting the dry-run sample preserves
  table-definition column order.
- Noted: empirically verified exactly-once via an insert-trigger side-effect
  counter (3-row insert -> 3 trigger fires) and nextval observation; deferred
  INITIALLY DEFERRED unique violation under --commit caught by SET CONSTRAINTS
  ALL IMMEDIATE (exit 10, table unchanged, never COMMITTED). ON CONFLICT,
  UPDATE ... FROM, volatile functions, multi-statement and non-transactional
  error paths all execute once / report honestly.
@jackschultz jackschultz merged commit c520b28 into main Jun 10, 2026
0 of 2 checks passed
@jackschultz jackschultz deleted the commit-fix branch June 10, 2026 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant