Avoid queued maintenance prune locks by hardbyte · Pull Request #349 · hardbyte/awa

hardbyte · 2026-06-12T10:44:00Z

Summary

Change automatic queue-storage prune/rollup child-table locks to ACCESS EXCLUSIVE NOWAIT so best-effort maintenance backs off instead of queueing relation locks ahead of workers.
Keep queue-storage metrics and maintenance probes off the receipt-ring hot path where possible.
Bound queue-storage maintenance emptiness checks and receipt rescue scans with partition-local probes/cursors.
Add per-claim-slot receipt-rescue cursors so rescue work advances through large receipt rings instead of repeatedly re-reading old closed evidence.
Keep lease pruning independent of receipt-rescue cursors.
Raise the default lease-ring rotation cadence from 250ms to 1000ms; faster rotation was only mutating the one-row lease-ring metadata more often under pinned MVCC horizons.
Refine receipt closure evidence: successful receipt completion uses done_entries as durable closure evidence, non-success paths keep explicit lease_claim_closures, and terminal-delete paths materialize a terminal_removed closure before removing terminal evidence.
Add migration v034_receipt_terminal_delete_closures.sql and set CURRENT_VERSION = 34.

Why

The #169 long-horizon gate run against main at 7be5bd8 failed with a maintenance lock convoy. During the pinned-MVCC phase, diagnostics caught LOCK TABLE awa.leases_N IN ACCESS EXCLUSIVE MODE waiting behind the pinned transaction. Because queued ACCESS EXCLUSIVE locks sit ahead of later relation lockers, best-effort maintenance could stall worker traffic even though the operation was supposed to back off quickly.

After the NOWAIT fix removed that convoy, later verification still found a smaller late-idle slowdown. The remaining pressure was not ready/done segment churn: those partitions stayed append-only with zero dead tuples. The residual shape was metadata/receipt coordination work: receipt rescue scans over growing closed-claim evidence, and frequent lease-ring singleton updates under a pinned snapshot. This PR follows that evidence instead of tuning around it.

The receipt-evidence refinement removes the duplicate success-path closure write without weakening safety. Successful receipt completion already writes the durable terminal fact synchronously; claim prune can treat that terminal row as closure evidence. Queue prune now refuses to delete terminal evidence while same-segment claim rows still depend on it, and cold terminal-delete paths first materialize an explicit closure if they remove the terminal fact before claim prune.

Long-Horizon Evidence

Shape: 1 Awa replica, 64 workers, fixed 800/s offered rate, completion batch 1024, phases warmup=5m -> clean_1=20m -> idle_1=idle-in-tx:60m -> recovery_1=10m, 30s metric samples, 5s wait-event samples, Postgres 18.3-alpine.

Failed Main Run

Result dir: results/custom-20260612T090023Z-412cc6, Awa 7be5bd8.

Phase	Median enqueue/s	Median complete/s	Final complete/s	Median depth	Peak/final depth	Relation-lock waits
`clean_1`	799.9	799.9	800.5	40	614 / 40	150
`idle_1`	799.9	716.3	726.2	45,932	296,212 / 296,212	500
`recovery_1`	799.9	947.2	1,072.2	245,900	297,226 / 179,792	31

NOWAIT-Only Branch

Result dir: results/custom-20260612T112048Z-c860b0, Awa ba8a4cf.

Phase	Median enqueue/s	Median complete/s	Final complete/s	Median depth	Peak/final depth	Relation-lock waits
`clean_1`	799.9	799.9	799.9	40	69 / 40	2
`idle_1`	799.9	799.3	678.4	165.5	46,772 / 46,772	0
`recovery_1`	799.9	827.6	798.6	83.5	39,783 / 20	4

NOWAIT fixed the relation-lock convoy, but it was not the whole #169 mechanism. Late-idle backlog was still visible.

Final Stacked Branch

Result dir: results/custom-20260612T195938Z-837e7d, Awa d1e1744 before the rebase onto #350. The rebased branch preserves the same #349 maintenance changes, then adds the receipt-evidence refinement and v034 compatibility migration.

Phase	Median enqueue/s	Median complete/s	Final complete/s	Median e2e p99	p90 e2e p99	Median depth	Peak/final depth	Relation-lock waits	Total-dead peak/final
`clean_1`	799.9	799.9	799.2	130.1 ms	264.3 ms	40.0	231.0 / 20.0	6.0	966.0 / 221.0
`idle_1`	799.9	799.9	799.5	156.0 ms	302.0 ms	40.0	283.0 / 20.0	3.0	7,419 / 7,419
`recovery_1`	800.0	800.0	800.2	167.6 ms	333.4 ms	40.0	60.0 / 40.0	0.0	3,577 / 252.0

The pinned-reader stress condition was active: oldest_idle_in_tx_age_s climbed to about 3,584s and snapshot_xmin stayed flat through idle_1.

Interpretation

This branch fixes the original named mechanism: maintenance relation-lock convoy. It also fixes the later receipt-rescue / lease-ring metadata drag enough that the 60-minute pinned-reader phase sustains the offered 800/s with bounded depth and sub-second p99 pickup.

The append-only segment contract held. In the final SQL snapshot, ready_entries_*, done_entries_*, ready_tombstones_*, queue_terminal_count_deltas_*, and receipt closure partitions had zero dead tuples. Residual dead tuples were in small mutable coordination/counter tables, mainly lease_ring_state, queue_terminal_live_counts, claim_ring_slots, and ring state rows. During idle_1, raw samples peaked at lease_ring_state=3,584, queue_terminal_live_counts=3,328, and claim_ring_slots=343 dead tuples.

pg_stat_statements still identifies claim_ready_runtime as the load-bearing query: final snapshot showed about 235k calls, 4.56M claimed rows, and 17.2ms mean time. Completion was much cheaper: about 181k batched completion calls, 4.56M completed rows, and 1.7ms mean. The remaining claim cost is consistent with bounded receipt/coordination checks, not ready/done heap churn. WAL stayed around 2.3KB per completed job and about 20 WAL records per completed job during the pinned-reader phase.

This is a good #169 gate result for the 800/s long-horizon shape. It is not evidence for the 5k/s or 10k/s ceiling; those still need separate offered-rate runs after this lands or while CI runs.

Local Verification

cargo fmt --all -- --check
cargo check --package awa-model --all-features
cargo clippy --package awa-model --all-targets --all-features -- -D warnings
DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test migration_test — 42 passed
DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa-model --test sql_only_storage_upgrade_test — 2 passed
DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test — 99 passed
DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test prune — 14 passed

TLC:

AwaSegmentedStorage.cfg, AwaSegmentedStorageInterleavings.cfg
AwaDeadTupleContract.cfg
AwaStorageLockOrder.cfg
AwaSegmentedStorageRacesSafe.cfg, AwaSegmentedStorageRacesMultiWorker.cfg
AwaShardedPrune.cfg
AwaPartitionedQueueRouting.cfg
AwaStorageTransition.cfg
Positive trace witnesses: snooze, receipt rescue, lost-claim advance, running cancel, receipt-only cancel, callback wait, DLQ retry, DLQ purge. Each reached its expected *TraceIncomplete violation.
Negative witnesses: broken storage trace deadlocked as expected; lock-order deadlock demo violated NoDeadlock as expected.

Review Notes

CodeRabbit's stale-rescue budget concern is already addressed in the current branch: rescue attempts the oldest initialized claim slot before sweeping the remaining slots.
CodeRabbit's non-lock-error masking concern is addressed in 326cdd6: prune TRUNCATE failures now return Blocked only for lock contention and propagate other database errors.

Summary by CodeRabbit

New Features
- Added deadline-rescue cursor support and enhanced per-slot receipt rescue sweeps for queue cleanup.
Bug Fixes
- Improved terminal deletion to ensure closure evidence is materialized before removing terminal rows, preserving receipt-plane correctness.
Performance Improvements
- Increased default lease rotation interval to 1000ms.
- Updated pruning/rotation locking to fail fast (NOWAIT) under contention.
- Optimized queue-storage health metrics to use presence checks.
Documentation
- Refreshed architecture/ADR/config and Grafana dashboard text to match the new receipt and locking behavior.

coderabbitai · 2026-06-12T10:44:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be80b1d5-9335-41cd-a27f-aff2913ef2d2

📥 Commits

Reviewing files that changed from the base of the PR and between c650aee and 53f6426.

📒 Files selected for processing (6)

awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
awa-model/src/queue_storage.rs
awa/tests/queue_storage_runtime_test.rs
correctness/storage/AwaStorageLockOrder.tla
correctness/storage/MAPPING.md
correctness/storage/README.md

✅ Files skipped from review due to trivial changes (1)

correctness/storage/README.md

🚧 Files skipped from review as they are similar to previous changes (3)

correctness/storage/AwaStorageLockOrder.tla
awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
correctness/storage/MAPPING.md

📝 Walkthrough

Walkthrough

This PR extends the queue-storage receipt-plane with per-claim-slot rescue cursors for both stale and deadline-based rescue scanning. It introduces migrations v033–v035 to install cursor columns and materialize terminal-delete closure evidence, refactors stale-receipt rescue from full-table scans to cursor-driven per-slot scanning with configurable batch limits, replaces COUNT(*) busy checks with lightweight EXISTS probes, and switches maintenance lock acquisitions to NOWAIT mode to gracefully degrade under contention. The default lease-rotation interval increases from 250ms to 1000ms, observability metrics are updated to avoid receipt-plane aggregation, and formal correctness models and documentation are synchronized with the new bounded-cursor and NOWAIT-lock semantics.

Changes

Receipt-Plane Cursor & Maintenance Locks

Layer / File(s)	Summary
Database substrate extensions for stale & deadline rescue cursors `awa-model/migrations/v023_install_queue_storage_substrate.sql`, `awa-model/migrations/v033_receipt_rescue_cursors.sql`, `awa-model/migrations/v034_receipt_terminal_delete_closures.sql`, `awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql`, `awa-model/src/migrations.rs`	v023 extends `claim_ring_slots` with `rescue_cursor_claimed_at`, `rescue_cursor_job_id`, `rescue_cursor_run_lease` columns and adds `lease_claims` rescue-cursor index; v033 installs these columns on existing schemas; v034 adds `awa.delete_job_compat(p_id)` function materializing `terminal_removed` closure evidence before deletion; v035 introduces deadline-rescue cursor columns; migration registry updated with `CURRENT_VERSION` incremented to 35 and new step references.
Cursor-driven stale-receipt and deadline rescue with batching and closed-prefix stopping `awa-model/src/queue_storage.rs`	Receipt rescue refactored from single-pass query to per-slot cursor scanning: adds `RECEIPT_RESCUE_BATCH_LIMIT` and `RECEIPT_RESCUE_CURSOR_SCAN_LIMIT` constants; wraps per-slot rescue in batch-limit loop; reads rescue cursor position; selects stale candidates relative to cursor; inserts `'rescued'` closures; advances `claim_ring_slots` rescue cursor through contiguous closed prefix; halts before stale open claims. Deadline rescue similarly scans by `deadline_at` ordering, inserts `'deadline_expired'` closures, advances deadline cursor, stops before first open future-deadline claim. Both resets cursor to sentinel values (`-infinity`, `0`) after claim-ring truncation.
Terminal-delete closure materialization and receipt-closed-evidence predicate `awa-model/src/queue_storage.rs`	Introduces `ensure_terminal_removed_receipt_closures_tx` to insert `'terminal_removed'` receipt-closure rows when terminal jobs are removed; integrates into retry-from-terminal and admin-delete flows. Adds shared `receipt_closed_evidence_sql` predicate consolidating closure and terminal evidence checks across `lease_claim_closures`, `done_entries`, `deferred_jobs`, and `dlq_entries`. Refactors completion, cancel, and materialization paths to gate on `NOT {closed_evidence}` instead of closure-only checks, ensuring consistent receipt-plane semantics.
Bounded busy checks via EXISTS probes and presence indicators `awa-model/src/queue_storage.rs`	New `busy_indicator` helper converts EXISTS-style booleans to standardized presence signals (0/1); new `relation_has_rows_tx` helper performs transaction-aware `SELECT EXISTS` probes; replaces `COUNT(*)` queries with lightweight EXISTS throughout rotation/prune validation; returns `BusyCounts` derived from non-zero presence indicators instead of row counts.
Queue/lease/claim rotation with NOWAIT locks and presence checks `awa-model/src/queue_storage.rs`	`rotate_queue`, `rotate_leases`, and `rotate_claims` updated to use `relation_has_rows_tx` presence checks; busy checks return `SkippedBusy` with `busy_indicator`-derived `BusyCounts`; partition locks changed to `ACCESS EXCLUSIVE MODE NOWAIT`; graceful Blocked outcomes on contention.
Queue/lease/claim prune with NOWAIT locks, presence checks, and cursor reset `awa-model/src/queue_storage.rs`	`prune_oldest`, `prune_oldest_leases`, and `prune_oldest_claims` acquire child-partition locks with `NOWAIT`; active/pending/open row checks replaced with EXISTS-based booleans via `receipt_closed_evidence_sql`; return values populated with `busy_indicator` presence signals; prune resets both rescue and deadline cursor fields to sentinel values after successful truncation; errors and lock contention produce graceful `Blocked`/`SkippedActive` outcomes.
Health metrics bounded observability and contention-safe publication `awa-worker/src/maintenance.rs`	`publish_queue_storage_health_metrics` SQL query removes `receipt_claims` CTE/LEFT JOIN aggregation; running depth computed from `leases.running` only; metrics test validates completion under locked receipt tables via `tokio::time::timeout` assertion, confirming publication does not block on receipt-plane access.
Configuration defaults: lease_rotate_interval 250ms → 1000ms `awa-python/src/client.rs`, `awa-worker/src/client.rs`, `awa-worker/src/maintenance.rs`, `docs/configuration.md`, `docs/architecture.md`	Lease rotation interval default changed from 250ms to 1000ms across Rust worker `ClientBuilder::new`, Python `PyClient.start()`, and documentation examples; maintenance comments updated to reflect new cadence; branch-cooldown and prune-backoff estimates adjusted accordingly.
Runtime tests for cursor behavior, deadline rescue, and lock contention `awa/tests/queue_storage_runtime_test.rs`, `awa/tests/queue_storage_benchmark_test.rs`	New/updated tests verify claim rescue cursor reset-on-prune isolation, lease-prune cursor preservation, partition routing with direct store API, receipt-backed cancel closure materialization, SQL compat delete closure insertion, deadline rescue cursor advancement past terminal evidence with closed-prefix blocking; lock contention classification updated for `55P03` SQLSTATE; prune-after-completion test resilient to transient `Blocked` outcomes; timing parameters adjusted from 250ms to 2s for heartbeat staleness; completion closure-count expectations updated for terminal evidence patterns; test helpers improved for cursor aging and atomic release signaling.
Test updates for NOWAIT behavior and partition routing `awa/tests/receipt_plane_chaos_test.rs`	Chaos test updated to reflect `NOWAIT` fail-fast behavior; test schema and assertion messages updated to describe immediate `Blocked` returns under concurrent access.
Correctness models: dead-tuple, cursor mutations, and NOWAIT locking `correctness/storage/AwaDeadTupleContract.tla`, `correctness/storage/AwaSegmentedStorage.tla`, `correctness/storage/AwaStorageLockOrder.tla`	TLA+ specs extended: `RescueReceiptsTx` and `RescueReceiptDeadlinesTx` now model `Update` mutations on `claim_ring_slots`; terminal-delete transactions model closure insertion before `done_entries` deletion; `AwaSegmentedStorage` adds `ClaimKeyJob` operator and strengthens `PruneReadySegment` with closure-evidence requirements; `AwaStorageLockOrder` models `claim_ring_slots` exclusive locks before cursor updates, introduces `TerminalDeletePlan`, and annotates all prune partition locks with `ACCESS EXCLUSIVE NOWAIT`.
Model mapping and documentation: variables, actions, and NOWAIT procedures `correctness/storage/MAPPING.md`, `correctness/storage/README.md`	Variable mapping updated with `heartbeatFresh` and refined `claimSegments[seg]`/`claim_ring_slots` cursor semantics; action mapping refreshed for claim, prune, and rescue sequences with explicit NOWAIT contention details; prune readiness narrative reframed from "check-then-act" to "lock-then-check-then-act"; receipt-plane lifecycle and closure-evidence semantics documented; lock-order coverage notes explicitly reference `NOWAIT` behavior and excluded Postgres deadlock-detector abort modeling.
Architecture Decision Records: cursors, deadline-rescue, terminal-delete, and NOWAIT locks `docs/adr/019-queue-storage-redesign.md`, `docs/adr/023-receipt-plane-ring-partitioning.md`, `docs/adr/026-narrow-terminal-history.md`, `docs/adr/032-failed-terminal-retention.md`, `docs/architecture.md`	ADR-019 updated to describe best-effort `ACCESS EXCLUSIVE NOWAIT` prune locking and graceful contention handling; ADR-023 introduces `claim_ring_slots` as control-plane for per-slot stale/deadline rescue cursors, specifies bounded anti-join reads over partitioned claims/closures, and describes cursor advance/stop semantics; ADR-026 clarifies fused completion and terminal-delete closure materialization; ADR-032 updated for NOWAIT lock semantics; architecture doc expanded on cursor-driven scans and bounded stale-rescue access paths.
Observability documentation: Grafana metrics and configuration `docs/grafana/awa-dashboard.json`, `docs/configuration.md`, `docs/benchmarking.md`	Grafana panels reframed to describe rotation-skip metrics as bounded presence signals rather than row-count magnitudes; "Rotation skip magnitudes p95" renamed to "Rotation skip blocker presence p95"; configuration docs update defaults to 1000ms and document per-slot rescue-cursor presence in `claim_ring_slots`; benchmarking docs update completion hot-path description to reflect fused receipt-claim-lock + terminal-insert + count-delta statement sequence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues

hardbyte/awa#246: Per-slot rescue cursors and NOWAIT prune locks directly address deadline-rescue performance degradation at high concurrency by replacing full historical scans with bounded, cursor-driven incremental windows.
hardbyte/awa#197: Implements receipt-rescue cursors, deadline-rescue cursors, terminal-delete closure materialization, and related correctness/testing improvements tracked in the release readiness meta-issue.

Possibly related PRs

hardbyte/awa#306: Introduces queue_ring_state.terminal_counter_trusted_at which v033/v034/v035 migrations check and conditionally backfill to gate exact terminal-count derivation.
hardbyte/awa#310: Extends the same v023 substrate installer that this PR enhances with rescue-cursor and deadline-cursor columns and new lease_claims indexing.

Poem

🐰 Cursors hop through claim-ring slots so neat,
No more COUNT stars blocking rescue's beat,
NOWAIT locks fail-fast, no queue to defeat,
Deadline sweeps stop before open, tight and sweet! 🌙

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

hardbyte · 2026-06-13T06:42:01Z

5k/s offered-load check after rebase on #350

Run: results/custom-20260613T061209Z-c20e35

Shape:

Awa only, revision f4373e0 on codex/maintenance-prune-nowait
benchmark harness cb61d2f on codex/awa-lease-rotate-default
PG postgres:18.3-alpine
--producer-rate 5000 --producer-mode fixed --worker-count 64 --awa-completion-batch-size 1024
phases: warmup=2m, clean_5k=20m, recovery_5k=5m
sample every 10s, wait-event sample every 2s

Results:

Phase	enqueue median	completion median	completion tail median	queue depth median	queue depth peak/final	dead tuples peak/final
`clean_5k`	5,003/s	842/s	768/s	2.85M	5.29M / 5.29M	539 / 167
`recovery_5k`	5,003/s	794/s	768/s	5.96M	6.58M / 6.58M	702 / 217

Wait-event shape during clean_5k:

CPU: 570 / 1188 active samples (48.0%)
WalSync: 338 / 1188 (28.5%)
WALWrite: 154 / 1188 (13.0%)
relation lock: 15 / 1188 (1.3%)

Interpretation:

The old maintenance relation-lock convoy is not the limiting mechanism in this run; relation waits are low.
Dead tuples remain bounded despite the overload. The append-only / partition-truncate story is holding.
The producer path absorbs ~5k/s, but single-queue drain is far below the offered rate and gets worse under a multi-million-row live backlog.
The named residual mechanism here is live-backlog WAL amplification: when offered load outruns completion, Awa accumulates millions of live ready/history rows; claim and completion keep doing durable append/closure work, and Postgres spends active time mostly on CPU + WAL sync/write rather than relation locks or vacuum cleanup.

This is evidence that #349 fixes the queued-prune-lock failure mode, but it is not evidence that a single logical queue can sustainably absorb 5k/s end-to-end on this machine. The next throughput lever is still queue fanout/striping or a deeper claim/receipt batching redesign that reduces per-completion durable writes.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4373e0e8e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-13T06:45:38Z

+                    ORDER BY claims.claimed_at, claims.job_id, claims.run_lease
+                    LIMIT $3


Avoid stranding stale claims behind a fresh blocker

When a healthy long-running receipt claim sits just after the rescue cursor, this LIMIT $3 confines every rescue pass to the same first 10k rows because the cursor advancement below stops before the first non-advanceable row. Any stale open claims beyond that window are never considered until the healthy blocker closes or becomes stale, so large claim partitions can leave jobs stuck in running even though maintenance keeps firing; the new test only covers a stale tail within the limited window.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

awa-model/src/queue_storage.rs (1)
11664-11673: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only map actual NOWAIT lock contention to Blocked. These three branches currently treat any LOCK TABLE ... NOWAIT failure as contention. That masks real database errors behind the maintenance backoff path, so the runtime can keep reporting Blocked instead of surfacing a real failure.

awa-model/src/queue_storage.rs#L11664-L11673: match the lock error and return PruneOutcome::Blocked only for 55P03; propagate other errors with map_sqlx_error.

awa-model/src/queue_storage.rs#L11999-L12008: apply the same is_lock_contention_error check before converting the failure to Blocked.

awa-model/src/queue_storage.rs#L12227-L12236: keep the claim-prune NOWAIT semantics, but do not swallow non-lock failures into Blocked.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/src/queue_storage.rs` around lines 11664 - 11673,
awa-model/src/queue_storage.rs:11664-11673 — change the current unconditional
match-on-lock failure to inspect the sqlx error and only return
PruneOutcome::Blocked when the DB error code is the Postgres lock contention
code "55P03"; for any other error propagate it via map_sqlx_error (or return
Err(map_sqlx_error(err))) instead of swallowing it into Blocked.
awa-model/src/queue_storage.rs:11999-12008 — apply the same selective check:
treat the NOWAIT lock failure as Blocked only when the error is "55P03",
otherwise propagate the error with map_sqlx_error.
awa-model/src/queue_storage.rs:12227-12236 — for the claim-prune NOWAIT path
keep NOWAIT semantics but do not convert non-lock errors to
PruneOutcome::Blocked; detect "55P03" and return Blocked only for that code, and
for other sqlx errors propagate them using map_sqlx_error. Use a small helper or
inline check on err.as_database_error().and_then(|d| d.code()) == Some("55P03")
to identify lock contention.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8203-8213: The loop that rescues stale receipts (using
claim_slot_count() and rescue_stale_receipt_claims_for_slot_tx) can exhaust the
global remaining batch on low-numbered slots and never reach the slot that
prune_oldest_claims is waiting on; change the algorithm to identify the
prune-blocking slot first (e.g. by calling the existing prune_oldest_claims() or
the routine that returns the oldest-generation slot) and attempt rescue for that
slot before iterating other slots, subtracting its rescued count from remaining,
then iterate the remaining slots (skipping the already-handled prune-blocking
slot) so the global remaining budget cannot be fully consumed by other slots
before the prune-blocking slot is serviced; ensure you still break when
remaining==0 and preserve error handling around
rescue_stale_receipt_claims_for_slot_tx.

---

Outside diff comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 11664-11673: awa-model/src/queue_storage.rs:11664-11673 — change
the current unconditional match-on-lock failure to inspect the sqlx error and
only return PruneOutcome::Blocked when the DB error code is the Postgres lock
contention code "55P03"; for any other error propagate it via map_sqlx_error (or
return Err(map_sqlx_error(err))) instead of swallowing it into Blocked.
awa-model/src/queue_storage.rs:11999-12008 — apply the same selective check:
treat the NOWAIT lock failure as Blocked only when the error is "55P03",
otherwise propagate the error with map_sqlx_error.
awa-model/src/queue_storage.rs:12227-12236 — for the claim-prune NOWAIT path
keep NOWAIT semantics but do not convert non-lock errors to
PruneOutcome::Blocked; detect "55P03" and return Blocked only for that code, and
for other sqlx errors propagate them using map_sqlx_error. Use a small helper or
inline check on err.as_database_error().and_then(|d| d.code()) == Some("55P03")
to identify lock contention.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 471a5d06-8b88-4b50-a8fe-f710dce2586d

📥 Commits

Reviewing files that changed from the base of the PR and between 58eb350 and f4373e0.

📒 Files selected for processing (21)

awa-model/migrations/v023_install_queue_storage_substrate.sql
awa-model/migrations/v033_receipt_rescue_cursors.sql
awa-model/src/migrations.rs
awa-model/src/queue_storage.rs
awa-python/src/client.rs
awa-worker/src/client.rs
awa-worker/src/maintenance.rs
awa/tests/queue_storage_benchmark_test.rs
awa/tests/queue_storage_runtime_test.rs
awa/tests/receipt_plane_chaos_test.rs
correctness/storage/AwaDeadTupleContract.tla
correctness/storage/AwaSegmentedStorage.tla
correctness/storage/AwaStorageLockOrder.tla
correctness/storage/MAPPING.md
correctness/storage/README.md
docs/adr/019-queue-storage-redesign.md
docs/adr/023-receipt-plane-ring-partitioning.md
docs/adr/032-failed-terminal-retention.md
docs/architecture.md
docs/configuration.md
docs/grafana/awa-dashboard.json

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

correctness/storage/AwaStorageLockOrder.tla (1)

14-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify NOWAIT semantics in the module header comment.

Line 14 says ACCESS EXCLUSIVE NOWAIT, but Line 15 says prune “waits.” With NOWAIT, the maintenance path should fail fast/back off rather than queue behind in-flight writers, so this wording is contradictory.

Suggested wording fix

-\*   - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
-\*     waits for in-flight claim/complete writes to commit before
-\*     truncating).
+\*   - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
+\*     fails fast under contention instead of queueing ahead of worker
+\*     traffic before truncating).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@correctness/storage/AwaStorageLockOrder.tla` around lines 14 - 16, The module
header comment in AwaStorageLockOrder.tla contains contradictory semantics
regarding lock behavior in lines 14-16. The comment states ACCESS EXCLUSIVE
NOWAIT (which means fail immediately if the lock cannot be acquired) but then
says prune "waits" for in-flight writes (which contradicts NOWAIT behavior).
Revise the comment to accurately reflect the intended lock semantics: if using
NOWAIT, the language should indicate the operation fails fast or backs off
rather than waits; if the intention is to wait, remove NOWAIT from the
description or adjust the lock specification accordingly to ensure consistency
between the lock mode and the described behavior.

♻️ Duplicate comments (1)

awa-model/src/queue_storage.rs (1)

8552-8567: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prioritize the prune-blocking slot in deadline rescue too.

This loop still spends the global deadline-rescue budget in numeric slot order. Under sustained expired-deadline backlog on lower-numbered slots, remaining can hit zero before the oldest initialized slot is visited, which leaves prune_oldest_claims() stuck behind rescueable open claims even after the stale-receipt path was fixed.

♻️ Minimal fix sketch

     async fn rescue_expired_receipt_deadlines_tx<'a>(
         &self,
         tx: &mut sqlx::Transaction<'a, sqlx::Postgres>,
     ) -> Result<Vec<DeletedLeaseRow>, AwaError> {
+        let schema = self.schema();
         let mut rescued = Vec::new();
         let mut remaining = RECEIPT_RESCUE_BATCH_LIMIT;
+        let preferred_slot: Option<i32> = sqlx::query_as::<_, (i32, i64, i32)>(&format!(
+            r#"
+            SELECT current_slot, generation, slot_count
+            FROM {schema}.claim_ring_state
+            WHERE singleton = TRUE
+            "#
+        ))
+        .fetch_optional(tx.as_mut())
+        .await
+        .map_err(map_sqlx_error)?
+        .and_then(|(current_slot, generation, slot_count)| {
+            oldest_initialized_ring_slot(current_slot, generation, slot_count)
+                .map(|(slot, _)| slot)
+                .filter(|slot| *slot >= 0 && (*slot as usize) < self.claim_slot_count())
+        });
+
+        if let Some(slot) = preferred_slot {
+            let mut slot_rescued = self
+                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
+                .await?;
+            remaining = remaining.saturating_sub(slot_rescued.len() as i64);
+            rescued.append(&mut slot_rescued);
+            if remaining == 0 {
+                return Ok(rescued);
+            }
+        }
+
         for slot in 0..self.claim_slot_count() {
+            let slot = slot as i32;
+            if Some(slot) == preferred_slot {
+                continue;
+            }
             let mut slot_rescued = self
-                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot as i32, remaining)
+                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
                 .await?;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/src/queue_storage.rs` around lines 8552 - 8567, The loop that
rescues expired receipt deadlines by iterating through slots in numeric order
(0..self.claim_slot_count()) can exhaust the remaining budget before reaching
the oldest initialized slot, which blocks prune_oldest_claims() from
progressing. Refactor the loop to prioritize the oldest initialized slot (the
prune-blocking slot) first by visiting it before iterating through other slots
in order. This ensures that critical pruning operations are not blocked by
backlog on lower-numbered slots when the deadline-rescue budget is limited.

🧹 Nitpick comments (1)

awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql (1)
57-63: 💤 Low value

Lease-claim-receipts detection heuristic is fragile.

The detection relies on string matching within the function body (position(...IN v_claim_runtime_def)). If the function text changes format (e.g., different quoting, whitespace, or schema aliasing), this could produce false negatives. However, the v_schema = 'awa' fallback ensures the default schema always works, and the heuristic is only used for custom schemas where the function was explicitly defined.

This is acceptable for a migration that runs once per schema upgrade.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql` around lines
57 - 63, The lease-claim-receipts detection uses string matching on the function
body via the position function to detect INSERT statements, which is fragile to
formatting changes. While the comment acknowledges this is acceptable for a
one-time migration due to the v_schema = 'awa' fallback, add a code comment in
the migration explaining the heuristic's limitations and why it is acceptable
(single-run migration, explicit function definition in custom schemas, fallback
for default schema). This documents the technical debt and helps future
maintainers understand the tradeoff without refactoring the detection logic
itself.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@awa/tests/queue_storage_runtime_test.rs`:
- Around line 8255-8256: The test currently sets idle_threshold to 120 seconds
but only ages last_claimed_at by 90 seconds, which keeps the lease on the fresh
path and prevents the stale-lease branch from being exercised. To fix this,
increase the duration used to age last_claimed_at (currently 90 seconds) to
exceed the idle_threshold value (120 seconds), so the stale-lease code path is
properly tested and the final assertion comparing after_stale and after_fresh
can hold true. This change is needed both where last_claimed_at is initially
aged and in any related assertion or verification sections.

In `@correctness/storage/MAPPING.md`:
- Line 67: The PruneLeaseSegment mapping row on line 67 of the MAPPING.md file
describes the prune operation but omits the ACCESS EXCLUSIVE NOWAIT lock
acquisition detail that is part of the current PR's prune semantics. Update the
PruneLeaseSegment row to explicitly mention the NOWAIT behavior in the lock
acquisition specification, making it consistent with the adjacent mapping rows
on lines 66 and 69 and aligned with the lock-order specification narrative.
- Line 195: In the ready-prune race note at line 195 in the MAPPING.md file,
replace the reference to `PruneLeaseSegment` with `PruneReadySegment` because
the paragraph is describing the ready-partition `prune_oldest` concern, not the
lease segment concern. This ensures the documentation correctly identifies which
spec transition addresses the specific issue being discussed and avoids
cross-spec confusion.

In `@correctness/storage/README.md`:
- Around line 150-151: The README.md description of AwaStorageLockOrder.cfg at
lines 150-151 enumerates the modeled flows but omits terminal-delete from the
list. Since AwaStorageLockOrder.tla now includes TerminalDeletePlan and
StartTerminalDelete, you need to add terminal-delete to the comma-separated list
of modeled flows in the AwaStorageLockOrder.cfg description to keep the
configuration summary complete and accurate.

---

Outside diff comments:
In `@correctness/storage/AwaStorageLockOrder.tla`:
- Around line 14-16: The module header comment in AwaStorageLockOrder.tla
contains contradictory semantics regarding lock behavior in lines 14-16. The
comment states ACCESS EXCLUSIVE NOWAIT (which means fail immediately if the lock
cannot be acquired) but then says prune "waits" for in-flight writes (which
contradicts NOWAIT behavior). Revise the comment to accurately reflect the
intended lock semantics: if using NOWAIT, the language should indicate the
operation fails fast or backs off rather than waits; if the intention is to
wait, remove NOWAIT from the description or adjust the lock specification
accordingly to ensure consistency between the lock mode and the described
behavior.

---

Duplicate comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8552-8567: The loop that rescues expired receipt deadlines by
iterating through slots in numeric order (0..self.claim_slot_count()) can
exhaust the remaining budget before reaching the oldest initialized slot, which
blocks prune_oldest_claims() from progressing. Refactor the loop to prioritize
the oldest initialized slot (the prune-blocking slot) first by visiting it
before iterating through other slots in order. This ensures that critical
pruning operations are not blocked by backlog on lower-numbered slots when the
deadline-rescue budget is limited.

---

Nitpick comments:
In `@awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql`:
- Around line 57-63: The lease-claim-receipts detection uses string matching on
the function body via the position function to detect INSERT statements, which
is fragile to formatting changes. While the comment acknowledges this is
acceptable for a one-time migration due to the v_schema = 'awa' fallback, add a
code comment in the migration explaining the heuristic's limitations and why it
is acceptable (single-run migration, explicit function definition in custom
schemas, fallback for default schema). This documents the technical debt and
helps future maintainers understand the tradeoff without refactoring the
detection logic itself.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26af158f-1f1a-4c9a-a565-f1de279e2f2a

📥 Commits

Reviewing files that changed from the base of the PR and between f4373e0 and c650aee.

📒 Files selected for processing (18)

awa-model/migrations/v023_install_queue_storage_substrate.sql
awa-model/migrations/v033_receipt_rescue_cursors.sql
awa-model/migrations/v034_receipt_terminal_delete_closures.sql
awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
awa-model/src/migrations.rs
awa-model/src/queue_storage.rs
awa/tests/queue_storage_runtime_test.rs
correctness/storage/AwaDeadTupleContract.tla
correctness/storage/AwaSegmentedStorage.tla
correctness/storage/AwaStorageLockOrder.tla
correctness/storage/MAPPING.md
correctness/storage/README.md
docs/adr/019-queue-storage-redesign.md
docs/adr/023-receipt-plane-ring-partitioning.md
docs/adr/026-narrow-terminal-history.md
docs/architecture.md
docs/benchmarking.md
docs/configuration.md

✅ Files skipped from review due to trivial changes (4)

docs/benchmarking.md
docs/adr/026-narrow-terminal-history.md
docs/configuration.md
docs/adr/023-receipt-plane-ring-partitioning.md

🚧 Files skipped from review as they are similar to previous changes (3)

awa-model/migrations/v033_receipt_rescue_cursors.sql
awa-model/migrations/v023_install_queue_storage_substrate.sql
docs/adr/019-queue-storage-redesign.md

hardbyte · 2026-06-14T00:31:13Z

5k/s offered retest after receipt deadline cursor work

Run id: custom-20260613T235621Z-8c69f4

Command:

uv run bench run --systems awa \
  --producer-rate 5000 \
  --producer-mode fixed \
  --worker-count 64 \
  --awa-completion-batch-size 1024 \
  --sample-every 10 \
  --wait-event-sample-every 2 \
  --phase warmup=warmup:2m \
  --phase clean_5k=clean:20m \
  --phase recovery_5k=recovery:5m

Compared against previous same-shape run custom-20260613T225703Z-0d00f7.

Throughput

phase	previous completion median	this PR completion median	enqueue median
clean_5k	812.84/s	812.76/s	5003.08/s
recovery_5k	744.68/s	744.63/s	5002.88/s

So this change is throughput-neutral for this 5k/s offered single-queue shape. It bounds/removes one class of repeated deadline-rescue scan risk, but it does not by itself move the e2e drain ceiling.

Storage/dead-tuple shape

The append-only/tombstone storage shape held under load:

ready_entries_*: median/peak dead tuples stayed 0.
ready_tombstones_*: median/peak dead tuples stayed 0.
queue_terminal_live_counts: clean median 0, peak 1024; recovery median/peak 0.
lease_claims_0: previous clean peak 56 dead tuples; current clean peak 0 in the summary sample.

Instrumentation notes

I captured pg_stat_statements, WAL counters, relation stats, and EXPLAIN (ANALYZE, BUFFERS) for claim_ready_runtime at warmup, early/mid/late clean, recovery-start, and recovery-late.

claim_ready_runtime EXPLAIN stayed stable through the run: warmup 26.8ms, early clean 27.8ms, mid clean 28.8ms, late clean 28.0ms, recovery-late 28.3ms. Recovery-start had a one-off 73.7ms sample with more buffer reads/writes.
WAL bytes were cumulative from stats reset: warmup 0.31GB, early clean 1.49GB, mid clean 3.18GB, late clean 5.08GB, recovery-start 8.17GB, recovery-late 9.33GB.
Late-clean top cumulative statements were still dominated by claiming and enqueue/copy path: claim wrapper mean 20.6ms over 26,117 calls, claim body mean 3.05ms, target-slot scan mean 2.06ms, COPY enqueue mean 2.37ms.
The named residual mechanism I now see is claim-history live-proof pressure: maintenance/prune periodically proves a ready slot has no live claims by scanning lease_claims and anti-joining terminal evidence. It is not per-job, but it gets expensive under accumulated receipt claim history. In the recovery-late snapshot that proof query had only 28 calls but 24.2s total time / 864ms mean. The slow-query logs showed the same query taking ~1.3-2.2s at rotation boundaries.

My read: this PR is still worth keeping because it gives a bounded deadline-rescue cursor and keeps the storage invariants clean, but the next meaningful performance work should target slot-prune/live-claim proof so maintenance can answer from a cursor/ledger rather than repeatedly reproving over receipt claim history.

hardbyte force-pushed the codex/maintenance-prune-nowait branch from d1e1744 to ecd42ca Compare June 13, 2026 00:59

hardbyte added 8 commits June 13, 2026 14:14

Avoid queued maintenance prune locks

7f8f784

Keep queue storage metrics off receipt rings

1d1045f

Bound queue storage maintenance emptiness checks

06d68c9

Bound receipt rescue with per-slot cursors

ea50b13

Keep lease pruning off receipt rescue cursors

191b291

Reduce default lease ring cadence

4e328cc

Fix prune outcome test pattern after rebase

cc8799e

Keep storage correctness docs aligned after prune changes

f4373e0

hardbyte force-pushed the codex/maintenance-prune-nowait branch from ecd42ca to f4373e0 Compare June 13, 2026 05:55

hardbyte marked this pull request as ready for review June 13, 2026 06:42

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread awa-model/src/queue_storage.rs

hardbyte added 5 commits June 13, 2026 19:31

Fix receipt rescue cursor sweep liveness

a62ed0a

Address receipt rescue and prune review feedback

fe6df69

Refine receipt closure evidence

ef52ee9

Propagate non-lock prune errors

326cdd6

Bound receipt deadline rescue scans

c650aee

coderabbitai Bot reviewed Jun 14, 2026

View reviewed changes

Comment thread awa/tests/queue_storage_runtime_test.rs

Comment thread correctness/storage/MAPPING.md Outdated

Comment thread correctness/storage/MAPPING.md Outdated

Comment thread correctness/storage/README.md Outdated

Stabilize receipt rescue runtime test

266b27c

hardbyte added 2 commits June 14, 2026 12:51

Latch blocking worker test gate

2291330

Address receipt rescue review findings

53f6426

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid queued maintenance prune locks#349

Avoid queued maintenance prune locks#349
hardbyte wants to merge 16 commits into
mainfrom
codex/maintenance-prune-nowait

hardbyte commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

hardbyte commented Jun 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hardbyte commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		ORDER BY claims.claimed_at, claims.job_id, claims.run_lease
		LIMIT $3

Conversation

hardbyte commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Long-Horizon Evidence

Failed Main Run

NOWAIT-Only Branch

Final Stacked Branch

Interpretation

Local Verification

Review Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

hardbyte commented Jun 13, 2026

5k/s offered-load check after rebase on #350

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hardbyte commented Jun 14, 2026

5k/s offered retest after receipt deadline cursor work

Throughput

Storage/dead-tuple shape

Instrumentation notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hardbyte commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading