Skip to content

Avoid queued maintenance prune locks#349

Open
hardbyte wants to merge 16 commits into
mainfrom
codex/maintenance-prune-nowait
Open

Avoid queued maintenance prune locks#349
hardbyte wants to merge 16 commits into
mainfrom
codex/maintenance-prune-nowait

Conversation

@hardbyte

@hardbyte hardbyte commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Summary

  • Change automatic queue-storage prune/rollup child-table locks to ACCESS EXCLUSIVE NOWAIT so best-effort maintenance backs off instead of queueing relation locks ahead of workers.
  • Keep queue-storage metrics and maintenance probes off the receipt-ring hot path where possible.
  • Bound queue-storage maintenance emptiness checks and receipt rescue scans with partition-local probes/cursors.
  • Add per-claim-slot receipt-rescue cursors so rescue work advances through large receipt rings instead of repeatedly re-reading old closed evidence.
  • Keep lease pruning independent of receipt-rescue cursors.
  • Raise the default lease-ring rotation cadence from 250ms to 1000ms; faster rotation was only mutating the one-row lease-ring metadata more often under pinned MVCC horizons.
  • Refine receipt closure evidence: successful receipt completion uses done_entries as durable closure evidence, non-success paths keep explicit lease_claim_closures, and terminal-delete paths materialize a terminal_removed closure before removing terminal evidence.
  • Add migration v034_receipt_terminal_delete_closures.sql and set CURRENT_VERSION = 34.

Why

The #169 long-horizon gate run against main at 7be5bd8 failed with a maintenance lock convoy. During the pinned-MVCC phase, diagnostics caught LOCK TABLE awa.leases_N IN ACCESS EXCLUSIVE MODE waiting behind the pinned transaction. Because queued ACCESS EXCLUSIVE locks sit ahead of later relation lockers, best-effort maintenance could stall worker traffic even though the operation was supposed to back off quickly.

After the NOWAIT fix removed that convoy, later verification still found a smaller late-idle slowdown. The remaining pressure was not ready/done segment churn: those partitions stayed append-only with zero dead tuples. The residual shape was metadata/receipt coordination work: receipt rescue scans over growing closed-claim evidence, and frequent lease-ring singleton updates under a pinned snapshot. This PR follows that evidence instead of tuning around it.

The receipt-evidence refinement removes the duplicate success-path closure write without weakening safety. Successful receipt completion already writes the durable terminal fact synchronously; claim prune can treat that terminal row as closure evidence. Queue prune now refuses to delete terminal evidence while same-segment claim rows still depend on it, and cold terminal-delete paths first materialize an explicit closure if they remove the terminal fact before claim prune.

Long-Horizon Evidence

Shape: 1 Awa replica, 64 workers, fixed 800/s offered rate, completion batch 1024, phases warmup=5m -> clean_1=20m -> idle_1=idle-in-tx:60m -> recovery_1=10m, 30s metric samples, 5s wait-event samples, Postgres 18.3-alpine.

Failed Main Run

Result dir: results/custom-20260612T090023Z-412cc6, Awa 7be5bd8.

Phase Median enqueue/s Median complete/s Final complete/s Median depth Peak/final depth Relation-lock waits
clean_1 799.9 799.9 800.5 40 614 / 40 150
idle_1 799.9 716.3 726.2 45,932 296,212 / 296,212 500
recovery_1 799.9 947.2 1,072.2 245,900 297,226 / 179,792 31

NOWAIT-Only Branch

Result dir: results/custom-20260612T112048Z-c860b0, Awa ba8a4cf.

Phase Median enqueue/s Median complete/s Final complete/s Median depth Peak/final depth Relation-lock waits
clean_1 799.9 799.9 799.9 40 69 / 40 2
idle_1 799.9 799.3 678.4 165.5 46,772 / 46,772 0
recovery_1 799.9 827.6 798.6 83.5 39,783 / 20 4

NOWAIT fixed the relation-lock convoy, but it was not the whole #169 mechanism. Late-idle backlog was still visible.

Final Stacked Branch

Result dir: results/custom-20260612T195938Z-837e7d, Awa d1e1744 before the rebase onto #350. The rebased branch preserves the same #349 maintenance changes, then adds the receipt-evidence refinement and v034 compatibility migration.

Phase Median enqueue/s Median complete/s Final complete/s Median e2e p99 p90 e2e p99 Median depth Peak/final depth Relation-lock waits Total-dead peak/final
clean_1 799.9 799.9 799.2 130.1 ms 264.3 ms 40.0 231.0 / 20.0 6.0 966.0 / 221.0
idle_1 799.9 799.9 799.5 156.0 ms 302.0 ms 40.0 283.0 / 20.0 3.0 7,419 / 7,419
recovery_1 800.0 800.0 800.2 167.6 ms 333.4 ms 40.0 60.0 / 40.0 0.0 3,577 / 252.0

The pinned-reader stress condition was active: oldest_idle_in_tx_age_s climbed to about 3,584s and snapshot_xmin stayed flat through idle_1.

Interpretation

This branch fixes the original named mechanism: maintenance relation-lock convoy. It also fixes the later receipt-rescue / lease-ring metadata drag enough that the 60-minute pinned-reader phase sustains the offered 800/s with bounded depth and sub-second p99 pickup.

The append-only segment contract held. In the final SQL snapshot, ready_entries_*, done_entries_*, ready_tombstones_*, queue_terminal_count_deltas_*, and receipt closure partitions had zero dead tuples. Residual dead tuples were in small mutable coordination/counter tables, mainly lease_ring_state, queue_terminal_live_counts, claim_ring_slots, and ring state rows. During idle_1, raw samples peaked at lease_ring_state=3,584, queue_terminal_live_counts=3,328, and claim_ring_slots=343 dead tuples.

pg_stat_statements still identifies claim_ready_runtime as the load-bearing query: final snapshot showed about 235k calls, 4.56M claimed rows, and 17.2ms mean time. Completion was much cheaper: about 181k batched completion calls, 4.56M completed rows, and 1.7ms mean. The remaining claim cost is consistent with bounded receipt/coordination checks, not ready/done heap churn. WAL stayed around 2.3KB per completed job and about 20 WAL records per completed job during the pinned-reader phase.

This is a good #169 gate result for the 800/s long-horizon shape. It is not evidence for the 5k/s or 10k/s ceiling; those still need separate offered-rate runs after this lands or while CI runs.

Local Verification

  • cargo fmt --all -- --check
  • cargo check --package awa-model --all-features
  • cargo clippy --package awa-model --all-targets --all-features -- -D warnings
  • DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test migration_test — 42 passed
  • DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa-model --test sql_only_storage_upgrade_test — 2 passed
  • DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test — 99 passed
  • DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test prune — 14 passed

TLC:

  • AwaSegmentedStorage.cfg, AwaSegmentedStorageInterleavings.cfg
  • AwaDeadTupleContract.cfg
  • AwaStorageLockOrder.cfg
  • AwaSegmentedStorageRacesSafe.cfg, AwaSegmentedStorageRacesMultiWorker.cfg
  • AwaShardedPrune.cfg
  • AwaPartitionedQueueRouting.cfg
  • AwaStorageTransition.cfg
  • Positive trace witnesses: snooze, receipt rescue, lost-claim advance, running cancel, receipt-only cancel, callback wait, DLQ retry, DLQ purge. Each reached its expected *TraceIncomplete violation.
  • Negative witnesses: broken storage trace deadlocked as expected; lock-order deadlock demo violated NoDeadlock as expected.

Review Notes

  • CodeRabbit's stale-rescue budget concern is already addressed in the current branch: rescue attempts the oldest initialized claim slot before sweeping the remaining slots.
  • CodeRabbit's non-lock-error masking concern is addressed in 326cdd6: prune TRUNCATE failures now return Blocked only for lock contention and propagate other database errors.

Summary by CodeRabbit

  • New Features
    • Added deadline-rescue cursor support and enhanced per-slot receipt rescue sweeps for queue cleanup.
  • Bug Fixes
    • Improved terminal deletion to ensure closure evidence is materialized before removing terminal rows, preserving receipt-plane correctness.
  • Performance Improvements
    • Increased default lease rotation interval to 1000ms.
    • Updated pruning/rotation locking to fail fast (NOWAIT) under contention.
    • Optimized queue-storage health metrics to use presence checks.
  • Documentation
    • Refreshed architecture/ADR/config and Grafana dashboard text to match the new receipt and locking behavior.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be80b1d5-9335-41cd-a27f-aff2913ef2d2

📥 Commits

Reviewing files that changed from the base of the PR and between c650aee and 53f6426.

📒 Files selected for processing (6)
  • awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
  • awa-model/src/queue_storage.rs
  • awa/tests/queue_storage_runtime_test.rs
  • correctness/storage/AwaStorageLockOrder.tla
  • correctness/storage/MAPPING.md
  • correctness/storage/README.md
✅ Files skipped from review due to trivial changes (1)
  • correctness/storage/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • correctness/storage/AwaStorageLockOrder.tla
  • awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
  • correctness/storage/MAPPING.md

📝 Walkthrough

Walkthrough

This PR extends the queue-storage receipt-plane with per-claim-slot rescue cursors for both stale and deadline-based rescue scanning. It introduces migrations v033–v035 to install cursor columns and materialize terminal-delete closure evidence, refactors stale-receipt rescue from full-table scans to cursor-driven per-slot scanning with configurable batch limits, replaces COUNT(*) busy checks with lightweight EXISTS probes, and switches maintenance lock acquisitions to NOWAIT mode to gracefully degrade under contention. The default lease-rotation interval increases from 250ms to 1000ms, observability metrics are updated to avoid receipt-plane aggregation, and formal correctness models and documentation are synchronized with the new bounded-cursor and NOWAIT-lock semantics.

Changes

Receipt-Plane Cursor & Maintenance Locks

Layer / File(s) Summary
Database substrate extensions for stale & deadline rescue cursors
awa-model/migrations/v023_install_queue_storage_substrate.sql, awa-model/migrations/v033_receipt_rescue_cursors.sql, awa-model/migrations/v034_receipt_terminal_delete_closures.sql, awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql, awa-model/src/migrations.rs
v023 extends claim_ring_slots with rescue_cursor_claimed_at, rescue_cursor_job_id, rescue_cursor_run_lease columns and adds lease_claims rescue-cursor index; v033 installs these columns on existing schemas; v034 adds awa.delete_job_compat(p_id) function materializing terminal_removed closure evidence before deletion; v035 introduces deadline-rescue cursor columns; migration registry updated with CURRENT_VERSION incremented to 35 and new step references.
Cursor-driven stale-receipt and deadline rescue with batching and closed-prefix stopping
awa-model/src/queue_storage.rs
Receipt rescue refactored from single-pass query to per-slot cursor scanning: adds RECEIPT_RESCUE_BATCH_LIMIT and RECEIPT_RESCUE_CURSOR_SCAN_LIMIT constants; wraps per-slot rescue in batch-limit loop; reads rescue cursor position; selects stale candidates relative to cursor; inserts 'rescued' closures; advances claim_ring_slots rescue cursor through contiguous closed prefix; halts before stale open claims. Deadline rescue similarly scans by deadline_at ordering, inserts 'deadline_expired' closures, advances deadline cursor, stops before first open future-deadline claim. Both resets cursor to sentinel values (-infinity, 0) after claim-ring truncation.
Terminal-delete closure materialization and receipt-closed-evidence predicate
awa-model/src/queue_storage.rs
Introduces ensure_terminal_removed_receipt_closures_tx to insert 'terminal_removed' receipt-closure rows when terminal jobs are removed; integrates into retry-from-terminal and admin-delete flows. Adds shared receipt_closed_evidence_sql predicate consolidating closure and terminal evidence checks across lease_claim_closures, done_entries, deferred_jobs, and dlq_entries. Refactors completion, cancel, and materialization paths to gate on NOT {closed_evidence} instead of closure-only checks, ensuring consistent receipt-plane semantics.
Bounded busy checks via EXISTS probes and presence indicators
awa-model/src/queue_storage.rs
New busy_indicator helper converts EXISTS-style booleans to standardized presence signals (0/1); new relation_has_rows_tx helper performs transaction-aware SELECT EXISTS probes; replaces COUNT(*) queries with lightweight EXISTS throughout rotation/prune validation; returns BusyCounts derived from non-zero presence indicators instead of row counts.
Queue/lease/claim rotation with NOWAIT locks and presence checks
awa-model/src/queue_storage.rs
rotate_queue, rotate_leases, and rotate_claims updated to use relation_has_rows_tx presence checks; busy checks return SkippedBusy with busy_indicator-derived BusyCounts; partition locks changed to ACCESS EXCLUSIVE MODE NOWAIT; graceful Blocked outcomes on contention.
Queue/lease/claim prune with NOWAIT locks, presence checks, and cursor reset
awa-model/src/queue_storage.rs
prune_oldest, prune_oldest_leases, and prune_oldest_claims acquire child-partition locks with NOWAIT; active/pending/open row checks replaced with EXISTS-based booleans via receipt_closed_evidence_sql; return values populated with busy_indicator presence signals; prune resets both rescue and deadline cursor fields to sentinel values after successful truncation; errors and lock contention produce graceful Blocked/SkippedActive outcomes.
Health metrics bounded observability and contention-safe publication
awa-worker/src/maintenance.rs
publish_queue_storage_health_metrics SQL query removes receipt_claims CTE/LEFT JOIN aggregation; running depth computed from leases.running only; metrics test validates completion under locked receipt tables via tokio::time::timeout assertion, confirming publication does not block on receipt-plane access.
Configuration defaults: lease_rotate_interval 250ms → 1000ms
awa-python/src/client.rs, awa-worker/src/client.rs, awa-worker/src/maintenance.rs, docs/configuration.md, docs/architecture.md
Lease rotation interval default changed from 250ms to 1000ms across Rust worker ClientBuilder::new, Python PyClient.start(), and documentation examples; maintenance comments updated to reflect new cadence; branch-cooldown and prune-backoff estimates adjusted accordingly.
Runtime tests for cursor behavior, deadline rescue, and lock contention
awa/tests/queue_storage_runtime_test.rs, awa/tests/queue_storage_benchmark_test.rs
New/updated tests verify claim rescue cursor reset-on-prune isolation, lease-prune cursor preservation, partition routing with direct store API, receipt-backed cancel closure materialization, SQL compat delete closure insertion, deadline rescue cursor advancement past terminal evidence with closed-prefix blocking; lock contention classification updated for 55P03 SQLSTATE; prune-after-completion test resilient to transient Blocked outcomes; timing parameters adjusted from 250ms to 2s for heartbeat staleness; completion closure-count expectations updated for terminal evidence patterns; test helpers improved for cursor aging and atomic release signaling.
Test updates for NOWAIT behavior and partition routing
awa/tests/receipt_plane_chaos_test.rs
Chaos test updated to reflect NOWAIT fail-fast behavior; test schema and assertion messages updated to describe immediate Blocked returns under concurrent access.
Correctness models: dead-tuple, cursor mutations, and NOWAIT locking
correctness/storage/AwaDeadTupleContract.tla, correctness/storage/AwaSegmentedStorage.tla, correctness/storage/AwaStorageLockOrder.tla
TLA+ specs extended: RescueReceiptsTx and RescueReceiptDeadlinesTx now model Update mutations on claim_ring_slots; terminal-delete transactions model closure insertion before done_entries deletion; AwaSegmentedStorage adds ClaimKeyJob operator and strengthens PruneReadySegment with closure-evidence requirements; AwaStorageLockOrder models claim_ring_slots exclusive locks before cursor updates, introduces TerminalDeletePlan, and annotates all prune partition locks with ACCESS EXCLUSIVE NOWAIT.
Model mapping and documentation: variables, actions, and NOWAIT procedures
correctness/storage/MAPPING.md, correctness/storage/README.md
Variable mapping updated with heartbeatFresh and refined claimSegments[seg]/claim_ring_slots cursor semantics; action mapping refreshed for claim, prune, and rescue sequences with explicit NOWAIT contention details; prune readiness narrative reframed from "check-then-act" to "lock-then-check-then-act"; receipt-plane lifecycle and closure-evidence semantics documented; lock-order coverage notes explicitly reference NOWAIT behavior and excluded Postgres deadlock-detector abort modeling.
Architecture Decision Records: cursors, deadline-rescue, terminal-delete, and NOWAIT locks
docs/adr/019-queue-storage-redesign.md, docs/adr/023-receipt-plane-ring-partitioning.md, docs/adr/026-narrow-terminal-history.md, docs/adr/032-failed-terminal-retention.md, docs/architecture.md
ADR-019 updated to describe best-effort ACCESS EXCLUSIVE NOWAIT prune locking and graceful contention handling; ADR-023 introduces claim_ring_slots as control-plane for per-slot stale/deadline rescue cursors, specifies bounded anti-join reads over partitioned claims/closures, and describes cursor advance/stop semantics; ADR-026 clarifies fused completion and terminal-delete closure materialization; ADR-032 updated for NOWAIT lock semantics; architecture doc expanded on cursor-driven scans and bounded stale-rescue access paths.
Observability documentation: Grafana metrics and configuration
docs/grafana/awa-dashboard.json, docs/configuration.md, docs/benchmarking.md
Grafana panels reframed to describe rotation-skip metrics as bounded presence signals rather than row-count magnitudes; "Rotation skip magnitudes p95" renamed to "Rotation skip blocker presence p95"; configuration docs update defaults to 1000ms and document per-slot rescue-cursor presence in claim_ring_slots; benchmarking docs update completion hot-path description to reflect fused receipt-claim-lock + terminal-insert + count-delta statement sequence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues

  • hardbyte/awa#246: Per-slot rescue cursors and NOWAIT prune locks directly address deadline-rescue performance degradation at high concurrency by replacing full historical scans with bounded, cursor-driven incremental windows.
  • hardbyte/awa#197: Implements receipt-rescue cursors, deadline-rescue cursors, terminal-delete closure materialization, and related correctness/testing improvements tracked in the release readiness meta-issue.

Possibly related PRs

  • hardbyte/awa#306: Introduces queue_ring_state.terminal_counter_trusted_at which v033/v034/v035 migrations check and conditionally backfill to gate exact terminal-count derivation.
  • hardbyte/awa#310: Extends the same v023 substrate installer that this PR enhances with rescue-cursor and deadline-cursor columns and new lease_claims indexing.

Poem

🐰 Cursors hop through claim-ring slots so neat,
No more COUNT stars blocking rescue's beat,
NOWAIT locks fail-fast, no queue to defeat,
Deadline sweeps stop before open, tight and sweet! 🌙

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

@hardbyte hardbyte force-pushed the codex/maintenance-prune-nowait branch from d1e1744 to ecd42ca Compare June 13, 2026 00:59
@hardbyte hardbyte force-pushed the codex/maintenance-prune-nowait branch from ecd42ca to f4373e0 Compare June 13, 2026 05:55
@hardbyte

Copy link
Copy Markdown
Owner Author

5k/s offered-load check after rebase on #350

Run: results/custom-20260613T061209Z-c20e35

Shape:

  • Awa only, revision f4373e0 on codex/maintenance-prune-nowait
  • benchmark harness cb61d2f on codex/awa-lease-rotate-default
  • PG postgres:18.3-alpine
  • --producer-rate 5000 --producer-mode fixed --worker-count 64 --awa-completion-batch-size 1024
  • phases: warmup=2m, clean_5k=20m, recovery_5k=5m
  • sample every 10s, wait-event sample every 2s

Results:

Phase enqueue median completion median completion tail median queue depth median queue depth peak/final dead tuples peak/final
clean_5k 5,003/s 842/s 768/s 2.85M 5.29M / 5.29M 539 / 167
recovery_5k 5,003/s 794/s 768/s 5.96M 6.58M / 6.58M 702 / 217

Wait-event shape during clean_5k:

  • CPU: 570 / 1188 active samples (48.0%)
  • WalSync: 338 / 1188 (28.5%)
  • WALWrite: 154 / 1188 (13.0%)
  • relation lock: 15 / 1188 (1.3%)

Interpretation:

  • The old maintenance relation-lock convoy is not the limiting mechanism in this run; relation waits are low.
  • Dead tuples remain bounded despite the overload. The append-only / partition-truncate story is holding.
  • The producer path absorbs ~5k/s, but single-queue drain is far below the offered rate and gets worse under a multi-million-row live backlog.
  • The named residual mechanism here is live-backlog WAL amplification: when offered load outruns completion, Awa accumulates millions of live ready/history rows; claim and completion keep doing durable append/closure work, and Postgres spends active time mostly on CPU + WAL sync/write rather than relation locks or vacuum cleanup.

This is evidence that #349 fixes the queued-prune-lock failure mode, but it is not evidence that a single logical queue can sustainably absorb 5k/s end-to-end on this machine. The next throughput lever is still queue fanout/striping or a deeper claim/receipt batching redesign that reduces per-completion durable writes.

@hardbyte hardbyte marked this pull request as ready for review June 13, 2026 06:42

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4373e0e8e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread awa-model/src/queue_storage.rs Outdated
Comment on lines +8283 to +8284
ORDER BY claims.claimed_at, claims.job_id, claims.run_lease
LIMIT $3

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid stranding stale claims behind a fresh blocker

When a healthy long-running receipt claim sits just after the rescue cursor, this LIMIT $3 confines every rescue pass to the same first 10k rows because the cursor advancement below stops before the first non-advanceable row. Any stale open claims beyond that window are never considered until the healthy blocker closes or becomes stale, so large claim partitions can leave jobs stuck in running even though maintenance keeps firing; the new test only covers a stale tail within the limited window.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
awa-model/src/queue_storage.rs (1)

11664-11673: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only map actual NOWAIT lock contention to Blocked. These three branches currently treat any LOCK TABLE ... NOWAIT failure as contention. That masks real database errors behind the maintenance backoff path, so the runtime can keep reporting Blocked instead of surfacing a real failure.

  • awa-model/src/queue_storage.rs#L11664-L11673: match the lock error and return PruneOutcome::Blocked only for 55P03; propagate other errors with map_sqlx_error.
  • awa-model/src/queue_storage.rs#L11999-L12008: apply the same is_lock_contention_error check before converting the failure to Blocked.
  • awa-model/src/queue_storage.rs#L12227-L12236: keep the claim-prune NOWAIT semantics, but do not swallow non-lock failures into Blocked.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/src/queue_storage.rs` around lines 11664 - 11673,
awa-model/src/queue_storage.rs:11664-11673 — change the current unconditional
match-on-lock failure to inspect the sqlx error and only return
PruneOutcome::Blocked when the DB error code is the Postgres lock contention
code "55P03"; for any other error propagate it via map_sqlx_error (or return
Err(map_sqlx_error(err))) instead of swallowing it into Blocked.
awa-model/src/queue_storage.rs:11999-12008 — apply the same selective check:
treat the NOWAIT lock failure as Blocked only when the error is "55P03",
otherwise propagate the error with map_sqlx_error.
awa-model/src/queue_storage.rs:12227-12236 — for the claim-prune NOWAIT path
keep NOWAIT semantics but do not convert non-lock errors to
PruneOutcome::Blocked; detect "55P03" and return Blocked only for that code, and
for other sqlx errors propagate them using map_sqlx_error. Use a small helper or
inline check on err.as_database_error().and_then(|d| d.code()) == Some("55P03")
to identify lock contention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8203-8213: The loop that rescues stale receipts (using
claim_slot_count() and rescue_stale_receipt_claims_for_slot_tx) can exhaust the
global remaining batch on low-numbered slots and never reach the slot that
prune_oldest_claims is waiting on; change the algorithm to identify the
prune-blocking slot first (e.g. by calling the existing prune_oldest_claims() or
the routine that returns the oldest-generation slot) and attempt rescue for that
slot before iterating other slots, subtracting its rescued count from remaining,
then iterate the remaining slots (skipping the already-handled prune-blocking
slot) so the global remaining budget cannot be fully consumed by other slots
before the prune-blocking slot is serviced; ensure you still break when
remaining==0 and preserve error handling around
rescue_stale_receipt_claims_for_slot_tx.

---

Outside diff comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 11664-11673: awa-model/src/queue_storage.rs:11664-11673 — change
the current unconditional match-on-lock failure to inspect the sqlx error and
only return PruneOutcome::Blocked when the DB error code is the Postgres lock
contention code "55P03"; for any other error propagate it via map_sqlx_error (or
return Err(map_sqlx_error(err))) instead of swallowing it into Blocked.
awa-model/src/queue_storage.rs:11999-12008 — apply the same selective check:
treat the NOWAIT lock failure as Blocked only when the error is "55P03",
otherwise propagate the error with map_sqlx_error.
awa-model/src/queue_storage.rs:12227-12236 — for the claim-prune NOWAIT path
keep NOWAIT semantics but do not convert non-lock errors to
PruneOutcome::Blocked; detect "55P03" and return Blocked only for that code, and
for other sqlx errors propagate them using map_sqlx_error. Use a small helper or
inline check on err.as_database_error().and_then(|d| d.code()) == Some("55P03")
to identify lock contention.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 471a5d06-8b88-4b50-a8fe-f710dce2586d

📥 Commits

Reviewing files that changed from the base of the PR and between 58eb350 and f4373e0.

📒 Files selected for processing (21)
  • awa-model/migrations/v023_install_queue_storage_substrate.sql
  • awa-model/migrations/v033_receipt_rescue_cursors.sql
  • awa-model/src/migrations.rs
  • awa-model/src/queue_storage.rs
  • awa-python/src/client.rs
  • awa-worker/src/client.rs
  • awa-worker/src/maintenance.rs
  • awa/tests/queue_storage_benchmark_test.rs
  • awa/tests/queue_storage_runtime_test.rs
  • awa/tests/receipt_plane_chaos_test.rs
  • correctness/storage/AwaDeadTupleContract.tla
  • correctness/storage/AwaSegmentedStorage.tla
  • correctness/storage/AwaStorageLockOrder.tla
  • correctness/storage/MAPPING.md
  • correctness/storage/README.md
  • docs/adr/019-queue-storage-redesign.md
  • docs/adr/023-receipt-plane-ring-partitioning.md
  • docs/adr/032-failed-terminal-retention.md
  • docs/architecture.md
  • docs/configuration.md
  • docs/grafana/awa-dashboard.json

Comment thread awa-model/src/queue_storage.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
correctness/storage/AwaStorageLockOrder.tla (1)

14-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify NOWAIT semantics in the module header comment.

Line 14 says ACCESS EXCLUSIVE NOWAIT, but Line 15 says prune “waits.” With NOWAIT, the maintenance path should fail fast/back off rather than queue behind in-flight writers, so this wording is contradictory.

Suggested wording fix
-\*   - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
-\*     waits for in-flight claim/complete writes to commit before
-\*     truncating).
+\*   - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
+\*     fails fast under contention instead of queueing ahead of worker
+\*     traffic before truncating).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@correctness/storage/AwaStorageLockOrder.tla` around lines 14 - 16, The module
header comment in AwaStorageLockOrder.tla contains contradictory semantics
regarding lock behavior in lines 14-16. The comment states ACCESS EXCLUSIVE
NOWAIT (which means fail immediately if the lock cannot be acquired) but then
says prune "waits" for in-flight writes (which contradicts NOWAIT behavior).
Revise the comment to accurately reflect the intended lock semantics: if using
NOWAIT, the language should indicate the operation fails fast or backs off
rather than waits; if the intention is to wait, remove NOWAIT from the
description or adjust the lock specification accordingly to ensure consistency
between the lock mode and the described behavior.
♻️ Duplicate comments (1)
awa-model/src/queue_storage.rs (1)

8552-8567: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prioritize the prune-blocking slot in deadline rescue too.

This loop still spends the global deadline-rescue budget in numeric slot order. Under sustained expired-deadline backlog on lower-numbered slots, remaining can hit zero before the oldest initialized slot is visited, which leaves prune_oldest_claims() stuck behind rescueable open claims even after the stale-receipt path was fixed.

♻️ Minimal fix sketch
     async fn rescue_expired_receipt_deadlines_tx<'a>(
         &self,
         tx: &mut sqlx::Transaction<'a, sqlx::Postgres>,
     ) -> Result<Vec<DeletedLeaseRow>, AwaError> {
+        let schema = self.schema();
         let mut rescued = Vec::new();
         let mut remaining = RECEIPT_RESCUE_BATCH_LIMIT;
+        let preferred_slot: Option<i32> = sqlx::query_as::<_, (i32, i64, i32)>(&format!(
+            r#"
+            SELECT current_slot, generation, slot_count
+            FROM {schema}.claim_ring_state
+            WHERE singleton = TRUE
+            "#
+        ))
+        .fetch_optional(tx.as_mut())
+        .await
+        .map_err(map_sqlx_error)?
+        .and_then(|(current_slot, generation, slot_count)| {
+            oldest_initialized_ring_slot(current_slot, generation, slot_count)
+                .map(|(slot, _)| slot)
+                .filter(|slot| *slot >= 0 && (*slot as usize) < self.claim_slot_count())
+        });
+
+        if let Some(slot) = preferred_slot {
+            let mut slot_rescued = self
+                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
+                .await?;
+            remaining = remaining.saturating_sub(slot_rescued.len() as i64);
+            rescued.append(&mut slot_rescued);
+            if remaining == 0 {
+                return Ok(rescued);
+            }
+        }
+
         for slot in 0..self.claim_slot_count() {
+            let slot = slot as i32;
+            if Some(slot) == preferred_slot {
+                continue;
+            }
             let mut slot_rescued = self
-                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot as i32, remaining)
+                .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
                 .await?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/src/queue_storage.rs` around lines 8552 - 8567, The loop that
rescues expired receipt deadlines by iterating through slots in numeric order
(0..self.claim_slot_count()) can exhaust the remaining budget before reaching
the oldest initialized slot, which blocks prune_oldest_claims() from
progressing. Refactor the loop to prioritize the oldest initialized slot (the
prune-blocking slot) first by visiting it before iterating through other slots
in order. This ensures that critical pruning operations are not blocked by
backlog on lower-numbered slots when the deadline-rescue budget is limited.
🧹 Nitpick comments (1)
awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql (1)

57-63: 💤 Low value

Lease-claim-receipts detection heuristic is fragile.

The detection relies on string matching within the function body (position(...IN v_claim_runtime_def)). If the function text changes format (e.g., different quoting, whitespace, or schema aliasing), this could produce false negatives. However, the v_schema = 'awa' fallback ensures the default schema always works, and the heuristic is only used for custom schemas where the function was explicitly defined.

This is acceptable for a migration that runs once per schema upgrade.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql` around lines
57 - 63, The lease-claim-receipts detection uses string matching on the function
body via the position function to detect INSERT statements, which is fragile to
formatting changes. While the comment acknowledges this is acceptable for a
one-time migration due to the v_schema = 'awa' fallback, add a code comment in
the migration explaining the heuristic's limitations and why it is acceptable
(single-run migration, explicit function definition in custom schemas, fallback
for default schema). This documents the technical debt and helps future
maintainers understand the tradeoff without refactoring the detection logic
itself.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@awa/tests/queue_storage_runtime_test.rs`:
- Around line 8255-8256: The test currently sets idle_threshold to 120 seconds
but only ages last_claimed_at by 90 seconds, which keeps the lease on the fresh
path and prevents the stale-lease branch from being exercised. To fix this,
increase the duration used to age last_claimed_at (currently 90 seconds) to
exceed the idle_threshold value (120 seconds), so the stale-lease code path is
properly tested and the final assertion comparing after_stale and after_fresh
can hold true. This change is needed both where last_claimed_at is initially
aged and in any related assertion or verification sections.

In `@correctness/storage/MAPPING.md`:
- Line 67: The PruneLeaseSegment mapping row on line 67 of the MAPPING.md file
describes the prune operation but omits the ACCESS EXCLUSIVE NOWAIT lock
acquisition detail that is part of the current PR's prune semantics. Update the
PruneLeaseSegment row to explicitly mention the NOWAIT behavior in the lock
acquisition specification, making it consistent with the adjacent mapping rows
on lines 66 and 69 and aligned with the lock-order specification narrative.
- Line 195: In the ready-prune race note at line 195 in the MAPPING.md file,
replace the reference to `PruneLeaseSegment` with `PruneReadySegment` because
the paragraph is describing the ready-partition `prune_oldest` concern, not the
lease segment concern. This ensures the documentation correctly identifies which
spec transition addresses the specific issue being discussed and avoids
cross-spec confusion.

In `@correctness/storage/README.md`:
- Around line 150-151: The README.md description of AwaStorageLockOrder.cfg at
lines 150-151 enumerates the modeled flows but omits terminal-delete from the
list. Since AwaStorageLockOrder.tla now includes TerminalDeletePlan and
StartTerminalDelete, you need to add terminal-delete to the comma-separated list
of modeled flows in the AwaStorageLockOrder.cfg description to keep the
configuration summary complete and accurate.

---

Outside diff comments:
In `@correctness/storage/AwaStorageLockOrder.tla`:
- Around line 14-16: The module header comment in AwaStorageLockOrder.tla
contains contradictory semantics regarding lock behavior in lines 14-16. The
comment states ACCESS EXCLUSIVE NOWAIT (which means fail immediately if the lock
cannot be acquired) but then says prune "waits" for in-flight writes (which
contradicts NOWAIT behavior). Revise the comment to accurately reflect the
intended lock semantics: if using NOWAIT, the language should indicate the
operation fails fast or backs off rather than waits; if the intention is to
wait, remove NOWAIT from the description or adjust the lock specification
accordingly to ensure consistency between the lock mode and the described
behavior.

---

Duplicate comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8552-8567: The loop that rescues expired receipt deadlines by
iterating through slots in numeric order (0..self.claim_slot_count()) can
exhaust the remaining budget before reaching the oldest initialized slot, which
blocks prune_oldest_claims() from progressing. Refactor the loop to prioritize
the oldest initialized slot (the prune-blocking slot) first by visiting it
before iterating through other slots in order. This ensures that critical
pruning operations are not blocked by backlog on lower-numbered slots when the
deadline-rescue budget is limited.

---

Nitpick comments:
In `@awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql`:
- Around line 57-63: The lease-claim-receipts detection uses string matching on
the function body via the position function to detect INSERT statements, which
is fragile to formatting changes. While the comment acknowledges this is
acceptable for a one-time migration due to the v_schema = 'awa' fallback, add a
code comment in the migration explaining the heuristic's limitations and why it
is acceptable (single-run migration, explicit function definition in custom
schemas, fallback for default schema). This documents the technical debt and
helps future maintainers understand the tradeoff without refactoring the
detection logic itself.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26af158f-1f1a-4c9a-a565-f1de279e2f2a

📥 Commits

Reviewing files that changed from the base of the PR and between f4373e0 and c650aee.

📒 Files selected for processing (18)
  • awa-model/migrations/v023_install_queue_storage_substrate.sql
  • awa-model/migrations/v033_receipt_rescue_cursors.sql
  • awa-model/migrations/v034_receipt_terminal_delete_closures.sql
  • awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql
  • awa-model/src/migrations.rs
  • awa-model/src/queue_storage.rs
  • awa/tests/queue_storage_runtime_test.rs
  • correctness/storage/AwaDeadTupleContract.tla
  • correctness/storage/AwaSegmentedStorage.tla
  • correctness/storage/AwaStorageLockOrder.tla
  • correctness/storage/MAPPING.md
  • correctness/storage/README.md
  • docs/adr/019-queue-storage-redesign.md
  • docs/adr/023-receipt-plane-ring-partitioning.md
  • docs/adr/026-narrow-terminal-history.md
  • docs/architecture.md
  • docs/benchmarking.md
  • docs/configuration.md
✅ Files skipped from review due to trivial changes (4)
  • docs/benchmarking.md
  • docs/adr/026-narrow-terminal-history.md
  • docs/configuration.md
  • docs/adr/023-receipt-plane-ring-partitioning.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • awa-model/migrations/v033_receipt_rescue_cursors.sql
  • awa-model/migrations/v023_install_queue_storage_substrate.sql
  • docs/adr/019-queue-storage-redesign.md

Comment thread awa/tests/queue_storage_runtime_test.rs
Comment thread correctness/storage/MAPPING.md Outdated
Comment thread correctness/storage/MAPPING.md Outdated
Comment thread correctness/storage/README.md Outdated
@hardbyte

Copy link
Copy Markdown
Owner Author

5k/s offered retest after receipt deadline cursor work

Run id: custom-20260613T235621Z-8c69f4

Command:

uv run bench run --systems awa \
  --producer-rate 5000 \
  --producer-mode fixed \
  --worker-count 64 \
  --awa-completion-batch-size 1024 \
  --sample-every 10 \
  --wait-event-sample-every 2 \
  --phase warmup=warmup:2m \
  --phase clean_5k=clean:20m \
  --phase recovery_5k=recovery:5m

Compared against previous same-shape run custom-20260613T225703Z-0d00f7.

Throughput

phase previous completion median this PR completion median enqueue median
clean_5k 812.84/s 812.76/s 5003.08/s
recovery_5k 744.68/s 744.63/s 5002.88/s

So this change is throughput-neutral for this 5k/s offered single-queue shape. It bounds/removes one class of repeated deadline-rescue scan risk, but it does not by itself move the e2e drain ceiling.

Storage/dead-tuple shape

The append-only/tombstone storage shape held under load:

  • ready_entries_*: median/peak dead tuples stayed 0.
  • ready_tombstones_*: median/peak dead tuples stayed 0.
  • queue_terminal_live_counts: clean median 0, peak 1024; recovery median/peak 0.
  • lease_claims_0: previous clean peak 56 dead tuples; current clean peak 0 in the summary sample.

Instrumentation notes

I captured pg_stat_statements, WAL counters, relation stats, and EXPLAIN (ANALYZE, BUFFERS) for claim_ready_runtime at warmup, early/mid/late clean, recovery-start, and recovery-late.

  • claim_ready_runtime EXPLAIN stayed stable through the run: warmup 26.8ms, early clean 27.8ms, mid clean 28.8ms, late clean 28.0ms, recovery-late 28.3ms. Recovery-start had a one-off 73.7ms sample with more buffer reads/writes.
  • WAL bytes were cumulative from stats reset: warmup 0.31GB, early clean 1.49GB, mid clean 3.18GB, late clean 5.08GB, recovery-start 8.17GB, recovery-late 9.33GB.
  • Late-clean top cumulative statements were still dominated by claiming and enqueue/copy path: claim wrapper mean 20.6ms over 26,117 calls, claim body mean 3.05ms, target-slot scan mean 2.06ms, COPY enqueue mean 2.37ms.
  • The named residual mechanism I now see is claim-history live-proof pressure: maintenance/prune periodically proves a ready slot has no live claims by scanning lease_claims and anti-joining terminal evidence. It is not per-job, but it gets expensive under accumulated receipt claim history. In the recovery-late snapshot that proof query had only 28 calls but 24.2s total time / 864ms mean. The slow-query logs showed the same query taking ~1.3-2.2s at rotation boundaries.

My read: this PR is still worth keeping because it gives a bounded deadline-rescue cursor and keeps the storage invariants clean, but the next meaningful performance work should target slot-prune/live-claim proof so maintenance can answer from a cursor/ledger rather than repeatedly reproving over receipt claim history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant