test: add receive two-session lock harness#175
test: add receive two-session lock harness#175alceops wants to merge 2 commits intoNikolayS:mainfrom
Conversation
| } | ||
|
|
||
| # Give session 1 enough time to enter receive() and hold its transaction open. | ||
| sleep 1 |
There was a problem hiding this comment.
can be racy on a slow runner; perhaps better to poll until session 1 is idle-in-tx or sub_batch is not null
smth like:
| sleep 1 | |
| for i in $(seq 1 50); do | |
| "${psql_base[@]}" -tAc " | |
| select from pgque.subscription s | |
| join pgque.queue q on q.queue_id = s.sub_queue | |
| where q.queue_name = '${queue_name}' and s.sub_batch is not null | |
| " | grep -q 1 && break | |
| sleep 0.2 | |
| done |
| v_count integer; | ||
| begin | ||
| select count(*) into v_count from s2_receive; | ||
| assert v_count = 0, format('session2 must not receive duplicate message, got %s', v_count); |
There was a problem hiding this comment.
I guess this assert is going to be changed once #125 lands.
thinking:
the harness is meant to be RED on current main (which it is — your run shows session 2 returns 1 row after waiting ~3.2s) and GREEN once a fix lands
there is a concern related to the GREEN target.
Both the next_batch_custom() early-return at sql/pgque.sql:2127 (if batch_id is not null then return;) and your own description of the intended fix in the #125 thread — "assert session 2 waits and returns the same batch_id after session 1 commits" — point to v_count = 1, not 0. A FOR UPDATE OF s fix serializes the two next_batch_custom calls; the second one sees sub_batch = batch_A and returns it idempotently, then get_batch_events(batch_A) yields the same row.
If the assertion stays at v_count = 0, the harness will stay RED even after the intended fix lands and read as "fix is broken" to whoever runs it next.
how could resolve it:
| assert v_count = 0, format('session2 must not receive duplicate message, got %s', v_count); | |
| assert v_count = 1, format('session2 expected idempotent re-fetch (1 row), got %s', v_count); | |
| -- and assert s2_receive.batch_id = s1_receive.batch_id |
The "no double-delivery" guarantee here is same batch_id, same rows (single-worker contract), not zero rows in the second session
| fi | ||
|
|
||
| elapsed=$((end_epoch - start_epoch)) | ||
| if (( elapsed < 2 )); then |
There was a problem hiding this comment.
thinking maybe < 3 is better; with pg_sleep(4) starting near T+0 and session 2 launching at T+1, the wait is ~3s. The < 2 threshold lets a 2.0s response (which would mean the lock didn't actually serialize) pass.
| cleanup() { | ||
| rm -rf "${workdir}" | ||
| } | ||
| trap cleanup EXIT |
There was a problem hiding this comment.
for "absolute cleanup", worth adding to cleanup:
select pgque.unregister_consumer('${queue_name}', 'c1');
select pgque.drop_queue('${queue_name}', true);| @@ -0,0 +1,108 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
please add the Apache-2.0 / PgQ derivation header that the rest of tests/ carries (see e.g. tests/test_concurrent_receive.sql).
| select count(*) into v_count from s1_receive; | ||
| assert v_count = 1, format('session1 expected 1 message, got %s', v_count); | ||
| end \$\$; | ||
| select pg_sleep(4); |
There was a problem hiding this comment.
a very minor note:
Tightly coupled to the elapsed-time threshold on line 102. If you bump one, bump the other. Worth pulling both into a single HOLD_SECONDS=4 shell variable at the top so the contract is obvious.
|
thanks @alceops!! left some comments above |
Adds an optional two-session
psqlharness for the same-consumerpgque.receive()race discussed in #125.The existing SQL regression coverage is single-session, so this script opens two real backend sessions:
pgque.receive()for the same(queue, consumer)I left it as an explicit script instead of wiring it into CI because it is meant as a red/green validator for the held #125 fix: current pre-fix code fails, while the row-lock fix should make it pass.
Verification run locally against current
maininstall (expected red):