Handle endpos pipeline shutdown cleanly and reset sequences on follow #40

teknogeek0 merged 2 commits into main from fix/cdc-endpos-shutdown on Jun 22, 2026

@teknogeek0 (Collaborator)

Problem

In follow mode, the receive → transform → apply processes are connected by Unix pipes. When the apply process reaches endpos it exits and closes its read end of the pipe. Upstream processes still writing trailing messages then hit EPIPE and exit non-zero. Because the supervisor (follow_wait_subprocesses) ANDs every child's exit status, a migration that completed correctly at endpos was reported as a failure.
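The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not pgcopydb code: the function names `errno_after_reader_exit` and `all_children_ok` are hypothetical, and the AND rule is a simplified stand-in for what `follow_wait_subprocesses` is described as doing. It assumes a POSIX system and ignores SIGPIPE so that the write reports EPIPE instead of terminating the process.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write one byte to a pipe whose read end is already closed and return the
 * resulting errno. Returns 0 if the write unexpectedly succeeded and -1 if
 * the pipe could not be created.
 */
int
errno_after_reader_exit(void)
{
    int fds[2];

    if (pipe(fds) != 0)
    {
        return -1;
    }

    /* With the default SIGPIPE action the writer would be killed instead. */
    signal(SIGPIPE, SIG_IGN);

    /* Simulate the apply process exiting at endpos: close the read end. */
    close(fds[0]);

    ssize_t written = write(fds[1], "x", 1);
    int err = (written < 0) ? errno : 0;

    close(fds[1]);
    return err;
}

/*
 * Hypothetical model of the old supervisor rule: overall success is the
 * logical AND of every child's exit status (0 means the child succeeded).
 */
int
all_children_ok(const int *exit_codes, int count)
{
    assert(exit_codes != NULL || count == 0);

    for (int i = 0; i < count; i++)
    {
        if (exit_codes[i] != 0)
        {
            return 0;
        }
    }
    return 1;
}
```

With exit codes such as `{1, 1, 0}` (receive and transform hit EPIPE, apply finished cleanly), `all_children_ok` returns 0, which is the false failure this PR addresses.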

A customer reported exactly this:

pgcopydb seems to have a race condition when it encounters the endpos sentinel whereby the replayer says "ok done", and quits. The json→sql transformer then tries to write to the pipe, EPIPE dies, and then pgcopydb thinks the whole thing failed.

The false failure had a second, quieter consequence: it skipped the end-of-migration sequence reset, so target sequences were left at their initial base-copy values (logical decoding does not replicate sequences). The same gap existed for the standalone pgcopydb follow command used to resume CDC after a crash — it never reset sequences at all.

Fix

Two layers of defense, since this involves customer data:

Child side: transform and receive treat an EPIPE on the downstream pipe as a clean shutdown only when endpos has been durably reached for the last message they processed. In every other case (endpos unset, or not yet reached) a broken pipe is still a failure.

Supervisor backstop: follow_wait_subprocesses declares overall success when the apply process exited cleanly and endpos has been durably applied (endpos <= replay_lsn), regardless of upstream teardown noise.

The apply process is authoritative — it exits cleanly only after durably applying through endpos and syncing replay_lsn. So the backstop cannot mask a genuine pre-endpos failure: if apply crashed, or endpos was not reached, success is left false and the failure propagates so the operator can resume. The two layers are belt-and-suspenders: the child-side gate handles the common case locally, and the supervisor gate guarantees a completed migration is never reported as failed even if an upstream process exits non-zero for another teardown reason.
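The two gates can be sketched as a pair of predicates. This is an illustration under stated assumptions rather than the code in this PR: `lsn_t`, `LSN_UNSET`, `child_status`, `child_epipe_is_clean`, and `follow_succeeded` are hypothetical names, an LSN is modeled as a 64-bit integer with 0 meaning "endpos not set", and "durably reached" is reduced to a plain comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;           /* assumption: LSN as a 64-bit position */
#define LSN_UNSET ((lsn_t) 0)     /* assumption: 0 means endpos is not set */

/* endpos counts as reached only when it is set and pos has caught up to it. */
static bool
endpos_reached(lsn_t endpos, lsn_t pos)
{
    return endpos != LSN_UNSET && endpos <= pos;
}

/*
 * Child-side gate (transform, receive): an EPIPE on the downstream pipe is
 * a clean shutdown only if endpos was reached for the last message handled.
 */
bool
child_epipe_is_clean(lsn_t endpos, lsn_t last_processed_lsn)
{
    return endpos_reached(endpos, last_processed_lsn);
}

/* Hypothetical record of the three children's exit codes (0 = success). */
typedef struct
{
    int receive_exit;
    int transform_exit;
    int apply_exit;
} child_status;

/*
 * Supervisor backstop: succeed when every child succeeded, or when apply
 * exited cleanly and endpos <= replay_lsn, whatever the upstream codes are.
 */
bool
follow_succeeded(const child_status *st, lsn_t endpos, lsn_t replay_lsn)
{
    assert(st != NULL);

    if (st->receive_exit == 0 && st->transform_exit == 0 &&
        st->apply_exit == 0)
    {
        return true;
    }

    /* A crashed apply, or an endpos not yet applied, stays a failure. */
    return st->apply_exit == 0 && endpos_reached(endpos, replay_lsn);
}
```

In this model the override never fires when `apply_exit` is non-zero or when `replay_lsn` is behind `endpos`, which is the property the paragraph above relies on.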

Sequences

follow_reset_sequences is now also run by the standalone pgcopydb follow command once endpos is durably reached. This mirrors what clone --follow already does at the end of its run, and makes a resumed CDC run (pgcopydb follow --resume) that catches up to endpos correctly update target sequences to current source values. The reset is gated on endpos being reached, so an interrupted continuous follow (no endpos, or stopped early by a signal) does not advance sequences ahead of the data actually applied.
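The gating on endpos can be modeled in a few lines. The sketch below is hypothetical and does not talk to a database: `seq_state` and `reset_sequences_at_endpos` are illustrative names, and copying `source_last_value` into `target_last_value` stands in for whatever `follow_reset_sequences` actually does against the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;           /* assumption: LSN as a 64-bit position */
#define LSN_UNSET ((lsn_t) 0)     /* assumption: 0 means endpos is not set */

/* Hypothetical in-memory view of one sequence on source and target. */
typedef struct
{
    const char *name;
    int64_t source_last_value;
    int64_t target_last_value;
} seq_state;

/*
 * Bring target sequences up to the source values, but only when endpos is
 * set and has been applied (endpos <= replay_lsn). Returns the number of
 * sequences that were changed; 0 when the gate is closed.
 */
int
reset_sequences_at_endpos(seq_state *seqs, int count,
                          lsn_t endpos, lsn_t replay_lsn)
{
    assert(seqs != NULL || count == 0);

    /* Continuous follow (no endpos) or a run stopped early: do nothing. */
    if (endpos == LSN_UNSET || replay_lsn < endpos)
    {
        return 0;
    }

    int updated = 0;

    for (int i = 0; i < count; i++)
    {
        if (seqs[i].target_last_value != seqs[i].source_last_value)
        {
            seqs[i].target_last_value = seqs[i].source_last_value;
            updated++;
        }
    }
    return updated;
}
```

A run interrupted before endpos leaves every `target_last_value` untouched, so sequences never move ahead of the data that was actually applied.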

Interaction with the reconnect/backoff flow

The downstream-EPIPE path is separate from the source-reconnect path: the existing exponential backoff loop fires only on source connection loss, while a broken downstream pipe is handled in its own branch. This change only affects how that downstream branch reports its outcome at endpos; it does not touch the reconnect window, backoff timing, or permissions-error handling.
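The separation between the two paths can be pictured as a small dispatch function. This is a simplified, hypothetical model: the enum values, `follow_failure`, and `next_action` are not pgcopydb identifiers, backoff timing is deliberately left out, and only the two failure kinds named above are represented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;           /* assumption: LSN as a 64-bit position */
#define LSN_UNSET ((lsn_t) 0)     /* assumption: 0 means endpos is not set */

typedef enum
{
    FAILURE_SOURCE_CONNECTION_LOST,
    FAILURE_DOWNSTREAM_EPIPE
} failure_kind;

typedef enum
{
    ACTION_RECONNECT_WITH_BACKOFF,
    ACTION_EXIT_CLEAN,
    ACTION_EXIT_FAILURE
} follow_action;

/* Hypothetical description of what went wrong and how far we had got. */
typedef struct
{
    failure_kind kind;
    lsn_t endpos;
    lsn_t last_processed_lsn;
} follow_failure;

follow_action
next_action(const follow_failure *f)
{
    assert(f != NULL);

    switch (f->kind)
    {
        case FAILURE_SOURCE_CONNECTION_LOST:
            /* Existing path: unchanged by this PR, endpos plays no role. */
            return ACTION_RECONNECT_WITH_BACKOFF;

        case FAILURE_DOWNSTREAM_EPIPE:
            /* Only this branch's reported outcome changes at endpos. */
            if (f->endpos != LSN_UNSET &&
                f->endpos <= f->last_processed_lsn)
            {
                return ACTION_EXIT_CLEAN;
            }
            return ACTION_EXIT_FAILURE;
    }

    /* Unknown kinds are treated as failures in this sketch. */
    return ACTION_EXIT_FAILURE;
}
```

The point of the sketch is that the endpos check lives entirely inside the EPIPE branch, so the reconnect branch behaves the same with or without it.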

Testing

Full CDC / follow / unit suites pass on PG18:

cdc-wal2json, cdc-test-decoding, follow-wal2json, follow-defer-indexes, follow-defer-validate-fks, cdc-endpos-between-transaction, endpos-in-multi-wal-txn, cdc-low-level, cdc-message-handling, follow-data-only, follow-9.6, follow-target-reconnect, follow-standby, cdc-filtering, unit — all green.

follow-target-reconnect in particular confirms the reconnect/backoff behavior is unchanged.

Note

The endpos shutdown race is timing-dependent (pipe buffer fill, data volume), so it does not reproduce deterministically in CI — every test run completes via the normal clean-shutdown path. The change is verified not to regress any existing behavior; the correctness of the endpos handling rests on the gating analysis above (apply is authoritative; the override is strictly gated on endpos <= replay_lsn).

In follow mode the receive, transform, and apply processes are connected
by Unix pipes. When the apply process reaches endpos it exits and closes
its read end of the pipe. Upstream processes that are still writing
trailing messages then hit EPIPE and exit non-zero, and the supervisor
ANDs every child's status, so a migration that completed correctly at
endpos was reported as a failure.

This adds two layers of handling:

  - Child side: transform and receive treat an EPIPE on the downstream
    pipe as a clean shutdown only when endpos has been durably reached
    for the last message they processed. In every other case (endpos
    unset, or not yet reached) a broken pipe is still a failure.

  - Supervisor backstop: follow_wait_subprocesses declares overall
    success when the apply process exited cleanly and endpos has been
    durably applied (endpos <= replay_lsn), regardless of upstream
    teardown noise. The apply process is authoritative, so this cannot
    mask a genuine pre-endpos failure: if apply crashed or endpos was
    not reached, the failure still propagates.

The false failure also skipped the end-of-migration sequence reset.
follow_reset_sequences is now also run by the standalone "pgcopydb
follow" command once endpos is durably reached, so a resumed CDC run
that catches up to endpos updates target sequences to current source
values. Previously only "clone --follow" reset sequences, leaving
resume-after-crash with stale sequences.

Adds a deterministic regression test for the sequence reset performed by
the standalone `pgcopydb follow` command when it reaches endpos (the path
used by resume-cdc helpers, which previously did not reset sequences).

The test clones pagila, advances rental_rental_id_seq on the source by
inserting rows, sets endpos, and runs `pgcopydb follow --resume`. Because
CDC replays the inserts with OVERRIDING SYSTEM VALUE (explicit ids that do
not advance the target sequence), the target sequence only catches up to
the source if follow_reset_sequences runs at endpos. The test asserts the
target sequence advanced from the snapshot value to match the source.

Verified the test fails (target stuck at the snapshot value) when the
reset is removed, and passes with it in place.

teknogeek0 merged commit 50f630c into main on Jun 22, 2026
90 checks passed