[DPE-9964] Pre-upgrade switchover to minimize client write downtime during refresh #1650

Draft: taurus-forever wants to merge 2 commits into main from alutay/dpe9964

taurus-forever (Contributor) commented Apr 27, 2026

Issue

The primary write endpoint can be temporarily outdated for client
applications during a charm upgrade (juju refresh).

Solution

When the Patroni primary is also the Juju leader (upgraded last),
clients lose write access for the entire upgrade cycle because no unit
can update the relation endpoints. Perform a graceful Patroni switchover
before the snap refresh so the endpoint is updated while the unit is
still responsive. Falls back to the current automatic failover behavior
if the switchover fails.
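
The try-then-fall-back flow described above could be sketched roughly as follows. This is a minimal illustration, not the charm's actual code: `pre_upgrade_switchover`, `request_switchover`, and `update_endpoints` are hypothetical names, and the real switchover would be issued through Patroni.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def pre_upgrade_switchover(
    is_primary: bool,
    request_switchover: Callable[[], None],
    update_endpoints: Callable[[], None],
) -> str:
    """Try a graceful switchover before the snap refresh.

    Returns "skipped", "switched", or "fallback" to report the path taken.
    All names are illustrative assumptions, not the charm's real API.
    """
    if not is_primary:
        # This unit does not hold the write endpoint, so there is nothing to hand over.
        return "skipped"
    try:
        request_switchover()  # hypothetical wrapper around a Patroni switchover
    except Exception:
        # Never block the upgrade: Patroni's automatic failover still applies
        # once the primary stops during the snap refresh.
        logger.warning("Pre-upgrade switchover failed; relying on automatic failover")
        return "fallback"
    # Publish the new primary while this unit is still responsive.
    update_endpoints()
    return "switched"
```

Returning the path taken, rather than raising, keeps a failed switchover from aborting the upgrade, which matches the fallback behavior described above.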

Also, to commit the endpoint change to the client, the charm has to defer
upgrade_granted, because Juju batches relation data changes and only commits
them when the hook exits. The previous approach updated endpoints inside
_on_upgrade_granted, but the client wouldn't see the change until after the
snap refresh completed, which defeated the purpose.

Now the switchover + endpoint update happens in the first invocation,
which defers the event and returns. Juju commits the endpoint change,
the client sees the new primary immediately, and the deferred event
fires a second time to proceed with the snap refresh.
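
The two-pass flow can be illustrated with a toy model of the hook lifecycle. It is only a sketch under stated assumptions: `ToyJuju`, `ToyCharm`, `Event`, and `simulate` are hypothetical stand-ins (not the ops framework or the charm's classes), and the in-memory `switchover_done` flag stands in for state that a real charm would have to persist between hooks.

```python
class Event:
    """Stand-in for a deferrable event."""

    def __init__(self):
        self.deferred = False

    def defer(self):
        self.deferred = True


class ToyJuju:
    """Toy model: relation writes are buffered and committed only on hook exit."""

    def __init__(self):
        self.committed = {}  # what the client application can see
        self._pending = {}   # writes made during the current hook

    def relation_set(self, key, value):
        self._pending[key] = value

    def run_hook(self, handler, event):
        event.deferred = False
        handler(event, self)
        # Hook exit: buffered relation data becomes visible to the client.
        self.committed.update(self._pending)
        self._pending.clear()


class ToyCharm:
    def __init__(self):
        self.switchover_done = False  # assumption: persisted state in a real charm
        self.seen_at_refresh = None

    def on_upgrade_granted(self, event, juju):
        if not self.switchover_done:
            # First invocation: switch over, publish the endpoint, defer, and return.
            self.switchover_done = True
            juju.relation_set("endpoints", "new-primary:5432")
            event.defer()
            return
        # Second invocation: the endpoint is already committed; refresh the snap here.
        self.seen_at_refresh = juju.committed.get("endpoints")


def simulate():
    juju, charm, event = ToyJuju(), ToyCharm(), Event()
    juju.run_hook(charm.on_upgrade_granted, event)
    first_pass = (event.deferred, juju.committed.get("endpoints"))
    juju.run_hook(charm.on_upgrade_granted, event)  # the deferred event fires again
    return first_pass, charm.seen_at_refresh
```

In this model the endpoint is already in `committed` when the second invocation starts, which is the property the deferral is meant to guarantee.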

Skip the pre-upgrade switchover for single-unit applications or when the snap revision is unchanged.
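
The skip conditions amount to a small guard, sketched here with hypothetical names (`needs_pre_upgrade_switchover`, `planned_units`, `installed_revision`, `target_revision`); how the charm actually obtains these values is not shown in this description.

```python
def needs_pre_upgrade_switchover(
    planned_units: int,
    installed_revision: str,
    target_revision: str,
) -> bool:
    """Return True only when a pre-upgrade switchover can reduce write downtime."""
    if planned_units <= 1:
        # Single-unit application: there is no replica to switch over to.
        return False
    if installed_revision == target_revision:
        # Snap revision unchanged: no snap refresh, so the primary keeps running.
        return False
    return True
```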

Assisted-by: Claude:claude-4.6-opus

taurus-forever added the "bug" label (Something isn't working as expected) Apr 27, 2026
taurus-forever changed the title from "Alutay/dpe9964" to "[DPE-9964] Pre-upgrade switchover to minimize client write downtime during refresh" Apr 27, 2026
github-actions bot added the "Libraries: OK" label (The charm libs used are OK and in-sync) Apr 27, 2026