MDEV-38147 Mariadb error 1950 after SST #5316

Open
ayurchen wants to merge 2 commits into 10.11 from MDEV-38147-mariadb-error-1950-after-sst

ayurchen commented Jul 1, 2026

Pull request created in: https://jira.mariadb.org/browse/MDEV-38147

Investigation into MDEV-38147 revealed that when the --galera-info option is given during SST, mariabackup rotates the binlog and ships it to the joiner. The file is likely to contain wrong Gtid_list info, yet the joiner uses it to initialize its binlog.
Since that file is useless, neither rotate the binlog nor ship the file; the joiner can generate its own correct Gtid_list instead.

MDEV-40179 - roll back orphaned prepared transactions.

ayurchen and others added 2 commits July 1, 2026 13:11
After a mariabackup SST the joiner could fail with

  ER_GTID_STRICT_OUT_OF_ORDER (error 1950)

while re-binlogging transactions received over IST.

The cause is that the binary log copied from the donor carries a
Gtid_list whose position can be ahead of the storage-engine snapshot:
BACKUP STAGE BLOCK_COMMIT blocks the engine commit (2PC step 3) but not
the binary log write (step 2), so transactions can be present in the
copied binlog that are not committed in the copied engine snapshot.
After the SST the joiner reports the (committed) engine position to the
cluster, IST resends those transactions, and re-binlogging them under
gtid_strict_mode=ON collides with the Gtid_list that is already ahead,
raising error 1950.
(MDEV-34483 made the engine snapshot stop short of the binlog, which is
what exposed this.)

The copied binary log carries no transactions the joiner needs - only a
Gtid_list - so instead of shipping and then having to truncate/reconcile
it, the joiner now starts a fresh binary log and seeds its GTID position
from the storage-engine checkpoint during recovery. That checkpoint is
the committed cluster position, i.e. exactly where IST resumes, so the
joiner's binary log stays in lockstep with the rest of the cluster and
no out-of-order GTID can occur.
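
As a rough illustration of the failure and the fix described above, the following toy model (not MariaDB code; every class, function and constant name is hypothetical) treats a binlog as a per-domain high-water mark and rejects any seq_no that does not advance it, which is approximately what gtid_strict_mode=ON enforces. It replays the IST range once against a Gtid_list copied from the donor and once against a position seeded from the engine checkpoint. The seqnos 100, 102 and 105 are invented for the example.

```python
class GtidStrictOutOfOrder(Exception):
    """Toy stand-in for ER_GTID_STRICT_OUT_OF_ORDER (error 1950)."""


class ToyBinlog:
    """Hypothetical model: tracks only the last seq_no per GTID domain."""

    def __init__(self, gtid_list):
        self.last_seq = dict(gtid_list)  # domain_id -> last binlogged seq_no

    def write(self, domain_id, seq_no):
        # Strict mode: a seq_no must be greater than the last one in its domain.
        if seq_no <= self.last_seq.get(domain_id, 0):
            raise GtidStrictOutOfOrder(f"domain {domain_id}, seq_no {seq_no}")
        self.last_seq[domain_id] = seq_no


def replay_ist(binlog, domain_id, engine_committed, cluster_head):
    """Re-binlog everything after the engine position, as IST would resend it."""
    try:
        for seq_no in range(engine_committed + 1, cluster_head + 1):
            binlog.write(domain_id, seq_no)
    except GtidStrictOutOfOrder:
        return "error 1950"
    return "ok"


DOMAIN = 0
ENGINE_COMMITTED = 100  # position in the copied engine snapshot (made up)
DONOR_BINLOG_POS = 102  # copied binlog is two transactions ahead (made up)
CLUSTER_HEAD = 105      # last seqno delivered by IST (made up)

# Old behaviour: the joiner starts from the donor's Gtid_list.
copied = replay_ist(ToyBinlog({DOMAIN: DONOR_BINLOG_POS}),
                    DOMAIN, ENGINE_COMMITTED, CLUSTER_HEAD)

# New behaviour: a fresh binlog seeded from the engine checkpoint.
seeded = replay_ist(ToyBinlog({DOMAIN: ENGINE_COMMITTED}),
                    DOMAIN, ENGINE_COMMITTED, CLUSTER_HEAD)

print(copied, seeded)
```

In this model the copied Gtid_list fails on the first resent transaction (101 is not greater than 102), while the seeded position accepts the whole range.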

This works for both wsrep_gtid_mode settings; only the binlog domain of
the cluster stream differs:

  - wsrep_gtid_mode=ON : wsrep_gtid_domain_id (cluster writes are
    re-tagged to it), which is the domain stored in the checkpoint;
  - wsrep_gtid_mode=OFF: gtid_domain_id (cluster writes keep the node's
    configured domain).
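
The domain choice listed above can be summarised in a few lines. This is only a sketch of the rule as stated in this commit message, not the code in sql/log.cc; the function name and the sample domain ids are hypothetical.

```python
def cluster_binlog_domain(wsrep_gtid_mode, wsrep_gtid_domain_id, gtid_domain_id):
    """Pick the binlog domain the cluster stream is written under.

    Hypothetical helper mirroring the two cases in the commit message:
    with wsrep_gtid_mode=ON cluster writes are re-tagged to
    wsrep_gtid_domain_id, otherwise they keep the node's gtid_domain_id.
    """
    return wsrep_gtid_domain_id if wsrep_gtid_mode else gtid_domain_id


# Sample values are invented for illustration.
domain_on = cluster_binlog_domain(True, wsrep_gtid_domain_id=9999, gtid_domain_id=1)
domain_off = cluster_binlog_domain(False, wsrep_gtid_domain_id=9999, gtid_domain_id=1)
print(domain_on, domain_off)
```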

Async-replica positions (mysql.gtid_slave_pos) are part of the engine
snapshot and survive the SST unchanged, so a Galera node can still serve
as an async master or replica across the SST.

This commit:
 - sql/log.cc: adds wsrep_seed_binlog_gtid_state(), called from
   do_binlog_recovery() when the joiner has no binary log, seeding the
   binlog GTID state for the cluster domain to the SE checkpoint position.
 - scripts/wsrep_sst_mariabackup.sh: no longer moves the donor's binary
   log into place on the joiner.
 - extra/mariabackup: stop flushing and copying the donor's current
   binary log under --galera-info (removed write_current_binlog_file()).
   Its only purpose was to ship that binary log to the joiner, which now
   discards it; flushing needlessly rotated the donor's binary log on
   every SST. xtrabackup_galera_info and xtrabackup_binlog_info are still
   written.
 - sql/wsrep_sst.cc: logs the position actually adopted from storage
   (the authoritative post-SST position) rather than the script-reported
   one.
 - sql/handler.cc: downgrades the "Discovered discontinuity in recovered
   wsrep transaction XIDs" message in wsrep_order_and_check_continuity()
   from warning to debug level. With parallel appliers a snapshot
   routinely captures prepared XIDs that are not contiguous with the
   engine checkpoint, so this is normal during SST recovery and of no
   value in regular operation; the transactions past the checkpoint are
   re-delivered by the cluster (IST/SST) regardless.
 - Adds an MDEV-38147 MTR test reproducing the issue.
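
To make the sql/handler.cc item above more concrete, here is a simplified model of a continuity check over recovered wsrep XIDs. It is an assumption-laden sketch, not the body of wsrep_order_and_check_continuity(): the function name is hypothetical, XIDs are reduced to bare seqnos, and the seqnos are invented.

```python
def last_contiguous_seqno(checkpoint, prepared_seqnos):
    """Return the highest seqno reachable from `checkpoint` without a gap.

    Toy model: sort the prepared seqnos, then walk them while each one is
    exactly one past the current position. The first gap ends the walk.
    """
    cur = checkpoint
    for seqno in sorted(prepared_seqnos):
        if seqno != cur + 1:
            break  # discontinuity: the rest is left to IST/SST re-delivery
        cur = seqno
    return cur


# Parallel appliers can leave a gap: 103 was not yet prepared at snapshot time.
with_gap = last_contiguous_seqno(100, [104, 101, 102])

# The prepared set may also start past the checkpoint altogether.
detached = last_contiguous_seqno(100, [103, 104])

print(with_gap, detached)
```

Both cases are expected outcomes of a snapshot taken under parallel apply, which is the reasoning given for lowering the message to debug level.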

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
With log_bin=ON a transaction is committed via two-phase commit (the
binary log is the second participant), so it passes through the InnoDB
XA-prepare state. While a donor is held in BLOCK_COMMIT for a mariabackup
backup, its parallel appliers (wsrep_slave_threads > 1) leave one or more
such writesets prepared-but-not-yet-committed, and the snapshot captures
them. On a freshly SST'd joiner nothing resolves these prepared
transactions: binlog crash recovery does not run (the joiner has no in-use
binlog to recover from), and the wsrep continuity-based commit is inactive
because wsrep_emulate_bin_log is FALSE when log_bin is ON. The leftover
prepared transactions then abort startup with "Found <N> prepared
transactions!". Note this does not depend on the prepared set being
non-contiguous - even a contiguous run aborts, because nothing commits
or rolls it back.

Roll back these transactions in xarecover_handlerton().
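
The startup outcome described in this commit message can be pictured as a small decision function. This is a hedged sketch only: the function and its `rollback_orphans` switch are hypothetical and do not reflect the actual control flow of xarecover_handlerton(); they just encode the three conditions named above.

```python
def resolve_prepared(prepared_seqnos, binlog_recovery_ran,
                     wsrep_emulate_bin_log, rollback_orphans):
    """Return (startup_outcome, rolled_back_seqnos) for a toy recovery pass."""
    if not prepared_seqnos:
        return "start", []
    if binlog_recovery_ran or wsrep_emulate_bin_log:
        # Another mechanism resolves the prepared set; not modelled here.
        return "start", []
    if rollback_orphans:
        # Behaviour after this commit: nothing else will resolve them,
        # so roll them back and let the cluster re-deliver them.
        return "start", sorted(prepared_seqnos)
    # Behaviour before this commit: leftover prepared transactions abort startup.
    return "abort: Found %d prepared transactions!" % len(prepared_seqnos), []


# Freshly SST'd joiner with log_bin=ON: no binlog recovery, no emulation.
before = resolve_prepared([102, 101], False, False, rollback_orphans=False)
after = resolve_prepared([102, 101], False, False, rollback_orphans=True)
print(before)
print(after)
```

Note that the contiguous pair 101, 102 still aborts in the "before" case, matching the remark that contiguity is irrelevant here.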

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ayurchen requested a review from temeo July 1, 2026 13:02
ayurchen self-assigned this Jul 1, 2026
ayurchen added the Codership Codership Galera label Jul 1, 2026
ayurchen added this to the 10.11 milestone Jul 1, 2026
