MDEV-38147 Mariadb error 1950 after SST by ayurchen · Pull Request #5316 · MariaDB/server

ayurchen · 2026-07-01T11:27:53Z

Pull request created in: https://jira.mariadb.org/browse/MDEV-38147

Investigation into MDEV-38147 revealed that, when the --galera-info option is given during SST, mariabackup rotates the binlog and ships it to the joiner. The file is likely to contain wrong Gtid_list info, yet it is used on the joiner to initialize the binlog.
Since that file is useless, don't rotate the binlog or ship the file; instead, the joiner can generate its own correct Gtid_list.

MDEV-40179 - rollback orphaned prepared transactions.

After a mariabackup SST the joiner could fail with ER_GTID_STRICT_OUT_OF_ORDER (error 1950) while re-binlogging transactions received over IST.

The cause is that the binary log copied from the donor carries a Gtid_list whose position can be ahead of the storage-engine snapshot: BACKUP STAGE BLOCK_COMMIT blocks the engine commit (2PC step 3) but not the binary log write (step 2), so transactions can be present in the copied binlog that are not committed in the copied engine snapshot. After the SST the joiner reports the (committed) engine position to the cluster, IST resends those transactions, and re-binlogging them under gtid_strict_mode=ON collides with the ahead Gtid_list -> error 1950. (MDEV-34483 made the engine snapshot stop short of the binlog, which is what exposed this.)

The copied binary log carries no transactions the joiner needs - only a Gtid_list - so instead of shipping and then having to truncate/reconcile it, the joiner now starts a fresh binary log and seeds its GTID position from the storage-engine checkpoint during recovery. That checkpoint is the committed cluster position, i.e. exactly where IST resumes, so the joiner's binary log stays in lockstep with the rest of the cluster and no out-of-order GTID can occur.

This works for both wsrep_gtid_mode settings; only the binlog domain of the cluster stream differs:
- wsrep_gtid_mode=ON : wsrep_gtid_domain_id (cluster writes are re-tagged to it), which is the domain stored in the checkpoint;
- wsrep_gtid_mode=OFF: gtid_domain_id (cluster writes keep the node's configured domain).

Async-replica positions (mysql.gtid_slave_pos) are part of the engine snapshot and survive the SST unchanged, so a Galera node can still serve as an async master or replica across the SST.

This commit:
- sql/log.cc: adds wsrep_seed_binlog_gtid_state(), called from do_binlog_recovery() when the joiner has no binary log, seeding the binlog GTID state for the cluster domain to the SE checkpoint position.
- scripts/wsrep_sst_mariabackup.sh: no longer moves the donor's binary log into place on the joiner.
- extra/mariabackup: stop flushing and copying the donor's current binary log under --galera-info (removed write_current_binlog_file()). Its only purpose was to ship that binary log to the joiner, which now discards it; flushing needlessly rotated the donor's binary log on every SST. xtrabackup_galera_info and xtrabackup_binlog_info are still written.
- sql/wsrep_sst.cc: logs the position actually adopted from storage (the authoritative post-SST position) rather than the script-reported one.
- sql/handler.cc: downgrades the "Discovered discontinuity in recovered wsrep transaction XIDs" message in wsrep_order_and_check_continuity() from warning to debug level. With parallel appliers a snapshot routinely captures prepared XIDs that are not contiguous with the engine checkpoint, so this is normal during SST recovery and of no value in regular operation; the transactions past the checkpoint are re-delivered by the cluster (IST/SST) regardless.
- Adds an MDEV-38147 MTR test reproducing the issue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
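The collision described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MariaDB code: `BinlogGtidState`, `cluster_domain`, `join` and the seqno values are hypothetical names and numbers chosen for illustration, and a GTID is reduced to a single increasing integer per domain. It compares seeding the joiner's binlog state from the donor's Gtid_list (ahead of the engine) with seeding it from the engine checkpoint.

```python
from dataclasses import dataclass, field


class GtidStrictOutOfOrder(Exception):
    """Stands in for ER_GTID_STRICT_OUT_OF_ORDER (error 1950)."""


@dataclass
class BinlogGtidState:
    """Toy binlog GTID state: highest seq_no logged so far, per domain."""
    last_seq_no: dict = field(default_factory=dict)

    def log(self, domain_id: int, seq_no: int) -> None:
        # Simplified strict-mode rule: seq_no must increase within a domain.
        last = self.last_seq_no.get(domain_id, 0)
        if seq_no <= last:
            raise GtidStrictOutOfOrder(
                f"seq_no {seq_no} in domain {domain_id} is not ahead of {last}")
        self.last_seq_no[domain_id] = seq_no


def cluster_domain(wsrep_gtid_mode: bool,
                   wsrep_gtid_domain_id: int,
                   gtid_domain_id: int) -> int:
    """Domain of the cluster stream, as described in the commit message."""
    return wsrep_gtid_domain_id if wsrep_gtid_mode else gtid_domain_id


def join(seed_seq_no: int, engine_checkpoint: int,
         cluster_head: int, domain_id: int) -> BinlogGtidState:
    """Seed the joiner's binlog state, then re-binlog what IST delivers."""
    state = BinlogGtidState({domain_id: seed_seq_no})
    # IST resumes right after the committed engine position.
    for seq_no in range(engine_checkpoint + 1, cluster_head + 1):
        state.log(domain_id, seq_no)
    return state


if __name__ == "__main__":
    domain = cluster_domain(True, wsrep_gtid_domain_id=9, gtid_domain_id=0)
    # Hypothetical numbers: engine committed up to 100, donor binlog at 103.
    try:
        join(seed_seq_no=103, engine_checkpoint=100,
             cluster_head=105, domain_id=domain)
    except GtidStrictOutOfOrder as exc:
        print("seeded from donor Gtid_list:", exc)
    state = join(seed_seq_no=100, engine_checkpoint=100,
                 cluster_head=105, domain_id=domain)
    print("seeded from engine checkpoint:", state.last_seq_no)
```

In this model the first call fails on the first resent transaction because the seeded position is already past it, while the second call succeeds because the seed and the IST start point coincide.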

With log_bin=ON a transaction is committed via two-phase commit (the binary log is the second participant), so it passes through the InnoDB XA-prepare state. While a donor is held in BLOCK_COMMIT for a mariabackup backup, its parallel appliers (wsrep_slave_threads > 1) leave one or more such writesets prepared-but-not-yet-committed, and the snapshot captures them.

On a freshly SST'd joiner nothing resolves these prepared transactions: binlog crash recovery does not run (the joiner has no in-use binlog to recover from), and the wsrep continuity-based commit is inactive because wsrep_emulate_bin_log is FALSE when log_bin is ON. The leftover prepared transactions then abort startup with "Found <N> prepared transactions!". Note this does not depend on the prepared set being non-contiguous - even a contiguous run aborts, because nothing commits or rolls it back.

Roll back these transactions in xarecover_handlerton().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
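The startup behaviour described above can be sketched as a toy model. This is an illustration under assumptions, not the logic of xarecover_handlerton(): `TrxState`, `StartupAborted`, `recover` and the seqno values are hypothetical. The snapshot is reduced to a mapping from a sequence number to a transaction state, and recovery either aborts on leftover prepared transactions or rolls them back.

```python
from enum import Enum


class TrxState(Enum):
    COMMITTED = "committed"
    PREPARED = "prepared"
    ROLLED_BACK = "rolled_back"


class StartupAborted(Exception):
    """Stands in for the server refusing to start."""


def recover(snapshot: dict, rollback_orphans: bool) -> dict:
    """Resolve prepared transactions found in an engine snapshot.

    snapshot maps a (hypothetical) sequence number to a TrxState.
    """
    recovered = dict(snapshot)
    prepared = sorted(s for s, st in recovered.items()
                      if st is TrxState.PREPARED)
    if prepared and not rollback_orphans:
        # Nothing commits or rolls back the prepared set, so startup aborts.
        raise StartupAborted(f"Found {len(prepared)} prepared transactions!")
    for seq_no in prepared:
        # In this model the rolled-back writesets lie past the committed
        # position, so the cluster is expected to deliver them again.
        recovered[seq_no] = TrxState.ROLLED_BACK
    return recovered


if __name__ == "__main__":
    # Hypothetical snapshot taken while parallel appliers were mid-commit.
    snapshot = {
        100: TrxState.COMMITTED,
        101: TrxState.PREPARED,
        103: TrxState.PREPARED,
    }
    try:
        recover(snapshot, rollback_orphans=False)
    except StartupAborted as exc:
        print("without rollback:", exc)
    print("with rollback:", recover(snapshot, rollback_orphans=True))
```

The abort in this sketch depends only on whether any prepared transaction is left, not on gaps in the sequence, which mirrors the note that a contiguous run aborts as well.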

gemini-code-assist · 2026-07-01T11:39:04Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

ayurchen · 2026-07-01T14:21:40Z

/gemini review

gemini-code-assist · 2026-07-01T14:43:39Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

ayurchen and others added 2 commits July 1, 2026 13:11

ayurchen requested a review from temeo July 1, 2026 13:02

ayurchen self-assigned this Jul 1, 2026

ayurchen added the Codership Codership Galera label Jul 1, 2026

ayurchen added this to the 10.11 milestone Jul 1, 2026

ayurchen wants to merge 2 commits into 10.11 from MDEV-38147-mariadb-error-1950-after-sst
