Skip to content

MDEV-38243 Write binlog row events for changes done by cascading FK o…#5317

Open
sjaakola wants to merge 1 commit into
mainfrom
MDEV-38243
Open

MDEV-38243 Write binlog row events for changes done by cascading FK o…#5317
sjaakola wants to merge 1 commit into
mainfrom
MDEV-38243

Conversation

@sjaakola

@sjaakola sjaakola commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

…perations

This commit implements a feature which changes the handling of cascading foreign key operations to write the changes of cascading operations into binlog. The applying of such transaction, in the slave node, will apply just the binlog events, and does not execute the actual foreign key cascade operation. This will simplify the slave side replication applying and make it more predictable in terms of potential interference with other parallel applying happning in the node.

This feature can be turned ON/OFF by new variable: rpl_use_binlog_events_for_fk_cascade, with default value OFF

The actual implementation is largely by windsurf.

The commit has also mtr tests for testing rpl_use_binlog_events_for_fk_cascade feature: rpl.rpl_fk_cascade_binlog_row, rpl.rpl_fk_set_null_binlog_row and rpl.fk_cascade_binlog_row_rollback

…perations

This commit implements a feature which changes the handling of cascading foreign
key operations to write the changes of cascading operations into binlog.
The applying of such transaction, in the slave node, will apply just the binlog
events, and does not execute the actual foreign key cascade operation.
This will simplify the slave side replication applying and make it more predictable
in terms of potential interference with other parallel applying happning
in the node.

This feature can be turned ON/OFF by new variable:
rpl_use_binlog_events_for_fk_cascade, with default value OFF

The actual implementation is largely by windsurf.

The commit has also mtr tests for testing rpl_use_binlog_events_for_fk_cascade
feature:  rpl.rpl_fk_cascade_binlog_row, rpl.rpl_fk_set_null_binlog_row and
rpl.fk_cascade_binlog_row_rollback

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the rpl_use_binlog_events_for_fk_cascade session variable to allow the master to write row events for foreign key cascade operations to the binary log. The review feedback highlights several critical issues: potential server crashes from NULL pointer dereferences when trx->mysql_thd or m_user_thd is NULL, a missing else branch that could permanently disable foreign key checks on the slave, out-of-order logging caused by immediately logging deletes while deferring updates, and a potential memory leak on the slave during rollback due to an early return in discard_pending_cascade_binlog_for_thd.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/log_event_server.cc
Comment on lines +5054 to +5055
if (get_flags(FK_CASCADE_EVENTS_F))
thd->variables.option_bits|= OPTION_NO_FOREIGN_KEY_CHECKS;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If FK_CASCADE_EVENTS_F is set, OPTION_NO_FOREIGN_KEY_CHECKS is enabled on the slave. However, there is no else branch to clear this flag when the event does not have FK_CASCADE_EVENTS_F. This will cause OPTION_NO_FOREIGN_KEY_CHECKS to remain permanently enabled for all subsequent events applied by the same slave thread, effectively disabling foreign key checks on the slave permanently.

Please add an else branch to clear the flag, similar to how OPTION_NO_CHECK_CONSTRAINT_CHECKS is handled.

    if (get_flags(FK_CASCADE_EVENTS_F))
      thd->variables.option_bits|= OPTION_NO_FOREIGN_KEY_CHECKS;
    else
      thd->variables.option_bits&= ~OPTION_NO_FOREIGN_KEY_CHECKS;


cascade->state = UPD_NODE_UPDATE_CLUSTERED;

if (thd_rpl_use_binlog_events_for_fk_cascade(trx->mysql_thd)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

trx->mysql_thd can be NULL during background rollback or recovery operations. Calling thd_rpl_use_binlog_events_for_fk_cascade with a NULL pointer will cause a NULL pointer dereference and crash the server.

Please add a check to ensure trx->mysql_thd is not NULL before calling this function. Additionally, we should check !trx->mysql_thd->rgi_slave to avoid generating these events on slave applier threads.

	if (trx->mysql_thd != NULL &&
	    thd_rpl_use_binlog_events_for_fk_cascade(trx->mysql_thd) &&
	    !trx->mysql_thd->rgi_slave) {

Comment on lines +8878 to +8881
void
ha_innobase::flush_pending_cascade_binlog()
{
trx_t* trx = thd_to_trx(m_user_thd);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

m_user_thd can be NULL for background threads or internal operations. Calling thd_rpl_use_binlog_events_for_fk_cascade(m_user_thd) and wsrep_emulate_binlog(m_user_thd) when m_user_thd is NULL will cause a NULL pointer dereference and crash the server.

Please add a NULL check for m_user_thd at the beginning of the function.

void
ha_innobase::flush_pending_cascade_binlog()
{
	if (m_user_thd == NULL) {
		return;
	}

	trx_t*	trx = thd_to_trx(m_user_thd);

Comment on lines +1614 to +1641
} else {
thd_binlog_mark_fk_cascade_events(trx->mysql_thd);
const ulint len = child_mysql_table->s->reclength;
unsigned char* before_copy = static_cast<unsigned char*>(
my_malloc(PSI_INSTRUMENT_ME, len, MYF(MY_WME)));
unsigned char* after_copy = static_cast<unsigned char*>(
my_malloc(PSI_INSTRUMENT_ME, len, MYF(MY_WME)));

if (before_copy != NULL && after_copy != NULL) {
memcpy(before_copy, before_mysql_rec, len);
memcpy(after_copy, after_mysql_rec, len);

trx_cascade_binlog_row_event ev;
ev.table = child_mysql_table;
ev.before_record = before_copy;
ev.after_record = after_copy;
ev.log_func = reinterpret_cast<void*>(
row_ins_fk_cascade_update_binlog_row);
trx->pending_cascade_binlog_row_events.push_back(ev);
} else {
if (before_copy != NULL) {
my_free(before_copy);
}
if (after_copy != NULL) {
my_free(after_copy);
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Inconsistency and Out-of-Order Logging: Deletes are logged immediately to the binlog cache, while updates are deferred to the end of the statement/transaction. This can cause out-of-order logging of cascading events within the same transaction/statement. If an update depends on a prior delete (e.g., freeing a unique key value), logging them out of order will cause replication failures (such as duplicate key errors) on the slave.

Furthermore, deferring updates introduces significant complexity (queuing, flushing, discarding, memory copying, and potential leaks). Since the mini-transaction mtr is already committed and no InnoDB page latches are held at this point, we can safely log updates immediately using file->binlog_log_row(before_mysql_rec, after_mysql_rec, log_func) just like the delete case. This would completely eliminate the need for the deferred logging infrastructure and ensure correct event ordering.

		} else {
			thd_binlog_mark_fk_cascade_events(trx->mysql_thd);
			MY_BITMAP* old_read_set = child_mysql_table->read_set;
			MY_BITMAP* old_write_set = child_mysql_table->write_set;
			MY_BITMAP* old_rpl_write_set = child_mysql_table->rpl_write_set;

			child_mysql_table->column_bitmaps_set_no_signal(
				&child_mysql_table->tmp_set, &child_mysql_table->tmp_set);
			bitmap_set_all(&child_mysql_table->tmp_set);
			if (Field **vf = child_mysql_table->vfield) {
				for (; *vf; vf++) {
					bitmap_clear_bit(&child_mysql_table->tmp_set, (*vf)->field_index);
				}
			}
			if (child_mysql_table->rpl_write_set == NULL) {
				child_mysql_table->rpl_write_set = &child_mysql_table->tmp_set;
			}
			Log_func* log_func = row_ins_fk_cascade_update_binlog_row;
			file->binlog_log_row(before_mysql_rec, after_mysql_rec, log_func);
			child_mysql_table->column_bitmaps_set_no_signal(old_read_set, old_write_set);
			child_mysql_table->rpl_write_set = old_rpl_write_set;
		}

Comment thread sql/handler.cc
Comment on lines +133 to +135
static void discard_pending_cascade_binlog_for_thd(THD *thd)
{
if (!thd || thd->rgi_slave) return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If a slave thread is applying transactions and somehow accumulates pending cascade events (e.g., if the feature is enabled on the slave and it's applying a transaction from an older master), returning early here means those events will never be discarded on statement or transaction rollback. This can lead to memory leaks or inconsistent state.

Please remove the thd->rgi_slave check from discard_pending_cascade_binlog_for_thd to ensure safe cleanup on rollback.

static void discard_pending_cascade_binlog_for_thd(THD *thd)
{
  if (!thd) return;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant