MDEV-38243 Write binlog row events for changes done by cascading FK o…#5317
MDEV-38243 Write binlog row events for changes done by cascading FK o…#5317sjaakola wants to merge 1 commit into
Conversation
…perations This commit implements a feature which changes the handling of cascading foreign key operations to write the changes of cascading operations into binlog. The applying of such transaction, in the slave node, will apply just the binlog events, and does not execute the actual foreign key cascade operation. This will simplify the slave side replication applying and make it more predictable in terms of potential interference with other parallel applying happning in the node. This feature can be turned ON/OFF by new variable: rpl_use_binlog_events_for_fk_cascade, with default value OFF The actual implementation is largely by windsurf. The commit has also mtr tests for testing rpl_use_binlog_events_for_fk_cascade feature: rpl.rpl_fk_cascade_binlog_row, rpl.rpl_fk_set_null_binlog_row and rpl.fk_cascade_binlog_row_rollback
There was a problem hiding this comment.
Code Review
This pull request introduces the rpl_use_binlog_events_for_fk_cascade session variable to allow the master to write row events for foreign key cascade operations to the binary log. The review feedback highlights several critical issues: potential server crashes from NULL pointer dereferences when trx->mysql_thd or m_user_thd is NULL, a missing else branch that could permanently disable foreign key checks on the slave, out-of-order logging caused by immediately logging deletes while deferring updates, and a potential memory leak on the slave during rollback due to an early return in discard_pending_cascade_binlog_for_thd.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (get_flags(FK_CASCADE_EVENTS_F)) | ||
| thd->variables.option_bits|= OPTION_NO_FOREIGN_KEY_CHECKS; |
There was a problem hiding this comment.
If FK_CASCADE_EVENTS_F is set, OPTION_NO_FOREIGN_KEY_CHECKS is enabled on the slave. However, there is no else branch to clear this flag when the event does not have FK_CASCADE_EVENTS_F. This will cause OPTION_NO_FOREIGN_KEY_CHECKS to remain permanently enabled for all subsequent events applied by the same slave thread, effectively disabling foreign key checks on the slave permanently.
Please add an else branch to clear the flag, similar to how OPTION_NO_CHECK_CONSTRAINT_CHECKS is handled.
if (get_flags(FK_CASCADE_EVENTS_F))
thd->variables.option_bits|= OPTION_NO_FOREIGN_KEY_CHECKS;
else
thd->variables.option_bits&= ~OPTION_NO_FOREIGN_KEY_CHECKS;|
|
||
| cascade->state = UPD_NODE_UPDATE_CLUSTERED; | ||
|
|
||
| if (thd_rpl_use_binlog_events_for_fk_cascade(trx->mysql_thd)) { |
There was a problem hiding this comment.
trx->mysql_thd can be NULL during background rollback or recovery operations. Calling thd_rpl_use_binlog_events_for_fk_cascade with a NULL pointer will cause a NULL pointer dereference and crash the server.
Please add a check to ensure trx->mysql_thd is not NULL before calling this function. Additionally, we should check !trx->mysql_thd->rgi_slave to avoid generating these events on slave applier threads.
if (trx->mysql_thd != NULL &&
thd_rpl_use_binlog_events_for_fk_cascade(trx->mysql_thd) &&
!trx->mysql_thd->rgi_slave) {| void | ||
| ha_innobase::flush_pending_cascade_binlog() | ||
| { | ||
| trx_t* trx = thd_to_trx(m_user_thd); |
There was a problem hiding this comment.
m_user_thd can be NULL for background threads or internal operations. Calling thd_rpl_use_binlog_events_for_fk_cascade(m_user_thd) and wsrep_emulate_binlog(m_user_thd) when m_user_thd is NULL will cause a NULL pointer dereference and crash the server.
Please add a NULL check for m_user_thd at the beginning of the function.
void
ha_innobase::flush_pending_cascade_binlog()
{
if (m_user_thd == NULL) {
return;
}
trx_t* trx = thd_to_trx(m_user_thd);| } else { | ||
| thd_binlog_mark_fk_cascade_events(trx->mysql_thd); | ||
| const ulint len = child_mysql_table->s->reclength; | ||
| unsigned char* before_copy = static_cast<unsigned char*>( | ||
| my_malloc(PSI_INSTRUMENT_ME, len, MYF(MY_WME))); | ||
| unsigned char* after_copy = static_cast<unsigned char*>( | ||
| my_malloc(PSI_INSTRUMENT_ME, len, MYF(MY_WME))); | ||
|
|
||
| if (before_copy != NULL && after_copy != NULL) { | ||
| memcpy(before_copy, before_mysql_rec, len); | ||
| memcpy(after_copy, after_mysql_rec, len); | ||
|
|
||
| trx_cascade_binlog_row_event ev; | ||
| ev.table = child_mysql_table; | ||
| ev.before_record = before_copy; | ||
| ev.after_record = after_copy; | ||
| ev.log_func = reinterpret_cast<void*>( | ||
| row_ins_fk_cascade_update_binlog_row); | ||
| trx->pending_cascade_binlog_row_events.push_back(ev); | ||
| } else { | ||
| if (before_copy != NULL) { | ||
| my_free(before_copy); | ||
| } | ||
| if (after_copy != NULL) { | ||
| my_free(after_copy); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Inconsistency and Out-of-Order Logging: Deletes are logged immediately to the binlog cache, while updates are deferred to the end of the statement/transaction. This can cause out-of-order logging of cascading events within the same transaction/statement. If an update depends on a prior delete (e.g., freeing a unique key value), logging them out of order will cause replication failures (such as duplicate key errors) on the slave.
Furthermore, deferring updates introduces significant complexity (queuing, flushing, discarding, memory copying, and potential leaks). Since the mini-transaction mtr is already committed and no InnoDB page latches are held at this point, we can safely log updates immediately using file->binlog_log_row(before_mysql_rec, after_mysql_rec, log_func) just like the delete case. This would completely eliminate the need for the deferred logging infrastructure and ensure correct event ordering.
} else {
thd_binlog_mark_fk_cascade_events(trx->mysql_thd);
MY_BITMAP* old_read_set = child_mysql_table->read_set;
MY_BITMAP* old_write_set = child_mysql_table->write_set;
MY_BITMAP* old_rpl_write_set = child_mysql_table->rpl_write_set;
child_mysql_table->column_bitmaps_set_no_signal(
&child_mysql_table->tmp_set, &child_mysql_table->tmp_set);
bitmap_set_all(&child_mysql_table->tmp_set);
if (Field **vf = child_mysql_table->vfield) {
for (; *vf; vf++) {
bitmap_clear_bit(&child_mysql_table->tmp_set, (*vf)->field_index);
}
}
if (child_mysql_table->rpl_write_set == NULL) {
child_mysql_table->rpl_write_set = &child_mysql_table->tmp_set;
}
Log_func* log_func = row_ins_fk_cascade_update_binlog_row;
file->binlog_log_row(before_mysql_rec, after_mysql_rec, log_func);
child_mysql_table->column_bitmaps_set_no_signal(old_read_set, old_write_set);
child_mysql_table->rpl_write_set = old_rpl_write_set;
}| static void discard_pending_cascade_binlog_for_thd(THD *thd) | ||
| { | ||
| if (!thd || thd->rgi_slave) return; |
There was a problem hiding this comment.
If a slave thread is applying transactions and somehow accumulates pending cascade events (e.g., if the feature is enabled on the slave and it's applying a transaction from an older master), returning early here means those events will never be discarded on statement or transaction rollback. This can lead to memory leaks or inconsistent state.
Please remove the thd->rgi_slave check from discard_pending_cascade_binlog_for_thd to ensure safe cleanup on rollback.
static void discard_pending_cascade_binlog_for_thd(THD *thd)
{
if (!thd) return;
…perations
This commit implements a feature which changes the handling of cascading foreign key operations to write the changes of cascading operations into binlog. The applying of such transaction, in the slave node, will apply just the binlog events, and does not execute the actual foreign key cascade operation. This will simplify the slave side replication applying and make it more predictable in terms of potential interference with other parallel applying happning in the node.
This feature can be turned ON/OFF by new variable: rpl_use_binlog_events_for_fk_cascade, with default value OFF
The actual implementation is largely by windsurf.
The commit has also mtr tests for testing rpl_use_binlog_events_for_fk_cascade feature: rpl.rpl_fk_cascade_binlog_row, rpl.rpl_fk_set_null_binlog_row and rpl.fk_cascade_binlog_row_rollback