MDEV-40222 Prevent MTR hang when waiting for wsrep_ready #5312

Open
mariadb-TeemuOllakka wants to merge 1 commit into MariaDB:10.11 from mariadb-corporation:10.11-MDEV-40222

@mariadb-TeemuOllakka

A query against a server that is up but wedged can connect yet never return, so the loop-count bound in wait_wsrep_ready() did not actually limit the wait and MTR could hang until the suite timeout fired.

Add an optional $timeout to run_query_output(): the mysql client is now spawned via My::SafeProcess->new and waited for with wait_one($timeout), killing the client and returning non-zero if it does not finish in time.
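The timeout behaviour described above (spawn the client, wait with a deadline, then kill it and return non-zero if it overruns) can be sketched in plain Perl. This is a minimal, hypothetical sketch, not the PR's code: run_with_timeout is an illustrative name, and it uses core fork/waitpid polling instead of My::SafeProcess, whose internals are not shown on this page.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time sleep);

# Hypothetical helper (not MTR's run_query_output): run @cmd and wait at
# most $timeout seconds for it (undef = no limit). On timeout the child is
# killed and reaped and 1 is returned; otherwise its exit code is returned.
sub run_with_timeout {
  my ($timeout, @cmd) = @_;
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  if ($pid == 0) {
    exec(@cmd) or POSIX::_exit(127);         # child: become @cmd
  }
  my $deadline = defined $timeout ? time() + $timeout : undef;
  while (1) {
    my $res = waitpid($pid, WNOHANG);        # 0 while still running
    die "waitpid failed: $!" if $res == -1;
    return $? >> 8 if $res == $pid;          # finished in time
    if (defined $deadline && time() >= $deadline) {
      kill('KILL', $pid);                    # client is hung: kill it
      waitpid($pid, 0);                      # and reap it
      return 1;
    }
    sleep(0.05);                             # poll interval
  }
}
```

For example, run_with_timeout(1, 'sleep', '10') returns 1 after about a second, while run_with_timeout(5, 'sh', '-c', 'exit 3') returns 3. Polling with WNOHANG keeps the sketch free of alarm() and signal-handler interactions; note that a child dying from some other signal would show up as exit code 0 here, so a fuller version should also inspect $? & 127.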

Bound wait_wsrep_ready() by a wall-clock deadline (start_timer) instead of a loop count, and pass the remaining time to each query so no single hung client can exceed the overall server startup budget.
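A wall-clock budget, with each probe given only the time that is left, can be sketched as follows. This is an illustrative sketch under assumptions: wait_until_ready and its $probe callback are hypothetical stand-ins for wait_wsrep_ready() and its query, a probe returning 0 means "ready", and Time::HiRes is used directly rather than MTR's start_timer, whose exact interface is not shown on this page.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical sketch: call $probe->($remaining) until it returns 0 (ready)
# or the wall-clock budget is spent. Each probe is handed only the time
# left, so a single slow probe cannot push the total past $budget.
# Returns 1 when ready, 0 when the deadline passes first.
sub wait_until_ready {
  my ($budget, $probe, $interval) = @_;
  $interval //= 0.1;                        # pause between attempts
  my $deadline = time() + $budget;
  while (1) {
    my $remaining = $deadline - time();
    return 0 if $remaining <= 0;            # budget exhausted
    return 1 if $probe->($remaining) == 0;  # ready
    $remaining = $deadline - time();        # the probe used some time
    return 0 if $remaining <= 0;
    sleep($interval < $remaining ? $interval : $remaining);
  }
}
```

Because the remaining time is recomputed before every probe and before every sleep, a probe that honours its allowance and uses all of it leaves the loop with nothing left, so it returns 0 instead of starting another attempt.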

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a timeout mechanism to run_query_output in mariadb-test-run.pl to prevent MTR from hanging indefinitely when querying a wedged server. It replaces the loop-based wait with a wall-clock deadline and uses My::SafeProcess to monitor and kill hung clients if they exceed the remaining startup budget. The review feedback highlights a potential issue where an undefined $timeout argument could trigger Perl warnings or cause premature process termination, and suggests conditionally calling wait_one based on whether $timeout is defined.

Comment on lines +5466 to +5474
  # wait_one() returns 1 while the process is still running,
  # in which case we kill the hung client.
  if ($proc->wait_one($timeout))
  {
    $proc->kill();
    return 1;
  }

  return $proc->exit_status();

Priority: high

If $timeout is not passed to run_query_output (as with other callers, or whenever it is omitted), $timeout will be undef. Passing undef to $proc->wait_one($timeout) can cause two issues:

1. It may trigger a Perl warning about an uninitialized value.
2. If wait_one treats undef as 0 (a non-blocking poll), it will return 1 immediately because the process is still running, causing the client to be killed prematurely.

To prevent this, we should explicitly check whether $timeout is defined before passing it to wait_one.

  # wait_one() returns 1 while the process is still running,
  # in which case we kill the hung client.
  if (defined $timeout ? $proc->wait_one($timeout) : $proc->wait_one())
  {
    $proc->kill();
    return 1;
  }

  return $proc->exit_status();
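The concern above turns on how wait_one() interprets an undefined timeout, which this page does not show. As a minimal sketch, assuming a hypothetical convention where undef means "block until exit", 0 means "poll once" and a positive value means "wait up to that many seconds", the helper below classifies the argument. It also illustrates a Perl detail relevant to the suggestion: a sub that unpacks `my ($self, $timeout) = @_;` receives undef in `$timeout` whether it is called as `wait_one()` or `wait_one(undef)`; only code that inspects `scalar(@_)` could tell the two calls apart.

```perl
use strict;
use warnings;

# Hypothetical classifier for a timeout argument. The undef/0/positive
# convention is an assumption for illustration, not My::SafeProcess's
# documented behaviour.
sub timeout_mode {
  my ($self, $timeout) = @_;
  return 'block' unless defined $timeout;   # no limit: wait until exit
  return 'poll'  if $timeout == 0;          # check once, do not wait
  return 'wait';                            # wait up to $timeout seconds
}

my $obj = {};
print timeout_mode($obj), "\n";          # block
print timeout_mode($obj, undef), "\n";   # block
print timeout_mode($obj, 0), "\n";       # poll
print timeout_mode($obj, 2.5), "\n";     # wait
```

Under such a convention, the ternary guard in the suggestion passes the same value either way, so whether it changes behaviour depends entirely on how the real wait_one() handles an undefined timeout.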

