MDEV-40222 Prevent MTR hang when waiting for wsrep_ready #5312
mariadb-TeemuOllakka wants to merge 1 commit into
A query against a server that is up but wedged can connect yet never return, so the loop-count bound in wait_wsrep_ready() did not actually limit the wait and MTR could hang until the suite timeout fired.

Add an optional $timeout to run_query_output(): the mysql client is now spawned via My::SafeProcess->new and waited for with wait_one($timeout), killing the client and returning non-zero if it does not finish in time.

Bound wait_wsrep_ready() by a wall-clock deadline (start_timer) instead of a loop count, and pass the remaining time to each query so no single hung client can exceed the overall server startup budget.
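
The deadline pattern described above can be sketched in standalone Perl. This is only an illustration under stated assumptions, not the code in the patch: run_with_timeout() and wait_until_ready() are hypothetical stand-ins for run_query_output() and wait_wsrep_ready(), and plain fork/waitpid is used in place of My::SafeProcess so the example runs outside MTR.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time sleep);

# Hypothetical stand-in for run_query_output(): run @cmd and give up after
# $timeout seconds. Returns 0 on success, non-zero on failure or timeout.
sub run_with_timeout {
  my ($timeout, @cmd) = @_;
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  if ($pid == 0) {
    # Child: replace ourselves with the command, bypassing the shell.
    exec { $cmd[0] } @cmd or POSIX::_exit(127);
  }
  my $deadline = time() + $timeout;
  while (time() < $deadline) {
    if (waitpid($pid, WNOHANG) == $pid) {
      return 0 if $? == 0;
      return ($? >> 8) || 1;   # non-zero for error exits and signal deaths
    }
    sleep(0.05);
  }
  kill('KILL', $pid);          # the client is hung: kill it
  waitpid($pid, 0);            # reap it so no zombie is left behind
  return 1;
}

# Hypothetical stand-in for wait_wsrep_ready(): retry the probe until it
# succeeds or the overall wall-clock budget is spent.
sub wait_until_ready {
  my ($budget, @cmd) = @_;
  my $deadline = time() + $budget;
  while ((my $remaining = $deadline - time()) > 0) {
    # Each attempt gets only the time that is left, never a fresh timeout.
    return 1 if run_with_timeout($remaining, @cmd) == 0;
    sleep(0.1);
  }
  return 0;
}

# Usage with a placeholder probe (the Perl interpreter itself).
my $ok = wait_until_ready(2, $^X, '-e', 'exit 0');
print $ok ? "ready\n" : "timed out\n";
```

Because every attempt is given only the remaining time, a probe that hangs on its first attempt can use up the whole budget but cannot exceed it, which a loop-count bound does not guarantee.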
Code Review
This pull request introduces a timeout mechanism to run_query_output in mariadb-test-run.pl to prevent MTR from hanging indefinitely when querying a wedged server. It replaces the loop-based wait with a wall-clock deadline and uses My::SafeProcess to monitor and kill hung clients if they exceed the remaining startup budget. The review feedback highlights a potential issue where an undefined $timeout argument could trigger Perl warnings or cause premature process termination, and suggests conditionally calling wait_one based on whether $timeout is defined.

    # wait_one() returns 1 while the process is still running,
    # in which case we kill the hung client.
    if ($proc->wait_one($timeout))
    {
      $proc->kill();
      return 1;
    }

    return $proc->exit_status();

If $timeout is not passed to run_query_output (which is the case for other callers or if it is omitted), $timeout will be undef. Passing undef to $proc->wait_one($timeout) can cause two issues:

1. It may trigger a Perl warning about an uninitialized value.
2. If wait_one treats undef as 0 (non-blocking poll), it will return 1 immediately because the process is still running, causing the client to be killed prematurely.

To prevent this, we should explicitly check if $timeout is defined before passing it to wait_one.

    # wait_one() returns 1 while the process is still running,
    # in which case we kill the hung client.
    if (defined $timeout ? $proc->wait_one($timeout) : $proc->wait_one())
    {
      $proc->kill();
      return 1;
    }
    return $proc->exit_status();
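
The two concerns about an undefined timeout can be reproduced with plain Perl. The sketch below says nothing about how My::SafeProcess::wait_one actually handles undef; naive_mode() and guarded_mode() are hypothetical callees, used only to show how undef behaves in a numeric comparison and how a defined check avoids it.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical callee that does NOT check definedness: undef compares
# equal to 0, so a missing timeout is mistaken for a non-blocking poll.
sub naive_mode {
  my ($timeout) = @_;
  return $timeout == 0 ? 'poll' : 'timed';
}

# Hypothetical callee that treats a missing timeout as "wait without limit".
sub guarded_mode {
  my ($timeout) = @_;
  return 'block' unless defined $timeout;
  return $timeout == 0 ? 'poll' : 'timed';
}

my $naive   = naive_mode(undef);    # 'poll', plus an "uninitialized" warning
my $guarded = guarded_mode(undef);  # 'block', no warning

print "naive: $naive, guarded: $guarded, warnings: ", scalar(@warnings), "\n";
```

Whether the guard belongs in the caller, as in the suggestion above, or in the callee depends on what wait_one() documents for an omitted argument.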