Found a deadlock issue in SH test (more details in SH issue#88):
- **Thread 1 (nuraft-reconfigure):** During the `replace_member` process, after removing the old member, nuraft acquires the nuraft lock to trigger a reconfiguration and clears the `snapshot_sync_ctx`. The cleanup requires the current `user_snp_ctx` to stop, which in turn depends on all pending prefetch blobs being read. That read, however, is blocked waiting for an I/O reactor to handle it.
- **Thread 2 (IO reactor worker 1):** Calls `monitor_replace_member_replication_status`, detects that the replace-member task is complete, and attempts to reset the quorum size. It blocks waiting for the nuraft lock held by Thread 1, while itself holding the `m_rd_map_mtx` mutex.
- **Thread 3 (IO reactor worker 2):** Calls `gc_repl_reqs`, which tries to acquire the `m_rd_map_mtx` mutex held by Thread 2, so it too blocks.

Since both I/O reactor threads (Thread 2 and Thread 3) are blocked, no I/O operations can proceed. This prevents Thread 1 from completing the read it needs before releasing the nuraft lock, closing the cycle: a deadlock.
Since `monitor_replace_member_replication_status` and `gc_repl_reqs` are not typical read/write I/O operations, should we consider isolating them from the default IOMgr workers? Below are the timers currently running on the default IOMgr workers:
- `m_rdev_gc_timer_hdl`: triggers `gc_repl_reqs` and `gc_repl_devs` every minute.
- `m_rdev_fetch_timer_hdl`: triggers `fetch_pending_data` every second.
- `m_flush_durable_commit_timer_hdl`: triggers `flush_durable_commit_lsn` every 500 ms.
- `m_replace_member_sync_check_timer_hdl`: triggers `monitor_replace_member_replication_status` every minute.
- `m_res_audit_timer_hdl`: triggers `trigger_truncate` every 2 minutes.
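One way to realize that isolation, independent of IOMgr specifics, is to give the maintenance callbacks their own dedicated thread, so a callback that blocks on the nuraft lock can never stall a reactor. A hedged sketch follows; the `MaintenanceTimer` class and its API are illustrative, not an existing IOMgr facility.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Runs one periodic callback on a dedicated thread. Callbacks scheduled
// here may block on locks without tying up any I/O reactor worker.
class MaintenanceTimer {
public:
    MaintenanceTimer(std::chrono::milliseconds period, std::function<void()> cb)
        : period_(period), cb_(std::move(cb)), worker_([this] { loop(); }) {}

    ~MaintenanceTimer() {
        {
            std::lock_guard<std::mutex> g(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lk(m_);
        while (!stop_) {
            // Sleep one period, waking early only if asked to stop.
            if (cv_.wait_for(lk, period_, [this] { return stop_; })) break;
            lk.unlock();
            cb_();  // may block; I/O reactors are unaffected
            lk.lock();
        }
    }

    std::chrono::milliseconds period_;
    std::function<void()> cb_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after other members
};
```

With such a worker, `m_rdev_gc_timer_hdl` and `m_replace_member_sync_check_timer_hdl` could be moved off the default reactor pool, while latency-sensitive timers like `m_rdev_fetch_timer_hdl` stay where they are.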