MDEV-40129 Retry transient trylock failures in lock-release fast paths by iMineLink · Pull Request #5298 · MariaDB/server

iMineLink · 2026-06-29T16:25:27Z

The trylock attempts on per-cell lock_sys_t::hash_latch (try_acquire()) and on per-table dict_table_t::lock_mutex_trylock() inside lock_release_try(), lock_release_on_prepare_try() and lock_rec_unlock_unmodified() now use a bounded spin loop (up to LOCK_RELEASE_TRY_SPIN_BUDGET CAS attempts, with MY_RELAX_CPU() between them) instead of a single CAS attempt.

These paths hold trx->mutex while attempting the trylock, which is the reverse of the standard order used by lock_rec_convert_impl_to_expl(). Blocking acquisition is therefore unsafe, hence the trylock pattern. However, a single failed CAS marks the entire pass of lock_release_try() as unsuccessful, and after 5 such failed passes lock_release() falls back to exclusive lock_sys.wr_lock() for the whole transaction. That global wr_lock then blocks every concurrent lock_sys.rd_lock() acquirer in lock_rec_lock() and lock_table(), producing a server-wide convoy under heavy concurrency.

The bounded spin (no syscall, no blocking) gives a transient latch holder time to release without weakening the deadlock-avoidance guarantee that motivated the trylock pattern. The extra trx->mutex hold time is bounded by LOCK_RELEASE_TRY_SPIN_BUDGET times the pause cost.

This is a first, still to be fine-tuned implementation. Only the lock_release_try() path has been positively tested; the lock_release_on_prepare_try() path is not yet covered.

CLAassistant · 2026-06-29T16:25:34Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

gemini-code-assist

Code Review

This pull request introduces a bounded spin mechanism (LOCK_RELEASE_TRY_SPIN_BUDGET) and helper functions (cell_latch_try_acquire_spin and table_lock_mutex_try_acquire_spin) to acquire cell and table latches with a bounded spin. This helps reduce contention and avoids escalating to a full lock_sys write lock. The review feedback points out a potential underflow risk in both spin functions if the budget is configured to 0, which could lead to an extremely long loop and server hang. Additionally, a minor formatting correction is suggested for consistent spacing around the assignment operator.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Copilot

Pull request overview

This PR reduces lock-release contention in InnoDB by replacing single-shot trylock attempts (on per-cell hash_latch and per-table lock_mutex_trylock()) with a bounded spin + MY_RELAX_CPU() loop, aiming to avoid frequent fallback to lock_sys.wr_lock() and the resulting server-wide convoy under high concurrency.

Changes:

Introduces LOCK_RELEASE_TRY_SPIN_BUDGET and helper functions to retry trylocks with a bounded spin loop.
Switches lock_release_try() and lock_release_on_prepare_try() to use the new bounded-spin trylock helpers.
Updates lock_rec_unlock_unmodified() to use the same bounded-spin acquisition for the record-hash cell latch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The trylock attempts on per-cell lock_sys_t::hash_latch (try_acquire()) and on per-table dict_table_t::lock_mutex_trylock() inside lock_release_try(), lock_release_on_prepare_try() and lock_rec_unlock_unmodified() now use a bounded spin loop (up to LOCK_RELEASE_TRY_SPIN_BUDGET CAS attempts, with MY_RELAX_CPU() between them) instead of a single CAS attempt. These paths hold trx->mutex while attempting the trylock, which is the reverse of the standard order used by lock_rec_convert_impl_to_expl(). Blocking acquisition is therefore unsafe, hence the trylock pattern. However, a single failed CAS marks the entire pass of lock_release_try() as unsuccessful, and after 5 such failed passes lock_release() falls back to exclusive lock_sys.wr_lock() for the whole transaction. That global wr_lock then blocks every concurrent lock_sys.rd_lock() acquirer in lock_rec_lock() and lock_table(), producing a server-wide convoy under heavy concurrency. The bounded spin (no syscall, no blocking) gives a transient latch holder time to release without weakening the deadlock-avoidance guarantee that motivated the trylock pattern. The extra trx->mutex hold time is bounded by LOCK_RELEASE_TRY_SPIN_BUDGET times the pause cost. This is a first, still to be fine-tuned implementation. Only the lock_release_try() path has been positively tested; the lock_release_on_prepare_try() path is not yet covered.

iMineLink requested review from Copilot and dr-m June 29, 2026 16:25

Copilot started reviewing on behalf of iMineLink June 29, 2026 16:25 View session

gemini-code-assist Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread storage/innobase/lock/lock0lock.cc

Comment thread storage/innobase/lock/lock0lock.cc

Comment thread storage/innobase/lock/lock0lock.cc

Copilot AI reviewed Jun 29, 2026

View reviewed changes

Comment thread storage/innobase/lock/lock0lock.cc Outdated

iMineLink added the MariaDB Corporation label Jun 29, 2026

iMineLink force-pushed the MDEV-40129 branch from 3902277 to 4f401e0 Compare June 29, 2026 16:37

iMineLink mentioned this pull request Jun 29, 2026

MDEV-40209 Escalate lock-release via a saturating stall counter #5299

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MDEV-40129 Retry transient trylock failures in lock-release fast paths#5298

MDEV-40129 Retry transient trylock failures in lock-release fast paths#5298
iMineLink wants to merge 1 commit into
11.8from
MDEV-40129

iMineLink commented Jun 29, 2026

Uh oh!

CLAassistant commented Jun 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

iMineLink commented Jun 29, 2026

Uh oh!

CLAassistant commented Jun 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants