fix: add protections against dataraces in socket engines, cluster timer and queues. #1587

Merged
braindigitalis merged 1 commit into brainboxdotcc:dev from fclivaz42:queues-socketengines
May 28, 2026

Conversation

@fclivaz42

As discussed, here is a smaller PR with less risky changes.
I have changed the following:

  • epoll, where I removed an unused variable and added a lock where elements were removed from the fds vector
  • kqueue, where I have added a lock similar to epoll
  • queues, where I have changed two booleans to std::atomic booleans, added a safeguard for the cli variable and guards for the removals vector.
  • timer, where I added a guard to an unguarded if statement.
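
The two patterns named in the list above (a lock around removals from a shared vector, and std::atomic booleans for flags shared between threads) can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `fd_registry`, `worker`, and `demo_concurrent_removal` are hypothetical names invented for the example, and the real D++ classes are laid out differently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a socket engine's descriptor list; not D++ code.
class fd_registry {
public:
	void add(int fd) {
		std::lock_guard<std::mutex> lock(mtx);
		fds.push_back(fd);
	}

	// Erasing shifts elements and invalidates iterators, so removal
	// takes the same mutex as every other access to the vector.
	void remove(int fd) {
		std::lock_guard<std::mutex> lock(mtx);
		fds.erase(std::remove(fds.begin(), fds.end(), fd), fds.end());
	}

	std::size_t size() const {
		std::lock_guard<std::mutex> lock(mtx);
		return fds.size();
	}

private:
	mutable std::mutex mtx;
	std::vector<int> fds;
};

// Hypothetical queue worker whose stop flag is written by one thread and
// read by another; std::atomic<bool> makes that access well-defined.
class worker {
public:
	void request_stop() { terminating.store(true); }
	bool stopping() const { return terminating.load(); }

private:
	std::atomic<bool> terminating{false};
};

// Adds 1000 descriptors, then removes them from four threads at once.
// Returns how many remain (0 when every removal was applied).
std::size_t demo_concurrent_removal() {
	fd_registry reg;
	for (int fd = 0; fd < 1000; ++fd) {
		reg.add(fd);
	}
	assert(reg.size() == 1000);

	std::vector<std::thread> threads;
	for (int t = 0; t < 4; ++t) {
		// Each thread removes a disjoint quarter of the descriptors.
		threads.emplace_back([&reg, t] {
			for (int fd = t; fd < 1000; fd += 4) {
				reg.remove(fd);
			}
		});
	}
	for (auto& th : threads) {
		th.join();
	}
	return reg.size();
}
```

Without the mutex in `remove`, the four threads would modify the vector concurrently, which is a data race and undefined behaviour. With it, calling `demo_concurrent_removal()` leaves the registry empty.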

I believe these changes are light enough to not cause any issues whatsoever, and have kept the riskier changes out of this PR for now while I do more testing on my end.

Code change checklist

  • I have ensured that all methods and functions are fully documented using doxygen style comments.
  • My code follows the coding style guide.
  • I tested that my change works before raising the PR.
  • I have ensured that I did not break any existing API calls.
  • I have not built my pull request using AI, a static analysis tool or similar without any human oversight.

@netlify

netlify Bot commented May 15, 2026

Deploy Preview for dpp-dev ready!

Name Link
🔨 Latest commit d730187
🔍 Latest deploy log https://app.netlify.com/projects/dpp-dev/deploys/6a07a9f21335d90008b12e7c
😎 Deploy Preview https://deploy-preview-1587--dpp-dev.netlify.app

@github-actions github-actions Bot added the documentation and code labels May 15, 2026
@egorpugin
Contributor

My simple echo bots get stuck in epoll even with version 10.1.5.
Does this fix exactly those issues?

@fclivaz42
Author

My simple echo bots get stuck in epoll even with version 10.1.5. Does this fix exactly those issues?

You could test it by applying this PR as a patch, then compiling and installing the library. Feedback would be very welcome!

@braindigitalis
Contributor

Are you saying a simple test bot now gets stuck in epoll, and didn't before 10.1.5?

@egorpugin
Contributor

egorpugin commented May 27, 2026

Both 10.1.4 and 10.1.5 got stuck after several hours,
while the bot seemed to work on 10.1.3 (or earlier).

With this patch my bot survived the night, which is definitely a good sign.
I'll report back in 1-2 days. If it is still online, my issue is 99% gone.

@egorpugin
Contributor

My bot is still online.
Good PR.

@braindigitalis braindigitalis merged commit 7b649ad into brainboxdotcc:dev May 28, 2026
41 checks passed
@egorpugin
Contributor

It is dead again now.
I don't know whether the cause is Discord, dpp, or my program/build.

Thread 5 (Thread 0x7f003854b6c0 (LWP 342729) "exe"):
#0  0x00007f0038687902 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007f003867bb9c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007f003867c20c in __futex_abstimed_wait_common () from /lib64/libc.so.6
#3  0x00007f003867e8de in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#4  0x00007f0038844e00 in __gthread_cond_wait (__cond=<optimized out>, __mutex=<optimized out>) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/x86_64-redhat-linux/bits/gthr-default.h:911
#5  std::__condvar::wait (this=<optimized out>, __m=...) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/std_mutex.h:173
#6  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#7  0x0000558029545c1b in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dpp::thread_pool::thread_pool(dpp::cluster*, unsigned long)::$_0> > >::_M_run() ()
#8  0x00007f003884e424 in std::execute_native_thread_routine (__p=0x55805a32ccd0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#9  0x00007f003867f3c4 in start_thread () from /lib64/libc.so.6
#10 0x00007f003870256c in __clone3 () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0037d4a6c0 (LWP 342730) "exe"):
#0  0x00007f0038687902 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007f003867bb9c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007f003867c20c in __futex_abstimed_wait_common () from /lib64/libc.so.6
#3  0x00007f003867e8de in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#4  0x00007f0038844e00 in __gthread_cond_wait (__cond=<optimized out>, __mutex=<optimized out>) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/x86_64-redhat-linux/bits/gthr-default.h:911
#5  std::__condvar::wait (this=<optimized out>, __m=...) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/std_mutex.h:173
#6  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#7  0x0000558029545c1b in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dpp::thread_pool::thread_pool(dpp::cluster*, unsigned long)::$_0> > >::_M_run() ()
#8  0x00007f003884e424 in std::execute_native_thread_routine (__p=0x55805a32dd60) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#9  0x00007f003867f3c4 in start_thread () from /lib64/libc.so.6
#10 0x00007f003870256c in __clone3 () from /lib64/libc.so.6

Thread 3 (Thread 0x7f00375496c0 (LWP 342731) "exe"):
#0  0x00007f0038687902 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007f003867bb9c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007f003867c20c in __futex_abstimed_wait_common () from /lib64/libc.so.6
#3  0x00007f003867e8de in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#4  0x00007f0038844e00 in __gthread_cond_wait (__cond=<optimized out>, __mutex=<optimized out>) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/x86_64-redhat-linux/bits/gthr-default.h:911
#5  std::__condvar::wait (this=<optimized out>, __m=...) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/std_mutex.h:173
#6  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#7  0x0000558029545c1b in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dpp::thread_pool::thread_pool(dpp::cluster*, unsigned long)::$_0> > >::_M_run() ()
#8  0x00007f003884e424 in std::execute_native_thread_routine (__p=0x55805a32df10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#9  0x00007f003867f3c4 in start_thread () from /lib64/libc.so.6
#10 0x00007f003870256c in __clone3 () from /lib64/libc.so.6

Thread 2 (Thread 0x7f0036d486c0 (LWP 342732) "exe"):
#0  0x00007f0038687902 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007f003867bb9c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007f003867c20c in __futex_abstimed_wait_common () from /lib64/libc.so.6
#3  0x00007f003867e8de in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#4  0x00007f0038844e00 in __gthread_cond_wait (__cond=<optimized out>, __mutex=<optimized out>) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/x86_64-redhat-linux/bits/gthr-default.h:911
#5  std::__condvar::wait (this=<optimized out>, __m=...) at /usr/src/debug/gcc-15.2.1-7.fc43.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/std_mutex.h:173
#6  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#7  0x0000558029545c1b in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dpp::thread_pool::thread_pool(dpp::cluster*, unsigned long)::$_0> > >::_M_run() ()
#8  0x00007f003884e424 in std::execute_native_thread_routine (__p=0x55805a32e090) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#9  0x00007f003867f3c4 in start_thread () from /lib64/libc.so.6
#10 0x00007f003870256c in __clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0038b36800 (LWP 342725) "exe"):
#0  0x00007f0038687902 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007f003867bb9c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007f003867bbe4 in __syscall_cancel () from /lib64/libc.so.6
#3  0x00007f0038702855 in epoll_wait () from /lib64/libc.so.6
#4  0x000055802936372d in dpp::socket_engine_epoll::process_events() ()
#5  0x00005580292a2052 in dpp::cluster::start(dpp::start_type)::$_0::operator()() const ()
#6  0x00005580292a1d55 in dpp::cluster::start(dpp::start_type) ()
#7  0x00005580295a055e in main ()

Labels

code Improvements or additions to code. documentation Improvements or additions to documentation

3 participants