Fix data race in TopicPayloadPool payload acquisition #6339
Open
PavelGuzenfeld wants to merge 1 commit into eProsima:master from
Two thread-safety issues in TopicPayloadPool:

1. do_get_payload() released the pool mutex before calling payload_node->reference(). In the window between releasing the lock and incrementing the reference count, a concurrent release_payload() could see ref_count==0 and return the node to the free list, allowing another thread to reuse it. Fix: hold the lock until after reference() and metadata reads are complete.

2. PayloadNode::reference() used memory_order_relaxed for the atomic increment. When the sharing path (get_payload(const&)) passes a payload buffer from publisher to subscriber, relaxed ordering does not guarantee the subscriber sees the publisher's writes to the buffer data. This can cause TSan-detected races between serialize_array and deserialize_array on the same buffer. Fix: use memory_order_acq_rel to establish happens-before ordering between the writing thread and the reading thread.

Ref: ros2/rclcpp#2941

Signed-off-by: Pavel Guzenfeld <me@pavelguzenfeld.com>
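For readers without the diff at hand, the two fixes described in the commit message can be sketched on a toy pool. This is a minimal illustration under stated assumptions, not the Fast DDS implementation: SimplePayloadPool, share_payload(), free_count() and run_demo() are hypothetical names, and the node layout is invented for the example. It shows only the intended pattern (increment the reference count while the pool mutex is still held, and use acq_rel on the counter); it does not reproduce the original race.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, heavily simplified stand-in for a pooled payload node.
struct PayloadNode
{
    std::atomic<std::uint32_t> ref_count{0};
    std::vector<std::uint8_t> buffer;

    explicit PayloadNode(std::size_t size) : buffer(size) {}

    // acq_rel instead of relaxed, mirroring the second fix in the commit message.
    void reference()
    {
        ref_count.fetch_add(1, std::memory_order_acq_rel);
    }

    // Returns true when the caller dropped the last reference.
    bool dereference()
    {
        return ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// Hypothetical pool: a fixed set of nodes plus a mutex-protected free list.
class SimplePayloadPool
{
public:
    SimplePayloadPool(std::size_t nodes, std::size_t payload_size)
    {
        for (std::size_t i = 0; i < nodes; ++i)
        {
            all_nodes_.push_back(std::make_unique<PayloadNode>(payload_size));
            free_nodes_.push_back(all_nodes_.back().get());
        }
    }

    // First fix: the lock is held until reference() has completed,
    // so no other thread can observe the node with ref_count == 0.
    PayloadNode* get_payload()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_nodes_.empty())
        {
            return nullptr;
        }
        PayloadNode* node = free_nodes_.back();
        free_nodes_.pop_back();
        node->reference();
        return node;
    }

    // Sharing path: a second owner takes a reference on an already acquired node.
    PayloadNode* share_payload(PayloadNode* node)
    {
        node->reference();
        return node;
    }

    // The node goes back to the free list only when the last reference is dropped.
    void release_payload(PayloadNode* node)
    {
        if (node->dereference())
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_nodes_.push_back(node);
        }
    }

    std::size_t free_count()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_nodes_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<PayloadNode>> all_nodes_;
    std::vector<PayloadNode*> free_nodes_;
};

// Hammers a 4-node pool from several threads and returns the free-node count afterwards.
std::size_t run_demo(std::size_t threads, std::size_t iterations)
{
    SimplePayloadPool pool(4, 64);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t)
    {
        workers.emplace_back([&pool, iterations]()
        {
            for (std::size_t i = 0; i < iterations; ++i)
            {
                if (PayloadNode* node = pool.get_payload())
                {
                    node->buffer[0] = static_cast<std::uint8_t>(i);
                    pool.release_payload(node);
                }
            }
        });
    }
    for (auto& worker : workers)
    {
        worker.join();
    }
    return pool.free_count();
}
```

Building a small driver around run_demo() with -fsanitize=thread is one way to check that the pattern itself is race-free. Whether it matches the locking scope in the actual patch has to be confirmed against the diff.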
Summary
Fixes a ThreadSanitizer-detected data race in TopicPayloadPool that manifests during DDS intra-process communication when publisher serialization and subscriber deserialization access the same payload buffer concurrently.
Reported in: ros2/rclcpp#2941
Issue filed: #6340
Changes
1. Hold mutex through reference increment in do_get_payload()
Previously, the pool mutex was released before payload_node->reference() was called. In this window, a concurrent release_payload() could see ref_count==0 and return the node to the free list, allowing reuse by another thread.
2. Strengthen memory ordering in PayloadNode::reference()
Changed from memory_order_relaxed to memory_order_acq_rel. When the sharing path (get_payload(const&)) passes a payload buffer from publisher to subscriber, relaxed ordering does not guarantee the subscriber sees the publisher's writes to the buffer. acq_rel establishes the necessary happens-before relationship.
Test Results
AddressSanitizer (ASan)
ThreadSanitizer (TSan)
Environment: Ubuntu 24.04, GCC 13, foonathan_memory 0.7.4, fastcdr from thirdparty submodule.
Test plan
halt_on_error=1)
Signed-off-by: Pavel Guzenfeld <me@pavelguzenfeld.com>