[Other] Shrinking QQs of a partitioned cluster member can be slow #15057
Replies: 2 comments 3 replies
One quick workaround for this is to turn off distribution auto-connection on the node performing the deletion:
Notice at 16:15:09 in this test when this option kicks in:
Once the option is set, shrinking completes very quickly since the seven-second timeout is eliminated. That option is harmless to set temporarily on a node which is not attempting to join other nodes. New instances which launch while the option is set may still join this node. But this option prevents this node from joining others, so once the shrink is complete, it's a good idea to unset it:
The easiest change to make here might be to batch and parallelize
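To illustrate the general idea of parallelizing the removals, here is a minimal Python sketch. It assumes the per-queue cost is dominated by one blocking call, and it uses a hypothetical stub (`delete_member_stub`) with a short sleep in place of the real call. Nothing here is RabbitMQ or Ra code; the worker count and the stub's delay are arbitrary illustration values.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def delete_member_stub(queue_name: str, wait_s: float = 0.01) -> str:
    """Hypothetical stand-in for one blocking per-queue removal."""
    time.sleep(wait_s)  # stands in for the blocking connect timeout
    return queue_name

def shrink_in_parallel(queue_names, max_workers: int = 50):
    """Run removals concurrently so the waits overlap instead of adding up."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(delete_member_stub, queue_names))

queues = [f"qq-{i}" for i in range(1, 1001)]
start = time.monotonic()
done = shrink_in_parallel(queues)
elapsed = time.monotonic() - start
print(len(done), f"{elapsed:.2f}s")
```

With the waits overlapping, total time scales roughly with the number of queues divided by the number of workers rather than with the number of queues alone.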
RabbitMQ version used
4.2.x
How is RabbitMQ deployed?
Other
Steps to reproduce the behavior in question
When a cluster member is lost due to a hardware failure, `rabbitmqctl forget_cluster_node` (or other QQ shrink actions) can work through the set of QQs slowly. A hardware failure may cause a node to 'disappear' according to the other nodes in the cluster. Forgetting the lost member can then take a long time per queue. `forget_cluster_node` is an easy way to reproduce this but not the only one: `rabbit_quorum_queue:shrink_all/1` is also used by peer discovery cleanup (an opt-in feature), and QQ CMR has a similar method of working through the queues with `rabbit_quorum_queue:delete_member/2`.
Reproduction
I have three EC2 nodes with `service` tags set to `rabbitmq`, sharing an Erlang cookie and using this config:
Then on each node I run `make RABBITMQ_CONFIG_FILE=~/config.conf run-broker` using `main` and OTP 27. Then declare a large number of queues: `perf-test -qq -qpf 1 -qpt 1000 -qp qq-%d -x 1 -y 0 --time 1`.
Then we simulate a hardware failure where a node effectively becomes partitioned network-wise using iptables.
If there are any QQs with leaders on that node, the majority side of the cluster will take over leadership soon after. Then nodes A and B recognize that C is unreachable:
Now if we run `rabbitmqctl forget_cluster_node rabbit@ip-172-31-26-76` from A or B, we can see that C is removed from each QQ rather slowly:
Note the seven seconds between each attempted queue membership removal. Leaving this test to run overnight, it takes around 1hr57min to finish shrinking the member off of the 1000 queues.
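As a rough sanity check, a fixed seven-second wait per queue across 1000 queues accounts for nearly all of the observed duration. The sketch below is plain arithmetic in Python. The function name is illustrative, and it assumes every removal blocks for the full seven seconds with no other overhead, which is a simplification rather than something measured in the test.

```python
from datetime import timedelta

def estimated_shrink_time(queue_count: int, wait_per_queue_s: float = 7.0) -> timedelta:
    """Total wall-clock time if every queue removal blocks for the full wait."""
    return timedelta(seconds=queue_count * wait_per_queue_s)

estimate = estimated_shrink_time(1000)
print(estimate)  # 1:56:40
```

The estimate of 1h56m40s is within a minute of the roughly 1hr57min observed, which is consistent with the timeout dominating the per-queue cost.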
Analysis
`rabbit_quorum_queue:delete_member/2` has three potentially expensive components:
1. `ra:remove_member/3`. This acts against the (maybe newly elected) leader, though, and is usually very quick.
2. `rabbit_amqqueue:update/2` to remove the node from the `members` list in the queue type state. This is also quick since the first step of `forget_cluster_node` is to remove the node from the metadata-store membership.
3. `ra:force_delete_server/2` when `ra:remove_member/3` succeeds. This is where the seven seconds come from.
`ra:force_delete_server/2` ultimately calls `ra_server_sup_sup:stop_server/2`, which performs an `rpc:call/4` to the failed node. Because the node disappeared at the network level, Erlang forgets about its distribution table entry and attempts to form a new connection to the node. By default the connect timeout in `net_kernel` is seven seconds, so each call waits for up to seven seconds if the destination is not reachable and is not responding.
Shrinking is usually very fast, completing in single-digit minutes even for thousands of queues, as long as the lost member hung up the network connection gracefully. But when it disappears with `net_tick_timeout` rather than `connection_down`, shrinking is slow.
This situation is benign as long as all QQs have a quorum of active members. If there are other membership changes during this long window, like a new instance joining, the quorum can become threatened since the membership increases to four nodes for any QQs waiting to have the original node removed. If any instance then fails, the membership would drop to 2/4 active members on some QQs and progress on those QQs would be blocked.
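The quorum risk described here comes down to majority arithmetic. The following minimal Python sketch shows why a queue with four listed members, one of them unreachable, cannot tolerate another failure. The helper names are illustrative and are not RabbitMQ or Ra APIs.

```python
def quorum_size(member_count: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return member_count // 2 + 1

def has_quorum(member_count: int, active_count: int) -> bool:
    """True if the active members are enough to make progress."""
    return active_count >= quorum_size(member_count)

# Three members, one lost: 2 of 3 is still a majority.
print(has_quorum(3, 2))  # True
# A new node joins before the lost one is removed: 3 of 4 still works.
print(has_quorum(4, 3))  # True
# Any further failure leaves 2 of 4, which is not a majority.
print(has_quorum(4, 2))  # False
```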