fix: Recover Hutch worker when consumer channel is closed due to delivery_ack_timeout?#414
Conversation
michaelklishin
left a comment
There was a problem hiding this comment.
Re-opening channels that were closed by the server is OK in some cases (e.g. due to a delivery acknowledgement timeout)
but not others.
This reinvents a portion of connection recovery in Bunny, ignores Bunny::Channel#open?, existing eventual assertion helpers.
|
One thing that this PR does get right: it recovers in one specific place instead of using I'm working on adding a couple of |
|
With [1][2], the implementation of this PR can shrink significantly, coming down to something like this: ch.on_error do |channel, channel_close_method|
if channel_close_method.delivery_ack_timeout?
channel.reopen
connection.recover_channel_topology(channel)
end
end |
be883d7 to
1f23bfc
Compare
|
|
||
| gemspec | ||
|
|
||
| gem 'bunny', github: 'ruby-amqp/bunny', branch: 'main' |
There was a problem hiding this comment.
I only used this for local testing. As soon as bunny 3 is released we can change this in the hutch.gemspec
|
Thank you very much for the quick review. I’ve updated the PR accordingly. Most of the work went into building an integration test that reliably reproduces the issue. One thing I’m still unsure about is whether reopening the channel from a separate thread is the best approach here. I wasn’t able to find a cleaner way to make I also opened a small follow-up PR in |
|
If the channel is already closed by the server, there won't be any activity on it, so all scenarios where reopening the channel would entail a concurrency hazard come down to the channel ID allocator safety. All key methods of the allocator are synchronized. |
|
@eglitobias voilà, Bunny |
|
Hutch |
…very_ack_timeout?
1f23bfc to
f664026
Compare
|
Rebased on top of Bunny 3.0 and updated the branch. |
|
Not sure why the pipeline is failing now with Ruby 3.3. |
|
@eglitobias having considered the "offloading" thread workaround you have ended up with, I think another change to Bunny can be justified: the Otherwise the entire feature set would require every single user to start a new thread. Bunny's I/O loop thread is different from the application thread (let's ignore the single threaded mode, virtually no one even knows about it even though it was contributed around 2012). Which means using a yet another separate thread won't make much difference for the end user. |
This callback already runs on a thread that's different from the application thread, so this behavior change should not affect any apps. On the upside, this allows the callback use `Bunny::Session` methods, including protocol operation ones, without blocking the I/O loop. References ruby-amqp/hutch#414
|
@eglitobias I have released Bunny We can still introduce more tweaks in follow-up PRs. |
Summary
This MR fixes the case where RabbitMQ closes the consumer channel and Hutch gets stuck because subsequent
ack/nackcalls raiseChannelAlreadyClosed.When that happens today, Hutch can stay alive but stop consuming new messages until the process is restarted. That is the behavior described in issue #364.
What changed
ChannelAlreadyClosedfrom both adapters:Hutch::Adapter::Bunny::ChannelAlreadyClosedHutch::Adapter::MarchHare::ChannelAlreadyClosedHutch::Workerto:ChannelAlreadyClosedwhenacknowledge_errortries toack/nackacknot raisingnackafter consumer error not raising