fix(pool): tolerate closing-connection race in checkout (#850) #852
Merged
A checkout that raced a server-side close crashed the pool GenServer:
find_available calls hackney_conn:is_ready, sees {ok, connected}, and
returns the connection; the checkout then did
ok = hackney_conn:set_owner(Pid, Requester). The two are separate
gen_statem calls, so a tcp_closed can be processed in between. set_owner
then replies {error, invalid_state} during the closed grace window, and
the hard ok = match fails.
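The interleaving can be reproduced with a minimal, self-contained model. This is a hypothetical sketch and not hackney's implementation. The module name race_sketch, its state names, and the {error, closed} reply from is_ready in the closed state are assumptions. It only mimics the shape of the separate is_ready and set_owner gen_statem calls described here. Erlang delivers messages between a pair of processes in send order, so sending tcp_closed before the set_owner call triggers the race deterministically.

```erlang
-module(race_sketch).
-behaviour(gen_statem).

%% Hypothetical model of a pooled connection; not hackney_conn.
-export([start_link/0, is_ready/1, set_owner/2]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() -> gen_statem:start_link(?MODULE, [], []).

%% Two independent calls: nothing keeps the state stable between them.
is_ready(Pid) -> gen_statem:call(Pid, is_ready).
set_owner(Pid, Owner) -> gen_statem:call(Pid, {set_owner, Owner}).

callback_mode() -> handle_event_function.

%% Start in the connected state with no owner.
init([]) -> {ok, connected, undefined}.

handle_event({call, From}, is_ready, connected, _Data) ->
    {keep_state_and_data, [{reply, From, {ok, connected}}]};
handle_event({call, From}, is_ready, closed, _Data) ->
    %% Assumed reply for illustration only.
    {keep_state_and_data, [{reply, From, {error, closed}}]};
handle_event({call, From}, {set_owner, Owner}, connected, _Data) ->
    %% Record the new owner and acknowledge.
    {keep_state, Owner, [{reply, From, ok}]};
handle_event({call, From}, {set_owner, _Owner}, closed, _Data) ->
    %% The process is still alive (grace window) but refuses ownership.
    {keep_state_and_data, [{reply, From, {error, invalid_state}}]};
handle_event(info, tcp_closed, _State, Data) ->
    %% A server-side close arriving between the two calls.
    {next_state, closed, Data}.
```

In a shell, run {ok, Pid} = race_sketch:start_link() and then {ok, connected} = race_sketch:is_ready(Pid). Send Pid ! tcp_closed, and race_sketch:set_owner(Pid, self()) then returns {error, invalid_state}, which is exactly the value the old ok = match could not handle.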
Checkout now handles {error, _} from set_owner and starts a fresh
connection instead. The async checkin/prewarm path (set_owner_async) had
the same race but failed silently, leaving an already-closed connection
briefly in the pool's available map. A pooled connection that has
already closed now stops on the cast, so the pool's monitor drops it
instead of handing it out. The spec for set_owner/2 is corrected to
ok | {error, invalid_state}.
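The shape of the checkout fix (match on set_owner's result instead of asserting ok) can be sketched as a pure function. This is a hypothetical illustration and not the pool's actual code. FindAvailable, SetOwner and StartNew are funs standing in for find_available, hackney_conn:set_owner/2 and "start a fresh connection", and the none return for "no idle connection" is an assumption.

```erlang
-module(checkout_sketch).
-export([checkout/4]).

%% Hypothetical sketch: the funs are stand-ins, not hackney's API.
%% FindAvailable() -> {ok, Pid} | none
%% SetOwner(Pid, Requester) -> ok | {error, invalid_state}
%% StartNew(Requester) -> {ok, NewPid}
checkout(Requester, FindAvailable, SetOwner, StartNew) ->
    case FindAvailable() of
        {ok, Pid} ->
            %% is_ready said connected, but a close may have been processed
            %% since then, so set_owner is allowed to fail here.
            case SetOwner(Pid, Requester) of
                ok ->
                    {ok, Pid};
                {error, _Reason} ->
                    %% Do not crash: fall through to a fresh connection.
                    StartNew(Requester)
            end;
        none ->
            StartNew(Requester)
    end.
```

Passing the dependencies as funs keeps the sketch testable without sockets. A SetOwner that returns {error, invalid_state} yields the result of StartNew rather than a badmatch crash.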
Fixes #850.
This fixes an intermittent pool GenServer crash that occurs when a server closes a pooled keep-alive connection during checkout.
find_available returns a connection when is_ready returns {ok, connected}, and checkout then calls set_owner as a separate gen_statem call. A tcp_closed processed between the two makes set_owner reply {error, invalid_state} during the closed grace window. That broke the hard ok = match and crashed the pool. The pool restarts, so the practical impact is a failed checkout plus log noise.

Checkout now handles {error, _} from set_owner and falls through to a fresh connection. The async checkin/prewarm path (set_owner_async) had the same race silently, leaving an already-closed connection briefly in the pool's available map. A pooled connection that has closed now stops on the cast, so the monitor drops it. set_owner/2's spec is corrected to ok | {error, invalid_state}.

Thanks to @ashutoshrishi for the report and root-cause analysis.