Skip to content

fix: keep chash ring stable during health changes#13532

Open
nic-6443 wants to merge 4 commits into
apache:masterfrom
nic-6443:codex/chash-stable-health-ring
Open

fix: keep chash ring stable during health changes#13532
nic-6443 wants to merge 4 commits into
apache:masterfrom
nic-6443:codex/chash-stable-health-ring

Conversation

@nic-6443

@nic-6443 nic-6443 commented Jun 11, 2026

Copy link
Copy Markdown
Member

Chash previously rebuilt the ring from only healthy nodes whenever health status changed. With uneven weights, that changes the ring topology and can move keys that were already mapped to healthy nodes.

This keeps chash rings tied to config and weight changes, then checks health at pick time. The health snapshot is cached by checker status version, so it only recomputes when health state changes. If the selected node or AI instance is unhealthy, it is marked as tried and the existing chash next path selects another physical target. Non-chash algorithms still build pickers from the healthy subset. If healthcheck targets are not registered yet after DNS or checker changes, their status is treated as unknown instead of unhealthy. ai-proxy-multi keeps its existing exhaustion error string for compatibility.

The same behavior is applied to upstream chash balancing and ai-proxy-multi chash balancing, with regression coverage for both paths.

Copilot AI review requested due to automatic review settings June 11, 2026 11:06
@dosubot dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 11, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes chash balancing to keep the hash ring topology stable across upstream/instance health transitions by building the ring from the configured nodes/instances and applying health filtering at pick time (falling through the ring via the existing “tried” mechanism when an unhealthy target is selected). This reduces unnecessary key movement for keys that were already mapped to healthy backends, especially with uneven weights.

Changes:

  • Upstream chash: build the picker from all configured nodes and skip unhealthy nodes during selection instead of rebuilding the ring from only healthy nodes.
  • ai-proxy-multi chash: build the picker from all configured instances and skip unhealthy instances during selection; non-chash algorithms keep using healthy-only pickers.
  • Add regression tests covering stable-ring behavior for both upstream chash and ai-proxy-multi chash.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
apisix/balancer.lua Keeps chash ring stable by building pickers from all nodes and filtering health during pick.
apisix/plugins/ai-proxy-multi.lua Keeps ai-proxy-multi chash ring stable by building pickers from all instances and filtering health during pick.
t/node/chash-healthcheck-stable-ring.t Regression test ensuring healthy-key mappings don’t move when health changes for upstream chash.
t/plugin/ai-proxy-multi-chash-healthcheck-stable-ring.t Regression test ensuring healthy-key mappings don’t move when health changes for ai-proxy-multi chash.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apisix/balancer.lua
Comment thread apisix/plugins/ai-proxy-multi.lua
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants