Summary
kvctl-server propagates cluster topology via a push to each data node, but
if the push fails (network blip, target restarting, etc.) it is not retried.
After a controller restart or any partial failure the data nodes end up with
different views of the cluster, and the cluster does not self-heal.
Observed
In an 18-shard cluster (replicas=3, 54 nodes) we ran CLUSTER NODES on
four representative masters at the same time:
| Master | Visible nodes (master + slave) |
| --- | --- |
| shard 0 | 3 (a residual view from a 3-shard cluster created earlier) |
| shard 1 | 17 |
| shard 9 | 33 (a residual view from a 27-shard cluster created in between) |
| shard 17 | 8 |
| Expected | 54 |
Every node reports cluster_state:ok and cluster_slots_ok:16384, so routing
formally works. However, every node holds a different cluster_nodes snapshot,
and clients that bootstrap from a "small" master keep redirecting traffic
to a few hot masters.
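The divergence in the table can be checked mechanically by counting node entries in each master's CLUSTER NODES output. Below is a minimal Python sketch, assuming the raw output of each node has already been collected (for example with redis-cli) and that each node occupies one line of that output. The function names and the labels used as dict keys are hypothetical, not part of kvctl-server or Kvrocks.

```python
from typing import Dict


def count_visible_nodes(cluster_nodes_output: str) -> int:
    """Count node entries, assuming one non-empty line per node."""
    return sum(1 for line in cluster_nodes_output.splitlines() if line.strip())


def find_stale_views(outputs: Dict[str, str], expected: int) -> Dict[str, int]:
    """Return {label: visible node count} for every node whose count != expected.

    `outputs` maps a caller-chosen label (e.g. "shard 0") to the raw
    CLUSTER NODES text captured from that node.
    """
    counts = {label: count_visible_nodes(raw) for label, raw in outputs.items()}
    return {label: n for label, n in counts.items() if n != expected}
```

For the cluster above, calling find_stale_views(outputs, expected=54) would list every master whose view is short, so an empty result means the views agree at least in size. It does not compare node IDs, so two views of equal size but different membership would pass.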
The state never converges back. A rolling restart of all 3 manager instances
only partially refreshed the views. There is no operator-facing endpoint
to force a re-push: /sync, /refresh, /reload, … all returned 404.
Expected
- Controller retries the per-node push with backoff until each node
  acknowledges the new topology version, OR
- Controller exposes a POST /clusters/{name}/sync (or similar) endpoint that
  re-pushes the current cluster_nodes to every member.
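The first option could look roughly like the Python sketch below. It illustrates retry with exponential backoff and is not kvctl-server's actual code: push_until_acked, the push callback, and all parameter defaults are hypothetical. The callback is assumed to return True once a node acknowledges the given topology version and to raise OSError (which includes ConnectionError) on network failure.

```python
import time
from typing import Callable, Iterable, Set


def push_until_acked(
    nodes: Iterable[str],
    version: int,
    push: Callable[[str, int], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Push `version` to every node, retrying only the nodes that have not acked.

    Returns the set of nodes that still had not acknowledged after
    `max_attempts` rounds (empty set means every node acked).
    """
    pending = set(nodes)
    for attempt in range(max_attempts):
        # Iterate over a sorted copy so `pending` can be modified in the loop.
        for node in sorted(pending):
            try:
                if push(node, version):
                    pending.discard(node)
            except OSError:
                # Network blip or target restarting: keep the node pending.
                pass
        if not pending:
            break
        if attempt < max_attempts - 1:
            # Exponential backoff, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return pending
```

Nodes left in the returned set are the ones an operator (or a periodic reconcile loop) would still need to re-push to. Tracking the acknowledged version per node, rather than only push success, is what would let the controller notice stale views after its own restart.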
Repro (vanilla setup)
- Start kvctl-server v1.3.0 and 6 Apache Kvrocks 2.15.0 data nodes.
- Create a cluster with replicas=2 (2 shards × 3 nodes).
- Block node 3 with iptables -A INPUT -p tcp --dport <node-3-port> -j DROP for ~10 s.
- Remove the rule, wait, then run CLUSTER NODES on every node.

Node 3 has a stale cluster_nodes view; nothing converges back even
after several minutes.
Versions
- kvctl-server v1.3.0
- Apache Kvrocks 2.15.0