config: add master_node_uuids to EdgeNodeCluster #145

Merged
eriknordmark merged 2 commits into lf-edge:main from naiming-zededa:naiming-masternode-uuids
May 19, 2026
Conversation

@naiming-zededa naiming-zededa commented May 18, 2026

  In a 3-master k3s cluster, replacing a master node leaves a stale
  Node object that blocks the new master from joining — k3s rejects
  a new server while an old member is still registered.

  Add master_node_uuids (field 13) to EdgeNodeCluster so the controller
  can ship the authoritative list of current master UUIDs to surviving
  devices. EVE uses this list to delete stale control-plane Node
  objects, which causes k3s to automatically drop the corresponding
  member and allows the replacement master to join cleanly.
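
The stale-node detection described above amounts to a set difference between the control-plane nodes currently registered and the UUID list sent by the controller. The following is a minimal Go sketch of that idea, not the EVE implementation. The `Node` type, its fields, and the `staleMasters` function are hypothetical names introduced for illustration. The guard that treats an empty list as "do nothing" is also an assumption, not something this PR states.

```go
package main

import "sort"

// Node is a hypothetical, simplified view of a Kubernetes Node object.
// It is not a type from EVE or client-go.
type Node struct {
	Name         string
	UUID         string // device UUID; assumed to be recoverable from the Node object
	ControlPlane bool   // true for k3s server (master) nodes
}

// staleMasters returns the sorted names of control-plane nodes whose UUID is
// absent from masterUUIDs, the list the controller sends in master_node_uuids.
// Assumption: an empty list means the controller did not populate the field,
// so nothing is reported as stale.
func staleMasters(nodes []Node, masterUUIDs []string) []string {
	if len(masterUUIDs) == 0 {
		return nil
	}
	current := make(map[string]struct{}, len(masterUUIDs))
	for _, id := range masterUUIDs {
		current[id] = struct{}{}
	}
	var stale []string
	for _, n := range nodes {
		if !n.ControlPlane {
			continue // worker nodes are never candidates
		}
		if _, ok := current[n.UUID]; !ok {
			stale = append(stale, n.Name)
		}
	}
	sort.Strings(stale)
	return stale
}
```

For a cluster whose registered masters have UUIDs a, b, and c, and a controller list of a, b, and d, only the node with UUID c is reported as stale. The node with UUID d is not yet registered, so it does not appear in the result.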

@zedi-pramodh

I am trying to understand: in a replace-node operation, will EVE get three UUIDs, those of the two old nodes plus the one new node?
Will the EVE code then delete the node that is missing from the list? Which EVE node in the cluster will handle that? Will there be a race? Does this also mean our replace-node operation never worked before?

@naiming-zededa

> I am trying to understand: in a replace-node operation, will EVE get three UUIDs, those of the two old nodes plus the one new node? Will the EVE code then delete the node that is missing from the list? Which EVE node in the cluster will handle that? Will there be a race? Does this also mean our replace-node operation never worked before?

Right: the UUIDs of the two old nodes plus the UUID of the new node.
The elected lease leader is the one responsible for deleting the stale node.
Our replace-node operation works (corner cases aside) when the node being deleted is alive during the operation.
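
The single-deleter rule in the reply above can be sketched as a leader gate in front of the deletion loop, so that only one device acts on the list. This is an illustrative Go sketch under stated assumptions, not the EVE code. `NodeDeleter`, `reconcileStale`, and `recordingDeleter` are hypothetical names, and how the lease is obtained is outside its scope (the `isLeader` flag is simply passed in).

```go
package main

// NodeDeleter is a hypothetical abstraction over whatever client removes a
// Node object from the cluster; it is not an EVE or client-go interface.
type NodeDeleter interface {
	DeleteNode(name string) error
}

// reconcileStale deletes the given stale nodes only when this device holds
// the lease. Non-leaders return immediately, so at most one device deletes.
// It returns the number of nodes deleted and the first error encountered.
func reconcileStale(isLeader bool, stale []string, d NodeDeleter) (int, error) {
	if !isLeader {
		return 0, nil
	}
	deleted := 0
	for _, name := range stale {
		if err := d.DeleteNode(name); err != nil {
			return deleted, err
		}
		deleted++
	}
	return deleted, nil
}

// recordingDeleter is an in-memory NodeDeleter used only for illustration.
type recordingDeleter struct {
	deleted []string
}

// DeleteNode records the name instead of contacting a cluster.
func (r *recordingDeleter) DeleteNode(name string) error {
	r.deleted = append(r.deleted, name)
	return nil
}
```

Calling `reconcileStale(false, ...)` on a non-leader leaves the recorder empty, while the same call with `isLeader` set to true deletes each listed node once.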

@eriknordmark eriknordmark left a comment

LGTM.
Is this tied to using etcd as the database, or should the comment be more general?

@naiming-zededa naiming-zededa force-pushed the naiming-masternode-uuids branch from 0d78d8a to 93da5c9 Compare May 19, 2026 17:05
  In a 3-master k3s cluster, replacing a master node leaves a stale
  Node object that blocks the new master from joining — k3s rejects
  a new server while an old member is still registered.

  Add master_node_uuids (field 13) to EdgeNodeCluster so the controller
  can ship the authoritative list of current master UUIDs to surviving
  devices. EVE uses this list to delete stale control-plane Node
  objects, which causes k3s to automatically drop the corresponding
  member and allows the replacement master to join cleanly.

Signed-off-by: naiming-zededa <naiming@zededa.com>
  add generated pb.go and pb2.py files

Signed-off-by: naiming-zededa <naiming@zededa.com>
@naiming-zededa naiming-zededa force-pushed the naiming-masternode-uuids branch from 93da5c9 to 081597b Compare May 19, 2026 17:10
@naiming-zededa

> LGTM. Is this tied to using etcd as the database, or should the comment be more general?

@eriknordmark I removed the etcd wording from the commit message.

@eriknordmark eriknordmark merged commit 2221192 into lf-edge:main May 19, 2026
4 checks passed
@naiming-zededa naiming-zededa deleted the naiming-masternode-uuids branch May 19, 2026 17:53