Skip to content

Add logical cluster migration#4191

Open
ntnn wants to merge 26 commits into
kcp-dev:mainfrom
ntnn:ws-migration
Open

Add logical cluster migration#4191
ntnn wants to merge 26 commits into
kcp-dev:mainfrom
ntnn:ws-migration

Conversation

@ntnn
Copy link
Copy Markdown
Member

@ntnn ntnn commented Jun 5, 2026

Summary

This PR adds migrating logical clusters between shards.
The intent of an admin to migrate an LC is expressed using LogicalClusterMigration.migration.kcp.io.
The object is replicated to the cache-server and used to synchronize the migration between the origin and destination shard.
The destination shards pulls the data of the migrating LC using the LogicalClusterDump type through the migrating VW.
The migrating VW was added only to not put the pressure of the migration on the front-proxy as well.

More details on the process are in the reconciler package docs.

The happy path works but it still needs some resilience and we'll probably hit some real blockers once we try it on real infrastructure.
Also I haven't put in guard rails yet for who can actually create lcmigration etcpp. But I don't want the PR to get even bigger than it is.

What Type of PR Is This?

/kind feature

Related Issue(s)

Fixes #

Release Notes

Added migrating logical clusters between shards gated by the LogicalClusterMigration feature gate

ntnn added 3 commits June 5, 2026 09:38
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jun 5, 2026
@kcp-ci-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign embik for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 5, 2026
ntnn added 21 commits June 5, 2026 10:10
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
ntnn added 2 commits June 5, 2026 10:10
The paths aren't equal and making a shared list for two entries is excessive.
@ntnn
Copy link
Copy Markdown
Member Author

ntnn commented Jun 5, 2026

/test pull-kcp-test-e2e

flake

 === FAIL: test/e2e/authorizer TestShardMetricsEndpoint (30.36s) 

@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2026
@ntnn ntnn added this to tbd Jun 5, 2026
@ntnn ntnn moved this to In review in tbd Jun 5, 2026
s.blockDeltas.Lock()
defer s.blockDeltas.Unlock()

if s.ignoreFunc != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't handleBatchDeltas need to special-case ReplacedAllInfo deltas, similar to how handleDeltas does?

Comment thread pkg/informer/informer.go
continue
}
for _, obj := range objs {
if err := store.Delete(obj); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible the handlers that use this informer are still processing this object? It sounds a bit racy :D I wonder if it could be moved into the controller (the caller, using this informer), purge outside of the reconcile func while holding a lock. So that all threads can observe the same events.

}

// ForceRelist causes the underlying reflector to perform a full relist
// on the next list/watch cycle.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... causes the underlying reflector to perform a full relist IF the said reflector also implements this magical forceRelister interface.

I think more context could be added into the comment. With the single user of this being the LC migration controller, it's not super obvious (without further digging) WHY is this even here.

// lastSyncResourceVersionMutex guards read/write access to lastSyncResourceVersion
lastSyncResourceVersionMutex sync.RWMutex
// activeWatch holds the current watch so ForceRelist can stop it.
activeWatch watch.Interface
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but could this be thisWatch, just watch or something like that? :D While reading, I was wondering that if this is THE active watch, I was looking for where is the other, inactive watch.

@kcp-ci-bot kcp-ci-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants