Skip to content

Fix CachedResource virtual-storage discovery on multi-replica/embedded-VW shards#4190

Merged
kcp-ci-bot merged 2 commits into
mainfrom
fix/dynamicrestmapper-builtin-established-race
Jun 5, 2026
Merged

Fix CachedResource virtual-storage discovery on multi-replica/embedded-VW shards#4190
kcp-ci-bot merged 2 commits into
mainfrom
fix/dynamicrestmapper-builtin-established-race

Conversation

@mjudeikis
Copy link
Copy Markdown
Contributor

@mjudeikis mjudeikis commented Jun 4, 2026

Summary

CachedResource-backed (virtual-storage) resources were intermittently undiscoverable and unservable on a kcp shard running with multiple replicas and/or embedded virtual workspaces. kubectl get <resource> returned:

Warning: Some resources are temporarily unavailable: [<resource>.<group>].
error: the server doesn't have a resource type "<resource>"

and the list/get failed. Whether it failed flapped depending on which shard replica served the request. There are two independent root causes.

1. DynamicRESTMapper controllers only ran on the leader replica

The builtin and dynamic REST mapper controllers populate a per-process, in-memory DynamicRESTMapper that every shard replica's own API server reads — the aggregating CRD-version discovery server and the virtual resources server use it to resolve GroupKinds (e.g. cache.kcp.io/CachedResourceEndpointSlice) to resources. They were registered under the leader-elected kcp-start-controllers hook, so on a shard with more than one replica only the leader populated its mapper. Non-leader replicas served requests against an empty mapper, so RESTMapping for system types failed (no matches for kind "CachedResourceEndpointSlice" in group "cache.kcp.io") and every virtual-storage resource was dropped from discovery there.

Both controllers only mutate the in-memory mapper and never write to storage, so they are safe to run on every replica. They are now started from a dedicated post-start hook that is not gated by leader election, guarded by a sync.Once so a leadership re-acquisition does not re-wire them.

2. Virtual workspace forward client used the loopback SNI/identity with embedded virtual workspaces

The aggregating discovery and virtual resources servers forward requests to the virtual workspace endpoint URLs advertised in CachedResourceEndpointSlice. Each such URL is the serving shard's own external, SNI-routed virtual-workspace URL (Shard.Spec.VirtualWorkspaceURL, e.g. https://<shard>.example:8443/services/replication/...) — reached through the ingress/front-proxy, not the in-cluster Service address and not loopback, even when the virtual workspace runs embedded in that shard.

vwClientConfig was only given a usable TLS config (shard client cert + VW CA) when virtual workspaces were not embedded. With embedded virtual workspaces (the default), it kept the loopback client config, whose ServerName is the loopback name and whose bearer token is only valid on the loopback listener. The forwarded request's TLS SNI was therefore the loopback name, the ingress could not SNI-route the connection, and it was reset:

failed to perform API discovery: ... read: connection reset by peer

The forward client now always authenticates with the shard client certificate, drops the loopback bearer token, and leaves ServerName empty so the SNI is derived from the target host. When --shard-virtual-workspace-ca-file is unset (typical with embedded virtual workspaces) it falls back to the root CA, which signs the shard serving certificate.

What Type of PR Is This?

/kind bug

Related Issue(s)

Fixes #

Release Notes

Fix CachedResource-backed (virtual-storage) resources being intermittently undiscoverable or unservable on kcp shards running with multiple replicas or embedded virtual workspaces.

@kcp-ci-bot kcp-ci-bot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has signed the DCO. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 4, 2026
@mjudeikis
Copy link
Copy Markdown
Contributor Author

/test pull-kcp-build-image

@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

3 similar comments
@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

Comment thread pkg/server/config.go Outdated
@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

@mjudeikis mjudeikis force-pushed the fix/dynamicrestmapper-builtin-established-race branch from e0763e5 to 3b3206f Compare June 5, 2026 07:41
@mjudeikis
Copy link
Copy Markdown
Contributor Author

/build-image

@mjudeikis mjudeikis changed the title Fix dynamicrestmapper builtin controller dropping not-yet-established… Fix CachedResource virtual-storage discovery on multi-replica/embedded-VW shards Jun 5, 2026
@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 5, 2026
@mjudeikis mjudeikis force-pushed the fix/dynamicrestmapper-builtin-established-race branch from 3b3206f to 4370ab9 Compare June 5, 2026 08:18
Comment thread pkg/server/controllers.go
Comment thread pkg/server/controllers.go
Comment on lines +1752 to 1768
// The DynamicRESTMapper is per-process, in-memory state that THIS shard replica's own
// API server reads (the aggregating CRD-version discovery server and the virtual resources
// server) to resolve GroupKinds to resources. Both controllers only mutate that in-memory
// mapper and never write to storage, so — unlike the leader-elected controllers started by
// kcp-start-controllers — they MUST run on every replica. If they only ran on the leader, a
// non-leader replica would serve requests against an empty mapper, making built-in/system
// types such as cache.kcp.io/CachedResourceEndpointSlice unresolvable and breaking discovery
// and serving of virtual-storage (CachedResource-backed) resources on that replica. Start
// them from a post-start hook that is not gated by leader election.
return s.AddPostStartHook("kcp-start-dynamicrestmapper-controllers", func(hookContext genericapiserver.PostStartHookContext) error {
controllerCtx := klog.NewContext(ctx, klog.FromContext(ctx).WithValues("postStartHook", "kcp-start-dynamicrestmapper-controllers"))
for _, controller := range controllers {
go s.runController(controllerCtx, controller)
}
return nil
})
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move the dynrest controller out to separate between authoritative and information controllers? If we have more controllers with the same error pattern it would make sense to separate the two.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it would be cleaner to formalize the "authoritative (leader-elected) vs informational (per-replica, in-memory)" split once there is more than one such controller. To keep this PR scoped to the bug I left it as a single dedicated post-start hook for now; happy to file a follow-up issue for the abstraction (or fold it in here if you'd prefer).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now non-ai comment:
will do in follow-up :D

Comment thread pkg/server/controllers.go Outdated
Comment thread pkg/server/controllers.go
return s.AddPostStartHook("kcp-start-dynamicrestmapper-controllers", func(hookContext genericapiserver.PostStartHookContext) error {
controllerCtx := klog.NewContext(ctx, klog.FromContext(ctx).WithValues("postStartHook", "kcp-start-dynamicrestmapper-controllers"))
for _, controller := range controllers {
go s.runController(controllerCtx, controller)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
go s.runController(controllerCtx, controller)
go s.runController(hookContext, controller)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was correct, no? Just the parent of controllerCtx was wrong and should be hookCtx.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true, yeah

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — controllerCtx now uses hookContext as its parent (and derives the logger from it), and we keep passing controllerCtx to runController, matching @gman0's point.

mjudeikis added 2 commits June 5, 2026 12:33
The DynamicRESTMapper is per-process, in-memory state that each shard replica's own
API server reads — the aggregating CRD-version discovery server and the virtual
resources server use it to resolve GroupKinds (e.g. cache.kcp.io/CachedResourceEndpointSlice)
to resources. Its builtin and dynamic controllers were registered through the
leader-elected kcp-start-controllers hook, so on a multi-replica shard only the
leader populated its mapper. Non-leader replicas served API traffic against an empty
mapper, so the REST mapping for system types failed there and virtual-storage
(CachedResource-backed) resources became undiscoverable and unservable whenever a
request landed on a non-leader replica — observed as intermittent "Some resources are
temporarily unavailable" and "no matches for kind ... in group cache.kcp.io".

Both controllers only mutate the in-memory mapper and never write to storage, so they
are safe to run everywhere. Start them from a dedicated post-start hook that is not
gated by leader election, guarded by a sync.Once so the wiring happens exactly once
per process even though installControllers can run again on leadership re-acquisition.
The aggregating CRD-version discovery server and the virtual resources server forward
requests to the virtual workspace endpoint URLs advertised in CachedResourceEndpointSlices.
Each such URL is the serving shard's own external virtual-workspace URL
(Shard.Spec.VirtualWorkspaceURL), reached through the cluster ingress that routes by TLS SNI
directly to that shard — even when virtual workspaces run embedded in the shard.

vwClientConfig was only given a usable TLS config (shard client cert + VW CA) when virtual
workspaces were NOT embedded. With embedded virtual workspaces (the default) it kept the
loopback client config, whose ServerName is the loopback name and whose bearer token is only
valid on the loopback listener. The forwarded request's TLS SNI was therefore the loopback
name, the ingress could not route it, and the connection was reset ("failed to perform API
discovery: ... read: connection reset by peer"), so every virtual-storage (CachedResource-
backed) resource was dropped from discovery and could not be served.

When the shard is configured with a client certificate, authenticate the forward client with
it, leave ServerName empty so the SNI is derived from the target host, and drop the loopback
bearer token; fall back to the root CA (which signs the shard serving cert) when
--shard-virtual-workspace-ca-file is unset. Without a shard client certificate (e.g. an
in-process test server) the endpoint URL points back at this same server, so the loopback
config is kept as-is.
@mjudeikis mjudeikis force-pushed the fix/dynamicrestmapper-builtin-established-race branch from 4370ab9 to 41a22b9 Compare June 5, 2026 09:33
Copy link
Copy Markdown
Member

@ntnn ntnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2026
@kcp-ci-bot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: 5c73f0ab08327c24e6a1206bfaf5d9052a633632

@kcp-ci-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ntnn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2026
@kcp-ci-bot kcp-ci-bot merged commit fb8629a into main Jun 5, 2026
14 checks passed
@kcp-ci-bot kcp-ci-bot deleted the fix/dynamicrestmapper-builtin-established-race branch June 5, 2026 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants