Fix up multicluster watches for nodepools and tls hostnames #1534

Merged
andrewstucki merged 1 commit into main from as/cert-and-watch-fixes
May 22, 2026

@andrewstucki
Contributor

@andrewstucki andrewstucki commented May 19, 2026

Summary

Two fixes for the StretchCluster multicluster controller:

1. Watch NodePool from the StretchCluster reconciler

The StretchCluster reconciler now watches NodePool events across local and provider clusters and re-enqueues the owning StretchCluster (spec.clusterRef.Name) on every change, with the originating cluster name preserved so the multicluster runtime routes the request correctly. Without this, NodePool edits (replica count, image, services, …) didn't trigger a StretchCluster reconcile until something else nudged it, so scaling and config changes lagged unpredictably.
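
The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `NodePool`, `Request`, and `mapNodePool` are hypothetical stand-ins (the real controller uses the operator's CRD types and the multicluster runtime's request type), and the `Stretch` field stands in for the `IsStretchCluster()` check.

```go
package main

import "fmt"

// NodePool is a hypothetical, trimmed-down stand-in for the real CRD type.
type NodePool struct {
	Namespace  string
	Name       string
	ClusterRef string // stands in for spec.clusterRef.Name
	Stretch    bool   // stands in for IsStretchCluster()
}

// Request is a hypothetical stand-in for a cluster-aware reconcile request.
type Request struct {
	ClusterName string // cluster the NodePool event originated from
	Namespace   string
	Name        string // name of the owning StretchCluster
}

// mapNodePool turns a NodePool event into a request for the owning
// StretchCluster, keeping the originating cluster name so the request
// can be routed back to the right cluster.
func mapNodePool(clusterName string, pool NodePool) []Request {
	if !pool.Stretch || pool.ClusterRef == "" {
		// Pools that do not belong to a StretchCluster enqueue nothing.
		return nil
	}
	return []Request{{
		ClusterName: clusterName,
		Namespace:   pool.Namespace,
		Name:        pool.ClusterRef,
	}}
}

func main() {
	pool := NodePool{Namespace: "redpanda", Name: "pool-a", ClusterRef: "stretch", Stretch: true}
	fmt.Println(mapNodePool("provider-1", pool))
}
```

The key point is that the request is keyed by the StretchCluster name rather than the NodePool name, so any number of pool edits collapse into reconciles of the same owner.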

2. Operator-issued broker TLS certs fail strict hostname verification on flat/MCS modes (fixes #1499)

In flat & MCS cross-cluster networking modes, the operator writes 2-label broker hostnames (<pod>.<ns>) into seed_servers[].host.address / advertised_rpc_api.address. The generated cert-manager Certificate covered those only via a *.<ns> wildcard — a single-label-parent wildcard that RFC 6125 §6.4.3 disallows and OpenSSL ≥3.0 rejects with verify error:num=62: hostname mismatch. The broker RPC handshake then fails, cluster_bootstrap_info RPCs return rpc::errc:4, and the StretchCluster stays Ready: False forever.
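
To make the failure mode concrete, here is a simplified model of a strict matcher that refuses wildcards with a single-label parent. It is only a sketch of the rule described above, not OpenSSL's actual algorithm; `strictMatch` and the example hostnames are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// strictMatch reports whether host is covered by a certificate DNS name
// under a simplified strict policy: exact names match case-insensitively,
// and a wildcard is honoured only when its parent has at least two labels.
func strictMatch(pattern, host string) bool {
	pattern, host = strings.ToLower(pattern), strings.ToLower(host)
	if !strings.HasPrefix(pattern, "*.") {
		return pattern == host
	}
	parent := pattern[2:]
	// Reject single-label parents such as "*.redpanda".
	if !strings.Contains(parent, ".") {
		return false
	}
	// The wildcard stands for exactly one left-most label.
	label, rest, ok := strings.Cut(host, ".")
	return ok && label != "" && rest == parent
}

func main() {
	fmt.Println(strictMatch("*.redpanda", "rp-0.redpanda"))         // single-label parent
	fmt.Println(strictMatch("*.redpanda.svc", "rp-0.redpanda.svc")) // multi-label parent
	fmt.Println(strictMatch("rp-0.redpanda", "rp-0.redpanda"))      // explicit SAN
}
```

Under this model the 2-label hostname is only ever covered by an explicit SAN, which is why the wildcard never helped.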

The fix only adds entries to the SAN list, apart from removing the dead, RFC-violating wildcard:

  • Drop *.<ns> from the operator-emitted SAN list. It can never validate under any modern strict TLS stack, and it was the only thing previously "covering" the 2-label hostnames the operator advertises. Sibling wildcards *.<ns>.svc / *.<ns>.svc.<domain> (multi-label parents) are kept — those are RFC-valid and cover the FQDN forms.
  • Enumerate one explicit DNS SAN per broker, matching the host portion of seed_servers[].host.address / advertised_rpc_api.address:
    • flat mode: <pod>.<ns>
    • MCS mode: <pod>.<ns> and <pod>.<ns>.svc.clusterset.local

This reuses the enumeration site that already builds seed_servers, so there is no new code path: the SAN list just consumes the broker list the renderer already produces. Non-stretch / pre-26.2 deployments are unaffected; the wildcards that did the work for them (*.<cluster>.<ns>.svc.cluster.local etc.) are untouched.

Changes

  • operator/internal/controller/redpanda/multicluster_controller.go — add a Watches(&NodePool{}, …) on the StretchCluster controller that maps NodePool events to their owning StretchCluster via spec.clusterRef (only when IsStretchCluster()), with cluster-name preservation.
  • operator/multicluster/certs.go — drop *.<ns>; append <pod>.<ns> per broker in the flat/applyInternalDNSNames branch; append <pod>.<ns>.svc.clusterset.local per broker in the MCS branch.
  • operator/multicluster/testdata/render-cases.txtar — add flat-network-tls and mcs-network-tls fixtures (the pre-existing flat-network / mcs-network cases had TLS disabled and never exercised the cert SAN path).
  • Regenerated golden files:
    • operator/multicluster/testdata/render-cases.resources.golden.txtar
    • operator/multicluster/testdata/render-cases.pools.golden.txtar
    • operator/internal/lifecycle/testdata/stretch-cluster-cases.resources.golden.txtar

@andrewstucki andrewstucki force-pushed the as/cert-and-watch-fixes branch from 6e4831f to 929a695 Compare May 19, 2026 16:08
Contributor

@RafalKorepta RafalKorepta left a comment

LGTM, this fixes our implementation.

I wonder if it would be worth having a test that asserts that our generated seed_servers or advertised_rpc_api hosts match one of the certificate's DNSNames?

@andrewstucki andrewstucki merged commit ad506dd into main May 22, 2026
12 checks passed
Development

Successfully merging this pull request may close these issues.

Operator-issued broker TLS certs reject strict hostname verification on the advertised RPC hostname (*.redpanda violates RFC-6125)

2 participants