charts/redpanda: Gateway API TLSRoute support for external access #1447
david-yu wants to merge 1 commit into
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.
End-to-end test results (PASS)
Validated this PR locally on a k3d cluster (HEAD
Stack
What worked
The design works end-to-end: client connects to the bootstrap SNI hostname, Envoy passes TLS through to a broker, the broker advertises the per-broker hostname
Minor papercuts hit during the test
🤖 Generated with Claude Code
A complete end-to-end deployable example for the Gateway API TLSRoute
support introduced by this PR:
gateway-api-tls/
  README.md     top-level runbook (terraform → Helm → OMB)
  terraform/    EKS cluster + IRSA + AWS LB Controller + gp3 SC
  manifests/    Envoy Gateway, cert-manager, GatewayClass,
                Gateway, Redpanda chart values (gateway: true)
  omb/          OMB Kafka driver (SASL/SCRAM over TLS) +
                20 Mbps and 20 MB/s workloads + Job manifest
  verify/       rpk smoke test, JKS truststore builder,
                results-template.md for the writeup
Benchmark has not yet been executed; results section in README.md is
populated by running through the steps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end validation on EKS ✅
Stood up this PR on a fresh EKS 1.32 cluster in
Topology. 3 brokers (
Chart renders the right objects.
Smoke test: per-broker SNI routing is working. Client connects to the bootstrap hostname, gets metadata back advertising per-broker SNI hostnames (the
OMB 20 Mbps run (steady state, 10 min). Workload: 1 topic × 12 partitions, RF=3, 1 KiB payload, 4 producers + 4 consumers, target produce rate 2,500 msg/s. Connection: SASL/SCRAM-SHA-256 over TLS through the Gateway, JKS truststore built from
Single-digit-ms p99 publish latency through Envoy Gateway + NLB + per-broker TLSRoute + TLS Passthrough + SASL, i.e. the Gateway overhead in this topology is well within noise for typical Kafka client SLOs.
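As a sanity check on the "20 Mbps" label, the stated target rate and payload size can be converted to a bit rate. This is a minimal sketch that assumes 1 KiB means 1,024 bytes and ignores Kafka protocol and TLS framing overhead; `payloadMbps` is an illustrative name, not part of OMB or the chart.

```go
package main

import "fmt"

// payloadMbps converts a message rate and payload size into decimal
// megabits per second (1 Mbps = 1,000,000 bit/s). Only payload bytes are
// counted; protocol and TLS overhead are ignored (assumption).
func payloadMbps(msgsPerSec, payloadBytes int) float64 {
	return float64(msgsPerSec*payloadBytes*8) / 1e6
}

func main() {
	// 2,500 msg/s at 1 KiB per message, as in the workload described above.
	fmt.Printf("%.2f Mbps\n", payloadMbps(2500, 1024))
}
```

Under these assumptions the payload rate comes out at 20.48 Mbps, which matches the workload's nominal 20 Mbps.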
End-to-end test on EKS 1.34 in tandem with PR #1329
TL;DR: Provisioned a fresh EKS 1.34 cluster, deployed both PRs on a single integration branch (
Stack
Kafka throughput (steady-state, 10 min window)
Publish latency: steady-state aggregate
Read: latency at p50/p95/p99 is better than the 20 Mbps baseline (half the produce rate, as expected); p99.9 ticks up by ~2 ms vs baseline, plausibly explained by run-to-run noise on a small sample. The tail (max, p99.99) is well-bounded.
End-to-end latency: final aggregate quantiles
Tandem isolation check
Both routes were live and serving traffic during the entire 10-min steady-state window:
Concurrent Console k6 load (35.65 req/s aggregate, mixed UI + API) ran continuously through the 12 min and did not affect Kafka latency at the percentiles above. See the PR #1329 tandem comment for the Console-side metrics from the same run.
What this validates for PR #1447
Bug fix: TLSRouteList registration
Diff applied on the integration branch:
diff --git a/charts/redpanda/chart.go b/charts/redpanda/chart.go
+ gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
func Types() []kube.Object {
- &TLSRoute{},
+ // Use the upstream Gateway API v1alpha2 TLSRoute so controller-runtime
+ // can List/Watch via the registered v1alpha2 scheme. The chart's
+ // lightweight struct is only used by the gotohelm-rendered path.
+ &gatewayv1alpha2.TLSRoute{},
-func addTLSRouteToScheme(s *runtime.Scheme) {
- gv := schema.GroupVersion{Group: "gateway.networking.k8s.io", Version: "v1alpha2"}
- s.AddKnownTypeWithName(gv.WithKind("TLSRoute"), &TLSRoute{})
+func addTLSRouteToScheme(s *runtime.Scheme) {
+ must(gatewayv1alpha2.Install(s))
}
diff --git a/operator/internal/controller/scheme.go b/operator/internal/controller/scheme.go
v2SchemeFns = []func(s *runtime.Scheme) error{
gatewayv1.Install,
+ gatewayv1alpha2.Install,
Reproducible artifacts
All manifests + workload definitions are in
Raw artifacts (in test-harness repo)
🤖 Generated with Claude Code
The earlier shape, registering only the chart's lightweight TLSRoute kind via AddKnownTypeWithName, left the v1alpha2 ListOptions / List kinds missing from the operator's scheme. controller-runtime's reflector calls List on every Watch, so every reconcile pass that included TLSRoute resources errored with:

Failed to watch *redpanda.TLSRoute: no kind "ListOptions" is registered for version "gateway.networking.k8s.io/v1alpha2"

Fix: switch the chart's Types() entry to the upstream `gatewayv1alpha2.TLSRoute` and register `gatewayv1alpha2.Install` on the operator's v2 scheme. The chart still renders the same wire bytes via the lightweight struct returned by TLSRoutes(); only the type the controller cache binds to changes.

Surfaced during the tandem PR #1329 + #1447 e2e on EKS 1.34. See: #1447 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and CRD

Closes #1308. Squashed rebase of the original 14-commit branch onto current main; drops stale golden-file drift and a review-cycle import-reordering revert that had accumulated through iteration.

Chart side (`charts/console/`):
- New `gateway` values block alongside `ingress`. Mutual exclusion enforced in render: enabling both fails with a clear error.
- `gateway.go` renders a single HTTPRoute attached to the supplied parent Gateway(s) via `parentRefs` + optional `sectionName`.
- Notes template shows the Gateway URL when gateway is enabled (and the Ingress URL when not).
- Tests: mutual-exclusion, gateway-only, gateway→ingress switch, gateway removal scenarios.

Operator side (`operator/`):
- `Console` CRD gains a `spec.gateway` field with the same shape as the chart's values; goverter conversion auto-generated.
- V2 scheme registers `gatewayv1` so the Console reconciler can watch HTTPRoutes.
- RBAC adds `gateway.networking.k8s.io/httproutes` perms.
- Console controller's `SetupWithManager` skips the HTTPRoute watch if the Gateway API CRDs aren't installed in the cluster (graceful degradation; same pattern used for ServiceMonitor).

Bumps `sigs.k8s.io/gateway-api` to v1.5.1 (workspace-wide).

Validated end-to-end on EKS 1.34 in tandem with PR #1447's TLSRoute support; both routes coexisted cleanly on the same Envoy Gateway. See: #1329 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #1361. Squashed rebase of the original 14-commit branch onto current main; consolidates the iterative CI/lint fixes and includes the v1alpha2 scheme registration fix surfaced during the tandem PR #1329 + #1447 e2e test on EKS 1.34.

Design:
- User brings their own Gateway (TLSRoute-capable, e.g. Envoy Gateway). The chart only manages TLSRoute + ClusterIP backend services.
- Per-listener `gateway: true` opt-in enables gradual migration. Traditional NodePort/LoadBalancer listeners and TLSRoute listeners coexist on different ports.
- SNI-based routing: each broker gets a unique hostname via `host` / `hostTemplate` per listener.
- Bootstrap TLSRoute handles initial client connections; per-broker TLSRoutes handle direct broker connections after metadata discovery.

Chart side (`charts/redpanda/`):
- `external.gateway` block with `enabled`, `parentRefs`, `advertisedPort`.
- Per-listener `gateway`, `host`, `hostTemplate` fields on `listeners.{kafka,http,admin,schemaRegistry}.external.*`.
- `tlsroute.go` renders TLSRoute resources (bootstrap + per-broker) with proper SNI hostnames.
- `service.gateway.go` renders ClusterIP backend services.
- LoadBalancer / NodePort service rendering skips gateway-opted listeners so they coexist on different ports.
- `secrets.go` constructs the per-listener gateway-aware advertised address.

Operator side (`operator/`):
- `Redpanda` CRD gains the `external.gateway` and per-listener fields; goverter conversion auto-generated.
- V2 scheme registers `gatewayv1alpha2` (TLSRoute + TLSRouteList + ListOptions) so the controller-runtime cache can List/Watch the chart-rendered TLSRoute resources. The chart's lightweight TLSRoute struct stays for gotohelm rendering; the type the operator watches via `Types()` is the upstream `gatewayv1alpha2.TLSRoute`.
- RBAC adds `gateway.networking.k8s.io/tlsroutes` perms.

Validated end-to-end on EKS 1.34 with Envoy Gateway v1.2.6, TLS Passthrough mode, OMB at 10 Mbps + Console k6 in tandem: #1447 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This should be ready for review; PR #1329 would then need to be merged afterward to complete the Gateway API story. We can mark this as beta for 26.2.
How
| Placeholder | Substituted with | Source |
|---|---|---|
| $POD_ORDINAL | loop index 0, 1, 2, … | the for-loop index in TLSRoutes() |
| $POD_NAME | the StatefulSet pod name, e.g. redpanda-0 | PodNames(state, …) |
From charts/redpanda/tlsroute.go:
for i, podname := range pods {
	brokerHost := renderBrokerHost(hostTemplate, i, podname)
	brokerSvcName := fmt.Sprintf("gw-%s", podname)
	routes = append(routes, &TLSRoute{
		ObjectMeta: metav1.ObjectMeta{
			Name: fmt.Sprintf("%s-%s-%s-%d", fullname, listenerTag, name, i),
			...
		},
		Spec: TLSRouteSpec{
			Hostnames: []string{brokerHost},
			Rules: []TLSRouteRule{{BackendRefs: []TLSRouteBackendRef{{
				Name: brokerSvcName, Port: port,
			}}}},
			...
		},
	})
}

func renderBrokerHost(tmpl string, ordinal int, podName string) string {
	result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
	result = strings.ReplaceAll(result, "$POD_NAME", podName)
	return result
}
Worked example: replicas: 3 with $POD_ORDINAL
Given this values snippet:
statefulset:
  replicas: 3
external:
  enabled: true
  domain: example.com
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL.example.com
The chart renders 4 TLSRoute objects (1 bootstrap + 3 per broker):
# 1. Bootstrap route: single, taken verbatim from `host`
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-bootstrap
spec:
  hostnames: [redpanda.example.com]
  rules:
    - backendRefs:
        - { name: redpanda-gateway-bootstrap, port: 9094 }
---
# 2. Per-broker route, i=0, podName=redpanda-0
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-0
spec:
  hostnames: [redpanda-0.example.com] # $POD_ORDINAL → 0
  rules:
    - backendRefs:
        - { name: gw-redpanda-0, port: 9094 }
---
# 3. Per-broker route, i=1, podName=redpanda-1
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-1
spec:
  hostnames: [redpanda-1.example.com] # $POD_ORDINAL → 1
  rules:
    - backendRefs:
        - { name: gw-redpanda-1, port: 9094 }
---
# 4. Per-broker route, i=2, podName=redpanda-2
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-2
spec:
  hostnames: [redpanda-2.example.com] # $POD_ORDINAL → 2
  rules:
    - backendRefs:
        - { name: gw-redpanda-2, port: 9094 }
$POD_NAME variant
If you'd rather key on the pod name than the ordinal — useful when the StatefulSet name isn't redpanda — use $POD_NAME:
hostTemplate: $POD_NAME.example.com
For the same 3-broker cluster above, this produces hostnames: [redpanda-0.example.com], [redpanda-1.example.com], [redpanda-2.example.com]. These are identical here because the pod name happens to be redpanda-<ordinal>, but with a renamed StatefulSet (e.g. rp.fullnameOverride: cluster-a) you'd get cluster-a-0.example.com etc.
You can mix them too: hostTemplate: $POD_NAME-broker-$POD_ORDINAL.example.com → redpanda-0-broker-0.example.com, …
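The three template styles can be compared side by side with a small standalone program. This is a sketch that copies the `renderBrokerHost` logic quoted from `tlsroute.go` into a `main` package; the pod names are the ones from the 3-broker example, and nothing here touches the chart itself.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBrokerHost mirrors the substitution quoted from tlsroute.go:
// $POD_ORDINAL becomes the loop index and $POD_NAME the pod name.
func renderBrokerHost(tmpl string, ordinal int, podName string) string {
	result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
	return strings.ReplaceAll(result, "$POD_NAME", podName)
}

func main() {
	templates := []string{
		"redpanda-$POD_ORDINAL.example.com",
		"$POD_NAME.example.com",
		"$POD_NAME-broker-$POD_ORDINAL.example.com",
	}
	pods := []string{"redpanda-0", "redpanda-1", "redpanda-2"}

	// Print the rendered hostname for every template/pod combination.
	for _, tmpl := range templates {
		for i, pod := range pods {
			fmt.Printf("%s -> %s\n", tmpl, renderBrokerHost(tmpl, i, pod))
		}
	}
}
```

Because both placeholders are plain string replacements, a template that contains neither placeholder renders the same hostname for every broker, which would defeat per-broker SNI routing.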
Multi-pool clusters
pods is built from the default StatefulSet plus any additional pools:
pods := PodNames(state, Pool{Statefulset: state.Values.Statefulset})
for _, set := range state.Pools {
	pods = append(pods, PodNames(state, set)...)
}
So a cluster with replicas: 3 on the default STS and an extra pool of 2 brokers renders 6 TLSRoutes (1 bootstrap + 5 per-broker), with $POD_ORDINAL running 0→4 across the combined pod list.
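The counting in the multi-pool case can be checked with a small sketch. The `pool` type, the `podNames` helper, and the `<name>-<i>` pod naming (including the pool name `redpanda-pool-b`) are assumptions made for illustration; the chart's real `Pool` and `PodNames` may differ.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for the chart's Pool type.
type pool struct {
	name     string
	replicas int
}

// podNames lists "<name>-<i>" for each replica. The real PodNames helper
// may name pods differently (assumption).
func podNames(p pool) []string {
	names := make([]string, 0, p.replicas)
	for i := 0; i < p.replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", p.name, i))
	}
	return names
}

// combinedPods concatenates the default pool with any extra pools, in order.
func combinedPods(def pool, extra []pool) []string {
	pods := podNames(def)
	for _, p := range extra {
		pods = append(pods, podNames(p)...)
	}
	return pods
}

func main() {
	pods := combinedPods(
		pool{name: "redpanda", replicas: 3},
		[]pool{{name: "redpanda-pool-b", replicas: 2}},
	)
	for i, pod := range pods {
		fmt.Printf("$POD_ORDINAL=%d pod=%s\n", i, pod)
	}
	fmt.Printf("TLSRoutes: %d (1 bootstrap + %d per-broker)\n", 1+len(pods), len(pods))
}
```

In this sketch the ordinal is the index into the combined list, so index 3 belongs to the first pod of the extra pool rather than to a pod whose own ordinal is 3. A `$POD_NAME`-based template avoids that mismatch if it matters for your DNS naming.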
Required-when-replicas>1 safety check
For Kafka, hostTemplate is mandatory when replicas > 1 (Kafka clients need a unique advertised hostname per broker for the per-partition leader routing to work). Omitting it intentionally panics chart render to surface the misconfiguration early:
if listenerTag == "kafka" && len(pods) > 1 && hostTemplate == "" {
	panic(fmt.Sprintf("gateway listener %s/%s requires hostTemplate when replicas > 1", listenerTag, name))
}
For non-Kafka listeners (HTTP proxy / Admin / Schema Registry) the bootstrap route alone is enough; hostTemplate is optional and only emits per-broker routes when set.
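The same rule can be exercised in isolation before running a render. The sketch below is not the chart's code: it returns an error instead of panicking so the condition is easy to test, and `checkGatewayListener` is a hypothetical name.

```go
package main

import "fmt"

// checkGatewayListener applies the rule described above: a Kafka gateway
// listener with more than one broker must set hostTemplate. Unlike the
// chart, which panics during render, this sketch returns an error.
func checkGatewayListener(listenerTag, name, hostTemplate string, podCount int) error {
	if listenerTag == "kafka" && podCount > 1 && hostTemplate == "" {
		return fmt.Errorf("gateway listener %s/%s requires hostTemplate when replicas > 1", listenerTag, name)
	}
	return nil
}

func main() {
	cases := []struct {
		tag, name, tmpl string
		pods            int
	}{
		{"kafka", "default", "", 3},                                  // rejected
		{"kafka", "default", "redpanda-$POD_ORDINAL.example.com", 3}, // ok
		{"kafka", "default", "", 1},                                  // ok: single broker
		{"admin", "default", "", 3},                                  // ok: non-Kafka listener
	}
	for _, c := range cases {
		fmt.Printf("%s/%s pods=%d hostTemplate=%q -> %v\n",
			c.tag, c.name, c.pods, c.tmpl, checkGatewayListener(c.tag, c.name, c.tmpl, c.pods))
	}
}
```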
🤖 Generated with Claude Code
Summary
- `gateway: true` opt-in enables gradual migration: traditional and TLSRoute listeners coexist
- `host`/`hostTemplate` fields enable unique SNI hostnames per external listener, solving the per-listener domain problem (Different domain per listener #1361)
- `parentRefs`
Design
The design follows the established pattern for TLSRoute-based access using Gateway API:
- `gateway: true`, enabling gradual migration
Using TLS Passthrough (recommended)
In passthrough mode, the Gateway forwards the TLS connection as-is to the Redpanda broker. Redpanda's own TLS certificate is used, and mTLS authentication works.
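To make the passthrough flow concrete, the routing decision can be modeled as a lookup keyed only on the SNI hostname from the TLS ClientHello, with the encrypted stream then forwarded untouched. This is a conceptual sketch, not Envoy's implementation: `routeBySNI` and `exampleRoutes` are hypothetical names, and the hostnames and backend Services are taken from the rendered 3-broker example in this thread.

```go
package main

import "fmt"

// exampleRoutes maps SNI hostnames to backend Services, following the
// bootstrap and per-broker TLSRoutes from the rendered example.
func exampleRoutes() map[string]string {
	return map[string]string{
		"redpanda.example.com":   "redpanda-gateway-bootstrap:9094",
		"redpanda-0.example.com": "gw-redpanda-0:9094",
		"redpanda-1.example.com": "gw-redpanda-1:9094",
		"redpanda-2.example.com": "gw-redpanda-2:9094",
	}
}

// routeBySNI picks a backend from the SNI hostname alone. In passthrough
// mode the gateway never decrypts the stream, so the hostname is the only
// routing input available.
func routeBySNI(routes map[string]string, serverName string) (string, bool) {
	backend, ok := routes[serverName]
	return backend, ok
}

func main() {
	routes := exampleRoutes()
	for _, sni := range []string{"redpanda.example.com", "redpanda-1.example.com", "unknown.example.com"} {
		if backend, ok := routeBySNI(routes, sni); ok {
			fmt.Printf("%s -> %s\n", sni, backend)
		} else {
			fmt.Printf("%s -> no matching TLSRoute\n", sni)
		}
	}
}
```

This is also why each broker needs its own hostname: two brokers sharing one SNI name would be indistinguishable at the Gateway.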
Step 1: Create a Gateway with TLS passthrough
Step 2: Configure Redpanda with gateway listeners
Step 3: Configure DNS
Point these DNS records to the Gateway's external IP:
- redpanda.example.com → Gateway IP (bootstrap)
- redpanda-0-broker.example.com → Gateway IP (broker 0)
- redpanda-1-broker.example.com → Gateway IP (broker 1)
- redpanda-2-broker.example.com → Gateway IP (broker 2)
Step 4: Connect clients
Using TLS Termination
In termination mode, the Gateway decrypts TLS and forwards plaintext to the broker. The Gateway's own certificate is presented to clients. mTLS authentication is not available in this mode.
Step 1: Create a Gateway with TLS termination
Step 2: Configure Redpanda without TLS on the external listener
Since the Gateway handles TLS, the Redpanda listener receives plaintext:
Migrating from Traditional Listeners to Gateway API
The per-listener `gateway: true` field enables zero-downtime migration. Traditional NodePort/LoadBalancer listeners and TLSRoute listeners coexist on different ports.
Step 1: Deploy the Gateway
Create the Gateway resource in your cluster:
Step 2: Add a TLSRoute listener alongside the existing one
Update your Helm values to add a new listener with `gateway: true`. The existing NodePort listener continues to work:
After `helm upgrade`, the cluster has both:
Step 3: Configure DNS and migrate clients
redpanda.example.com:9094)
Step 4: Remove the old listener
Once all clients have migrated, remove the NodePort listener:
Optionally remove `external.type: NodePort` as it is no longer used.
Verified end-to-end with Envoy Gateway + `rpk`
Reproducible recipe used to validate this PR on a local k3d cluster, producing and consuming over TLS through the Gateway. Run from the `feat/gateway-api-tlsroute` branch.
1. Cluster + dependencies
2. Create a Gateway (user-managed, the chart only attaches TLSRoutes)
3. Install the Redpanda chart with a gateway-mode listener
values.yaml:
What the chart renders:
| Kind | Name | Details |
|---|---|---|
| TLSRoute | rp-kafka-default-bootstrap | redpanda.test.local → Service/rp-gateway-bootstrap:9094 |
| TLSRoute | rp-kafka-default-0 | redpanda-0.test.local → Service/gw-rp-0:9094 |
| Service (ClusterIP) | rp-gateway-bootstrap | |
| Service (ClusterIP) | gw-rp-0 | rp-0 |
| Certificate | rp-default-cert | *.test.local and test.local, covering both hostnames |
Verify the routes are attached:
4. Connect with `rpk` over TLS
For local testing without DNS, run `rpk` in a container on the same docker network as the cluster, mapping the SNI hostnames to the Envoy data-plane IP via `--add-host`:
Result on the validation run:
SNI-based routing confirmed in the Envoy data-plane access log:
| requested_server_name | upstream_cluster |
|---|---|
| redpanda.test.local | tlsroute/rp-gw/rp-kafka-default-bootstrap/rule/-1 |
| redpanda-0.test.local | tlsroute/rp-gw/rp-kafka-default-0/rule/-1 |
So the design works end-to-end: bootstrap connection → bootstrap TLSRoute → bootstrap service; then the broker advertises `redpanda-0.test.local:9094`; the client SNI-reconnects; Envoy routes by SNI to the per-broker TLSRoute → per-broker ClusterIP → broker pod, all under TLS Passthrough using Redpanda's own cert (no Gateway-side cert needed).
Files changed
- charts/redpanda/values.go: `GatewayConfig`, `GatewayParentRef` types; per-listener `Gateway`/`Host`/`HostTemplate` fields; `ServicePorts()` filters gateway listeners
- charts/redpanda/service.gateway.go
- charts/redpanda/tlsroute.go
- charts/redpanda/chart.go
- charts/redpanda/secrets.go
- charts/redpanda/service.{loadbalancer,nodeport}.go
- operator/api/.../redpanda_clusterspec_types.go
- operator/.../redpanda_controller.go: gateway.networking.k8s.io/tlsroutes
Out of scope (future work)
Test plan
- `go build ./charts/redpanda/...`
- `go build ./operator/...`
- `gateway-api-tlsroute`: all listeners on TLSRoute
- `gateway-api-migration`: NodePort + TLSRoute coexisting
- `TestTemplate` suite passes with no regressions
- `rpk` over TLS, SNI routing verified (recipe + results above)
- `task generate` in CI
Closes #1361
🤖 Generated with Claude Code