
charts/redpanda: Gateway API TLSRoute support for external access#1447

Open
david-yu wants to merge 1 commit into main from feat/gateway-api-tlsroute



@david-yu
Contributor

@david-yu david-yu commented Apr 15, 2026

Summary

  • Adds Gateway API TLSRoute-based external access as an alternative to NodePort/LoadBalancer
  • Per-listener gateway: true opt-in enables gradual migration — traditional and TLSRoute listeners coexist
  • Creates bootstrap and per-broker ClusterIP services as TLSRoute backends
  • Per-listener host/hostTemplate fields enable unique SNI hostnames per external listener, solving the per-listener domain problem (Different domain per listener #1361)
  • User manages the Gateway externally; the chart only creates TLSRoute resources referencing it via parentRefs

Design

The design follows the established pattern for TLSRoute-based access using Gateway API:

  1. User brings their own Gateway — the operator/chart only manages TLSRoute resources and ClusterIP services
  2. Per-listener opt-in — each external listener independently chooses gateway mode via gateway: true, enabling gradual migration
  3. SNI-based routing — each broker gets a unique hostname, allowing the Gateway to route TLS traffic by SNI to the correct per-broker service
  4. Bootstrap + per-broker architecture — a bootstrap TLSRoute handles initial client connections; per-broker TLSRoutes handle direct broker connections after metadata discovery
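The SNI routing in points 3 and 4 amounts to a lookup table from hostname to backend Service. The sketch below is a hypothetical Go model, not the chart's tlsroute.go. The release name `rp` and the Service names (`<release>-gateway-bootstrap`, `gw-<release>-<ordinal>`) are inferred from the rendered output in the validation recipe further down. The model only does exact hostname matching, although TLSRoute hostnames can also be wildcards.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// route is a hypothetical model of one TLSRoute: an SNI hostname and the
// ClusterIP Service it forwards to. It is not the chart's real type.
type route struct {
	host    string
	backend string
}

// buildRoutes models one bootstrap route plus one route per broker.
// Service names are inferred from rendered output, not taken from tlsroute.go.
func buildRoutes(bootstrapHost, hostTemplate, release string, replicas int) []route {
	routes := []route{{host: bootstrapHost, backend: release + "-gateway-bootstrap"}}
	for i := 0; i < replicas; i++ {
		host := strings.ReplaceAll(hostTemplate, "$POD_ORDINAL", strconv.Itoa(i))
		routes = append(routes, route{host: host, backend: fmt.Sprintf("gw-%s-%d", release, i)})
	}
	return routes
}

// lookup mimics the Gateway's SNI match. An exact, case-insensitive hostname
// match selects the backend, and an unknown SNI matches no route.
func lookup(routes []route, sni string) (string, bool) {
	for _, r := range routes {
		if strings.EqualFold(r.host, sni) {
			return r.backend, true
		}
	}
	return "", false
}

func main() {
	routes := buildRoutes("redpanda.example.com", "redpanda-$POD_ORDINAL-broker.example.com", "rp", 3)
	for _, sni := range []string{"redpanda.example.com", "redpanda-1-broker.example.com", "other.example.com"} {
		backend, ok := lookup(routes, sni)
		fmt.Printf("%-30s -> %q (matched=%v)\n", sni, backend, ok)
	}
}
```

The bootstrap route only needs to reach *some* broker. The per-broker routes must each reach exactly one pod, because Kafka clients send partition traffic to the specific broker named in metadata.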

Using TLS Passthrough (recommended)

In passthrough mode, the Gateway forwards the TLS connection as-is to the Redpanda broker. Redpanda's own TLS certificate is used, and mTLS authentication works.

Client ──[TLS]──▶ Gateway ──[TLS passthrough]──▶ Redpanda broker

Step 1: Create a Gateway with TLS passthrough

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy   # or any TLSRoute-capable implementation
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough    # forward TLS as-is to Redpanda

Step 2: Configure Redpanda with gateway listeners

external:
  enabled: true
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true                                  # opt-in to TLSRoute mode
        host: redpanda.example.com                     # bootstrap hostname
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com  # per-broker hostname
        tls:
          enabled: true
          cert: default

Step 3: Configure DNS

Point these DNS records to the Gateway's external IP:

  • redpanda.example.com → Gateway IP (bootstrap)
  • redpanda-0-broker.example.com → Gateway IP (broker 0)
  • redpanda-1-broker.example.com → Gateway IP (broker 1)
  • redpanda-2-broker.example.com → Gateway IP (broker 2)
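The following is a minimal sketch of how this DNS list follows from the values in Step 2. It assumes 3 replicas and plain string substitution of $POD_ORDINAL in hostTemplate. `expandHostTemplate` and `dnsRecords` are hypothetical helpers, not functions from the chart.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandHostTemplate substitutes a broker ordinal into a hostTemplate such as
// "redpanda-$POD_ORDINAL-broker.example.com". This is a hypothetical model of
// the substitution, not the chart's implementation.
func expandHostTemplate(tmpl string, ordinal int) string {
	return strings.ReplaceAll(tmpl, "$POD_ORDINAL", strconv.Itoa(ordinal))
}

// dnsRecords lists every hostname that must resolve to the Gateway's external
// address: the bootstrap host, then one per-broker host per replica.
func dnsRecords(bootstrapHost, tmpl string, replicas int) []string {
	hosts := []string{bootstrapHost}
	for i := 0; i < replicas; i++ {
		hosts = append(hosts, expandHostTemplate(tmpl, i))
	}
	return hosts
}

func main() {
	for _, h := range dnsRecords("redpanda.example.com", "redpanda-$POD_ORDINAL-broker.example.com", 3) {
		fmt.Printf("%s -> <Gateway IP>\n", h)
	}
}
```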

Step 4: Connect clients

rpk topic list \
  --brokers redpanda.example.com:9094 \
  --tls-enabled \
  --tls-truststore ca.crt

Using TLS Termination

In termination mode, the Gateway decrypts TLS and forwards plaintext to the broker. The Gateway's own certificate is presented to clients. mTLS authentication is not available in this mode.

Note: TLS termination for TLSRoutes is not yet supported by most Gateway implementations. Passthrough is the practical choice today.

Client ──[TLS]──▶ Gateway ──[plaintext]──▶ Redpanda broker

Step 1: Create a Gateway with TLS termination

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Terminate       # Gateway decrypts TLS
        certificateRefs:
          - name: gateway-tls-cert

Step 2: Configure Redpanda without TLS on the external listener

Since the Gateway handles TLS, the Redpanda listener receives plaintext:

external:
  enabled: true
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        # No TLS config — connection arrives decrypted from the Gateway

Migrating from Traditional Listeners to Gateway API

The per-listener gateway: true field enables zero-downtime migration. Traditional NodePort/LoadBalancer listeners and TLSRoute listeners coexist on different ports.

Step 1: Deploy the Gateway

Create the Gateway resource in your cluster:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough

Step 2: Add a TLSRoute listener alongside the existing one

Update your Helm values to add a new listener with gateway: true. The existing NodePort listener continues to work:

external:
  enabled: true
  type: NodePort              # existing NodePort config unchanged
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:                 # existing NodePort listener — unchanged
        port: 9094
        advertisedPorts:
          - 30092
      gw-listener:             # new TLSRoute listener on a different port
        port: 9095
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        tls:
          cert: default

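After helm upgrade, the cluster has both:

  • the NodePort service for the default listener (container port 9094, node port 30092), used by existing clients
  • Gateway ClusterIP services + TLSRoutes for gw-listener (container port 9095), which new clients reach through the Gateway on port 9094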
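Coexistence depends on splitting the external listeners by their gateway flag. The following is a hypothetical, trimmed-down Go model of that split. The chart's real logic lives in ServicePorts() and the service.*.go renderers listed under Files changed, and uses different types.

```go
package main

import (
	"fmt"
	"sort"
)

// externalListener is a hypothetical model of one entry under
// listeners.kafka.external; the chart's real types live in values.go.
type externalListener struct {
	Port    int
	Gateway bool
}

// splitListeners models how gateway-opted listeners are kept out of the
// NodePort/LoadBalancer service and rendered as Gateway backends instead.
func splitListeners(ls map[string]externalListener) (nodePort, gateway []string) {
	for name, l := range ls {
		if l.Gateway {
			gateway = append(gateway, name)
		} else {
			nodePort = append(nodePort, name)
		}
	}
	// Map iteration order is random in Go; sort for stable output.
	sort.Strings(nodePort)
	sort.Strings(gateway)
	return nodePort, gateway
}

func main() {
	listeners := map[string]externalListener{
		"default":     {Port: 9094},
		"gw-listener": {Port: 9095, Gateway: true},
	}
	np, gw := splitListeners(listeners)
	fmt.Println("NodePort service:", np)
	fmt.Println("Gateway ClusterIP services + TLSRoutes:", gw)
}
```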

Step 3: Configure DNS and migrate clients

  1. Set up DNS records pointing to the Gateway IP
  2. Migrate clients one at a time to the new bootstrap address (redpanda.example.com:9094)
  3. Confirm that migrated clients are connecting through the Gateway

Step 4: Remove the old listener

Once all clients have migrated, remove the NodePort listener:

listeners:
  kafka:
    external:
      # default: removed
      gw-listener:
        port: 9095
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        tls:
          cert: default

Optionally remove external.type: NodePort as it is no longer used.

Verified end-to-end with Envoy Gateway + rpk

Reproducible recipe used to validate this PR on a local k3d cluster, producing and consuming over TLS through the Gateway. Run from the feat/gateway-api-tlsroute branch.

1. Cluster + dependencies

# Cluster (any K8s 1.30+ should work)
k3d cluster create rp-gw-test --image rancher/k3s:v1.32.13-k3s1

# Gateway API CRDs: TLSRoute is only in the experimental channel, so install experimental-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/experimental-install.yaml

# Envoy Gateway (any TLSRoute-capable Gateway implementation works; Cilium / Istio also fine)
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.2.6 \
  --namespace envoy-gateway-system --create-namespace \
  --wait

# Envoy Gateway does not auto-create a default GatewayClass; create one
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
EOF

# cert-manager (for the chart's self-signed CA)
helm repo add jetstack https://charts.jetstack.io && helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version v1.17.2 --set crds.enabled=true --wait

2. Create a Gateway (user-managed, the chart only attaches TLSRoutes)

kubectl create namespace rp-gw
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
  namespace: rp-gw
spec:
  gatewayClassName: eg
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough
      allowedRoutes:
        kinds:
          - kind: TLSRoute
            group: gateway.networking.k8s.io
EOF

3. Install the Redpanda chart with a gateway-mode listener

values.yaml:

statefulset:
  replicas: 1
storage:
  persistentVolume:
    enabled: false
external:
  enabled: true
  domain: test.local                       # appended to the cert SANs as *.test.local
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094                   # port advertised in broker metadata
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true                      # opt this listener into TLSRoute mode
        host: redpanda.test.local          # bootstrap SNI hostname
        hostTemplate: redpanda-$POD_ORDINAL.test.local   # per-broker SNI hostname
        tls:
          enabled: true
          cert: default

Install the chart from this branch:

helm install rp ./charts/redpanda/chart -n rp-gw -f values.yaml --wait

What the chart renders:

Resource             Name                         Notes
TLSRoute             rp-kafka-default-bootstrap   hostname redpanda.test.local → Service/rp-gateway-bootstrap:9094
TLSRoute             rp-kafka-default-0           hostname redpanda-0.test.local → Service/gw-rp-0:9094
Service (ClusterIP)  rp-gateway-bootstrap         load-balances across all brokers on port 9094
Service (ClusterIP)  gw-rp-0                      per-broker, selects pod rp-0
Certificate          rp-default-cert              SANs include *.test.local and test.local, covering both hostnames
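You can check whether one certificate covers both the bootstrap and the per-broker hostnames with Go's standard hostname verification. The sketch assumes the SANs listed in the table. `sanCovers` is a hypothetical helper. A wildcard SAN matches exactly one leftmost label. A hostTemplate that adds an extra label under external.domain (for example, a hypothetical redpanda-0.brokers.test.local) would therefore not be covered by *.test.local.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// sanCovers reports whether a certificate carrying the given DNS SANs is valid
// for host, using crypto/x509's hostname matching (a leading "*." wildcard
// matches exactly one label).
func sanCovers(dnsNames []string, host string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host) == nil
}

func main() {
	sans := []string{"*.test.local", "test.local"} // SANs from the table above
	for _, h := range []string{"redpanda.test.local", "redpanda-0.test.local", "a.b.test.local"} {
		fmt.Printf("%-24s covered=%v\n", h, sanCovers(sans, h))
	}
}
```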

Verify the routes are attached:

$ kubectl -n rp-gw get tlsroute
NAME                         AGE
rp-kafka-default-0           1m
rp-kafka-default-bootstrap   1m

$ kubectl -n rp-gw get gateway redpanda-gateway -o jsonpath='{.status.listeners[*].attachedRoutes}'
2

$ kubectl -n rp-gw get tlsroute -o jsonpath='{.items[*].status.parents[*].conditions[?(@.type=="Accepted")].status}'
True True
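A scripted check, such as a CI gate, can turn the jsonpath output above into a pass/fail result. The sketch below assumes each TLSRoute has exactly one parentRef, so the query yields one status per route. `allAccepted` is a hypothetical helper. It also compares the count against the expected number of routes (replicas + 1), because a route that has no status yet would otherwise be silently skipped.

```go
package main

import (
	"fmt"
	"strings"
)

// allAccepted parses the space-separated "Accepted" statuses printed by the
// jsonpath query. It requires exactly wantRoutes entries, all equal to "True".
func allAccepted(jsonpathOutput string, wantRoutes int) bool {
	statuses := strings.Fields(jsonpathOutput)
	if wantRoutes <= 0 || len(statuses) != wantRoutes {
		return false
	}
	for _, s := range statuses {
		if s != "True" {
			return false
		}
	}
	return true
}

func main() {
	// 1 replica -> 1 bootstrap route + 1 per-broker route.
	fmt.Println(allAccepted("True True", 2))
	fmt.Println(allAccepted("True False", 2))
}
```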

4. Connect with rpk over TLS

For local testing without DNS, run rpk in a container on the same docker network as the cluster, mapping the SNI hostnames to the Envoy data-plane IP via --add-host:

# Extract the chart-issued CA
kubectl -n rp-gw get secret rp-default-root-certificate -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# Find the Envoy data-plane LoadBalancer IP
GW_IP=$(kubectl -n envoy-gateway-system get svc \
  -l gateway.envoyproxy.io/owning-gateway-name=redpanda-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

# Wrapper that runs rpk inside the docker network, with hostname resolution + CA mounted
RPK="docker run --rm -i --network k3d-rp-gw-test \
  --add-host redpanda.test.local:$GW_IP \
  --add-host redpanda-0.test.local:$GW_IP \
  -v $PWD/ca.crt:/etc/rp/ca.crt:ro \
  --entrypoint rpk docker.redpanda.com/redpandadata/redpanda:v25.2.5 \
  -X brokers=redpanda.test.local:9094 \
  -X tls.enabled=true -X tls.ca=/etc/rp/ca.crt"

$RPK cluster info
$RPK topic create gateway-test -p 3 -r 1
printf 'key-1\thello' | $RPK topic produce gateway-test --format '%k\t%v'
$RPK topic consume gateway-test -n 1 -o :end -f '%p:%o key=%k value=%v\n'

Result on the validation run:

$ rpk cluster info
ID    HOST                   PORT
0*    redpanda-0.test.local  9094

$ rpk topic create gateway-test -p 3 -r 1
TOPIC         STATUS
gateway-test  OK

$ rpk topic produce gateway-test            # 5 records, partitions 0 + 2
Produced to partition 0 at offset 0 …
Produced to partition 2 at offset 0 …
…

$ rpk topic consume gateway-test -n 5 -o :end
0:0 key=key-1 value=Hello via TLSRoute msg #1 …
0:1 key=key-4 value=Hello via TLSRoute msg #4 …
2:0 key=key-2 value=Hello via TLSRoute msg #2 …
2:1 key=key-3 value=Hello via TLSRoute msg #3 …
2:2 key=key-5 value=Hello via TLSRoute msg #5 …

SNI-based routing confirmed in the Envoy data-plane access log:

requested_server_name upstream_cluster
redpanda.test.local tlsroute/rp-gw/rp-kafka-default-bootstrap/rule/-1
redpanda-0.test.local tlsroute/rp-gw/rp-kafka-default-0/rule/-1

So the design works end-to-end. The bootstrap connection goes through the bootstrap TLSRoute to the bootstrap service. The broker then advertises redpanda-0.test.local:9094, and the client reconnects with that hostname as SNI. Envoy routes by SNI to the per-broker TLSRoute → per-broker ClusterIP → broker pod. All of this runs under TLS Passthrough using Redpanda's own cert (no Gateway-side cert needed).

Files changed

File                                                 Change
charts/redpanda/values.go                            GatewayConfig, GatewayParentRef types; per-listener Gateway/Host/HostTemplate fields; ServicePorts() filters gateway listeners
charts/redpanda/service.gateway.go                   Bootstrap + per-broker ClusterIP service generation (gateway-opted listeners only)
charts/redpanda/tlsroute.go                          Bootstrap + per-broker TLSRoute generation with SNI hostnames
charts/redpanda/chart.go                             Register TLSRoute type and wire into render pipeline
charts/redpanda/secrets.go                           Per-listener gateway-aware advertised address construction
charts/redpanda/service.{loadbalancer,nodeport}.go   Skip gateway-opted listeners in port generation
operator/api/.../redpanda_clusterspec_types.go       CRD types for gateway config
operator/.../redpanda_controller.go                  RBAC for gateway.networking.k8s.io/tlsroutes

Out of scope (future work)

  • TCPRoute support (for non-TLS listeners)
  • Gateway resource management by the operator
  • East-west (service mesh) traffic routing
  • TLSRoute status checking (wait for gateway acceptance)

Test plan

  • Verify chart compiles: go build ./charts/redpanda/...
  • Verify operator compiles: go build ./operator/...
  • Golden test: gateway-api-tlsroute — all listeners on TLSRoute
  • Golden test: gateway-api-migration — NodePort + TLSRoute coexisting
  • Full TestTemplate suite passes with no regressions
  • Manual testing with Envoy Gateway — full produce/consume via rpk over TLS, SNI routing verified (recipe + results above)
  • Regenerate templates via task generate in CI

Closes #1361

🤖 Generated with Claude Code

@github-actions

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions Bot added the stale label Apr 23, 2026
@david-yu david-yu removed the stale label Apr 23, 2026
@david-yu
Contributor Author

End-to-end test results (PASS)

Validated this PR locally on a k3d cluster (HEAD 075130b2). Full step-by-step recipe is now in the PR description under Verified end-to-end with Envoy Gateway + rpk.

Stack

  • k3d on rancher/k3s:v1.32.13-k3s1
  • Gateway API CRDs v1.2.1 (experimental channel — TLSRoute is in experimental)
  • Envoy Gateway v1.2.6 with mode: Passthrough on port 9094
  • cert-manager v1.17.2 issuing the chart's self-signed CA
  • charts/redpanda/chart from this branch, 1 broker, external.domain=test.local, listener kafka.external.default with gateway: true, host: redpanda.test.local, hostTemplate: redpanda-$POD_ORDINAL.test.local

What worked

Check                                                Result
Chart renders 2 TLSRoutes (bootstrap + per-broker)   rp-kafka-default-bootstrap, rp-kafka-default-0
TLSRoutes accepted by Envoy Gateway                  Accepted=True, ResolvedRefs=True on both; Gateway reports attachedRoutes: 2
Cert SANs cover bootstrap + per-broker hostnames     *.test.local + test.local from external.domain covers both
rpk cluster info over TLS via bootstrap host         ✅ Returns redpanda-0.test.local:9094 (gateway-aware advertised address)
rpk topic create / produce / consume round-trip      ✅ 5 messages, partitions 0+2, all read back correctly
SNI-based routing to the right backend               ✅ Confirmed in Envoy access log (table below)
requested_server_name=redpanda.test.local    → tlsroute/rp-gw/rp-kafka-default-bootstrap
requested_server_name=redpanda-0.test.local  → tlsroute/rp-gw/rp-kafka-default-0

The design works end-to-end: client connects to the bootstrap SNI hostname, Envoy passes TLS through to a broker, the broker advertises the per-broker hostname redpanda-0.test.local:9094, the client SNI-reconnects to that hostname, Envoy's SNI matcher selects the per-broker TLSRoute → the per-broker ClusterIP → broker pod. Redpanda's own cert is used throughout — no Gateway-side cert required for Passthrough.

Minor papercuts hit during the test

  • external.gateway and the per-listener gateway/host/hostTemplate fields are absent from charts/redpanda/chart/values.yaml. They're valid struct fields, but a documented stub in values.yaml would help discovery — happy to send a follow-up if useful.
  • Envoy Gateway's helm chart doesn't ship a default GatewayClass named eg; users have to create one. Worth a sentence in the docs section that lists Envoy as a known-good implementation.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the stale label May 6, 2026
@david-yu david-yu removed the stale label May 6, 2026
@github-actions github-actions Bot added the stale label May 12, 2026
@david-yu david-yu removed the stale label May 12, 2026
david-yu added a commit that referenced this pull request May 12, 2026
A complete end-to-end deployable example for the Gateway API TLSRoute
support introduced by this PR:

  gateway-api-tls/
    README.md           top-level runbook (terraform → Helm → OMB)
    terraform/          EKS cluster + IRSA + AWS LB Controller + gp3 SC
    manifests/          Envoy Gateway, cert-manager, GatewayClass,
                        Gateway, Redpanda chart values (gateway: true)
    omb/                OMB Kafka driver (SASL/SCRAM over TLS) +
                        20 Mbps and 20 MB/s workloads + Job manifest
    verify/             rpk smoke test, JKS truststore builder,
                        results-template.md for the writeup

Benchmark has not yet been executed; results section in README.md is
populated by running through the steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@david-yu david-yu force-pushed the feat/gateway-api-tlsroute branch from 7745931 to 075130b Compare May 12, 2026 04:27
@david-yu
Contributor Author

david-yu commented May 12, 2026

End-to-end validation on EKS ✅

Stood up this PR on a fresh EKS 1.32 cluster in us-east-2, exercised it with rpk and a 20 Mbps OMB run, both through Envoy Gateway with TLS Passthrough.

Topology. 3 brokers (m6i.2xlarge, one per AZ — us-east-2a/b/c), 4 client nodes (m6i.xlarge), 1 OMB worker pod inside the cluster. Envoy Gateway v1.2.6 fronted by an AWS NLB. helm install from feat/gateway-api-tlsroute with listeners.kafka.external.default.gateway: true + host/hostTemplate.

Chart renders the right objects.

$ kubectl -n redpanda get tlsroute
NAME                         AGE
rp-kafka-default-bootstrap   …    # hostname: redpanda.example.com   → rp-gateway-bootstrap Svc
rp-kafka-default-0           …    # hostname: redpanda-0.example.com → gw-rp-0 Svc → pod rp-0
rp-kafka-default-1           …    # hostname: redpanda-1.example.com → gw-rp-1 Svc → pod rp-1
rp-kafka-default-2           …    # hostname: redpanda-2.example.com → gw-rp-2 Svc → pod rp-2

$ kubectl -n redpanda get gateway redpanda-gateway -o jsonpath='{.status.listeners[*].attachedRoutes}'
4

$ kubectl -n redpanda get tlsroute -o jsonpath='{range .items[*]}{.metadata.name}{": Accepted="}{.status.parents[*].conditions[?(@.type=="Accepted")].status}{"\n"}{end}'
rp-kafka-default-0:          Accepted=True
rp-kafka-default-1:          Accepted=True
rp-kafka-default-2:          Accepted=True
rp-kafka-default-bootstrap:  Accepted=True

Smoke test: per-broker SNI routing is working.

$ rpk -X brokers=redpanda.example.com:9094 -X tls.enabled=true \
      -X user=kubernetes-controller -X pass=… -X sasl.mechanism=SCRAM-SHA-256 \
      cluster info

CLUSTER
=======
redpanda.4cf1ff54-05d6-40e7-9d10-b2d860ef7ec3

BROKERS
=======
ID    HOST                    PORT
0*    redpanda-0.example.com  9094
1     redpanda-1.example.com  9094
2     redpanda-2.example.com  9094

The client connects to the bootstrap hostname and gets back metadata advertising the per-broker SNI hostnames (the hostTemplate substitution). It then reconnects directly to each broker. Every reconnect hits the same NLB, and Envoy routes by SNI to the correct per-broker TLSRoute → per-broker ClusterIP → broker pod. With TLS Passthrough, the client validates Redpanda's own certificate against the chart-issued CA; the Gateway presents no certificate of its own.

OMB 20 Mbps run — steady state, 10 min.

Workload: 1 topic × 12 partitions, RF=3, 1 KiB payload, 4 producers + 4 consumers, target produce rate 2,500 msg/s. Connection: SASL/SCRAM-SHA-256 over TLS through the Gateway, JKS truststore built from rp-default-root-certificate.

Metric Value
Sustained produce rate 2,500 msg/s — 2.4 MB/s — ~20 Mbps (target met exactly)
Sustained consume rate 2,500 msg/s (no backlog)
Produce errors 0 over the whole run
Pub latency p50 5.4 ms
Pub latency p95 8.8 ms
Pub latency p99 10.1 ms
Pub latency p99.9 11.9 ms
Pub latency p99.99 22.7 ms
Pub latency max 37.9 ms
Pub delay (client-side scheduling) p99 84 µs

The run shows about 10 ms p99 publish latency (8.8 ms at p95) through Envoy Gateway + NLB + per-broker TLSRoute + TLS Passthrough + SASL. This suggests that, in this topology, the Gateway path adds little latency relative to typical Kafka client SLOs.
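The throughput row can be sanity-checked with simple arithmetic. This assumes 1 KiB = 1,024-byte payloads and ignores Kafka record, batch, and TLS overhead. At 2,500 msg/s, the payload rate is 2.56 MB/s (decimal), or 2.44 MiB/s, which is 20.48 Mbit/s. The "2.4 MB/s" figure therefore corresponds to the binary (MiB/s) reading. The same arithmetic gives 1.28 MB/s and 10.24 Mbit/s for a 1,250 msg/s workload. `throughput` is a hypothetical helper.

```go
package main

import "fmt"

// throughput converts a message rate and payload size into decimal MB/s,
// binary MiB/s and Mbit/s. It counts payload bytes only, so it is a lower
// bound on the bytes actually on the wire.
func throughput(msgPerSec, payloadBytes float64) (mbDecimal, mibBinary, mbit float64) {
	bytesPerSec := msgPerSec * payloadBytes
	return bytesPerSec / 1e6, bytesPerSec / (1 << 20), bytesPerSec * 8 / 1e6
}

func main() {
	for _, rate := range []float64{2500, 1250} {
		mb, mib, mbit := throughput(rate, 1024) // 1 KiB payload
		fmt.Printf("%6.0f msg/s -> %.2f MB/s = %.2f MiB/s = %.2f Mbit/s\n", rate, mb, mib, mbit)
	}
}
```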

@github-actions github-actions Bot added the stale label May 18, 2026
@david-yu david-yu removed the stale label May 18, 2026
@david-yu
Contributor Author

End-to-end test on EKS 1.34 in tandem with PR #1329

TL;DR: Provisioned a fresh EKS 1.34 cluster, deployed both PRs on a single integration branch (test/1329-1447-integration), ran a 10 Mbps OMB Kafka workload through this PR's TLSRoute alongside PR #1329's Console HTTPRoute on the same Envoy Gateway. Both PRs work in tandem. Kafka publish latency stayed under 9ms at p99 over a 10-min steady-state window, with concurrent ~35 RPS Console k6 traffic.

Stack

Piece               Value
EKS                 1.34 (was 1.32 in the 2026-05-12 20 Mbps run)
Region              us-west-2
Brokers             3 × m6i.2xlarge, one per AZ
Clients node group  4 × m6i.xlarge (Envoy, OMB workers, k6)
Operator image      605419575229.dkr.ecr.us-west-2.amazonaws.com/redpanda-operator-pr1329-1447:f6083b79-v4 (PR #1329 + PR #1447 merged on main)
Gateway             Envoy Gateway v1.2.6, TLS:9094 (Passthrough, Kafka) + HTTPS:443 (Terminate, Console) on the same redpanda-gateway Gateway
OMB workload        workload-10mbits.yaml: 2 min warmup + 10 min steady, 1 × 12 partitions, 1 KiB records, 4 producers / 4 consumers, target 1,250 msg/s

Kafka throughput (steady-state, 10 min window)

Metric Value
Produced msg/s — avg 1,250.0
Produced MB/s — avg 1.28 (~10 Mbps as targeted)
Consumed msg/s — avg 1,250.0
Consumed MB/s — avg 1.28
Backlog peak 3 (negligible)
Produce errors 0

Publish latency — steady-state aggregate

Percentile  This run  2026-05-12 baseline  Delta
avg         4.65      5.6                  -0.95
p50         4.39      5.4                  -1.01
p75         5.32
p95         6.85      8.8                  -1.95
p99         8.74      10.1                 -1.36
p99.9       14.22     11.9                 +2.32
p99.99      17.02     22.7                 -5.68
max         27.10

(All values in ms; the baseline is the 2026-05-12 20 Mbps run without Console.)

Read: latency at p50/p95/p99 is lower than in the 20 Mbps baseline, as expected at half the produce rate. p99.9 is about 2 ms higher than the baseline, plausibly run-to-run noise on a small sample. The tail (p99.99, max) stays well bounded.

End-to-end latency — final aggregate quantiles

Percentile Value (ms)
p50 4.00
p95 7.00
p99 9.00
p99.9 50.00
p99.99 472.00

Tandem isolation check

Both routes were live and serving traffic during the entire 10-min steady-state window:

$ kubectl -n redpanda get tlsroute -o jsonpath='{range .items[*]}{.metadata.name}: {.status.parents[*].conditions[?(@.type=="Accepted")].status}{"\n"}{end}'
rp-kafka-default-0: True
rp-kafka-default-1: True
rp-kafka-default-2: True
rp-kafka-default-bootstrap: True

$ kubectl -n redpanda get gateway redpanda-gateway -o json | jq '.status.listeners[] | "\(.name): attachedRoutes=\(.attachedRoutes)"'
"kafka: attachedRoutes=4"
"console-https: attachedRoutes=1"

Concurrent Console k6 load (35.65 req/s aggregate, mixed UI + API) ran continuously for the full 12 minutes (warmup + steady state), with no visible effect on Kafka latency at the percentiles above. See the PR #1329 tandem comment for the Console-side metrics from the same run.

What this validates for PR #1447

  • ✅ TLSRoute attaches successfully on EKS 1.34. (Previous validated env was EKS 1.32.)

  • ✅ SNI-based per-broker routing through Envoy Gateway is unchanged: client → bootstrap SNI → bootstrap TLSRoute → bootstrap Service; metadata redirect to redpanda-N.example.com:9094; per-broker SNI → per-broker TLSRoute → per-broker Service → broker pod.

  • ✅ The chart's gateway-mode listener coexists cleanly with a sibling HTTPS listener on the same Gateway resource. That listener was added for Console by PR #1329 ("Add Gateway API (HTTPRoute) support to Console Helm chart and CRD"). There is no allowed-routes overlap or attached-routes contention: attachedRoutes=4 for the Kafka listener and 1 for the Console listener.

  • ✅ Per-broker hostname rendering via host / hostTemplate still works.

  • ⚠️ Two scheme-registration bugs surfaced when running the operator with both PRs simultaneously:

    1. charts/redpanda/chart.go: the chart's lightweight TLSRoute struct was registered on the operator's runtime.Scheme by itself, with no matching TLSRouteList. As a result, the controller-runtime cache failed with Failed to watch *redpanda.TLSRoute: no kind "TLSRouteList" is registered. I switched the chart's Types() to use &gatewayv1alpha2.TLSRoute{} directly, and addTLSRouteToScheme() now calls gatewayv1alpha2.Install(s), so List/Watch/ListOptions all resolve (see the diff below).
    2. operator/internal/controller/scheme.go — same root cause on the operator side; gatewayv1alpha2.Install was not in the v2 scheme. Added.

    Patches are committed on the integration branch test/1329-1447-integration. Pulling them into this PR (#1447) would unblock the v2 controller path; otherwise reconcile-time List calls fail.

Bug fix: TLSRouteList registration

Diff applied on the integration branch:

diff --git a/charts/redpanda/chart.go b/charts/redpanda/chart.go
+	gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"

 func Types() []kube.Object {
-		&TLSRoute{},
+		// Use the upstream Gateway API v1alpha2 TLSRoute so controller-runtime
+		// can List/Watch via the registered v1alpha2 scheme. The chart's
+		// lightweight struct is only used by the gotohelm-rendered path.
+		&gatewayv1alpha2.TLSRoute{},

-func addTLSRouteToScheme(s *runtime.Scheme) {
-	gv := schema.GroupVersion{Group: "gateway.networking.k8s.io", Version: "v1alpha2"}
-	s.AddKnownTypeWithName(gv.WithKind("TLSRoute"), &TLSRoute{})
+func addTLSRouteToScheme(s *runtime.Scheme) {
+	must(gatewayv1alpha2.Install(s))
 }

diff --git a/operator/internal/controller/scheme.go b/operator/internal/controller/scheme.go
 v2SchemeFns = []func(s *runtime.Scheme) error{
 	gatewayv1.Install,
+	gatewayv1alpha2.Install,

Reproducible artifacts

All manifests + workload definitions are in eks-api-gateway-tls/ of the test harness repo. Specifically the additions for this run:

  • manifests/07-console-gateway-listener.yaml — HTTPS sibling listener on the same Gateway.
  • manifests/08-console-cert.yaml — cert-manager Issuer + Certificate.
  • omb/workload-10mbits.yaml — half the rate of the existing 20 Mbps workload.
  • omb/omb-job-10mbps.yaml — OMB job definition.
  • manifests/10-k6-console-load.yaml — in-cluster k6 Job for Console mixed-load test.

Raw artifacts (in test-harness repo)

  • results/2026-05-19-tandem/omb-results.json — OMB run output.
  • results/2026-05-19-tandem/omb-stdout.log — OMB pod stdout.
  • results/2026-05-19-tandem/k6-stdout.log — k6 pod stdout with full summary.

🤖 Generated with Claude Code

@david-yu david-yu force-pushed the feat/gateway-api-tlsroute branch from 075130b to afdea88 Compare May 19, 2026 17:37
david-yu added a commit that referenced this pull request May 19, 2026
The earlier shape — registering only the chart's lightweight TLSRoute
kind via AddKnownTypeWithName — left the v1alpha2 ListOptions / List
kinds missing from the operator's scheme. controller-runtime's
reflector calls List on every Watch, so every reconcile pass that
included TLSRoute resources errored with:

  Failed to watch *redpanda.TLSRoute:
  no kind "ListOptions" is registered for version
  "gateway.networking.k8s.io/v1alpha2"

Fix: switch the chart's Types() entry to the upstream
`gatewayv1alpha2.TLSRoute` and register `gatewayv1alpha2.Install` on
the operator's v2 scheme. The chart still renders the same wire bytes
via the lightweight struct returned by TLSRoutes(); only the type the
controller cache binds to changes.

Surfaced during the tandem PR #1329 + #1447 e2e on EKS 1.34. See:
#1447 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
david-yu added a commit that referenced this pull request May 19, 2026
… and CRD

Closes #1308. Squashed rebase of the original 14-commit branch onto
current main; drops stale golden-file drift and a review-cycle
import-reordering revert that had accumulated through iteration.

Chart side (`charts/console/`):
  - New `gateway` values block alongside `ingress`. Mutual exclusion
    enforced in render: enabling both fails with a clear error.
  - `gateway.go` renders a single HTTPRoute attached to the supplied
    parent Gateway(s) via `parentRefs` + optional `sectionName`.
  - Notes template shows the Gateway URL when gateway is enabled (and
    the Ingress URL when not).
  - Tests: mutual-exclusion, gateway-only, gateway→ingress switch,
    gateway removal scenarios.

Operator side (`operator/`):
  - `Console` CRD gains a `spec.gateway` field with the same shape as
    the chart's values; goverter conversion auto-generated.
  - V2 scheme registers `gatewayv1` so the Console reconciler can
    watch HTTPRoutes.
  - RBAC adds `gateway.networking.k8s.io/httproutes` perms.
  - Console controller's `SetupWithManager` skips the HTTPRoute watch
    if the Gateway API CRDs aren't installed in the cluster (graceful
    degradation; same pattern used for ServiceMonitor).

Bumps `sigs.k8s.io/gateway-api` to v1.5.1 (workspace-wide).

Validated end-to-end on EKS 1.34 in tandem with PR #1447's TLSRoute
support; both routes coexisted cleanly on the same Envoy Gateway. See:
#1329 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
david-yu added a commit that referenced this pull request May 19, 2026
Closes #1361. Squashed rebase of the original 14-commit branch onto
current main; consolidates the iterative CI/lint fixes and includes
the v1alpha2 scheme registration fix surfaced during the tandem
PR #1329 + #1447 e2e test on EKS 1.34.

Design:
  - User brings their own Gateway (TLSRoute-capable, e.g. Envoy
    Gateway). The chart only manages TLSRoute + ClusterIP backend
    services.
  - Per-listener `gateway: true` opt-in enables gradual migration.
    Traditional NodePort/LoadBalancer listeners and TLSRoute listeners
    coexist on different ports.
  - SNI-based routing: each broker gets a unique hostname via
    `host` / `hostTemplate` per listener.
  - Bootstrap TLSRoute handles initial client connections; per-broker
    TLSRoutes handle direct broker connections after metadata
    discovery.

Chart side (`charts/redpanda/`):
  - `external.gateway` block with `enabled`, `parentRefs`,
    `advertisedPort`.
  - Per-listener `gateway`, `host`, `hostTemplate` fields on
    `listeners.{kafka,http,admin,schemaRegistry}.external.*`.
  - `tlsroute.go` renders TLSRoute resources (bootstrap + per-broker)
    with proper SNI hostnames.
  - `service.gateway.go` renders ClusterIP backend services.
  - LoadBalancer / NodePort service rendering skips gateway-opted
    listeners so they coexist on different ports.
  - `secrets.go` constructs the per-listener gateway-aware advertised
    address.

Operator side (`operator/`):
  - `Redpanda` CRD gains the `external.gateway` and per-listener
    fields; goverter conversion auto-generated.
  - V2 scheme registers `gatewayv1alpha2` (TLSRoute + TLSRouteList +
    ListOptions) so the controller-runtime cache can List/Watch the
    chart-rendered TLSRoute resources. The chart's lightweight
    TLSRoute struct stays for gotohelm rendering; the type the
    operator watches via `Types()` is the upstream
    `gatewayv1alpha2.TLSRoute`.
  - RBAC adds `gateway.networking.k8s.io/tlsroutes` perms.

Validated end-to-end on EKS 1.34 with Envoy Gateway v1.2.6, TLS
Passthrough mode, OMB at 10 Mbps + Console k6 in tandem:
#1447 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
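The following Go sketch makes the SNI-based routing in the Design notes concrete. It models what a TLS-passthrough Gateway does with these TLSRoutes: it reads the SNI hostname from the client's TLS ClientHello and forwards the still-encrypted stream to the backend whose route lists that hostname. The `route` type and `pickBackend` helper are hypothetical, and they only do exact hostname matching. Real implementations such as Envoy Gateway also handle wildcard hostnames and listener attachment. The hostnames and Service names are borrowed from the worked `hostTemplate` example further down this thread.

```go
package main

import "fmt"

// route is a hypothetical, minimal model of a TLSRoute: a set of exact SNI
// hostnames that all forward to a single backend Service.
type route struct {
	hostnames []string
	backend   string
}

// pickBackend returns the backend of the first route listing sni exactly,
// mimicking how a passthrough Gateway selects a route by SNI.
func pickBackend(routes []route, sni string) (string, bool) {
	for _, r := range routes {
		for _, h := range r.hostnames {
			if h == sni {
				return r.backend, true
			}
		}
	}
	return "", false
}

func main() {
	routes := []route{
		{[]string{"redpanda.example.com"}, "redpanda-gateway-bootstrap"},
		{[]string{"redpanda-0.example.com"}, "gw-redpanda-0"},
		{[]string{"redpanda-1.example.com"}, "gw-redpanda-1"},
		{[]string{"redpanda-2.example.com"}, "gw-redpanda-2"},
	}
	// First the bootstrap host, then a broker host learned from metadata,
	// then a hostname no route claims.
	for _, sni := range []string{"redpanda.example.com", "redpanda-1.example.com", "unknown.example.com"} {
		backend, ok := pickBackend(routes, sni)
		fmt.Printf("SNI %-24s -> %q (matched=%v)\n", sni, backend, ok)
	}
}
```

Because each broker has its own hostname, the Gateway never has to inspect Kafka traffic. The SNI value alone picks the broker.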
david-yu force-pushed the feat/gateway-api-tlsroute branch from afdea88 to 1a67fc2 on May 19, 2026 19:30
david-yu force-pushed the feat/gateway-api-tlsroute branch from 1a67fc2 to 4914721 on May 19, 2026 19:41
david-yu force-pushed the feat/gateway-api-tlsroute branch from 4914721 to bd28a50 on May 19, 2026 19:54
@david-yu
Contributor Author

This should be ready for review; PR #1329 would need to be merged after it to complete the Gateway API story. We can mark this as beta for 26.2.

@david-yu
Contributor Author

How hostTemplate placeholders work — step by step

A few questions came up on how $POD_ORDINAL / $POD_NAME behave in listeners.kafka.external.<name>.hostTemplate. Short answer: these are chart-side template placeholders, expanded at chart render time. They are not environment variables; nothing needs to set them at runtime. The chart iterates the broker pods produced by the StatefulSet (plus any extra pools) and substitutes the placeholders per pod to produce one TLSRoute per broker.

The two placeholders

| Placeholder | Substituted with | Source |
|---|---|---|
| `$POD_ORDINAL` | loop index 0, 1, 2, … | the for-loop index in `TLSRoutes()` |
| `$POD_NAME` | the StatefulSet pod name, e.g. `redpanda-0` | `PodNames(state, …)` |

From charts/redpanda/tlsroute.go:

for i, podname := range pods {
    brokerHost := renderBrokerHost(hostTemplate, i, podname)
    brokerSvcName := fmt.Sprintf("gw-%s", podname)
    routes = append(routes, &TLSRoute{
        ObjectMeta: metav1.ObjectMeta{
            Name: fmt.Sprintf("%s-%s-%s-%d", fullname, listenerTag, name, i),
            ...
        },
        Spec: TLSRouteSpec{
            Hostnames: []string{brokerHost},
            Rules: []TLSRouteRule{{ BackendRefs: []TLSRouteBackendRef{{
                Name: brokerSvcName, Port: port,
            }}}},
            ...
        },
    })
}

func renderBrokerHost(tmpl string, ordinal int, podName string) string {
    result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
    result = strings.ReplaceAll(result, "$POD_NAME", podName)
    return result
}
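A self-contained, runnable copy of the substitution helper quoted above makes the render-time behaviour easy to check. The placeholders are replaced with plain `strings.ReplaceAll` calls, using only the values the caller passes in. The `main` function and its three-pod list are hypothetical scaffolding, not chart code.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBrokerHost is the helper quoted from charts/redpanda/tlsroute.go.
// Both placeholders are replaced by plain string substitution at render
// time; no environment variables are read.
func renderBrokerHost(tmpl string, ordinal int, podName string) string {
	result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
	result = strings.ReplaceAll(result, "$POD_NAME", podName)
	return result
}

func main() {
	// Hypothetical pod list, in the order the chart would iterate it.
	pods := []string{"redpanda-0", "redpanda-1", "redpanda-2"}
	for i, pod := range pods {
		fmt.Println(renderBrokerHost("redpanda-$POD_ORDINAL.example.com", i, pod))
	}
	// A template without placeholders passes through unchanged.
	fmt.Println(renderBrokerHost("static.example.com", 0, "redpanda-0"))
}
```

Because this is plain string replacement, a `hostTemplate` with no placeholder renders the same hostname for every broker. The Kafka guard described below only rejects an empty template.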

Worked example: replicas: 3 with $POD_ORDINAL

Given this values snippet:

statefulset:
  replicas: 3
external:
  enabled: true
  domain: example.com
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL.example.com

The chart renders 4 TLSRoute objects (1 bootstrap + 3 per-broker):

# 1. Bootstrap route — single, taken verbatim from `host`
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-bootstrap
spec:
  hostnames: [redpanda.example.com]
  rules:
    - backendRefs:
        - { name: redpanda-gateway-bootstrap, port: 9094 }

# 2. Per-broker route, i=0, podName=redpanda-0
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-0
spec:
  hostnames: [redpanda-0.example.com]      # $POD_ORDINAL → 0
  rules:
    - backendRefs:
        - { name: gw-redpanda-0, port: 9094 }

# 3. Per-broker route, i=1, podName=redpanda-1
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-1
spec:
  hostnames: [redpanda-1.example.com]      # $POD_ORDINAL → 1
  rules:
    - backendRefs:
        - { name: gw-redpanda-1, port: 9094 }

# 4. Per-broker route, i=2, podName=redpanda-2
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: redpanda-kafka-default-2
spec:
  hostnames: [redpanda-2.example.com]      # $POD_ORDINAL → 2
  rules:
    - backendRefs:
        - { name: gw-redpanda-2, port: 9094 }
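The route names, hostnames and backends above can be reproduced with a small Go sketch. Per-broker names and backend Services follow the `Sprintf` calls quoted from `tlsroute.go`. The bootstrap route name (`<fullname>-<listener>-<name>-bootstrap`) and bootstrap Service name (`<fullname>-gateway-bootstrap`) are assumptions inferred only from the rendered example. `plannedRoute` and `planRoutes` are hypothetical. The sketch always emits per-broker routes, so it models a Kafka listener with `hostTemplate` set.

```go
package main

import (
	"fmt"
	"strings"
)

// plannedRoute is a hypothetical, trimmed-down view of one rendered TLSRoute.
type plannedRoute struct {
	Name     string
	Hostname string
	Backend  string
}

func renderBrokerHost(tmpl string, ordinal int, podName string) string {
	result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
	return strings.ReplaceAll(result, "$POD_NAME", podName)
}

// planRoutes returns the bootstrap route followed by one route per pod.
func planRoutes(fullname, listenerTag, name, host, hostTemplate string, pods []string) []plannedRoute {
	routes := []plannedRoute{{
		Name:     fmt.Sprintf("%s-%s-%s-bootstrap", fullname, listenerTag, name), // assumed from the example
		Hostname: host,                                                        // bootstrap uses `host` verbatim
		Backend:  fmt.Sprintf("%s-gateway-bootstrap", fullname),                // assumed from the example
	}}
	for i, pod := range pods {
		routes = append(routes, plannedRoute{
			Name:     fmt.Sprintf("%s-%s-%s-%d", fullname, listenerTag, name, i),
			Hostname: renderBrokerHost(hostTemplate, i, pod),
			Backend:  fmt.Sprintf("gw-%s", pod),
		})
	}
	return routes
}

func main() {
	pods := []string{"redpanda-0", "redpanda-1", "redpanda-2"}
	for _, r := range planRoutes("redpanda", "kafka", "default",
		"redpanda.example.com", "redpanda-$POD_ORDINAL.example.com", pods) {
		fmt.Printf("%-34s %-24s -> %s\n", r.Name, r.Hostname, r.Backend)
	}
}
```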

$POD_NAME variant

If you'd rather key on the pod name than the ordinal — useful when the StatefulSet name isn't redpanda — use $POD_NAME:

hostTemplate: $POD_NAME.example.com

For the same 3-broker cluster above, this produces hostnames: [redpanda-0.example.com], [redpanda-1.example.com], [redpanda-2.example.com] — identical here because the pod name happens to be redpanda-<ordinal>, but with a renamed StatefulSet (e.g. rp.fullnameOverride: cluster-a) you'd get cluster-a-0.example.com etc.

You can mix them too: hostTemplate: $POD_NAME-broker-$POD_ORDINAL.example.com → redpanda-0-broker-0.example.com, …
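The following Go sketch compares the three template styles for a StatefulSet renamed to `cluster-a`. The pod names `cluster-a-0` … `cluster-a-2` are hypothetical. `renderBrokerHost` is copied from the snippet above; everything else is illustrative scaffolding.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBrokerHost is copied from the tlsroute.go snippet above.
func renderBrokerHost(tmpl string, ordinal int, podName string) string {
	result := strings.ReplaceAll(tmpl, "$POD_ORDINAL", fmt.Sprintf("%d", ordinal))
	result = strings.ReplaceAll(result, "$POD_NAME", podName)
	return result
}

func main() {
	// Hypothetical pods of a StatefulSet renamed to "cluster-a".
	pods := []string{"cluster-a-0", "cluster-a-1", "cluster-a-2"}
	templates := []string{
		"redpanda-$POD_ORDINAL.example.com",         // fixed prefix, only the ordinal varies
		"$POD_NAME.example.com",                     // prefix follows the pod name
		"$POD_NAME-broker-$POD_ORDINAL.example.com", // both placeholders mixed
	}
	for _, tmpl := range templates {
		fmt.Println(tmpl)
		for i, pod := range pods {
			fmt.Println("  ", renderBrokerHost(tmpl, i, pod))
		}
	}
}
```

The ordinal template keeps its literal `redpanda-` prefix (`redpanda-0.example.com`). The `$POD_NAME` forms follow the rename (`cluster-a-0.example.com`, `cluster-a-0-broker-0.example.com`).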

Multi-pool clusters

pods is built from the default StatefulSet plus any additional pools:

pods := PodNames(state, Pool{Statefulset: state.Values.Statefulset})
for _, set := range state.Pools {
    pods = append(pods, PodNames(state, set)...)
}

So a cluster with replicas: 3 on the default STS and an extra pool of 2 brokers renders 6 TLSRoutes (1 bootstrap + 5 per-broker), with $POD_ORDINAL running 0→4 across the combined pod list.
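The ordinal arithmetic can be checked with a Go sketch. `podNames` is a hypothetical stand-in for the chart's `PodNames(state, pool)` that assumes the usual StatefulSet naming `<statefulset>-<n>`. The extra pool's StatefulSet name, `redpanda-pool-b`, is also an assumption, not the chart's actual naming scheme.

```go
package main

import "fmt"

// podNames is a hypothetical stand-in for the chart's PodNames(state, pool).
// It assumes StatefulSet-style pod names: "<statefulset>-0", "<statefulset>-1", ...
func podNames(statefulSet string, replicas int) []string {
	names := make([]string, 0, replicas)
	for n := 0; n < replicas; n++ {
		names = append(names, fmt.Sprintf("%s-%d", statefulSet, n))
	}
	return names
}

// combinedPods mirrors the append loop: default StatefulSet first, then each pool.
// The pool StatefulSet name "redpanda-pool-b" is an assumption.
func combinedPods() []string {
	pods := podNames("redpanda", 3)
	pods = append(pods, podNames("redpanda-pool-b", 2)...)
	return pods
}

func main() {
	pods := combinedPods()
	fmt.Printf("TLSRoutes: %d (1 bootstrap + %d per-broker)\n", 1+len(pods), len(pods))
	for i, pod := range pods {
		// i is the combined-list index, which is the value $POD_ORDINAL receives.
		fmt.Printf("$POD_ORDINAL=%d pod=%s\n", i, pod)
	}
}
```

In this model `redpanda-pool-b-0` receives `$POD_ORDINAL` 3. The placeholder is the position in the combined list, not the pod's own StatefulSet ordinal, while `$POD_NAME` always carries the real pod name.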

Required-when-replicas>1 safety check

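For Kafka, hostTemplate is mandatory when the listener fronts more than one broker (the guard checks len(pods) > 1). Kafka clients connect directly to each partition leader at its advertised address, so every broker needs its own hostname for the Gateway to route it by SNI. Omitting hostTemplate deliberately makes chart rendering panic, which surfaces the misconfiguration early: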

if listenerTag == "kafka" && len(pods) > 1 && hostTemplate == "" {
    panic(fmt.Sprintf("gateway listener %s/%s requires hostTemplate when replicas > 1", listenerTag, name))
}

For non-Kafka listeners (HTTP proxy / Admin / Schema Registry) the bootstrap route alone is enough; hostTemplate is optional and only emits per-broker routes when set.
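The guard's behaviour across a few listener shapes can be tried with the Go sketch below. `requireHostTemplate` reproduces the quoted condition and panic message. `checkListener` is a hypothetical wrapper that recovers the panic into an error so the cases can be printed together. The sketch models only the guard, not which routes are rendered afterwards.

```go
package main

import "fmt"

// requireHostTemplate reproduces the quoted guard: a Kafka gateway listener
// backed by more than one pod must set hostTemplate, otherwise render panics.
func requireHostTemplate(listenerTag, name string, podCount int, hostTemplate string) {
	if listenerTag == "kafka" && podCount > 1 && hostTemplate == "" {
		panic(fmt.Sprintf("gateway listener %s/%s requires hostTemplate when replicas > 1", listenerTag, name))
	}
}

// checkListener is a hypothetical wrapper that converts the panic into an error.
func checkListener(listenerTag, name string, podCount int, hostTemplate string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	requireHostTemplate(listenerTag, name, podCount, hostTemplate)
	return nil
}

func main() {
	cases := []struct {
		tag, tmpl string
		pods      int
	}{
		{"kafka", "", 3},
		{"kafka", "redpanda-$POD_ORDINAL.example.com", 3},
		{"kafka", "", 1},
		{"http", "", 3},
	}
	for _, c := range cases {
		fmt.Printf("%-5s pods=%d hostTemplate=%q -> %v\n",
			c.tag, c.pods, c.tmpl, checkListener(c.tag, "default", c.pods, c.tmpl))
	}
}
```

Only the Kafka case with 3 pods and an empty template fails. An HTTP proxy listener with the same shape passes, matching the non-Kafka rule above.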

🤖 Generated with Claude Code

david-yu force-pushed the feat/gateway-api-tlsroute branch from bd28a50 to 5433e06 on May 20, 2026 05:17
david-yu marked this pull request as ready for review May 20, 2026 05:17

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Different domain per listener

1 participant