docs(networking): Cilium Gateway API — architecture, security, migration#509

Open
lexfrei wants to merge 13 commits into main from docs/gateway-api-cilium

@lexfrei lexfrei commented Apr 23, 2026

What this PR does

Adds a new networking/gateway-api.md page to the next/ docs trunk describing the Cilium-backed Gateway API feature that lands in cozystack/cozystack#2470 (and its dependency stack on #2464 / #2468).

The page is intentionally detailed because the feature introduces:

  • a new platform-level toggle (gateway.enabled) and a new tenant-level toggle (tenant.spec.gateway);
  • per-tenant pinning of the Gateway's external IP via tenant.spec.gatewayIP (renders a dedicated CiliumLoadBalancerIPPool with a single-IP block and propagates the value to the Gateway as lbipam.cilium.io/ips via spec.infrastructure.annotations, GEP-1762);
  • a migration away from ingress-nginx for every cozystack-native exposed service (dashboard, keycloak via HTTPRoute; kubeapiserver, vm-exportproxy, cdi-uploadproxy via TLSRoute passthrough; harbor and bucket attached to per-tenant Gateways);
  • a new per-tenant cert-manager Issuer that gives every tenant an isolated ACME account, so child tenants no longer share HTTP-01 state with the parent;
  • a four-layer runtime admission defence against cross-tenant hostname hijacking (cozystack-gateway-hostname-policy, cozystack-tenant-host-policy, cozystack-namespace-host-label-policy, cozystack-gateway-attached-namespaces-policy) plus the listener allowedRoutes namespace whitelist;
  • a render-time safety net against misconfiguring publishing.gateway.attachedNamespaces with tenant namespaces.
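
As a rough illustration of the tenant.spec.gatewayIP bullet above, the following Python sketch builds the two objects that bullet describes: a single-IP CiliumLoadBalancerIPPool and a Gateway that carries lbipam.cilium.io/ips under spec.infrastructure.annotations. The function name, the object names, the gatewayClassName value, the apiVersions, and the host-length CIDR form of the block are assumptions made for illustration; they are not taken from the chart, and listeners are omitted.

```python
import ipaddress


def render_gateway_ip_pin(tenant: str, gateway_ip: str) -> tuple[dict, dict]:
    """Sketch of what pinning a tenant Gateway to one external IP could render.

    Assumptions (not from the cozystack chart): object names, apiVersions,
    and expressing the single-IP block as a host-length CIDR.
    """
    ip = ipaddress.ip_address(gateway_ip)  # raises ValueError on malformed input
    block = f"{ip}/{ip.max_prefixlen}"     # /32 for IPv4, /128 for IPv6

    pool = {
        "apiVersion": "cilium.io/v2alpha1",  # assumed; depends on the Cilium version
        "kind": "CiliumLoadBalancerIPPool",
        "metadata": {"name": f"{tenant}-gateway"},  # hypothetical name
        "spec": {"blocks": [{"cidr": block}]},
    }
    gateway = {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": "gateway", "namespace": tenant},  # hypothetical name
        "spec": {
            "gatewayClassName": "cilium",  # assumed class name
            # GEP-1762: annotations set here are propagated to generated resources.
            "infrastructure": {"annotations": {"lbipam.cilium.io/ips": str(ip)}},
        },
    }
    return pool, gateway
```

Calling `render_gateway_ip_pin("tenant-foo", "192.0.2.10")` yields a pool whose only block is `192.0.2.10/32` and a Gateway annotated with the same address.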

Sections:

  • Overview — one-paragraph summary, opt-in defaults, coexistence with ingress-nginx.
  • Architecture — traffic-path mermaid, listener layout per tenant Gateway.
  • Enabling Gateway API — platform-level Package example and per-tenant Tenant example, with full attachedNamespaces list, plus a section on pinning a tenant Gateway to a specific external IP via tenant.spec.gatewayIP.
  • Per-service routing — tables for HTTPRoute (termination) and TLSRoute (passthrough), mapping service → namespace → route name → backend → listener.
  • Security — mermaid diagram and one paragraph per admission layer, explaining what each one enforces and what is explicitly left to trust boundaries (cluster-admin credentials, DNS control, shared LB IP pool).
  • Certificates — per-tenant Issuer, supported ACME servers, Let's Encrypt rate limits and mitigations.
  • Migration from ingress-nginx — step-by-step for new and existing clusters.
  • Known limitations — multi-tenant shared-IP (deferred until Cilium ListenerSet, cilium#42756), TLSRoute v1alpha2, tenant.spec.host admin responsibility, upstream application gaps.
  • Troubleshooting — concrete kubectl commands for the four most likely "stuck" states.
  • See also — upstream Gateway API, Cilium docs, KEP-5707, Let's Encrypt rate-limits, GEP-1762.

Target branch

next/ — the version-agnostic trunk. When cozystack/cozystack#2470 lands in a minor release, this page ships with that version's docs automatically.

Not included

The legacy v1/networking/gateway-api.md page on the abandoned docs/gateway-api branch (from the Envoy Gateway proposal in cozystack/cozystack#2213) is unrelated to this PR. That PR proposed a different architecture that has since been superseded. This PR ships fresh docs for the new Cilium-based design.

Release note

NONE


netlify Bot commented Apr 23, 2026

Deploy Preview for cozystack ready!

Name Link
🔨 Latest commit b48b79d
🔍 Latest deploy log https://app.netlify.com/projects/cozystack/deploys/69fbd8d218f81700084dac27
😎 Deploy Preview https://deploy-preview-509--cozystack.netlify.app


coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds documentation and platform configuration for an opt-in Cilium-backed Gateway API: Helm-rendered per-tenant Gateway resources, controller materialization of Gateways/Issuers/Certificates, HTTPRoute/TLSRoute handling (redirects, ACME, passthrough), LoadBalancer IP pool usage, admission policies, migration notes, and troubleshooting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gateway API documentation: `content/en/docs/next/networking/gateway-api.md` | New doc describing Cozystack’s opt-in Gateway API with Cilium: per-tenant TenantGateway rendering, controller materialization of Gateways/Issuers/Certificates, HTTPRoute/TLSRoute listener patterns (HTTP→HTTPS redirect, HTTP-01 vs DNS-01, optional TLS passthrough), cert issuance modes (prod/stage), LB IP allocation via CiliumLoadBalancerIPPool, security/validation policies, coexistence and migration guidance, and troubleshooting steps. |
| Platform package docs & schema: `content/en/docs/next/operations/configuration/platform-package.md` | Adds publishing.exposure documentation and publishing.certificates.dns01.* fields for DNS-01 providers; introduces gateway.enabled and gateway.attachedNamespaces platform values (with defaults) and documents how publishing modes map to Service types and validation/fail-fast behavior. |

Sequence Diagram(s)

sequenceDiagram
    participant Admin as Platform Helm/Values
    participant K8s as Kubernetes API
    participant Controller as cozystack-controller
    participant TenantNS as Tenant Namespace
    participant CertManager
    participant Envoy as Envoy DaemonSet
    participant LBPool as CiliumLoadBalancerIPPool

    Admin->>K8s: enable platform Gateway (gateway.enabled, attachedNamespaces)
    Admin->>K8s: install GatewayClass, ValidatingAdmissionPolicies
    TenantNS->>K8s: tenant with spec.gateway: true
    K8s->>Controller: render TenantGateway CRs
    Controller->>K8s: materialize Gateway, Issuer, Certificate, HTTPRoute/TLSRoute
    K8s->>CertManager: ACME certificate request (HTTP-01 or DNS-01)
    CertManager-->>K8s: certificate issued
    Controller->>LBPool: allocate tenant LoadBalancer IP
    Envoy->>K8s: program listeners (HTTPS, redirects, optional passthrough)
    Client->>Envoy: TLS or HTTP request
    Envoy->>TenantNS: route to backend per HTTPRoute/TLSRoute

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


@lexfrei lexfrei marked this pull request as ready for review April 23, 2026 17:43
@lexfrei lexfrei requested review from kvaps and lllamnyp as code owners April 23, 2026 17:43
@lexfrei lexfrei self-assigned this Apr 23, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive documentation for the Gateway API support in Cozystack, detailing its architecture, security model, and migration path from ingress-nginx. The review feedback identifies opportunities to improve technical accuracy and consistency, specifically by clarifying that namespace whitelisting applies to both HTTPRoute and TLSRoute resources and resolving a naming inconsistency for the Kubernetes API route.

- The exposed-service templates (dashboard, keycloak) stop rendering their `Ingress` and start rendering their `HTTPRoute`.
- TLS-passthrough services (cozystack-api, vm-exportproxy, cdi-uploadproxy) stop rendering their `Ingress` and start rendering a `TLSRoute` attached to a dedicated Passthrough listener.

The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.

medium

The documentation mentions that attachedNamespaces restricts HTTPRoute attachments. However, the architecture also utilizes TLSRoute for services like the Kubernetes API and KubeVirt proxies (as shown in the routing tables). It would be more accurate to state that this list applies to both HTTPRoute and TLSRoute (or Gateway API routes in general).

Suggested change
-The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
+The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s and `TLSRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.

Contributor Author

Fixed in 1ea0093: the paragraph now says HTTPRoute or TLSRoute. The allowedRoutes whitelist on the listener is route-kind-agnostic, so in practice it restricts every route type that attaches to the Gateway — including the TLSRoutes used for the Kubernetes API, vm-exportproxy, and cdi-uploadproxy.


| Service | Namespace | `TLSRoute` name | Backend | Listener |
|---|---|---|---|---|
| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |

medium

There is an inconsistency in naming the Kubernetes API route. It is referred to as cozystack-api in the Mermaid diagram (line 27) and the migration section (line 265), but as kubernetes-api in this table. Using a consistent name throughout the document would improve clarity.

Suggested change
-| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
+| Kubernetes API | `default` | `cozystack-api` | `kubernetes:443` | `tls-api` |

Contributor Author

Fixed in dcb805c by aligning the Mermaid diagram to the real resource name. The TLSRoute is literally named kubernetes-api (see packages/system/cozystack-api/templates/api-tlsroute.yaml), so the table at line 144 is the source of truth. cozystack-api in the diagram referred to the cozystack package that ships this route, which was misleading. The diagram now says kubernetes-api and the migration prose clarifies the relationship (cozystack-api (Kubernetes API)).


Every listener on a tenant Gateway pins `allowedRoutes.namespaces.from: Selector` to a `matchExpressions` whitelist against the built-in `kubernetes.io/metadata.name` label. That label is written by kube-apiserver on every namespace and cannot be spoofed.

The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
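
The whitelist mechanism quoted above can be modelled in a few lines of Python. This is only a sketch of how an `In` match expression on `kubernetes.io/metadata.name` behaves; the function names and the example namespaces are hypothetical, and the real listener is rendered by the platform, not by this code.

```python
NAME_LABEL = "kubernetes.io/metadata.name"


def allowed_routes_selector(tenant_namespace: str, attached: list[str]) -> dict:
    """Build a listener allowedRoutes block that whitelists namespaces by name."""
    # The tenant's own namespace is always included, plus the attached list.
    values = sorted({tenant_namespace, *attached})
    return {
        "namespaces": {
            "from": "Selector",
            "selector": {
                "matchExpressions": [
                    {"key": NAME_LABEL, "operator": "In", "values": values}
                ]
            },
        }
    }


def namespace_may_attach(namespace_labels: dict, allowed_routes: dict) -> bool:
    """Evaluate the match expressions the way an `In` label selector would."""
    exprs = allowed_routes["namespaces"]["selector"]["matchExpressions"]
    return all(
        e["operator"] == "In" and namespace_labels.get(e["key"]) in e["values"]
        for e in exprs
    )
```

With a whitelist built from `tenant-root` plus `cozy-dashboard`, a namespace labelled `tenant-evil` fails the check regardless of the route kind it tries to attach.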

medium

Similar to the comment on line 101, this section should clarify that the whitelist applies to both HTTPRoute and TLSRoute, as both are used in the described architecture.

Suggested change
-The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
+The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` or `TLSRoute` to the Gateway.

Contributor Author

Fixed in e15d865: the Layer 1 description now explicitly says HTTPRoute or TLSRoute. Same root cause as the line 101 comment — the listener-level whitelist applies to every route kind attaching to that listener.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Line 56: The in-page anchor "#tls-passthrough" in the sentence "Plus one extra
listener per TLS-passthrough service (see [TLS passthrough](`#tls-passthrough`)
below)" doesn't match the actual heading ID; locate the "TLS passthrough"
section heading in this document and either rename that heading (or add an
explicit HTML anchor/id) to produce the ID tls-passthrough, or update the link
fragment to the existing heading ID (for example whatever the generated slug
is); ensure the link target and the heading ID for the TLS passthrough section
are identical so the anchor works.

📥 Commits

Reviewing files that changed from the base of the PR and between 5415111 and 2a68b49.

📒 Files selected for processing (1)
  • content/en/docs/next/networking/gateway-api.md


@myasnikovdaniil myasnikovdaniil left a comment

No explanation of publishing.exposure flag in platform package, needs to be added


lexfrei commented Apr 27, 2026

@myasnikovdaniil Added a publishing.exposure subsection to the Migration section in 927ca5f.

It covers:

  • what the flag does (ingress-nginx Service shape: ClusterIP+externalIPs vs LoadBalancer);
  • why a Gateway API rollout is the natural moment to flip it (so ingress-nginx and the per-tenant Gateway draw from the same Cilium-managed pool);
  • the KEP-5707 deprecation timeline that forces the move before Kubernetes v1.40;
  • the loadBalancer-mode caveats lifted from the platform values.yaml: non-empty publishing.externalIPs, externalTrafficPolicy: Local, no built-in Cilium announcement, brief ingress interruption when switching, and the scope limit to ingress-nginx (vpn and similar still need separate migration).
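
For readers skimming the thread, the two publishing.exposure modes can be sketched as a small Python function. It only mirrors the behavior described in this comment (Service shape per mode, non-empty publishing.externalIPs in loadBalancer mode, externalTrafficPolicy: Local); the function name and the error messages are hypothetical and the real logic lives in the Helm chart.

```python
def ingress_service_spec(exposure: str, external_ips: list[str]) -> dict:
    """Sketch of the ingress-nginx Service shape each publishing.exposure mode maps to."""
    if exposure == "externalIPs":
        # ClusterIP Service with spec.externalIPs taken from publishing.externalIPs.
        return {"type": "ClusterIP", "externalIPs": list(external_ips)}
    if exposure == "loadBalancer":
        if not external_ips:
            # Mirrors the caveat that loadBalancer mode needs addresses for the pool.
            raise ValueError("publishing.externalIPs must be non-empty in loadBalancer mode")
        # The addresses go into a CiliumLoadBalancerIPPool, not into the Service spec.
        return {"type": "LoadBalancer", "externalTrafficPolicy": "Local"}
    raise ValueError(f"unknown publishing.exposure value: {exposure!r}")
```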

@lexfrei lexfrei requested a review from myasnikovdaniil April 27, 2026 09:05
@myasnikovdaniil

Platform parameters must also land into platform package reference

lexfrei added 8 commits April 27, 2026 14:41
…r-tenant ingress

Covers the architecture, the two-step opt-in (gateway.enabled at
platform level, tenant.spec.gateway per tenant), per-service routing
(HTTPRoute for termination, TLSRoute for passthrough), the four
independent ValidatingAdmissionPolicies that guard cross-tenant
hostname hijacking plus the listener allowedRoutes whitelist, the
per-tenant cert-manager Issuer that enables isolated ACME state for
child tenants, migration from ingress-nginx, rate-limit
considerations, and operational troubleshooting.

Weight 15 places the page between 'Architecture' (5) and 'HTTP
Cache' (20) in the networking section sidebar.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…d TLSRoute

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:101: the whitelist guards both
HTTPRoute attachments (dashboard, keycloak, harbor, bucket) and TLSRoute
attachments (Kubernetes API, vm-exportproxy, cdi-uploadproxy), not only
HTTPRoute.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…TLSRoute name kubernetes-api

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:144: the routing table listed
the TLSRoute as kubernetes-api (the real resource name in the cozystack-api
package, pointing at the kubernetes Service in the default namespace), but
the Mermaid diagram labelled it cozystack-api. Update the diagram to match
the actual resource name and add a parenthetical clarification in the
migration section that the cozystack-api package ships the Kubernetes API
TLSRoute.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…te and TLSRoute

Address review feedback from gemini-code-assist on
content/en/docs/next/networking/gateway-api.md:185: the Security section's
Layer 1 description said the listener allowedRoutes whitelist blocks
HTTPRoute attachments, but listener.allowedRoutes in Gateway API applies to
every route kind attaching to that listener — HTTPRoute on the HTTPS
listeners and TLSRoute on the tls-* Passthrough listeners.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…section

Address review feedback from coderabbitai on
content/en/docs/next/networking/gateway-api.md:56: the link fragment
#tls-passthrough did not match the heading ID Hugo generates for
'TLSRoute (TLS passthrough)' (which slugifies to tlsroute-tls-passthrough),
so the jump target was broken and markdownlint-cli2 flagged MD051.
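
The mismatch described in this commit message can be reproduced with a small approximation of GitHub-style heading slugs. This is a sketch, not Hugo's actual anchor generator; the function name is hypothetical and edge cases (duplicate headings, non-ASCII text) are ignored.

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor: lowercase, drop punctuation, hyphenate spaces."""
    lowered = heading.strip().lower()
    # Keep word characters, hyphens and spaces; everything else is dropped.
    kept = re.sub(r"[^\w\- ]", "", lowered)
    return kept.replace(" ", "-")


print(heading_slug("TLSRoute (TLS passthrough)"))  # tlsroute-tls-passthrough
print(heading_slug("TLS passthrough"))             # tls-passthrough
```

The first call shows why a link to `#tls-passthrough` cannot resolve against a heading titled 'TLSRoute (TLS passthrough)'.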

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…x Service modes

Address review feedback from @myasnikovdaniil: the Migration section
referenced `exposure: loadBalancer` in a YAML example without explaining
what the flag does. Add a subsection covering both modes (externalIPs vs
loadBalancer), the KEP-5707 deprecation timeline that motivates the
flip, and the loadBalancer-mode caveats (non-empty externalIPs,
externalTrafficPolicy: Local, no built-in Cilium announcement, brief
ingress interruption on switch, scope limited to ingress-nginx).

Signed-off-by: Aleksei Sviridkin <f@lex.la>
Layer 1 of the Security section called the whitelist
publishing.gateway.attachedNamespaces. The actual platform values
schema (packages/core/platform/values.yaml on
chore/gateway-api-crds-v1.5.1) puts attachedNamespaces directly under
the root gateway: key, and the helm consumer
(packages/core/platform/templates/apps.yaml) reads
.Values.gateway.attachedNamespaces. The publishing.gateway path appears
in the upstream PR description, the extra/gateway README, and one
helm-fail string, but it is not the real config path. Use
gateway.attachedNamespaces here to match the schema authors will
actually configure.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
…meters

Address review feedback from @myasnikovdaniil: the platform parameters
introduced by the Gateway API rollout (gateway.enabled,
gateway.attachedNamespaces) and publishing.exposure were only described
in the Gateway API guide. Add them to the Platform Package Reference,
which is where operators look up platform values.

- publishing.exposure: new row in the Publishing table with both modes,
  KEP-5707 deprecation pointer, and a cross-reference to the Gateway API
  page for the full caveat list.
- New Gateway section between Authentication and Scheduling, mirroring
  the schema from packages/core/platform/values.yaml on
  chore/gateway-api-crds-v1.5.1: gateway.enabled and
  gateway.attachedNamespaces, with the default whitelist printed
  verbatim and a forward link to the Gateway API guide.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei force-pushed the docs/gateway-api-cilium branch from 10b1e7c to 6888f84 Compare April 27, 2026 12:26

lexfrei commented Apr 27, 2026

@myasnikovdaniil Done — added the platform parameters to the Platform Package Reference in 6888f84:

  • publishing.exposure row in the Publishing table.
  • New Gateway section between Authentication and Scheduling with gateway.enabled and gateway.attachedNamespaces, including the default whitelist verbatim.

Schema was verified against packages/core/platform/values.yaml on chore/gateway-api-crds-v1.5.1 (the parent PR is not yet merged). Side note: that verification surfaced one inconsistency in the Gateway API page itself — Layer 1 of the Security section called the whitelist publishing.gateway.attachedNamespaces, while the actual key is the root-level gateway.attachedNamespaces (consumed at packages/core/platform/templates/apps.yaml as .Values.gateway.attachedNamespaces). Fixed in 7db740b. The publishing.gateway path also appears in the upstream PR description and packages/extra/gateway/README.md; might be worth a one-line nit upstream.

The branch was rebased onto main to pick up the latest telemetry fixes, so SHAs from earlier comments shifted (10b1e7c1, 927ca5f).


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
content/en/docs/next/operations/configuration/platform-package.md (2)

108-108: Minor: Consider consistent spelling variant.

Line 108 uses "Materialising" (British English). While both variants are correct, using consistent spelling throughout the documentation improves polish. Consider "Materializing" if the project prefers American English, or keep the current form if British English is the standard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` at line
108, The documentation uses the British English spelling "Materialising" in the
description for `gateway.enabled`; update that word to the project's chosen
variant (e.g., change "Materialising" to "Materializing") for consistency with
the rest of the docs—edit the text in the `gateway.enabled` description to the
preferred spelling.

66-66: Consider breaking up the dense table description for better scannability.

The publishing.exposure description packs mode definitions, deprecation timeline, validation behavior, and a caveat link into a single paragraph. Users scanning the table may miss the critical deprecation warning or the fail-fast validation note.

♻️ Suggested restructure for improved readability
-| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707][kep-5707]) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
+| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service.<br/><br/>`externalIPs`: Creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`.<br/><br/>`loadBalancer`: Creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` using the same addresses.<br/><br/>**Deprecation notice:** `Service.spec.externalIPs` is deprecated in Kubernetes v1.36 ([KEP-5707][kep-5707]). Switch to `loadBalancer` before upgrading to v1.40.<br/><br/>**Validation:** The chart returns an error if `loadBalancer` is set with an empty `publishing.externalIPs`.<br/><br/>See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for additional caveats. |

This uses <br/> tags (permitted by unsafe: true Goldmark config) to create visual breaks within the table cell, making each concept easier to locate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` at line
66, The table cell for publishing.exposure is too dense; split its single
paragraph into separate sentences or lines (using permitted <br/> tags) that
each cover: the two modes and what they do (externalIPs vs loadBalancer and that
loadBalancer uses CiliumLoadBalancerIPPool), the deprecation of
Service.spec.externalIPs (KEP-5707) with the upgrade advice to switch before
v1.40, the validation/fail-fast behavior when loadBalancer is set but
publishing.externalIPs is empty, and the link to Gateway API → ingress-nginx
Service mode, so readers can scan and find the deprecation and validation notes
quickly.


📥 Commits

Reviewing files that changed from the base of the PR and between 10b1e7c and 6888f84.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/gateway-api.md
  • content/en/docs/next/operations/configuration/platform-package.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • content/en/docs/next/networking/gateway-api.md

Major rewrite of the Gateway API page to reflect the architecture
shipped in cozystack/cozystack#2470:

- Add TenantGateway CRD + cozystack-controller reconciliation flow.
  The chart no longer renders Gateway / Issuer / Certificate
  directly — those come from the controller reconciling a per-tenant
  TenantGateway CR. Adds a reconciliation-flow mermaid alongside the
  traffic-path one.
- Add HTTP-01 (default) vs DNS-01 (opt-in) cert-mode section. HTTP-01
  is the new default with per-listener Certificates and listeners
  added dynamically per HTTPRoute hostname. DNS-01 is the wildcard
  opt-in with parametrized provider — full provider matrix
  (cloudflare, route53, digitalocean, rfc2136). Document that the
  same provider config drives both per-tenant Issuers and the
  cluster-wide ClusterIssuers used by the legacy ingress flow.
- Renumber the security model from 5 layers to 7 layers and add the
  missing layers:
  - Layer 3 (cozystack-gateway-attached-namespaces-policy) was
    previously listed as Layer 5; recategorised to match the
    in-repo README ordering.
  - Layer 7 (cozystack-route-hostname-policy) — the HTTPRoute /
    TLSRoute hostname VAP scoped to tenant-* namespaces — was
    missing entirely. This is the layer that closes the cross-apex
    hostname surface a tenant user with HTTPRoute RBAC could
    otherwise exploit. Document its fail-closed behavior on missing
    namespace.cozystack.io/host label.
- Document the narrow port-80 listener allowedRoutes (only the
  tenant namespace + cozy-cert-manager) — the Layer 1 hardening
  that prevents app HTTPRoutes attaching by hostname from binding
  to port 80 and serving plaintext.
- Document the HTTPS listener allowedRoutes.kinds=[HTTPRoute]
  restriction (TLSRoute for passthrough listeners) — prevents
  GRPCRoute / TCPRoute / UDPRoute from bypassing the route-hostname
  VAP.
- Add HostnameConflict resolution section: cozy-* > tenant-*
  priority, lexicographic tiebreak, status condition under the
  controller's name in Status.Parents.
- Add Foreign-takeover guards section listing all five reconcile
  paths (Gateway, redirect HTTPRoute, Issuer, wildcard Certificate,
  per-listener Certificate) that refuse to silently take over
  pre-existing objects without an OwnerReference back to the
  TenantGateway.
- Add cilium-lb-pool empty-IPs exception note: in loadBalancer mode
  with empty publishing.externalIPs the chart skips the per-tenant
  pool render rather than failing — this is the legitimate operator
  pattern for clusters running BGP / L2-announce pools managed
  outside the chart.
- Refresh listener-naming docs: per-app listeners use a sha256
  suffix (`https-<first-label>-<8-hex>`) so two hostnames sharing
  the first label (harbor.foo.example.com vs harbor.alice.example.com)
  produce distinct names.
- Refresh troubleshooting: TenantGateway Ready=False with
  ReconcileError points at foreign-takeover refusals; route VAP
  rejection messages.
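
The listener-naming rule above (`https-<first-label>-<8-hex>`) can be illustrated with a minimal sketch. This is not the controller's code: the function name `listener_name`, the choice to hash the full lower-cased hostname, and taking the first 8 hex characters of the digest are all assumptions made for illustration.

```python
import hashlib


def listener_name(hostname: str, prefix: str = "https") -> str:
    """Return a name shaped like https-<first-label>-<8-hex> (hypothetical helper)."""
    # Assumption: case and a trailing dot are normalised before hashing.
    normalized = hostname.strip().rstrip(".").lower()
    first_label = normalized.split(".", 1)[0]
    # Assumption: the suffix is the first 8 hex characters of sha256(full hostname).
    suffix = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{first_label}-{suffix}"


# Two hostnames sharing the first label get different suffixes.
name_foo = listener_name("harbor.foo.example.com")
name_alice = listener_name("harbor.alice.example.com")
print(name_foo)
print(name_alice)
```

Hashing the whole hostname rather than only the first label is what keeps the two `harbor.*` listeners apart while the names stay short and deterministic.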

Plus platform-package.md: extend publishing.certificates table with
the dns01.* provider matrix so operators can find the wiring keys
in one place.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
content/en/docs/next/operations/configuration/platform-package.md (1)

59-67: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address markdownlint MD052 for kep-5707 by switching to an inline link in the table cell.

markdownlint-cli2 reports Missing link or image reference definition: "kep-5707" on the publishing.exposure row, even though the definition appears later in the file. This is typically a parsing limitation around table cells / reference-style links. Converting that specific mention to an inline link avoids the ambiguity and should clear the warning.

🔧 Proposed fix
-| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707][kep-5707]) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
+| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707](https://github.com/kubernetes/enhancements/issues/5707)) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` around
lines 59 - 67, The markdownlint MD052 warning is caused by the reference-style
link `[kep-5707]` inside the `publishing.exposure` table cell; update that table
row (the `publishing.exposure` entry in the diff) to use an inline link with the
full URL for KEP-5707 instead of the reference-style link so the linter can
resolve it (leave the later definition in place or remove it if you prefer, but
the table must use the inline URL).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Around line 100-127: Update the platform Package example for
cozystack.cozystack-platform so gateway.attachedNamespaces includes the default
namespace; modify the values under
spec.components.platform.values.gateway.attachedNamespaces to add "default"
alongside the listed cozy-* namespaces (ensure you edit the snippet showing
gateway.enabled: true for cozystack.cozystack-platform and update
gateway.attachedNamespaces accordingly).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 25ca4fc4-0ed7-4315-bd1a-4ae23063592b

📥 Commits

Reviewing files that changed from the base of the PR and between 6888f84 and b4d413c.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/gateway-api.md
  • content/en/docs/next/operations/configuration/platform-package.md

Comment thread content/en/docs/next/networking/gateway-api.md
lexfrei added 3 commits May 1, 2026 13:32
Two findings from CodeRabbit on the docs/gateway-api-cilium branch:

1. The platform Package example in `gateway-api.md` was missing the
   `default` namespace from `gateway.attachedNamespaces`. The actual
   cozystack platform values (packages/core/platform/values.yaml)
   include it because the Kubernetes API TLSRoute lives next to the
   `kubernetes` Service in the `default` namespace. Add the entry
   and a one-line explanation so a copy-paste of the example matches
   the working default.

2. `platform-package.md` triggered markdownlint MD052 on the
   `[KEP-5707][kep-5707]` reference-style link inside the
   `publishing.exposure` table cell. markdownlint-cli2 has a known
   parsing limitation around reference-style links inside table
   cells. Switch to an inline URL and drop the now-orphaned
   `[kep-5707]:` reference definition.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
…x tenants

cozystack/cozystack#2470 lands `tenant.spec.gateway` as auto-on for
tenants whose apex is derived from the parent (i.e. `tenant.spec.host`
is empty), opt-in for custom-apex tenants, and an explicit opt-out
escape hatch via `gateway: false`. Reflect that in the Per-tenant
Gateway section: replace the "Set spec.gateway: true on any tenant"
framing with the actual three-rule resolution (auto-on, opt-in,
opt-out) plus example manifests for each.
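
The three-rule resolution can be read as a small decision function. The sketch below is illustrative only: it assumes `tenant.spec.gateway` is tri-state (unset, `true`, `false`), models unset as `None`, and ignores the platform-level `gateway.enabled` toggle. The name `gateway_enabled` is hypothetical and not taken from the implementation.

```python
from typing import Optional


def gateway_enabled(host: str, gateway: Optional[bool]) -> bool:
    """Resolve whether a tenant gets its own Gateway (hypothetical helper).

    host    -- tenant.spec.host; empty means the apex is derived from the parent
    gateway -- tenant.spec.gateway; None models "field not set" (assumption)
    """
    if gateway is False:
        # Opt-out: the explicit escape hatch always wins.
        return False
    if gateway is True:
        # Opt-in: required for custom-apex tenants.
        return True
    # Field unset: auto-on only when the apex is derived from the parent.
    return host == ""


print(gateway_enabled("", None))                   # derived apex, unset
print(gateway_enabled("custom.example.org", None)) # custom apex, unset
```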

Update the Gateway section in platform-package.md's parameter table
to mirror the new resolution semantics.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ay to a specific external IP

Add a section under "Enabling Gateway API" describing the gatewayIP
field. When non-empty, the tenant chart renders a dedicated
CiliumLoadBalancerIPPool with a single-IP block scoped to the tenant
namespace, propagates the value through TenantGateway.spec.loadBalancerIP,
and the controller writes lbipam.cilium.io/ips=<addr> on the rendered
Gateway via spec.infrastructure.annotations (GEP-1762).

Note the IP must not overlap with publishing.externalIPs (Cilium
flags overlapping pools as conflicting), and the multi-tenant
shared-IP case is deferred until Cilium ListenerSet (cilium#42756).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Member

@kvaps kvaps left a comment


Reviewed alongside cozystack/cozystack#2470 — see parent review for the full context. Three asks specific to this docs PR; they all stem from the same restructure of the security framing that the PR review requests in the implementation repo.

1. Restructure the ## Security section in content/en/docs/next/networking/gateway-api.md

The current "guarded at seven independent layers" framing overstates the role of layers that aren't protecting against tenant users — under Cozystack's API surface model, tenants only write apps.cozystack.io/* resources through cozystack-api and don't hold RBAC on gateway.networking.k8s.io/*, core Namespaces, or cozystack.io/Package. So most of the seven layers are defense-in-depth (against chart bugs, controller bugs, supply-chain compromise, admin mistakes), not first-line tenant defenses.

Asks:

  • Replace the section intro with a split-by-purpose framing: tenant-user-input gates (Layer 4 + the cozystack-api admission-chain fix), defense-in-depth (Layers 1, 2, 5, 6, 7), and admin-against-themselves (Layer 3).
  • Update the mermaid diagram (lines 266–292) so the attacker arrow lands on Layer 4 / cozystack-api admission as the user-input boundary, with the other layers branching off as covering chart / controller / supply-chain failure modes rather than all converging from the same ATK node.
  • In the Layer 7 description (lines 328–334) replace a tenant user with HTTPRoute RBAC could otherwise exploit — that RBAC isn't granted in Cozystack by design. Reframe Layer 7 as defense-in-depth against an app-chart bug or supply-chain compromise emitting HTTPRoutes outside the apex.

2. Document tenant-user API surface explicitly somewhere in the page

A short paragraph (probably in ## Overview or just before ## Security) stating that tenants interact with the platform exclusively through apps.cozystack.io/* resources (Tenant, Bucket, Kubernetes, etc.) and that the security model is built around that constraint. This makes the rest of the Security section read correctly — without that anchor, readers can mistakenly assume tenants write Gateways or HTTPRoutes directly.

3. Rewrite the migration / pinning sections after the implementation PR drops tenant.spec.gatewayIP

The parent PR review asks for tenant.spec.gatewayIP and the CiliumLoadBalancerIPPool branch to be removed (they don't fit Cozystack's MetalLB-default LB stack and the node-public-IP semantics of publishing.externalIPs). When that lands:

  • ### 3. Pinning a tenant Gateway to a specific external IP (lines 177–197) needs to be removed or replaced with a shorter note that per-tenant IP pinning, when needed, is a cluster-admin-side metallb.universe.tf/loadBalancerIPs annotation, not a tenant API field.
  • The ### Gateway Service <pending> LoadBalancer IP troubleshooting entry (line 509) should be updated to point at MetalLB pool configuration as the resolution, since that's where IPs come from in default Cozystack.

Hold this third point until the implementation PR settles — the docs change is mechanical once gatewayIP is gone.


Happy to re-review once these are in.

…curity framing

Mirrors the cozystack/cozystack#2470 revisions:

- Security section reframed as three groups (tenant-user-input gates,
  defense-in-depth, admin-against-themselves) with the mermaid diagram
  redrawn so the attacker arrow lands on Layer 4 / cozystack-api
  admission as the user-input boundary; defense-in-depth and
  admin-against-themselves layers branch off as separate sources.
- Layer 7 wording reframed as defense-in-depth against chart bugs /
  supply-chain compromise; the prior 'tenant user with HTTPRoute RBAC'
  framing is wrong — tenants in Cozystack do not hold
  gateway.networking.k8s.io/* RBAC by design.
- New Tenant API surface section before the security model anchors the
  three-group framing.
- Pinning section rewritten under MetalLB: per-tenant IPAddressPool in
  cozy-metallb (label cozystack.io/per-tenant-gateway=true), controller
  writes metallb.universe.tf/address-pool annotation, advertisement is
  admin-side. Includes canonicalisation rules for the cross-Tenant
  uniqueness check (bare-vs-CIDR, IPv6 alternates, whitespace,
  unparseable input, too-broad CIDR rejected at admission) and the
  TOCTOU caveat under strict concurrency.
- Troubleshooting <pending> entry rewritten under MetalLB diagnostics:
  IPAddressPool / Advertisement / cross-Tenant query / speaker logs.
- New 'LB allocator prerequisites' subsection in the migration block
  with a worked L2Advertisement example for the per-tenant-gateway
  label selector.
- Traffic path diagram and architecture text updated to reference
  MetalLB pool flow instead of Cilium LB IPAM.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei
Contributor Author

lexfrei commented May 7, 2026

Updated in lockstep with the implementation PR cozystack/cozystack#2470 — see my reply on the parent review for the full reasoning behind the three-group security framing and the design declines on inheritance and on dropping tenant.spec.gatewayIP.

Three docs-side asks from the previous review are addressed:

1. ## Security section restructured. It opens with the three-group framing (tenant-user-input gates / defense-in-depth / admin-against-themselves) and then lists the seven layer descriptions under that framing for completeness. The Mermaid diagram is redrawn so the attacker arrow lands on Layer 4 + cozystack-api admission as the user-input boundary; the defense-in-depth and admin-against-themselves layers branch off as separate sources rather than all converging from a single ATK node. The Layer 7 description is reworded: the "tenant user with HTTPRoute RBAC" framing is dropped, because tenants in Cozystack don't hold gateway.networking.k8s.io/* RBAC by design (a new "Tenant API surface" subsection in ## Overview anchors that constraint).

2. Pinning section rewritten under MetalLB. tenant.spec.gatewayIP now translates to a per-tenant IPAddressPool in cozy-metallb labeled cozystack.io/per-tenant-gateway=true; controller writes metallb.universe.tf/address-pool on the Gateway's spec.infrastructure.annotations. The section explains the canonicalisation rules for the cross-Tenant uniqueness check (bare-vs-CIDR, IPv6 alternates, whitespace, unparseable input, too-broad CIDR rejected at admission) and includes the TOCTOU caveat under strict concurrency.

3. Troubleshooting <pending> rewritten to point at MetalLB diagnostics: kubectl get ipaddresspool / l2advertisement / bgpadvertisement, plus a one-liner for finding every Tenant currently using a gatewayIP.

Plus a fourth piece: new "LB allocator prerequisites" subsection in the migration block with a worked L2Advertisement example using the cozystack.io/per-tenant-gateway=true label selector — operators can copy-paste it as the minimum config the chart expects.
