diff --git a/docs/compute/development/rfcs/configmap-secret-mounts.md b/docs/compute/development/rfcs/configmap-secret-mounts.md
new file mode 100644
index 00000000..8f1fe652
--- /dev/null
+++ b/docs/compute/development/rfcs/configmap-secret-mounts.md
@@ -0,0 +1,300 @@
+---
+status: proposed
+---
+
+# Mounting ConfigMaps and Secrets into Compute Instances (Unikraft Provider)
+
+> Drafted 2026-05-30, revised 2026-05-31. **This is the foundational referenced-data delivery design for compute, and it ships before [Image Pull Credentials](./image-pull-credentials.md)**: it introduces the resolver, companion delivery, and the scheduling gate; pull secrets become a later consumer of the same path.
+
+## Table of Contents
+
+- [Summary](#summary)
+- [What this enables for users](#what-this-enables-for-users)
+- [End-to-end flow](#end-to-end-flow)
+- [The gap: cross-plane delivery](#the-gap-cross-plane-delivery)
+- [Design](#design)
+  - [The referenced-data resolver](#the-referenced-data-resolver)
+  - [Consumption on the provider](#consumption-on-the-provider)
+  - [Scheduling gate](#scheduling-gate)
+  - [Rotation and restart](#rotation-and-restart)
+- [Platform direction](#platform-direction)
+- [Security](#security)
+- [Alternatives](#alternatives)
+- [Failure modes](#failure-modes)
+- [What gets built](#what-gets-built)
+- [Decisions](#decisions)
+- [Open questions](#open-questions)
+
+---
+
+## Summary
+
+A compute `Workload` can already *describe* config and secret mounts: a volume
+sourced from a ConfigMap or Secret, a container attachment with a mount path, and
+environment variables that reference a key. The runtimes can already *consume* them:
+the Unikraft runtime runs instances from Pod specs through its kubelet integration,
+which honors ConfigMap/Secret references as both environment variables and volume
+mounts, and the GCP provider mounts them as files too. So the API is real and the
+runtimes support it.
+
+The one thing missing is in the middle: **the referenced data never reaches the cell
+where the instance runs.** It lives in the user's project; the instance runs on an
+edge cell; federation propagates only the `WorkloadDeployment`. The instance comes up
+referencing data that isn't there.
+
+This RFC closes that gap. It keeps the user contract unchanged (create a
+ConfigMap/Secret, reference it by name), resolves the reference in the trusted
+management plane, and delivers the data to the edge as a derived companion object;
+secret bytes **never enter the Workload or Instance spec**. Both environment
+variables and file mounts work, because once the data is present the runtime's
+existing Pod-spec consumption handles the rest.
+
+## What this enables for users
+
+Today users can only set literal environment variables, so configuration and
+credentials get baked into images or pasted in as plaintext. After this:
+
+- A user creates a `ConfigMap` and `Secret` in their project and references them
+  from the Workload; the platform delivers that data to every instance in every POP
+  cell, without the user ever knowing federation exists.
+- **Both forms work**: keys surfaced as environment variables (the twelve-factor
+  case) and ConfigMaps/Secrets mounted as files at a path (config files,
+  certificates).
+- **Secrets stay secret**: values never appear in the Workload or Instance the user
+  sees; they travel only as Secret objects.
+
+## End-to-end flow
+
+The decided design: management-plane resolution → companion object → federation →
+cell → provider. The `WorkloadReconciler`, `ReferencedDataController`, and
+`Federator` run in the management plane; the edge cell and the compute provider run
+on the POP cell.
+
+```mermaid
+sequenceDiagram
+    actor User
+    participant P as Project plane
+    participant WC as WorkloadReconciler
+    participant RDC as ReferencedDataController
+    participant F as Federator
+    participant K as Karmada hub
+    participant C as Edge cell
+    participant PR as Compute provider
+    participant U as kraftlet / UKC
+
+    User->>P: 1. Create ConfigMap + Secret
+    User->>P: 2. Create Workload referencing them
+    Note over P: Admission check:<br/>author may read the referenced objects
+    WC->>P: 3. Create a WorkloadDeployment per placement
+    RDC->>P: 4. Read referenced ConfigMap/Secret (scoped, trusted)
+    RDC->>K: 5. Materialize a companion copy in the project's federation namespace
+    RDC->>P: 6. Record the expected companion set on the WorkloadDeployment
+    F->>K: 7. Replicate the WorkloadDeployment + routing policy
+    K->>C: 8. Propagate the deployment + companions to each matching cell
+    C->>C: 9. Create the Instance, held by a referenced-data gate
+    C->>C: 10. Companions present? clear the gate, mark data ready
+    PR->>C: 11. Translate the Instance into a Pod spec referencing the companions
+    Note over PR,U: Kubelet integration mounts the volumes and<br/>injects the env vars natively from the present data
+    PR->>U: 12. Launch the instance with config/secret applied
+    U-->>User: 13. Instance running with config/secret applied
+```
+
+## The gap: cross-plane delivery
+
+The referenced ConfigMap/Secret lives in the user's project namespace; the instance
+runs on an edge cell, possibly thousands of miles away. The federation channel
+carries only the `WorkloadDeployment`, so the data has no path to the cell. There is
+also no gate guarding this: the instance isn't even held back and simply launches
+with the reference unresolved.
+
+Consumption is *not* a gap: the Unikraft runtime's kubelet integration already
+resolves ConfigMap/Secret references in a Pod spec (env vars and volume mounts
+alike), provided the referenced objects are present where it resolves them. So the
+whole problem is getting the data to the cell, and faithfully carrying the
+references through into the Pod spec the runtime consumes.
+
+## Design
+
+### The referenced-data resolver
+
+A new management-plane controller (the `ReferencedDataController` in the flow
+diagram) is the heart of delivery. For each WorkloadDeployment it:
+
+1. **Collects** every ConfigMap/Secret the template references: environment
+   references and volume sources today, image pull secrets later.
+2. **Reads** them with a scoped, trusted project-plane identity. The management plane
+   already has legitimate project access, so broad project-secret read never leaves
+   it.
+3. **Materializes** one labeled companion per referenced object in the project's
+   federation namespace.
+4. **Records** the expected companion set on the WorkloadDeployment, so the cell
+   knows exactly what to wait for rather than guessing.
+5. **Routes** companions to cells by extending the existing federation routing policy
+   to carry the labeled companions alongside the deployment.
+
+One companion exists per referenced object and is replicated to each placed cell, so
+there is a single object to create, update, and delete.
+When several deployments reference the
+same object, the companion is shared and reference-counted, removed only when the last
+reference goes away. In single-cluster mode the same resolver runs and the companion
+is simply a local copy.
+
+### Consumption on the provider
+
+Once the companions are present on the cell, the provider translates the Instance
+into the Pod spec the runtime consumes, carrying the volume sources, volume mounts,
+and environment references through faithfully and pointing them at the delivered
+companions, with the referenced data present in the namespace the runtime resolves
+from. The kubelet integration then mounts the volumes and injects the environment
+variables natively; there is no provider-side inlining of secret values. (The
+provider does not do this faithful translation today: it drops volumes and copies
+only literal env values, so this is the provider-side work this RFC covers. The GCP
+provider performs the equivalent translation already, which is why the same Workload
+runs on either substrate.)
+
+### Scheduling gate
+
+An instance that references any ConfigMap/Secret is held by a **referenced-data
+scheduling gate**, alongside the existing network and quota gates. The cell clears it
+once exactly the expected companion set is present, and surfaces a
+`ReferencedDataReady` status with clear reasons (resolving, awaiting propagation,
+source not found, source unauthorized, source too large, or ready), backed by events
+and metrics so a held instance is diagnosable, not a silent hang. The compute
+provider must respect scheduling gates so an instance is never launched with its data
+missing; this RFC adds that behavior.
+
+### Rotation and restart
+
+Decided: **no automatic roll; an explicit restart instead.** When a source changes,
+the resolver re-reads it and refreshes the companion, so the latest values are staged
+at the edge for the next instance launch.
+Running instances are not rolled
+automatically: a fleet-wide restart on every edit is surprising, and a running
+instance's environment isn't mutated in place regardless.
+
+Compute already performs ordered, in-place rolling updates when a Workload's template
+changes. The restart reuses that: a conventional restart annotation on the template
+rolls the instances, which pick up the refreshed values, with no new machinery. An
+opt-in automatic roll on content change is a possible future addition, not part of
+this RFC.
+
+## Platform direction
+
+The delivery half of this design (follow references, read them in the trusted
+plane, materialize derived companions, route them to the cells where the resource is
+placed, and signal readiness) is **not specific to compute**. It's a recurring
+platform need: image pull credentials want the same thing next, and the network
+operator already propagates derived Secrets/ConfigMaps to cells by label today. The
+building blocks are already platform-level: the shared namespace-mapping and
+downstream-delivery library, the label-based propagation pattern, and the
+established policy-driven capabilities (quota, activity, insights) that a delivery
+policy would sit naturally beside.
+
+We deliberately **do not** build that generic capability now. With a single consumer
+in hand the abstraction's seams aren't yet known, and a cross-cutting platform
+capability would slow the first ship and widen the security review. Instead, this RFC
+builds toward it on purpose:
+
+- **Build the resolver in compute now, behind a narrow, capability-shaped
+  interface.** In: a subject, its set of referenced objects, and its placement
+  targets; out: companions delivered plus a readiness signal. It reuses the existing
+  platform delivery library rather than inventing its own placement and cleanup.
+- **Keep delivery cleanly separable from consumption.** The scheduling gate and the
+  translation into the runtime's Pod spec stay in compute and depend only on the
+  readiness signal, so the delivery component carries no compute-specific knowledge.
+- **Promote on the second consumer.** When a second user of this pattern appears
+  (image pull credentials, or another service), lift the delivery component into the
+  platform as a capability, most likely an admin-authored delivery policy that
+  declares, per resource kind, which references to follow and where to deliver them,
+  fitting the existing capability-policy pattern. Two real consumers is the point at
+  which the abstraction can be shaped correctly.
+
+This keeps compute shippable and autonomous today while making the design a
+deliberate step toward a shared capability, not a one-off to untangle later. A
+governance benefit falls out: when the policy lands, *what may be propagated, and
+where* becomes an inspectable, access-controlled object rather than logic buried in a
+controller.
+
+## Security
+
+- **Bytes never in user-visible specs.** The Workload and the Instance the user sees
+  carry references only. Values exist as Secret objects in the project's federation
+  namespace and on the cell where the runtime mounts them, never in anything
+  projected back to the user.
+- **Companion Secrets stay Secrets** end to end; ConfigMap companions carry only
+  non-secret config. The runtime mounts the companions directly, so the provider
+  never has to inline secret values itself.
+- **Authorization.** Admission verifies the submitting user can read each referenced
+  object, the same check already used for referenced Networks. A user cannot pull in
+  an object they couldn't read themselves; the resolver's system identity is never
+  the authority.
+- **Trust boundary at the edge.** Resolving in the management plane is deliberate, so
+  the shared, lower-trust edge never holds a credential that can read project
+  ConfigMaps/Secrets. Companions are isolated per project namespace on each cell.
+- **At rest.** Companions live in storage on the project plane, the hub, and each
+  cell; this presumes encryption at rest on every plane.
+
+## Alternatives
+
+- **Let the provider read the originals from the edge (no companions).** The leanest
+  option: it removes the resolver, companions, routing changes, and the data gate,
+  and it is how the GCP provider already works. **Rejected for secret bytes:** it
+  requires the shared edge to hold a credential that reads project ConfigMaps and
+  Secrets, exactly the trust boundary this design keeps in the management plane. (A
+  config-only hybrid was considered and rejected to avoid maintaining two delivery
+  paths.)
+- **Inline resolved values into the Instance.** Rejected: leaks secret bytes into
+  storage everywhere and into what the user sees.
+- **Propagate the user's original objects directly.** Rejected: couples cell
+  contents to arbitrary project objects and loses the scoping boundary.
+- **A separate controller for pull secrets.** Rejected: same machinery; pull secrets
+  become a thin consumer of this resolver instead.
+
+## Failure modes
+
+- **Source missing, unauthorized, or too large** → gate held, status names the
+  offending object; optional sources are skipped.
+- **Companion not yet on the cell** → gate held (awaiting propagation); a normal
+  transient state during placement.
+- **Source changed, instances not rolled** → stale by design until restarted;
+  last-synced state is surfaced so it's observable.
+- **Single-cluster mode** → the local-copy path must be exercised so the absence of
+  federation never silently disables delivery.
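The first two failure modes above, together with the `ReferencedDataReady` reasons named in the scheduling-gate section, can be read as one small decision function on the cell. The following is a minimal Python sketch of that reading, not the proposed implementation: the `Reason` identifiers, the `GateInput` shape, and the rule that an optional source with an error is dropped from the required set are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Reason(Enum):
    # Hypothetical identifiers for the reasons listed in the RFC.
    RESOLVING = "Resolving"
    AWAITING_PROPAGATION = "AwaitingPropagation"
    SOURCE_NOT_FOUND = "SourceNotFound"
    SOURCE_UNAUTHORIZED = "SourceUnauthorized"
    SOURCE_TOO_LARGE = "SourceTooLarge"
    READY = "Ready"


@dataclass
class GateInput:
    # Expected companion set recorded on the deployment; None means the
    # resolver has not recorded it yet.
    expected: Optional[Set[str]]
    # Companions actually present on the cell.
    present: Set[str]
    # Source-level errors reported by the resolver, keyed by object name.
    source_errors: Dict[str, Reason] = field(default_factory=dict)
    # Sources the template marks as optional (assumed field).
    optional: Set[str] = field(default_factory=set)


def evaluate_gate(inp: GateInput) -> Tuple[bool, Reason, Optional[str]]:
    """Return (held, reason, offending_object) for one instance."""
    if inp.expected is None:
        return True, Reason.RESOLVING, None
    skipped: Set[str] = set()
    for name in sorted(inp.source_errors):
        if name in inp.optional:
            skipped.add(name)  # optional sources are skipped, not blocking
            continue
        return True, inp.source_errors[name], name
    missing = sorted(inp.expected - skipped - inp.present)
    if missing:
        return True, Reason.AWAITING_PROPAGATION, missing[0]
    return False, Reason.READY, None
```

The status names the first offending object in sorted order so that repeated evaluations report a stable value; a real controller might report all of them.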
+
+## What gets built
+
+- A **referenced-data resolver** in the management plane: collect, read, materialize,
+  reference-count, and clean up companions.
+- A **scoped project-plane read identity** for the resolver (built here; reused later
+  by image pull credentials).
+- **Federation routing** extended to carry companions to the same cells as the
+  deployment.
+- A **referenced-data scheduling gate**, cell-side clearing, and the
+  `ReferencedDataReady` status with reasons, events, and metrics.
+- **API additions**: a bulk "import all keys" env form, and completing volume
+  validation (secret volumes, key→path selection, file mode).
+- **Provider changes**: respect scheduling gates, and faithfully translate the
+  Instance's volumes, mounts, and env references into the Pod spec the runtime
+  consumes.
+- A **restart** path (a conventional template annotation) so a rotated source can be
+  picked up on demand.
+
+## Decisions
+
+- **Delivery:** management-plane companions (not edge-read).
+- **Rotation:** no auto-roll; explicit restart.
+- **Gate contract:** an explicit expected-companion set recorded on the deployment,
+  not guessed.
+- **One resolver, not two:** pull secrets are a later consumer.
+- **Platform direction:** build delivery behind a capability-shaped seam in compute
+  now; promote it to a platform-owned, policy-driven capability when a second
+  consumer appears, not before.
+- **Sequencing:** ships before image pull credentials; owns the scoped read identity
+  and provider gate-honoring.
+
+## Open questions
+
+1. **Scoped-read granularity:** can the resolver's project read be scoped to specific
+   object types or labels, or is it broad config/secret read?
+2. **Companion size limits** and behavior when exceeded.
+3. **Bulk env import in v1**, or per-key references only for the first release?
+4. **VM runtime** consumption: out of scope for Unikraft (sandbox-only); confirm
+   deferral.
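To make open question 3 concrete, the sketch below contrasts the two env forms in plain Python. It only illustrates the semantics; in the design the runtime's kubelet integration performs the injection, not the provider. The function names are hypothetical, and the `prefix` parameter is an assumption borrowed from Kubernetes `envFrom`, not something this RFC specifies.

```python
from typing import Dict, List, Tuple


def env_per_key(data: Dict[str, str], refs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Per-key form: each (env_name, source_key) pair selects one key explicitly."""
    return {env_name: data[source_key] for env_name, source_key in refs}


def env_import_all(data: Dict[str, str], prefix: str = "") -> Dict[str, str]:
    """Bulk form: every key in the source becomes a variable, optionally prefixed."""
    return {prefix + key: value for key, value in data.items()}


# Example source data, standing in for a delivered ConfigMap companion.
source = {"LOG_LEVEL": "debug", "REGION": "eu-west"}
per_key = env_per_key(source, [("APP_LOG_LEVEL", "LOG_LEVEL")])
bulk = env_import_all(source, prefix="APP_")
```

With the per-key form the instance's environment is fixed by the Workload template. With the bulk form it also depends on the source's current key set, so a key added to the source later appears in the environment at the next restart.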