feat: Kubernetes deployment support (Dockerfile, manifests, docs) #13
chaosreload wants to merge 9 commits into zerobootdev:main from
Conversation
P0 fixes:
- serve: change default bind from 127.0.0.1 to 0.0.0.0 to fix K8s health probes and Service routing; add --bind flag for explicit control
- entrypoint.sh: pass $ZEROBOOT_BIND (default 0.0.0.0) to serve command

P1 fixes:
- deployment.yaml: replace devices.kubevirt.io/kvm (requires kubevirt) with privileged: true + hostPath /dev/kvm (works on plain EKS)
- deployment.yaml: increase livenessProbe initialDelaySeconds from 60 to 120; template creation takes ~19s, 60s was too tight on slow EBS attach
- deployment.yaml: add /dev/kvm hostPath volume and mount

EKS self-managed node group (new file):
- deploy/eks/eks-self-managed-kvm.sh: end-to-end script to create a self-managed ASG + Launch Template with CpuOptions.NestedVirtualization=enabled; EKS managed node groups silently drop CpuOptions — self-managed bypasses this
- deploy/eks/eks-with-kvm-nodegroup.yaml: add warning about CpuOptions being dropped by managed node groups (documented as a gap vs AWS official docs)

Docs:
- docs/KUBERNETES.md: add EKS managed vs self-managed section with root cause analysis and the recommended self-managed approach
- docs/KUBERNETES.md: add server bind address configuration note
- docs/KUBERNETES.md: add ZEROBOOT_BIND env var reference

Validated on: EKS 1.31 / ap-southeast-1 / c8i.xlarge (nested virt)
Ref: chaosreload/zeroboot PR zerobootdev#13
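The P1 changes above can be sketched as a manifest fragment. This is a hedged reconstruction, not the exact manifest from the PR: the container name, image, and port 8080 are assumptions.

```yaml
# Sketch of the privileged + hostPath /dev/kvm approach described above.
# Image name and port are assumptions, not taken from this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeroboot
  namespace: zeroboot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zeroboot
  template:
    metadata:
      labels:
        app: zeroboot
    spec:
      containers:
        - name: zeroboot
          image: zeroboot:latest        # assumed image name
          securityContext:
            privileged: true            # plain-EKS alternative to the kubevirt device plugin
          volumeMounts:
            - name: dev-kvm
              mountPath: /dev/kvm
          livenessProbe:
            httpGet:
              path: /v1/health
              port: 8080                # assumed port
            initialDelaySeconds: 120    # template creation ~19s; slow EBS attach needs headroom
      volumes:
        - name: dev-kvm
          hostPath:
            path: /dev/kvm
            type: CharDevice
```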
- Dockerfile: multi-stage build (Rust compiler + Ubuntu runtime); Firecracker bundled; vmlinux/rootfs mounted via PVC
- docker/entrypoint.sh: handles template creation on first boot, skips if snapshot already exists on PVC
- deploy/k8s/: namespace, PVC (gp3 20Gi), Deployment with KVM device plugin resource, podAntiAffinity, health probes, HPA, Service
- docs/KUBERNETES.md: EC2 instance family requirements, KVM device plugin setup, PVC storage guidance, autoscaling with custom metric (zeroboot_concurrent_forks), Karpenter NodePool example, ServiceMonitor config, configuration reference

Closes zerobootdev#9
c8i/m8i/r8i support nested virtualization on regular (non-metal) sizes via --cpu-options NestedVirtualization=enabled. Other families (c6i, m6i etc.) require .metal sizes for KVM access. Update instance table to make this distinction explicit.
Three files covering two scenarios:
- eks-with-kvm-nodegroup.yaml: cluster + KVM node group in one shot
- eks-cluster-only.yaml: cluster only (no node groups)
- eks-add-kvm-nodegroup.yaml: add KVM node group to an existing cluster

All configs use c8i.xlarge with cpuOptions.nestedVirtualization=enabled, the AmazonLinux2023 AMI, and the aws-ebs-csi-driver addon for PVC support.
src/main.rs and entrypoint.sh bind address changes belong in a dedicated fix PR. This PR should only contain K8s deployment configs and docs. The deployment.yaml already handles the 127.0.0.1 limitation via the hostPath /dev/kvm approach; users can add a socat sidecar if needed until the fix PR is merged.
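The interim socat sidecar mentioned above could be sketched as a container-spec fragment. Ports and the image are assumptions for illustration; the idea is that containers in a Pod share a network namespace, so the sidecar can listen on the Pod IP and forward to the loopback-bound server.

```yaml
# Hypothetical socat sidecar: exposes the 127.0.0.1-bound server to the Pod
# network until the bind-address fix PR lands. Ports are assumptions.
- name: socat-proxy
  image: alpine/socat:latest
  args:
    - "TCP-LISTEN:8081,fork,reuseaddr"   # listen on all interfaces of the Pod
    - "TCP:127.0.0.1:8080"               # forward to the loopback-bound zeroboot server
  ports:
    - containerPort: 8081
```

The Service would then target port 8081 instead of the server's own port.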
Force-pushed from a2898a1 to c41f21c
docs/KUBERNETES.md (outdated)
```
        │
 ┌──────┼──────┐
 │      │      │
Pod-1  Pod-2  Pod-3   ← one Pod per KVM-capable Node (podAntiAffinity)
```
Curious — if we need one Pod per KVM-capable Node, would a DaemonSet make sense here?
Good call — switched to DaemonSet as the default in the latest commit.
Added deploy/k8s/daemonset.yaml:
- `nodeSelector: kvm-capable: "true"` — only runs on KVM-capable nodes
- `hostPath: /var/lib/zeroboot` with `DirectoryOrCreate` — local storage is the right choice here since Firecracker snapshots are bound to the host CPU microarchitecture and KVM state; cross-node migration is not meaningful anyway
- `updateStrategy: RollingUpdate` with `maxUnavailable: 1`
Kept deployment.yaml as an annotated advanced alternative for HPA or manual replica control scenarios.
Updated KUBERNETES.md: DaemonSet + node-level autoscaling (Karpenter) is now the primary scaling model; Deployment + HPA documented as the fallback.
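The DaemonSet described above might look roughly like this. It is a sketch from the bullet points, not the file in the PR; the image name is an assumption.

```yaml
# Sketch of deploy/k8s/daemonset.yaml per the description above.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: zeroboot
  namespace: zeroboot
spec:
  selector:
    matchLabels:
      app: zeroboot
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: zeroboot
    spec:
      nodeSelector:
        kvm-capable: "true"          # only schedule onto KVM-capable nodes
      containers:
        - name: zeroboot
          image: zeroboot:latest     # assumed image name
          volumeMounts:
            - name: data
              mountPath: /var/lib/zeroboot
      volumes:
        - name: data
          hostPath:
            path: /var/lib/zeroboot
            type: DirectoryOrCreate  # snapshots are node-local by design
```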
- Add deploy/k8s/daemonset.yaml: one Pod per KVM-capable node via nodeSelector,
hostPath /var/lib/zeroboot (DirectoryOrCreate), RollingUpdate strategy
- Update deploy/k8s/deployment.yaml: add header comment clarifying it is an
advanced alternative for HPA/manual replica control scenarios
- Update docs/KUBERNETES.md:
- Architecture diagram updated to reflect DaemonSet semantics
- New "Deploying" section with Option A (DaemonSet) and Option B (Deployment)
- New "Persistent storage" section explaining hostPath rationale and trade-offs
(snapshot CPU-affinity, no cross-node migration, node drain behavior)
- Autoscaling section: DaemonSet node-level scaling as primary path,
HPA documented under Deployment advanced usage
- Limitations updated to reflect DaemonSet/hostPath model
Addresses reviewer question: DaemonSet is the correct primitive when the
scaling unit is a KVM-capable node, not a Pod replica.
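For the Deployment fallback mentioned above, a custom-metric HPA on `zeroboot_concurrent_forks` might be sketched as follows. The target value is an assumption, and the metric plumbing (e.g. a Prometheus Adapter exposing the metric) is not shown.

```yaml
# Sketch of a custom-metric HPA for the Deployment variant; the
# averageValue threshold is an assumption, not taken from this PR.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zeroboot
  namespace: zeroboot
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zeroboot
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: zeroboot_concurrent_forks
        target:
          type: AverageValue
          averageValue: "8"   # assumed threshold per Pod
```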
…i→c8i/m8i/r8i
- deployment.yaml: replicas: 2 → 1; RWO PVC cannot be mounted by multiple pods
simultaneously. Running 2 replicas would cause one pod to be stuck Pending.
- docs/KUBERNETES.md:
- Karpenter NodePool: remove c6i/c7i from instance-family list; these
families require .metal sizes for KVM — non-metal c6i/c7i nodes provisioned
by Karpenter would never pass the kvm-capable check
- Remove duplicate horizontal rule (lines 77-78)
- Fix two stale "set in deployment.yaml" references to include daemonset.yaml
Found by Claude Code review.
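The corrected Karpenter NodePool constraint could look like the fragment below — a sketch in which everything except the instance families is an assumption.

```yaml
# Sketch of a Karpenter NodePool restricted to families with non-metal
# nested virtualization; c6i/c7i omitted since KVM needs .metal there.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: zeroboot-kvm
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c8i", "m8i", "r8i"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default   # assumed EC2NodeClass name
```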
Hi @adammiribyan 👋 This PR adds Kubernetes deployment support for Zeroboot (closes #9). It has been validated on EKS 1.31 / ap-southeast-1 / c8i.xlarge. Following feedback from @minhoryang, I've updated the default deployment to DaemonSet (instead of Deployment), which better fits Zeroboot's node-bound nature. Would you be able to take a look and review when you have a chance? Happy to address any feedback. 🙏
Summary
Add first-class Kubernetes deployment support for Zeroboot, addressing all items in #9.
Changes
Dockerfile

Multi-stage build: Rust 1.86 compiler stage + Ubuntu 22.04 runtime. Firecracker binary bundled at build time.
- vmlinux and rootfs images are not baked into the image — they are mounted via host storage, keeping the image lean and allowing runtime upgrades without rebuilding.

docker/entrypoint.sh

- Verifies /dev/kvm access on startup (fast-fail with clear error message)

deploy/k8s/

Reference manifests ready to apply:
- namespace.yaml — dedicated zeroboot namespace
- daemonset.yaml — default deployment (see DaemonSet section below)
- deployment.yaml — alternative for HPA / manual replica control (single-replica, RWO PVC)
- pvc.yaml — 20 Gi gp3 PVC for deployment.yaml
- service.yaml — ClusterIP service (works with both DaemonSet and Deployment)
- hpa.yaml — custom-metric HPA on zeroboot_concurrent_forks (for deployment.yaml)

deploy/eks/

EKS-specific deployment configs:
- eks-cluster-only.yaml — create cluster without node groups (Step 1)
- eks-self-managed-kvm.sh — end-to-end script for a self-managed ASG with CpuOptions.NestedVirtualization=enabled; see EKS note below
- eks-with-kvm-nodegroup.yaml — eksctl managed node group config
- eks-add-kvm-nodegroup.yaml — add KVM node group to an existing cluster

docs/KUBERNETES.md

Comprehensive deployment guide covering:
- EKS managed vs self-managed node groups, including root-cause analysis of CpuOptions being silently dropped

DaemonSet as Default (updated after reviewer feedback)
Following @minhoryang's suggestion, the default deployment is now DaemonSet instead of Deployment.
Why DaemonSet:
- Zeroboot is bound to /dev/kvm and the host CPU microarchitecture — the natural unit of scale is a Node, not a Pod replica
- No podAntiAffinity + replicas: N workarounds

Storage: DaemonSet uses hostPath: /var/lib/zeroboot (DirectoryOrCreate). Firecracker snapshots are bound to the host CPU microarchitecture and KVM state — local storage is the correct choice. Cross-node snapshot migration is not supported and not needed for short-lived sandbox workloads.

The Deployment option is retained for cases requiring HPA or manual replica control, documented as an advanced alternative.
TL;DR: EKS managed node groups silently drop CpuOptions.NestedVirtualization — your nodes start without /dev/kvm even if the eksctl YAML looks correct.

Root cause: When you provide a Launch Template to a managed node group, EKS generates a new internal LT and merges only a subset of fields. CpuOptions is not in that subset — despite not being listed in the official blocked-fields list. This is a documentation gap.

Solution: Use deploy/eks/eks-self-managed-kvm.sh, which creates an ASG + Launch Template directly via the AWS CLI, bypassing EKS's internal LT generation entirely.

Architecture note
Kubernetes manages the lifecycle of the zeroboot server process — it does not schedule individual sandboxes. Each /v1/exec request is handled entirely within the Pod via a KVM fork (~0.8 ms). K8s's role is capacity management: health checks, rolling updates, and node-level scaling.

Testing
Validated on EKS 1.31 / ap-southeast-1 / c8i.xlarge (self-managed node group with nested virt):
- /var/lib/zeroboot created automatically on each node
- /dev/kvm present on nodes
- /v1/health → {"status":"ok","templates":{"python":{"ready":true,"numpy":true}}}
- CODE: print(1+1) → 2 (fork 2.2 ms, exec 118 ms)
- CODE: import numpy as np; print(np.array([1,2,3,4,5]).mean()) → 3.0 (exec 259 ms)
- cat /etc/os-release → Ubuntu 22.04 content (exec 28 ms)

Depends on: #14
Closes #9