feat(ec2): k8s NetworkPolicy enforcement for security groups (#1745 phase 4) #1757

Open
vieiralucas wants to merge 4 commits into worktree-ec2-netiso-batch3-sg-nftables from worktree-ec2-netiso-batch4-k8s-netpol

@vieiralucas vieiralucas commented Jun 17, 2026


Summary

Phase 4 of EC2 real network isolation (#1745). Stacked on #1756 (phase 3); base retargets as the stack merges.

The Docker backend filters with host nftables (phase 3). k8s Pods share a flat L3 network with no bridge to hook, so isolation there is expressed as NetworkPolicy objects, enforced by the cluster CNI.

  • runtime::netpolicy: pure, unit-tested translation of the shared per-instance SG model into one NetworkPolicy per instance (podSelector on the instance's fakecloud-ec2 label; ingress/egress as ipBlock peers + TCP/UDP ports; referenced security groups arrive as member /32s; all-protocols/ICMP ride as "all ports").
  • CniDriver: pluggable CNI abstraction (Calico + Cilium known-enforcing, Unknown otherwise), detected from kube-system component names. Calico first, extendable. When the CNI doesn't enforce NetworkPolicy (e.g. kind's kindnet), policies are still created and a one-time startup warning is logged; this is a graceful degrade that never blocks Pod creation.
  • fakecloud-k8s client: apply_network_policy (delete-then-create), prune_network_policies (reap policies for terminated instances), kube_system_pod_names for CNI detection.
  • The phase-3 reconcile path now dispatches on backend: nftables for Docker, NetworkPolicies for k8s. The shared SG-flatten (InstanceRules) feeds both.
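
As a rough illustration of the translation described above, the sketch below maps one flattened SG rule onto the shape of a NetworkPolicy rule: an `ipBlock` CIDR plus `ports[]` entries with `protocol`, `port`, and `endPort`. The types `Proto`, `SgRule`, `PolicyPort`, and `PolicyRule` and the function `translate` are hypothetical names for this example, not the contents of `runtime::netpolicy`; the sketch assumes ICMP and all-protocol rules map to an empty port list, which the NetworkPolicy API treats as "all ports".

```rust
/// Protocol of a flattened security-group rule (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Proto {
    Tcp,
    Udp,
    Icmp,
    All,
}

/// One flattened SG rule: protocol, inclusive port range, and peer CIDR.
/// Referenced security groups are assumed to arrive already expanded to /32s.
#[derive(Debug, Clone, PartialEq)]
struct SgRule {
    proto: Proto,
    from_port: u16,
    to_port: u16,
    cidr: String,
}

/// Mirrors a NetworkPolicy `ports[]` entry (`protocol`, `port`, `endPort`).
#[derive(Debug, Clone, PartialEq)]
struct PolicyPort {
    protocol: &'static str,
    port: u16,
    end_port: Option<u16>,
}

/// Mirrors one ingress/egress rule with a single `ipBlock` peer.
/// An empty `ports` list means "all ports".
#[derive(Debug, Clone, PartialEq)]
struct PolicyRule {
    ip_block_cidr: String,
    ports: Vec<PolicyPort>,
}

/// Translate one SG rule into one policy rule (illustrative only).
fn translate(rule: &SgRule) -> PolicyRule {
    let protocol = match rule.proto {
        Proto::Tcp => Some("TCP"),
        Proto::Udp => Some("UDP"),
        // NetworkPolicy cannot express ICMP, so it rides as "all ports".
        Proto::Icmp | Proto::All => None,
    };
    let ports = match protocol {
        Some(protocol) => vec![PolicyPort {
            protocol,
            port: rule.from_port,
            // `endPort` is only set for a real range.
            end_port: if rule.to_port > rule.from_port {
                Some(rule.to_port)
            } else {
                None
            },
        }],
        None => Vec::new(),
    };
    PolicyRule {
        ip_block_cidr: rule.cidr.clone(),
        ports,
    }
}
```

Keeping the function pure (no cluster access, plain data in and out) is what makes this kind of translation straightforward to unit-test.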

Test plan

  • 5 unit tests: policy translation (selector, ipBlock peers, port ranges, member /32, anywhere/all-proto) + CNI detection/enforcement.
  • kind integration test security_groups_become_network_policies (in the existing k8s_integration feature suite, run by the "K8s backend (kind)" job): a reconcile creates the NetworkPolicy with the right selector + ingress rule, and a reconcile with no instances prunes it.
  • Also fixes the feature-gated k8s integration test's run_instance call to the phase-2 4-arg signature (was only compiled in the kind job).
  • cargo test -p fakecloud-ec2 -p fakecloud-k8s, clippy, fmt clean.

Real enforcement needs a NetworkPolicy-enforcing CNI. kindnet (used in CI) does not enforce NetworkPolicy, so the kind test asserts only policy creation and pruning, while translation correctness is covered by the unit tests.
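
The pruning step that the kind test exercises can be pictured as a set difference between existing policies and live instances. The sketch below is an assumption-laden illustration: the `fakecloud-ec2-<instance-id>` policy naming scheme and the function `policies_to_prune` are hypothetical, and the real `prune_network_policies` may select policies differently (for example by label).

```rust
use std::collections::HashSet;

/// Hypothetical naming scheme: one policy per instance, named after its ID.
const POLICY_PREFIX: &str = "fakecloud-ec2-";

/// Return the names of policies whose instance is no longer live.
/// Policies that do not carry the prefix are never touched.
fn policies_to_prune(existing: &[String], live_instance_ids: &HashSet<String>) -> Vec<String> {
    existing
        .iter()
        .filter(|name| {
            name.strip_prefix(POLICY_PREFIX)
                .map_or(false, |id| !live_instance_ids.contains(id))
        })
        .cloned()
        .collect()
}
```

With no live instances, every prefixed policy is returned for deletion, which matches the "reconcile with no instances prunes it" case in the test plan.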


Summary by cubic

Add Kubernetes NetworkPolicy support to enforce EC2 security groups in the k8s backend, creating one policy per instance with graceful CNI detection and fallback. Docker continues to use nftables; the control plane now dispatches by backend.

  • New Features
    • Added runtime::netpolicy: pure translation from InstanceRules to one NetworkPolicy per instance (selector on the fakecloud-ec2 label; ipBlock peers; TCP/UDP ports; referenced SGs as /32s; ICMP/all-proto as “all ports”).
    • Introduced CniDriver with detection via kube-system components (Calico/Cilium enforce; unknown logs a startup warning). Policies are always created and never block Pod creation.
    • Extended fakecloud-k8s client with apply_network_policy, prune_network_policies, and kube_system_pod_names.
    • Updated reconcile path: Docker -> nftables; k8s -> NetworkPolicies. Control plane uses network_isolation_enforced() and prunes policies for terminated instances.
    • Tests: unit tests for translation and CNI detection; kind integration test asserts policy creation and pruning; fixed k8s test run_instance call signature.

Written for commit 118e3b5. Summary will update on new commits.


Backing containers all shared the default bridge, so instances in different
VPCs could reach each other and there was no L3 segmentation. Attach each
instance's container to a per-subnet daemon network instead:

- RunInstances computes an `InstanceNetwork { subnet_id, internal }` from the
  resolved subnet (internal = the subnet has no `0.0.0.0/0 -> igw` route) and
  passes it to the runtime.
- The Docker backend ensures `fakecloud-subnet-<id>` exists (idempotent;
  `--internal` for private subnets), labels it `fakecloud-subnet=<id>` plus the
  shared `fakecloud-instance` ownership label so the startup reaper prunes it,
  and attaches the container with `--network`.
- Same-subnet instances share a bridge and can talk; different VPCs/subnets get
  different bridges and cannot route to each other. Network creation is
  best-effort: on failure the instance still boots on the default bridge (no
  regression vs metadata-only).
- k8s pods keep their flat network (isolation there is a NetworkPolicy concern,
  phase 4). Subnet placement is captured in the runtime record so persisted
  instances recover onto the same network after a restart, and so phase-5
  introspection can report the backing network.
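
A minimal sketch of the subnet-to-network mapping described in the list above. Only `InstanceNetwork { subnet_id, internal }`, the `fakecloud-subnet-<id>` name, the `fakecloud-subnet=<id>` label, and `--internal` come from the text; the `Route` type, the function names, and the assumption that `<id>` is the full subnet ID are illustrative. The ownership label is left out because its value is not stated.

```rust
/// Per-instance network placement, as named in the commit message.
#[derive(Debug, Clone, PartialEq)]
struct InstanceNetwork {
    subnet_id: String,
    internal: bool,
}

/// Hypothetical, simplified route-table entry.
struct Route {
    destination_cidr: String,
    target: String,
}

/// A subnet is internal when it has no `0.0.0.0/0 -> igw` route.
fn instance_network(subnet_id: &str, routes: &[Route]) -> InstanceNetwork {
    let has_igw_default = routes
        .iter()
        .any(|r| r.destination_cidr == "0.0.0.0/0" && r.target.starts_with("igw-"));
    InstanceNetwork {
        subnet_id: subnet_id.to_string(),
        internal: !has_igw_default,
    }
}

/// Name of the backing daemon network (assumes `<id>` is the subnet ID).
fn network_name(net: &InstanceNetwork) -> String {
    format!("fakecloud-subnet-{}", net.subnet_id)
}

/// Arguments for a `docker network create` call (illustrative; the
/// ownership label from the text is omitted here).
fn network_create_args(net: &InstanceNetwork) -> Vec<String> {
    let mut args = vec![
        "network".to_string(),
        "create".to_string(),
        "--label".to_string(),
        format!("fakecloud-subnet={}", net.subnet_id),
    ];
    if net.internal {
        args.push("--internal".to_string());
    }
    args.push(network_name(net));
    args
}
```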

Tests: e2e (Docker-gated, hard-fails in CI) proving same-subnet reachability,
cross-VPC isolation (ping passes/fails accordingly), and that private subnets
back onto `--internal` networks while public/default subnets do not.

The per-subnet network argument added in phase 2 changed run_instance's
signature; the feature-gated k8s integration test (compiled only in the kind
CI job) still called the 3-arg form and would have failed to compile there.

Phase 2 isolates subnets at L3 but does nothing within a subnet -- SG/NACL
rules still block no traffic. Add a network-driver abstraction that translates
the SG/NACL model into an nftables ruleset and applies it on the host, scoped
to fakecloud's per-subnet bridges.

- runtime::firewall: pure, exhaustively unit-tested renderer turning a
  per-subnet model (instances + their flattened SG ingress/egress + subnet
  NACL denies) into an `inet fakecloud_ec2` nft table -- stateful
  (established,related accept), per-instance allow-then-default-deny, NACL
  denies first. Protocols/ports/CIDRs/icmp/referenced-groups all handled.
- FirewallEnforcer: the driver. nftables when capable, else a degraded no-op.
  Enforcement is opt-in via FAKECLOUD_EC2_SG_ENFORCEMENT and capability-gated
  by an `nft list ruleset` probe; when requested but unbacked (CI, Docker
  Desktop, rootless podman) it warns once and degrades to metadata-only --
  phase-2 isolation still holds, no regression. Apply is an atomic
  `nft -f -` swap of fakecloud's own table (never touches docker's rules).
- service::firewall_model: builds the model across every account partition
  (the host nft table is global) -- referenced security groups expand to
  member /32s so the default SG's allow-from-self works; only running,
  subnet-placed instances are enforced.
- Re-applied on RunInstances (once up), Start/Stop/Terminate, Authorize/Revoke
  ingress/egress, and network-ACL entry/association edits. All reconciles are
  background + skipped entirely when enforcement is disabled (the default).
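
To make the rule ordering above concrete, here is a heavily simplified rendering sketch. Only the table name `inet fakecloud_ec2` and the ingredients (NACL denies, `established,related` accept, per-instance allow then default deny) come from the text. The types, the function `render_ruleset`, the single `forward` chain, and the exact placement of NACL denies ahead of the stateful accept are assumptions for illustration, not the output of `runtime::firewall`.

```rust
/// Transport protocol of an allow rule (hypothetical model).
#[derive(Debug, Clone, Copy)]
enum L4Proto {
    Tcp,
    Udp,
}

/// One flattened ingress allow: protocol, inclusive port range, source CIDR.
struct Allow {
    proto: L4Proto,
    from_port: u16,
    to_port: u16,
    src_cidr: String,
}

/// One enforced instance: its address and its ingress allows.
struct Instance {
    ip: String,
    ingress: Vec<Allow>,
}

/// Render an nft table: NACL denies, stateful accept, then per-instance
/// allow rules followed by a default drop for that instance.
fn render_ruleset(nacl_deny_cidrs: &[&str], instances: &[Instance]) -> String {
    let mut out = String::new();
    out.push_str("table inet fakecloud_ec2 {\n");
    out.push_str("  chain forward {\n");
    out.push_str("    type filter hook forward priority 0; policy accept;\n");
    for cidr in nacl_deny_cidrs {
        out.push_str(&format!("    ip saddr {} drop\n", cidr));
    }
    out.push_str("    ct state established,related accept\n");
    for inst in instances {
        for allow in &inst.ingress {
            let proto = match allow.proto {
                L4Proto::Tcp => "tcp",
                L4Proto::Udp => "udp",
            };
            // A single port renders as `22`, a range as `8000-8100`.
            let ports = if allow.from_port == allow.to_port {
                allow.from_port.to_string()
            } else {
                format!("{}-{}", allow.from_port, allow.to_port)
            };
            out.push_str(&format!(
                "    ip saddr {} ip daddr {} {} dport {} accept\n",
                allow.src_cidr, inst.ip, proto, ports
            ));
        }
        // Anything not explicitly allowed for this instance is dropped.
        out.push_str(&format!("    ip daddr {} drop\n", inst.ip));
    }
    out.push_str("  }\n}\n");
    out
}
```

Because the renderer only builds a string, its output can be asserted on directly in unit tests without needing `nft` or NET_ADMIN on the test host.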

k8s keeps a disabled enforcer (isolation there is NetworkPolicy, phase 4).

Tests: 11 new unit tests for ruleset rendering + model building; a
Docker-gated e2e proving the degrade path (enforcement requested, no NET_ADMIN
-> instances still boot and same-subnet reachability is unchanged).
…hase 4)

The Docker backend filters traffic with host nftables (phase 3). k8s Pods
share a flat L3 network with no bridge to hook, so isolation there is
expressed as NetworkPolicy objects and enforced by the cluster CNI -- if it
enforces NetworkPolicy at all.

- runtime::netpolicy: pure, unit-tested translation of the shared per-instance
  SG model into one NetworkPolicy per instance (podSelector on the instance's
  `fakecloud-ec2` label; ingress/egress as ipBlock peers + TCP/UDP ports;
  referenced groups arrive as member /32s; all-protocols/ICMP ride as "all").
- CniDriver: pluggable CNI abstraction (Calico + Cilium known-enforcing,
  Unknown otherwise), detected from kube-system component names. Calico first,
  extendable. When the CNI doesn't enforce NetworkPolicy (e.g. kindnet) the
  policies are still created and a one-time startup warning is logged --
  graceful degrade, never blocks Pod creation.
- fakecloud-k8s client: apply_network_policy (delete-then-create),
  prune_network_policies (reap policies for gone instances), kube_system pod
  listing for CNI detection.
- The phase-3 reconcile path now dispatches on backend: nftables for Docker,
  NetworkPolicies for k8s. The shared SG-flatten (InstanceRules) feeds both.
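
The CNI detection and one-time warning described above might look roughly like the following. This is a sketch under assumptions: `CniKind`, `detect_cni`, and `warn_if_unenforced` are hypothetical names rather than the PR's `CniDriver` API, and matching on `calico-` / `cilium` pod-name prefixes is a guess at what "detected from kube-system component names" means in practice.

```rust
use std::sync::Once;

/// CNIs the sketch knows about; anything else is `Unknown`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CniKind {
    Calico,
    Cilium,
    Unknown,
}

impl CniKind {
    /// Only the known-enforcing CNIs are assumed to enforce NetworkPolicy.
    fn enforces_network_policy(self) -> bool {
        matches!(self, CniKind::Calico | CniKind::Cilium)
    }
}

/// Guess the CNI from kube-system pod names (Calico is checked first).
fn detect_cni(pod_names: &[&str]) -> CniKind {
    if pod_names.iter().any(|n| n.starts_with("calico-")) {
        CniKind::Calico
    } else if pod_names.iter().any(|n| n.starts_with("cilium")) {
        CniKind::Cilium
    } else {
        CniKind::Unknown
    }
}

static WARN_ONCE: Once = Once::new();

/// Log a warning at most once per process when policies will not be
/// enforced. Returns true only on the call that actually logged.
fn warn_if_unenforced(kind: CniKind) -> bool {
    let mut warned = false;
    if !kind.enforces_network_policy() {
        WARN_ONCE.call_once(|| {
            eprintln!("CNI does not enforce NetworkPolicy; policies are created but not enforced");
            warned = true;
        });
    }
    warned
}
```

Note that the warning path never returns an error, which mirrors the stated design goal that a non-enforcing CNI degrades gracefully instead of blocking Pod creation.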

Tests: 5 unit tests for policy translation + CNI detection; a kind
integration test (security_groups_become_network_policies) asserting policies
are created with the right selector/rules and pruned when the instance is gone.
Also fixes the feature-gated k8s integration test's run_instance call to the
phase-2 4-arg signature.