feat(ec2): security-group + NACL nftables enforcement (#1745 phase 3) #1756

Open
vieiralucas wants to merge 3 commits into main from worktree-ec2-netiso-batch3-sg-nftables

@vieiralucas commented Jun 17, 2026

Member

Summary

Phase 3 of EC2 real network isolation (#1745). Stacked on #1755 (phase 2) -> #1754 (phase 1); bases retarget as each merges.

Phase 2 isolates subnets at L3 but does nothing within a subnet — SG/NACL rules block no traffic. This adds a network-driver abstraction that translates the SG/NACL model into an nftables ruleset and applies it on the host, scoped to fakecloud's per-subnet bridges.

  • runtime::firewall — pure, exhaustively unit-tested renderer: a per-subnet model (instances + flattened SG ingress/egress + subnet NACL denies) -> an inet fakecloud_ec2 nft table. Stateful (established,related accept), per-instance allow-then-default-deny, NACL denies first. Protocols/ports/CIDRs/icmp/referenced-groups handled.
  • FirewallEnforcer — the driver. nftables when capable, else a degraded no-op. Opt-in via FAKECLOUD_EC2_SG_ENFORCEMENT, capability-gated by an nft list ruleset probe. When requested but unbacked (CI, Docker Desktop, rootless podman) it warns once and degrades to metadata-only — phase-2 isolation still holds, no regression. Apply is an atomic nft -f - swap of fakecloud's own table (never touches docker's rules).
  • service::firewall_model — builds the model across every account partition (the host nft table is global); referenced security groups expand to member /32s so the default SG's allow-from-self works; only running, subnet-placed instances are enforced.
  • Re-applied on RunInstances (once up), Start/Stop/Terminate, Authorize/Revoke ingress/egress, and network-ACL entry/association edits — all in the background, all skipped when enforcement is disabled.
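
To make the renderer bullet concrete, here is a minimal sketch of turning a per-subnet model into an `inet fakecloud_ec2` table. It is not the code in this PR: the `SubnetModel`, `Instance`, and `AllowRule` types, the single `forward` chain, and the exact position of the conntrack rule relative to the NACL denies are all assumptions made for illustration. Egress and icmp are left out to keep it short.

```rust
use std::fmt::Write;

/// Hypothetical flattened SG ingress rule (not the PR's actual type).
pub struct AllowRule {
    pub proto: String,       // "tcp" or "udp"
    pub port: u16,
    pub source_cidr: String, // a CIDR, or a member /32 from a referenced group
}

/// Hypothetical enforced instance: its private IP plus flattened ingress.
pub struct Instance {
    pub ip: String,
    pub ingress: Vec<AllowRule>,
}

/// Hypothetical per-subnet model: NACL deny sources plus instances.
pub struct SubnetModel {
    pub nacl_deny_cidrs: Vec<String>,
    pub instances: Vec<Instance>,
}

/// Render the model as nft ruleset text. Pure: no I/O, easy to unit test.
pub fn render(model: &SubnetModel) -> String {
    let mut out = String::from("table inet fakecloud_ec2 {\n  chain forward {\n");
    out.push_str("    type filter hook forward priority 0; policy accept;\n");
    // NACL denies come first so no later allow can override them.
    for cidr in &model.nacl_deny_cidrs {
        let _ = writeln!(out, "    ip saddr {} drop", cidr);
    }
    // Stateful: replies to already-accepted flows pass.
    out.push_str("    ct state established,related accept\n");
    for inst in &model.instances {
        // Per-instance allows...
        for rule in &inst.ingress {
            let _ = writeln!(
                out,
                "    ip saddr {} ip daddr {} {} dport {} accept",
                rule.source_cidr, inst.ip, rule.proto, rule.port
            );
        }
        // ...then default-deny for anything else addressed to the instance.
        let _ = writeln!(out, "    ip daddr {} drop", inst.ip);
    }
    out.push_str("  }\n}\n");
    out
}
```

Because the function only builds a string, rule ordering can be asserted in unit tests without nftables or CAP_NET_ADMIN, which is the property the test plan relies on.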

k8s keeps a disabled enforcer (isolation there is NetworkPolicy, phase 4).
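
The opt-in plus capability gate described above can be sketched as follows. The names `Backend`, `enforcement_requested`, `probe_nft`, and `select_backend` are hypothetical, and treating only the value `1` as "requested" is an assumption; the PR text confirms just the variable name, the `nft list ruleset` probe, and the warn-once degrade.

```rust
use std::process::{Command, Stdio};
use std::sync::Once;

/// Which driver the enforcer ends up using (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Nftables,
    Disabled, // degraded no-op: metadata-only SG/NACL
}

/// Assumption: only "1" counts as opting in. Pass in the value of
/// FAKECLOUD_EC2_SG_ENFORCEMENT, e.g. from `std::env::var(..).ok().as_deref()`.
pub fn enforcement_requested(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Capability probe: `nft list ruleset` must run and exit successfully.
/// A missing binary or missing privileges both count as "not usable".
pub fn probe_nft() -> bool {
    Command::new("nft")
        .args(&["list", "ruleset"])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

static WARNED: Once = Once::new();

/// Pick the backend; warn a single time when enforcement was requested
/// but the host cannot back it.
pub fn select_backend(requested: bool, nft_usable: bool) -> Backend {
    match (requested, nft_usable) {
        (true, true) => Backend::Nftables,
        (true, false) => {
            WARNED.call_once(|| {
                eprintln!("SG enforcement requested but nft is unavailable; using metadata-only mode");
            });
            Backend::Disabled
        }
        (false, _) => Backend::Disabled,
    }
}
```

Keeping the decision in a pure function of two booleans means the gating logic can be tested without running `nft` at all; only `probe_nft` touches the host.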

Test plan

  • 11 new unit tests: nft ruleset rendering (allow/deny, protocols, ports, CIDR and /32, icmp, NACL ordering, opt-in/capability gating) + model building (CIDR + referenced-group expansion, pending exclusion).
  • crates/fakecloud-e2e/tests/ec2_sg_enforcement.rs (Docker-gated): the degrade path — enforcement requested, no NET_ADMIN -> instances still boot and same-subnet reachability is unchanged.
  • cargo test -p fakecloud-ec2 (38 pass), clippy + fmt clean.

Enforcement-on (real packet drops) needs nftables + CAP_NET_ADMIN, which CI lacks; that path is covered by the unit-tested ruleset.


Summary by cubic

Add host-level enforcement of EC2 security groups and NACLs via nftables, scoped to per-subnet bridges. This completes phase 3 of real network isolation (#1745), gated by FAKECLOUD_EC2_SG_ENFORCEMENT, and degrades cleanly when nft/CAP_NET_ADMIN are unavailable.

  • New Features

    • runtime::firewall: renderer from a per-subnet model to an inet fakecloud_ec2 nft table; stateful (established,related accept), per-instance allow-then-default-deny; NACL denies first.
    • FirewallEnforcer: picks nftables or no-op via capability probe; applies with atomic nft -f - swaps; warns once when requested but unavailable; never touches Docker rules.
    • service::firewall_model: builds the model across all accounts; expands referenced groups to member /32s; only running, subnet-placed instances are enforced.
    • Automatic reconcile after RunInstances start, Start/Stop/Terminate, SG authorize/revoke, and NACL entry/association changes; runs in the background; k8s backend keeps enforcement disabled.
  • Migration

    • Enable by setting FAKECLOUD_EC2_SG_ENFORCEMENT=1 on hosts with nft and CAP_NET_ADMIN.
    • If requested but unavailable (e.g., CI, Docker Desktop, rootless podman), it logs once and falls back to metadata-only; phase-2 isolation is unchanged.
    • No action needed if you leave enforcement disabled.
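
For readers unfamiliar with the "atomic `nft -f -` swap", the sketch below shows one common way to do it: feed a single script on stdin that declares the table, deletes it, and recreates it, so the whole replacement is one transaction and no other table is touched. The function names are hypothetical, and whether this PR uses the declare/delete/recreate layout or some other form is an assumption.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Wrap a rendered table so that loading it replaces any previous copy.
/// Declaring the table first makes the `delete` succeed on first run too.
pub fn atomic_script(table_body: &str) -> String {
    format!(
        "table inet fakecloud_ec2\ndelete table inet fakecloud_ec2\n{}\n",
        table_body.trim_end()
    )
}

/// Pipe the script to `nft -f -`. nft applies a file as one transaction,
/// so either the whole new table is installed or nothing changes.
pub fn apply(script: &str) -> io::Result<()> {
    let mut child = Command::new("nft")
        .args(&["-f", "-"])
        .stdin(Stdio::piped())
        .spawn()?;
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(script.as_bytes())?;
    } // dropping the handle closes stdin so nft sees end of input
    let status = child.wait()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("nft exited with {}", status),
        ))
    }
}
```

A caller would run `apply(&atomic_script(&rendered))` only when the nftables backend was selected; on a degraded host the call is skipped entirely.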

Written for commit 39117a4. Summary will update on new commits.

@vieiralucas force-pushed the worktree-ec2-netiso-batch3-sg-nftables branch from 8741e73 to 37592c0 on June 17, 2026 11:52
@vieiralucas force-pushed the worktree-ec2-netiso-batch2-subnet-networks branch from ee40f92 to 5b8c1af on June 17, 2026 14:56
Base automatically changed from worktree-ec2-netiso-batch2-subnet-networks to main June 17, 2026 18:38
Phase 2 isolates subnets at L3 but does nothing within a subnet -- SG/NACL
rules still block no traffic. Add a network-driver abstraction that translates
the SG/NACL model into an nftables ruleset and applies it on the host, scoped
to fakecloud's per-subnet bridges.

- runtime::firewall: pure, exhaustively unit-tested renderer turning a
  per-subnet model (instances + their flattened SG ingress/egress + subnet
  NACL denies) into an `inet fakecloud_ec2` nft table -- stateful
  (established,related accept), per-instance allow-then-default-deny, NACL
  denies first. Protocols/ports/CIDRs/icmp/referenced-groups all handled.
- FirewallEnforcer: the driver. nftables when capable, else a degraded no-op.
  Enforcement is opt-in via FAKECLOUD_EC2_SG_ENFORCEMENT and capability-gated
  by an `nft list ruleset` probe; when requested but unbacked (CI, Docker
  Desktop, rootless podman) it warns once and degrades to metadata-only --
  phase-2 isolation still holds, no regression. Apply is an atomic
  `nft -f -` swap of fakecloud's own table (never touches docker's rules).
- service::firewall_model: builds the model across every account partition
  (the host nft table is global) -- referenced security groups expand to
  member /32s so the default SG's allow-from-self works; only running,
  subnet-placed instances are enforced.
- Re-applied on RunInstances (once up), Start/Stop/Terminate, Authorize/Revoke
  ingress/egress, and network-ACL entry/association edits. All reconciles are
  background + skipped entirely when enforcement is disabled (the default).
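
As an illustration of the referenced-group expansion described above, the following sketch expands a rule source into concrete CIDRs, turning a group reference into the /32 of each member. All type and function names are hypothetical, and counting only running, subnet-placed instances as group members is an assumption; the commit message states only that such instances are the ones enforced.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Pending,
    Running,
    Stopped,
}

/// Hypothetical view of an instance as the model builder sees it.
#[derive(Debug, Clone)]
pub struct InstanceRecord {
    pub private_ip: String,
    pub subnet_id: Option<String>,
    pub state: State,
    pub group_ids: Vec<String>,
}

/// A rule source is either a literal CIDR or a reference to another group.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Source {
    Cidr(String),
    Group(String),
}

/// Only running instances that are placed in a subnet take part.
fn is_enforced(instance: &InstanceRecord) -> bool {
    instance.state == State::Running && instance.subnet_id.is_some()
}

/// Map each security group id to the /32s of its enforced members.
pub fn group_members(instances: &[InstanceRecord]) -> HashMap<String, Vec<String>> {
    let mut members: HashMap<String, Vec<String>> = HashMap::new();
    for instance in instances.iter().filter(|i| is_enforced(i)) {
        for group in &instance.group_ids {
            members
                .entry(group.clone())
                .or_insert_with(Vec::new)
                .push(format!("{}/32", instance.private_ip));
        }
    }
    members
}

/// Expand a source into the CIDRs the renderer should emit.
pub fn expand(source: &Source, members: &HashMap<String, Vec<String>>) -> Vec<String> {
    match source {
        Source::Cidr(cidr) => vec![cidr.clone()],
        // A group with no enforced members expands to nothing.
        Source::Group(id) => members.get(id).cloned().unwrap_or_default(),
    }
}
```

With this shape, a default security group that allows traffic from itself expands to the /32 of every member, which is what makes allow-from-self work once rules are rendered per destination address.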

k8s keeps a disabled enforcer (isolation there is NetworkPolicy, phase 4).

Tests: 11 new unit tests for ruleset rendering + model building; a
Docker-gated e2e proving the degrade path (enforcement requested, no NET_ADMIN
-> instances still boot and same-subnet reachability is unchanged).
@vieiralucas force-pushed the worktree-ec2-netiso-batch3-sg-nftables branch from 37592c0 to 5e4e1c2 on June 17, 2026 18:40