fix(ec2): honest enforcement status + subnet IP + nft sanitize + serialize reconcile#1764

Merged
vieiralucas merged 2 commits into main from worktree-ec2-netiso-fix5-honest-status
Jun 18, 2026

@vieiralucas vieiralucas commented Jun 18, 2026

Summary

Bug-hunt 2026-06-18 findings 1.5, 1.6, 1.7, 2.2, 4.3 (the audit's remaining tail).

  • 1.5 (MED) — nftables enforcement now requires a native-Linux host whose daemon shares this netns (resolve_enforcement_mode gains host_local). On Docker Desktop / podman-machine the bridges live in the daemon's VM, so host nft filters nothing — yet enforced falsely read true. Now degrades with an accurate warning.
  • 1.6 (LOW) — CNI detection scans calico-system / tigera-operator / cilium besides kube-system, so a Tigera-operator / dedicated-namespace Calico/Cilium isn't mis-reported as non-enforcing.
  • 1.7 (LOW) — instance metadata private IP derived from the subnet CIDR (was hard-coded 10.0.0.x outside the subnet); real container IPs still override.
  • 2.2 (LOW) — CIDR/protocol tokens are charset-sanitized before interpolation into the nft -f - script, closing a ruleset-injection surface.
  • 4.3 (LOW) — firewall reconciles serialized behind a per-runtime async mutex so a concurrent k8s apply+prune can't delete a just-applied NetworkPolicy.
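
The host-local gate from finding 1.5 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the name `resolve_enforcement_mode` and the `host_local` input come from the description above, while the `EnforcementMode` enum, the other parameters, and the warning strings are assumptions made for the example.

```rust
/// Hypothetical outcome type; the real type in fakecloud-ec2 may look different.
#[derive(Debug, PartialEq, Eq)]
pub enum EnforcementMode {
    /// Host nftables rules are applied and actually filter instance traffic.
    Nftables,
    /// No enforcement; carries a warning when the user opted in but we had to degrade.
    Disabled(Option<&'static str>),
}

impl EnforcementMode {
    /// What a status endpoint would report as `enforced`.
    pub fn enforced(&self) -> bool {
        matches!(self, EnforcementMode::Nftables)
    }
}

/// Assumed signature: all three inputs are plain booleans computed by the caller.
pub fn resolve_enforcement_mode(
    opted_in: bool,
    nft_capable: bool,
    host_local: bool,
) -> EnforcementMode {
    if !opted_in {
        // Not requested, so nothing to warn about.
        return EnforcementMode::Disabled(None);
    }
    if !host_local {
        // Docker Desktop / podman-machine: the bridges live inside the daemon's VM,
        // so rules loaded on this host would never see the traffic.
        return EnforcementMode::Disabled(Some(
            "container daemon does not share this host's network namespace",
        ));
    }
    if !nft_capable {
        return EnforcementMode::Disabled(Some("nft or CAP_NET_ADMIN unavailable"));
    }
    EnforcementMode::Nftables
}
```

The point of checking `host_local` before the capability probe is that the probe alone can pass on a host whose rules filter nothing, which is exactly how `enforced` ended up reading true.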

Test plan

  • cargo test -p fakecloud-ec2 -p fakecloud-k8s (incl. host-local gate + out-of-range port tests), control-plane e2e green, clippy + fmt clean.

Summary by cubic

Fix false-positive nftables enforcement on non-host-local setups, sanitize nft input, derive instance private IPs from the subnet, broaden CNI detection, and serialize policy reconciles to avoid races. Adds unit tests for sanitization, subnet-derived IPs, and the host-local enforcement gate.

  • Bug Fixes
    • Require a native Linux host whose daemon shares the netns for nftables; if opted-in via FAKECLOUD_EC2_SG_ENFORCEMENT but not host-local or nft/CAP_NET_ADMIN capable, degrade to disabled with a clear warning.
    • Sanitize CIDR [0-9a-fA-F.:/] and protocol tokens [a-z0-9-] before embedding into nft -f -; drop invalid matches only.
    • Derive metadata private IPs from the instance’s subnet CIDR instead of a fixed 10.0.0.x; real container IPs still override when available.
    • Detect CNI by scanning kube-system, calico-system, tigera-operator, and cilium; exposed as cni_component_names() in fakecloud-k8s.
    • Serialize firewall and NetworkPolicy reconciles with a per-runtime async mutex to prevent apply/prune races.
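
A charset check along the lines described in the sanitization bullet might look like the sketch below. The character sets `[0-9a-fA-F.:/]` and `[a-z0-9-]` are taken from the summary; the function names `sanitize_cidr`, `sanitize_protocol`, and `render_match`, and the exact rule text emitted, are hypothetical and only serve the illustration.

```rust
/// Accept a CIDR token only if every character is in [0-9a-fA-F.:/].
/// Hypothetical helper name; returns None for anything that must not reach the script.
pub fn sanitize_cidr(token: &str) -> Option<&str> {
    let ok = !token.is_empty()
        && token
            .chars()
            .all(|c| c.is_ascii_hexdigit() || matches!(c, '.' | ':' | '/'));
    ok.then_some(token)
}

/// Accept a protocol token only if every character is in [a-z0-9-].
pub fn sanitize_protocol(token: &str) -> Option<&str> {
    let ok = !token.is_empty()
        && token
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
    ok.then_some(token)
}

/// Build one rule line for an `nft -f -` script, or None if either token is rejected.
/// Dropping just this match leaves the rest of the ruleset untouched.
pub fn render_match(cidr: &str, protocol: &str) -> Option<String> {
    let cidr = sanitize_cidr(cidr)?;
    let protocol = sanitize_protocol(protocol)?;
    Some(format!("ip saddr {cidr} meta l4proto {protocol} accept"))
}
```

Because whitespace, semicolons, braces, and newlines are all outside both character sets, a token cannot terminate the current statement and start a new one. The check is purely lexical: it does not prove the token is a well-formed CIDR, only that it cannot carry extra nft syntax.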

Written for commit 0488455. Summary will update on new commits.

@vieiralucas vieiralucas force-pushed the worktree-ec2-netiso-fix5-honest-status branch 2 times, most recently from 8eaa7d5 to b147f1a Compare June 18, 2026 15:56
…e + serialize reconcile

Bug-hunt 2026-06-18 findings 1.5, 1.6, 1.7, 2.2, 4.3.

- 1.5: nftables enforcement now requires a native-Linux host whose daemon
  shares this network namespace (resolve_enforcement_mode gains host_local).
  On Docker Desktop / podman-machine the per-subnet bridges live in the
  daemon's VM, so host nft filters nothing -- yet the probe passed and
  `enforced` falsely read true. Now degrades with an accurate warning.
- 1.6: CNI detection scans calico-system / tigera-operator / cilium besides
  kube-system, so a Tigera-operator or dedicated-namespace Calico/Cilium isn't
  mis-reported as non-enforcing (cni_component_names).
- 1.7: an instance's metadata private IP is derived from its subnet CIDR (was
  a hard-coded 10.0.0.x outside the subnet); real container IPs still override.
- 2.2: CIDR and protocol tokens from SG params are sanitized before
  interpolation into the `nft -f -` script (charset-restricted), closing a
  ruleset-injection surface.
- 4.3: firewall reconciles are serialized behind a per-runtime async mutex so a
  concurrent k8s apply+prune can't delete a just-applied NetworkPolicy.

Tests: unit tests for the host-local gate and out-of-range/sanitize paths;
control-plane e2e green.
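
For finding 1.7, deriving a metadata private IP from the subnet CIDR could be done roughly like this. The sketch is IPv4-only and uses a hypothetical helper `private_ip_in_subnet` with a caller-supplied host offset; how the PR actually picks the offset is not stated here, so that part is an assumption.

```rust
use std::net::Ipv4Addr;

/// Return the address at `host_index` inside `cidr`, or None if the CIDR is
/// malformed or the index would land on the network or broadcast address.
/// Hypothetical helper: the name and the `host_index` parameter are illustrative.
pub fn private_ip_in_subnet(cidr: &str, host_index: u32) -> Option<Ipv4Addr> {
    let (addr, prefix) = cidr.split_once('/')?;
    let base: Ipv4Addr = addr.parse().ok()?;
    let prefix: u32 = prefix.parse().ok()?;
    if prefix > 32 {
        return None;
    }
    // A shift by 32 would overflow, so a /0 prefix is handled explicitly.
    let mask: u32 = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    let network = u32::from(base) & mask;
    let host_bits = !mask;
    // Offset 0 is the network address; all host bits set is the broadcast address.
    if host_index == 0 || host_index >= host_bits {
        return None;
    }
    Some(Ipv4Addr::from(network | host_index))
}
```

With this shape, an instance in `172.31.16.0/20` gets an address inside that range rather than a fixed `10.0.0.x`, and a real container IP can still replace the derived value when one is available.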
@vieiralucas vieiralucas force-pushed the worktree-ec2-netiso-fix5-honest-status branch from b147f1a to 0488455 Compare June 18, 2026 16:52
@vieiralucas vieiralucas merged commit 44a7e28 into main Jun 18, 2026
53 checks passed
@vieiralucas vieiralucas deleted the worktree-ec2-netiso-fix5-honest-status branch June 18, 2026 18:03