evpn: ADR-0059 slice 4 — M40 manual smoke against FRR EVPN-MH by lance0 · Pull Request #89 · lance0/rustbgpd

lance0 · 2026-05-13T01:17:05Z

Summary

First end-to-end validation of the slice 3b coordinator + Pass 1b diff path against a real upstream peer. M40 proves that rustbgpd consumes real FRR EVPN-MH routes (Type 1 EAD-per-EVI + Type 2 with shared ESI) and installs an FDB nexthop group on the receiving VTEP via `NDA_NH_ID`. After this lands, aliasing-ECMP graduates from "shipped, kernel-primitive smoke green" (PR #88) to "shipped, FRR-validated against 10.3.1" — the gating evidence for flipping the production-default knob.

Topology

`tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml`:

```
pe-rb (rustbgpd VTEP) ── pe-a (FRR, ES1)
└── pe-c (FRR, ES1)
```

rustbgpd peers with two FRR VTEPs that share ES1 (`es-id 1`, `es-sys-mac aa:bb:cc:dd:ee:ff` — same convention as M32). FRR side reuses the existing `start-frr-vtep-mh.sh` so the bond-as-ES shape (FRR EVPN-MH requires it for the local ES to register) is the same known-good config the M32 reflector test exercises.

Test driver

`scripts/test-m40-evpn-aliasing-ecmp-frr.sh`:

Phase 1 — Inject a test MAC on pe-a's `esdummy` bond. Expect:

FDB row on rustbgpd's `vxlan100` with `nhid N` (decimal; iproute2's `bridge fdb show` always prints decimal NHIDs).
`ip nexthop show` lists `id N group X/Y … fdb` whose two members resolve to pe-a's and pe-c's VTEP IPs.

Phase 2 — Stop pe-c's bgpd. EAD-per-EVI from pe-c withdraws; per the projection invariant (`empty alias_vtep_ips ⇔ alias_group_key.is_none()`) the entry collapses from FDB-NHG to single-dst pointing at pe-a. Expect the row to fall back to `dst pe-a` (no `nhid`) and the group + pe-c member to tear down.

Phase 3 — `bridge fdb del` removes the MAC on pe-a. Expect the row + remaining FDB nexthops for the test VTEPs to disappear.

What's deliberately not in this PR

Per-flow ECMP via real traffic. Kernel hashing + container topology make traffic-based ECMP assertions flaky; PR evpn-linux: ADR-0059 slice 3b — wire FDB-NHG onto the apply path #88's actor-level tests + M40's kernel-programming assertions are sufficient evidence that the data path is wired correctly. A traffic-validation PR is a v1.1 follow-up if operators ask.
Privileged-runner CI gate. Same as M36/M37/M38/M39 — manual / local only. ROADMAP already tracks the privileged-runner gate as a Gate 9 follow-up; M40 lands in the same slot once that exists.
pe-a ↔ pe-c peering. Both FRR nodes treat the ES as local (`evpn mh es-id` makes it local regardless of peering). DF election between them is not tested here; the test focuses on rustbgpd's kernel programming.

Test plan

`cargo fmt --all -- --check`
`cargo clippy --workspace --all-targets -- -D warnings`
`cargo test --workspace --no-fail-fast` (no Rust source changes; pre-existing test set passes — the slice 3b actor-level tests added in PR evpn-linux: ADR-0059 slice 3b — wire FDB-NHG onto the apply path #88 cover the coordinator the new M40 exercises against the kernel)
Bash syntax check on the driver (`bash -n`)
Manual containerlab smoke on the cloudbox (privileged runner):
```bash
docker build -t rustbgpd:dev .
containerlab deploy -t tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml
bash tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh
containerlab destroy -t tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml
```

References

ADR-0059 (merged docs: ADR-0059 — EVPN aliasing dataplane via FDB nexthop groups #83) — design
Slice 1 (evpn: ADR-0059 slice 1 — RemoteMacEntry alias_group_key + projection AF guard #84) — portable intent (`alias_group_key`)
Slice 2 (evpn-linux: ADR-0059 slice 2 — nexthop_raw raw-netlink FDB-NHG primitive #86) — `nexthop_raw` netlink primitive
Slice 3a (evpn-linux: ADR-0059 slice 3a — FDB-NHG state types + apply primitive + CVE guard #87) — state types + apply primitive + CVE-2025-39851 guard
Slice 3b (evpn-linux: ADR-0059 slice 3b — wire FDB-NHG onto the apply path #88) — diff Pass 1b + coordinator + adoption + actor tests
M32 — sibling FRR EVPN-MH smoke (reused FRR setup helper + ES convention)
iproute2 `bridge/fdb.c` (NHID print path), `ip/ipnexthop.c` (group print path) — wire format the assertions match

First end-to-end validation of the slice 3b coordinator + Pass 1b diff path against a real upstream peer. After this, aliasing-ECMP moves from "shipped, kernel-primitive smoke green" to "shipped, FRR-validated against 10.3.1" — gating evidence to flip the production-default knob. Topology (`tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml`): pe-rb (rustbgpd VTEP) ── pe-a (FRR, ES1) └── pe-c (FRR, ES1) rustbgpd peers with two FRR VTEPs that share ES1 (es-id 1, es-sys-mac aa:bb:cc:dd:ee:ff — same as M32). The FRR side reuses `start-frr-vtep-mh.sh` so the bond-as-ES shape (FRR requires it for EVPN-MH to register the local ES) is the same known-good config the M32 reflector test exercises. Driver (`scripts/test-m40-evpn-aliasing-ecmp-frr.sh`): Phase 1 — Inject a test MAC on pe-a's `esdummy` bond. Expect: - FDB row on rustbgpd's `vxlan100` with `nhid N` (decimal — iproute2's `bridge fdb show` always prints decimal NHIDs). - `ip nexthop show` lists `id N group X/Y … fdb` whose two members resolve to pe-a's and pe-c's VTEP IPs. Phase 2 — Stop pe-c's bgpd. EAD-per-EVI from pe-c withdraws; per the projection invariant (`empty alias_vtep_ips ⇔ alias_group_key.is_none()`) the entry collapses from FDB-NHG to single-dst pointing at pe-a. Expect the row to fall back to `dst pe-a` (no `nhid`) and the group + pe-c member to be torn down. Phase 3 — `bridge fdb del` removes the MAC on pe-a. Expect the row + remaining FDB nexthops for the test VTEPs to disappear. Manual / local only — containerlab needs privileges CI doesn't carry. M39's privileged-runner gate (ROADMAP follow-up) is what brings this into automated coverage; until then the test runs the same way M36 / M37 / M38 / M39 do. Files: clab topology, rustbgpd VTEP TOML, two FRR confs, the driver. The rustbgpd-side bridge + VXLAN topology is set up by the existing generic `start-rustbgpd-vtep.sh` (parameterized by loopback IP + VNI). ROADMAP flips slice 3b → shipped and slice 4 → in-flight; CHANGELOG `[Unreleased] ### Added` gets the slice 4 entry.

Copilot

Pull request overview

Adds the M40 manual containerlab interop smoke to validate ADR-0059 aliasing-ECMP (FDB nexthop groups via NDA_NH_ID) against real FRR EVPN-MH, plus documentation updates recording slice-4 status.

Changes:

Adds a new containerlab topology for rustbgpd ↔ 2× FRR EVPN-MH peers sharing an ES.
Adds a new M40 bash driver that injects/withdraws a MAC and asserts FDB-NHG programming via bridge fdb show + ip nexthop show.
Updates ROADMAP.md and CHANGELOG.md to reflect slice-4 progress.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh	New M40 driver script that validates FDB-NHG install, drain-to-single-dst, and teardown.
tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml	New containerlab topology wiring rustbgpd to two FRR EVPN-MH VTEPs.
tests/interop/configs/rustbgpd-m40-vtep.toml	rustbgpd config for M40 EVPN VTEP + two EVPN neighbors.
tests/interop/configs/frr-bgpd-m40-vtep-a.conf	FRR pe-a config: ES1 + EVPN BGP session to rustbgpd.
tests/interop/configs/frr-bgpd-m40-vtep-c.conf	FRR pe-c config: ES1 (alias-only) + EVPN BGP session to rustbgpd.
ROADMAP.md	Marks slice 4 (M40) as in-flight and updates status text for ADR-0059.
CHANGELOG.md	Adds an Unreleased entry describing the new M40 manual smoke and what it validates.

Comments suppressed due to low confidence (1)

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:88

The topology already starts rustbgpd via /usr/local/bin/start-rustbgpd-vtep.sh in the containerlab exec section, but the driver also calls start_rustbgpd here. That will try to launch a second rustbgpd process (and by default uses /usr/local/bin/start-rustbgpd.sh, which isn’t mounted in this topology), making the smoke unreliable. Pick a single owner for daemon startup: either remove this start_rustbgpd call, or stop auto-starting in the .clab.yml and rely on the driver.

resolve_grpc_addr
start_rustbgpd

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+# shellcheck source=test-lib.sh
+source "$SCRIPT_DIR/test-lib.sh"
+
+PE_RB="clab-${TOPO}-pe-rb"


+log "[phase 2] stop pe-c — withdrawal of EAD-per-EVI must drain the group"
+docker exec "$PE_C" sh -c "pkill -TERM bgpd || true; sleep 1; pkill -KILL bgpd || true" \
+    >/dev/null 2>&1
+ok "pe-c bgpd stopped"


+        - ip link set eth1 up
+        - ip addr add 10.0.1.1/24 dev eth2
+        - ip link set eth2 up
+        - /usr/local/bin/start-rustbgpd-vtep.sh 10.0.0.1 100


Three findings on the slice 4 PR (#89): 1. `test-lib.sh::preflight` runs at source time and expects the rustbgpd container at `clab-${TOPO}-rustbgpd`. M40's node is `pe-rb`, so the source-time check would fail and the driver would exit before any assertion ran. Override `RUSTBGPD=$PE_RB` + `export RUSTBGPD` before sourcing the lib — same pattern M39 uses for its `pe1` node. 2. Phase 2 used `pkill -TERM/-KILL bgpd` inside pe-c, but FRR's `watchfrr` would auto-restart bgpd a second later and re-originate EAD-per-EVI, racing the drain assertion. Switch to `docker stop --time 5 pe-c` so watchfrr + bgpd go down together — M1 relies on the auto-restart behavior; M40 needs the opposite (deterministic drain). 3. The clab `exec` block already runs `start-rustbgpd-vtep.sh` which launches rustbgpd in the background. The driver was ALSO calling `start_rustbgpd` from test-lib, which would spawn a second daemon competing for the same gRPC port. Match the M36 pattern: only call `resolve_grpc_addr` + poll `grpc_health` until ready; don't double-spawn.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:156

In the wait_for_rb predicate, $TEST_MAC is wrapped in single quotes inside the sh -c string, so it will not be expanded by the outer shell and (unless exported into the container) will be treated as the literal string $TEST_MAC inside the container. This will prevent the predicate from ever matching the MAC and cause the wait to time out. Prefer embedding the expanded value (or passing args without sh -c) so the grep sees the actual MAC.

wait_for_rb \
    "FDB row for $TEST_MAC has nhid" \
    "bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -E ' nhid [0-9]+'" \
    45

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:235

Same quoting issue as phase 1: $TEST_MAC and $PE_A_IP are inside single quotes in the sh -c command string, so they won’t expand and the wait_for_rb predicate can never succeed. Remove the single quotes (or otherwise ensure the host expands these values before passing into the container).

wait_for_rb \
    "FDB row falls back to single-dst (no nhid)" \
    "bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -vE ' nhid [0-9]+' | grep -E ' dst $PE_A_IP'" \
    45

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:259

Same quoting issue as earlier phases: the wait_for_rb predicate uses '$TEST_MAC', which won’t expand (and isn’t exported into the container), so the negated grep will not reliably assert the MAC is absent. Expand the MAC value before sending the command to sh -c (or avoid sh -c).

wait_for_rb \
    "FDB row for $TEST_MAC removed from rustbgpd's bridge FDB" \
    "! bridge fdb show dev $VXLAN | grep -qi '$TEST_MAC'" \
    45

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:118

wait_frr_established already emits ok/fail and updates the pass/fail counters. Wrapping it with && ok ... || fail ... double-counts results and can report misleading totals. Call wait_frr_established directly and rely on its return code if you need to abort.

log "[baseline] both FRR peers Established to rustbgpd"
wait_frr_established "$PE_A" "10.0.0.1" "pe-a EVPN" \
    && ok "pe-a Established" || fail "pe-a not Established"
wait_frr_established "$PE_C" "10.0.1.1" "pe-c EVPN" \
    && ok "pe-c Established" || fail "pe-c not Established"

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:186

The ip nexthop show parsing assumes member IDs are strictly decimal (group [0-9/]+), but elsewhere in the repo the netns FDB-NHG tests explicitly allow iproute2 to print member IDs in hex (e.g. group 0x30000001). To make this interop script resilient across iproute2 versions, accept both decimal and 0x... forms and strip the 0x prefix before matching member lines.

nh_dump=$(nexthop_show)
group_line=$(echo "$nh_dump" | grep -E "^id $nhid group [0-9/]+ .*fdb" || true)
if [ -n "$group_line" ]; then
    ok "group line present: $group_line"
else
    fail "no group line for nhid=$nhid in ip nexthop show"
    echo "--- nh_dump ---" >&2
    echo "$nh_dump" | head -30 >&2
fi

# Extract member IDs from the group line: `group 12/13 ...` → 12 and 13.
member_ids=$(echo "$group_line" \
    | grep -oE 'group [0-9/]+' \
    | awk '{print $2}' \
    | tr '/' ' ')

+        break
+    fi
+    if [ "$i" -eq 15 ]; then
+        fail "gRPC endpoint not reachable within 30s"


+#   Shared ESI Type 0 derived from es-sys-mac aa:bb:cc:dd:ee:ff +
+#   es-id 1 (matches the M32 convention). pe-a and pe-c don't
+#   peer directly with each other — we don't need their Type 4


Two findings on `bd5d69c`: 1. ESI type comment in `m40-evpn-aliasing-ecmp-frr.clab.yml` said "Type 0" but the test asserts on a leading `03:` (Type 3 MAC-based ESI per RFC 7432 §5; same as `start-frr-vtep-mh.sh` documents). Carried over wrong from M32's clab comment. Corrected. 2. Hard prereqs in the driver called `fail` but continued running, producing cascading false-negative assertions in subsequent phases. The four checkpoints whose failure invalidates every downstream phase (gRPC ready, Phase-1 FDB-row-with-nhid, Phase-2 single-dst-fallback, Phase-3 FDB-row-removed) now `print_summary` (which exits non-zero) on failure. The in-phase soft assertions still accumulate per the M-series convention so the operator sees all simultaneous breakage in a single run.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

+log "[phase 1] ip nexthop show has the matching group with two members"
+# Group line shape: `id <gid> group <mid>/<mid> ... fdb`. Members
+# print as `id <mid> via <ip> fdb`. We assert:
+#   - the group exists with our nhid
+#   - exactly two members via pe-a's IP and pe-c's IP
+nh_dump=$(nexthop_show)


+docker stop --time 5 "$PE_C" >/dev/null 2>&1 \
+    && ok "pe-c container stopped" \
+    || fail "could not stop pe-c container"


+fdb_row=$(fdb_show_test_mac)
+nhid=$(echo "$fdb_row" | grep -oE ' nhid [0-9]+' | awk '{print $2}' | head -1)
+if [ -z "$nhid" ]; then
+    fail "could not parse nhid from FDB row: $fdb_row"


Three findings on `d47a4d0`: 1. Member-count assertion was permissive: checked that both expected VTEP IPs appeared in the group but tolerated extra members. Now asserts exactly 2 distinct gateways covering pe-a + pe-c; an unexpected third member trips the check. 2. `docker stop pe-c` failure was soft-failed (continued into the drain assertions even though the prerequisite withdrawal never happened). Now `print_summary` on failure — Phase 2 is meaningless without it. 3. `nhid` parse failure left `\$nhid` empty, so the next `^id group ...` grep would match anything starting with `id `. Hard-exit if the parse fails so the cascade can't fire.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:202

Because this script sources test-lib.sh (which enables set -euo pipefail), this command substitution will cause the whole script to exit early if grep finds no match (e.g., if ip nexthop show output format differs or the group line is temporarily absent). Add || true / a non-failing parser (e.g., awk) so the script can fall through to the explicit if [ -n ... ] failure path and still print a summary.

# Extract member IDs from the group line: `group 12/13 ...` → 12 and 13.
member_ids=$(echo "$group_line" \
    | grep -oE 'group [0-9/]+' \
    | awk '{print $2}' \
    | tr '/' ' ')
if [ -n "$member_ids" ]; then
    ok "parsed group member IDs: $member_ids"
else
    fail "could not parse group member IDs from: $group_line"
fi

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:219

Same strict-mode issue here: if a member line doesn’t match, grep -oE 'via ...' (and the grep -v / grep -c pipelines below) can return exit code 1 inside command substitutions, and set -e will abort the script before it can report a structured failure. Make these parsing pipelines non-fatal (e.g., append || true, or use awk/wc -l for counting).

    mline=$(echo "$nh_dump" | grep -E "^id $mid via [0-9.]+ .*fdb" || true)
    mip=$(echo "$mline" | grep -oE 'via [0-9.]+' | awk '{print $2}')
    if [ -n "$mip" ]; then
        member_ips="$member_ips $mip"
    fi
done
# Sort + dedupe so we count distinct gateways (one IP shouldn't
# appear twice in a healthy group; the count check below rules
# that out).
member_ip_list=$(echo "$member_ips" | tr ' ' '\n' | sort -u | grep -v '^$')
member_ip_count=$(echo "$member_ip_list" | grep -c .)
if [ "$member_ip_count" -eq 2 ] \

+fdb_row=$(fdb_show_test_mac)
+nhid=$(echo "$fdb_row" | grep -oE ' nhid [0-9]+' | awk '{print $2}' | head -1)
+if [ -z "$nhid" ]; then
+    fail "could not parse nhid from FDB row: $fdb_row"
+    print_summary
+fi


`test-lib.sh` sets `set -euo pipefail`; the captured-grep pipelines in the driver could trip `inherit_errexit` (one `shopt` away) and abort the script before the explicit empty-checks could handle the no-match path. Add defensive `|| true` to every captured grep pipeline: - nhid parse out of the FDB row - group-line member-ID list parse - per-member `via <ip>` extraction - dedupe + count of distinct gateways Each one is paired with an explicit `[ -z ... ]` / count check that already covers the failure mode; the `|| true` keeps the no-match path flowing into that check instead of taking down the script.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:266

Same quoting issue as Phase 1: grep -i '$TEST_MAC' searches for the literal $TEST_MAC rather than the actual MAC, so this Phase 2 wait condition can never succeed. Use a pattern that expands to the MAC value (or export TEST_MAC into the container exec environment).

wait_for_rb \
    "FDB row falls back to single-dst (no nhid)" \
    "bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -vE ' nhid [0-9]+' | grep -E ' dst $PE_A_IP'" \
    45 \
    || { print_summary; }

tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:291

Same quoting issue as earlier phases: grep -qi '$TEST_MAC' matches the literal $TEST_MAC, so this wait condition will not detect the MAC being removed. Expand the variable in the host script (or export it) before passing the command to sh -c.

wait_for_rb \
    "FDB row for $TEST_MAC removed from rustbgpd's bridge FDB" \
    "! bridge fdb show dev $VXLAN | grep -qi '$TEST_MAC'" \
    45 \
    || { print_summary; }

+# bail out on failure rather than emit cascading false negatives.
+wait_for_rb \
+    "FDB row for $TEST_MAC has nhid" \
+    "bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -E ' nhid [0-9]+'" \


+wait_frr_established "$PE_A" "10.0.0.1" "pe-a EVPN" \
+    && ok "pe-a Established" || fail "pe-a not Established"
+wait_frr_established "$PE_C" "10.0.1.1" "pe-c EVPN" \
+    && ok "pe-c Established" || fail "pe-c not Established"


One legit finding fixed; one false positive skipped. Fixed: `wait_frr_established` already calls `ok` / `fail` and increments the pass/fail counters internally. The `&& ok ... || fail ...` wrapper M32 / older M-series scripts use double-counts the result. Dispatch the helper directly and use `|| print_summary` to abort on a failed prereq instead. (The existing sibling tests have the same bug; cleanup in a separate pass.) Skipped (false positive): Copilot flagged `grep -i '$TEST_MAC'` inside the `sh -c` predicate as not expanding `$TEST_MAC`. The nested single quotes are inside an outer double-quoted string, so bash expands `$TEST_MAC` on the host before the string ever reaches `sh -c`. Verified with `bash -c 'str="...'\''$VAR'\''..."; echo "$str"` — variable is interpolated, single quotes survive as literals to protect the MAC's colons from word-splitting in `sh -c`. The quoting is correct. This is round 5/5 — per the project memory rule capping Copilot loops at ~5 rounds before switching to deep human review, no further Copilot rounds will be requested. Branch ready for human review.

lance0 requested a review from Copilot May 13, 2026 01:17

Copilot started reviewing on behalf of lance0 May 13, 2026 01:17 View session