evpn: ADR-0059 slice 4 — M40 manual smoke against FRR EVPN-MH#89
evpn: ADR-0059 slice 4 — M40 manual smoke against FRR EVPN-MH#89lance0 wants to merge 6 commits into
Conversation
First end-to-end validation of the slice 3b coordinator + Pass 1b
diff path against a real upstream peer. After this, aliasing-ECMP
moves from "shipped, kernel-primitive smoke green" to "shipped,
FRR-validated against 10.3.1" — gating evidence to flip the
production-default knob.
Topology (`tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml`):
pe-rb (rustbgpd VTEP) ── pe-a (FRR, ES1)
└── pe-c (FRR, ES1)
rustbgpd peers with two FRR VTEPs that share ES1 (es-id 1,
es-sys-mac aa:bb:cc:dd:ee:ff — same as M32). The FRR side
reuses `start-frr-vtep-mh.sh` so the bond-as-ES shape (FRR
requires it for EVPN-MH to register the local ES) is the same
known-good config the M32 reflector test exercises.
Driver (`scripts/test-m40-evpn-aliasing-ecmp-frr.sh`):
Phase 1 — Inject a test MAC on pe-a's `esdummy` bond. Expect:
- FDB row on rustbgpd's `vxlan100` with `nhid N` (decimal —
iproute2's `bridge fdb show` always prints decimal NHIDs).
- `ip nexthop show` lists `id N group X/Y … fdb` whose two
members resolve to pe-a's and pe-c's VTEP IPs.
Phase 2 — Stop pe-c's bgpd. EAD-per-EVI from pe-c withdraws;
per the projection invariant (`empty alias_vtep_ips ⇔
alias_group_key.is_none()`) the entry collapses from
FDB-NHG to single-dst pointing at pe-a. Expect the row to
fall back to `dst pe-a` (no `nhid`) and the group + pe-c
member to be torn down.
Phase 3 — `bridge fdb del` removes the MAC on pe-a. Expect the
row + remaining FDB nexthops for the test VTEPs to disappear.
Manual / local only — containerlab needs privileges CI doesn't
carry. M39's privileged-runner gate (ROADMAP follow-up) is what
brings this into automated coverage; until then the test runs
the same way M36 / M37 / M38 / M39 do.
Files: clab topology, rustbgpd VTEP TOML, two FRR confs, the
driver. The rustbgpd-side bridge + VXLAN topology is set up by
the existing generic `start-rustbgpd-vtep.sh` (parameterized by
loopback IP + VNI). ROADMAP flips slice 3b → shipped and slice
4 → in-flight; CHANGELOG `[Unreleased] ### Added` gets the slice
4 entry.
There was a problem hiding this comment.
Pull request overview
Adds the M40 manual containerlab interop smoke to validate ADR-0059 aliasing-ECMP (FDB nexthop groups via NDA_NH_ID) against real FRR EVPN-MH, plus documentation updates recording slice-4 status.
Changes:
- Adds a new containerlab topology for rustbgpd ↔ 2× FRR EVPN-MH peers sharing an ES.
- Adds a new M40 bash driver that injects/withdraws a MAC and asserts FDB-NHG programming via
bridge fdb show+ip nexthop show. - Updates
ROADMAP.mdandCHANGELOG.mdto reflect slice-4 progress.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh | New M40 driver script that validates FDB-NHG install, drain-to-single-dst, and teardown. |
| tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml | New containerlab topology wiring rustbgpd to two FRR EVPN-MH VTEPs. |
| tests/interop/configs/rustbgpd-m40-vtep.toml | rustbgpd config for M40 EVPN VTEP + two EVPN neighbors. |
| tests/interop/configs/frr-bgpd-m40-vtep-a.conf | FRR pe-a config: ES1 + EVPN BGP session to rustbgpd. |
| tests/interop/configs/frr-bgpd-m40-vtep-c.conf | FRR pe-c config: ES1 (alias-only) + EVPN BGP session to rustbgpd. |
| ROADMAP.md | Marks slice 4 (M40) as in-flight and updates status text for ADR-0059. |
| CHANGELOG.md | Adds an Unreleased entry describing the new M40 manual smoke and what it validates. |
Comments suppressed due to low confidence (1)
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:88
- The topology already starts rustbgpd via
/usr/local/bin/start-rustbgpd-vtep.shin the containerlabexecsection, but the driver also callsstart_rustbgpdhere. That will try to launch a second rustbgpd process (and by default uses/usr/local/bin/start-rustbgpd.sh, which isn’t mounted in this topology), making the smoke unreliable. Pick a single owner for daemon startup: either remove thisstart_rustbgpdcall, or stop auto-starting in the .clab.yml and rely on the driver.
resolve_grpc_addr
start_rustbgpd
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" | ||
| # shellcheck source=test-lib.sh | ||
| source "$SCRIPT_DIR/test-lib.sh" | ||
|
|
||
| PE_RB="clab-${TOPO}-pe-rb" |
| log "[phase 2] stop pe-c — withdrawal of EAD-per-EVI must drain the group" | ||
| docker exec "$PE_C" sh -c "pkill -TERM bgpd || true; sleep 1; pkill -KILL bgpd || true" \ | ||
| >/dev/null 2>&1 | ||
| ok "pe-c bgpd stopped" |
| - ip link set eth1 up | ||
| - ip addr add 10.0.1.1/24 dev eth2 | ||
| - ip link set eth2 up | ||
| - /usr/local/bin/start-rustbgpd-vtep.sh 10.0.0.1 100 |
Three findings on the slice 4 PR (#89): 1. `test-lib.sh::preflight` runs at source time and expects the rustbgpd container at `clab-${TOPO}-rustbgpd`. M40's node is `pe-rb`, so the source-time check would fail and the driver would exit before any assertion ran. Override `RUSTBGPD=$PE_RB` + `export RUSTBGPD` before sourcing the lib — same pattern M39 uses for its `pe1` node. 2. Phase 2 used `pkill -TERM/-KILL bgpd` inside pe-c, but FRR's `watchfrr` would auto-restart bgpd a second later and re-originate EAD-per-EVI, racing the drain assertion. Switch to `docker stop --time 5 pe-c` so watchfrr + bgpd go down together — M1 relies on the auto-restart behavior; M40 needs the opposite (deterministic drain). 3. The clab `exec` block already runs `start-rustbgpd-vtep.sh` which launches rustbgpd in the background. The driver was ALSO calling `start_rustbgpd` from test-lib, which would spawn a second daemon competing for the same gRPC port. Match the M36 pattern: only call `resolve_grpc_addr` + poll `grpc_health` until ready; don't double-spawn.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (5)
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:156
- In the
wait_for_rbpredicate,$TEST_MACis wrapped in single quotes inside thesh -cstring, so it will not be expanded by the outer shell and (unless exported into the container) will be treated as the literal string$TEST_MACinside the container. This will prevent the predicate from ever matching the MAC and cause the wait to time out. Prefer embedding the expanded value (or passing args withoutsh -c) so the grep sees the actual MAC.
wait_for_rb \
"FDB row for $TEST_MAC has nhid" \
"bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -E ' nhid [0-9]+'" \
45
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:235
- Same quoting issue as phase 1:
$TEST_MACand$PE_A_IPare inside single quotes in thesh -ccommand string, so they won’t expand and thewait_for_rbpredicate can never succeed. Remove the single quotes (or otherwise ensure the host expands these values before passing into the container).
wait_for_rb \
"FDB row falls back to single-dst (no nhid)" \
"bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -vE ' nhid [0-9]+' | grep -E ' dst $PE_A_IP'" \
45
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:259
- Same quoting issue as earlier phases: the
wait_for_rbpredicate uses'$TEST_MAC', which won’t expand (and isn’t exported into the container), so the negated grep will not reliably assert the MAC is absent. Expand the MAC value before sending the command tosh -c(or avoidsh -c).
wait_for_rb \
"FDB row for $TEST_MAC removed from rustbgpd's bridge FDB" \
"! bridge fdb show dev $VXLAN | grep -qi '$TEST_MAC'" \
45
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:118
wait_frr_establishedalready emitsok/failand updates the pass/fail counters. Wrapping it with&& ok ... || fail ...double-counts results and can report misleading totals. Callwait_frr_establisheddirectly and rely on its return code if you need to abort.
log "[baseline] both FRR peers Established to rustbgpd"
wait_frr_established "$PE_A" "10.0.0.1" "pe-a EVPN" \
&& ok "pe-a Established" || fail "pe-a not Established"
wait_frr_established "$PE_C" "10.0.1.1" "pe-c EVPN" \
&& ok "pe-c Established" || fail "pe-c not Established"
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:186
- The
ip nexthop showparsing assumes member IDs are strictly decimal (group [0-9/]+), but elsewhere in the repo the netns FDB-NHG tests explicitly allow iproute2 to print member IDs in hex (e.g.group 0x30000001). To make this interop script resilient across iproute2 versions, accept both decimal and0x...forms and strip the0xprefix before matching member lines.
nh_dump=$(nexthop_show)
group_line=$(echo "$nh_dump" | grep -E "^id $nhid group [0-9/]+ .*fdb" || true)
if [ -n "$group_line" ]; then
ok "group line present: $group_line"
else
fail "no group line for nhid=$nhid in ip nexthop show"
echo "--- nh_dump ---" >&2
echo "$nh_dump" | head -30 >&2
fi
# Extract member IDs from the group line: `group 12/13 ...` → 12 and 13.
member_ids=$(echo "$group_line" \
| grep -oE 'group [0-9/]+' \
| awk '{print $2}' \
| tr '/' ' ')
| break | ||
| fi | ||
| if [ "$i" -eq 15 ]; then | ||
| fail "gRPC endpoint not reachable within 30s" |
| # Shared ESI Type 0 derived from es-sys-mac aa:bb:cc:dd:ee:ff + | ||
| # es-id 1 (matches the M32 convention). pe-a and pe-c don't | ||
| # peer directly with each other — we don't need their Type 4 |
Two findings on `bd5d69c`: 1. ESI type comment in `m40-evpn-aliasing-ecmp-frr.clab.yml` said "Type 0" but the test asserts on a leading `03:` (Type 3 MAC-based ESI per RFC 7432 §5; same as `start-frr-vtep-mh.sh` documents). Carried over wrong from M32's clab comment. Corrected. 2. Hard prereqs in the driver called `fail` but continued running, producing cascading false-negative assertions in subsequent phases. The four checkpoints whose failure invalidates every downstream phase (gRPC ready, Phase-1 FDB-row-with-nhid, Phase-2 single-dst-fallback, Phase-3 FDB-row-removed) now `print_summary` (which exits non-zero) on failure. The in-phase soft assertions still accumulate per the M-series convention so the operator sees all simultaneous breakage in a single run.
| log "[phase 1] ip nexthop show has the matching group with two members" | ||
| # Group line shape: `id <gid> group <mid>/<mid> ... fdb`. Members | ||
| # print as `id <mid> via <ip> fdb`. We assert: | ||
| # - the group exists with our nhid | ||
| # - exactly two members via pe-a's IP and pe-c's IP | ||
| nh_dump=$(nexthop_show) |
| docker stop --time 5 "$PE_C" >/dev/null 2>&1 \ | ||
| && ok "pe-c container stopped" \ | ||
| || fail "could not stop pe-c container" |
| fdb_row=$(fdb_show_test_mac) | ||
| nhid=$(echo "$fdb_row" | grep -oE ' nhid [0-9]+' | awk '{print $2}' | head -1) | ||
| if [ -z "$nhid" ]; then | ||
| fail "could not parse nhid from FDB row: $fdb_row" |
Three findings on `d47a4d0`: 1. Member-count assertion was permissive: checked that both expected VTEP IPs appeared in the group but tolerated extra members. Now asserts exactly 2 distinct gateways covering pe-a + pe-c; an unexpected third member trips the check. 2. `docker stop pe-c` failure was soft-failed (continued into the drain assertions even though the prerequisite withdrawal never happened). Now `print_summary` on failure — Phase 2 is meaningless without it. 3. `nhid` parse failure left `\$nhid` empty, so the next `^id group ...` grep would match anything starting with `id `. Hard-exit if the parse fails so the cascade can't fire.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:202
- Because this script sources
test-lib.sh(which enablesset -euo pipefail), this command substitution will cause the whole script to exit early ifgrepfinds no match (e.g., ifip nexthop showoutput format differs or the group line is temporarily absent). Add|| true/ a non-failing parser (e.g.,awk) so the script can fall through to the explicitif [ -n ... ]failure path and still print a summary.
# Extract member IDs from the group line: `group 12/13 ...` → 12 and 13.
member_ids=$(echo "$group_line" \
| grep -oE 'group [0-9/]+' \
| awk '{print $2}' \
| tr '/' ' ')
if [ -n "$member_ids" ]; then
ok "parsed group member IDs: $member_ids"
else
fail "could not parse group member IDs from: $group_line"
fi
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:219
- Same strict-mode issue here: if a member line doesn’t match,
grep -oE 'via ...'(and thegrep -v/grep -cpipelines below) can return exit code 1 inside command substitutions, andset -ewill abort the script before it can report a structured failure. Make these parsing pipelines non-fatal (e.g., append|| true, or useawk/wc -lfor counting).
mline=$(echo "$nh_dump" | grep -E "^id $mid via [0-9.]+ .*fdb" || true)
mip=$(echo "$mline" | grep -oE 'via [0-9.]+' | awk '{print $2}')
if [ -n "$mip" ]; then
member_ips="$member_ips $mip"
fi
done
# Sort + dedupe so we count distinct gateways (one IP shouldn't
# appear twice in a healthy group; the count check below rules
# that out).
member_ip_list=$(echo "$member_ips" | tr ' ' '\n' | sort -u | grep -v '^$')
member_ip_count=$(echo "$member_ip_list" | grep -c .)
if [ "$member_ip_count" -eq 2 ] \
| fdb_row=$(fdb_show_test_mac) | ||
| nhid=$(echo "$fdb_row" | grep -oE ' nhid [0-9]+' | awk '{print $2}' | head -1) | ||
| if [ -z "$nhid" ]; then | ||
| fail "could not parse nhid from FDB row: $fdb_row" | ||
| print_summary | ||
| fi |
`test-lib.sh` sets `set -euo pipefail`; the captured-grep pipelines in the driver could trip `inherit_errexit` (one `shopt` away) and abort the script before the explicit empty-checks could handle the no-match path. Add defensive `|| true` to every captured grep pipeline: - nhid parse out of the FDB row - group-line member-ID list parse - per-member `via <ip>` extraction - dedupe + count of distinct gateways Each one is paired with an explicit `[ -z ... ]` / count check that already covers the failure mode; the `|| true` keeps the no-match path flowing into that check instead of taking down the script.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:266
- Same quoting issue as Phase 1:
grep -i '$TEST_MAC'searches for the literal$TEST_MACrather than the actual MAC, so this Phase 2 wait condition can never succeed. Use a pattern that expands to the MAC value (or export TEST_MAC into the container exec environment).
wait_for_rb \
"FDB row falls back to single-dst (no nhid)" \
"bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -vE ' nhid [0-9]+' | grep -E ' dst $PE_A_IP'" \
45 \
|| { print_summary; }
tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh:291
- Same quoting issue as earlier phases:
grep -qi '$TEST_MAC'matches the literal$TEST_MAC, so this wait condition will not detect the MAC being removed. Expand the variable in the host script (or export it) before passing the command tosh -c.
wait_for_rb \
"FDB row for $TEST_MAC removed from rustbgpd's bridge FDB" \
"! bridge fdb show dev $VXLAN | grep -qi '$TEST_MAC'" \
45 \
|| { print_summary; }
| # bail out on failure rather than emit cascading false negatives. | ||
| wait_for_rb \ | ||
| "FDB row for $TEST_MAC has nhid" \ | ||
| "bridge fdb show dev $VXLAN | grep -i '$TEST_MAC' | grep -E ' nhid [0-9]+'" \ |
| wait_frr_established "$PE_A" "10.0.0.1" "pe-a EVPN" \ | ||
| && ok "pe-a Established" || fail "pe-a not Established" | ||
| wait_frr_established "$PE_C" "10.0.1.1" "pe-c EVPN" \ | ||
| && ok "pe-c Established" || fail "pe-c not Established" |
One legit finding fixed; one false positive skipped. Fixed: `wait_frr_established` already calls `ok` / `fail` and increments the pass/fail counters internally. The `&& ok ... || fail ...` wrapper M32 / older M-series scripts use double-counts the result. Dispatch the helper directly and use `|| print_summary` to abort on a failed prereq instead. (The existing sibling tests have the same bug; cleanup in a separate pass.) Skipped (false positive): Copilot flagged `grep -i '$TEST_MAC'` inside the `sh -c` predicate as not expanding `$TEST_MAC`. The nested single quotes are inside an outer double-quoted string, so bash expands `$TEST_MAC` on the host before the string ever reaches `sh -c`. Verified with `bash -c 'str="...'\''$VAR'\''..."; echo "$str"` — variable is interpolated, single quotes survive as literals to protect the MAC's colons from word-splitting in `sh -c`. The quoting is correct. This is round 5/5 — per the project memory rule capping Copilot loops at ~5 rounds before switching to deep human review, no further Copilot rounds will be requested. Branch ready for human review.
Summary
First end-to-end validation of the slice 3b coordinator + Pass 1b diff path against a real upstream peer. M40 proves that rustbgpd consumes real FRR EVPN-MH routes (Type 1 EAD-per-EVI + Type 2 with shared ESI) and installs an FDB nexthop group on the receiving VTEP via `NDA_NH_ID`. After this lands, aliasing-ECMP graduates from "shipped, kernel-primitive smoke green" (PR #88) to "shipped, FRR-validated against 10.3.1" — the gating evidence for flipping the production-default knob.
Topology
`tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml`:
```
pe-rb (rustbgpd VTEP) ── pe-a (FRR, ES1)
└── pe-c (FRR, ES1)
```
rustbgpd peers with two FRR VTEPs that share ES1 (`es-id 1`, `es-sys-mac aa:bb:cc:dd:ee:ff` — same convention as M32). FRR side reuses the existing `start-frr-vtep-mh.sh` so the bond-as-ES shape (FRR EVPN-MH requires it for the local ES to register) is the same known-good config the M32 reflector test exercises.
Test driver
`scripts/test-m40-evpn-aliasing-ecmp-frr.sh`:
Phase 1 — Inject a test MAC on pe-a's `esdummy` bond. Expect:
Phase 2 — Stop pe-c's bgpd. EAD-per-EVI from pe-c withdraws; per the projection invariant (`empty alias_vtep_ips ⇔ alias_group_key.is_none()`) the entry collapses from FDB-NHG to single-dst pointing at pe-a. Expect the row to fall back to `dst pe-a` (no `nhid`) and the group + pe-c member to tear down.
Phase 3 — `bridge fdb del` removes the MAC on pe-a. Expect the row + remaining FDB nexthops for the test VTEPs to disappear.
What's deliberately not in this PR
Test plan
```bash
docker build -t rustbgpd:dev .
containerlab deploy -t tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml
bash tests/interop/scripts/test-m40-evpn-aliasing-ecmp-frr.sh
containerlab destroy -t tests/interop/m40-evpn-aliasing-ecmp-frr.clab.yml
```
References