Also enable AODV route discovery on source-routed unicasts#719
Also enable AODV route discovery on source-routed unicasts#719TheJulianJES wants to merge 1 commit into
Conversation
When `CONF_SOURCE_ROUTING` is enabled, bellows previously set only `APS_OPTION_ENABLE_ADDRESS_DISCOVERY` on outgoing unicasts. The two APS option flags are independent in the stack: - `ENABLE_ADDRESS_DISCOVERY` triggers a network-broadcast address resolution if the destination NWK isn't known. - `ENABLE_ROUTE_DISCOVERY` initiates AODV route discovery if no route exists. If a packet is sent to a destination that isn't yet in the NCP's source-route table (e.g. immediately after startup, before MTORR has propagated), `ADDRESS_DISCOVERY` alone is not enough — the delivery falls back to whatever default the stack chooses. Setting both flags lets AODV act as a transparent fallback. zigbee-herdsman's ember adapter applies both flags to all unicasts for this reason — its `DEFAULT_APS_OPTIONS` in `emberAdapter.ts` includes them together with the comment "Removing `ENABLE_ROUTE_DISCOVERY` leads to devices that won't reconnect/go offline, and various other issues." The non-source-routing path is untouched: it already includes `ENABLE_ROUTE_DISCOVERY`, and the explicit `FORCE_ROUTE_DISCOVERY` caller-override path also stays as-is.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev #719 +/- ##
=======================================
Coverage 99.54% 99.54%
=======================================
Files 61 61
Lines 4147 4147
=======================================
Hits 4128 4128
Misses 19 19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
And/or we should consider not ignoring the bellows/bellows/zigbee/application.py Lines 993 to 1001 in 4b97a6d |
|
TODO: Check if the stack removes broken/outdated source routes, or just keeps using them until the next MTOR happens. |
|
Little more experience / problem with routing in ZHA / Bellows. Next is 1 tuya 4 way switch that oft is not responding on commands (Zigbee transmission failed) from automatons and is working OK after repower it. Was sniffing it last week and (Z)HA was not sending the on commands that automation was set to do also the sending one command from device card = ZHA is not having one know rout to the device and is not trying sending one command only error in the log. The system is for very long time running with By the way: forcing manual source routers is one very danger thing then its disabling the self healing functionality in the mesh network so if one router is getting problem then some device can being not reatshe ball and cant being fixed in one easy way what i knowing so if implanting it pleas put strong warnings in the instructions !!! Great work done !!!! |
|
PS: Coordinator hard and software = Connected via [IKEA Billy EZSP 6.10.7.0] |
zigpy-review-bot
left a comment
There was a problem hiding this comment.
Reviewed as a draft. The code change is correct and matches zigbee-herdsman's long-standing default — confirming the explanatory mechanism, not just the safety of the change.
Behavior
Verified against the Silicon Labs SDK semantics for APS_OPTION_ENABLE_ROUTE_DISCOVERY: "causes a route discovery to be initiated if no route to the destination is known." So when a source route is present the stack uses it; AODV only kicks in when the source-route table is empty for that destination (immediately after startup before the first MTORR propagates, after a device rejoin where the route record hasn't been re-sent, etc.). Source routing remains the primary path; AODV is strictly a fallback. The new code comment captures this accurately.
zigbee-herdsman's ember adapter sets exactly the same combination in DEFAULT_APS_OPTIONS = RETRY | ENABLE_ROUTE_DISCOVERY | ENABLE_ADDRESS_DISCOVERY and explicitly cites BUGZID 12261 in the same comment block, noting they're ignoring the SDK guidance because removing route discovery causes "devices that won't reconnect/go offline, and various other issues." (See src/adapter/ember/adapter/emberAdapter.ts, DEFAULT_APS_OPTIONS.) This PR brings bellows into alignment with that empirically validated default. The two targeted tests in this PR (test_send_packet_unicast_source_route and test_send_packet_unicast_manual_source_route) both pass locally.
Concerns with the PR description (vs the code)
The PR body and the inline comment in the diff tell slightly different stories — the body is where the tidy-up TODO is needed:
-
"never refreshed the network addresses" mechanism is mis-stated in the body, but corrected in the AI summary. Route discovery doesn't refresh NWK addresses — NWK address mapping for a known EUI64 is refreshed by ZDO
Device_annce(on rejoin) or byNWK_addr_req, which is whatAPS_OPTION_ENABLE_ADDRESS_DISCOVERY(already set) triggers.ENABLE_ROUTE_DISCOVERYdiscovers a path, not a new NWK ID. The "AI summary" section of the PR body has this right; the prose at the top conflates the two. Worth tightening before merge so the commit message doesn't ship the confused version. -
BUGZID 12261 isn't refuted, it's overridden by empirical evidence. SDK guidance is genuinely "concentrators should not enable route discovery because MTORRs do it." The justification for ignoring it is the same one zigbee-herdsman documents: stale source-route entries after rejoins / cold start lead to silent packet drops at the NWK layer when AODV isn't allowed to recover. Worth saying in the description that this PR is deliberately diverging from SDK guidance in favor of zigbee-herdsman's tested behavior, not claiming BUGZID 12261 is wrong. See Koenkk/zigbee2mqtt#29902 for the long-form discussion of this tension on the zigbee2mqtt side.
-
Author's "accidental usage for years" history isn't a clean A/B test. The narrative is: source routing was enabled in zigpy config once, then disabled, but the coordinator kept setting source routes. That doesn't actually demonstrate that the combined flags worked — it demonstrates that the source-route table on the NCP persists, and that
CONF_SOURCE_ROUTING=Falsewas used (soENABLE_ROUTE_DISCOVERYwas set via theelsebranch at line 1008, not via the new combined branch). The new combined branch is genuinely a new code path the author hasn't been running. Not a blocker — zigbee-herdsman parity is the stronger argument — but the description shouldn't oversell prior validation.
Minor
- Negative test (
CONF_SOURCE_ROUTING=False→ENABLE_ROUTE_DISCOVERYset via theelsebranch butENABLE_ADDRESS_DISCOVERYnot set) is already implicitly covered by the non-source-routingtest_send_packet_unicast. No new negative test needed. - This complements PRs #716/#717/#718; consider noting the series in the description so a reviewer can see the broader source-routing-hardening context.
Verdict: code change LGTM. Recommend tidying the PR body before un-drafting — particularly the "refreshed network addresses" mechanism and the framing of BUGZID 12261 (diverging from SDK by design, not refuting it).
DRAFT/TODO: Check this and tidy up AI comment.
I've been somewhat accidentally using this for years by turning on source routing in the zigpy config once, then disabling it. This still caused the coordinator to set source routes, but apparently it never refreshed the network addresses this way.
But only with this was source routing actually worth using in my network: as zigpy thought source routing to be off, it still set
APS_OPTION_ENABLE_ROUTE_DISCOVERYas well.According to some sources referencing "BUGZID 12261", this is supposedly incorrect. This should be looked at with more detail.
AI summary (check if accurate)
When
CONF_SOURCE_ROUTINGis enabled, bellows previously set onlyAPS_OPTION_ENABLE_ADDRESS_DISCOVERYon outgoing NWK unicasts. The two flags are independent and address different stack subsystems:ENABLE_ROUTE_DISCOVERY(NWK frame flag) — initiates AODV route discovery if and only if the stack has no known route to the destination (persl_zigbee_types.h: "causes a route discovery to be initiated if no route to the destination is known"). Contrast withFORCE_ROUTE_DISCOVERYwhich initiates discovery unconditionally.ENABLE_ADDRESS_DISCOVERY(APS frame flag) — sends a ZDONWK_addr_reqto resolve the destination NWK ID from its EUI64 if not already cached. It does not discover routes; it resolves addresses.When source routing is on but a destination isn't yet in the NCP's source-route table (e.g. immediately after startup, before MTORR has propagated, or after a device rejoin), the previous configuration left the stack with neither a source route nor permission to fall back to AODV. The packet was dropped at the NWK layer instead of recovering via on-demand route discovery.
This change makes source-routed unicasts also set
ENABLE_ROUTE_DISCOVERY, mirroring what zigbee-herdsman's ember adapter does (emberAdapter.tsDEFAULT_APS_OPTIONS):Why this is correct (source routing is still preferred)
ENABLE_ROUTE_DISCOVERYdoes not run alongside source routing on every send. The SDK header text is explicit: discovery happens "if no route is known." The stack's outbound path for a concentrator is:sl_zigbee_af_override_append_source_route_cbruns first. If a source route exists for the destination, it's prepended to the NWK header and each hop along the way uses that explicit route. AODV is not consulted.ENABLE_ROUTE_DISCOVERYflag actually trigger an AODV route request.Net effect: source routing remains the primary path; AODV is engaged strictly as a fallback for destinations the source-route table doesn't yet know about.