Skip to content

container related fixes and additions#85

Open
dangowrt wants to merge 7 commits into
openwrt:masterfrom
dangowrt:netifd-ubus-device-add
Open

container related fixes and additions#85
dangowrt wants to merge 7 commits into
openwrt:masterfrom
dangowrt:netifd-ubus-device-add

Conversation

@dangowrt

@dangowrt dangowrt commented Jun 8, 2026

Copy link
Copy Markdown
Member

Fix a few bugs preventing container restart and expose support for dynamically creating veth pairs (or any device in that regard) in order to allow containers to spawn netifd-managed veth pairs without having to take the (problematic) detour via UCI.

@dangowrt dangowrt requested a review from nbd168 June 8, 2026 03:58
@dangowrt dangowrt force-pushed the netifd-ubus-device-add branch from 6929f91 to 21489fa Compare June 11, 2026 02:07
dangowrt added 7 commits June 12, 2026 00:47
system_link_netns_move() built its RTM_NEWLINK message with whatever
system_if_resolve() returned for the source device. When the device does
not exist (e.g. a jailed interface whose veth has not been created yet),
system_if_resolve() returns 0, and the message goes out with
ifi_index = 0 while IFLA_IFNAME is set to target_ifname (the jail_device
name).

The kernel's __rtnl_newlink() then selects the target device by name when
ifi_index is 0 (net/core/rtnetlink.c: rtnl_dev_get -> __dev_get_by_name),
so the move operates on whatever host device happens to be named like the
jail_device. With a jail_device of "eth0" this moves the host's real eth0
into the container's network namespace, knocking the host off the
network.

Bail out when the source device cannot be resolved to a real ifindex so
the request is never sent with a zero index.

Fixes: d93126d ("interface: allow renaming interface when moving to jail netns")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
interface_start_jail() moved each jailed interface's main device into the
container netns at the moment the jail's netns was registered. If the
device did not exist yet (e.g. a veth whose creation had not been
triggered), the move was a no-op and the container came up with no
network device; the in-jail netifd then failed DHCP with
"udhcpc: SIOCGIFINDEX: No such device" until the operator manually
brought the host interface up.

Keep a duplicated reference to the jail netns on the interface when the
move cannot be performed yet, and complete it from interface_main_dev_cb
once the device appears (DEV_EVENT_ADD). Any stale pending reference from
an earlier jail incarnation is dropped on the next interface_start_jail()
invocation, so a successful immediate move can never leave a dangling
netns reference behind that would later divert the device into a dead
namespace. interface_stop_jail and interface_free drop any still-pending
reference as well.

Fixes: 1321c1b ("add basic support for jail network namespaces")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
The vlan_aliases kvlist is initialised lazily in device_vlan_update(),
which only runs during configuration load. A bridge that is created and
freed without going through a config load (e.g. one created at runtime
over ubus) reaches bridge_free() with the kvlist still zero-filled, and
kvlist_free() then walks an uninitialised avl_tree whose list head
contains NULL pointers, crashing netifd. Guard the call on get_len, the
same marker device_vlan_update() uses to tell an initialised kvlist
apart.

Fixes: 4544f02 ("bridge-vlan: add support for defining aliases for vlan ids")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
device_create() duplicates the config blob before invoking the type's
create callback, but returned without freeing the duplicate when the
callback fails (e.g. invalid device name). Free it on that path.

Fixes: cd51d94 ("rename a variable for clarity")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
interface_do_remove() freed the interface synchronously. For a dynamic
interface with an immediate protocol (none, static), removal can be
triggered from deep inside a callback chain that keeps using the
interface afterwards: deleting the device of an interface that is up
runs device_broadcast_event -> interface_main_dev_cb ->
interface_set_available(false) -> __interface_set_down ->
interface_proto_event. PROTO_FLAG_IMMEDIATE protocols deliver IFPEV_DOWN
synchronously from there, and its handler ends in
interface_handle_config_change -> interface_do_remove -> interface_free.
While unwinding, __interface_set_down then calls
interface_flush_state(iface) and interface_main_dev_cb dereferences
dep->dev, both reading freed memory.

Keep the teardown, cleanup and deregistration synchronous, but defer the
final free to a uloop timeout. remove_timer can be reused for this as
interface_cleanup() has already cancelled it at that point. Callbacks
unwinding through the cleaned-up interface then only see NULLed device
users, which they handle already.

Fixes: d9872db ("interface: fix removal of dynamic interfaces")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Add create_device {name, type, ...} and delete_device {name} ubus methods
on the network object, the device-level analog of add_dynamic. create_device
resolves the device type and creates the device, marking it dynamic;
delete_device finds the device, clears its dynamic and current_config flags
and frees it once unused. A new bool dynamic on struct device exempts these
runtime-created devices from the reload sweep (device_reset_config leaves the
current_config of dynamic devices alone), mirroring how add_dynamic
interfaces survive a reload.

This lets an external manager assemble ephemeral devices such as a veth pair
entirely over ubus, with no persistent UCI 'config device' section.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
A dynamic interface is removed (IFC_REMOVE) whenever it tears down, including
on a bare carrier loss. For a veth feeding a container whose peer has not
gained carrier yet this destroys the interface and tears down the veth pair
while it is still needed; an external manager re-adding it then churns.

Add a per-interface 'persistent' flag, set at add_dynamic time and carried
across reloads, that suppresses the self-removal on carrier loss: a persistent
dynamic interface is only removed once its device actually goes away. As the
suppression means such an interface can now sit in IFS_DOWN or IFS_TEARDOWN
when its device disappears, handle the removal from interface_main_dev_cb on
DEV_EVENT_REMOVE for those states. Other add_dynamic users keep the current
behaviour (default off).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant