interface/jail: don't hijack a host interface when the jailed device is missing#83
Closed
dangowrt wants to merge 2 commits into
Closed
interface/jail: don't hijack a host interface when the jailed device is missing#83dangowrt wants to merge 2 commits into
dangowrt wants to merge 2 commits into
Conversation
system_link_netns_move() built its RTM_NEWLINK message with whatever system_if_resolve() returned for the source device. When the device does not exist (e.g. a jailed interface whose veth has not been created yet), system_if_resolve() returns 0, and the message goes out with ifi_index = 0 while IFLA_IFNAME is set to target_ifname (the jail_device name). The kernel's __rtnl_newlink() then selects the target device by name when ifi_index is 0 (net/core/rtnetlink.c: rtnl_dev_get -> __dev_get_by_name), so the move operates on whatever host device happens to be named like the jail_device. With a jail_device of "eth0" this moves the host's real eth0 into the container's network namespace, knocking the host off the network. Bail out when the source device cannot be resolved to a real ifindex so the request is never sent with a zero index. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
interface_start_jail() moved each jailed interface's main device into the container netns at the moment the jail's netns was registered. If the device did not exist yet (e.g. a veth whose creation had not been triggered), the move was a no-op and the container came up with no network device; the in-jail netifd then failed DHCP with "udhcpc: SIOCGIFINDEX: No such device" until the operator manually brought the host interface up. Keep a duplicated reference to the jail netns on the interface when the move cannot be performed yet, and complete it from interface_main_dev_cb once the device appears (DEV_EVENT_ADD). interface_stop_jail and interface_free drop any still-pending reference. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Member
Author
|
Closing in favor of #85 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
When an interface carries
option jail '<name>', host netifd moves thatinterface's device into the container's network namespace when the
container is created. procd's ujail triggers this through the
network.netns_updownubus call, which lands ininterface_start_jail()->
system_link_netns_move().This is the mechanism behind the bug reported in
openwrt/procd#38, where bringing up a netifd-enabled (uxc) container
either left the container with no network device — the in-jail udhcpc
logging
udhcpc: SIOCGIFINDEX: No such device/can't bind to interface <dev>: No such device— or, in the worst case,pulled the host's own
eth0into the container and knocked the host offthe network.
What was broken
system_link_netns_move()built itsRTM_NEWLINKmessage from whateversystem_if_resolve()returned for the source device:A jailed interface is forced
autostart = false(seeinterface_alloc()/interface_change_config()), so host netifd neverbrings it up and never instantiates its device on its own. A
config deviceveth is only created once its primary end is actually claimed bysomething (e.g. a bridge port). If the operator's config never causes the
veth to be created on the host before the container starts, the jailed
interface's device simply does not exist at
interface_start_jail()time, and
system_if_resolve()returns 0.With
index == 0, the message still carriesIFLA_IFNAME = target_ifname(thejail_devicename). The kernel's__rtnl_newlink()selects the target device by name when
ifi_indexis 0(
net/core/rtnetlink.c:rtnl_dev_get()->__dev_get_by_name()), sothe move operates on whatever host device happens to be named like the
jail_device. With ajail_deviceofeth0this moves the host's realeth0into the container's netns. That is exactly the failure mode inopenwrt/procd#38: the container is created, the host loses its uplink,
and ssh to the host dies while the box itself stays up.
How it is fixed
Two commits:
system-linux: refuse netns move of an unresolved device.
Bail out of
system_link_netns_move()when the source device cannotbe resolved to a real ifindex, so a request is never sent with
ifi_index == 0and a name-based selector. This removes thehost-interface-hijack entirely: a missing jailed device can no longer
cause an unrelated host device to be moved.
interface: defer the jailed device move until the device exists.
When the move cannot be performed yet, keep a duplicated reference to
the jail netns on the interface (
netns_fd+jail_pending) andcomplete the move from
interface_main_dev_cb()once the device showsup (
DEV_EVENT_ADD).interface_stop_jail()andinterface_free()drop any still-pending reference. The pending state survives a
network reloadbecauseinterface_update()->interface_change_config()keeps the existing interface object. This turns the previously-required
manual workaround (
ubus call network.interface.<jail-iface> upbeforecontainer creation, mentioned in the ticket) into automatic behaviour:
as soon as the device is created the container receives it.
Testing
Reproduced openwrt/procd#38 on an x86_64 VM with a uxc/netifd container
(
org.openwrt.procd.netifdannotation), a veth pair, and a jailedinterface with
jail_device 'eth0'whose veth was deliberately notpre-created:
uxc createmoved the host'seth0into the container; thehost lost networking.
eth0is untouched (verified by MAC acrossthe create); the container's netns has only
lo.br-lan),the deferred move fires automatically, the veth end lands in the
container as
eth0, the in-jail netifd completes DHCP, and the host'seth0remains in place.A correctly-configured setup (veth host end as a
br-lanport so thepair is created at boot) works on unpatched netifd too; these changes
make the misconfigured / device-not-yet-present cases fail safe instead
of hijacking a host interface.