init.d/cgroups: move cgroup2 root processes into rc.init #1014
Use cgroup_path consistently for cgroup2_find_path() results, and replace generic controller loop variables with more descriptive names. No functional change.
I've done some testing; the patch needs more work. I'll keep it for now, since it illustrates the idea well, but I'll send an updated fix later.
When OpenRC runs in a delegated cgroup v2 hierarchy, processes can remain in the cgroup namespace root. This prevents enabling domain controllers in cgroup.subtree_control due to the "no internal processes" rule (man 7 cgroups). Create an rc.init cgroup and move existing root cgroup processes there before enabling subtree controllers in unified and hybrid mode. Do this for cgroup v2 directly instead of trying to detect containers, since OpenRC cannot reliably distinguish a delegated namespace root from the real cgroup root. Use /rc.init as the marker for initialized OpenRC cgroup v2 setup. Do not create per-service cgroups before rc.init exists, and move transient OpenRC service runner processes back to rc.init when removing temporary service cgroups, since the cgroup v2 root becomes parent-only after subtree controllers are enabled. Fixes OpenRC#1013.
3417eb5 to e6aa66e
I've now checked: inside a container everything works correctly. I'll check on a host later.
I have now tested this both in an Incus container and on a real host boot. Testing on a real host exposed one more issue: the cgroup v2 root contains kernel threads. Those cannot and must not be moved into rc.init, since writing a kernel thread's PID to a child cgroup's cgroup.procs fails with EINVAL. I fixed that in the latest commit by skipping kernel threads before migration. For the kernel-thread detection I followed the same approach systemd uses: read /proc/$pid/stat, take the flags field, and check PF_KTHREAD. I think this patchset is now complete.
Kernel threads can appear in the cgroup v2 root on a real host, but they cannot be moved into a child cgroup. Skip such pids before writing to rc.init/cgroup.procs to avoid EINVAL during boot. Detect kernel threads from /proc/pid/stat flags using PF_KTHREAD, matching the approach used by systemd.
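The kernel-thread check described above can be sketched in C as follows. This is a minimal illustration, not the patch's actual init.d/cgroups change; the function names stat_line_is_kthread() and pid_is_kthread() are hypothetical. It assumes the /proc/<pid>/stat field layout documented in proc(5) (flags is field 9) and the PF_KTHREAD value 0x00200000 from the kernel's include/linux/sched.h.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Task flag from the kernel's include/linux/sched.h: "I am a kernel thread". */
#define PF_KTHREAD 0x00200000UL

/*
 * Parse the text of /proc/<pid>/stat. The comm field is wrapped in
 * parentheses and may itself contain spaces or ')', so scanning starts
 * after the LAST ')'. Per proc(5), the fields that follow are state,
 * ppid, pgrp, session, tty_nr, tpgid and then flags (field 9).
 */
static bool stat_line_is_kthread(const char *stat)
{
	const char *p = strrchr(stat, ')');
	unsigned long flags;

	if (p == NULL)
		return false;
	/* Skip state (%c) and the five numeric fields, then read flags. */
	if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %lu", &flags) != 1)
		return false;
	return (flags & PF_KTHREAD) != 0;
}

/*
 * Report whether <pid> is a kernel thread. An unreadable stat file (for
 * example because the process has already exited) is treated as "not a
 * kernel thread"; the later write to cgroup.procs would then fail with
 * ESRCH, which a caller can ignore.
 */
static bool pid_is_kthread(long pid)
{
	char path[64], buf[4096];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
	f = fopen(path, "r");
	if (f == NULL)
		return false;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return false;
	}
	fclose(f);
	return stat_line_is_kthread(buf);
}
```

Parsing from the last ')' matters because a process can choose its own comm, and a name containing ") " would otherwise shift every following field.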
ef8417e to 751e27b
Before OpenRC enables cgroup v2 subtree controllers, move all PIDs out of the cgroup v2 root into a dedicated rc.init cgroup.

This is needed because a delegated cgroup v2 hierarchy, such as the one seen inside an Incus/LXC container, is still subject to the no-internal-processes rule. If PID 1 or other early processes remain in the cgroup namespace root, writes to cgroup.subtree_control fail with EBUSY, and child cgroups do not receive delegated controllers. This breaks resource accounting for nested runtimes such as Docker/containerd.

The patch creates rc.init under the cgroup v2 mount point, moves all currently listed root cgroup.procs entries there (except kernel threads, which cannot be moved), and only then enables subtree controllers.

This is done as part of the normal cgroup v2 setup rather than as a container-specific workaround, because OpenRC cannot reliably distinguish a delegated container cgroup namespace root from the real host root. It also matches the general model of systemd's init.scope: init and early processes should not live in the cgroup that serves as the parent for delegated children.

The fix is applied in both unified and hybrid cgroup modes and only affects the cgroup v2 hierarchy.
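The ordering described above (create rc.init, empty the root, then enable controllers) can be illustrated with the following C sketch. It is a hypothetical helper, not OpenRC's implementation: setup_rc_init() and write_file() are illustrative names, the cgroup v2 mount point is passed in as root, and the optional skip_pid callback stands in for a kernel-thread check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write a string to a file, like `echo "$s" > path` in shell. */
static int write_file(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (f == NULL)
		return -1;
	if (fputs(s, f) == EOF)
		rc = -1;
	/* stdio buffers the data, so cgroupfs errors (EBUSY, EINVAL, ESRCH)
	 * usually surface when fclose() flushes it. */
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/*
 * Hypothetical helper: move the pids listed in <root>/cgroup.procs into
 * <root>/rc.init, then enable every available controller in
 * <root>/cgroup.subtree_control. skip_pid may be NULL; a caller can use
 * it to leave kernel threads in place.
 */
static int setup_rc_init(const char *root, bool (*skip_pid)(long pid))
{
	char path[4096], target[4096], buf[1024], name[64];
	size_t len = 0;
	FILE *in;
	long pid;

	/* 1. Create the rc.init child cgroup (it may already exist). */
	snprintf(target, sizeof(target), "%s/rc.init", root);
	if (mkdir(target, 0755) != 0 && errno != EEXIST)
		return -1;
	snprintf(target, sizeof(target), "%s/rc.init/cgroup.procs", root);

	/* 2. Migrate root processes; cgroup.procs takes one pid per write. */
	snprintf(path, sizeof(path), "%s/cgroup.procs", root);
	in = fopen(path, "r");
	if (in == NULL)
		return -1;
	while (fscanf(in, "%ld", &pid) == 1) {
		if (skip_pid != NULL && skip_pid(pid))
			continue;
		snprintf(buf, sizeof(buf), "%ld\n", pid);
		/* ESRCH: the process exited after it was listed; not fatal. */
		if (write_file(target, buf) != 0 && errno != ESRCH) {
			fclose(in);
			return -1;
		}
	}
	fclose(in);

	/* 3. Only now enable controllers. A delegated root that still had
	 * member processes would reject this write with EBUSY. Build
	 * "+cpu +memory ..." from cgroup.controllers. */
	snprintf(path, sizeof(path), "%s/cgroup.controllers", root);
	in = fopen(path, "r");
	if (in == NULL)
		return -1;
	while (fscanf(in, "%63s", name) == 1) {
		int n = snprintf(buf + len, sizeof(buf) - len, "%s+%s",
				 len > 0 ? " " : "", name);
		if (n < 0 || (size_t)n >= sizeof(buf) - len) {
			fclose(in);
			return -1;
		}
		len += (size_t)n;
	}
	fclose(in);
	if (len == 0)
		return 0; /* no controllers available to enable */

	snprintf(path, sizeof(path), "%s/cgroup.subtree_control", root);
	return write_file(path, buf);
}
```

A single write of several "+controller" tokens to cgroup.subtree_control is all-or-nothing; writing them one at a time instead would let the remaining controllers be enabled if one cannot be. The sketch also does not handle processes that fork into the root while the migration runs; re-reading cgroup.procs until only skipped pids remain would be one way to cover that. Leaving kernel threads behind is harmless on a real host because the true root cgroup is exempt from the no-internal-processes rule.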
Fixes #1013
Testing: