BPF-LSM mitigation for CVE-2026-31431 ("Copy Fail"), the RxRPC variant of
Dirty Frag, and similar privilege-escalation vulnerabilities that depend on
userspace access to a kernel-side in-place crypto path reachable via the
AF_ALG or AF_RXRPC socket families.
A small DaemonSet attaches a single BPF-LSM program to the socket_create
hook on every node. The program returns -EPERM for any userspace
socket(AF_ALG, ...) or socket(AF_RXRPC, ...) call, regardless of
process capabilities, namespace, or seccomp profile. Kernel-internal
sock_create_kern() callers (e.g. fs/afs, the IPsec stack) are
allowed through, so legitimate in-kernel users keep working.
Tested on Talos Linux (which ships with CONFIG_BPF_LSM=y and bpf in the
default LSM stack since v1.10); it works on any distribution with the same
kernel configuration.
Copy Fail (CVE-2026-31431) is a logic flaw in algif_aead that lets an
unprivileged local user perform a 4-byte page-cache write to any setuid
binary, achieving root with a 732-byte Python script. The exploit needs
nothing but AF_ALG + splice(), both of which are reachable from any
unprivileged process by default. The mainline fix is a664bf3d603d.
Dirty Frag is a follow-up vulnerability class disclosed in May 2026 by
the same research line. It chains two bugs that "dirty" the frag member
of sk_buff — xfrm-ESP Page-Cache Write and RxRPC Page-Cache Write. The
RxRPC variant performs an in-place pcbc(fcrypt) decrypt on a splice()-pinned
page-cache page inside rxkad_verify_packet_1() and reaches root without
needing user namespace creation, which makes it the more universally
exploitable half of the chain on hardened distributions. The xfrm-ESP fix
landed in netdev as f4c50a4034e6 (2026-05-07); distros are still
backporting at the time of writing, and RxRPC has no public fix yet — see
the upstream write-up
for the disclosure timeline.
Both exploits depend on opening a socket in the affected family. Until the
kernel fixes land in your distribution, the attack surface can be removed
by preventing userspace from ever creating AF_ALG or AF_RXRPC sockets.
Compared to alternatives:
| Mitigation | Coverage | Reboot? | Persists? |
|---|---|---|---|
| Kernel cmdline module_blacklist=af_alg,rxrpc (family handlers, not just algif_aead) | host-wide | yes | yes |
| /etc/modprobe.d/*.conf with install af_alg /bin/false + install rxrpc /bin/false and rmmod of already-loaded modules (matches the upstream Dirty Frag guidance; note that plain blacklist does not stop in-kernel request_module() autoload, only install … /bin/false does) | host-wide | no | yes (while file is present) |
| Custom kernel without CRYPTO_USER_API / AF_RXRPC | host-wide | yes | yes |
| Per-pod custom seccomp profile | only labelled workloads | no | yes |
| copy-fail-blocker (this project) | host-wide userspace | no | while DS runs |
This project is the no-reboot option. Run it cluster-wide, then plan the permanent kernel fixes on your normal patch cadence.
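If you go the modprobe route from the table, it helps to know which of the affected modules are already loaded, since those are the ones that need an rmmod. The following is a minimal sketch, assuming the module names af_alg, algif_aead and rxrpc and the usual /proc/modules layout (module name in the first column). The helper name loaded_watched_modules is made up for this example and is not part of the project.

```python
from pathlib import Path

# Assumed module names: the two family handlers from the table above, plus
# algif_aead, the handler Copy Fail goes through.
WATCHED = {"af_alg", "algif_aead", "rxrpc"}


def loaded_watched_modules(proc_modules_text: str) -> set:
    """Return the watched modules that appear in /proc/modules-style text."""
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])  # first column is the module name
    return loaded & WATCHED


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    found = loaded_watched_modules(text)
    print("loaded:", ", ".join(sorted(found)) or "none")
```

Code that is built into the kernel (=y rather than =m) never shows up in /proc/modules, so an empty result does not prove that the socket families are unavailable.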
Note on the ESP variant of Dirty Frag. The xfrm-ESP Page-Cache Write half of Dirty Frag is not closed by this DaemonSet: it is triggered via XFRM netlink + UDP_ENCAP_ESPINUDP, not via a dedicated socket family, and a clean BPF-LSM filter for it would either break legitimate IPsec on the host or require user-namespace-aware logic. It is not currently tracked here; contributions are welcome. On hardened distributions that block unprivileged user namespaces (e.g. Ubuntu's default AppArmor policy), the ESP variant is unreachable in the first place and the RxRPC block here is sufficient.
bpf/blocker.c is a short BPF-LSM program:
```c
SEC("lsm/socket_create")
int BPF_PROG(block_socket_family, int family, int type, int protocol,
             int kern, int ret)
{
    if (ret)
        return ret;

    /* kern != 0 means sock_create_kern(): let in-kernel callers through. */
    if (!kern && (family == AF_ALG || family == AF_RXRPC)) // 38, 33
        return -EPERM;

    return 0;
}
```

The Go loader (main.go, ~40 lines) loads the program and attaches it via
bpf(BPF_LINK_CREATE). The link is held for the lifetime of the pod. On
SIGTERM, the link is closed and the hook detaches.
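The decision the hook makes can be modelled in a few lines of plain Python. This is not BPF code and not part of the project; it only mirrors the checks in bpf/blocker.c so that the allow and deny cases are easy to enumerate. The function and constant names are illustrative.

```python
import errno

AF_INET = 2    # Linux address-family numbers
AF_RXRPC = 33
AF_ALG = 38
BLOCKED_FAMILIES = {AF_ALG, AF_RXRPC}


def socket_create_verdict(family: int, kern: int, ret: int = 0) -> int:
    """Model of the hook's return value; 0 means the call is allowed."""
    if ret:  # a non-zero ret is passed through unchanged
        return ret
    if not kern and family in BLOCKED_FAMILIES:
        return -errno.EPERM  # userspace caller, blocked family
    return 0


if __name__ == "__main__":
    cases = [
        ("userspace AF_ALG", AF_ALG, 0),
        ("userspace AF_RXRPC", AF_RXRPC, 0),
        ("in-kernel AF_RXRPC (e.g. fs/afs)", AF_RXRPC, 1),
        ("userspace AF_INET", AF_INET, 0),
    ]
    for label, family, kern in cases:
        print(f"{label}: {socket_create_verdict(family, kern)}")
```

Only the first two cases are denied; in-kernel callers and every other socket family pass through untouched.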
Requires a kernel built with CONFIG_BPF_LSM=y and bpf in the active LSM
stack (lsm=...,bpf on the kernel command line). Talos Linux ships with
both enabled by default since v1.10.
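One way to check the second requirement on a node is to read the active LSM list from securityfs. The following is a minimal sketch, assuming securityfs is mounted at /sys/kernel/security (the usual location) and that the script can see the host's copy of it. bpf_in_lsm_stack is a hypothetical helper name.

```python
from pathlib import Path

# Assumed path: the comma-separated list of active LSMs exposed by securityfs.
LSM_LIST = Path("/sys/kernel/security/lsm")


def bpf_in_lsm_stack(lsm_list: str) -> bool:
    """Return True if 'bpf' is one of the comma-separated LSM names."""
    return "bpf" in [name.strip() for name in lsm_list.split(",")]


if __name__ == "__main__":
    try:
        active = LSM_LIST.read_text()
    except OSError as exc:
        print(f"cannot read {LSM_LIST}: {exc}")
    else:
        state = "active" if bpf_in_lsm_stack(active) else "NOT active"
        print(f"bpf LSM is {state} ({active.strip()})")
```

If bpf is missing from the list, the requirement above is not met on that node.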
```bash
kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/v0.3.0/manifests/copy-fail-blocker.yaml
```

For the latest commit on main (may include unreleased changes):

```bash
kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/main/manifests/copy-fail-blocker.yaml
```

The chart is not published as an OCI artifact (the registry path is shared with the container image). Install from a tagged checkout:
```bash
git clone --branch v0.3.0 https://github.com/cozystack/copy-fail-blocker
cd copy-fail-blocker
helm upgrade --install copy-fail-blocker charts/copy-fail-blocker \
  --namespace kube-system
```

Or via the Makefile shortcuts:
```bash
make apply     # helm upgrade --install into kube-system
make diff      # preview changes against the cluster
make delete    # uninstall
make manifest  # regenerate manifests/copy-fail-blocker.yaml
```

The DaemonSet must run privileged (it loads BPF programs and writes to
bpffs). Place it in a namespace with the privileged Pod Security Standard,
or in kube-system, which is privileged by default.
From any pod on a covered node:
```bash
python3 -c '
import errno, socket
# Pass each family with a type the family-specific create() actually
# supports (AF_ALG → SOCK_SEQPACKET, AF_RXRPC → SOCK_DGRAM) so that on
# a node WITHOUT this hook the call would either succeed (FAIL: socket
# created) or fail with a non-EPERM errno; both surface as FAIL below.
# With the hook active, security_socket_create() returns -EPERM before
# pf->create() runs, so the type does not matter; we still pass the
# correct one to keep the FAIL diagnostic unambiguous.
for name, family, stype in [("AF_ALG", 38, socket.SOCK_SEQPACKET),
                            ("AF_RXRPC", 33, socket.SOCK_DGRAM)]:
    try:
        socket.socket(family, stype, 0)
        print(f"FAIL: {name} socket created")
    except OSError as e:
        if e.errno == errno.EPERM:
            print(f"OK ({name}): blocked with EPERM")
        else:
            print(f"FAIL: {name} got {e.errno} ({e.strerror}), expected EPERM")'
```

Expected output:

```
OK (AF_ALG): blocked with EPERM
OK (AF_RXRPC): blocked with EPERM
```

Any other errno (e.g. ESOCKTNOSUPPORT 94, EAFNOSUPPORT 97) means the
hook is not active on that node — investigate before assuming you are
covered.
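When running the check across many nodes, it can be convenient to reduce each result to one of three verdicts. The sketch below is not part of the project: the verdict labels and the probe and classify helpers are invented for this example, and it simply follows the rule stated above that only EPERM counts as covered.

```python
import errno
import socket
from typing import Optional

AF_RXRPC, AF_ALG = 33, 38  # Linux address-family numbers


def probe(family: int, stype: int) -> Optional[int]:
    """Try to create a socket; return None on success, else the errno."""
    try:
        socket.socket(family, stype, 0).close()
    except OSError as exc:
        return exc.errno
    return None


def classify(err: Optional[int]) -> str:
    """Reduce one probe result to a verdict (labels are illustrative)."""
    if err is None:
        return "exposed"        # the socket was created
    if err == errno.EPERM:
        return "blocked"        # the errno the hook returns
    return "hook-inactive"      # e.g. ESOCKTNOSUPPORT or EAFNOSUPPORT


if __name__ == "__main__":
    for name, family, stype in [("AF_ALG", AF_ALG, socket.SOCK_SEQPACKET),
                                ("AF_RXRPC", AF_RXRPC, socket.SOCK_DGRAM)]:
        print(f"{name}: {classify(probe(family, stype))}")
```

A seccomp profile that returns EPERM for these families would also be reported as blocked, so this check cannot tell the two mechanisms apart.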
```bash
make image                                      # docker buildx build + push
make image REGISTRY=ghcr.io/myorg TAG=v0.3.0    # custom tag
make image PUSH=0 LOAD=1                        # build locally without pushing
```

make image updates charts/copy-fail-blocker/values.yaml with the
resolved image digest so the chart always pins by digest.
Build dependencies live in the Containerfile (clang, libbpf-dev, Go). The
local host needs only docker buildx, helm, yq (mikefarah), kubectl, and
helm-diff.
charts/copy-fail-blocker/values.yaml:
| Key | Default | Notes |
|---|---|---|
| image.repository | ghcr.io/cozystack/copy-fail-blocker | Auto-updated by make image |
| image.tag | vX.Y.Z@sha256:... | Pinned by digest, current value in values.yaml |
| priorityClassName | system-node-critical | Ensures the daemon survives evictions |
| tolerations | [{operator: Exists}] | Runs on every node, including tainted |
| resources.requests | 5m CPU / 16Mi memory | Idle footprint after attach |
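
To confirm that an image.tag value really is pinned by digest, a small check along these lines could be used in CI. It assumes the tag@sha256:<64 hex digits> shape shown in the table. The function name and the regular expression are illustrative and not taken from the project.

```python
import re

# Assumed shape: <tag>@sha256:<64 lowercase hex digits>
PINNED_TAG = re.compile(r"^(?P<tag>[^@\s]+)@sha256:(?P<digest>[0-9a-f]{64})$")


def split_pinned_tag(value: str) -> tuple:
    """Split 'tag@sha256:digest' into (tag, digest); raise if not pinned."""
    match = PINNED_TAG.match(value)
    if match is None:
        raise ValueError(f"not pinned by digest: {value!r}")
    return match.group("tag"), match.group("digest")


if __name__ == "__main__":
    example = "v0.3.0@sha256:" + "0" * 64  # placeholder digest, not a real one
    print(split_pinned_tag(example))
```

A bare tag such as v0.3.0 raises ValueError, which makes an accidental unpinned value easy to catch.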
- The hook lives only while the pod runs. On pod restart there is a short window (seconds) where userspace AF_ALG and AF_RXRPC are reachable again. For most threat models this is acceptable; if not, consider pinning the BPF link to bpffs (not currently implemented; contributions welcome).
- Anyone with CAP_BPF and CAP_SYS_ADMIN on the host can detach the hook. This is not a substitute for cluster-wide privilege restrictions.
- No per-algorithm filtering: algif_skcipher / algif_hash / etc. are blocked too. The program rejects the entire AF_ALG family, although only algif_aead is currently known to be exploitable. If a future CVE needs a finer filter (e.g. hook bind() and inspect salg_type), this is straightforward to add.
- No effect on processes that already hold an open AF_ALG or AF_RXRPC socket. Existing sockets keep working until closed.
- Does not cover the ESP variant of Dirty Frag. That path is reached through XFRM netlink and UDP_ENCAP_ESPINUDP, not via a dedicated socket family. It is not currently tracked; contributions welcome. See the note in Why.
- Userspace AF_RXRPC clients are blocked. RxRPC is the AFS network protocol. The !kern guard means the in-tree fs/afs (kAFS) module keeps working, since it opens its sockets via sock_create_kern(). Userspace AFS tooling (e.g. OpenAFS userspace daemons) that opens an AF_RXRPC socket directly via socket(2) will be denied, so do not deploy this DaemonSet on nodes running such tooling.
Apache License 2.0 — see LICENSE.