copy-fail-blocker

BPF-LSM mitigation for CVE-2026-31431 ("Copy Fail") and the RxRPC variant of Dirty Frag, and for similar privilege-escalation vulnerabilities that depend on userspace access to a kernel-side in-place crypto path reachable via the AF_ALG or AF_RXRPC socket families.

A small DaemonSet attaches a single BPF-LSM program to the socket_create hook on every node. The program returns -EPERM for any userspace socket(AF_ALG, ...) or socket(AF_RXRPC, ...) call, regardless of process capabilities, namespace, or seccomp profile. Kernel-internal sock_create_kern() callers (e.g. fs/afs, the IPsec stack) are allowed through, so legitimate in-kernel users keep working.

Tested on Talos Linux, which has shipped with CONFIG_BPF_LSM=y and bpf in the default LSM stack since v1.10. It works on any distribution with the same kernel configuration.

Why

Copy Fail (CVE-2026-31431) is a logic flaw in algif_aead that lets an unprivileged local user perform a 4-byte page-cache write to any setuid binary, achieving root with a 732-byte Python script. The exploit needs nothing but AF_ALG + splice(), both of which are reachable from any unprivileged process by default. The mainline fix is a664bf3d603d.

Dirty Frag is a follow-up vulnerability class disclosed in May 2026 by the same research line. It chains two bugs that "dirty" the frag member of sk_buff — xfrm-ESP Page-Cache Write and RxRPC Page-Cache Write. The RxRPC variant performs an in-place pcbc(fcrypt) decrypt on a splice()-pinned page-cache page inside rxkad_verify_packet_1() and reaches root without needing user namespace creation, which makes it the more universally exploitable half of the chain on hardened distributions. The xfrm-ESP fix landed in netdev as f4c50a4034e6 (2026-05-07); distros are still backporting at the time of writing, and RxRPC has no public fix yet — see the upstream write-up for the disclosure timeline.

Both exploits depend on opening a socket in the affected family. Until the kernel fixes land in your distribution, the attack surface can be removed by preventing userspace from ever creating AF_ALG or AF_RXRPC sockets. Compared to alternatives:

Mitigation	Coverage	Reboot?	Persists?
Kernel cmdline `module_blacklist=af_alg,rxrpc` (family handlers, not just `algif_aead`)	host-wide	yes	yes
`/etc/modprobe.d/.conf` with `install af_alg /bin/false` + `install rxrpc /bin/false` and `rmmod` of already-loaded modules (matches the upstream Dirty Frag guidance; note that plain `blacklist` does not stop in-kernel `request_module()` autoload, only `install … /bin/false` does)	host-wide	no	yes (while file is present)
Custom kernel without `CRYPTO_USER_API` / `AF_RXRPC`	host-wide	yes	yes
Per-pod custom seccomp profile	only labelled workloads	no	yes
copy-fail-blocker (this project)	host-wide userspace	no	only while the DaemonSet runs

This project is the no-reboot option. Run it cluster-wide, then plan the permanent kernel fixes on your normal patch cadence.

Note on the ESP variant of Dirty Frag. The xfrm-ESP Page-Cache Write half of Dirty Frag is not closed by this DaemonSet — it is triggered via XFRM netlink + UDP_ENCAP_ESPINUDP, not via a dedicated socket family, and a clean BPF-LSM filter for it would either break legitimate IPsec on the host or require user-namespace-aware logic. Not currently tracked here — contributions welcome. On hardened distributions that block unprivileged user namespaces (e.g. Ubuntu's default AppArmor policy), the ESP variant is unreachable in the first place and the RxRPC block here is sufficient.

How it works

bpf/blocker.c is a short BPF-LSM program:

SEC("lsm/socket_create")
int BPF_PROG(block_socket_family, int family, int type, int protocol,
             int kern, int ret)
{
    if (ret)
        return ret;
    /* kern != 0 means sock_create_kern() — let in-kernel callers through. */
    if (!kern && (family == AF_ALG || family == AF_RXRPC))   /* AF_ALG = 38, AF_RXRPC = 33 */
        return -EPERM;
    return 0;
}
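
The decision the hook makes can be modelled in a few lines of Python, which is handy for reasoning about which calls are denied. This is only a userspace sketch of the logic above, not the BPF program itself; the function name `socket_create_verdict` and the `BLOCKED_FAMILIES` set are illustrative names that do not exist in the repository.

```python
import errno

# Family numbers taken from the comment in bpf/blocker.c.
AF_RXRPC = 33
AF_ALG = 38
BLOCKED_FAMILIES = {AF_ALG, AF_RXRPC}  # hypothetical name, for illustration


def socket_create_verdict(family: int, kern: int, ret: int = 0) -> int:
    """Mirror the hook's return value: 0 allows, a negative errno denies."""
    # An earlier LSM program already produced a verdict: pass it through.
    if ret:
        return ret
    # kern != 0 models sock_create_kern(); in-kernel callers are allowed.
    if not kern and family in BLOCKED_FAMILIES:
        return -errno.EPERM
    return 0


if __name__ == "__main__":
    AF_INET = 2
    cases = [
        ("userspace AF_ALG", AF_ALG, 0),
        ("userspace AF_RXRPC", AF_RXRPC, 0),
        ("in-kernel AF_RXRPC", AF_RXRPC, 1),
        ("userspace AF_INET", AF_INET, 0),
    ]
    for label, family, kern in cases:
        print(label, socket_create_verdict(family, kern))
```

The model makes the two guards explicit: a non-zero `ret` is propagated unchanged, and only userspace callers of the two listed families are rejected.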

The Go loader (main.go, ~40 lines) loads the program and attaches it via bpf(BPF_LINK_CREATE). The link is held for the lifetime of the pod. On SIGTERM, the link is closed and the hook detaches.

Requires a kernel built with CONFIG_BPF_LSM=y and bpf in the active LSM stack (lsm=...,bpf on the kernel command line). Talos Linux ships with both enabled by default since v1.10.
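
One way to check the second requirement on a node is to read the active LSM list that the kernel exposes through securityfs. The sketch below assumes securityfs is mounted at /sys/kernel/security and that the list is comma-separated; the helper names `bpf_lsm_active` and `read_active_lsms` are made up for this example and are not part of the project.

```python
from pathlib import Path


def bpf_lsm_active(lsm_list: str) -> bool:
    """Return True if 'bpf' appears as a whole entry in a comma-separated LSM list."""
    names = [name.strip() for name in lsm_list.split(",")]
    return "bpf" in names


def read_active_lsms(path: str = "/sys/kernel/security/lsm") -> str:
    """Read the active LSM list; assumes securityfs is mounted at the usual place."""
    return Path(path).read_text()


if __name__ == "__main__":
    try:
        active = read_active_lsms()
    except OSError as exc:
        print(f"cannot read the LSM list: {exc}")
    else:
        state = "active" if bpf_lsm_active(active) else "NOT active"
        print(f"bpf LSM is {state} (lsm={active.strip()})")
```

Note that this only covers the LSM stack; whether the kernel was built with CONFIG_BPF_LSM=y has to be checked separately against the kernel configuration.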

Install

kubectl

kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/v0.3.0/manifests/copy-fail-blocker.yaml

For the latest commit on main (may include unreleased changes):

kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/main/manifests/copy-fail-blocker.yaml

Helm

The chart is not published as an OCI artifact (the registry path is shared with the container image). Install from a tagged checkout:

git clone --branch v0.3.0 https://github.com/cozystack/copy-fail-blocker
cd copy-fail-blocker
helm upgrade --install copy-fail-blocker charts/copy-fail-blocker \
  --namespace kube-system

Or via the Makefile shortcuts:

make apply         # helm upgrade --install into kube-system
make diff          # preview changes against the cluster
make delete        # uninstall
make manifest      # regenerate manifests/copy-fail-blocker.yaml

The DaemonSet must run privileged (it loads BPF programs and writes to bpffs). Place it in a namespace with the privileged Pod Security Standard, or in kube-system, which is privileged by default.

Verify

From any pod on a covered node:

python3 -c '
import errno, socket
# Pass each family with a type the family-specific create() actually
# supports (AF_ALG → SOCK_SEQPACKET, AF_RXRPC → SOCK_DGRAM) so that on
# a node WITHOUT this hook the call would either succeed (FAIL: socket
# created) or fail with a non-EPERM errno — both surface as FAIL below.
# With the hook active, security_socket_create() returns -EPERM before
# pf->create() runs, so the type does not matter; we still pass the
# correct one to keep the FAIL diagnostic unambiguous.
for name, family, stype in [("AF_ALG",   38, socket.SOCK_SEQPACKET),
                            ("AF_RXRPC", 33, socket.SOCK_DGRAM)]:
    try:
        socket.socket(family, stype, 0)
        print(f"FAIL: {name} socket created")
    except OSError as e:
        if e.errno == errno.EPERM:
            print(f"OK ({name}): blocked with EPERM")
        else:
            print(f"FAIL: {name} got {e.errno} ({e.strerror}), expected EPERM")'

Expected output:

OK (AF_ALG): blocked with EPERM
OK (AF_RXRPC): blocked with EPERM

Any other errno (e.g. ESOCKTNOSUPPORT 94, EAFNOSUPPORT 97) means the hook is not active on that node — investigate before assuming you are covered.
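
If the check is scripted across many nodes, the interpretation rule above can be captured in a small helper. This is a sketch under the assumption that the caller passes the errno observed for one family, or `None` if the socket was created; `interpret_probe` is a hypothetical name, not something shipped with the project.

```python
import errno
from typing import Optional


def interpret_probe(err: Optional[int]) -> str:
    """Classify the outcome of one socket() probe for a blocked family."""
    if err is None:
        # socket() succeeded, so nothing is blocking the family.
        return "not covered: socket created"
    if err == errno.EPERM:
        return "covered: blocked with EPERM"
    # Any other errno means the call failed for a reason unrelated to the hook.
    name = errno.errorcode.get(err, "unknown")
    return f"not covered: got {name} ({err}), expected EPERM"


if __name__ == "__main__":
    for observed in (errno.EPERM, errno.EAFNOSUPPORT, None):
        print(interpret_probe(observed))
```

Treating every non-EPERM result as "not covered" keeps the check conservative, in line with the advice to investigate before assuming a node is protected.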

Build

make image                                       # docker buildx build + push
make image REGISTRY=ghcr.io/myorg TAG=v0.3.0     # custom registry and tag
make image PUSH=0 LOAD=1                         # build locally without pushing

make image updates charts/copy-fail-blocker/values.yaml with the resolved image digest so the chart always pins by digest.

Build dependencies live in the Containerfile (clang, libbpf-dev, Go). The local host needs only docker buildx, helm, yq (mikefarah), kubectl, and helm-diff.

Configuration

charts/copy-fail-blocker/values.yaml:

Key	Default	Notes
`image.repository`	`ghcr.io/cozystack/copy-fail-blocker`	Auto-updated by `make image`
`image.tag`	`vX.Y.Z@sha256:...`	Pinned by digest, current value in `values.yaml`
`priorityClassName`	`system-node-critical`	Ensures the daemon survives evictions
`tolerations`	`[{operator: Exists}]`	Runs on every node, including tainted
`resources.requests`	`5m CPU / 16Mi memory`	Idle footprint after attach

Limitations

The hook lives only while the pod runs. On pod restart there is a short window (seconds) where userspace AF_ALG and AF_RXRPC are reachable again. For most threat models this is acceptable; if not, consider pinning the BPF link to bpffs (not currently implemented — contributions welcome).
Anyone with CAP_BPF and CAP_SYS_ADMIN on the host can detach the hook. This is not a substitute for cluster-wide privilege restrictions.
Also blocks algif_skcipher / algif_hash / etc. The program rejects the entire AF_ALG family, even though only algif_aead is currently known to be exploitable. If a future CVE needs a finer filter (e.g. hook bind() and inspect salg_type), this is straightforward to add.
No effect on processes that already hold an open AF_ALG or AF_RXRPC socket. Existing sockets keep working until closed.
Does not cover the ESP variant of Dirty Frag. That path is reached through XFRM netlink and UDP_ENCAP_ESPINUDP, not via a dedicated socket family — not currently tracked, contributions welcome. See the note in Why.
Userspace AF_RXRPC clients are blocked. RxRPC is the AFS network protocol. The !kern guard means the in-tree fs/afs (kAFS) module keeps working since it opens its sockets via sock_create_kern(). Userspace AFS tooling (e.g. OpenAFS userspace daemons) that opens an AF_RXRPC socket directly via socket(2) will be denied — do not deploy this DaemonSet on nodes running such tooling.
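
To make the finer-filter idea from the AF_ALG limitation concrete, the sketch below shows which field such a filter would look at. It is a userspace model only: it unpacks a raw `struct sockaddr_alg` (layout from `<linux/if_alg.h>`) and decides based on `salg_type`. The names `bind_verdict` and `make_sockaddr_alg`, and the choice to block only "aead", are assumptions for illustration; the project does not implement this, and a real filter would be written as a BPF-LSM program.

```python
import errno
import struct

AF_ALG = 38

# struct sockaddr_alg: __u16 salg_family; __u8 salg_type[14];
#                      __u32 salg_feat;  __u32 salg_mask; __u8 salg_name[64];
SOCKADDR_ALG = struct.Struct("=H14sII64s")


def salg_type(raw: bytes) -> str:
    """Return the NUL-terminated salg_type string from a raw sockaddr_alg."""
    _family, stype, _feat, _mask, _name = SOCKADDR_ALG.unpack(raw[:SOCKADDR_ALG.size])
    return stype.split(b"\0", 1)[0].decode("ascii", errors="replace")


def bind_verdict(raw: bytes, blocked_types=frozenset({"aead"})) -> int:
    """Model a bind()-level decision: deny only the listed algorithm types."""
    if len(raw) < SOCKADDR_ALG.size:
        return 0  # too short to be a sockaddr_alg; this sketch does not judge it
    return -errno.EPERM if salg_type(raw) in blocked_types else 0


def make_sockaddr_alg(alg_type: str, alg_name: str) -> bytes:
    """Build a raw sockaddr_alg for the examples (hypothetical helper)."""
    return SOCKADDR_ALG.pack(AF_ALG, alg_type.encode(), 0, 0, alg_name.encode())


if __name__ == "__main__":
    for alg_type, alg_name in [("aead", "gcm(aes)"), ("hash", "sha256")]:
        print(alg_type, bind_verdict(make_sockaddr_alg(alg_type, alg_name)))
```

Filtering at bind() would let hash and skcipher users keep working, at the cost of a more complex program than the family-wide check the project uses today.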

License

Apache License 2.0 — see LICENSE.
