cozystack/copy-fail-blocker

copy-fail-blocker

BPF-LSM mitigation for CVE-2026-31431 ("Copy Fail") and the RxRPC variant of Dirty Frag, and for similar privilege-escalation vulnerabilities that depend on userspace access to a kernel-side in-place crypto path reachable via the AF_ALG or AF_RXRPC socket families.

A small DaemonSet attaches a single BPF-LSM program to the socket_create hook on every node. The program returns -EPERM for any userspace socket(AF_ALG, ...) or socket(AF_RXRPC, ...) call, regardless of process capabilities, namespace, or seccomp profile. Kernel-internal sock_create_kern() callers (e.g. fs/afs, the IPsec stack) are allowed through, so legitimate in-kernel users keep working.

Tested on Talos Linux, which has shipped with CONFIG_BPF_LSM=y and bpf in the default LSM stack since v1.10. It works on any distribution with the same kernel configuration.

Why

Copy Fail (CVE-2026-31431) is a logic flaw in algif_aead that lets an unprivileged local user perform a 4-byte page-cache write to any setuid binary, achieving root with a 732-byte Python script. The exploit needs nothing but AF_ALG + splice(), both of which are reachable from any unprivileged process by default. The mainline fix is a664bf3d603d.

Dirty Frag is a follow-up vulnerability class disclosed in May 2026 by the same research line. It chains two bugs that "dirty" the frag member of sk_buff: xfrm-ESP Page-Cache Write and RxRPC Page-Cache Write. The RxRPC variant performs an in-place pcbc(fcrypt) decrypt on a splice()-pinned page-cache page inside rxkad_verify_packet_1() and reaches root without needing user namespace creation, which makes it the more universally exploitable half of the chain on hardened distributions. The xfrm-ESP fix landed in netdev as f4c50a4034e6 (2026-05-07); distros are still backporting at the time of writing, and RxRPC has no public fix yet (see the upstream write-up for the disclosure timeline).

Both exploits depend on opening a socket in the affected family. Until the kernel fixes land in your distribution, the attack surface can be removed by preventing userspace from ever creating AF_ALG or AF_RXRPC sockets. Compared to alternatives:

| Mitigation | Coverage | Reboot? | Persists? |
| --- | --- | --- | --- |
| Kernel cmdline module_blacklist=af_alg,rxrpc (family handlers, not just algif_aead) | host-wide | yes | yes |
| /etc/modprobe.d/*.conf with install af_alg /bin/false + install rxrpc /bin/false and rmmod of already-loaded modules (matches the upstream Dirty Frag guidance; note that plain blacklist does not stop in-kernel request_module() autoload, only install … /bin/false does) | host-wide | no | yes (while file is present) |
| Custom kernel without CRYPTO_USER_API / AF_RXRPC | host-wide | yes | yes |
| Per-pod custom seccomp profile | only labelled workloads | no | yes |
| copy-fail-blocker (this project) | host-wide userspace | no | while DS runs |

This project is the no-reboot option. Run it cluster-wide, then plan the permanent kernel fixes on your normal patch cadence.
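For the modprobe.d row above, it helps to know which of the relevant modules are already loaded, since those still need an rmmod. The following is a minimal sketch, not part of this project: the function names are hypothetical, and it assumes the usual /proc/modules layout in which the module name is the first whitespace-separated field. Modules compiled into the kernel do not appear in /proc/modules, so an empty result does not prove the families are unavailable.

```python
# Hypothetical helper, not shipped with copy-fail-blocker.
# Module names of interest: the AF_ALG family handler, the AEAD
# front end, and the RxRPC family handler.
RELEVANT = ("af_alg", "algif_aead", "rxrpc")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the module name.
            names.add(fields[0])
    return names


def relevant_loaded(proc_modules_text, relevant=RELEVANT):
    """Return the relevant modules that are currently loaded, in RELEVANT order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in relevant if name in loaded]
```

On a host, the input would come from reading /proc/modules, e.g. relevant_loaded(open("/proc/modules").read()).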

Note on the ESP variant of Dirty Frag. The xfrm-ESP Page-Cache Write half of Dirty Frag is not closed by this DaemonSet — it is triggered via XFRM netlink + UDP_ENCAP_ESPINUDP, not via a dedicated socket family, and a clean BPF-LSM filter for it would either break legitimate IPsec on the host or require user-namespace-aware logic. Not currently tracked here — contributions welcome. On hardened distributions that block unprivileged user namespaces (e.g. Ubuntu's default AppArmor policy), the ESP variant is unreachable in the first place and the RxRPC block here is sufficient.

How it works

bpf/blocker.c is a short BPF-LSM program:

SEC("lsm/socket_create")
int BPF_PROG(block_socket_family, int family, int type, int protocol,
             int kern, int ret)
{
    if (ret)
        return ret;
    /* kern != 0 means sock_create_kern() — let in-kernel callers through. */
    if (!kern && (family == AF_ALG || family == AF_RXRPC))   // 38, 33
        return -EPERM;
    return 0;
}
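The decision logic of the hook is small enough to model as a truth table. The sketch below mirrors the C program in plain Python for illustration only. It is not how the project is tested, and socket_create_verdict is a hypothetical name. The family numbers 38 and 33 are taken from the comment in the C source.

```python
import errno

# Address family numbers as noted in the C source comment.
AF_RXRPC = 33
AF_ALG = 38
BLOCKED_FAMILIES = {AF_ALG, AF_RXRPC}


def socket_create_verdict(family, kern, ret=0):
    """Model the return value of the lsm/socket_create program.

    family: address family passed to socket()
    kern:   non-zero when the caller is sock_create_kern()
    ret:    verdict of any previously run program on the same hook
    """
    if ret:
        # An earlier denial is passed through unchanged.
        return ret
    if not kern and family in BLOCKED_FAMILIES:
        # Userspace request for a blocked family.
        return -errno.EPERM
    return 0
```

For example, socket_create_verdict(AF_ALG, kern=0) yields -errno.EPERM, while the same family with kern=1 yields 0, which is what lets in-kernel callers through.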

The Go loader (main.go, ~40 lines) loads the program and attaches it via bpf(BPF_LINK_CREATE). The link is held for the lifetime of the pod. On SIGTERM, the link is closed and the hook detaches.

Requires a kernel built with CONFIG_BPF_LSM=y and bpf in the active LSM stack (lsm=...,bpf on the kernel command line). Talos Linux ships with both enabled by default since v1.10.
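One way to check the second requirement on a given node is to read the active LSM list that the kernel exposes through securityfs. This is a minimal sketch with hypothetical function names. It assumes securityfs is mounted at /sys/kernel/security and that the file holds a comma-separated list of LSM names. Since an LSM can only be listed there if it was compiled in, finding bpf in the list should also imply CONFIG_BPF_LSM=y.

```python
def bpf_lsm_active(lsm_text):
    """Return True if 'bpf' is one of the comma-separated LSM names."""
    names = [name.strip() for name in lsm_text.split(",")]
    return "bpf" in names


def read_active_lsms(path="/sys/kernel/security/lsm"):
    """Read the active LSM list; the default path assumes securityfs is mounted."""
    with open(path) as f:
        return f.read()
```

On a node, bpf_lsm_active(read_active_lsms()) returning False suggests the lsm= kernel command line needs bpf appended before the DaemonSet can attach its hook.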

Install

kubectl

kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/v0.3.0/manifests/copy-fail-blocker.yaml

For the latest commit on main (may include unreleased changes):

kubectl apply -f https://raw.githubusercontent.com/cozystack/copy-fail-blocker/main/manifests/copy-fail-blocker.yaml

Helm

The chart is not published as an OCI artifact (the registry path is shared with the container image). Install from a tagged checkout:

git clone --branch v0.3.0 https://github.com/cozystack/copy-fail-blocker
cd copy-fail-blocker
helm upgrade --install copy-fail-blocker charts/copy-fail-blocker \
  --namespace kube-system

Or via the Makefile shortcuts:

make apply         # helm upgrade --install into kube-system
make diff          # preview changes against the cluster
make delete        # uninstall
make manifest      # regenerate manifests/copy-fail-blocker.yaml

The DaemonSet must run privileged (it loads BPF programs and writes to bpffs). Place it in a namespace with the privileged Pod Security Standard, or in kube-system, which is privileged by default.

Verify

From any pod on a covered node:

python3 -c '
import errno, socket
# Pass each family with a type the family-specific create() actually
# supports (AF_ALG → SOCK_SEQPACKET, AF_RXRPC → SOCK_DGRAM) so that on
# a node WITHOUT this hook the call would either succeed (FAIL: socket
# created) or fail with a non-EPERM errno — both surface as FAIL below.
# With the hook active, security_socket_create() returns -EPERM before
# pf->create() runs, so the type does not matter; we still pass the
# correct one to keep the FAIL diagnostic unambiguous.
for name, family, stype in [("AF_ALG",   38, socket.SOCK_SEQPACKET),
                            ("AF_RXRPC", 33, socket.SOCK_DGRAM)]:
    try:
        socket.socket(family, stype, 0)
        print(f"FAIL: {name} socket created")
    except OSError as e:
        if e.errno == errno.EPERM:
            print(f"OK ({name}): blocked with EPERM")
        else:
            print(f"FAIL: {name} got {e.errno} ({e.strerror}), expected EPERM")'

Expected output:

OK (AF_ALG): blocked with EPERM
OK (AF_RXRPC): blocked with EPERM

Any other errno (e.g. ESOCKTNOSUPPORT 94, EAFNOSUPPORT 97) means the hook is not active on that node — investigate before assuming you are covered.
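When the probe is run on many nodes, its output can be checked mechanically instead of by eye. The sketch below is an assumption about how one might do that, not something the project provides. node_is_covered is a hypothetical name, and it treats a node as covered only when the output consists of exactly the two expected OK lines.

```python
# The two lines the probe prints when both families are blocked.
EXPECTED = {
    "OK (AF_ALG): blocked with EPERM",
    "OK (AF_RXRPC): blocked with EPERM",
}


def node_is_covered(probe_output):
    """Return True only if the probe output is exactly the two OK lines."""
    lines = {line.strip() for line in probe_output.splitlines() if line.strip()}
    # Any FAIL line, missing line, or unexpected text makes the sets differ.
    return lines == EXPECTED
```

Treating anything other than an exact match as uncovered is deliberately strict: a partial result, such as one family blocked and the other not, is reported the same way as no hook at all.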

Build

make image                                       # docker buildx build + push
make image REGISTRY=ghcr.io/myorg TAG=v0.3.0     # custom tag
make image PUSH=0 LOAD=1                         # build locally without pushing

make image updates charts/copy-fail-blocker/values.yaml with the resolved image digest so the chart always pins by digest.

Build dependencies live in the Containerfile (clang, libbpf-dev, Go). Local host needs only docker buildx, helm, yq (mikefarah), kubectl, and helm-diff.

Configuration

charts/copy-fail-blocker/values.yaml:

| Key | Default | Notes |
| --- | --- | --- |
| image.repository | ghcr.io/cozystack/copy-fail-blocker | Auto-updated by make image |
| image.tag | vX.Y.Z@sha256:... | Pinned by digest, current value in values.yaml |
| priorityClassName | system-node-critical | Ensures the daemon survives evictions |
| tolerations | [{operator: Exists}] | Runs on every node, including tainted |
| resources.requests | 5m CPU / 16Mi memory | Idle footprint after attach |

Limitations

  • The hook lives only while the pod runs. On pod restart there is a short window (seconds) where userspace AF_ALG and AF_RXRPC are reachable again. For most threat models this is acceptable; if not, consider pinning the BPF link to bpffs (not currently implemented — contributions welcome).
  • Anyone with CAP_BPF and CAP_SYS_ADMIN on the host can detach the hook. This is not a substitute for cluster-wide privilege restrictions.
  • Does not distinguish between AF_ALG algorithm types. The program rejects the entire AF_ALG family, so algif_skcipher, algif_hash, etc. are blocked along with algif_aead, even though only algif_aead is currently known to be exploitable. If a finer filter is needed (e.g. hook bind() and inspect salg_type), this is straightforward to add.
  • No effect on processes that already hold an open AF_ALG or AF_RXRPC socket. Existing sockets keep working until closed.
  • Does not cover the ESP variant of Dirty Frag. That path is reached through XFRM netlink and UDP_ENCAP_ESPINUDP, not via a dedicated socket family — not currently tracked, contributions welcome. See the note in Why.
  • Userspace AF_RXRPC clients are blocked. RxRPC is the AFS network protocol. The !kern guard means the in-tree fs/afs (kAFS) module keeps working since it opens its sockets via sock_create_kern(). Userspace AFS tooling (e.g. OpenAFS userspace daemons) that opens an AF_RXRPC socket directly via socket(2) will be denied — do not deploy this DaemonSet on nodes running such tooling.

License

Apache License 2.0 — see LICENSE.
