RFC: Python parity for the Handler trait (Follow-up B) #43

@dzerik

PR #36 landed the Rust Handler trait and the
Sandbox::run_with_extra_handlers(I: IntoIterator<Item = (S, H)>) shape.
The existing Python SDK (ctypes-based, in python/src/sandlock/) has no
equivalent surface — Python users can spawn a sandbox and set policy,
but cannot register user handlers.

This RFC asks for direction on five design questions before opening a
PR. The Python SDK is currently sync ctypes over libsandlock_ffi.so,
which I treat as the binding constraint (no PyO3 introduction here).

Q1: Async model

How should Python async def handle() semantics be carried across the FFI boundary?

  • A. Sync handler signature (def handle(ctx) -> NotifAction).
    Users that want async wrap themselves with
    asyncio.run_coroutine_threadsafe(...).result().
    Smaller C ABI, supervisor task blocks fully on handler.
  • B. Native async handler via completion-pipe / eventfd bridge.
    C ABI exposes a sandlock_completion_t* that handler signals when
    ready. Idiomatic, but 3-4× more C surface and ctypes needs custom
    completion glue (no PyO3-asyncio equivalent).
  • C. Handler runs in isolated Python subprocess, IPC per
    notification. Full isolation, no GIL contention; but ~ms-per-syscall
    overhead makes high-frequency interception (VFS) impractical.
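
To make option A concrete, here is a minimal sketch of how a user could wrap async logic behind a sync handle(). It assumes a user-owned event loop on a background thread; the ctx dict, the CONTINUE constant, and the decide() coroutine are hypothetical stand-ins, since the real HandlerCtx and NotifAction Python types do not exist yet.

```python
import asyncio
import threading

# Hypothetical stand-in for a NotifAction value; not part of any existing API.
CONTINUE = "continue"

# Event loop owned by the user, running on its own thread (not by sandlock).
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()


async def decide(ctx):
    # Placeholder for real async work, e.g. an async policy lookup.
    await asyncio.sleep(0)
    return CONTINUE


def handle(ctx):
    # Sync entry point that an FFI trampoline would call.
    # It blocks until the coroutine completes, so the supervisor
    # is stalled for exactly that long.
    future = asyncio.run_coroutine_threadsafe(decide(ctx), loop)
    return future.result(timeout=1.0)


# Hypothetical context; a real ctx would come from sandlock.
result = handle({"pid": 1234, "syscall_nr": 257})
```

The timeout is a choice made for this sketch: without it, a stuck coroutine would block the supervisor indefinitely.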

Q2: HandlerCtx FFI surface

How to expose notif/notif_fd/child-memory helpers to Python:

  • A. Fully opaque pointer + getter functions
    (sandlock_ctx_pid(ctx), sandlock_ctx_arg(ctx, idx),
    sandlock_ctx_read_cstr(ctx, addr, buf, cap), ...).
    ABI-safe to extend; per-call FFI overhead.
  • B. repr(C) struct exposed verbatim (notif_id, pid, syscall_nr,
    args[6], notif_fd inline). Direct ctypes Structure mapping.
    Zero-cost field access; freezes layout — kernel seccomp_notif
    changes break Python ABI.
  • C. Hybrid: repr(C) notification snapshot + opaque
    sandlock_mem_handle_t* for child-memory access. Notification
    data direct, memory access wrapped (sandlock controls TOCTOU
    lifetime).
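
For comparison, option B's "direct ctypes Structure mapping" might look roughly like the sketch below. The struct name and the exact field widths are assumptions made for illustration; only the field names come from the list above.

```python
import ctypes


class SandlockNotif(ctypes.Structure):
    # Hypothetical layout mirroring the fields named under option B.
    # Field widths are assumed; the RFC does not fix them.
    _fields_ = [
        ("notif_id", ctypes.c_uint64),
        ("pid", ctypes.c_uint32),
        ("syscall_nr", ctypes.c_int32),
        ("args", ctypes.c_uint64 * 6),
        ("notif_fd", ctypes.c_int),
    ]


# Plain attribute access, with no FFI call per field.
notif = SandlockNotif(notif_id=1, pid=1234, syscall_nr=257, notif_fd=3)
notif.args[1] = 0x7FFF0000
second_arg = notif.args[1]
```

Any field added or reordered on the Rust side would silently shift these offsets, which is the layout freeze that option B trades for zero-cost access.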

Q3: NotifAction FFI surface

Eight Rust variants, some with owned resources (OwnedFd) and a
callback (InjectFdSendTracked.on_success).

  • A. Tagged union (enum kind + union u). Direct memory layout;
    freezes union ABI. Ownership of contained fds and callback
    user-data unclear.
  • B. Opaque builder functions (sandlock_action_continue(),
    sandlock_action_inject_fd_send_tracked(fd, flags, cb, ud, ud_drop)).
    Sandlock owns lifecycle including ud_drop cleanup callback. Heap
    per action.
  • C. Output-parameter setters into a sandlock-pre-allocated
    sandlock_action_out_t* passed to handler.
    No heap allocation; default is "no setter called → Continue";
    layout still partially fixed.
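
As a rough picture of option A, the tagged union could be mirrored in ctypes as below. This is a sketch under assumptions: the numeric kind values, the struct names, and the payload fields (taken loosely from the builder signature in option B) are all hypothetical, and only three of the eight variants are shown.

```python
import ctypes

# Hypothetical discriminants; the RFC does not assign numeric values.
KIND_CONTINUE = 0
KIND_INJECT_FD_SEND = 1
KIND_INJECT_FD_SEND_TRACKED = 2


class InjectFdSend(ctypes.Structure):
    _fields_ = [("fd", ctypes.c_int), ("flags", ctypes.c_uint32)]


class InjectFdSendTracked(ctypes.Structure):
    # Callback and user-data travel as raw pointers; nothing in the
    # layout says who frees them, which is the ownership gap noted above.
    _fields_ = [
        ("fd", ctypes.c_int),
        ("flags", ctypes.c_uint32),
        ("cb", ctypes.c_void_p),
        ("ud", ctypes.c_void_p),
        ("ud_drop", ctypes.c_void_p),
    ]


class ActionPayload(ctypes.Union):
    _fields_ = [("inject", InjectFdSend), ("tracked", InjectFdSendTracked)]


class SandlockAction(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_uint32), ("u", ActionPayload)]


def inject_fd(fd, flags=0):
    # Build an InjectFdSend action by filling the matching union arm.
    action = SandlockAction(kind=KIND_INJECT_FD_SEND)
    action.u.inject.fd = fd
    action.u.inject.flags = flags
    return action


action = inject_fd(7)
```

The union is as large as its largest arm, so adding a bigger variant later changes sizeof(SandlockAction) for every caller.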

Q4: Handler ownership / lifetime through FFI

When a Python handler is registered, sandlock holds an
Arc<dyn Handler> for the duration of the sandbox. Through FFI, that
means a PyObject* lives across thread/runtime boundaries.

  • A. Raw PyObject* + caller-provided Py_IncRef/Py_DecRef
    callbacks. Compact API; couples sandlock_ffi to Python ABI;
    GIL-acquired-before-callback contract easy to violate.
  • B. Opaque sandlock_handler_t* allocated by
    sandlock_handler_new(handle_fn, ud, ud_drop). Sandlock owns
    lifecycle; ud_drop is arbitrary cleanup (Py_DecRef one option).
    Per-handler heap.
  • C. Static-dict approach: handler registered by integer ID;
    Python keeps dict[int, Handler] and dispatches via trampoline.
    Minimum FFI surface; global mutable state, doesn't scale to multiple
    sandboxes per process.
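
Option C is the easiest to prototype in pure ctypes, so a minimal sketch may help weigh it. The callback signature, the integer action codes, and the register() helper are assumptions for illustration; returning Kill for an unknown ID is an arbitrary choice here, not a position on Q5.

```python
import ctypes
import itertools

# Hypothetical C callback shape:
#   int32_t trampoline(uint64_t handler_id, uint64_t notif_id)
TRAMPOLINE = ctypes.CFUNCTYPE(ctypes.c_int32, ctypes.c_uint64, ctypes.c_uint64)

# Hypothetical action codes.
ACTION_CONTINUE = 0
ACTION_KILL = 1

# Process-global registry: the "global mutable state" cost of option C.
_handlers = {}
_next_id = itertools.count(1)


def register(handler):
    # Store the Python callable and hand only an integer across the FFI.
    handler_id = next(_next_id)
    _handlers[handler_id] = handler
    return handler_id


@TRAMPOLINE
def trampoline(handler_id, notif_id):
    # Single C-callable entry point that dispatches by ID.
    handler = _handlers.get(handler_id)
    if handler is None:
        return ACTION_KILL  # arbitrary choice for this sketch
    return handler(notif_id)


seen = []


def audit(notif_id):
    seen.append(notif_id)
    return ACTION_CONTINUE


hid = register(audit)
rc = trampoline(hid, 42)  # called from Python here; sandlock would call it from C
```

The module-level trampoline object has to stay referenced for as long as C code may call it; keeping it as a global, as above, is the simplest way to guarantee that.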

Q5: Error propagation — Python exception → NotifAction

If a handler raises (or the Python interpreter halts mid-dispatch):

  • A. Fail-open (return Continue). Simple; handler bug becomes
    silent security hole for VFS-style enforcement.
  • B. Fail-closed (return Kill). Defensive; aborts the entire
    sandbox session on the first buggy notification.
  • C. Configurable per-handler — registration takes
    on_exception: NotifAction. Audit and VFS handlers pick different
    policies. Larger registration surface.
  • D. Sandbox-level default + per-handler override. Set once,
    overridable; biggest API but most flexible.
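
On the Python side, option C could reduce to a small wrapper around the user's handler. The sketch below assumes a hypothetical NotifAction enum with only two of the eight variants and a hypothetical guarded() helper; it is meant to show the shape of the policy, not a proposed API.

```python
from enum import Enum


class NotifAction(Enum):
    # Hypothetical Python mirror; only two variants shown.
    CONTINUE = "continue"
    KILL = "kill"


def guarded(handler, on_exception):
    # Wrap a handler so that any exception maps to a fixed action.
    def dispatch(ctx):
        try:
            return handler(ctx)
        except BaseException:
            # Nothing can propagate across the C boundary, so even
            # KeyboardInterrupt is converted to the configured action.
            return on_exception

    return dispatch


def buggy(ctx):
    raise ValueError("handler bug")


# Audit handlers might prefer fail-open, VFS enforcement fail-closed.
audit = guarded(buggy, on_exception=NotifAction.CONTINUE)
vfs = guarded(buggy, on_exception=NotifAction.KILL)

audit_result = audit(None)
vfs_result = vfs(None)
```

Option D would add one more lookup: use the per-handler value if given, else the sandbox-level default.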

Cross-cutting decisions (need a position regardless of A/B/C choice)

  • OwnedFd ownership rules across FFI. After a Python handler
    returns an InjectFdSend{fd} action, who closes the fd on the failure
    path? Proposed contract: "sandlock takes ownership; user must not
    close after returning".
  • GIL contention. Handler runs sync inside the supervisor task,
    holding the GIL for the duration. Many concurrent notifications →
    supervisor stalls. Mitigations (dedicated thread, subinterpreters)
    are out of scope for v1; document as known limitation?
  • Python interpreter halt during dispatch. Py_FinalizeEx running
    while sandbox alive → trampoline cannot safely call Python. Proposed:
    trampoline checks Py_IsInitialized() and falls back to the
    configured exception action (Q5).
  • Segfaults inside Python handler. Native crash leaves supervisor
    task hung, child trapped indefinitely. Proposed: not recoverable;
    document as user responsibility.
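
The proposed fd contract has one practical consequence for Python handlers: a descriptor owned by a Python file object must be duplicated before it is handed over, otherwise both Python and sandlock would close it. A minimal sketch, assuming a hypothetical fd_for_injection() helper and using os.close() as a stand-in for sandlock taking ownership:

```python
import os
import tempfile


def fd_for_injection(path):
    # Return a descriptor that is independent of any Python file object.
    with open(path, "rb") as f:
        # dup() so that closing f on exit does not invalidate the fd
        # that would be returned inside an InjectFdSend action.
        return os.dup(f.fileno())


with tempfile.NamedTemporaryFile() as tmp:
    fd = fd_for_injection(tmp.name)
    # The duplicate is still valid although the file object is closed.
    size = os.fstat(fd).st_size
    # Stand-in for sandlock closing the fd; under the proposed
    # contract a real handler would NOT close it after returning.
    os.close(fd)
```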

Out of scope for this RFC

  • CPython 3.12+ subinterpreters per sandbox.
  • PyO3 / cffi alternatives (existing SDK is ctypes).
  • Cross-process handler sharing.
  • FFI / Python parity for Sandbox::run / dry_run / checkpoint is a
    separate scope; this RFC is handler-focused.

Phasing proposal

If a preferred direction emerges, suggested split:

  1. C ABI surface only (Q1-Q3 chosen) — new sandlock_ffi symbols, no
    Python wrapper yet. CI builds, no runtime test.
  2. Python wrapper layer — minimal Handler base class + registration
    into existing Sandbox.run_* Python entry points. Smoke test:
    audit-only handler counting SYS_openat calls.
  3. Ergonomic layer — error mapping (Q5), context helpers
    (ctx.read_path()), test fixtures, docs page.

Happy to split into 3 PRs if that's the preferred review unit.
