sandlock_spawn fails with ENOSYS (clone3) when called from a multi-threaded Python process (uvicorn/asyncio + Kubernetes RuntimeDefault seccomp) #47

@mrsimpson

Summary

Calling Sandbox(policy).run(...) from a uvicorn server process returns exit_code=-1, error="sandlock_spawn failed" every time. The identical call succeeds from a fresh single-threaded Python process in the same container.

Context

I was setting up sandlock as the execution backend for an MCP tool server — following the recommendation in lobehub/lobehub#12472 to use sandlock as a self-hosted alternative to LobeHub's cloud sandbox. Because LobeHub requires Streamable HTTP MCP transport (not SSE), I wrote a thin FastMCP wrapper around Sandbox.run().

The server runs as a sidecar container in a Kubernetes k3s pod.

Environment

  • Python 3.12, sandlock 0.7.0 (pip)
  • uvicorn + FastMCP (Streamable HTTP transport)
  • Kubernetes k3s, kernel 6.18.18, Landlock ABI v7
  • Pod seccomp: RuntimeDefault (Kubernetes PSS restricted)
  • Container: UID 1000, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities: drop ALL

Reproduction

Any FastMCP/uvicorn server that calls Sandbox(policy).run() from its request handler:

import asyncio
import pathlib
from sandlock import Sandbox, Policy

# `mcp` is the FastMCP server instance, created elsewhere in the wrapper.
@mcp.tool()
async def execute_python(code: str) -> str:
    ws = pathlib.Path("/tmp/sessions/default")
    policy = Policy(fs_readable=["/usr","/lib","/etc"], fs_writable=[str(ws)], ...)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: Sandbox(policy).run(["python3", "-c", code]))

Result: Result(success=False, exit_code=-1, error='sandlock_spawn failed')

Diagnosis

I am not a kernel developer or Python internals expert — I figured this out in collaboration with Claude Sonnet 4.6, so please correct any mistakes in the analysis.

A diagnostic endpoint injected into the running server process revealed:

{
  "pid": 1,
  "active_threads": 2,
  "fork": "ok",
  "clone3": "ret=-1 errno=38 (Function not implemented)",
  "new_thread": "ok",
  "minimal_policy": {"ok": false, "error": "sandlock_spawn failed"}
}

Key observations:

  • fork() works fine from the server process
  • clone3 returns ENOSYS — it is blocked by Kubernetes' RuntimeDefault seccomp profile
  • Python's threading.Thread still works because glibc falls back from clone3 to clone
  • sandlock_spawn fails even with the most minimal policy
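The diagnostic endpoint itself is not shown here, so the following is only a sketch of how the clone3 observation could be reproduced, not the code that produced the JSON above. It assumes Linux on x86_64 or aarch64, where clone3 is syscall number 435, and the name probe_clone3 is made up for illustration. The call passes a NULL argument pointer and size 0, which the kernel rejects with EINVAL before creating anything, so no process or thread is ever spawned; ENOSYS instead means the call never reached a working clone3 implementation.

```python
import ctypes
import errno
import os

SYS_CLONE3 = 435  # assumption: x86_64 / aarch64 syscall number


def probe_clone3():
    """Issue a deliberately invalid clone3 call and report how it fails."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    # NULL args + size 0 can never create a task: a reachable kernel
    # implementation rejects the undersized struct with EINVAL.
    ret = libc.syscall(
        ctypes.c_long(SYS_CLONE3), ctypes.c_void_p(None), ctypes.c_size_t(0)
    )
    err = ctypes.get_errno()
    if err == errno.ENOSYS:
        verdict = "blocked or unsupported"
    elif err == errno.EINVAL:
        verdict = "reachable"
    else:
        verdict = "other failure"
    return {
        "ret": ret,
        "errno": err,
        "text": f"ret={ret} errno={err} ({os.strerror(err)})",
        "verdict": verdict,
    }


if __name__ == "__main__":
    print(probe_clone3())
```

Running this both in the server process and in a fresh interpreter inside the same container would show whether the seccomp result differs between the two, which is one way to separate the seccomp question from the threading question.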

Reading crates/sandlock-ffi/src/lib.rs:

let rt = match tokio::runtime::Runtime::new() {   // = new_multi_thread()
    Ok(rt) => rt,
    Err(_) => return ptr::null_mut(),             // → "sandlock_spawn failed"
};

Runtime::new() calls new_multi_thread(), which spawns OS worker threads. Our hypothesis: when called from a multi-threaded parent process (uvicorn has 2 threads — event loop + thread pool), Tokio's worker thread spawning fails. Either clone3 is blocked and the fallback doesn't work reliably in a multi-threaded context, or glibc's pthread_atfork handlers deadlock in the forked child. Python itself warns:

DeprecationWarning: This process (pid=1) is multi-threaded,
use of fork() may lead to deadlocks in the child.

The same Runtime::new() pattern appears in the current source at lib.rs lines ~694, ~744, ~890, ~1042, ~1224, ~1330, ~1628, ~1679, ~1710 and in handler/run.rs.
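To make the "multi-threaded parent" part of the hypothesis concrete, here is a small standalone sketch (no sandlock or uvicorn involved) showing that run_in_executor alone is enough to make the calling process multi-threaded. The function names are illustrative only.

```python
import asyncio
import threading


def inspect_threads():
    # Runs on a worker thread of asyncio's default ThreadPoolExecutor.
    return threading.active_count(), threading.current_thread().name


async def main():
    before = threading.active_count()
    loop = asyncio.get_running_loop()
    inside, worker_name = await loop.run_in_executor(None, inspect_threads)
    return before, inside, worker_name


if __name__ == "__main__":
    before, inside, worker_name = asyncio.run(main())
    print(f"threads before executor call: {before}")
    print(f"threads seen inside executor: {inside} (running on {worker_name})")
```

Any blocking call dispatched this way, including the Sandbox(policy).run() call in the reproduction, therefore starts from a non-main thread of a process that already has at least two threads.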

Workaround

Spawn a fresh single-threaded Python subprocess per sandlock call. The subprocess has no active event loop or thread pool, so Tokio's runtime creation succeeds:

import json
import subprocess
import sys

def _run_sandboxed_sync(cmd, ws, timeout):
    # Helper source, executed in a fresh single-threaded interpreter.
    helper = r"""
import sys, json, pathlib
from sandlock import Sandbox, Policy
req = json.loads(sys.stdin.read())
# build policy, call Sandbox(policy).run(), return JSON
"""
    proc = subprocess.run(
        [sys.executable, "-c", helper],
        input=json.dumps({"cmd": cmd, "ws": str(ws), "timeout": timeout}),
        capture_output=True, text=True, timeout=timeout + 5,
    )
    # The helper is expected to print a JSON object with an "output" key.
    return json.loads(proc.stdout)["output"]

This works, but adds ~50ms overhead (Python startup time) and an extra unconfined intermediary process.
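For anyone adopting the same workaround, a slightly more defensive version of the round trip might look like the sketch below. The helper here is a stand-in that only echoes the command back; it does not call sandlock, and the name run_in_fresh_interpreter and the reply shape are assumptions for illustration. The point is only the failure handling around the subprocess: timeout, non-zero exit, and a malformed reply.

```python
import json
import subprocess
import sys

# Stand-in helper: echoes the request instead of calling sandlock.
HELPER = r"""
import sys, json
req = json.loads(sys.stdin.read())
json.dump({"output": "would run: " + " ".join(req["cmd"])}, sys.stdout)
"""


def run_in_fresh_interpreter(cmd, ws, timeout, helper=HELPER):
    """Run `helper` in a new Python process and decode its JSON reply."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", helper],
            input=json.dumps({"cmd": cmd, "ws": str(ws), "timeout": timeout}),
            capture_output=True, text=True, timeout=timeout + 5,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "helper timed out"}
    if proc.returncode != 0:
        # Surface the helper's stderr rather than failing on empty stdout.
        return {"ok": False, "error": proc.stderr.strip() or f"exit {proc.returncode}"}
    try:
        return {"ok": True, "output": json.loads(proc.stdout)["output"]}
    except (json.JSONDecodeError, KeyError) as exc:
        return {"ok": False, "error": f"bad helper reply: {exc!r}"}


if __name__ == "__main__":
    print(run_in_fresh_interpreter(["python3", "-c", "print(1)"], "/tmp/ws", 10))
```

Returning a small result dict instead of raising keeps the async request handler simple, at the cost of the caller having to check the ok flag.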

Suggested fix

Replace Runtime::new() with a current-thread runtime at every call site in the FFI layer:

// Before
let rt = match tokio::runtime::Runtime::new() {

// After
let rt = match tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build() {

A current-thread runtime runs entirely on the calling thread — no worker thread spawning, no clone3, no fork-safety issues. The async operations sandlock performs (waiting for child process I/O) are I/O-bound, not CPU-parallel, so there is no functional regression from dropping the multi-thread scheduler.


Happy to provide any additional diagnostic information or test a patched build.
