…uncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
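The head+tail truncation in the first bullet can be sketched as follows. This is only an illustration of a 60/40 split: the function name, the marker text, and the character-based budget are assumptions, not the code from this commit.

```python
def truncate_head_tail(text: str, max_chars: int, head_ratio: float = 0.6) -> str:
    """Keep the start and the end of `text`, dropping the middle if it is too long.

    Hypothetical helper: the 60/40 split comes from the commit message,
    everything else (name, marker, character units) is assumed.
    """
    if len(text) <= max_chars:
        return text
    marker = "\n...[truncated]...\n"
    budget = max_chars - len(marker)
    if budget <= 0:
        # Not enough room for a marker; fall back to a plain head cut.
        return text[:max_chars]
    head_len = int(budget * head_ratio)
    tail_len = budget - head_len
    tail = text[-tail_len:] if tail_len > 0 else ""
    return text[:head_len] + marker + tail
```

With this shape the model still sees how a command started and how it ended, which is usually where the final result or the error message is.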
After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up:
- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
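The retry schedule in this list (3 retries with 2s/4s/8s delays) could look roughly like the sketch below. `retry_with_backoff` and its parameters are hypothetical names chosen for illustration; the real logic lives in `_lifecycle_loop` and also manages `_ready_event`, which is not modelled here.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    start: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `start`, retrying up to `max_retries` times with doubling delays.

    With the defaults the delays are 2s, 4s and 8s, matching the commit message.
    The last failure is re-raised once the retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return await start()
        except Exception:
            if attempt >= max_retries:
                raise
            await sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the delays instead of actually waiting.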
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
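A host path check of this kind might be written as below. This is a sketch under assumptions, not the contents of `security.py`: `check_host_path`, `HostPathBlockedError`, and the `podman.sock` file name are hypothetical, and only the blocked directory list is taken from the commit message.

```python
import posixpath
from pathlib import PurePosixPath

# Directory roots named in the commit message.
BLOCKED_HOST_ROOTS = tuple(
    PurePosixPath(p)
    for p in ("/etc", "/proc", "/sys", "/dev", "/root", "/boot", "/run")
)
# Socket file names are an assumption; the commit only says "docker.sock, podman socket".
BLOCKED_SOCKET_NAMES = frozenset({"docker.sock", "podman.sock"})


class HostPathBlockedError(ValueError):
    """Raised when a host path must not be mounted into a sandbox."""


def is_path_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `path` is `root` itself or lies somewhere below it."""
    return path == root or root in path.parents


def check_host_path(host_path: str) -> PurePosixPath:
    """Normalise `host_path` and reject it if it touches a blocked location.

    Note: this is purely lexical; it collapses ".." but does not resolve symlinks.
    """
    if not host_path.startswith("/"):
        raise HostPathBlockedError(f"host path must be absolute: {host_path!r}")
    # Collapse repeated leading slashes and ".." segments before comparing.
    path = PurePosixPath(posixpath.normpath("/" + host_path.lstrip("/")))
    if path.name in BLOCKED_SOCKET_NAMES:
        raise HostPathBlockedError(f"container runtime socket: {path}")
    for root in BLOCKED_HOST_ROOTS:
        if is_path_under(path, root):
            raise HostPathBlockedError(f"{path} is under blocked root {root}")
    return path
```

Comparing whole path components rather than string prefixes keeps a directory such as `/etcetera` from being rejected by accident while still rejecting subpaths like `/etc/ssh`.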
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
…ession API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
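The selection rule described here can be illustrated with a small pure function. Only the `langbot.box.instance_id` label and the single batched `rm -f` come from the commit message; `ContainerInfo`, `select_orphans`, and `build_cleanup_command` are hypothetical names, and the input is assumed to be already filtered down to LangBot Box containers.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field

INSTANCE_LABEL = "langbot.box.instance_id"


@dataclass
class ContainerInfo:
    container_id: str
    labels: dict[str, str] = field(default_factory=dict)


def select_orphans(containers: list[ContainerInfo], current_instance_id: str) -> list[str]:
    """Return IDs of containers that do not belong to the current runtime instance.

    Containers without the label (created by older versions) are treated as orphans too.
    """
    return [
        c.container_id
        for c in containers
        if c.labels.get(INSTANCE_LABEL) != current_instance_id
    ]


def build_cleanup_command(runtime: str, orphan_ids: list[str]) -> list[str] | None:
    """Build one batched `rm -f` invocation, or None when there is nothing to remove."""
    if not orphan_ids:
        return None
    return [runtime, "rm", "-f", *orphan_ids]


# A fresh ID per runtime start; uuid4 is one plausible way to generate it.
instance_id = uuid.uuid4().hex
```

Keeping the selection separate from the CLI call makes the "never delete my own containers" property easy to unit test without a container runtime.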
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
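One way to record the failing phase before re-raising is a small context manager, sketched below. The seven phase values are taken from the list above; the member names, the `str` mixin, `SessionStatus`, and `phase()` are assumptions made for this example.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from enum import Enum
from typing import Iterator, Optional


class MCPSessionErrorPhase(str, Enum):
    SESSION_CREATE = "session_create"
    DEP_INSTALL = "dep_install"
    PROCESS_START = "process_start"
    RELAY_CONNECT = "relay_connect"
    MCP_INIT = "mcp_init"
    RUNTIME = "runtime"
    TOOL_CALL = "tool_call"


@dataclass
class SessionStatus:
    """Hypothetical holder for the fields exposed through runtime info."""
    error_phase: Optional[MCPSessionErrorPhase] = None
    retry_count: int = 0


@contextmanager
def phase(status: SessionStatus, current: MCPSessionErrorPhase) -> Iterator[None]:
    """Tag any exception raised inside the block with `current`, then re-raise it."""
    try:
        yield
    except Exception:
        status.error_phase = current
        raise
```

Each step of the startup sequence would then be wrapped in `with phase(status, MCPSessionErrorPhase.DEP_INSTALL): ...`, so a failure report can say which step broke rather than only that startup failed.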
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
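The stderr trimming fix in the first bullet of this list relies on tracking a running length instead of re-measuring the buffer on every append. A minimal sketch of that idea follows; `BoundedTextBuffer` is a hypothetical class, not the code in `runtime.py`.

```python
from collections import deque


class BoundedTextBuffer:
    """Keep at most `max_chars` of the most recent text, in amortised O(1) per append."""

    def __init__(self, max_chars: int) -> None:
        if max_chars <= 0:
            raise ValueError("max_chars must be positive")
        self._max_chars = max_chars
        self._chunks = deque()
        self._total = 0  # running length, so nothing is re-joined or re-counted

    def append(self, chunk: str) -> None:
        self._chunks.append(chunk)
        self._total += len(chunk)
        # Drop whole chunks from the front until the buffer fits again.
        while self._total > self._max_chars and len(self._chunks) > 1:
            dropped = self._chunks.popleft()
            self._total -= len(dropped)
        # A single oversized chunk is cut down to its tail.
        if self._total > self._max_chars:
            only = self._chunks.pop()
            self._chunks.append(only[-self._max_chars:])
            self._total = self._max_chars

    def text(self) -> str:
        return "".join(self._chunks)
```

The quadratic variant typically concatenates into one string and slices it on every write; here each chunk is appended once and removed at most once.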
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
Extract self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.*`. Keep only service and connector in LangBot core as they depend on the Application context.
- Update docker-compose to use `langbot_plugin.box.server` entry point
- Update pyproject.toml to use local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match new module paths
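# LangBot Box: Sandbox Execution System
## Overview
This PR introduces LangBot Box, which lets LLM Agents, MCP Servers, and later Skill/tool execution run Shell commands, Python scripts, and long-lived processes in an isolated environment.
The current implementation is no longer structured so that "most of the code lives in the main LangBot repository". Responsibilities are now clearly split into two layers:
- LangBot: integration and policy, including `sandbox_exec` tool exposure, Profile and host path policy, MCP Box-stdio integration, status endpoints, and runtime connection management.
- `langbot-plugin-sdk`: the Box Runtime foundation, including the protocol, models, error types, Session lifecycle, Backend abstraction, the Docker/Podman/nsjail execution backends, and the standalone Box Server.
In other words, this PR is now essentially a cross-repository sandbox integration: the LangBot side handles integration and policy, and the SDK side hosts most of the reusable runtime implementation.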
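Branch: `feat/sandbox`
## Features
- `sandbox_exec` native tool: LLM Agents get a native tool that runs Shell commands and Python scripts in an isolated environment, for exact computation, structured parsing, temporary file handling, and code execution.
- `stdio`-mode MCP Servers run inside the sandbox automatically when Box is available, with dependency installation, path rewriting, and stdio-over-WebSocket bridging.
- Three Backends, `Podman`, `Docker`, and `nsjail`, all go through the same `BoxRuntime` lifecycle management.
- `/api/v1/box/status`, `/api/v1/box/sessions`, and `/api/v1/box/errors` are available for operations and debugging.
## Architecture
### Layers and responsibilities
LangBot:
- `pkg/box/service.py`
- `pkg/box/connector.py`
- `provider/tools/loaders/native.py`: exposes `sandbox_exec` to the model
- `provider/tools/loaders/mcp.py`: connects `stdio` MCP Servers to a Box Session / managed process
- `api/http/controller/groups/box.py`
`langbot-plugin-sdk`:
- `langbot_plugin.box.actions` / `client.py`
- `langbot_plugin.box.models`: shared models such as `BoxSpec`, `BoxProfile`, and `BoxSessionInfo`
- `langbot_plugin.box.runtime`
- `backend.py` / `nsjail_backend.py`
- `langbot_plugin.box.server`
## Core design decisions
### 1. Runtime foundation moved down into the SDK
The core of Box no longer lives in the main LangBot repository. It has moved down into `langbot-plugin-sdk/src/langbot_plugin/box/`, where the reusable runtime implementation is hosted.
The main LangBot repository keeps the capabilities tied to product semantics: whether the tool is exposed, how Profiles are applied, which host paths may be mounted, how MCP is wired in, and how everything is observed over HTTP.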
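### 2. Standalone process architecture
Box Runtime runs as a separate process or a separate container and talks to the main LangBot process over Action RPC:
- Local: `BoxRuntimeConnector` starts a `python -m langbot_plugin.box.server --port 5410` subprocess and connects to it over stdio.
- Docker: LangBot connects to the WebSocket service exposed by the `langbot_box_runtime` container. The runtime itself is still provided by `langbot_plugin.box.server` in the SDK.
This way the main process does not have to carry the low-level container handling itself, and Box Runtime is easier to deploy and upgrade on its own.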
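### 3. Session reuse
The Session is Box's core scheduling unit. `BoxRuntime` maintains a `session_id -> RuntimeSession` map:
- `sandbox_exec` uses `query_id` as the `session_id` by default
- MCP Servers hold their own Session, named `mcp-{uuid}`
Sessions have a TTL (300 seconds by default). A Session is reclaimed when:
- `last_used_at` is older than the TTL
- it has no running managed process
This guarantees that:
- `sandbox_exec` can run multi-step, stateful work within a single conversation
### 4. Profiles take effect in the LangBot layer
`sandbox_exec` does not hand every isolation parameter to the model as-is. Requests first pass through LangBot's `BoxService`, which applies a Profile:
- `timeout_sec` is clamped to `profile.max_timeout_sec`
The built-in Profiles are still:
- `default`
- `offline_readonly`
- `network_basic`
- `network_extended`
### 5. Backend abstraction and probe order
`BoxRuntime` in the SDK now probes for an available Backend in this order:
1. `PodmanBackend`
2. `DockerBackend`
3. `NsjailBackend`
All three implement the same `BaseSandboxBackend` interface. The layers above (`BoxService` / `BoxRuntimeConnector` / `ActionRPCBoxClient`) do not know whether the underlying backend is a container or nsjail.
### 6. MCP Box-stdio mode
When `RuntimeMCPSession` in LangBot detects a `stdio` MCP server and Box is available, it runs this chain:
1. `BoxService.create_session()` creates a Session
2. dependencies are installed automatically from `pyproject.toml` / `requirements.txt`
3. paths are rewritten to `/workspace/...`
4. `start_managed_process()` starts the MCP process
MCP protocol semantics stay on the LangBot side. The Box Runtime in the SDK is only responsible for "running a managed process safely and providing a way to attach to it".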
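The Profile step from decision 4 can be sketched as a small merge function. This only illustrates the two rules the PR states (locked fields cannot be overridden, and `timeout_sec` is clamped to `max_timeout_sec`); the `Profile` dataclass, `apply_profile`, and the sample field values are hypothetical and are not the SDK's `BoxProfile` model.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for a Box profile."""
    defaults: Dict[str, Any]
    locked: FrozenSet[str]
    max_timeout_sec: int


def apply_profile(profile: Profile, requested: Dict[str, Any]) -> Dict[str, Any]:
    """Merge tool-call parameters over the profile defaults.

    Locked fields always keep the profile value; `timeout_sec` never
    exceeds `profile.max_timeout_sec`.
    """
    params = dict(profile.defaults)
    for key, value in requested.items():
        if key in profile.locked:
            continue  # the model cannot override a locked field
        params[key] = value
    timeout = params.get("timeout_sec", profile.max_timeout_sec)
    params["timeout_sec"] = min(timeout, profile.max_timeout_sec)
    return params
```

Doing this merge before building the spec means validation only ever sees values the administrator has allowed.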
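### 7. Host Path mounting
Box mounts a host directory at the fixed `/workspace` path inside the sandbox:
- `sandbox_exec`: defaults to `box.default_host_workspace` in `config.yaml`
- MCP Servers: `box.host_path`
Under a Docker deployment, `langbot` and `langbot_box_runtime` mount the same host directory, for example `./data/box-workspaces:/workspaces`.
## Core interfaces
- LangBot: `BoxService`
- SDK: `BoxSpec`
- SDK: `BaseSandboxBackend`
## Communication
### Action RPC
Box reuses the Action RPC / Connection / Handler infrastructure in `langbot_plugin.runtime.io`. Box Runtime currently exposes these actions:
- `box_health`
- `box_status`
- `box_exec`
- `box_create_session`
- `box_get_session`
- `box_get_sessions`
- `box_delete_session`
- `box_start_managed_process`
- `box_get_managed_process`
- `box_get_backend_info`
- `box_shutdown`
### Transport modes
- Local: LangBot starts a `langbot_plugin.box.server` subprocess and communicates with it over stdio
- Docker: `ws://langbot_box_runtime:5411`
### WebSocket Relay
Box Runtime also starts a lightweight aiohttp service on `:5410` that is used to attach to MCP managed processes:
`GET /v1/sessions/{session_id}/managed-process/ws`
This endpoint bridges WebSocket text messages to the managed process's stdin/stdout.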
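## Deployment
### Local development
No extra service orchestration is needed. LangBot starts a local Box Runtime subprocess automatically.
The host needs at least one usable backend: `Podman`, `Docker`, or `nsjail`.
### Docker Compose
The main LangBot process talks to the Runtime over `ws://langbot_box_runtime:5411` and reaches the managed-process relay at `http://langbot_box_runtime:5410`.
## Security model
- Paths such as `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, and container runtime sockets are blocked by a hard-coded list.
- Only paths under `allowed_host_mount_roots` may be mounted at `/workspace`.
- The `nsjail` Backend likewise uses read-only system mounts as its core model.
- … `langbot.box=true` containers.
## How Skills / plugins integrate
### 1. Via `sandbox_exec`
The simplest integration is still to put `sandbox_exec` in the model's tool list and let the model call it when needed.
### 2. Calling `BoxService` directly
This suits plugins, Skills, or internal platform logic that explicitly needs to run a fixed command.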
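### 3. MCP Server in Box
`stdio` MCP Servers run inside the sandbox automatically when Box is available. The `box` field can override the image, network, mount mode, startup timeout, and other parameters:
```json
{
  "name": "my-mcp-server",
  "mode": "stdio",
  "command": "python",
  "args": ["server.py"],
  "box": {
    "image": "node:20",
    "network": "on",
    "host_path_mode": "ro",
    "startup_timeout_sec": 180
  }
}
```
## File structure
- LangBot main repository
- `langbot-plugin-sdk`
## Deployment and testing
### Test coverage
- `BoxService`, `BoxRuntimeConnector`, `sandbox_exec` integration, MCP Box configuration, and path rewriting logic.
- Detection, execution, Session cleanup, and isolation behavior of the `nsjail` Backend.
## Q&A
Q: Is the Profile global? Which parameters can the model override?
It is a global setting that comes from `box.profile` in `config.yaml`. Unlocked fields can be overridden by the model; locked fields always fall back to the Profile value.
Q: Why doesn't the MCP Server go through a Profile?
Because an MCP Server is a trusted process that an administrator configured explicitly, and its needs differ from those of LLM-generated code. By default it needs higher availability, for example network access to install dependencies, so it uses the separate `MCPServerBoxConfig`.
Q: Will the Session TTL reclaim an MCP Server too early?
No. As long as a Session still has a running managed process, TTL reclamation skips it.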
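The reclamation rule in this answer can be sketched as a filter over the session map. `last_used_at` and the 300-second default come from the PR text; the `RuntimeSession` fields shown here, `has_running_managed_process`, and `expired_session_ids` are assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

DEFAULT_TTL_SEC = 300.0


@dataclass
class RuntimeSession:
    """Hypothetical, reduced view of a runtime session."""
    session_id: str
    last_used_at: float
    has_running_managed_process: bool = False


def expired_session_ids(
    sessions: Dict[str, RuntimeSession],
    now: Optional[float] = None,
    ttl_sec: float = DEFAULT_TTL_SEC,
) -> List[str]:
    """Return sessions idle for longer than the TTL that host no running managed process."""
    current = time.monotonic() if now is None else now
    return [
        session_id
        for session_id, session in sessions.items()
        if not session.has_running_managed_process
        and current - session.last_used_at > ttl_sec
    ]
```

An MCP session whose server process is still alive is therefore never returned, however long ago it was last touched.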
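Q: What if Docker / Podman is not available?
The Runtime probes for an available Backend in the order `Podman -> Docker -> nsjail`. If none of the three is present, `BoxService.available = False`, `sandbox_exec` is not exposed to the model, and `stdio` MCP falls back to running directly on the host.
Q: What is the current state of `nsjail`?
It is wired into the current code path and is no longer just planned. It is one of `BoxRuntime`'s official candidate Backends; whether a given deployment actually selects it depends on whether it is installed and usable on the host.
Q: How do I add a new Backend?
Implement the `BaseSandboxBackend` interface and add it to the `BoxRuntime.backends` probe list. The LangBot integration layer, the Action RPC protocol, and the tool definitions do not need to change.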
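An ordered probe list of this kind might be structured as in the sketch below. It is not the SDK's `BaseSandboxBackend`: the `BaseBackend` and `CLIBackend` classes, the `is_available()` method, and probing via `shutil.which` are all assumptions used to show the "first available backend wins" idea.

```python
import shutil
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class BaseBackend(ABC):
    """Hypothetical minimal backend interface."""
    name: str = "base"

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend can be used on the current host."""


class CLIBackend(BaseBackend):
    """Backend that counts as available when its executable is on PATH (an assumption)."""

    def __init__(self, name: str, executable: str) -> None:
        self.name = name
        self.executable = executable

    def is_available(self) -> bool:
        return shutil.which(self.executable) is not None


def pick_backend(backends: Sequence[BaseBackend]) -> Optional[BaseBackend]:
    """Return the first available backend in probe order, or None if there is none."""
    for backend in backends:
        if backend.is_available():
            return backend
    return None


# Probe order from the PR description: Podman, then Docker, then nsjail.
PROBE_ORDER = [
    CLIBackend("podman", "podman"),
    CLIBackend("docker", "docker"),
    CLIBackend("nsjail", "nsjail"),
]
```

Adding a backend then means writing one more subclass and inserting it at the desired position in the list; a `None` result corresponds to the degraded mode where Box is reported as unavailable.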