feat(sandbox): add MCP box integration on top of sandbox base#2083
Open
huanghuoguoguo wants to merge 26 commits intofeat/sandboxfrom
Open
feat(sandbox): add MCP box integration on top of sandbox base#2083huanghuoguoguo wants to merge 26 commits intofeat/sandboxfrom
huanghuoguoguo wants to merge 26 commits intofeat/sandboxfrom
Conversation
…uncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up: - Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to test-only helper). - Remove unused `_clip_bytes` method from backend. - Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating non-empty only in `runtime.execute()`. - Extract `get_box_config()` helper to eliminate 5× duplicated config access boilerplate. - Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation. - Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`). - Simplify `_assert_session_compatible` with a loop. - Inline client creation in `BoxRuntimeConnector`. - Remove redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
…ession API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
Extract self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.*`. Keep only service and connector in LangBot core as they depend on the Application context. - Update docker-compose to use `langbot_plugin.box.server` entry point - Update pyproject.toml to use local SDK via `tool.uv.sources` - Remove migrated source files and their unit/integration tests - Update remaining test imports to match new module paths
8f111a5 to
6007b79
Compare
93c87aa to
9813665
Compare
6007b79 to
726da24
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why
The original PR mixed sandbox infrastructure and MCP integration changes, which made review scope larger than necessary. This follow-up PR keeps MCP-specific logic isolated on top of the sandbox base branch.
Scope in this PR
Verification
platform linux -- Python 3.10.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/yhh/workspace/LangBot
configfile: pytest.ini
plugins: anyio-4.12.1, langsmith-0.6.4, asyncio-1.3.0, mock-3.15.1
asyncio: mode=auto, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 0 items
============================ no tests ran in 0.01s =============================
platform linux -- Python 3.10.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/yhh/workspace/LangBot
configfile: pytest.ini
plugins: anyio-4.12.1, langsmith-0.6.4, asyncio-1.3.0, mock-3.15.1
asyncio: mode=auto, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 65 items
tests/unit_tests/provider/test_tool_manager_native.py ............. [ 20%]
tests/unit_tests/provider/test_localagent_sandbox_exec.py .. [ 23%]
tests/unit_tests/box/test_box_service.py ............................... [ 70%]
................... [100%]
======================== 65 passed, 9 warnings in 5.40s ========================