35 commits
57f9b70
refactor: core sandbox infra to use openenv.core.harness.sandbox
rycerzes May 12, 2026
024e904
feat: CLIAgentDriver Abstraction
rycerzes May 12, 2026
455b0e9
feat: add tests
rycerzes May 12, 2026
e97fda0
feat: impl Docker sandbox backend
rycerzes May 13, 2026
9a35006
feat: pi agent adapter
rycerzes May 13, 2026
06df791
chore: agent specifications and improve interception module ref
rycerzes May 14, 2026
a3c4a3d
feat: add tests for opencode + pi harness adapters
rycerzes May 14, 2026
81e37a2
chore: migrate opencode_env to coding_agent_env
rycerzes May 14, 2026
ddf1313
feat: hf sandbox backend
rycerzes May 14, 2026
9d85640
chore: ruff + usort format pass
rycerzes May 15, 2026
2f9435c
refactor: remove transparent_proxy mode and in-sandbox interception p…
rycerzes May 15, 2026
71bd9e9
feat: InterceptionServer + interception_gate mode for trainer-owned g…
rycerzes May 15, 2026
171a3ea
refactor: wire coding_agent_env with interception_gate
rycerzes May 15, 2026
52a024e
chore: update tests for interception_gate, remove proxy test cases
rycerzes May 15, 2026
4b1b707
chore: address greptile review comments
rycerzes May 15, 2026
a478fa8
refactor: extract sandbox bootstrap to driver and fix interception races
rycerzes May 16, 2026
b18caf2
chore: remove unsupported mode checks from CodingAgentEnvironment
rycerzes May 16, 2026
bfc7305
chore: revert linting for out of scope files
rycerzes May 16, 2026
c2ca0c8
feat: host-side tool routing and Pi gate models bootstrap
rycerzes May 16, 2026
1b1d9fb
fix: support configurable Pi workdir for command and MCP config
rycerzes May 16, 2026
2f52e48
fix: Docker host gateway mapping and per-create image override
rycerzes May 16, 2026
f3fede2
feat: configurable extension directory support for CLI agents
rycerzes May 16, 2026
448f690
fix: thread-safe queue handling
rycerzes May 18, 2026
37e549d
fix: interception gate support in CodingAgentSessionFactory
rycerzes May 18, 2026
8aa9d18
fix: improve error handling and config propagation across agent pipeline
rycerzes May 18, 2026
61e5524
fix: whitespace secret validation + conditional /root/ write
rycerzes May 18, 2026
a2b4388
fix: cross-loop safe request queue via stdlib queue.Queue
rycerzes May 18, 2026
b10a448
fix: replace asyncio.Queue with queue.Queue for thread-safe request h…
rycerzes May 18, 2026
659288b
fix: pi config discovery for CLIAgentDriver to be independent of runt…
rycerzes May 18, 2026
5136337
fix: interception params and update max_tokens_cap validation
rycerzes May 20, 2026
8137b15
refactor: remove RolloutTurn references
rycerzes May 20, 2026
3c4ffa4
feat: add tool name allowlist validation
rycerzes May 20, 2026
151d1ab
feat: provider-specific env var handling for Pi agent
rycerzes May 20, 2026
3962490
chore: exit notification handling and build interception rollout URL
rycerzes Jun 1, 2026
88f6a55
refactor(opencode_env): migrate to core harness
rycerzes Jun 1, 2026
4 changes: 2 additions & 2 deletions docs/source/environments.md
@@ -549,10 +549,10 @@ AgentWorldModel-1K — 1,000 synthetic MCP tool-use environments with 10,000 tas
```
````

````{grid-item-card} Opencode
````{grid-item-card} OpenCode
:class-card: sd-border-1

`opencode_env` runs the OpenCode coding agent inside an isolated E2B sandbox against any OpenAI-compatible LLM endpoint, optionally capturing per-token logpr...
`opencode_env` runs the OpenCode coding agent inside an isolated E2B sandbox against any OpenAI-compatible LLM endpoint, with trainer-owned interception for RL workflows.

+++
```{button-link} environments/opencode.html
71 changes: 36 additions & 35 deletions envs/opencode_env/README.md
@@ -9,32 +9,33 @@ app_port: 8000
base_path: /web
tags:
- openenv
short_description: OpenCode coding agent in an E2B sandbox with logprob capture
short_description: OpenCode coding agent in an E2B sandbox
---

# OpenCode Environment for OpenEnv

`opencode_env` runs the [OpenCode](https://opencode.ai) coding agent inside
an isolated [E2B](https://e2b.dev) sandbox against any OpenAI-compatible
LLM endpoint, optionally capturing per-token logprobs for GRPO training.
`opencode_env` runs the [OpenCode](https://opencode.ai) coding agent
inside an isolated [E2B](https://e2b.dev) sandbox against any OpenAI-compatible
LLM endpoint, optionally capturing per-token logprobs through a transparent
in-sandbox proxy for RL training data.

**🚀 Try it live**: [`AdithyaSK/opencode-env`](https://huggingface.co/spaces/AdithyaSK/opencode-env)

The deployed Space exposes:

- **Web UI** at [`/web`](https://adithyask-opencode-env.hf.space/web) — pick endpoint, write task, hit Run, watch live phase log + reward + logprobs.
- **Web UI** at [`/web`](https://adithyask-opencode-env.hf.space/web) — pick endpoint, write task, hit Run, watch live phase log + reward.
- **MCP tool API** at [`/mcp`](https://adithyask-opencode-env.hf.space/mcp) — programmatic `run_rollout` calls.
- **OpenAPI docs** at [`/docs`](https://adithyask-opencode-env.hf.space/docs).
- **Health** at [`/health`](https://adithyask-opencode-env.hf.space/health).

The env is **task-agnostic** — every rollout is configured at call-time
with a uniform Task shape:

- **`instruction`** — prompt for the agent
- **`setup`** — list of bash commands run *before* the agent (pip
- **`instruction`** — prompt for OpenCode
- **`setup`** — list of bash commands run *before* OpenCode (pip
install, git clone, file downloads — anything you need staged in the
sandbox)
- **`verify`** — list of bash commands run *after* the agent (asserts,
- **`verify`** — list of bash commands run *after* OpenCode (asserts,
pytest invocations, score-file writes)

Reward = `passed_verify / total_verify` unless any `verify` command writes
@@ -81,7 +82,6 @@ async def main():
result = RolloutResult.model_validate_json(_extract_text(raw))

print("reward:", result.reward)
print("turns:", len(result.proxy_turns))
print("files:", list(result.files.keys()))
print("wall:", result.wall_s, "s")

@@ -93,7 +93,6 @@ Expected output (~20s with the prebaked template):

```
reward: 1.0
turns: 3
files: ['/home/user/workdir/binary_search.py', ...]
wall: 19.8 s
```
@@ -132,11 +131,10 @@ factory = OpenCodeSessionFactory(
model="gpt-4o-mini",
),
sandbox_backend=E2BSandboxBackend(),
mode="transparent_proxy", # captures per-token logprobs
mode="interception_gate", # trainer-owned interception mode
)
session = factory.create(task=OpenCodeTask(instruction="..."))
session.wait_for_completion()
turns = session.fetch_proxy_trace() # per-turn (tokens, logprobs)
session.close()
```

@@ -174,7 +172,7 @@ The image:

## The MCP Tool: `run_rollout`

Single tool, two ways to specify the LLM endpoint:
Single tool, with two ways to specify the LLM endpoint:

**Option A — endpoint shorthand (recommended)**: pass
`endpoint="vllm"` (or `"openai"` / `"hf_router"`). The server resolves
@@ -188,27 +186,29 @@ directly.
|---|---|---|---|
| `endpoint` | `str` | `""` | One of `"vllm"` / `"openai"` / `"hf_router"`. |
| `base_url` / `api_key` / `model` | `str` | `""` | Override / supply explicitly. |
| `instruction` | `str` | required | Prompt passed to `opencode run`. |
| `setup` | `list[str]` | `[]` | Bash commands run **before** the agent. |
| `verify` | `list[str]` | `[]` | Bash commands run **after** the agent. |
| `instruction` | `str` | required | Prompt passed to OpenCode. |
| `setup` | `list[str]` | `[]` | Bash commands run **before** OpenCode. |
| `verify` | `list[str]` | `[]` | Bash commands run **after** OpenCode. |
| `task_id` | `str` | `""` | Echoed back in result. |
| `mode` | `str` | `"transparent_proxy"` | Or `"black_box"` (no logprobs). |
| `mode` | `str` | `"transparent_proxy"` | Or `"black_box"` for direct LLM calls. In-process trainers can also construct `OpenCodeSessionFactory(mode="interception_gate", ...)`. |
| `disable_thinking` | `bool \| None` | `None` (catalog default) | Inject `chat_template_kwargs.enable_thinking=false`. |
| `max_tokens_cap` | `int` | `4096` | Per-turn `max_tokens` clamp. |
| `top_logprobs` | `int` | `5` | HF Router cap is 5; OpenAI 0–20; vLLM unbounded. |
| `agent_timeout_s` | `float` | `600.0` | Hard wall budget for opencode. |
| `top_logprobs` | `int` | `5` | Per-token top-k logprobs requested in `transparent_proxy` mode. |
| `agent_timeout_s` | `float` | `600.0` | Hard wall budget for OpenCode. |
| `template` | `str` | `""` | E2B template name; `"opencode-rl"` skips ~2 min of install per rollout. |

Returns `RolloutResult` JSON with: `reward`, `setup_results[]`,
`verify_results[]`, `proxy_turns[]`, `files{}`, `agent_log_tail`,
`proxy_log_tail`, `wall_s`, `agent_exit_code`, `sandbox_id`, `error`.
`verify_results[]`, `proxy_turns[]` (logprob records in transparent-proxy
mode), `files{}`, `agent_log_tail`, `proxy_log_tail`, `wall_s`,
`agent_exit_code`, `sandbox_id`, `error`.
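
As a rough illustration of how the `reward` field relates to the `verify` commands (the fraction of commands that pass, unless a verifier writes a float to `/home/user/logs/verifier/reward.txt`), the sketch below computes a reward from a list of exit codes. The name `compute_reward` and its arguments are hypothetical and are not taken from `opencode_env`. Returning `0.0` for an empty `verify` list and falling back to the pass ratio on an unparsable override are both assumptions.

```python
from __future__ import annotations


def compute_reward(verify_exit_codes: list[int], override_text: str | None = None) -> float:
    """Hypothetical helper: pass ratio of verify commands, with an optional override.

    ``override_text`` stands in for the contents of reward.txt when a verify
    command wrote one; ``None`` means no override file exists.
    """
    if override_text is not None:
        try:
            return float(override_text.strip())
        except ValueError:
            pass  # assumption: an unparsable override falls back to the pass ratio
    if not verify_exit_codes:
        return 0.0  # assumption: no verify commands means no reward signal
    passed = sum(1 for code in verify_exit_codes if code == 0)
    return passed / len(verify_exit_codes)


print(compute_reward([0, 0, 1, 0]))                   # 0.75 (3 of 4 passed)
print(compute_reward([1, 1], override_text="0.5\n"))  # 0.5 (override wins)
```

Keeping the override as a plain float lets a verifier report partial credit that a pass/fail ratio cannot express.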

## Two Operating Modes

| Mode | What it does | Best for |
|---|---|---|
| **`transparent_proxy`** (default) | In-sandbox proxy at `localhost:7000` forwards opencode's LLM calls to `base_url`, injects `logprobs=true`, captures per-turn `(messages, completion_tokens, logprobs)` to `proxy_trace.jsonl`. | GRPO / RL training, observability, top-k distillation. |
| **`black_box`** | No proxy. opencode talks straight to `base_url`. | Smoke tests, eval, SFT data collection. |
| **`transparent_proxy`** (default) | OpenCode talks to an in-sandbox proxy. The proxy forwards to `base_url`, requests logprobs, strips them before returning to OpenCode, and records `proxy_turns`. | RL data collection, GRPO-style traces. |
| **`black_box`** | OpenCode talks directly to `base_url`. No logprob capture. | Smoke tests, eval, SFT data collection. |
| **`interception_gate`** | Available through the in-process `OpenCodeSessionFactory`; OpenCode calls are routed through trainer-host interception endpoints. | Trainer-owned generation. |
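
To make the `transparent_proxy` row concrete, here is a minimal sketch of the two transformations it describes: requesting logprobs upstream, then stripping them from the response before it is returned to OpenCode. The function names are hypothetical and this is not the proxy code from the PR. The field names (`logprobs`, `top_logprobs`, `max_tokens`, `choices[].logprobs`) follow the OpenAI chat-completions format, which is assumed here because the endpoint is described as OpenAI-compatible.

```python
from __future__ import annotations

import copy
from typing import Any


def prepare_upstream_request(
    body: dict[str, Any],
    top_logprobs: int = 5,
    max_tokens_cap: int | None = 4096,
) -> dict[str, Any]:
    """Return a copy of the agent's request with logprobs enabled and max_tokens clamped."""
    out = copy.deepcopy(body)
    out["logprobs"] = True
    out["top_logprobs"] = top_logprobs
    if max_tokens_cap is not None and "max_tokens" in out:
        out["max_tokens"] = min(out["max_tokens"], max_tokens_cap)
    return out


def split_response(response: dict[str, Any]) -> tuple[dict[str, Any], list[Any]]:
    """Return (response without logprobs, captured logprobs, one entry per choice)."""
    cleaned = copy.deepcopy(response)
    captured = [choice.pop("logprobs", None) for choice in cleaned.get("choices", [])]
    return cleaned, captured


request = prepare_upstream_request({"model": "gpt-4o-mini", "max_tokens": 32000})
print(request["max_tokens"], request["logprobs"])  # 4096 True

upstream = {"choices": [{"message": {"content": "hi"}, "logprobs": {"content": []}}]}
cleaned, captured = split_response(upstream)
print("logprobs" in cleaned["choices"][0], len(captured))  # False 1
```

Both helpers copy their input, so the caller's request and the raw upstream response remain available for logging.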

## Environment Variables

@@ -227,21 +227,17 @@ sibling `.env` file; on HF Spaces, set them as **Space secrets**.
| **OpenAI endpoint** | | |
| `OPENAI_API_KEY` | required for `endpoint="openai"` | Standard OpenAI key. |
| `OPENAI_BASE_URL` | no | Defaults to `https://api.openai.com/v1`. |
| `OPENAI_MODEL` | no | Defaults to `gpt-4o-mini` (gpt-5.x and o-series refuse logprobs). |
| `OPENAI_MODEL` | no | Defaults to `gpt-4o-mini`. |
| **HF Router endpoint** | | |
| `HF_ROUTER_API_KEY` | required for `endpoint="hf_router"` | HF user token. |
| `HF_ROUTER_BASE_URL` | no | Defaults to `https://router.huggingface.co/v1`. |
| `HF_ROUTER_MODEL` | no | Defaults to `Qwen/Qwen3-4B-Instruct-2507:nscale`. |
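
The defaults in this table can be read as a small lookup. The sketch below resolves the `openai` and `hf_router` shorthands from an environment mapping. `resolve_endpoint`, `_DEFAULTS`, and the returned dictionary shape are hypothetical and do not reproduce `server/catalog.py`. The `vllm` shorthand is left out because its variables are not visible in this hunk.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# (env-var prefix, default base URL, default model), taken from the table above.
_DEFAULTS = {
    "openai": ("OPENAI", "https://api.openai.com/v1", "gpt-4o-mini"),
    "hf_router": (
        "HF_ROUTER",
        "https://router.huggingface.co/v1",
        "Qwen/Qwen3-4B-Instruct-2507:nscale",
    ),
}


def resolve_endpoint(name: str, env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Hypothetical resolver: map an endpoint shorthand to base_url / api_key / model."""
    env = os.environ if env is None else env
    if name not in _DEFAULTS:
        raise ValueError(f"unknown endpoint shorthand: {name!r}")
    prefix, default_url, default_model = _DEFAULTS[name]
    api_key = env.get(f"{prefix}_API_KEY", "").strip()
    if not api_key:
        raise ValueError(f"{prefix}_API_KEY is required for endpoint={name!r}")
    return {
        "base_url": env.get(f"{prefix}_BASE_URL", default_url),
        "api_key": api_key,
        "model": env.get(f"{prefix}_MODEL", default_model),
    }


print(resolve_endpoint("openai", {"OPENAI_API_KEY": "sk-test"})["base_url"])
# https://api.openai.com/v1
```

Passing the environment as an argument keeps the resolver testable without touching the real process environment.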

Pick `provider:` suffixes that actually return logprobs:
**Together / Nscale / Scaleway / SambaNova / Cerebras**. Avoid Novita /
Hyperbolic / Featherless (silent drop) and Groq (HTTP 400).

## Pre-baked E2B Template

The first rollout in a fresh E2B sandbox spends ~2 min installing
opencode and the proxy's Python deps. Build a one-time template that
ships those pre-installed:
OpenCode tooling. Build a one-time template that ships it pre-installed:

```bash
.venv/bin/python envs/opencode_env/sandbox/build_template.py
@@ -263,7 +259,7 @@ opencode_env/
├── __init__.py # re-exports primitive + client + models
├── client.py # OpenCodeEnv(MCPToolClient)
├── models.py # RolloutResult / RolloutTurn / OpenCodeState
├── models.py # RolloutResult / OpenCodeState
├── config.py # OpenCodeConfig (primitive)
├── harness.py # OpenCodeSession / OpenCodeSessionFactory (CLI-only)
@@ -273,22 +269,27 @@ opencode_env/
├── server/
│ ├── __init__.py
│ ├── app.py # FastAPI factory; mounts Gradio at /web
│ ├── opencode_environment.py # MCPEnvironment with single ``run_rollout`` tool
│ ├── opencode_environment.py # MCPEnvironment with single ``run_rollout`` tool
│ ├── gradio_ui.py # the /web Gradio Blocks UI
│ ├── catalog.py # endpoint shorthand resolver
│ └── Dockerfile # multi-stage uv build (used by ``openenv build``)
└── sandbox/
├── __init__.py
├── base.py # SandboxBackend / SandboxHandle Protocols
├── e2b.py # E2B implementation
├── interception.py # in-sandbox FastAPI proxy (logprob capture)
└── build_template.py # one-time E2B template builder

# Shared sandbox runtime (moved to core):
src/openenv/core/harness/sandbox/
├── base.py # SandboxBackend / SandboxHandle protocols
├── e2b_backend.py # E2B implementation
├── docker_backend.py # local Docker backend
├── hf_backend.py # HF sandbox backend
└── _util.py # shared sandbox shell utilities
```

## References

- [OpenEnv docs](https://meta-pytorch.org/OpenEnv/)
- [OpenCode CLI](https://opencode.ai/docs/cli/)
- [E2B Python SDK](https://e2b.dev/docs)
- [HF Inference Providers logprob matrix](../../../DOCS/HF/hf_inference_providers_logprobs.md)

24 changes: 12 additions & 12 deletions envs/opencode_env/__init__.py
@@ -8,31 +8,31 @@

Two layers in this package:

1. **Harness primitive** :class:`OpenCodeSessionFactory` /
1. **Harness primitive** -- :class:`OpenCodeSessionFactory` /
:class:`OpenCodeSession` / :class:`OpenCodeConfig` /
:class:`E2BSandboxBackend`. Used in-process to drive one rollout
inside an E2B sandbox. See ``harness.py``.
:class:`E2BSandboxBackend`. Built on the generic
:class:`CLIAgentDriver` from ``openenv.core.harness.agents``.

2. **Deployable env** :class:`OpenCodeEnv` (MCP client) talks to the
2. **Deployable env** -- :class:`OpenCodeEnv` (MCP client) talks to the
FastAPI server at ``server/app.py`` over HTTP. Use this when the
sandbox + agent live behind an HTTP boundary (e.g. an HF Space).
sandbox + OpenCode live behind an HTTP boundary (e.g. an HF Space).
See ``client.py`` and ``server/``.
"""

from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction
from openenv.core.harness.sandbox import SandboxBackend, SandboxHandle

from .client import OpenCodeEnv
from .config import OpenCodeConfig, Provider
from .harness import OpenCodeSession, OpenCodeSessionFactory
from .models import (
CommandResult,
OpenCodeState,
RolloutResult,
RolloutTurn,
)
from .sandbox import E2BSandboxBackend, SandboxBackend, SandboxHandle
from .models import CommandResult, OpenCodeState, RolloutResult, RolloutTurn
from .task import OpenCodeTask

try:
from openenv.core.harness.sandbox import E2BSandboxBackend
except ImportError: # e2b not installed
E2BSandboxBackend = None # type: ignore[assignment,misc]

__all__ = [
# Deployed-env client
"OpenCodeEnv",
36 changes: 18 additions & 18 deletions envs/opencode_env/client.py
@@ -7,13 +7,14 @@
"""Client for the deployed opencode_env server.

The server exposes a single MCP tool ``run_rollout`` that runs one OpenCode
rollout in an E2B sandbox and returns a JSON-serialized :class:`RolloutResult`.
rollout in an E2B sandbox and returns a JSON-serialized
:class:`RolloutResult`.

Example::

from opencode_env import OpenCodeEnv

with OpenCodeEnv(base_url="https://adithya-sk-opencode-env.hf.space") as env:
with OpenCodeEnv(base_url="https://your-space.hf.space") as env:
env.reset()
result = env.run_rollout(
base_url="https://api.openai.com/v1",
@@ -24,7 +25,7 @@
verify=["python /home/user/test.py"],
task_id="binary_search_v1",
)
print(result.reward, len(result.proxy_turns))
print(result.reward)
"""

from __future__ import annotations
@@ -50,8 +51,8 @@ class OpenCodeEnv(MCPToolClient):
def run_rollout(
self,
*,
# Endpoint — pass either the shorthand selector OR explicit fields.
endpoint: str = "", # "vllm" | "openai" | "hf_router"
# Endpoint — pass either shorthand endpoint or explicit fields.
endpoint: str = "", # "vllm" | "openai" | "hf_router"
base_url: str = "",
api_key: str = "",
model: str = "",
@@ -68,7 +69,7 @@ def run_rollout(
agent_timeout_s: float = 600.0,
template: str = "",
) -> RolloutResult:
"""Run one OpenCode rollout and return the typed result.
"""Run one opencode rollout and return the typed result.

Args:
base_url: OpenAI-compatible LLM endpoint (with trailing /v1).
@@ -77,30 +78,29 @@ def run_rollout(
model: Model id understood by the LLM endpoint
(e.g. ``"gpt-4o-mini"``, ``"Qwen/Qwen3.5-4B"``,
``"Qwen/Qwen3-4B-Instruct-2507:nscale"``).
instruction: Prompt passed to ``opencode run``.
setup: Bash commands run sequentially **before** the agent starts.
instruction: Prompt passed to OpenCode.
setup: Bash commands run sequentially **before** OpenCode starts.
Each command runs in the sandbox; non-zero exit aborts setup.
verify: Bash commands run sequentially **after** the agent exits.
verify: Bash commands run sequentially **after** OpenCode exits.
Reward = ``passed_count / total`` unless any command writes a
float to ``/home/user/logs/verifier/reward.txt`` (override).
task_id: Echoed back in the result for traceability.
mode: ``"transparent_proxy"`` (captures per-token logprobs via
an in-sandbox FastAPI proxy) or ``"black_box"`` (no proxy).
mode: ``"transparent_proxy"`` (default, captures logprobs) or
``"black_box"`` (OpenCode talks directly to the LLM).
disable_thinking: Inject
``chat_template_kwargs.enable_thinking=false`` on forwarded
requests. Needed for Qwen3.5 vLLM; harmless on Instruct
variants; rejected by OpenAI direct.
max_tokens_cap: Clamp on per-turn ``max_tokens``. OpenCode asks
for ~32k by default; gpt-4o-mini caps at 16k.
top_logprobs: Top-k logprobs requested upstream. HF Router caps
at 5; OpenAI accepts up to 20; vLLM is unbounded.
agent_timeout_s: Hard wall-clock budget for one ``opencode run``.
max_tokens_cap: Clamp on per-turn ``max_tokens``.
top_logprobs: Per-token top-k logprobs requested in
``transparent_proxy`` mode.
agent_timeout_s: Hard wall-clock budget for one OpenCode run.
template: E2B template name (e.g. ``"opencode-rl"``). Empty
string uses the default (slow) base image.

Returns:
A :class:`RolloutResult` with reward, per-turn logprobs, file
outputs, setup/verify results, and diagnostic tails.
A :class:`RolloutResult` with reward, proxy_turns, file outputs,
setup/verify results, and diagnostic tails.
"""
raw = self.call_tool(
"run_rollout",
30 changes: 14 additions & 16 deletions envs/opencode_env/config.py
@@ -34,9 +34,7 @@ class OpenCodeConfig(BaseModel):

# --- OpenCode CLI ---------------------------------------------------------
opencode_version: str = "latest"
disabled_tools: list[str] = Field(
default_factory=lambda: ["webfetch", "question"]
)
disabled_tools: list[str] = Field(default_factory=lambda: ["webfetch", "question"])
enabled_tools: list[str] | None = None
system_prompt: str | None = None
extra_opencode_json: dict[str, Any] = Field(default_factory=dict)
@@ -47,25 +45,25 @@ class OpenCodeConfig(BaseModel):
extra_env: dict[str, str] = Field(default_factory=dict)
extra_setup_shell: str | None = None

# --- Model behavior --------------------------------------------------------
# Direct OpenCode config knobs (black_box / interception_gate).
disable_thinking: bool = False
max_tokens_cap: int | None = None

# --- Transparent-proxy logprob capture ------------------------------------
# Compatibility knobs for the HTTP env's logprob-capturing mode. The proxy
# requests OpenAI-compatible logprobs upstream, records them, and strips
# them before returning the response to OpenCode.
proxy_max_tokens_cap: int | None = 16384
proxy_top_logprobs: int = 5
proxy_disable_thinking: bool = False

# --- Sandbox paths --------------------------------------------------------
# Root directory inside the sandbox where the primitive writes config,
# task files, and logs. E2B's default user is ``user`` with home
# ``/home/user``. Override when using a root-privileged backend (Docker).
sandbox_home: str = "/home/user"

# --- Transparent-proxy tuning --------------------------------------------
# Cap ``max_tokens`` / ``max_completion_tokens`` on forwarded requests.
# OpenCode defaults to a very large number (~32000) which exceeds some
# provider limits (e.g. gpt-4o-mini = 16384). Only used in
# ``mode="transparent_proxy"``. ``None`` disables the cap.
proxy_max_tokens_cap: int | None = 16384
# Per-turn top-k logprobs the proxy requests from the upstream.
proxy_top_logprobs: int = 5
# Disable reasoning/thinking mode for Qwen3 / Qwen3.5 models. Proxy sets
# ``extra_body.chat_template_kwargs.enable_thinking=false`` on forwarded
# requests. Ignored by providers that don't support the field.
proxy_disable_thinking: bool = False


_PROVIDER_NPM = {
"openai_compatible": "@ai-sdk/openai-compatible",