huggingface · rycerzes · May 12, 2026 · May 13, 2026
diff --git a/docs/source/environments.md b/docs/source/environments.md
@@ -549,10 +549,10 @@ AgentWorldModel-1K — 1,000 synthetic MCP tool-use environments with 10,000 tas
 ```
 ````
 
-````{grid-item-card} Opencode
+````{grid-item-card} OpenCode
 :class-card: sd-border-1
 
-`opencode_env` runs the OpenCode coding agent inside an isolated E2B sandbox against any OpenAI-compatible LLM endpoint, optionally capturing per-token logpr...
+`opencode_env` runs the OpenCode coding agent inside an isolated E2B sandbox against any OpenAI-compatible LLM endpoint, with trainer-owned interception for RL workflows.
 
 +++
 ```{button-link} environments/opencode.html

diff --git a/envs/opencode_env/README.md b/envs/opencode_env/README.md
@@ -9,32 +9,33 @@ app_port: 8000
 base_path: /web
 tags:
   - openenv
-short_description: OpenCode coding agent in an E2B sandbox with logprob capture
+short_description: OpenCode coding agent in an E2B sandbox
 ---
 
 # OpenCode Environment for OpenEnv
 
-`opencode_env` runs the [OpenCode](https://opencode.ai) coding agent inside
-an isolated [E2B](https://e2b.dev) sandbox against any OpenAI-compatible
-LLM endpoint, optionally capturing per-token logprobs for GRPO training.
+`opencode_env` runs the [OpenCode](https://opencode.ai) coding agent
+inside an isolated [E2B](https://e2b.dev) sandbox against any OpenAI-compatible
+LLM endpoint, optionally capturing per-token logprobs through a transparent
+in-sandbox proxy for RL training data.
 
 **🚀 Try it live**: [`AdithyaSK/opencode-env`](https://huggingface.co/spaces/AdithyaSK/opencode-env)
 
 The deployed Space exposes:
 
-- **Web UI** at [`/web`](https://adithyask-opencode-env.hf.space/web) — pick endpoint, write task, hit Run, watch live phase log + reward + logprobs.
+- **Web UI** at [`/web`](https://adithyask-opencode-env.hf.space/web) — pick endpoint, write task, hit Run, watch live phase log + reward.
 - **MCP tool API** at [`/mcp`](https://adithyask-opencode-env.hf.space/mcp) — programmatic `run_rollout` calls.
 - **OpenAPI docs** at [`/docs`](https://adithyask-opencode-env.hf.space/docs).
 - **Health** at [`/health`](https://adithyask-opencode-env.hf.space/health).
 
 The env is **task-agnostic** — every rollout is configured at call-time
 with a uniform Task shape:
 
-  - **`instruction`** — prompt for the agent
-  - **`setup`** — list of bash commands run *before* the agent (pip
+  - **`instruction`** — prompt for OpenCode
+  - **`setup`** — list of bash commands run *before* OpenCode (pip
     install, git clone, file downloads — anything you need staged in the
     sandbox)
-  - **`verify`** — list of bash commands run *after* the agent (asserts,
+  - **`verify`** — list of bash commands run *after* OpenCode (asserts,
     pytest invocations, score-file writes)
 
 Reward = `passed_verify / total_verify` unless any `verify` command writes
@@ -81,7 +82,6 @@ async def main():
         result = RolloutResult.model_validate_json(_extract_text(raw))
 
         print("reward:", result.reward)
-        print("turns:", len(result.proxy_turns))
         print("files:", list(result.files.keys()))
         print("wall:", result.wall_s, "s")
 
@@ -93,7 +93,6 @@ Expected output (~20s with the prebaked template):
 
 ```
 reward: 1.0
-turns: 3
 files: ['/home/user/workdir/binary_search.py', ...]
 wall: 19.8 s
 ```
@@ -132,11 +131,10 @@ factory = OpenCodeSessionFactory(
         model="gpt-4o-mini",
     ),
     sandbox_backend=E2BSandboxBackend(),
-    mode="transparent_proxy",                   # captures per-token logprobs
+    mode="interception_gate",                   # trainer-owned interception mode
 )
 session = factory.create(task=OpenCodeTask(instruction="..."))
 session.wait_for_completion()
-turns = session.fetch_proxy_trace()             # per-turn (tokens, logprobs)
 session.close()
 ```
 
@@ -174,7 +172,7 @@ The image:
 
 ## The MCP Tool: `run_rollout`
 
-Single tool, two ways to specify the LLM endpoint:
+Single tool, with two ways to specify the LLM endpoint:
 
 **Option A — endpoint shorthand (recommended)**: pass
 `endpoint="vllm"` (or `"openai"` / `"hf_router"`). The server resolves
@@ -188,27 +186,29 @@ directly.
 |---|---|---|---|
 | `endpoint` | `str` | `""` | One of `"vllm"` / `"openai"` / `"hf_router"`. |
 | `base_url` / `api_key` / `model` | `str` | `""` | Override / supply explicitly. |
-| `instruction` | `str` | required | Prompt passed to `opencode run`. |
-| `setup` | `list[str]` | `[]` | Bash commands run **before** the agent. |
-| `verify` | `list[str]` | `[]` | Bash commands run **after** the agent. |
+| `instruction` | `str` | required | Prompt passed to OpenCode. |
+| `setup` | `list[str]` | `[]` | Bash commands run **before** OpenCode. |
+| `verify` | `list[str]` | `[]` | Bash commands run **after** OpenCode. |
 | `task_id` | `str` | `""` | Echoed back in result. |
-| `mode` | `str` | `"transparent_proxy"` | Or `"black_box"` (no logprobs). |
+| `mode` | `str` | `"transparent_proxy"` | Or `"black_box"` for direct LLM calls. In-process trainers can also construct `OpenCodeSessionFactory(mode="interception_gate", ...)`. |
 | `disable_thinking` | `bool \| None` | `None` (catalog default) | Inject `chat_template_kwargs.enable_thinking=false`. |
 | `max_tokens_cap` | `int` | `4096` | Per-turn `max_tokens` clamp. |
-| `top_logprobs` | `int` | `5` | HF Router cap is 5; OpenAI 0–20; vLLM unbounded. |
-| `agent_timeout_s` | `float` | `600.0` | Hard wall budget for opencode. |
+| `top_logprobs` | `int` | `5` | Per-token top-k logprobs requested in `transparent_proxy` mode. |
+| `agent_timeout_s` | `float` | `600.0` | Hard wall budget for OpenCode. |
 | `template` | `str` | `""` | E2B template name; `"opencode-rl"` skips ~2 min of install per rollout. |
 
 Returns `RolloutResult` JSON with: `reward`, `setup_results[]`,
-`verify_results[]`, `proxy_turns[]`, `files{}`, `agent_log_tail`,
-`proxy_log_tail`, `wall_s`, `agent_exit_code`, `sandbox_id`, `error`.
+`verify_results[]`, `proxy_turns[]` (logprob records in transparent-proxy
+mode), `files{}`, `agent_log_tail`, `proxy_log_tail`, `wall_s`,
+`agent_exit_code`, `sandbox_id`, `error`.
 
 ## Two Operating Modes
 
 | Mode | What it does | Best for |
 |---|---|---|
-| **`transparent_proxy`** (default) | In-sandbox proxy at `localhost:7000` forwards opencode's LLM calls to `base_url`, injects `logprobs=true`, captures per-turn `(messages, completion_tokens, logprobs)` to `proxy_trace.jsonl`. | GRPO / RL training, observability, top-k distillation. |
-| **`black_box`** | No proxy. opencode talks straight to `base_url`. | Smoke tests, eval, SFT data collection. |
+| **`transparent_proxy`** (default) | OpenCode talks to an in-sandbox proxy. The proxy forwards to `base_url`, requests logprobs, strips them before returning to OpenCode, and records `proxy_turns`. | RL data collection, GRPO-style traces. |
+| **`black_box`** | OpenCode talks directly to `base_url`. No logprob capture. | Smoke tests, eval, SFT data collection. |
+| **`interception_gate`** | Available through the in-process `OpenCodeSessionFactory`; OpenCode calls are routed through trainer-host interception endpoints. | Trainer-owned generation. |
 
 ## Environment Variables
 
@@ -227,21 +227,17 @@ sibling `.env` file; on HF Spaces, set them as **Space secrets**.
 | **OpenAI endpoint** | | |
 | `OPENAI_API_KEY` | required for `endpoint="openai"` | Standard OpenAI key. |
 | `OPENAI_BASE_URL` | no | Defaults to `https://api.openai.com/v1`. |
-| `OPENAI_MODEL` | no | Defaults to `gpt-4o-mini` (gpt-5.x and o-series refuse logprobs). |
+| `OPENAI_MODEL` | no | Defaults to `gpt-4o-mini`. |
 | **HF Router endpoint** | | |
 | `HF_ROUTER_API_KEY` | required for `endpoint="hf_router"` | HF user token. |
 | `HF_ROUTER_BASE_URL` | no | Defaults to `https://router.huggingface.co/v1`. |
 | `HF_ROUTER_MODEL` | no | Defaults to `Qwen/Qwen3-4B-Instruct-2507:nscale`. |
 
-Pick `provider:` suffixes that actually return logprobs:
-**Together / Nscale / Scaleway / SambaNova / Cerebras**. Avoid Novita /
-Hyperbolic / Featherless (silent drop) and Groq (HTTP 400).
 
 ## Pre-baked E2B Template
 
 The first rollout in a fresh E2B sandbox spends ~2 min installing
-opencode and the proxy's Python deps. Build a one-time template that
-ships those pre-installed:
+OpenCode tooling. Build a one-time template that ships it pre-installed:
 
 ```bash
 .venv/bin/python envs/opencode_env/sandbox/build_template.py
@@ -263,7 +259,7 @@ opencode_env/
 ├── __init__.py                     # re-exports primitive + client + models
 │
 ├── client.py                       # OpenCodeEnv(MCPToolClient)
-├── models.py                       # RolloutResult / RolloutTurn / OpenCodeState
+├── models.py                       # RolloutResult / RolloutTurn / CommandResult / OpenCodeState
 │
 ├── config.py                       # OpenCodeConfig (primitive)
 ├── harness.py                      # OpenCodeSession / OpenCodeSessionFactory (CLI-only)
@@ -273,22 +269,27 @@ opencode_env/
 ├── server/
 │   ├── __init__.py
 │   ├── app.py                      # FastAPI factory; mounts Gradio at /web
 │   ├── opencode_environment.py     # MCPEnvironment with single ``run_rollout`` tool
 │   ├── gradio_ui.py                # the /web Gradio Blocks UI
 │   ├── catalog.py                  # endpoint shorthand resolver
 │   └── Dockerfile                  # multi-stage uv build (used by ``openenv build``)
 │
 └── sandbox/
     ├── __init__.py
-    ├── base.py                     # SandboxBackend / SandboxHandle Protocols
-    ├── e2b.py                      # E2B implementation
-    ├── interception.py             # in-sandbox FastAPI proxy (logprob capture)
     └── build_template.py           # one-time E2B template builder
+
+# Shared sandbox runtime (moved to core):
+src/openenv/core/harness/sandbox/
+├── base.py                         # SandboxBackend / SandboxHandle protocols
+├── e2b_backend.py                  # E2B implementation
+├── docker_backend.py               # local Docker backend
+├── hf_backend.py                   # HF sandbox backend
+└── _util.py                        # shared sandbox shell utilities
 ```
 
 ## References
 
 - [OpenEnv docs](https://meta-pytorch.org/OpenEnv/)
 - [OpenCode CLI](https://opencode.ai/docs/cli/)
 - [E2B Python SDK](https://e2b.dev/docs)
-- [HF Inference Providers logprob matrix](../../../DOCS/HF/hf_inference_providers_logprobs.md)
+
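Option A's endpoint shorthand can be pictured as a small lookup against the README's environment-variable table. The sketch below is a hypothetical stand-in for `server/catalog.py`, not its actual code: `resolve_endpoint`, `ResolvedEndpoint`, and `_CATALOG` are illustrative names, only the `openai` and `hf_router` entries are modeled (the vLLM variables are not shown in this diff), and it assumes that non-empty explicit `base_url` / `api_key` / `model` arguments take precedence over the environment.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical catalog: env-var prefix, default base URL, default model.
# Defaults mirror the README table; vLLM is omitted (not shown in this diff).
_CATALOG: dict[str, tuple[str, str, str]] = {
    "openai": ("OPENAI", "https://api.openai.com/v1", "gpt-4o-mini"),
    "hf_router": (
        "HF_ROUTER",
        "https://router.huggingface.co/v1",
        "Qwen/Qwen3-4B-Instruct-2507:nscale",
    ),
}


@dataclass(frozen=True)
class ResolvedEndpoint:
    base_url: str
    api_key: str
    model: str


def resolve_endpoint(
    endpoint: str = "",
    base_url: str = "",
    api_key: str = "",
    model: str = "",
    env: Mapping[str, str] | None = None,
) -> ResolvedEndpoint:
    """Resolve the shorthand selector; explicit non-empty fields win (assumption)."""
    env = os.environ if env is None else env
    if not endpoint:
        # Option B: the caller supplies the endpoint fields directly.
        if not (base_url and model):
            raise ValueError("pass endpoint=... or explicit base_url and model")
        return ResolvedEndpoint(base_url, api_key, model)
    if endpoint not in _CATALOG:
        raise ValueError(f"unknown endpoint {endpoint!r}")
    prefix, default_url, default_model = _CATALOG[endpoint]
    key = api_key or env.get(f"{prefix}_API_KEY", "")
    if not key:
        raise ValueError(f"{prefix}_API_KEY is required for endpoint={endpoint!r}")
    return ResolvedEndpoint(
        base_url=base_url or env.get(f"{prefix}_BASE_URL") or default_url,
        api_key=key,
        model=model or env.get(f"{prefix}_MODEL") or default_model,
    )


resolved = resolve_endpoint("openai", env={"OPENAI_API_KEY": "sk-test"})
print(resolved.model)  # gpt-4o-mini
```

Passing `env` explicitly keeps the lookup testable without touching `os.environ`.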
diff --git a/envs/opencode_env/__init__.py b/envs/opencode_env/__init__.py
@@ -8,31 +8,31 @@
 
 Two layers in this package:
 
-1. **Harness primitive** — :class:`OpenCodeSessionFactory` /
+1. **Harness primitive** -- :class:`OpenCodeSessionFactory` /
    :class:`OpenCodeSession` / :class:`OpenCodeConfig` /
-   :class:`E2BSandboxBackend`. Used in-process to drive one rollout
-   inside an E2B sandbox. See ``harness.py``.
+   :class:`E2BSandboxBackend`. Built on the generic
+   :class:`CLIAgentDriver` from ``openenv.core.harness.agents``.
 
-2. **Deployable env** — :class:`OpenCodeEnv` (MCP client) talks to the
+2. **Deployable env** -- :class:`OpenCodeEnv` (MCP client) talks to the
    FastAPI server at ``server/app.py`` over HTTP. Use this when the
-   sandbox + agent live behind an HTTP boundary (e.g. an HF Space).
+   sandbox + OpenCode live behind an HTTP boundary (e.g. an HF Space).
    See ``client.py`` and ``server/``.
 """
 
 from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction
+from openenv.core.harness.sandbox import SandboxBackend, SandboxHandle
 
 from .client import OpenCodeEnv
 from .config import OpenCodeConfig, Provider
 from .harness import OpenCodeSession, OpenCodeSessionFactory
-from .models import (
-    CommandResult,
-    OpenCodeState,
-    RolloutResult,
-    RolloutTurn,
-)
-from .sandbox import E2BSandboxBackend, SandboxBackend, SandboxHandle
+from .models import CommandResult, OpenCodeState, RolloutResult, RolloutTurn
 from .task import OpenCodeTask
 
+try:
+    from openenv.core.harness.sandbox import E2BSandboxBackend
+except ImportError:  # e2b not installed
+    E2BSandboxBackend = None  # type: ignore[assignment,misc]
+
 __all__ = [
     # Deployed-env client
     "OpenCodeEnv",
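The guarded import of `E2BSandboxBackend` lets the package import without the `e2b` dependency, at the cost that the exported name can be `None`. A hypothetical helper pair showing the same pattern, plus a call-time check that fails with a readable message instead of `TypeError: 'NoneType' object is not callable`. `optional_attr` and `make_backend` are illustrative names, not part of `opencode_env`.

```python
from __future__ import annotations

import importlib
from typing import Any


def optional_attr(module_name: str, attr: str) -> Any | None:
    """Return ``module.attr``, or None when the module or name is missing.

    Mirrors ``try: from pkg import X / except ImportError: X = None``.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, attr, None)


def make_backend(backend_cls: type | None) -> Any:
    """Instantiate a backend class, or explain why it is unavailable."""
    if backend_cls is None:
        raise RuntimeError(
            "E2BSandboxBackend is unavailable: the e2b package is not installed"
        )
    return backend_cls()
```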

diff --git a/envs/opencode_env/client.py b/envs/opencode_env/client.py
@@ -7,13 +7,14 @@
 """Client for the deployed opencode_env server.
 
 The server exposes a single MCP tool ``run_rollout`` that runs one OpenCode
-rollout in an E2B sandbox and returns a JSON-serialized :class:`RolloutResult`.
+rollout in an E2B sandbox and returns a JSON-serialized
+:class:`RolloutResult`.
 
 Example::
 
     from opencode_env import OpenCodeEnv
 
-    with OpenCodeEnv(base_url="https://adithya-sk-opencode-env.hf.space") as env:
+    with OpenCodeEnv(base_url="https://your-space.hf.space") as env:
         env.reset()
         result = env.run_rollout(
             base_url="https://api.openai.com/v1",
@@ -24,7 +25,7 @@
             verify=["python /home/user/test.py"],
             task_id="binary_search_v1",
         )
-        print(result.reward, len(result.proxy_turns))
+        print(result.reward)
 """
 
 from __future__ import annotations
@@ -50,8 +51,8 @@ class OpenCodeEnv(MCPToolClient):
     def run_rollout(
         self,
         *,
-        # Endpoint — pass either the shorthand selector OR explicit fields.
-        endpoint: str = "",                # "vllm" | "openai" | "hf_router"
+        # Endpoint: pass either the ``endpoint`` shorthand or explicit fields.
+        endpoint: str = "",  # "vllm" | "openai" | "hf_router"
         base_url: str = "",
         api_key: str = "",
         model: str = "",
@@ -68,7 +69,7 @@ def run_rollout(
         agent_timeout_s: float = 600.0,
         template: str = "",
     ) -> RolloutResult:
         """Run one OpenCode rollout and return the typed result.
 
         Args:
             base_url: OpenAI-compatible LLM endpoint (with trailing /v1).
@@ -77,30 +78,29 @@ def run_rollout(
             model: Model id understood by the LLM endpoint
                 (e.g. ``"gpt-4o-mini"``, ``"Qwen/Qwen3.5-4B"``,
                 ``"Qwen/Qwen3-4B-Instruct-2507:nscale"``).
-            instruction: Prompt passed to ``opencode run``.
-            setup: Bash commands run sequentially **before** the agent starts.
+            instruction: Prompt passed to OpenCode.
+            setup: Bash commands run sequentially **before** OpenCode starts.
                 Each command runs in the sandbox; non-zero exit aborts setup.
-            verify: Bash commands run sequentially **after** the agent exits.
+            verify: Bash commands run sequentially **after** OpenCode exits.
                 Reward = ``passed_count / total`` unless any command writes a
                 float to ``/home/user/logs/verifier/reward.txt`` (override).
             task_id: Echoed back in the result for traceability.
-            mode: ``"transparent_proxy"`` (captures per-token logprobs via
-                an in-sandbox FastAPI proxy) or ``"black_box"`` (no proxy).
+            mode: ``"transparent_proxy"`` (default, captures logprobs) or
+                ``"black_box"`` (OpenCode talks directly to the LLM).
             disable_thinking: Inject
                 ``chat_template_kwargs.enable_thinking=false`` on forwarded
                 requests. Needed for Qwen3.5 vLLM; harmless on Instruct
                 variants; rejected by OpenAI direct.
-            max_tokens_cap: Clamp on per-turn ``max_tokens``. OpenCode asks
-                for ~32k by default; gpt-4o-mini caps at 16k.
-            top_logprobs: Top-k logprobs requested upstream. HF Router caps
-                at 5; OpenAI accepts up to 20; vLLM is unbounded.
-            agent_timeout_s: Hard wall-clock budget for one ``opencode run``.
+            max_tokens_cap: Clamp on per-turn ``max_tokens``.
+            top_logprobs: Per-token top-k logprobs requested in
+                ``transparent_proxy`` mode.
+            agent_timeout_s: Hard wall-clock budget for one OpenCode run.
             template: E2B template name (e.g. ``"opencode-rl"``). Empty
                 string uses the default (slow) base image.
 
         Returns:
-            A :class:`RolloutResult` with reward, per-turn logprobs, file
-            outputs, setup/verify results, and diagnostic tails.
+            A :class:`RolloutResult` with reward, ``proxy_turns``, file outputs,
+            setup/verify results, and diagnostic tails.
         """
         raw = self.call_tool(
             "run_rollout",
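The reward rule in the `verify` docstring (a pass ratio, replaced by a float written to `/home/user/logs/verifier/reward.txt`) is small enough to sketch directly. This is an illustrative reimplementation, not the server's code: `compute_reward` is a hypothetical name, and it assumes a verify command "passes" when it exits with status 0, that an unparsable override falls back to the ratio, and that an empty `verify` list scores 0.0.

```python
from __future__ import annotations


def compute_reward(
    verify_exit_codes: list[int], override_text: str | None = None
) -> float:
    """Hypothetical sketch of the documented reward rule.

    Args:
        verify_exit_codes: Exit status of each ``verify`` command, in order.
        override_text: Contents of /home/user/logs/verifier/reward.txt, or
            None if no verify command wrote that file.
    """
    if override_text is not None:
        try:
            # A float in the override file replaces the pass ratio.
            return float(override_text.strip())
        except ValueError:
            pass  # assumption: an unparsable file falls back to the ratio
    if not verify_exit_codes:
        return 0.0  # assumption: no verify commands means no reward
    passed = sum(1 for code in verify_exit_codes if code == 0)
    return passed / len(verify_exit_codes)
```

For example, `compute_reward([0, 0, 1, 0])` returns `0.75`, while `compute_reward([0, 1], "0.9\n")` returns `0.9` because the override wins.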

diff --git a/envs/opencode_env/config.py b/envs/opencode_env/config.py
@@ -34,9 +34,7 @@ class OpenCodeConfig(BaseModel):
 
     # --- OpenCode CLI ---------------------------------------------------------
     opencode_version: str = "latest"
-    disabled_tools: list[str] = Field(
-        default_factory=lambda: ["webfetch", "question"]
-    )
+    disabled_tools: list[str] = Field(default_factory=lambda: ["webfetch", "question"])
     enabled_tools: list[str] | None = None
     system_prompt: str | None = None
     extra_opencode_json: dict[str, Any] = Field(default_factory=dict)
@@ -47,25 +45,25 @@ class OpenCodeConfig(BaseModel):
     extra_env: dict[str, str] = Field(default_factory=dict)
     extra_setup_shell: str | None = None
 
+    # --- Model behavior --------------------------------------------------------
+    # Direct OpenCode config knobs (black_box / interception_gate).
+    disable_thinking: bool = False
+    max_tokens_cap: int | None = None
+
+    # --- Transparent-proxy logprob capture ------------------------------------
+    # Compatibility knobs for the HTTP env's logprob-capturing mode. The proxy
+    # requests OpenAI-compatible logprobs upstream, records them, and strips
+    # them before returning to OpenCode. A ``None`` cap disables clamping.
+    proxy_max_tokens_cap: int | None = 16384
+    proxy_top_logprobs: int = 5
+    proxy_disable_thinking: bool = False
+
     # --- Sandbox paths --------------------------------------------------------
     # Root directory inside the sandbox where the primitive writes config,
     # task files, and logs. E2B's default user is ``user`` with home
     # ``/home/user``. Override when using a root-privileged backend (Docker).
     sandbox_home: str = "/home/user"
 
-    # --- Transparent-proxy tuning --------------------------------------------
-    # Cap ``max_tokens`` / ``max_completion_tokens`` on forwarded requests.
-    # OpenCode defaults to a very large number (~32000) which exceeds some
-    # provider limits (e.g. gpt-4o-mini = 16384). Only used in
-    # ``mode="transparent_proxy"``. ``None`` disables the cap.
-    proxy_max_tokens_cap: int | None = 16384
-    # Per-turn top-k logprobs the proxy requests from the upstream.
-    proxy_top_logprobs: int = 5
-    # Disable reasoning/thinking mode for Qwen3 / Qwen3.5 models. Proxy sets
-    # ``extra_body.chat_template_kwargs.enable_thinking=false`` on forwarded
-    # requests. Ignored by providers that don't support the field.
-    proxy_disable_thinking: bool = False
-
 
 _PROVIDER_NPM = {
     "openai_compatible": "@ai-sdk/openai-compatible",
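The `proxy_*` knobs describe what the transparent proxy does to each forwarded call: request logprobs, clamp the token limit, optionally disable thinking, then record and strip the logprobs before OpenCode sees the response. A minimal dict-level sketch of that round trip, assuming non-streaming OpenAI-style chat-completions JSON (`choices[].logprobs.content[]` entries with `token` / `logprob`). `rewrite_request` and `strip_and_record` are hypothetical names; the real in-sandbox proxy may handle streaming and field names differently. Defaults mirror `proxy_top_logprobs`, `proxy_max_tokens_cap`, and `proxy_disable_thinking`.

```python
from __future__ import annotations

import copy
from typing import Any


def rewrite_request(
    body: dict[str, Any],
    *,
    top_logprobs: int = 5,
    max_tokens_cap: int | None = 16384,
    disable_thinking: bool = False,
) -> dict[str, Any]:
    """Prepare an OpenCode chat-completions request for the upstream LLM."""
    out = copy.deepcopy(body)  # never mutate the caller's request
    out["logprobs"] = True
    out["top_logprobs"] = top_logprobs
    if max_tokens_cap is not None:
        # Clamp whichever token-limit field OpenCode sent.
        for key in ("max_tokens", "max_completion_tokens"):
            if out.get(key) is not None:
                out[key] = min(out[key], max_tokens_cap)
    if disable_thinking:
        # Raw JSON body; the OpenAI Python SDK would send this via extra_body.
        out.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
    return out


def strip_and_record(
    response: dict[str, Any], trace: list[dict[str, Any]]
) -> dict[str, Any]:
    """Move per-token logprobs from an upstream response into ``trace``."""
    out = copy.deepcopy(response)
    for choice in out.get("choices", []):
        content = (choice.get("logprobs") or {}).get("content") or []
        trace.append(
            {
                "index": choice.get("index", 0),
                "tokens": [entry["token"] for entry in content],
                "logprobs": [entry["logprob"] for entry in content],
            }
        )
        # Return the shape a client that never asked for logprobs would see.
        choice["logprobs"] = None
    return out
```

Setting `logprobs` back to `None` rather than deleting the key matches what OpenAI-compatible servers typically return when logprobs were not requested, so OpenCode sees an ordinary response.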