Skip to content

[feature] Import external envs (ors, verifiers)#726

Open
burtenshaw wants to merge 4 commits into
huggingface:mainfrom
burtenshaw:codex/deterministic-env-import
Open

[feature] Import external envs (ors, verifiers)#726
burtenshaw wants to merge 4 commits into
huggingface:mainfrom
burtenshaw:codex/deterministic-env-import

Conversation

@burtenshaw
Copy link
Copy Markdown
Collaborator

@burtenshaw burtenshaw commented May 21, 2026

This PR adds deterministic import support for third-party environment libraries and wraps imported sources as normal OpenEnv environments.

Changes:

  • Adds openenv import SOURCE --name NAME --output-dir DIR with an importer registry.
  • Supports AST-based ORS/OpenReward detection and generated wrappers.
  • Supports AST-based Prime Intellect Verifiers detection and generated wrappers.
  • Adds ORS-compatible task/split APIs to the OpenEnv server.
  • Adds MCP-style tools/list and tools/call support for generated wrappers.
  • Updates README and docs for the import flow.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 21, 2026
@burtenshaw burtenshaw changed the title import external envs [feature] [wip] import external envs (ors, verifiers) May 22, 2026
Copy link
Copy Markdown
Member

@sergiopaniego sergiopaniego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so good and unlocks a lot of ideas!

I've briefly tested it against a couple of tasks from https://huggingface.co/datasets/AdithyaSK/repo2rlenv-v083-pr_diff (generated using Repo2RLEnv), wrapping them with openreward. The deployed space is here (cc @adithya-s-k)

CC has found these 4 issues but works! 😃

1. Circular import — blocks startup completely
src/openenv/cli/importers/ors.py and verifiers.py: the top-level imports of _copy_template_directory / _create_template_replacements from openenv.cli.commands.init create a circular dependency at import time (commands.__init__import_envimportersorscommands.init). Fix: move those imports inside generate().

2. _ORS_MODULES missing real openreward import paths
src/openenv/cli/importers/ors.py, the _ORS_MODULES set: the real openreward SDK uses from openreward.environments import Environment, but neither "openreward.environments" nor "openreward.environments.environment" are in the set, so AST detection fails for real openreward environments.

3. Source filename collision with generated package name
src/openenv/cli/importers/ors.py, _wrapper_source(): the generated import_module("{source_module}") call fails if the source filename matches --name, because Python resolves it to the generated package instead of the vendored file.

4. Generated dependency is ors-sdk which doesn't exist on PyPI
src/openenv/cli/importers/ors.py, _append_dependency_files(): the importer hardcodes ors-sdk>=0.1.0 in requirements.txt and pyproject.toml, but that package doesn't exist on PyPI. Any deployment (e.g. HuggingFace Spaces) fails at build time with ModuleNotFoundError. The importer should detect which ORS SDK the source actually uses (openreward, ors, etc.) and insert the correct package name.

Copy link
Copy Markdown
Collaborator

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Tier 1: Bugs & Correctness

  • _run_sync thread safety — The generated wrapper's _run_sync spawns a new thread unconditionally when inside an async loop. This is wasteful in sync contexts and can cause deadlocks with reentrant locks. Should check is_running() before spawning.

  • _call_task_method creates/destroys env per requesthttp_server.py's _call_task_method calls self._env_factory() for every task API call and immediately .close()s it. Fine for stateless listing, but breaks for envs with side-effectful construction (DB connections, external auth). The statelessness requirement needs documentation.

  • call_mcp_style_step uses _env.step_async.__func__ — This pattern fails if step_async is a functools.partial, lambda, or slot wrapper. Generated wrappers are dynamic classes; if step_async isn't a true def method, .__func__ raises AttributeError.

  • _append_dependency_files string-replaces pyproject.toml — String replacement into TOML is fragile: can corrupt syntax if formatting differs. Use tomllib/tomli_w for safe TOML mutation.

  • _safe_vendor_dir_name name collision — Both "my-source" and "my_source" normalize to "my_source". Two imports can silently overwrite each other's vendor dir. Should detect collisions or incorporate a content hash.

  • _copy_source_tree copies everythingshutil.copytree may copy .env, secrets.yaml, *.pem, compiled .so binaries, or large data. Should exclude known secret file patterns and document that the source tree is copied as-is.

  • sys.path mutation at import time — Generated wrappers insert into sys.path at module import time. If two wrappers vendor different versions of the same library, the first import wins silently, causing version conflicts. Consider isolated import machinery (importlib with custom finder).

  • Code duplication_is_excluded, _iter_python_files, _module_path, _copy_source_tree, _write_text are duplicated between ors.py and verifiers.py. Should live in base.py or shared utils.

Tier 2: Alignment Flags (for human review)

1. New HTTP endpoints during active HTTP deprecation

INVARIANTS.md §4 says "WebSocket for all environment communication" and notes HTTP is being deprecated (PR #252). This PR adds /list_environments, /{env_name}/splits, /{env_name}/tasks, etc. as new HTTP-only endpoints — expanding HTTP surface area during the migration away from it. Should these be WebSocket or MCP methods instead?

2. Environment base class gains new method stubs

Adding list_splits, list_tasks, num_tasks, get_task, get_task_range to the Environment base class changes the contract for all environment authors. The Gymnasium API is reset / step / state. Conflating "task-loading" with the step-loop responsibility may warrant a separate TaskProvider mixin/interface instead.

3. MCP boundary enforcement in generated wrappers

Generated wrappers expose tools/list and tools/call directly through HTTP, bypassing the established MCPEnvironment path. If a wrapper's tool list includes reward-computation internals, those become agent-callable via MCP, potentially violating "rewards inside environment."

4. _ensure_session auto-reset violates agent isolation

The generated wrapper's _ensure_session silently calls reset() if no session exists. If an agent triggers a tool call that invokes _ensure_session, the agent indirectly causes a reset — leaking simulation control to the agent (INVARIANTS.md §1).

5. No RFC for this architectural feature

Importing/wrapping third-party environments introduces a second environment creation path alongside openenv init. The design decisions (task API integration, HTTP vs WebSocket for task routes, external verifier reward handling) are exactly the kind that PRINCIPLES.md says require an RFC before implementation.

Verdict

Request changes. 8 mechanical issues to fix plus 5 alignment questions that should be resolved (ideally via RFC) before this leaves draft. The most critical are the implicit agent-triggered reset (#4) and the absence of an RFC (#5).


Automated review by Claude Code | Learn more

@burtenshaw burtenshaw changed the title [feature] [wip] import external envs (ors, verifiers) [feature] Import external envs (ors, verifiers) May 29, 2026
@burtenshaw burtenshaw marked this pull request as ready for review May 29, 2026 11:24
Copy link
Copy Markdown
Collaborator

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review — PR #726

Tier 1: Fixes Required

[SECURITY] Missing secret-file exclusion in _copy_source_tree
ors.py and verifiers.py define their own _is_excluded() that only excludes build dirs and .pyc files. Unlike base.py's version, they don't exclude .env files, secrets.yaml, or private key files. Running openenv import ./my_ors_env will copy .env files with API keys into the generated vendor/ directory, which could then be published.

  • Files: src/openenv/cli/importers/ors.py, src/openenv/cli/importers/verifiers.py

[BUG] _ensure_session() auto-calls reset() without orchestrator knowledge
In both ORS and Verifiers wrappers, _ensure_session() calls self.reset() if no session exists. An agent sending step() before orchestrator calls reset() triggers an invisible auto-reset, breaking episode tracking. Fix: raise RuntimeError("Call reset() before stepping") instead.

  • Files: ors.py wrapper template (~line 394), verifiers.py wrapper template (~line 418)

[BUG] _run_sync coroutine reuse risk
In both generated wrappers, _run_sync passes a coroutine to asyncio.run() inside a thread. If the coroutine has already been partially consumed, it raises RuntimeError: cannot reuse already awaited coroutine. No guard exists.

[BUG] verifiers.py _score_completion swallows all scoring errors
A bare except Exception: pass around state_cls.for_task(task) silently returns None reward, corrupting the training signal without any log output.

  • File: verifiers.py template (~line 496)

[BUG] Module-level sys.path mutation in generated wrappers
Both wrappers permanently modify sys.path at import time. If two ORS environments vendor different versions of the same package, the first to load wins silently. The PR diff version fixes this with a context manager; the on-disk version doesn't have this fix.

Tier 2: Alignment Flags

ALIGNMENT FLAG: _ensure_session() allows agents to indirectly trigger reset()

  • Invariant: "Agents cannot reset" (INVARIANTS.md)
  • Concern: An agent acting immediately after episode end (when _ors_env is None) causes auto-reset, giving agents indirect reset control.

ALIGNMENT FLAG: New HTTP task/split API widens HTTP surface during WebSocket deprecation

  • Invariant: Communication patterns — "deprecating HTTP in favor of WebSocket-only"
  • Concern: 6 new HTTP endpoints (/list_environments, /{env_name}/splits, /tasks, /num_tasks, /task, /task_range) with no WebSocket equivalent, creating permanent HTTP dependency for ORS/Verifiers environments.

ALIGNMENT FLAG: _call_task_method instantiates throwaway environments for discovery

  • Invariant: "One env = one trajectory" + performance
  • Concern: Every task API call creates a fresh environment via env_factory(). For Verifiers wrappers, this loads the full HF dataset each time. No caching at the HTTP level.

ALIGNMENT FLAG: Generated wrappers use Any/plain dicts instead of Pydantic models

  • Invariant: "All wire types must be Pydantic models" (INVARIANTS.md)
  • Concern: reset() returns Observation with arbitrary dicts in metadata, step() returns plain Python dicts. Extra State kwargs bypass Pydantic validation.

Summary

  • 5 mechanical issues (1 security blocker, 4 bugs)
  • 4 alignment flags for human review
  • Most urgent: .env file leakage in _copy_source_tree and _ensure_session() auto-reset

Automated review by Claude Code | Learn more

@henryjcee
Copy link
Copy Markdown

henryjcee commented Jun 2, 2026

Hi @burtenshaw thanks for getting this together. I've had a play around with it this afternoon across a few of our envs and it seems to be working. I have a few comments but empirically it seems good and I think the approach is good. A few comments:

  • Possible that I'm holding it wrong but appears that the deps from the source environment don't actually get built into the env?
  • A lot of our envs rely on file data that is made available to the filesystem of the env server. Would be cool to have support for baking that into the env at build time or making the data available in some other way to the env.
  • We have an egress proxy that strips and replaces secrets on the way out of our env infrastructure to avoid leaking tenant credentials to env creators. I don't think that's an issue for this but something to flag.

H

Copy link
Copy Markdown
Collaborator

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS (CI green)
  • Tests: PASS (CI green on 3.11 + 3.12)
  • Debug code: CLEAN

Tier 1: Fixes Required

  • Silent reward corruptionverifiers.py generated wrapper _score_completion: except Exception: pass around State.for_task(task) silently swallows errors and produces None reward. At minimum log the exception before continuing.
  • DRY violation_overrides_method is defined identically in both http_server.py and web_interface.py. Extract to a shared utility.
  • Incomplete MCP fallthrough — When supports_mcp_style_actions is True but call_mcp_style_step returns a non-CallToolObservation, the code silently falls through to an "Environment does not support MCP" error. Add explicit handling.
  • _call_task_method statelessness assumption — Document in TaskProvider docstring that list_splits/list_tasks/etc. must be callable on a freshly constructed instance, since the function creates and immediately closes an env.

Tier 2: Alignment Discussion

FLAG 1: Task API endpoints (/list_environments, /{env_name}/splits, etc.) are HTTP on the same server as /mcp. Agents with server access can call them directly and potentially access task specs including answers.

  • Principle: Dual API boundary (INVARIANTS.md §1)
  • Suggested reviewer: @Darktex

FLAG 2: supports_mcp_style_actions introduces a second dispatch path in the MCP handler, bifurcating the MCP contract between "true MCP" and "MCP-style step" environments.

  • Principle: MCP as universal standard (PRINCIPLES.md)
  • Suggested reviewer: @Darktex

Summary

  • 4 mechanical issues (1 is a silent reward corruption risk)
  • 2 alignment points
  • Overall implementation is solid — AST-based detection, vendor isolation, and secret exclusion are well done.

Automated review by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants