[feature] Import external envs (ors, verifiers) by burtenshaw · Pull Request #726 · huggingface/OpenEnv

burtenshaw · 2026-05-21T08:08:52Z

This PR adds deterministic import support for third-party environment libraries and wraps imported sources as normal OpenEnv environments.

Changes:

Adds openenv import SOURCE --name NAME --output-dir DIR with an importer registry.
Supports AST-based ORS/OpenReward detection and generated wrappers.
Supports AST-based Prime Intellect Verifiers detection and generated wrappers.
Adds ORS-compatible task/split APIs to the OpenEnv server.
Adds MCP-style tools/list and tools/call support for generated wrappers.
Updates README and docs for the import flow.

sergiopaniego

this is so good and unlocks a lot of ideas!

I've briefly tested it against a couple of tasks from https://huggingface.co/datasets/AdithyaSK/repo2rlenv-v083-pr_diff (generated using Repo2RLEnv), wrapping them with openreward. The deployed space is here (cc @adithya-s-k)

CC has found these 4 issues but works! 😃

1. Circular import — blocks startup completely
src/openenv/cli/importers/ors.py and verifiers.py: the top-level imports of _copy_template_directory / _create_template_replacements from openenv.cli.commands.init create a circular dependency at import time (commands.__init__ → import_env → importers → ors → commands.init). Fix: move those imports inside generate().

2. _ORS_MODULES missing real openreward import paths
src/openenv/cli/importers/ors.py, the _ORS_MODULES set: the real openreward SDK uses from openreward.environments import Environment, but neither "openreward.environments" nor "openreward.environments.environment" are in the set, so AST detection fails for real openreward environments.

3. Source filename collision with generated package name
src/openenv/cli/importers/ors.py, _wrapper_source(): the generated import_module("{source_module}") call fails if the source filename matches --name, because Python resolves it to the generated package instead of the vendored file.

4. Generated dependency is ors-sdk which doesn't exist on PyPI
src/openenv/cli/importers/ors.py, _append_dependency_files(): the importer hardcodes ors-sdk>=0.1.0 in requirements.txt and pyproject.toml, but that package doesn't exist on PyPI. Any deployment (e.g. HuggingFace Spaces) fails at build time with ModuleNotFoundError. The importer should detect which ORS SDK the source actually uses (openreward, ors, etc.) and insert the correct package name.

Darktex

Note: This is an automated review by Claude Code, not a human review.

Tier 1: Bugs & Correctness

_run_sync thread safety — The generated wrapper's _run_sync spawns a new thread unconditionally when inside an async loop. This is wasteful in sync contexts and can cause deadlocks with reentrant locks. Should check is_running() before spawning.
_call_task_method creates/destroys env per request — http_server.py's _call_task_method calls self._env_factory() for every task API call and immediately .close()s it. Fine for stateless listing, but breaks for envs with side-effectful construction (DB connections, external auth). The statelessness requirement needs documentation.
call_mcp_style_step uses _env.step_async.__func__ — This pattern fails if step_async is a functools.partial, lambda, or slot wrapper. Generated wrappers are dynamic classes; if step_async isn't a true def method, .__func__ raises AttributeError.
_append_dependency_files string-replaces pyproject.toml — String replacement into TOML is fragile: can corrupt syntax if formatting differs. Use tomllib/tomli_w for safe TOML mutation.
_safe_vendor_dir_name name collision — Both "my-source" and "my_source" normalize to "my_source". Two imports can silently overwrite each other's vendor dir. Should detect collisions or incorporate a content hash.
_copy_source_tree copies everything — shutil.copytree may copy .env, secrets.yaml, *.pem, compiled .so binaries, or large data. Should exclude known secret file patterns and document that the source tree is copied as-is.
sys.path mutation at import time — Generated wrappers insert into sys.path at module import time. If two wrappers vendor different versions of the same library, the first import wins silently, causing version conflicts. Consider isolated import machinery (importlib with custom finder).
Code duplication — _is_excluded, _iter_python_files, _module_path, _copy_source_tree, _write_text are duplicated between ors.py and verifiers.py. Should live in base.py or shared utils.

Tier 2: Alignment Flags (for human review)

1. New HTTP endpoints during active HTTP deprecation

INVARIANTS.md §4 says "WebSocket for all environment communication" and notes HTTP is being deprecated (PR #252). This PR adds /list_environments, /{env_name}/splits, /{env_name}/tasks, etc. as new HTTP-only endpoints — expanding HTTP surface area during the migration away from it. Should these be WebSocket or MCP methods instead?

2. `Environment` base class gains new method stubs

Adding list_splits, list_tasks, num_tasks, get_task, get_task_range to the Environment base class changes the contract for all environment authors. The Gymnasium API is reset / step / state. Conflating "task-loading" with the step-loop responsibility may warrant a separate TaskProvider mixin/interface instead.

3. MCP boundary enforcement in generated wrappers

Generated wrappers expose tools/list and tools/call directly through HTTP, bypassing the established MCPEnvironment path. If a wrapper's tool list includes reward-computation internals, those become agent-callable via MCP, potentially violating "rewards inside environment."

4. `_ensure_session` auto-reset violates agent isolation

The generated wrapper's _ensure_session silently calls reset() if no session exists. If an agent triggers a tool call that invokes _ensure_session, the agent indirectly causes a reset — leaking simulation control to the agent (INVARIANTS.md §1).

5. No RFC for this architectural feature

Importing/wrapping third-party environments introduces a second environment creation path alongside openenv init. The design decisions (task API integration, HTTP vs WebSocket for task routes, external verifier reward handling) are exactly the kind that PRINCIPLES.md says require an RFC before implementation.

Verdict

Request changes. 8 mechanical issues to fix plus 5 alignment questions that should be resolved (ideally via RFC) before this leaves draft. The most critical are the implicit agent-triggered reset (#4) and the absence of an RFC (#5).

Automated review by Claude Code | Learn more

Darktex

Note: This is an automated review by Claude Code, not a human review.

Alignment Review — PR #726

Tier 1: Fixes Required

[SECURITY] Missing secret-file exclusion in _copy_source_tree
ors.py and verifiers.py define their own _is_excluded() that only excludes build dirs and .pyc files. Unlike base.py's version, they don't exclude .env files, secrets.yaml, or private key files. Running openenv import ./my_ors_env will copy .env files with API keys into the generated vendor/ directory, which could then be published.

Files: src/openenv/cli/importers/ors.py, src/openenv/cli/importers/verifiers.py

[BUG] _ensure_session() auto-calls reset() without orchestrator knowledge
In both ORS and Verifiers wrappers, _ensure_session() calls self.reset() if no session exists. An agent sending step() before orchestrator calls reset() triggers an invisible auto-reset, breaking episode tracking. Fix: raise RuntimeError("Call reset() before stepping") instead.

Files: ors.py wrapper template (~line 394), verifiers.py wrapper template (~line 418)

[BUG] _run_sync coroutine reuse risk
In both generated wrappers, _run_sync passes a coroutine to asyncio.run() inside a thread. If the coroutine has already been partially consumed, it raises RuntimeError: cannot reuse already awaited coroutine. No guard exists.

[BUG] verifiers.py _score_completion swallows all scoring errors
A bare except Exception: pass around state_cls.for_task(task) silently returns None reward, corrupting the training signal without any log output.

File: verifiers.py template (~line 496)

[BUG] Module-level sys.path mutation in generated wrappers
Both wrappers permanently modify sys.path at import time. If two ORS environments vendor different versions of the same package, the first to load wins silently. The PR diff version fixes this with a context manager; the on-disk version doesn't have this fix.

Tier 2: Alignment Flags

ALIGNMENT FLAG: _ensure_session() allows agents to indirectly trigger reset()

Invariant: "Agents cannot reset" (INVARIANTS.md)
Concern: An agent acting immediately after episode end (when _ors_env is None) causes auto-reset, giving agents indirect reset control.

ALIGNMENT FLAG: New HTTP task/split API widens HTTP surface during WebSocket deprecation

Invariant: Communication patterns — "deprecating HTTP in favor of WebSocket-only"
Concern: 6 new HTTP endpoints (/list_environments, /{env_name}/splits, /tasks, /num_tasks, /task, /task_range) with no WebSocket equivalent, creating permanent HTTP dependency for ORS/Verifiers environments.

ALIGNMENT FLAG: _call_task_method instantiates throwaway environments for discovery

Invariant: "One env = one trajectory" + performance
Concern: Every task API call creates a fresh environment via env_factory(). For Verifiers wrappers, this loads the full HF dataset each time. No caching at the HTTP level.

ALIGNMENT FLAG: Generated wrappers use Any/plain dicts instead of Pydantic models

Invariant: "All wire types must be Pydantic models" (INVARIANTS.md)
Concern: reset() returns Observation with arbitrary dicts in metadata, step() returns plain Python dicts. Extra State kwargs bypass Pydantic validation.

Summary

5 mechanical issues (1 security blocker, 4 bugs)
4 alignment flags for human review
Most urgent: .env file leakage in _copy_source_tree and _ensure_session() auto-reset

Automated review by Claude Code | Learn more

henryjcee · 2026-06-02T19:03:54Z

Hi @burtenshaw thanks for getting this together. I've had a play around with it this afternoon across a few of our envs and it seems to be working. I have a few comments but empirically it seems good and I think the approach is good. A few comments:

Possible that I'm holding it wrong but appears that the deps from the source environment don't actually get built into the env?
A lot of our envs rely on file data that is made available to the filesystem of the env server. Would be cool to have support for baking that into the env at build time or making the data available in some other way to the env.
We have an egress proxy that strips and replaces secrets on the way out of our env infrastructure to avoid leaking tenant credentials to env creators. I don't think that's an issue for this but something to flag.

H

Darktex

Note: This is an automated review by Claude Code, not a human review.

Alignment Review Report

Automated Checks

Lint: PASS (CI green)
Tests: PASS (CI green on 3.11 + 3.12)
Debug code: CLEAN

Tier 1: Fixes Required

Silent reward corruption — verifiers.py generated wrapper _score_completion: except Exception: pass around State.for_task(task) silently swallows errors and produces None reward. At minimum log the exception before continuing.
DRY violation — _overrides_method is defined identically in both http_server.py and web_interface.py. Extract to a shared utility.
Incomplete MCP fallthrough — When supports_mcp_style_actions is True but call_mcp_style_step returns a non-CallToolObservation, the code silently falls through to an "Environment does not support MCP" error. Add explicit handling.
_call_task_method statelessness assumption — Document in TaskProvider docstring that list_splits/list_tasks/etc. must be callable on a freshly constructed instance, since the function creates and immediately closes an env.

Tier 2: Alignment Discussion

FLAG 1: Task API endpoints (/list_environments, /{env_name}/splits, etc.) are HTTP on the same server as /mcp. Agents with server access can call them directly and potentially access task specs including answers.

Principle: Dual API boundary (INVARIANTS.md §1)
Suggested reviewer: @Darktex

FLAG 2: supports_mcp_style_actions introduces a second dispatch path in the MCP handler, bifurcating the MCP contract between "true MCP" and "MCP-style step" environments.

Principle: MCP as universal standard (PRINCIPLES.md)
Suggested reviewer: @Darktex

Summary

4 mechanical issues (1 is a silent reward corruption risk)
2 alignment points
Overall implementation is solid — AST-based detection, vendor isolation, and secret exclusion are well done.

Automated review by Claude Code

feat: import external environments

68ee605

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 21, 2026

feat: add import integration demo

a703f31

burtenshaw changed the title ~~import external envs~~ [feature] [wip] import external envs (ors, verifiers) May 22, 2026

sergiopaniego reviewed May 22, 2026

View reviewed changes

Darktex reviewed May 22, 2026

View reviewed changes

fix: address import wrapper review feedback

83549d5

burtenshaw changed the title ~~[feature] [wip] import external envs (ors, verifiers)~~ [feature] Import external envs (ors, verifiers) May 29, 2026

burtenshaw marked this pull request as ready for review May 29, 2026 11:24

fix: satisfy ci formatting and tests

dac82c6

Darktex requested changes May 29, 2026

View reviewed changes

burtenshaw mentioned this pull request Jun 1, 2026

[RFC 006] add external environment import rfc #731

Open

Darktex reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] Import external envs (ors, verifiers)#726

[feature] Import external envs (ors, verifiers)#726
burtenshaw wants to merge 4 commits into
huggingface:mainfrom
burtenshaw:codex/deterministic-env-import

burtenshaw commented May 21, 2026 •

edited

Loading

Uh oh!

sergiopaniego left a comment

Uh oh!

Darktex left a comment

Uh oh!

Darktex left a comment

Uh oh!

henryjcee commented Jun 2, 2026 •

edited

Loading

Uh oh!

Darktex left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

burtenshaw commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sergiopaniego left a comment

Choose a reason for hiding this comment

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Tier 1: Bugs & Correctness

Tier 2: Alignment Flags (for human review)

1. New HTTP endpoints during active HTTP deprecation

2. Environment base class gains new method stubs

3. MCP boundary enforcement in generated wrappers

4. _ensure_session auto-reset violates agent isolation

5. No RFC for this architectural feature

Verdict

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Alignment Review — PR #726

Tier 1: Fixes Required

Tier 2: Alignment Flags

Summary

Uh oh!

henryjcee commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Alignment Review Report

Automated Checks

Tier 1: Fixes Required

Tier 2: Alignment Discussion

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

burtenshaw commented May 21, 2026 •

edited

Loading

2. `Environment` base class gains new method stubs

4. `_ensure_session` auto-reset violates agent isolation

henryjcee commented Jun 2, 2026 •

edited

Loading