[feature] Import external envs (ors, verifiers)#726
Conversation
sergiopaniego
left a comment
There was a problem hiding this comment.
this is so good and unlocks a lot of ideas!
I've briefly tested it against a couple of tasks from https://huggingface.co/datasets/AdithyaSK/repo2rlenv-v083-pr_diff (generated using Repo2RLEnv), wrapping them with openreward. The deployed space is here (cc @adithya-s-k)
CC has found these 4 issues but works! 😃
1. Circular import — blocks startup completely
src/openenv/cli/importers/ors.py and verifiers.py: the top-level imports of _copy_template_directory / _create_template_replacements from openenv.cli.commands.init create a circular dependency at import time (commands.__init__ → import_env → importers → ors → commands.init). Fix: move those imports inside generate().
2. _ORS_MODULES missing real openreward import paths
src/openenv/cli/importers/ors.py, the _ORS_MODULES set: the real openreward SDK uses from openreward.environments import Environment, but neither "openreward.environments" nor "openreward.environments.environment" are in the set, so AST detection fails for real openreward environments.
3. Source filename collision with generated package name
src/openenv/cli/importers/ors.py, _wrapper_source(): the generated import_module("{source_module}") call fails if the source filename matches --name, because Python resolves it to the generated package instead of the vendored file.
4. Generated dependency is ors-sdk which doesn't exist on PyPI
src/openenv/cli/importers/ors.py, _append_dependency_files(): the importer hardcodes ors-sdk>=0.1.0 in requirements.txt and pyproject.toml, but that package doesn't exist on PyPI. Any deployment (e.g. HuggingFace Spaces) fails at build time with ModuleNotFoundError. The importer should detect which ORS SDK the source actually uses (openreward, ors, etc.) and insert the correct package name.
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Tier 1: Bugs & Correctness
-
_run_syncthread safety — The generated wrapper's_run_syncspawns a new thread unconditionally when inside an async loop. This is wasteful in sync contexts and can cause deadlocks with reentrant locks. Should checkis_running()before spawning. -
_call_task_methodcreates/destroys env per request —http_server.py's_call_task_methodcallsself._env_factory()for every task API call and immediately.close()s it. Fine for stateless listing, but breaks for envs with side-effectful construction (DB connections, external auth). The statelessness requirement needs documentation. -
call_mcp_style_stepuses_env.step_async.__func__— This pattern fails ifstep_asyncis afunctools.partial, lambda, or slot wrapper. Generated wrappers are dynamic classes; ifstep_asyncisn't a truedefmethod,.__func__raisesAttributeError. -
_append_dependency_filesstring-replacespyproject.toml— String replacement into TOML is fragile: can corrupt syntax if formatting differs. Usetomllib/tomli_wfor safe TOML mutation. -
_safe_vendor_dir_namename collision — Both"my-source"and"my_source"normalize to"my_source". Two imports can silently overwrite each other's vendor dir. Should detect collisions or incorporate a content hash. -
_copy_source_treecopies everything —shutil.copytreemay copy.env,secrets.yaml,*.pem, compiled.sobinaries, or large data. Should exclude known secret file patterns and document that the source tree is copied as-is. -
sys.pathmutation at import time — Generated wrappers insert intosys.pathat module import time. If two wrappers vendor different versions of the same library, the first import wins silently, causing version conflicts. Consider isolated import machinery (importlibwith custom finder). -
Code duplication —
_is_excluded,_iter_python_files,_module_path,_copy_source_tree,_write_textare duplicated betweenors.pyandverifiers.py. Should live inbase.pyor shared utils.
Tier 2: Alignment Flags (for human review)
1. New HTTP endpoints during active HTTP deprecation
INVARIANTS.md §4 says "WebSocket for all environment communication" and notes HTTP is being deprecated (PR #252). This PR adds /list_environments, /{env_name}/splits, /{env_name}/tasks, etc. as new HTTP-only endpoints — expanding HTTP surface area during the migration away from it. Should these be WebSocket or MCP methods instead?
2. Environment base class gains new method stubs
Adding list_splits, list_tasks, num_tasks, get_task, get_task_range to the Environment base class changes the contract for all environment authors. The Gymnasium API is reset / step / state. Conflating "task-loading" with the step-loop responsibility may warrant a separate TaskProvider mixin/interface instead.
3. MCP boundary enforcement in generated wrappers
Generated wrappers expose tools/list and tools/call directly through HTTP, bypassing the established MCPEnvironment path. If a wrapper's tool list includes reward-computation internals, those become agent-callable via MCP, potentially violating "rewards inside environment."
4. _ensure_session auto-reset violates agent isolation
The generated wrapper's _ensure_session silently calls reset() if no session exists. If an agent triggers a tool call that invokes _ensure_session, the agent indirectly causes a reset — leaking simulation control to the agent (INVARIANTS.md §1).
5. No RFC for this architectural feature
Importing/wrapping third-party environments introduces a second environment creation path alongside openenv init. The design decisions (task API integration, HTTP vs WebSocket for task routes, external verifier reward handling) are exactly the kind that PRINCIPLES.md says require an RFC before implementation.
Verdict
Request changes. 8 mechanical issues to fix plus 5 alignment questions that should be resolved (ideally via RFC) before this leaves draft. The most critical are the implicit agent-triggered reset (#4) and the absence of an RFC (#5).
Automated review by Claude Code | Learn more
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Alignment Review — PR #726
Tier 1: Fixes Required
[SECURITY] Missing secret-file exclusion in _copy_source_tree
ors.py and verifiers.py define their own _is_excluded() that only excludes build dirs and .pyc files. Unlike base.py's version, they don't exclude .env files, secrets.yaml, or private key files. Running openenv import ./my_ors_env will copy .env files with API keys into the generated vendor/ directory, which could then be published.
- Files:
src/openenv/cli/importers/ors.py,src/openenv/cli/importers/verifiers.py
[BUG] _ensure_session() auto-calls reset() without orchestrator knowledge
In both ORS and Verifiers wrappers, _ensure_session() calls self.reset() if no session exists. An agent sending step() before orchestrator calls reset() triggers an invisible auto-reset, breaking episode tracking. Fix: raise RuntimeError("Call reset() before stepping") instead.
- Files:
ors.pywrapper template (~line 394),verifiers.pywrapper template (~line 418)
[BUG] _run_sync coroutine reuse risk
In both generated wrappers, _run_sync passes a coroutine to asyncio.run() inside a thread. If the coroutine has already been partially consumed, it raises RuntimeError: cannot reuse already awaited coroutine. No guard exists.
[BUG] verifiers.py _score_completion swallows all scoring errors
A bare except Exception: pass around state_cls.for_task(task) silently returns None reward, corrupting the training signal without any log output.
- File:
verifiers.pytemplate (~line 496)
[BUG] Module-level sys.path mutation in generated wrappers
Both wrappers permanently modify sys.path at import time. If two ORS environments vendor different versions of the same package, the first to load wins silently. The PR diff version fixes this with a context manager; the on-disk version doesn't have this fix.
Tier 2: Alignment Flags
ALIGNMENT FLAG: _ensure_session() allows agents to indirectly trigger reset()
- Invariant: "Agents cannot reset" (INVARIANTS.md)
- Concern: An agent acting immediately after episode end (when
_ors_env is None) causes auto-reset, giving agents indirect reset control.
ALIGNMENT FLAG: New HTTP task/split API widens HTTP surface during WebSocket deprecation
- Invariant: Communication patterns — "deprecating HTTP in favor of WebSocket-only"
- Concern: 6 new HTTP endpoints (
/list_environments,/{env_name}/splits,/tasks,/num_tasks,/task,/task_range) with no WebSocket equivalent, creating permanent HTTP dependency for ORS/Verifiers environments.
ALIGNMENT FLAG: _call_task_method instantiates throwaway environments for discovery
- Invariant: "One env = one trajectory" + performance
- Concern: Every task API call creates a fresh environment via
env_factory(). For Verifiers wrappers, this loads the full HF dataset each time. No caching at the HTTP level.
ALIGNMENT FLAG: Generated wrappers use Any/plain dicts instead of Pydantic models
- Invariant: "All wire types must be Pydantic models" (INVARIANTS.md)
- Concern:
reset()returnsObservationwith arbitrary dicts inmetadata,step()returns plain Python dicts. ExtraStatekwargs bypass Pydantic validation.
Summary
- 5 mechanical issues (1 security blocker, 4 bugs)
- 4 alignment flags for human review
- Most urgent:
.envfile leakage in_copy_source_treeand_ensure_session()auto-reset
Automated review by Claude Code | Learn more
|
Hi @burtenshaw thanks for getting this together. I've had a play around with it this afternoon across a few of our envs and it seems to be working. I have a few comments but empirically it seems good and I think the approach is good. A few comments:
H |
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Alignment Review Report
Automated Checks
- Lint: PASS (CI green)
- Tests: PASS (CI green on 3.11 + 3.12)
- Debug code: CLEAN
Tier 1: Fixes Required
- Silent reward corruption —
verifiers.pygenerated wrapper_score_completion:except Exception: passaroundState.for_task(task)silently swallows errors and producesNonereward. At minimum log the exception before continuing. - DRY violation —
_overrides_methodis defined identically in bothhttp_server.pyandweb_interface.py. Extract to a shared utility. - Incomplete MCP fallthrough — When
supports_mcp_style_actionsisTruebutcall_mcp_style_stepreturns a non-CallToolObservation, the code silently falls through to an "Environment does not support MCP" error. Add explicit handling. -
_call_task_methodstatelessness assumption — Document inTaskProviderdocstring thatlist_splits/list_tasks/etc. must be callable on a freshly constructed instance, since the function creates and immediately closes an env.
Tier 2: Alignment Discussion
FLAG 1: Task API endpoints (/list_environments, /{env_name}/splits, etc.) are HTTP on the same server as /mcp. Agents with server access can call them directly and potentially access task specs including answers.
- Principle: Dual API boundary (INVARIANTS.md §1)
- Suggested reviewer: @Darktex
FLAG 2: supports_mcp_style_actions introduces a second dispatch path in the MCP handler, bifurcating the MCP contract between "true MCP" and "MCP-style step" environments.
- Principle: MCP as universal standard (PRINCIPLES.md)
- Suggested reviewer: @Darktex
Summary
- 4 mechanical issues (1 is a silent reward corruption risk)
- 2 alignment points
- Overall implementation is solid — AST-based detection, vendor isolation, and secret exclusion are well done.
Automated review by Claude Code
This PR adds deterministic import support for third-party environment libraries and wraps imported sources as normal OpenEnv environments.
Changes:
openenv import SOURCE --name NAME --output-dir DIRwith an importer registry.tools/listandtools/callsupport for generated wrappers.