Add wall-clock timeout support to MultiTurnEnv by xeophon · Pull Request #1166 · PrimeIntellect-ai/verifiers

xeophon · 2026-04-17T11:12:17Z

Summary

add a nullable timeout_seconds kwarg to MultiTurnEnv and stop timed out rollouts with timeout_reached
keep timeout configuration on the existing kwargs path (extra_env_kwargs) instead of adding a first-party eval flag
add focused tests and docs updates, plus a tiny CliAgentEnv typing fix needed for the repo's ty pre-push check

Testing

uv run pytest tests/test_multiturn_env.py tests/test_eval_cli.py
uv run pre-commit run --all-files
ad hoc smoke test: ToolEnv with a 10s sleeping tool and timeout_seconds=5 stopped in ~5s with stop_condition=timeout_reached

Note

Medium Risk
Changes the core MultiTurnEnv.rollout execution loop to support wall-clock cancellation, which could affect rollout completion/cleanup behavior and stop-condition reporting across all multi-turn environments.

Overview
Adds a nullable timeout_seconds parameter to MultiTurnEnv and a new built-in stop condition timeout_reached, allowing multi-turn rollouts to terminate based on wall-clock time.

Updates MultiTurnEnv.rollout to enforce the timeout via task cancellation and to mark timed-out states as completed/truncated with stop_condition="timeout_reached", ensuring timing, cleanup, and completion rendering still run.

Extends eval configuration/tests to pass timeout_seconds through extra_env_kwargs (CLI + TOML), adjusts CliAgentEnv sandbox resource calculation when the timeout is unset, and updates docs/reference to document the new timeout behavior and configuration.

^{Reviewed by Cursor Bugbot for commit 12d9b08. Bugbot is set up for automated code reviews on this repo. Configure here.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 777360656e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T11:18:53Z

+            done, _ = await asyncio.wait({rollout_task}, timeout=self.timeout_seconds)
+            if rollout_task in done:
+                return await rollout_task
+
+            rollout_task.cancel()


Re-check rollout task completion before forcing timeout

When asyncio.wait(..., timeout=...) returns due to timeout, the rollout task can still complete naturally in the small window before rollout_task.cancel() runs. This branch unconditionally marks the rollout as timed out/truncated, so near-deadline successful rollouts can be mislabeled and have cleanup run again with timeout metadata. Add a rollout_task.done() check immediately before cancellation/timeout state mutation so completed work is returned as-is.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T11:18:53Z

+            state["timed_out"] = True
+            state["is_completed"] = True
+            state["is_truncated"] = True
+            state["stop_condition"] = "timeout_reached"


Preserve timeout hook side effects in forced-timeout path

This forced-timeout path writes timed_out/stop_condition directly instead of executing stop-condition logic, so subclass timeout side effects are skipped. In ApiEnv, timeout_reached sets agent_timed_out and ApiEnvMonitorRubric reads that field, so rollouts that time out via this branch can be reported as not timed out in metrics/logs. Route timeout finalization through the same timeout hook (or mirror its side effects) to keep subclass state consistent.

Useful? React with 👍 / 👎.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.}

cursor · 2026-04-17T11:22:59Z

+            await self._render_timing(state)
+            await self._cleanup(state)
+            await self.render_completion(state)
+            return state


Rollout timeout races with stop condition causing double cleanup

Medium Severity

CliAgentEnv always sets self.timeout_seconds = 3600.0, so the new rollout method enters the asyncio.wait timeout path. Both timeout_reached (checking start_time) and asyncio.wait fire at nearly the same wall-clock moment (offset by microseconds). When timeout_reached triggers, is_completed calls _cleanup (which includes destroy_sandbox — a network call). Microseconds later asyncio.wait returns, the task is cancelled mid-cleanup, then the outer code calls _cleanup again — running handlers like destroy_sandbox a second time on a partially-cleaned state.

Additional Locations (1)

verifiers/envs/experimental/cli_agent_env.py#L120-L121

^{Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.}

cursor · 2026-04-17T11:22:59Z


 ```bash
-prime eval run my-env -x '{"max_turns": 20}'
+prime eval run my-env -x '{"max_turns": 20, "timeout_seconds": 600}'


Skills file not updated after docs/evaluation.md change

Low Severity

docs/evaluation.md now documents timeout_seconds as an --extra-env-kwargs option, but skills/evaluate-environments/SKILL.md (line 92) still only shows max_turns as the extra_env_kwargs example. Per project rules, changes to docs/evaluation.md require verifying and updating affected skill files.

^{Triggered by project rule: BugBot Instructions}

^{Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.}

snimu · 2026-04-17T11:56:09Z

+    async def timeout_reached(self, state: State) -> bool:
+        if self.timeout_seconds is None:
+            return False
+        if time.time() - state["timing"]["start_time"] <= self.timeout_seconds:


should use time.perf_counter() everywhere, not time.time(). The former is more accurate and robust against things like time-zone changes.

Add MultiTurnEnv timeout kwarg

12d9b08

xeophon force-pushed the codex/multiturn-timeout-kwargs branch from 7773606 to 12d9b08 Compare April 17, 2026 11:14

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

cursor bot reviewed Apr 17, 2026

View reviewed changes

snimu reviewed Apr 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add wall-clock timeout support to MultiTurnEnv#1166

Add wall-clock timeout support to MultiTurnEnv#1166
xeophon wants to merge 1 commit intomainfrom
codex/multiturn-timeout-kwargs

xeophon commented Apr 17, 2026 •

edited by cursor bot

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Apr 17, 2026

Uh oh!

cursor bot Apr 17, 2026

Uh oh!

snimu Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xeophon commented Apr 17, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Apr 17, 2026

Choose a reason for hiding this comment

Rollout timeout races with stop condition causing double cleanup

Uh oh!

cursor bot Apr 17, 2026

Choose a reason for hiding this comment

Skills file not updated after docs/evaluation.md change

Uh oh!

snimu Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xeophon commented Apr 17, 2026 •

edited by cursor bot

Loading