Skip to content

Add wall-clock timeout support to MultiTurnEnv#1166

Open
xeophon wants to merge 1 commit intomainfrom
codex/multiturn-timeout-kwargs
Open

Add wall-clock timeout support to MultiTurnEnv#1166
xeophon wants to merge 1 commit intomainfrom
codex/multiturn-timeout-kwargs

Conversation

@xeophon
Copy link
Copy Markdown
Member

@xeophon xeophon commented Apr 17, 2026

Summary

  • add a nullable timeout_seconds kwarg to MultiTurnEnv and stop timed out rollouts with timeout_reached
  • keep timeout configuration on the existing kwargs path (extra_env_kwargs) instead of adding a first-party eval flag
  • add focused tests and docs updates, plus a tiny CliAgentEnv typing fix needed for the repo's ty pre-push check

Testing

  • uv run pytest tests/test_multiturn_env.py tests/test_eval_cli.py
  • uv run pre-commit run --all-files
  • ad hoc smoke test: ToolEnv with a 10s sleeping tool and timeout_seconds=5 stopped in ~5s with stop_condition=timeout_reached

Note

Medium Risk
Changes the core MultiTurnEnv.rollout execution loop to support wall-clock cancellation, which could affect rollout completion/cleanup behavior and stop-condition reporting across all multi-turn environments.

Overview
Adds a nullable timeout_seconds parameter to MultiTurnEnv and a new built-in stop condition timeout_reached, allowing multi-turn rollouts to terminate based on wall-clock time.

Updates MultiTurnEnv.rollout to enforce the timeout via task cancellation and to mark timed-out states as completed/truncated with stop_condition="timeout_reached", ensuring timing, cleanup, and completion rendering still run.

Extends eval configuration/tests to pass timeout_seconds through extra_env_kwargs (CLI + TOML), adjusts CliAgentEnv sandbox resource calculation when the timeout is unset, and updates docs/reference to document the new timeout behavior and configuration.

Reviewed by Cursor Bugbot for commit 12d9b08. Bugbot is set up for automated code reviews on this repo. Configure here.

@xeophon xeophon force-pushed the codex/multiturn-timeout-kwargs branch from 7773606 to 12d9b08 Compare April 17, 2026 11:14
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 777360656e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +205 to +209
done, _ = await asyncio.wait({rollout_task}, timeout=self.timeout_seconds)
if rollout_task in done:
return await rollout_task

rollout_task.cancel()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-check rollout task completion before forcing timeout

When asyncio.wait(..., timeout=...) returns due to timeout, the rollout task can still complete naturally in the small window before rollout_task.cancel() runs. This branch unconditionally marks the rollout as timed out/truncated, so near-deadline successful rollouts can be mislabeled and have cleanup run again with timeout metadata. Add a rollout_task.done() check immediately before cancellation/timeout state mutation so completed work is returned as-is.

Useful? React with 👍 / 👎.

Comment on lines +213 to +216
state["timed_out"] = True
state["is_completed"] = True
state["is_truncated"] = True
state["stop_condition"] = "timeout_reached"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve timeout hook side effects in forced-timeout path

This forced-timeout path writes timed_out/stop_condition directly instead of executing stop-condition logic, so subclass timeout side effects are skipped. In ApiEnv, timeout_reached sets agent_timed_out and ApiEnvMonitorRubric reads that field, so rollouts that time out via this branch can be reported as not timed out in metrics/logs. Route timeout finalization through the same timeout hook (or mirror its side effects) to keep subclass state consistent.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.

await self._render_timing(state)
await self._cleanup(state)
await self.render_completion(state)
return state
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollout timeout races with stop condition causing double cleanup

Medium Severity

CliAgentEnv always sets self.timeout_seconds = 3600.0, so the new rollout method enters the asyncio.wait timeout path. Both timeout_reached (checking start_time) and asyncio.wait fire at nearly the same wall-clock moment (offset by microseconds). When timeout_reached triggers, is_completed calls _cleanup (which includes destroy_sandbox — a network call). Microseconds later asyncio.wait returns, the task is cancelled mid-cleanup, then the outer code calls _cleanup again — running handlers like destroy_sandbox a second time on a partially-cleaned state.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.

Comment thread docs/evaluation.md

```bash
prime eval run my-env -x '{"max_turns": 20}'
prime eval run my-env -x '{"max_turns": 20, "timeout_seconds": 600}'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills file not updated after docs/evaluation.md change

Low Severity

docs/evaluation.md now documents timeout_seconds as an --extra-env-kwargs option, but skills/evaluate-environments/SKILL.md (line 92) still only shows max_turns as the extra_env_kwargs example. Per project rules, changes to docs/evaluation.md require verifying and updating affected skill files.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

Reviewed by Cursor Bugbot for commit 12d9b08. Configure here.

async def timeout_reached(self, state: State) -> bool:
if self.timeout_seconds is None:
return False
if time.time() - state["timing"]["start_time"] <= self.timeout_seconds:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use time.perf_counter() everywhere, not time.time(). The former is more accurate and robust against things like time-zone changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants