Skip to content

Timeout wedged Codex exec calls#115

Open
camwest wants to merge 2 commits into
openclaw:mainfrom
camwest:codex/codex-provider-timeout
Open

Timeout wedged Codex exec calls#115
camwest wants to merge 2 commits into
openclaw:mainfrom
camwest:codex/codex-provider-timeout

Conversation

@camwest
Copy link
Copy Markdown
Contributor

@camwest camwest commented May 26, 2026

[problem]

  • Codex exec can wedge indefinitely.
  • One stuck child blocks the run.
  • No report is emitted.

[solution]

Wire Codex through existing provider timeout handling. Default mirrors Cursor: 300s; override with CLAWPATCH_CODEX_TIMEOUT_MS or CLAWPATCH_PROVIDER_TIMEOUT_MS.

[proof]

  • Shimmed wedged Codex exec timed out at 50ms.
  • Regression asserts feature error state and released locks.
  • Checks: typecheck, lint, format, build, focused Vitest.

@camwest camwest requested a review from a team as a code owner May 26, 2026 15:11
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 26, 2026

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 9:42 AM ET / 13:42 UTC.

Summary
The PR adds a Codex provider timeout using existing subprocess timeout handling, adds Codex-specific and generic timeout overrides, documents the timeout, and covers the timeout path in provider/workflow tests.

Reproducibility: yes. from source: current main invokes Codex exec without timeoutMs, while runCommandArgs only enforces termination when timeoutMs is set. The PR comment also supplies terminal output from a shimmed wedged Codex run timing out after 50ms.

Review metrics: 2 noteworthy metrics.

  • Provider default changed: unlimited Codex exec runtime -> 300000ms default. This is the compatibility decision maintainers should notice before merge.
  • Diff size: 4 files changed; +113/-1. The patch is focused on provider timeout wiring, docs, and regression coverage.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Maintainers should explicitly accept the 300s Codex default or request a different compatibility contract before merge.

Risk before merge

  • [P1] Merging changes Codex map, review, fix, and revalidate calls from no clawpatch-imposed runtime limit to a 300000ms default, so legitimate long-running Codex jobs could start failing unless maintainers accept the documented override path.

Maintainer options:

  1. Accept the documented 300s default (recommended)
    Merge as-is if maintainers want wedged default-provider runs to fail after 300000ms and consider the environment overrides sufficient for longer Codex runs.
  2. Change the compatibility posture
    Request a longer default, explicit opt-out, or strict-mode-style behavior before merge, then update the docs and timeout tests to match that contract.

Next step before merge

  • [P2] Human maintainer judgment is needed for the 300s default compatibility tradeoff; there is no narrow automation repair to request.

Security
Cleared: The diff routes an existing local Codex subprocess through existing timeout handling and adds tests/docs; no dependency, workflow, secret, or supply-chain change was found.

Review details

Best possible solution:

Land the focused timeout once maintainers accept the 300s default as the intended Codex provider contract, keeping the documented provider-level and Codex-specific overrides.

Do we have a high-confidence way to reproduce the issue?

Yes from source: current main invokes Codex exec without timeoutMs, while runCommandArgs only enforces termination when timeoutMs is set. The PR comment also supplies terminal output from a shimmed wedged Codex run timing out after 50ms.

Is this the best way to solve the issue?

Mostly yes: reusing runCommandArgs timeout handling is the narrow implementation and the PR adds docs plus regression coverage. The remaining decision is whether the hard 300s Codex default is the right compatibility contract.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ed3d5750ff89.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The latest PR comment provides copied terminal output from a shimmed wedged Codex exec timing out after the fix, plus the focused regression result and lock-state assertion.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The latest PR comment provides copied terminal output from a shimmed wedged Codex exec timing out after the fix, plus the focused regression result and lock-state assertion.
  • remove status: 📣 needs proof: Current PR status label is status: 👀 ready for maintainer look.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

  • P1: A wedged default Codex provider call can block a real clawpatch run and prevent a report from being emitted.
  • merge-risk: 🚨 compatibility: The PR introduces a hard Codex exec timeout where existing slow runs previously had no clawpatch-imposed limit.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The latest PR comment provides copied terminal output from a shimmed wedged Codex exec timing out after the fix, plus the focused regression result and lock-state assertion.
  • proof: sufficient: Contributor real behavior proof is sufficient. The latest PR comment provides copied terminal output from a shimmed wedged Codex exec timing out after the fix, plus the focused regression result and lock-state assertion.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read in full and applied for the TypeScript CLI, provider/workflow test, docs, and generated-output review expectations. (AGENTS.md:1, ed3d5750ff89)
  • Current main has no Codex exec timeout: Current main calls runCommandArgs("codex", ...) without an options object in runCodexJson, so Codex map/review/fix/revalidate calls do not use timeoutMs today. (src/provider.ts:1545, ed3d5750ff89)
  • Existing timeout machinery is available: runCommandArgs already accepts timeoutMs, terminates timed children, returns exit 124, and appends a timeout message when timeoutMs is set. (src/exec.ts:72, ed3d5750ff89)
  • PR wires Codex into timeout handling: The PR adds CODEX_DEFAULT_TIMEOUT_MS, codexTimeoutMs(), and passes timeoutMs: codexTimeoutMs() into the Codex runCommandArgs call. (src/provider.ts:1547, d4ff0dbc568f)
  • PR adds regression coverage: The workflow regression installs a wedged codex shim, sets CLAWPATCH_CODEX_TIMEOUT_MS=50, expects the review to time out, and asserts the feature is error with no remaining lock files. (src/workflow.test.ts:674, d4ff0dbc568f)
  • PR documents the new contract: The provider docs now state the Codex 300 second default and the CLAWPATCH_CODEX_TIMEOUT_MS / CLAWPATCH_PROVIDER_TIMEOUT_MS overrides. (docs/providers.md:35, d4ff0dbc568f)

Likely related people:

  • Peter Steinberger: git blame points the current Codex provider, exec timeout implementation, and review lock/error handling to the v0.4.0 release commit, and recent main history also touches this area. (role: introduced behavior and recent area contributor; confidence: high; commits: cdd58ac59213, ed3d5750ff89, ac2724f; files: src/provider.ts, src/exec.ts, src/app.ts)
  • Humberto Farias: git log shows prior Codex exec argument handling work in provider/exec-adjacent code, which is relevant to subprocess invocation behavior. (role: adjacent Codex subprocess contributor; confidence: medium; commits: 59ee4e1; files: src/provider.ts, src/exec.ts)
  • Samuel Rodda: git log shows prior work on Codex provider sandbox override behavior near the changed Codex provider surface. (role: adjacent Codex provider contributor; confidence: medium; commits: 09ce939; files: src/provider.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 Urgent regression or broken agent/channel workflow affecting real users now. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels May 26, 2026
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 26, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@camwest
Copy link
Copy Markdown
Contributor Author

camwest commented May 28, 2026

Added docs in d4ff0db to make the timeout contract explicit: Codex now mirrors Cursor's 300s provider timeout default, with CLAWPATCH_CODEX_TIMEOUT_MS and CLAWPATCH_PROVIDER_TIMEOUT_MS overrides.

Behavior proof from a shimmed wedged codex exec:

clawpatch review feature-error index=1 total=1 feature=feat_cli-command_4fd29cd328 elapsed=1s error=codex provider failed: command timed out after 50ms
clawpatch review failed run=20260528T133615-e7b02e errors=1

Test Files  1 passed (1)
Tests  1 passed | 110 skipped (111)

The regression also asserts the feature is marked error and .clawpatch/locks is empty after timeout.

Checks run:

pnpm format:check
pnpm typecheck
pnpm lint
pnpm build
pnpm vitest run src/provider.test.ts src/workflow.test.ts
pnpm vitest run src/workflow.test.ts -t "times out wedged codex exec review children"

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 28, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P1 Urgent regression or broken agent/channel workflow affecting real users now. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant