Skip to content

fix: bound codex provider executions#116

Open
corben-tech wants to merge 1 commit into
openclaw:mainfrom
corben-tech:fix/codex-provider-timeout
Open

fix: bound codex provider executions#116
corben-tech wants to merge 1 commit into
openclaw:mainfrom
corben-tech:fix/codex-provider-timeout

Conversation

@corben-tech
Copy link
Copy Markdown

@corben-tech corben-tech commented May 26, 2026

Summary

  • Add a 180 second default timeout to Codex provider executions.
  • Allow timeout overrides through CLAWPATCH_CODEX_TIMEOUT_MS and fallback CLAWPATCH_PROVIDER_TIMEOUT_MS.
  • Add regression coverage with a fake hanging codex executable.
  • Document the timeout in provider docs.

Behavior proof

Targeted proof command:

pnpm exec vitest run src/provider.test.ts -t "times out stalled Codex provider executions" --reporter=verbose

Observed output from the contributor branch:

src/provider.test.ts > Codex provider args > times out stalled Codex provider executions 563ms

Test Files  1 passed (1)
Tests       1 passed | 133 skipped (134)
Duration    789ms

What this proves: the test puts a fake codex executable first on PATH; the fake process never exits. With CLAWPATCH_CODEX_TIMEOUT_MS=50, the provider rejects with command timed out after 50ms instead of waiting indefinitely.

Compatibility note: the default is intentionally bounded at 180 seconds. Slow legitimate Codex runs can opt into a longer timeout with CLAWPATCH_CODEX_TIMEOUT_MS or the generic CLAWPATCH_PROVIDER_TIMEOUT_MS.

Verification

  • pnpm test src/provider.test.ts: 134 passed after changelog cleanup
  • pnpm format:check: passed after changelog cleanup
  • pnpm typecheck: passed
  • pnpm lint: passed
  • pnpm build: passed
  • pnpm test: passed, 717 passed and 1 skipped
  • pnpm pack:smoke: passed
  • codex review --uncommitted: no actionable bugs found

@corben-tech corben-tech requested a review from a team as a code owner May 26, 2026 21:59
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 26, 2026

Codex review: needs real behavior proof before merge. Reviewed May 26, 2026, 7:01 PM ET / 23:01 UTC.

Summary
The PR adds a 180-second Codex provider timeout with Codex-specific and generic environment-variable overrides, regression coverage for a hanging fake Codex executable, provider documentation, and a changelog deletion.

Reproducibility: yes. from source inspection: current main calls the Codex JSON provider without timeoutMs, while runCommandArgs only enforces timeouts when that option is supplied. I did not run a hanging real Codex process in this read-only review.

Review metrics: 3 noteworthy metrics.

  • Changed files: 4 files affected. The patch changes default provider runtime behavior, tests, docs, and a release-owned changelog file.
  • Codex timeout default: unbounded to 180 seconds. This is the compatibility-relevant behavior change for existing Codex users.
  • Release-owned file touched: 1 changelog deletion. Normal PRs should leave CHANGELOG.md release structure intact.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Restore the existing 0.4.1 - Unreleased changelog header.
  • Post redacted terminal output, logs, or a terminal screenshot from an actual Codex invocation showing the timeout path, or get a maintainer proof override.
  • Confirm maintainer acceptance of the 180-second default for upgraded users with slow Codex runs.

Proof guidance:
Needs real behavior proof before merge: The PR body includes targeted test output using a fake hanging Codex executable; contributor still needs redacted terminal output, logs, or a terminal screenshot from an actual Codex invocation and should update the PR body to trigger re-review.

Risk before merge

  • The new default changes Codex from unbounded execution to a 180-second fail-closed timeout; existing slow but legitimate Codex runs may start failing until operators set CLAWPATCH_CODEX_TIMEOUT_MS or CLAWPATCH_PROVIDER_TIMEOUT_MS.
  • The PR body’s proof is targeted Vitest output using a fake hanging codex executable, which validates the code path but does not show an actual stalled Codex CLI run in a real setup.

Maintainer options:

  1. Accept the bounded Codex default
    Maintainers can intentionally accept 180 seconds as the new Codex default because the PR documents Codex-specific and generic environment-variable overrides for slower environments.
  2. Require a compatibility default change
    Ask for the timeout to remain opt-in, be substantially higher, or have explicit upgrade messaging if existing long-running Codex workflows must not fail after upgrade.
  3. Hold for real Codex proof
    Keep the PR open but blocked until the contributor posts redacted terminal output, logs, or a terminal screenshot from an actual Codex invocation demonstrating the timeout path.

Next step before merge
Needs maintainer handling for real behavior proof and the 180-second compatibility policy; the remaining changelog defect is mechanical but not enough to make this a repair-lane candidate while proof is missing.

Security
Cleared: The diff touches provider timeout logic, tests, docs, and the changelog; it does not introduce new dependency, workflow, credential, package, or code-download surfaces.

Review findings

  • [P3] Leave the release-owned changelog header intact — CHANGELOG.md:3
Review details

Best possible solution:

Restore the changelog header, then merge a Codex timeout that matches the existing provider timeout pattern once maintainers explicitly accept the 180-second default or choose a different upgrade policy.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main calls the Codex JSON provider without timeoutMs, while runCommandArgs only enforces timeouts when that option is supplied. I did not run a hanging real Codex process in this read-only review.

Is this the best way to solve the issue?

Mostly, but not merge-ready as submitted: the provider change follows existing timeout-helper patterns, but the branch should restore the changelog header and maintainers need to accept or revise the 180-second upgrade behavior.

Full review comments:

  • [P3] Leave the release-owned changelog header intact — CHANGELOG.md:3
    This diff still deletes the current 0.4.1 - Unreleased header from CHANGELOG.md. Release notes are release-owned for normal PRs, so keep the timeout patch scoped to provider code, tests, and docs and restore the existing changelog entry.
    Confidence: 0.94

Overall correctness: patch is correct
Overall confidence: 0.82

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ed3d5750ff89.

Label changes

Label changes:

  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal provider reliability fix with limited but real compatibility impact.
  • merge-risk: 🚨 compatibility: Existing Codex runs that legitimately take longer than 180 seconds can fail after upgrade unless users configure a longer timeout.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body includes targeted test output using a fake hanging Codex executable; contributor still needs redacted terminal output, logs, or a terminal screenshot from an actual Codex invocation and should update the PR body to trigger re-review.
Evidence reviewed

What I checked:

  • Current Codex execution is unbounded: On current main, runCodexJson invokes runCommandArgs("codex", args, root, prompt) without a timeout option, so a stalled Codex child can leave map/review/fix/revalidate waiting indefinitely. (src/provider.ts:1545, ed3d5750ff89)
  • Shared timeout support already exists: runCommandArgs accepts timeoutMs, terminates timed-out commands, returns exit code 124, and appends command timed out after <ms>ms to stderr. (src/exec.ts:72, ed3d5750ff89)
  • Adjacent provider timeout pattern exists: Cursor, Claude, Pi, and ACPX already route provider executions through provider-specific timeout helpers with CLAWPATCH_PROVIDER_TIMEOUT_MS fallback, so the Codex helper follows an established local pattern. (src/provider.ts:741, ed3d5750ff89)
  • PR diff still touches release-owned changelog: The fetched PR diff deletes the existing 0.4.1 - Unreleased header from CHANGELOG.md even after the contributor follow-up said the changelog edit was cleaned up. (CHANGELOG.md:3, 05e24547c688)
  • Provider docs currently omit Codex timeout: Current docs/providers.md describes Codex invocation without a timeout line, while ACPX already documents a 180-second timeout with provider-specific and generic overrides. (docs/providers.md:28, ed3d5750ff89)
  • Relevant history owner signal: git blame ties the current Codex provider and shared exec timeout implementation to commit cdd58ac by Peter Steinberger, which is also tagged v0.4.0. (src/provider.ts:1521, cdd58ac59213)

Likely related people:

  • Peter Steinberger: git blame on src/provider.ts and src/exec.ts ties the current Codex provider, runCodexJson, and runCommandArgs timeout machinery to cdd58ac. (role: introduced current provider and exec surface; confidence: medium; commits: cdd58ac59213; files: src/provider.ts, src/exec.ts, src/provider.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels May 26, 2026
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 26, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@corben-tech
Copy link
Copy Markdown
Author

Local verification from the contributor branch is complete:

  • pnpm test src/provider.test.ts passed: 134 tests
  • pnpm typecheck passed
  • pnpm lint passed
  • pnpm build passed
  • pnpm test passed: 717 passed, 1 skipped
  • pnpm pack:smoke passed
  • codex review --uncommitted found no actionable bugs

The upstream CI and CodeQL runs are currently blocked in action_required because this is a fork PR. I attempted to approve them from corben-tech, but GitHub reports that account has only read access to openclaw/clawpatch, so a repo admin/maintainer will need to approve the workflow runs.

@corben-tech corben-tech force-pushed the fix/codex-provider-timeout branch from e17bd4a to 05e2454 Compare May 26, 2026 22:56
@corben-tech
Copy link
Copy Markdown
Author

Updated the PR body with real timeout behavior proof and removed the release-owned changelog edit in commit 05e2454.

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 26, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P2 Normal priority bug or improvement with limited blast radius. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant