Skip to content

fix(benchmarks): make all comparator lanes cross-platform on Windows (#97)#97

Merged
PatrickSys merged 2 commits intomasterfrom
fix/benchmark-comparator-lanes-cross-platform
Apr 13, 2026
Merged

fix(benchmarks): make all comparator lanes cross-platform on Windows (#97)#97
PatrickSys merged 2 commits intomasterfrom
fix/benchmark-comparator-lanes-cross-platform

Conversation

@PatrickSys
Copy link
Copy Markdown
Owner

Summary

Fixes all five comparator adapter lanes in scripts/benchmark-comparators.mjs that were setup_failed on Windows 11 due to Unix-only shell constructs. Satisfies EVAL-02 (real benchmark data from at least 2 lanes).

Root causes fixed:

  • 2>/dev/null shell redirects (not valid on cmd.exe/batch) — replaced with stdio: 'pipe'
  • which (Unix-only) — removed; npx and binary commands used directly
  • Hardcoded python3 (doesn't exist on Windows) — replaced with pythonCmd platform constant (python on Win32, python3 on Unix)
  • Shell-based execAsync with manual quote escaping in raw Claude Code — replaced with execFileAsync (no shell, no quoting issues) + --output-format json

Per-lane changes:

Lane Fix
raw Claude Code execFileAsync + --output-format json, 120s timeout, hard setup_failed on missing CLI
codebase-memory-mcp npx-only detection, initTimeout 5s→10s
jCodeMunch pythonCmd everywhere, python -m pip install, initTimeout 8s→15s
CodeGraphContext pythonCmd consistency fix (still needs Neo4j — deferred)
GrepAI grepai --version instead of which grepai (still needs Go+Ollama — deferred)

Test plan

  • tests/benchmark-comparators.test.ts — 2/2 lifecycle tests pass
  • Dry-run: all 5 adapters listed cleanly (python shown for jCodeMunch/CodeGraphContext on Windows)
  • Structural: 0 2>/dev/null, 0 which (code), 0 serverCommand: 'python3', 0 pending_evidence in production paths
  • Smoke test: raw Claude Code lane → status: ok, 12 tasks, 21% usefulness (EVAL-02 minimum met)
  • Full suite: 449/449 passing (pre-push hook)

All five comparator adapters in scripts/benchmark-comparators.mjs were
setup_failed on Windows 11 due to Unix-only shell constructs. This fixes
the root causes per-lane so EVAL-02 (real benchmark data) is achievable.

Changes by lane:
- raw Claude Code: drop `2>/dev/null` from checkInstalled, switch
  runRawClaudeCode() from execAsync (shell, brittle quoting) to
  execFileAsync (no shell), add `--output-format json`, raise timeout
  60s→120s, change pending_evidence fallback to hard setup_failed
- codebase-memory-mcp: replace `which ... 2>/dev/null || npx` with
  npx-only check (cross-platform), raise initTimeout 5s→10s
- jCodeMunch: replace hardcoded python3 with pythonCmd (python on
  Windows), use `python -m pip install`, raise initTimeout 8s→15s
- CodeGraphContext: same pythonCmd consistency fix as jCodeMunch
- GrepAI: replace `which grepai 2>/dev/null` with `grepai --version`

Adds execFile import + execFileAsync, adds pythonCmd platform constant.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR fixes five comparator adapter lanes in scripts/benchmark-comparators.mjs to be cross-platform on Windows by replacing 2>/dev/null redirects (stdio: 'pipe'), removing which calls, adding a pythonCmd constant for Python 2/3 dispatch, and switching the raw Claude Code baseline from a shell-interpolated execAsync to execFileAsync with structured args.

  • P1 — execFileAsync on Windows: child_process.execFile does not use a shell, so it cannot resolve npm's .cmd wrappers (the standard global-install artifact for Node.js CLIs on Windows). checkInstalled() succeeds via execSync (which does use cmd.exe internally), but runRawClaudeCode then throws ENOENT through execFileAsync, causing the baseline lane to return setup_failed on Windows — the exact platform this PR targets. Adding shell: process.platform === 'win32' to the execFileAsync options resolves this without reintroducing quoting issues.

Confidence Score: 4/5

Safe to merge once the execFileAsync shell-resolution issue is confirmed or fixed; one P1 may leave the baseline lane broken on the targeted Windows platform.

All five lanes receive meaningful cross-platform improvements and the jCodeMunch/CodeGraphContext/codebase-memory-mcp fixes are correct. The single P1 concern — execFileAsync not resolving .cmd wrappers on Windows — directly contradicts the PR's stated goal for the baseline lane. If Claude Code ships a native .exe on Windows the concern evaporates, but that should be confirmed. Dead exec/execAsync code is minor P2.

scripts/benchmark-comparators.mjs — specifically the execFileAsync call in runRawClaudeCode (lines 413-417) and the leftover exec/execAsync imports (lines 17, 22).

Important Files Changed

Filename Overview
scripts/benchmark-comparators.mjs Five cross-platform fixes (pythonCmd, npx-only detection, remove 2>/dev/null, execFileAsync); one P1 concern — execFileAsync cannot resolve npm's .cmd wrappers on Windows without shell:true, potentially leaving the baseline lane as setup_failed on the targeted platform.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runComparator adapter] --> B{adapter.name == 'raw Claude Code'?}
    B -- yes --> C[runRawClaudeCode]
    B -- no --> D[runComparatorViaMcp]

    C --> E{checkInstalled
execSync 'claude --version'
uses shell on Windows}
    E -- false --> F[install throws
setup_failed]
    E -- true --> G[execFileAsync 'claude'
NO shell - cannot find .cmd]

    G -- success --> H[JSON.parse stdout
parsed.result ?? stdout]
    G -- ENOENT --> I[throw 'claude CLI not found']
    I --> J[setup_failed on Windows
despite CLI being installed]

    H --> K[matchSignals / aggregateResults]

    D --> L{MCP server adapters
npx / pythonCmd / grepai}
    L --> M[StdioClientTransport spawn]
    M --> N[listTools / callTool / extractPayload]
    N --> K
Loading

Reviews (1): Last reviewed commit: "fix(benchmarks): make all comparator lan..." | Re-trigger Greptile

Comment on lines +413 to 417
const { stdout } = await execFileAsync(
'claude',
['-p', prompt, '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'],
{ timeout: 120000, cwd: path.resolve(rootPath) }
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 execFileAsync won't resolve .cmd wrappers on Windows

child_process.execFile (and its promisified form) does not use a shell, so it cannot locate .cmd files through PATHEXT resolution. When @anthropic-ai/claude-code is installed globally via npm on Windows, npm creates claude.cmd — not claude.exe. Calling execFileAsync('claude', ...) without shell: true will throw ENOENT on Windows, even when the CLI is correctly installed and execSync('claude --version') (which does use cmd.exe internally) returns cleanly in checkInstalled(). The result would be setup_failed for the baseline lane on the very platform this PR targets.

Use shell: process.platform === 'win32' to get shell-based .cmd resolution on Windows while still avoiding quoting issues (the array form of args is safe with a shell):

const { stdout } = await execFileAsync(
  'claude',
  ['-p', prompt, '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'],
  { timeout: 120000, cwd: path.resolve(rootPath), shell: process.platform === 'win32' }
);

Or, if Claude Code ships a native .exe on Windows and this never arose during testing, the concern doesn't apply — but it's worth confirming before closing the Windows ticket.

Comment on lines 17 to 22
import { execSync, exec, execFile } from 'child_process';
import { parseArgs } from 'util';
import { promisify } from 'util';
import { withManagedStdioClientSession } from './lib/managed-mcp-session.mjs';

const execAsync = promisify(exec);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead code left over from the execAsyncexecFileAsync migration

exec is imported and immediately promisified into execAsync, but execAsync is no longer called anywhere in the file after the runRawClaudeCode rewrite. Both the import and the promisify line can be removed.

Suggested change
import { execSync, exec, execFile } from 'child_process';
import { parseArgs } from 'util';
import { promisify } from 'util';
import { withManagedStdioClientSession } from './lib/managed-mcp-session.mjs';
const execAsync = promisify(exec);
import { execSync, execFile } from 'child_process';

And remove line 22 (const execAsync = promisify(exec);) entirely.

…import

On Windows, execFile does not use a shell and cannot resolve npm's .cmd
wrappers (e.g. claude.cmd). checkInstalled() succeeded via execSync (which
goes through cmd.exe) but runRawClaudeCode threw ENOENT, returning
setup_failed on the very platform the previous commit targeted.

Add shell: process.platform === 'win32' to the execFileAsync call so
cmd.exe is used on Windows (resolves .cmd) while POSIX keeps shell: false
(no injection risk, args are already an array).

Also removes the dead exec / execAsync imports left over from the
shell-interpolated execAsync refactor.
@PatrickSys
Copy link
Copy Markdown
Owner Author

Good catch on both counts, addressed in a1e9d42.

P1 — execFileAsync .cmd resolution: Added shell: process.platform === 'win32' to the execFileAsync options in runRawClaudeCode. On Windows this routes through cmd.exe so claude.cmd is resolved; on POSIX shell stays false so we keep the args-array safety without shell interpolation. Confirmed this is the right split since checkInstalled() goes through execSync (always shell-aware) — both paths now agree.

P2 — dead imports: Dropped exec from the child_process import and removed the unused execAsync = promisify(exec) binding.

449/449 tests green on push.

@PatrickSys PatrickSys merged commit 6c19628 into master Apr 13, 2026
2 of 3 checks passed
@PatrickSys PatrickSys deleted the fix/benchmark-comparator-lanes-cross-platform branch April 13, 2026 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant