fix(benchmarks): make all comparator lanes cross-platform on Windows (#97) by PatrickSys · Pull Request #97 · PatrickSys/codebase-context

PatrickSys · 2026-04-12T19:50:15Z

Summary

Fixes all five comparator adapter lanes in scripts/benchmark-comparators.mjs that were setup_failed on Windows 11 due to Unix-only shell constructs. Satisfies EVAL-02 (real benchmark data from at least 2 lanes).

Root causes fixed:

2>/dev/null shell redirects (not valid on cmd.exe/batch) — replaced with stdio: 'pipe'
which (Unix-only) — removed; npx and binary commands used directly
Hardcoded python3 (doesn't exist on Windows) — replaced with pythonCmd platform constant (python on Win32, python3 on Unix)
Shell-based execAsync with manual quote escaping in raw Claude Code — replaced with execFileAsync (no shell, no quoting issues) + --output-format json

Per-lane changes:

Lane	Fix
`raw Claude Code`	`execFileAsync` + `--output-format json`, 120s timeout, hard `setup_failed` on missing CLI
`codebase-memory-mcp`	npx-only detection, `initTimeout` 5s→10s
`jCodeMunch`	`pythonCmd` everywhere, `python -m pip install`, `initTimeout` 8s→15s
`CodeGraphContext`	`pythonCmd` consistency fix (still needs Neo4j — deferred)
`GrepAI`	`grepai --version` instead of `which grepai` (still needs Go+Ollama — deferred)

Test plan

tests/benchmark-comparators.test.ts — 2/2 lifecycle tests pass
Dry-run: all 5 adapters listed cleanly (python shown for jCodeMunch/CodeGraphContext on Windows)
Structural: 0 2>/dev/null, 0 which (code), 0 serverCommand: 'python3', 0 pending_evidence in production paths
Smoke test: raw Claude Code lane → status: ok, 12 tasks, 21% usefulness (EVAL-02 minimum met)
Full suite: 449/449 passing (pre-push hook)

All five comparator adapters in scripts/benchmark-comparators.mjs were setup_failed on Windows 11 due to Unix-only shell constructs. This fixes the root causes per-lane so EVAL-02 (real benchmark data) is achievable. Changes by lane: - raw Claude Code: drop `2>/dev/null` from checkInstalled, switch runRawClaudeCode() from execAsync (shell, brittle quoting) to execFileAsync (no shell), add `--output-format json`, raise timeout 60s→120s, change pending_evidence fallback to hard setup_failed - codebase-memory-mcp: replace `which ... 2>/dev/null || npx` with npx-only check (cross-platform), raise initTimeout 5s→10s - jCodeMunch: replace hardcoded python3 with pythonCmd (python on Windows), use `python -m pip install`, raise initTimeout 8s→15s - CodeGraphContext: same pythonCmd consistency fix as jCodeMunch - GrepAI: replace `which grepai 2>/dev/null` with `grepai --version` Adds execFile import + execFileAsync, adds pythonCmd platform constant.

greptile-apps · 2026-04-12T19:54:45Z

Greptile Summary

This PR fixes five comparator adapter lanes in scripts/benchmark-comparators.mjs to be cross-platform on Windows by replacing 2>/dev/null redirects (stdio: 'pipe'), removing which calls, adding a pythonCmd constant for Python 2/3 dispatch, and switching the raw Claude Code baseline from a shell-interpolated execAsync to execFileAsync with structured args.

P1 — execFileAsync on Windows: child_process.execFile does not use a shell, so it cannot resolve npm's .cmd wrappers (the standard global-install artifact for Node.js CLIs on Windows). checkInstalled() succeeds via execSync (which does use cmd.exe internally), but runRawClaudeCode then throws ENOENT through execFileAsync, causing the baseline lane to return setup_failed on Windows — the exact platform this PR targets. Adding shell: process.platform === 'win32' to the execFileAsync options resolves this without reintroducing quoting issues.

Confidence Score: 4/5

Safe to merge once the execFileAsync shell-resolution issue is confirmed or fixed; one P1 may leave the baseline lane broken on the targeted Windows platform.

All five lanes receive meaningful cross-platform improvements and the jCodeMunch/CodeGraphContext/codebase-memory-mcp fixes are correct. The single P1 concern — execFileAsync not resolving .cmd wrappers on Windows — directly contradicts the PR's stated goal for the baseline lane. If Claude Code ships a native .exe on Windows the concern evaporates, but that should be confirmed. Dead exec/execAsync code is minor P2.

scripts/benchmark-comparators.mjs — specifically the execFileAsync call in runRawClaudeCode (lines 413-417) and the leftover exec/execAsync imports (lines 17, 22).

Important Files Changed

Filename	Overview
scripts/benchmark-comparators.mjs	Five cross-platform fixes (pythonCmd, npx-only detection, remove 2>/dev/null, execFileAsync); one P1 concern — execFileAsync cannot resolve npm's .cmd wrappers on Windows without shell:true, potentially leaving the baseline lane as setup_failed on the targeted platform.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runComparator adapter] --> B{adapter.name == 'raw Claude Code'?}
    B -- yes --> C[runRawClaudeCode]
    B -- no --> D[runComparatorViaMcp]

    C --> E{checkInstalled
execSync 'claude --version'
uses shell on Windows}
    E -- false --> F[install throws
setup_failed]
    E -- true --> G[execFileAsync 'claude'
NO shell - cannot find .cmd]

    G -- success --> H[JSON.parse stdout
parsed.result ?? stdout]
    G -- ENOENT --> I[throw 'claude CLI not found']
    I --> J[setup_failed on Windows
despite CLI being installed]

    H --> K[matchSignals / aggregateResults]

    D --> L{MCP server adapters
npx / pythonCmd / grepai}
    L --> M[StdioClientTransport spawn]
    M --> N[listTools / callTool / extractPayload]
    N --> K

_{Reviews (1): Last reviewed commit: "fix(benchmarks): make all comparator lan..." | Re-trigger Greptile}

greptile-apps · 2026-04-12T19:54:48Z

scripts/benchmark-comparators.mjs

+      const { stdout } = await execFileAsync(
+        'claude',
+        ['-p', prompt, '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'],
+        { timeout: 120000, cwd: path.resolve(rootPath) }
      );


execFileAsync won't resolve .cmd wrappers on Windows

child_process.execFile (and its promisified form) does not use a shell, so it cannot locate .cmd files through PATHEXT resolution. When @anthropic-ai/claude-code is installed globally via npm on Windows, npm creates claude.cmd — not claude.exe. Calling execFileAsync('claude', ...) without shell: true will throw ENOENT on Windows, even when the CLI is correctly installed and execSync('claude --version') (which does use cmd.exe internally) returns cleanly in checkInstalled(). The result would be setup_failed for the baseline lane on the very platform this PR targets.

Use shell: process.platform === 'win32' to get shell-based .cmd resolution on Windows while still avoiding quoting issues (the array form of args is safe with a shell):

const { stdout } = await execFileAsync( 'claude', ['-p', prompt, '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'], { timeout: 120000, cwd: path.resolve(rootPath), shell: process.platform === 'win32' } );

Or, if Claude Code ships a native .exe on Windows and this never arose during testing, the concern doesn't apply — but it's worth confirming before closing the Windows ticket.

greptile-apps · 2026-04-12T19:54:49Z

scripts/benchmark-comparators.mjs

+import { execSync, exec, execFile } from 'child_process';
 import { parseArgs } from 'util';
 import { promisify } from 'util';
 import { withManagedStdioClientSession } from './lib/managed-mcp-session.mjs';

 const execAsync = promisify(exec);


Dead code left over from the execAsync → execFileAsync migration

exec is imported and immediately promisified into execAsync, but execAsync is no longer called anywhere in the file after the runRawClaudeCode rewrite. Both the import and the promisify line can be removed.

Suggested change

import { execSync, exec, execFile } from 'child_process';

import { parseArgs } from 'util';

import { promisify } from 'util';

import { withManagedStdioClientSession } from './lib/managed-mcp-session.mjs';

const execAsync = promisify(exec);

import { execSync, execFile } from 'child_process';

And remove line 22 (const execAsync = promisify(exec);) entirely.

…import On Windows, execFile does not use a shell and cannot resolve npm's .cmd wrappers (e.g. claude.cmd). checkInstalled() succeeded via execSync (which goes through cmd.exe) but runRawClaudeCode threw ENOENT, returning setup_failed on the very platform the previous commit targeted. Add shell: process.platform === 'win32' to the execFileAsync call so cmd.exe is used on Windows (resolves .cmd) while POSIX keeps shell: false (no injection risk, args are already an array). Also removes the dead exec / execAsync imports left over from the shell-interpolated execAsync refactor.

PatrickSys · 2026-04-13T07:20:07Z

Good catch on both counts, addressed in a1e9d42.

P1 — execFileAsync .cmd resolution: Added shell: process.platform === 'win32' to the execFileAsync options in runRawClaudeCode. On Windows this routes through cmd.exe so claude.cmd is resolved; on POSIX shell stays false so we keep the args-array safety without shell interpolation. Confirmed this is the right split since checkInstalled() goes through execSync (always shell-aware) — both paths now agree.

P2 — dead imports: Dropped exec from the child_process import and removed the unused execAsync = promisify(exec) binding.

449/449 tests green on push.

greptile-apps bot reviewed Apr 12, 2026

View reviewed changes

PatrickSys merged commit 6c19628 into master Apr 13, 2026
2 of 3 checks passed

PatrickSys deleted the fix/benchmark-comparator-lanes-cross-platform branch April 13, 2026 07:21

github-actions bot mentioned this pull request Apr 13, 2026

chore(master): release 1.10.0 #79

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(benchmarks): make all comparator lanes cross-platform on Windows (#97)#97

fix(benchmarks): make all comparator lanes cross-platform on Windows (#97)#97
PatrickSys merged 2 commits intomasterfrom
fix/benchmark-comparator-lanes-cross-platform

PatrickSys commented Apr 12, 2026

Uh oh!

greptile-apps bot commented Apr 12, 2026

Important Files Changed

Uh oh!

greptile-apps bot Apr 12, 2026

Uh oh!

greptile-apps bot Apr 12, 2026

Uh oh!

PatrickSys commented Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PatrickSys commented Apr 12, 2026

Summary

Test plan

Uh oh!

greptile-apps bot commented Apr 12, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

PatrickSys commented Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant