A dashboard for benchmarking CLI coding agents. Run multiple agents on the same task, watch their terminals live, and compare their outputs using a reviewer.
- Run `amp`, `opencode`, `claude`, `codex`, `pi`, `droid` in parallel
- WebSocket-driven PTY streaming for live terminal output
- Explicit global stop path via `POST /stop` with shutdown ladder (`Ctrl-C`, `Ctrl-C`, `SIGTERM`, `SIGKILL`)
- Dark, monospace-first UI with `ghostty-web` terminals (optionally `xtermjs`, `restty` alternatives)
- Per-agent Git worktree setup with cleanup control
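
The shutdown ladder named in the list above (Ctrl-C, Ctrl-C, SIGTERM, SIGKILL) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `AgentProcess` interface, the `stopAgent` function, and the 2-second grace period are all assumptions made for the example.

```typescript
// Hypothetical sketch: none of these names come from the hbench source.
type Step =
  | { kind: "input"; data: string } // bytes written to the agent's PTY
  | { kind: "signal"; name: "SIGTERM" | "SIGKILL" }; // OS signal sent to the process

// The ladder from the feature list: Ctrl-C, Ctrl-C, SIGTERM, SIGKILL.
// "\x03" is the ETX byte a terminal sends when Ctrl-C is pressed.
const SHUTDOWN_LADDER: Step[] = [
  { kind: "input", data: "\x03" },
  { kind: "input", data: "\x03" },
  { kind: "signal", name: "SIGTERM" },
  { kind: "signal", name: "SIGKILL" },
];

// Assumed minimal view of a running agent; a real PTY wrapper will differ.
interface AgentProcess {
  write(data: string): void;
  kill(signal: "SIGTERM" | "SIGKILL"): void;
  isAlive(): boolean;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Applies each rung in order, waits graceMs after it, and stops escalating
// as soon as the process has exited. Returns the number of rungs applied.
async function stopAgent(proc: AgentProcess, graceMs = 2000): Promise<number> {
  let applied = 0;
  for (const step of SHUTDOWN_LADDER) {
    if (!proc.isAlive()) break;
    if (step.kind === "input") {
      proc.write(step.data);
    } else {
      proc.kill(step.name);
    }
    applied++;
    await sleep(graceMs);
  }
  return applied;
}
```

Escalating only while the process is still alive gives an agent the chance to exit cleanly on an interrupt before a harder signal is used.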
- Git
- `OPENROUTER_API_KEY` (set it as an environment variable or provide it in the UI)
- Bun: `curl -fsSL https://bun.sh/install | bash`
- Amp: `curl -fsSL https://ampcode.com/install.sh | bash`
- Droid: `curl -fsSL https://app.factory.ai/cli | sh`
- OpenCode: `curl -fsSL https://opencode.ai/install | bash`
- Codex: `bun i -g @openai/codex`
- Pi: `bun i -g @mariozechner/pi-coding-agent`
- Claude Code: `curl -fsSL https://claude.ai/install.sh | bash`
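
Before a first run it can help to confirm that the tools listed above are actually reachable. The sketch below is one possible preflight check, not something the package is known to ship: `missingRequirements` and the injected `hasBinary` lookup are hypothetical names, and the binary names are assumed to match the agent names used earlier in this README.

```typescript
// Binary names assumed from the agent list in this README, plus git and bun.
const REQUIRED_BINARIES = [
  "git", "bun", "amp", "opencode", "claude", "codex", "pi", "droid",
];

// Returns the names of everything that appears to be missing.
// `hasBinary` is supplied by the caller so the check stays runtime-agnostic.
function missingRequirements(
  env: Record<string, string | undefined>,
  hasBinary: (name: string) => boolean,
): string[] {
  const missing = REQUIRED_BINARIES.filter((name) => !hasBinary(name));
  // The key can also be entered in the UI, so treat this as a hint only.
  if (!env["OPENROUTER_API_KEY"]) missing.push("OPENROUTER_API_KEY");
  return missing;
}
```

Under Bun, the lookup could be `(name) => Bun.which(name) !== null` with `process.env` passed as `env`. A missing agent binary may only matter if that agent is selected for a run.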
Run locally without installing:
bunx @frixaco/hbench

`bunx @frixaco/hbench` runs the prebuilt dist server bundle shipped in the package.
Development:
bun install
bun run dev

bun run dev # UI + PTY + REST server
bun run build # build fullstack server bundle into dist/
bun run start # run dist bundle (production runtime)
bun run lint # eslint
bun run format # prettier
bun run check # format + lint
bun publish --access public # publish new version