diff --git a/astro.config.ts b/astro.config.ts
index 6b5caa2..8d77558 100644
--- a/astro.config.ts
+++ b/astro.config.ts
@@ -63,6 +63,7 @@ export default defineConfig({
           items: [
             { slug: "expanding-horizons/threads-context-and-caching" },
             { slug: "expanding-horizons/model-pricing" },
+            { slug: "expanding-horizons/high-level-harnesses" },
             { slug: "expanding-horizons/what-to-read-next" },
           ],
         },
diff --git a/src/content/docs/expanding-horizons/high-level-harnesses.mdx b/src/content/docs/expanding-horizons/high-level-harnesses.mdx
new file mode 100644
index 0000000..4cc0f4f
--- /dev/null
+++ b/src/content/docs/expanding-horizons/high-level-harnesses.mdx
@@ -0,0 +1,134 @@
+---
+title: High-level harnesses
+description: Beyond individual agent sessions — scheduled automations, parallel agent fleets, and the emerging pattern of AI-driven code pipelines.
+---
+
+import ExternalLink from "../../../components/ExternalLink.astro";
+
+The [harness engineering](/becoming-productive/harness-engineering/) chapter covered shaping a single agent's actions through AGENTS.md, skills, hooks, and subagents.
+This page is one level of abstraction up — it covers tools and patterns that treat agents as a manageable workforce.
+
+
+## From engineering to managing
+
+So far in this guide, you have been an **engineer** — you have worked interactively with a single agent, steering it turn by turn in real time.
+Now, you will become a **manager**, delegating work to a fleet of agents running in parallel.
+Instead of supervising each agent individually, you will manage the output queue — a review inbox, an issue tracker, a PR pipeline.
+Your coding assistant no longer serves as a conductor, but as an orchestrator.
+
+:::note[Remember]
+The key shift is from "what should the agent do?" to "what work should be running right now, and how do I review what came back?"
+:::
+
+## Running agents in parallel
+
+The key difference is running several agents simultaneously, each on an isolated task.
+You hand different issues to separate agents at once, come back and review, and merge the ones you like.
+That is qualitatively different from the sequential, one-task-at-a-time conductor workflow from the previous chapters.
+
+[Subagents](/becoming-productive/harness-engineering/#subagents) are also parallel, but they are different: a subagent is spawned **by the agent** to partition a single task's context.
+The agent decides when to spawn one, waits for the result, and folds it back into its own session.
+You as the human still trigger one top-level session and review one result.
+
+What is described here is different: **you** spawn multiple fully independent agent sessions, each assigned to a separate task.
+No session knows about the others.
+You do not need to wait for any single agent — you come back later and review the queue of results in bulk.
+
+In practice, each agent needs its own isolated workspace — typically a separate Git worktree — so their changes do not interfere.
+A dashboard or queue then surfaces results as agents finish, letting you review and merge at your own pace.
+
+For example, [Conductor](https://conductor.build/) is a tool built around this model,
+running multiple AI coding agents (Claude Code and Codex) in parallel worktrees with a shared review dashboard.
+
+## Scheduled and recurring agents
+
+Agents do not always need to wait for you to trigger them — you can set them up in advance to run on a schedule.
+The pattern is similar to a cron job or a CI pipeline: describe a recurring task, define when it should run, and have an agent execute it in the background.
+Results land in a review inbox or are auto-archived if nothing needs attention.
+
+This is well-suited for tasks like:
+- Daily issue triage
+- Surfacing and summarizing CI failures
+- Generating release briefs
+- Checking for regressions between versions
+
+With scheduled agents, the process becomes closer to a CI pipeline than a chat window — an agent is no longer a tool you reach for, but a background process.
+
+Example application features built around this pattern:
+- [Automations in Codex app](https://developers.openai.com/codex/app/automations)
+- [Cloud Agents Automations](https://cursor.com/docs/cloud-agent/automations)
+- [Schedule recurring tasks in Cowork](https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-cowork)
+
+## Issue-tracker-driven orchestration
+
+A natural extension of scheduled agents is wiring them directly to your issue tracker.
+Instead of manually assigning tasks to agents, the system monitors a board and automatically spawns an agent for each new issue in scope.
+Engineers decide what issues belong in scope; the orchestrator handles assignment and execution.
+
+Agent behavior can be defined in a workflow file versioned alongside the code — the same way you version a CI pipeline.
+When an agent finishes, it gathers evidence (CI results, PR review feedback, complexity analysis) for human review.
+
+For example, [Symphony](https://github.com/openai/symphony) is an open-source orchestration service that implements this pattern,
+monitoring a Linear board and running a Codex agent per issue in an isolated workspace.
+
+:::tip
+Issue-tracker-driven orchestration works best on codebases that have adopted [harness engineering](/becoming-productive/harness-engineering/).
+:::
+
+## Agent communication
+
+Running multiple agents in parallel may create coordination problems — agents must exchange information without overloading any one context window.
+Two broad patterns have emerged.
+
+The simpler one is **hub-and-spoke orchestration**, where a lead agent spawns workers, collects their outputs, and consolidates them.
+Workers never communicate directly.
+The benefit is simplicity, as the full picture is present in one place.
+The cost is that every intermediate result, log line, and failed attempt flows back through the orchestrator's context, degrading its reasoning quality over time.
+
+The more capable pattern is **collaborative teaming**, where agents share a task list, claim work independently, and can send messages directly to one another.
+A worker can flag a dependency, request a peer review, or broadcast a finding without routing it through the lead.
+The lead's context stays clean; coordination happens at the edges.
+
+In practice, most pipelines fall somewhere on a spectrum between these extremes, often organized into three levels:
+
+1. **Isolated workers** — each agent runs independently and returns its output to the caller.
+2. **Orchestrated workflows** — outputs become inputs for the next stage via shared files or aggregated results.
+3. **Collaborative teams** — agents share a task graph, can send direct or broadcast messages, and notify the lead when work completes.
+
+The right level depends on how tightly coupled the tasks are.
+Independent parallel tasks — security scans, test runs, lint checks — fit level 1 or 2 well.
+Tasks that need to challenge or build on each other's intermediate findings call for level 3.
+
+For reference, [Claude Code Agent Teams](https://code.claude.com/docs/en/agent-teams) implements level 3 with a shared task list, file-locked claiming, mailboxes for direct and broadcast messages, and idle notifications back to the lead.
+
+## Code factories
+
+Beyond specific products, there is an emerging pattern popularized by Ryan Carson under the name **Code Factory**.
+The idea is a repository setup where agents autonomously write code, open pull requests, and a separate review agent validates those PRs with machine-verifiable evidence.
+If validation passes, the PR merges without human intervention.
+
+The continuous loop looks like this:
+
+1. Agent writes code and opens a PR.
+2. Risk-aware CI gates check the change.
+3. A review agent inspects the PR and collects evidence — screenshots, test results, static analysis.
+4. If all checks pass, the PR lands automatically.
+5. If anything fails, the agent retries or flags the issue for human review.
+
+:::caution
+A Code Factory is only as good as its quality gates.
+An automated pipeline that merges bad PRs is strictly worse than one that does nothing.
+Invest in solid tests, linters, and CI before automating the merge step.
+:::
+
+- [Ryan Carson (@ryancarson) on X](https://x.com/ryancarson)
+
+## One-human companies
+
+The code factory pattern is the technical foundation of a broader idea: that a single person with a well-configured agent fleet can operate at the scale that would previously have required a full engineering team.
+
+This requires connecting agents to communication platforms, scheduling systems, and external services — turning a single machine into an always-on runtime that responds to messages, executes tasks, and ships work continuously.
+As an example of tooling in this space, [OpenClaw](https://openclaw.ai/) packages infrastructure for exactly this kind of setup.
+
+In [From IDEs to AI Agents with Steve Yegge](https://newsletter.pragmaticengineer.com/p/from-ides-to-ai-agents-with-steve), Yegge argues that the engineering profession is reorganizing around exactly this spectrum.
+His framing: most engineers are at the low end of AI adoption today, and those who stay there risk being outcompeted by engineers who learn to orchestrate agent fleets — to act as owners of work queues rather than writers of individual functions.
diff --git a/src/data/links.csv b/src/data/links.csv
index ca4f650..04eaceb 100644
--- a/src/data/links.csv
+++ b/src/data/links.csv
@@ -19,18 +19,21 @@ https://claude.com/plugins/playground,Playground Claude Plugin,Anthropic,,2026-0
 https://claude.com/pricing,Claude Subscription,,,2026-03-04
 https://cli.github.com/,GitHub CLI | Take GitHub to the command line,,,2026-03-13
 https://cline.bot/blog/post-mortem-unauthorized-cline-cli-npm,Unauthorized Cline CLI npm publish,Saoud Rizwan,2026-02-24,2026-03-16
+https://code.claude.com/docs/en/agent-teams,Claude Code Agent Teams,Anthropic,,2026-04-08
 https://code.claude.com/docs/en/best-practices#write-an-effective-claude-md,Best Practices for Claude Code - Claude Code Docs,Anthropic,,2026-03-04
 https://code.claude.com/docs/en/hooks,Hooks reference - Claude Code Docs,Anthropic,,2026-03-13
 https://code.claude.com/docs/en/security,Security - Claude Code Docs,Anthropic,,2026-03-16
 https://code.claude.com/docs/en/sub-agents,Create custom subagents - Claude Code Docs,Anthropic,,2026-03-13
 https://code.claude.com/docs/en/sub-agents#code-reviewer,Create custom subagents - Claude Code Docs,,,2026-03-05
 https://coderabbit.ai/,CodeRabbit,,,2026-03-05
+https://conductor.build/,Conductor,Melty Labs,,2026-03-25
 https://context7.com/,Context7 - Up-to-date documentation for LLMs and AI code editors,,,2026-03-13
 https://cursor.com/blog,Cursor Blog,,,2026-03-04
 https://cursor.com/bugbot,Cursor Bugbot,,,2026-03-05
 https://cursor.com/docs/agent/browser,Cursor Browser,,,2026-03-04
 https://cursor.com/docs/agent/modes#debug,Cursor Debug Mode,,,2026-03-04
 https://cursor.com/docs/agent/review,Cursor Review Agent,,,2026-03-04
+https://cursor.com/docs/cloud-agent/automations,Cloud Agents Automations,Cursor,,2026-04-08
 https://cursor.com/docs/context/rules,Cursor Rules,,,2026-03-04
 https://cursor.com/docs/hooks,Hooks Docs,Cursor,,2026-03-13
 https://cursor.com/docs/subagents,Cursor Subagents,Cursor,,2026-03-13
@@ -38,6 +41,7 @@ https://cursor.com/for/code-review,Reviewing Code with Cursor | Cursor Docs,,,20
 https://cursor.com/pricing,Cursor Subscription,,,2026-03-04
 https://developers.openai.com/api/docs/guides/compaction,Compaction,OpenAI,,2026-03-04
 https://developers.openai.com/codex/agent-approvals-security,Codex: Agent approvals & security,OpenAI,,2026-03-16
+https://developers.openai.com/codex/app/automations,Automations in Codex app,OpenAI,,2026-03-25
 https://developers.openai.com/codex/app/worktrees/#working-between-local-and-worktree,Worktrees,,,2026-03-10
 https://developers.openai.com/codex/cli/features#run-local-code-review,Codex CLI features (run local code review),,,2026-03-05
 https://developers.openai.com/codex/integrations/github/,Use Codex in GitHub,,,2026-03-05
@@ -56,6 +60,7 @@ https://github.com/mcp,GitHub MCP Registry,,,2026-03-13
 https://github.com/microsoft/playwright-mcp,microsoft/playwright-mcp,Microsoft,,2026-03-13
 https://github.com/mkaput,Marek Kaput,,,2026-03-04
 https://github.com/openai/skills,openai/skills,OpenAI,,2026-03-12
+https://github.com/openai/symphony,Symphony,OpenAI,,2026-03-25
 https://github.com/software-mansion-labs/skills,software-mansion-labs/skills,Software Mansion,,2026-03-12
 https://github.com/steipete/mcporter/,"steipete/mcporter: Call MCPs via TypeScript, masquerading as simple TypeScript API. Or package them as cli.",Peter Steinberger,,2026-03-04
 https://github.com/topics/agent-skills,GitHub Topic: agent-skills,,,2026-03-12
@@ -73,9 +78,11 @@ https://lucumr.pocoo.org/,Thoughts and Writings,Armin Ronacher,,2026-03-04
 https://mcp.grep.app/,mcp.grep.app,Vercel,,2026-03-04
 https://mitchellh.com/,Blog,Mitchell Hashimoto,,2026-03-04
 https://models.dev/,Models.dev - An open-source database of AI models,Opencode,,2026-03-04
+https://newsletter.pragmaticengineer.com/p/from-ides-to-ai-agents-with-steve,From IDEs to AI Agents with Steve Yegge,Gergely Orosz,,2026-03-25
 https://openai.com/chatgpt/pricing/,ChatGPT Subscription,,,2026-03-04
 https://openai.com/index/harness-engineering/,Harness engineering: leveraging Codex in an agent-first world,OpenAI,2026-02-11,2026-03-04
 https://openai.com/news/engineering/,OpenAI Engineering News,,,2026-03-04
+https://openclaw.ai/,OpenClaw,Peter Steinberger,,2026-04-02
 https://opencode.ai/docs/go/,Opencode Go,,,2026-03-04
 https://platform.claude.com/docs/en/build-with-claude/compaction,Compaction,Anthropic,,2026-03-04
 https://platform.claude.com/docs/en/resources/prompt-library/socratic-sage,Prompting best practices,Anthropic,,2026-03-04
@@ -95,6 +102,7 @@ https://skills.sh/mitsuhiko/agent-stuff/tmux,tmux skill,Armin Ronacher,2026-01-2
 https://skills.sh/vercel-labs/agent-browser/agent-browser,agent-browser,Vercel,2026-01-16,2026-03-04
 https://skills.sh/vercel-labs/agent-skills/vercel-react-best-practices,vercel-react-best-practices skill,Vercel,2026-01-16,2026-03-04
 https://support.apple.com/guide/mac-help/mh40584/mac,Dictate messages and documents on Mac - Apple Support,,,2026-03-10
+https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-cowork,Schedule recurring tasks in Cowork,Anthropic,,2026-04-08
 https://support.microsoft.com/en-us/windows/use-voice-typing-to-talk-instead-of-type-on-your-pc-fec94565-c4bd-329d-e59a-af033fa5689f,Use voice typing to talk instead of type on your PC - Microsoft Support,,,2026-03-10
 https://swmansion.com/,Software Mansion,,,2026-03-04
 https://tidewave.ai/,Tidewave,,,2026-03-04
@@ -110,6 +118,7 @@ https://x.com/GeminiApp,Google Gemini (@GeminiApp) on X,,,2026-03-04
 https://x.com/karpathy,Andrej Karpathy (@karpathy) on X,,,2026-03-04
 https://x.com/opencode,OpenCode (@opencode) on X,,,2026-03-04
 https://x.com/RLanceMartin,Lance Martin (@RLanceMartin) on X,,,2026-03-04
+https://x.com/ryancarson,Ryan Carson (@ryancarson) on X,,,2026-03-25
 https://x.com/thorstenball,Thorsten Ball (@thorstenball) on X,,,2026-03-04
 https://x.com/thsottiaux,Tibo (@thsottiaux) on X,,,2026-03-04
 https://x.com/trq212,Thariq Shihipar (@trq212) on X,,,2026-03-04