Describe what you want to build — a pipeline of AI agents plans, writes, reviews, and tests the code for you.
Built with LangGraph and Claude, this tool turns a plain-English description into working, tested code. Five specialized agents collaborate in a graph: one parses your intent, one plans the implementation, one writes the code, one reviews it for quality, and one runs real tests in an isolated Docker sandbox. If tests fail or the review score is too low, the coder revises automatically — up to a configurable number of iterations.
The sidebar shows real-time agent status indicators. Here, a student grade tracker was generated, reviewed (9/10), and passed 79/79 tests on the first attempt.
```
User Request
      │
      ▼
┌─────────────┐
│ Orchestrator│   Parses the request, tracks overall progress
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Planner   │   Produces a structured plan: steps, files, dependencies
└──────┬──────┘
       │
       ▼
┌─────────────┐ ◄─────────────────────────────┐
│    Coder    │   Generates (or revises) code │   revision loop
└──────┬──────┘                               │   (up to N times)
       │                                      │
       ▼                                      │
┌─────────────┐                               │
│  Reviewer   │   Scores code quality (0–10), │
│             │   flags issues & suggestions  │
└──────┬──────┘                               │
       │                                      │
       ▼                                      │
┌─────────────┐   tests fail or review        │
│   Tester    │   not approved? ──────────────┘
└──────┬──────┘
       │  all green
       ▼
 Final Output
 (code files + plan + review + test report)
```
Each agent shares a single AgentState object that flows through the LangGraph StateGraph. The tester node uses a conditional edge (should_continue) to either end the pipeline or loop back to the coder for revision.
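A minimal sketch of how that graph might be wired. The node function names (orchestrator_node and friends) are illustrative, not the repo's actual identifiers; each is assumed to take and return an AgentState:

```python
# Sketch of the pipeline graph. Node functions are assumed names,
# defined elsewhere; each maps AgentState -> AgentState.
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("orchestrator", orchestrator_node)
graph.add_node("planner", planner_node)
graph.add_node("coder", coder_node)
graph.add_node("reviewer", reviewer_node)
graph.add_node("tester", tester_node)

graph.set_entry_point("orchestrator")
graph.add_edge("orchestrator", "planner")
graph.add_edge("planner", "coder")
graph.add_edge("coder", "reviewer")
graph.add_edge("reviewer", "tester")

# The tester's conditional edge either ends the run or loops back to the coder.
graph.add_conditional_edges("tester", should_continue, {"revise": "coder", "end": END})

app = graph.compile()
```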
```bash
# 1. Clone the repo
git clone https://github.com/your-username/multi-agent-codegen.git
cd multi-agent-codegen

# 2. Install dependencies
pip install -e ".[dev]"

# 3. Set your API key
cp .env.example .env
# open .env and set ANTHROPIC_API_KEY=sk-ant-...

# 4. Run the app
streamlit run app.py
```

The app opens at http://localhost:8501.
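Once the key is in .env, the app's Claude clients pick it up from the environment. A minimal sketch of that wiring (the coder_llm variable is illustrative, not the repo's actual code):

```python
# Illustrative sketch: constructing a Claude client from the .env key.
# ChatAnthropic reads ANTHROPIC_API_KEY from the environment automatically.
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic

load_dotenv()  # loads ANTHROPIC_API_KEY from .env into the environment

coder_llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
```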
| Agent | Model | Role |
|---|---|---|
| Orchestrator | claude-opus-4-6 | Interprets the user request and sets pipeline status |
| Planner | claude-sonnet-4-6 | Outputs a structured Plan: objective, steps, files to create, dependencies, complexity |
| Coder | claude-sonnet-4-6 | Generates CodeArtifact objects (filename, language, content). On revisions, receives prior review issues and test failures as context |
| Reviewer | claude-sonnet-4-6 | Scores the code 0–10, marks it approved or not, lists issues and suggestions |
| Tester | user's choice | Generates pytest files and runs them in a Docker sandbox. The model is selected from the sidebar: Haiku (fast, best for simple scripts), Sonnet (default, reliable for most projects), or Opus (most thorough, best for complex logic and edge cases) |
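The structured outputs above map naturally onto Pydantic models plus LangChain's with_structured_output. A sketch of the Planner's half, with field names inferred from the table (not confirmed against the repo):

```python
# Sketch: the Planner's structured output. Field names are inferred from
# the table above and may differ from the repo's actual Plan model.
from pydantic import BaseModel
from langchain_anthropic import ChatAnthropic

class Plan(BaseModel):
    objective: str
    steps: list[str]
    files_to_create: list[str]
    dependencies: list[str]
    complexity: str  # assumed to be a label like "low" / "medium" / "high"

planner_llm = ChatAnthropic(model="claude-sonnet-4-6").with_structured_output(Plan)

def make_plan(user_request: str) -> Plan:
    return planner_llm.invoke(f"Create an implementation plan for: {user_request}")
```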
After the Tester runs, a should_continue router decides what happens next:
- Review approved + all tests pass → pipeline ends with COMPLETED
- Review failed or tests failed, with iterations remaining → loop back to Coder, which receives the reviewer's issues and test errors as additional context
- Max iterations reached → pipeline ends regardless (avoids infinite loops)
Max iterations is configurable in the sidebar (default: 3).
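In code, the router is a small pure function over the state (the AgentState model shown below). A sketch, assuming ReviewFeedback exposes an approved flag and TestResult a passed flag (field names inferred, not confirmed by the repo):

```python
# Sketch of the should_continue router. The .approved and .passed field
# names are assumptions inferred from the agent descriptions above.
def should_continue(state: AgentState) -> str:
    approved = state.review is not None and state.review.approved
    tests_green = state.test_result is not None and state.test_result.passed
    if approved and tests_green:
        return "end"      # all green: pipeline ends with COMPLETED
    if state.iteration >= state.max_iterations:
        return "end"      # max iterations reached: stop regardless
    return "revise"       # loop back to the Coder with review + test context
```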
All agents read from and write to a Pydantic AgentState model:
```python
from typing import Optional

from pydantic import BaseModel

# Plan, CodeArtifact, ReviewFeedback, TestResult, and TaskStatus are the
# project's own Pydantic models and enums, defined elsewhere in the repo.
class AgentState(BaseModel):
    user_request: str
    plan: Optional[Plan]
    artifacts: list[CodeArtifact]
    review: Optional[ReviewFeedback]
    test_result: Optional[TestResult]
    status: TaskStatus
    iteration: int
    max_iterations: int
```

Input prompt:
A Python FastAPI server with a /health endpoint and a /echo POST endpoint
that returns the request body as JSON.
Generated files:
main.py
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EchoRequest(BaseModel):
    message: str

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/echo")
def echo(body: EchoRequest):
    return body
```

test_main.py
```python
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)

def test_health():
    r = client.get("/health")
    assert r.status_code == 200
    assert r.json() == {"status": "ok"}

def test_echo():
    r = client.post("/echo", json={"message": "hello"})
    assert r.status_code == 200
    assert r.json() == {"message": "hello"}
```

Review: 9/10, approved. Tests: 2/2 passed. Status: COMPLETED.
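The Tester ran both files inside a throwaway container. A minimal sketch of how such a sandbox might be invoked (plain docker run via subprocess; the repo's actual sandbox code may differ):

```python
# Illustrative sketch of a Docker test sandbox: write the artifacts to a
# temp dir, mount it into a disposable container, and run pytest there.
# Not the repo's actual implementation.
import pathlib
import subprocess
import tempfile

def run_tests_in_sandbox(artifacts, timeout: int = 120) -> tuple[bool, str]:
    with tempfile.TemporaryDirectory() as workdir:
        for artifact in artifacts:  # assumed .filename / .content fields
            (pathlib.Path(workdir) / artifact.filename).write_text(artifact.content)
        result = subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{workdir}:/code", "-w", "/code",
             "python:3.11-slim", "sh", "-c",
             # A real run would also install the plan's dependencies here.
             "pip install -q pytest && pytest -q"],
            capture_output=True, text=True, timeout=timeout,
        )
    return result.returncode == 0, result.stdout + result.stderr
```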
| Library | Version | Purpose |
|---|---|---|
| LangGraph | ≥ 0.2.0 | Agent orchestration graph with conditional edges |
| LangChain Anthropic | ≥ 0.3.0 | Claude API integration via ChatAnthropic |
| LangChain Core | ≥ 0.3.0 | Message types, structured output |
| Pydantic | ≥ 2.0.0 | Typed state schema and structured LLM outputs |
| Streamlit | ≥ 1.40.0 | Web UI with real-time streaming via app.stream() |
| python-dotenv | ≥ 1.0.0 | .env file loading |
| LangSmith | ≥ 0.1.0 | LLM tracing and observability (optional) |
Python ≥ 3.10 required.
Every LLM call across all five agents is automatically traced when LangSmith is enabled. Each pipeline run appears as a single trace in your project dashboard, with a child span per agent showing:
- Inputs / outputs — the exact prompt and response for each agent
- Token usage — prompt, completion, and total tokens per call
- Latency — time spent in each agent and end-to-end pipeline duration
- Revision loops — each coder → reviewer → tester iteration is captured as separate spans
To enable tracing:
- Create a free account at smith.langchain.com
- Go to Settings → API Keys and generate a new key
- Add the following to your .env file:

```
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_PROJECT=multi-agent-codegen
```

- Run the app; traces appear in the multi-agent-codegen project automatically.
To disable tracing, set LANGCHAIN_TRACING_V2=false or remove the variable.
- Claude Code — AI coding assistant used during development
- File system output — write generated files directly to a user-specified directory with one click
- Agent memory — persist previous runs so the coder can learn from past mistakes across sessions
- Custom agent prompts — let users edit agent system prompts from the UI without touching code
- Streaming token output — stream individual tokens from the coder agent for a faster perceived response
- Multi-language support — extend beyond Python to TypeScript, Go, Rust with language-specific test runners
- GitHub integration — push generated code directly to a new branch and open a pull request
