A multi-agent deep research tool that produces structured, fact-checked research reports from a single query.
Input: A research question, technical topic, or competitive analysis request.
Output: A structured research report with citations, fact-check verification, and source metadata.
autoresearch orchestrates five specialist agents through a YAML-defined workflow engine:
User Query
    |
    v
Planner (task decomposition, research brief)
    |
    +---> Searcher (web search, source collection)
    |
    +---> Reader (deep document reading, content extraction)
    |
    +---> Synthesizer (report generation from collected materials)
    |
    +---> Fact-Checker (claim verification, citation validation)
Each agent operates on a per-task directory under .autoresearch/tasks/{task-id}/. Agent communication is file-based — no direct inter-agent calls. The Planner is the sole orchestrator that advances task state.
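Because handoff is file-based, each agent only needs to read its predecessor's output from the task directory and write its own. A minimal sketch of that contract, assuming JSON files named after each agent (the file names and helper functions are illustrative, not the actual on-disk layout):

```python
import json
from pathlib import Path

TASKS_ROOT = Path(".autoresearch/tasks")

def write_agent_output(task_id: str, agent: str, payload: dict,
                       root: Path = TASKS_ROOT) -> Path:
    """Persist an agent's output so the next agent in the pipeline can read it."""
    task_dir = root / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    out = task_dir / f"{agent}.json"
    out.write_text(json.dumps(payload, indent=2))
    return out

def read_agent_output(task_id: str, agent: str,
                      root: Path = TASKS_ROOT) -> dict:
    """Read a predecessor agent's output from the per-task directory."""
    return json.loads((root / task_id / f"{agent}.json").read_text())
```

Keeping all state on disk means any agent can crash and be re-run without losing the pipeline's progress, which is what makes `autoresearch resume` possible.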
Tasks progress through a validated state machine:
CREATED -> PLANNING -> SEARCHING -> READING -> SYNTHESIZING -> FACT_CHECKING -> DONE
                                                                     |
                                                                     v
                                                                 REVISION (if disputes found)
State transitions are validated; invalid transitions raise errors. The revision loop runs up to 3 rounds before yielding control.
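The validated state machine can be sketched as a transition table, assuming REVISION loops back to SYNTHESIZING for each dispute round (a sketch only; the shipped engine lives in `src/autoresearch/engine/`):

```python
from enum import Enum

class TaskStatus(str, Enum):
    CREATED = "created"
    PLANNING = "planning"
    SEARCHING = "searching"
    READING = "reading"
    SYNTHESIZING = "synthesizing"
    FACT_CHECKING = "fact_checking"
    REVISION = "revision"
    DONE = "done"

# Each state maps to the set of states it may legally move to.
VALID_TRANSITIONS: dict[TaskStatus, set[TaskStatus]] = {
    TaskStatus.CREATED: {TaskStatus.PLANNING},
    TaskStatus.PLANNING: {TaskStatus.SEARCHING},
    TaskStatus.SEARCHING: {TaskStatus.READING},
    TaskStatus.READING: {TaskStatus.SYNTHESIZING},
    TaskStatus.SYNTHESIZING: {TaskStatus.FACT_CHECKING},
    TaskStatus.FACT_CHECKING: {TaskStatus.DONE, TaskStatus.REVISION},
    TaskStatus.REVISION: {TaskStatus.SYNTHESIZING},
    TaskStatus.DONE: set(),
}

def transition(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Advance task state, rejecting any transition not in the table."""
    if new not in VALID_TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.value} -> {new.value}")
    return new
```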
The Synthesizer and Fact-Checker must use different models to ensure independent verification. autoresearch validate enforces this constraint.
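The separation-of-duties check itself is a simple comparison over the agent configuration. A hedged sketch of what `autoresearch validate` could do internally (the function name and config shape are assumptions based on the config example below):

```python
def validate_sod(config: dict) -> None:
    """Reject configs where the writer and the checker share a model."""
    agents = config["agents"]
    if agents["synthesizer"]["model"] == agents["fact_checker"]["model"]:
        raise ValueError(
            "SOD violation: synthesizer and fact_checker must use different models"
        )
```

Using a distinct model for verification reduces the chance that both the report and its fact-check inherit the same model-specific blind spot.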
Three workflow definitions ship under workflows/:
| Workflow | Steps | Use Case |
|---|---|---|
| deep-research | plan → search → read → synthesize → fact-check | Full research pipeline |
| quick-scan | plan → search → synthesize | Fast search-only output |
| fact-check-only | standalone fact-check pass | Verify an existing draft |
Workflows are YAML files with topological step ordering, conditional execution, and dependency wiring.
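A workflow file might look like the following sketch; the field names (`steps`, `depends_on`, `when`) are illustrative assumptions, not the actual schema shipped under `workflows/`:

```yaml
# Hypothetical workflow definition (field names are assumptions)
name: quick-scan
steps:
  - id: plan
    agent: planner
  - id: search
    agent: searcher
    depends_on: [plan]
  - id: synthesize
    agent: synthesizer
    depends_on: [search]
    # Conditional execution: skip synthesis if no sources were found
    when: "task.sources_found > 0"
```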
# Install dependencies
uv sync --all-groups

# Initialize project structure and detect host environment
autoresearch init
# Validate configuration and SOD compliance
autoresearch validate
# Run a research task
autoresearch run "multi-agent AI architecture patterns"
autoresearch run "competitor analysis: Perplexity vs You.com" --depth deep
autoresearch run "quick survey of LLM tool use" --depth quick
autoresearch run "query" --template technical --json
# Task management
autoresearch status
autoresearch status <task-id>
autoresearch list
autoresearch list --last 5
autoresearch resume <task-id>
# Export reports
autoresearch export <task-id> --format markdown
autoresearch export <task-id> --format json
# Memory management
autoresearch memory show
autoresearch memory clear --older-than 30d

All commands support --json for machine-readable output.
autoresearch ships an MCP stdio server for integration with Claude Code, Cursor, and other AI coding tools:
# Entry point
autoresearch-mcp-server

Exposed tools:
- autoresearch_run — Start a research task (query, depth, template)
- autoresearch_status — Check task status
- autoresearch_read_report — Read a completed report
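Registering the server with an MCP host is typically a one-line stdio entry. The snippet below follows the common `mcpServers` convention used by several hosts; the exact file location and key names depend on the tool, so treat this as an illustration rather than a verified config:

```json
{
  "mcpServers": {
    "autoresearch": {
      "command": "autoresearch-mcp-server"
    }
  }
}
```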
Configuration lives in autoresearch.yaml (falls back to defaults if absent):
spec_version: "0.1.0"
name: autoresearch

agents:
  planner:
    enabled: true
    model: claude-opus-4-20250514
    fallback_model: claude-sonnet-4-20250514
    temperature: 0.3
  searcher:
    enabled: true
    model: claude-sonnet-4-20250514
  reader:
    enabled: true
    model: gemini-2.0-pro
  synthesizer:
    enabled: true
    model: claude-sonnet-4-20250514
  fact_checker:
    enabled: true
    model: claude-opus-4-20250514   # must differ from the synthesizer's model (SOD)

mcp_servers:
  exa:
    type: url
    url: https://mcp.exa.ai/mcp
    api_key_env: EXA_API_KEY
    enabled: true

memory:
  auto_summarize: true
  summarize_after_sessions: 3
  retention_days: 30

output:
  default_format: markdown
  include_sources: true
  citation_style: simplified

features:
  fact_checking: true
  citation_validation: true
  human_in_the_loop: false

Three-level memory hierarchy under .autoresearch/memory/:
| Level | Path | Retention | Purpose |
|---|---|---|---|
| Session records | sessions/{id}.json | Configurable | Raw per-session agent outputs |
| Task summaries | summaries/{task-id}.md | Auto-generated | Aggregated task history |
| Long-term memory | long-term/{key}.md | Persistent | Cross-task knowledge |
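Retention for session records is driven by `retention_days`. A minimal sketch of how such a cleanup pass could work, assuming file modification time is the age signal (the real memory manager lives in `src/autoresearch/engine/` and may track age differently):

```python
import time
from pathlib import Path

def prune_sessions(memory_dir: Path, retention_days: int) -> list[Path]:
    """Delete session records older than the retention window."""
    cutoff = time.time() - retention_days * 86400
    removed = []
    for record in (memory_dir / "sessions").glob("*.json"):
        if record.stat().st_mtime < cutoff:
            record.unlink()
            removed.append(record)
    return removed
```

Only the `sessions/` level is pruned here; task summaries and long-term memory persist, matching the table above.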
autoresearch init auto-detects the host environment (Claude Code, Cursor, OpenCode) and writes an integration skill file to .autoresearch/.
src/autoresearch/
├── agents/ # Agent implementations (Planner, Searcher, Reader, Synthesizer, FactChecker)
├── adapters/ # MCP server, host detection
├── config/ # YAML loader, schema (msgspec), SOD validation
├── engine/ # Workflow engine, state machine, memory manager
├── models/ # Type definitions (TaskStatus, AgentRole, ResearchDepth)
├── tools/ # Web search, URL extraction, git operations
├── templates/ # Report rendering templates
└── cli.py # Click-based CLI entry point
workflows/ # YAML workflow definitions
tests/ # pytest tests + Hypothesis property tests
features/ # Gherkin BDD scenarios
just setup # Install dependencies + tools
just format # Format code (ruff + rumdl)
just lint # Lint (ruff, ty, typos, rumdl)
just test # Run pytest
just bdd # Run behave BDD scenarios
just test-all # pytest + behave
just typecheck # ty type checking
just bench # pytest benchmarks
just build # uv build

- Runtime: Python 3.12+, managed by uv
- CLI: Click
- Config: PyYAML + msgspec structs
- Serialization: orjson, msgspec
- HTTP: httpx
- MCP: mcp SDK (FastMCP)
- Logging: structlog
- Testing: pytest, behave, Hypothesis
- Linting: ruff, ty
Apache-2.0