This repository is an educational but working Python Agent Harness. It keeps dependencies near zero so the moving parts stay visible:
Agent = Model + Tools + Skills + Memory + Permissions + Trace + Context + Eval
It is not trying to replace LangGraph, OpenAI Agents SDK, CrewAI, or Claude Code. It is a small framework for learning how those systems are built.
- OpenAI-compatible /v1/chat/completions client with retries, timeouts, redacted errors, provider-compatible tool-call parsing, and mock clients.
- Tool registry with calculator, workspace file tools, safer patch-style editing, grep, git status/diff, and permission-gated shell execution.
- File-based skills in skills/*.md with metadata, trigger matching, priority, and advisory allowed/risky tool lists.
- Context compression that truncates large tool outputs and keeps a simple state summary.
- JSONL traces, a trace analyzer, and an eval runner with isolated workspaces.
- MCP adapter and multi-agent orchestration skeletons.
- Offline lab-report and paper-reading examples for research workflows.
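As a rough illustration of the context-compression item above, the sketch below truncates an oversized tool output by keeping its head and tail. The function name `truncate_tool_output`, the `max_chars` default, and the marker text are assumptions made for this example; the harness's own logic in context.py may differ.

```python
def truncate_tool_output(text: str, max_chars: int = 2000) -> str:
    """Keep the head and tail of an oversized tool output.

    Hypothetical helper: the name, default limit, and marker format are
    illustrative and not taken from agent_harness/context.py.
    """
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    head = text[:half]
    # Slice from an explicit index so that half == 0 yields an empty tail.
    tail = text[len(text) - half:]
    # The marker adds a few characters, so the result can slightly exceed max_chars.
    return f"{head}\n...[{omitted} characters omitted]...\n{tail}"
```

Keeping both ends tends to preserve the command echo at the top and the error message at the bottom, which are usually the parts a model needs.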
Python 3.10+ is recommended.
python -m venv .venv
.venv\Scripts\activate
pip install -e .
pip install pytest
The package itself has no required third-party dependencies.
Copy .env.example values into your shell or environment manager:
set MODEL_API_KEY=your_api_key_here
set MODEL_BASE_URL=https://api.openai.com/v1
set MODEL_NAME=gpt-4.1-mini
Optional tuning:
set MODEL_TIMEOUT_SECONDS=120
set MODEL_MAX_TOKENS=2048
set MODEL_TEMPERATURE=0.2
set MODEL_MAX_RETRIES=2
Never commit real API keys. run_shell is dangerous and is not auto-approved by
default. Leave AGENT_AUTO_APPROVE_DANGEROUS=0 unless you are in a trusted local
demo workspace.
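To make the configuration concrete, here is a minimal sketch of how these variables could be read and how a permission gate could use AGENT_AUTO_APPROVE_DANGEROUS. The names `ModelSettings`, `load_settings`, and `may_run` are hypothetical. The fallback values simply mirror the examples above and are not confirmed defaults of the harness, and treating only the value `1` as approval is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical settings container; field names are illustrative."""
    api_key: str
    base_url: str
    model: str
    timeout_seconds: float
    max_tokens: int
    temperature: float
    max_retries: int
    auto_approve_dangerous: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> ModelSettings:
    """Read MODEL_* variables; fallbacks mirror the README examples (assumed)."""
    env = os.environ if env is None else env
    return ModelSettings(
        api_key=env.get("MODEL_API_KEY", ""),
        base_url=env.get("MODEL_BASE_URL", "https://api.openai.com/v1"),
        model=env.get("MODEL_NAME", "gpt-4.1-mini"),
        timeout_seconds=float(env.get("MODEL_TIMEOUT_SECONDS", "120")),
        max_tokens=int(env.get("MODEL_MAX_TOKENS", "2048")),
        temperature=float(env.get("MODEL_TEMPERATURE", "0.2")),
        max_retries=int(env.get("MODEL_MAX_RETRIES", "2")),
        # Assumption: only the exact value "1" enables auto-approval.
        auto_approve_dangerous=env.get("AGENT_AUTO_APPROVE_DANGEROUS", "0") == "1",
    )


def may_run(tool_name: str, settings: ModelSettings,
            dangerous: frozenset = frozenset({"run_shell"})) -> bool:
    """Allow safe tools always; allow dangerous ones only when auto-approved."""
    return tool_name not in dangerous or settings.auto_approve_dangerous
```

Passing a plain dict as `env` keeps the loader easy to test without touching the real process environment.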
python examples/quickstart.py
python examples/research_note_agent.py
python examples/coding_agent_demo.py
python examples/lab_report_agent.py
python examples/paper_reading_agent.py
Model-backed examples need MODEL_API_KEY. The lab-report and paper-reading
examples are deterministic offline workflows.
Core built-in tools from build_builtin_tools(workspace):
- calculate(expression)
- read_text(path)
- read_text_range(path, start_line, end_line)
- write_text(path, content)
- replace_text(path, old, new, expected_replacements=1)
- append_text(path, content)
- list_dir(path=".")
- grep(pattern, path=".")
- git_status()
- git_diff(path=".")
- run_shell(command, timeout_seconds=20)
Patch-style editing is safer than whole-file rewriting because replace_text
requires an exact match count. It refuses zero matches and unexpected duplicate
matches, which prevents many accidental broad rewrites.
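The match-count rule can be sketched as follows. This is an illustration of the idea, not the harness's actual replace_text: the helper names `apply_exact_replacement` and `replace_text_checked`, the error messages, and the choice of ValueError are assumptions.

```python
from pathlib import Path


def apply_exact_replacement(content: str, old: str, new: str,
                            expected_replacements: int = 1) -> str:
    """Return content with `old` replaced, but only on an exact match count."""
    if not old:
        raise ValueError("old must be a non-empty string")
    found = content.count(old)  # non-overlapping occurrences, same as str.replace
    if found != expected_replacements:
        raise ValueError(
            f"expected {expected_replacements} match(es), found {found}"
        )
    return content.replace(old, new)


def replace_text_checked(path: str, old: str, new: str,
                         expected_replacements: int = 1) -> None:
    """File wrapper (hypothetical): the file is written only if the check passes."""
    file_path = Path(path)
    content = file_path.read_text(encoding="utf-8")
    updated = apply_exact_replacement(content, old, new, expected_replacements)
    file_path.write_text(updated, encoding="utf-8")
```

Because the check runs before any write, a failed edit leaves the file untouched and the caller gets an error it can act on, for example by reading a narrower range and retrying with more context in `old`.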
Skills are reusable task procedures, not executable tools. The default skill
files live in skills/ and are loaded by build_default_skills().
---
name: "Research Note"
description: "Read or synthesize material into a structured research note."
triggers: ["paper", "research"]
priority: 40
allowed_tools: ["read_text", "grep"]
risky_tools: []
---
Instructions go here.
See docs/skills.md.
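For a feel of how such a file could be loaded and matched, the sketch below parses the metadata block shown above and selects skills by trigger. It assumes every metadata value is a JSON literal, as in the example, and that a higher priority sorts first. `Skill`, `parse_skill`, and `match_skills` are names invented for this sketch; the real loader in skills.py may work differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Illustrative skill model; fields mirror the metadata keys shown above."""
    name: str
    description: str = ""
    triggers: list = field(default_factory=list)
    priority: int = 0
    allowed_tools: list = field(default_factory=list)
    risky_tools: list = field(default_factory=list)
    instructions: str = ""


def parse_skill(text: str) -> Skill:
    """Parse a '---' delimited metadata block followed by free-form instructions."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' metadata block")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("metadata block is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, raw = line.partition(":")
        # Assumption: each value is a JSON literal (string, number, or list).
        meta[key.strip()] = json.loads(raw.strip())
    return Skill(instructions="\n".join(lines[end + 1:]).strip(), **meta)


def match_skills(task: str, skills: list) -> list:
    """Return skills whose triggers appear in the task, highest priority first."""
    lowered = task.lower()
    hits = [s for s in skills if any(t.lower() in lowered for t in s.triggers)]
    return sorted(hits, key=lambda s: s.priority, reverse=True)
```

Substring matching is deliberately naive here; a trigger such as "paper" would also fire on "newspaper", which is one reason to treat matched skills as advisory.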
Run tests:
python -m pytest
Run evals with a configured model:
python -m agent_harness.eval_runner evals/sample_tasks.jsonl
Analyze a trace:
python -m agent_harness.trace_analyzer runs/quickstart_trace.jsonl
Trace and eval outputs are generated under runs/ and are ignored by Git.
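A JSONL trace holds one JSON object per line, so a basic analysis can be a single pass over the file. The sketch below counts events by type and tool calls by name. The field names `type` and `tool` and the event type `tool_call` are assumptions made for illustration; the actual schema written by trace.py is not documented here, so adjust the keys to match your traces.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_trace(lines: Iterable[str]) -> dict:
    """Count events by type and tool calls by tool name from JSONL lines.

    Hypothetical schema: each event is an object with a "type" key, and
    tool-call events also carry a "tool" key.
    """
    event_counts = Counter()
    tool_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        event_type = event.get("type", "unknown")
        event_counts[event_type] += 1
        if event_type == "tool_call":
            tool_counts[event.get("tool", "unknown")] += 1
    return {"events": dict(event_counts), "tools": dict(tool_counts)}
```

Since an open file object is an iterable of lines, it can be passed in directly, for instance from `open("runs/quickstart_trace.jsonl", encoding="utf-8")`.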
agent_harness/
  agent.py          # Agent loop
  builtin_tools.py  # Built-in workspace, edit, git, shell tools
  context.py        # Context compression and state summaries
  eval_runner.py    # JSONL eval runner
  mcp_adapter.py    # MCP-to-Tool adapter boundary
  model_client.py   # OpenAI-compatible client and mock clients
  multi_agent.py    # Manager/worker orchestration skeleton
  skills.py         # Skill model and file loader
  trace.py          # JSONL trace writer
examples/           # Demo workflows
docs/               # Design and extension notes
skills/             # File-based default skills
tests/              # Focused pytest coverage