Structured observability for AI agent systems: capture, trace, and debug agent workflows in real time.
Stop flying blind with AI agents. AEP is a lightweight, structured observability framework for multi-agent systems. Capture causation chains, debug orchestration logic, visualize agent workflows: all in real time.
Perfect for:
- 🎯 Orchestrators managing multiple agents and sub-agents
- 🔍 Researchers studying agent behavior and decision trees
- 🏢 Enterprises auditing agent actions for compliance
- 👨‍💻 Developers debugging complex agentic systems
Requirements: Node.js 20+
The dashboard and read APIs are open in dev mode, but ingest (POST /events)
always requires a write-scoped API key — so the quick start mints one first.
```bash
# 1. Clone & install
git clone https://github.com/surpradhan/agent-event-protocol.git
cd agent-event-protocol
npm install

# 2. Start the ingest server (ADMIN_TOKEN lets you mint an API key)
ADMIN_TOKEN=dev-admin npm run ingest

# 3. In another terminal: mint a write key and emit a sample event
export AEP_API_KEY=$(curl -s -X POST http://localhost:8787/admin/keys \
  -H "Authorization: Bearer dev-admin" -H "Content-Type: application/json" \
  -d '{"tenantId":"dev","label":"quickstart","scopes":["read","write"]}' \
  | node -e "process.stdin.once('data', d => console.log(JSON.parse(d).key))")
npm run emit:example   # → { "status": 202, ... }

# 4. Open the live dashboard (open in dev; set DASHBOARD_TOKEN to lock it down)
open http://localhost:8787/dashboard
```

Why a key in dev? "Dev mode" (no `DASHBOARD_TOKEN`) only opens the dashboard and read endpoints. Ingest is authenticated in every mode; see AUTH.md. The demo scripts below also read `AEP_API_KEY`.
See it in action with demo scenarios:

```bash
npm run demo:support    # 📞 Support ticket triage agent
npm run demo:itops      # 🛠️ IT ops incident response
npm run demo:research   # 🔬 Research & synthesis
npm run demo:subagent   # 🌳 Orchestrator + 3 parallel sub-agents
npm run demo:logging    # 📋 Log spike investigation
```

```bash
# Install (requires Python ≥ 3.10)
pip install -e "sdks/python[dev]"

# Emit an event
python - <<'EOF'
from aep import create_event, AEPClient

event = create_event(
    source="agent://my-agent",
    type="task.created",
    session_id="ses_001",
    trace_id="trc_001",
    payload={"task": "summarise document"},
)

# AEPClient picks up AEP_API_KEY from the environment; export a write-scoped
# key first (ingest always needs one; see "Local Development" above).
with AEPClient() as client:
    print(client.emit(event))
EOF

# Run the multi-agent research demo
python sdks/python/demos/subagent_research.py

# Auto-instrument LangGraph (zero-code) and run the 10-node demo
pip install -e "sdks/python[langgraph]"
python sdks/python/demos/langgraph_multiagent.py

# Or auto-instrument CrewAI (runs offline, no LLM key needed)
pip install -e "sdks/python[crewai]"
python sdks/python/demos/crewai_multiagent.py

# Or auto-instrument AutoGen AgentChat (runs offline, no LLM key needed)
pip install -e "sdks/python[autogen]"
python sdks/python/demos/autogen_multiagent.py

# Or auto-instrument the OpenAI Agents SDK (runs offline, no LLM key needed)
pip install -e "sdks/python[openai-agents]"
python sdks/python/demos/openai_agents_multiagent.py

# Or auto-instrument the Anthropic Claude Agent SDK (runs offline, no LLM key needed)
pip install -e "sdks/python[claude-agent]"
python sdks/python/demos/claude_agent_multiagent.py
```

Auto-instrumentation: `import aep; aep.instrument()` makes LangGraph, CrewAI, AutoGen AgentChat, the OpenAI Agents SDK, and the Anthropic Claude Agent SDK workflows emit a full AEP event DAG with no other code changes; see `sdks/python/aep/instrument.py`.
See sdks/python/README.md for the full Python SDK reference.
```bash
# The Go SDK is a subdirectory module of this monorepo.
go get github.com/surpradhan/agent-event-protocol/sdks/go@latest
```

```go
// Emit an event
package main

import (
	"context"
	"log"
	"os"

	"github.com/surpradhan/agent-event-protocol/sdks/go/aep"
)

func main() {
	event, err := aep.CreateEvent(
		"agent://my-agent",
		aep.EventTypeTaskCreated,
		"ses_001",
		"trc_001",
		map[string]interface{}{"task": "analyze data"},
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}

	client := aep.NewClient()
	// Ingest always needs a write-scoped key. NewClient() does not read the
	// environment, so set it explicitly (export AEP_API_KEY first).
	client.SetAPIKey(os.Getenv("AEP_API_KEY"))
	defer client.Close()

	resp, err := client.Emit(context.Background(), event)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Emitted: %s", resp.ID)
}
```

See `sdks/go/README.md` for the full Go SDK reference.
```bash
npm install @surpradhan/aep   # from sdks/node/ (Node >= 20, dual ESM + CJS)
```

```ts
import { AEPClient, createEvent } from "@surpradhan/aep";

const event = createEvent(
  "agent://my-agent",
  "task.created",
  "ses_001",
  "trc_001",
  { task: "analyze data" },
  { agentRole: "orchestrator" },
);

// Reads AEP_INGEST_URL / AEP_API_KEY from the environment when not passed in.
const client = new AEPClient({ apiKey: process.env.AEP_API_KEY });
await client.emit(event);
```

```ts
// Zero-code auto-instrumentation for LangChain.js / LangGraph:
import { instrument } from "@surpradhan/aep";

await instrument(); // then run your graph as usual; emits a full AEP DAG
```

See `sdks/node/README.md` for the full Node SDK reference.
Vercel AI SDK is supported via the OpenTelemetry bridge: enable
`experimental_telemetry: { isEnabled: true }`, send the spans to an OTEL
Collector whose AEP exporter points at your AEP ingest server, and
`generateText` / `streamText` / `ai.toolCall` spans land as AEP events with
full trace/causation preserved. See `docs/integrations/vercel-ai-sdk.md` for
the wiring and a frank write-up of the current mapping gaps.
```bash
# Set required security tokens
export DASHBOARD_TOKEN=$(openssl rand -hex 32)
export ADMIN_TOKEN=$(openssl rand -hex 32)

# Start the server
PORT=8787 npm run ingest

# Deploy behind a TLS reverse proxy (nginx, ELB, CloudFront)
# See SECURITY.md for the complete production checklist
```

Key differences from dev mode:
- ✅ `DASHBOARD_TOKEN` & `ADMIN_TOKEN` required (if unset, the server returns 503 Service Unavailable)
- ✅ TLS/HTTPS via reverse proxy (no direct exposure)
- ✅ Network isolation (VPC, security groups, firewall rules)
- 🔒 See SECURITY.md for the complete hardening guide
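Both production tokens should be long random secrets. As a sketch, assuming you generate them the way `openssl rand -hex 32` does, Python's standard `secrets` module produces an equivalent 64-hex-character token, and a small pre-flight check can catch a missing or placeholder value before the server answers 503. `new_token` and `missing_or_weak` are hypothetical helpers, not part of AEP.

```python
import re
import secrets

# Hypothetical pre-flight helpers. AEP only documents that both tokens are
# required in production and suggests `openssl rand -hex 32` to create them.
REQUIRED = ("DASHBOARD_TOKEN", "ADMIN_TOKEN")
HEX_64 = re.compile(r"[0-9a-f]{64}")


def new_token() -> str:
    """32 random bytes as 64 lowercase hex chars, like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def missing_or_weak(env) -> list:
    """Names of required tokens that are unset or not 64 hex characters.

    The 64-hex rule is this sketch's own heuristic for catching placeholders
    such as "change-me"; it is not an AEP requirement.
    """
    return [name for name in REQUIRED if not HEX_64.fullmatch(env.get(name, ""))]
```

For example, `missing_or_weak(os.environ)` returns an empty list only when both variables hold 64-hex-character values.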
| Challenge | Solution |
|---|---|
| Multi-agent workflows are hard to debug | Live causation DAG shows exactly which agent called what, when, and why |
| Black-box agent behavior | Structured event logs let you audit decisions and trace reasoning |
| Distributed agent traces are fragmented | Single trace ID ties together all agents, sub-agents, and tool calls |
| Performance issues are invisible | Metrics track latency, throughput, and error rates per agent |
| Compliance auditing is manual | Structured logs with signatures enable automated compliance checks |
📋 Event Protocol
- 12 structured event types: Task (created/completed/updated/failed), Tool (called/result), Memory (read/write), Handoff (started/completed), Error/Policy (raised/blocked)
- JSON Schema validation with AJV
- Distributed tracing via `trace_id` + `session_id` + `parent_session_id`
- HMAC-SHA256 event signing for authenticity
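To make the signing bullet concrete, here is a minimal sketch of HMAC-SHA256 over an event. It assumes a canonical form (sorted keys, compact separators, UTF-8) and a shared secret; which fields AEP actually signs, and how the signature travels with the event, are defined in AUTH.md and the SDKs, not here. `canonical_bytes`, `sign_event`, and `verify_event` are illustrative names.

```python
import hashlib
import hmac
import json


def canonical_bytes(event: dict) -> bytes:
    # Assumed canonicalization: key order must not change the signature.
    return json.dumps(event, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign_event(event: dict, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the canonical event body."""
    return hmac.new(secret, canonical_bytes(event), hashlib.sha256).hexdigest()


def verify_event(event: dict, secret: bytes, signature: str) -> bool:
    # compare_digest runs in constant time, so response timing does not
    # reveal how many leading characters of a forged signature matched.
    return hmac.compare_digest(sign_event(event, secret), signature)


event = {
    "source": "agent://my-agent",
    "type": "task.created",
    "session_id": "ses_001",
    "trace_id": "trc_001",
    "payload": {"task": "summarise document"},
}
secret = b"hypothetical-shared-secret"  # placeholder, not a real AEP setting
signature = sign_event(event, secret)
```

Any change to the payload, or a different secret, makes `verify_event` return `False`.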
🔌 Ingest API
- High-throughput event ingestion with deduplication
- Automatic tenant isolation per API key
- Rate limiting + HMAC verification
- Returns 202 Accepted for async processing
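One way to read "deduplication by event UUID + timestamp" is sketched below: a hypothetical in-memory window keyed on `(event id, event timestamp)` that returns the same `accepted` / `duplicate` / `id` shape as the 202 response. The real server persists events, and its dedup key and retention window may differ; `DedupWindow` and its `ttl_seconds` parameter are inventions for illustration.

```python
import time
from typing import Dict, Optional, Tuple


class DedupWindow:
    """Remember (event_id, event_time) pairs for ttl_seconds after first sight."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._first_seen: Dict[Tuple[str, str], float] = {}

    def _evict(self, now: float) -> None:
        stale = [k for k, t in self._first_seen.items() if now - t > self.ttl]
        for key in stale:
            del self._first_seen[key]

    def accept(self, event_id: str, event_time: str,
               now: Optional[float] = None) -> dict:
        """Record an event and report whether it was already seen."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        key = (event_id, event_time)
        duplicate = key in self._first_seen
        if not duplicate:
            self._first_seen[key] = now
        # In this sketch a retry is still accepted, just flagged as a duplicate.
        return {"accepted": True, "duplicate": duplicate, "id": event_id}


window = DedupWindow(ttl_seconds=60)
first = window.accept("evt_1", "2024-01-01T00:00:00Z", now=0)
retry = window.accept("evt_1", "2024-01-01T00:00:00Z", now=10)
later = window.accept("evt_1", "2024-01-01T00:00:00Z", now=100)  # past the window
```

A bounded window keeps memory flat under high throughput, at the cost of letting a very late retry through as a new event.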
📊 Live Dashboard
- Real-time causation DAG (shows call chains)
- Session timeline with event swim lanes
- Multi-agent workflow tree visualization
- Server-Sent Events (SSE) for instant updates
- Dark mode support
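The dashboard's live updates ride on Server-Sent Events. If you want to consume a similar stream yourself, a minimal parser for the SSE wire format looks like this. The event name `aep` and the JSON payload in the sample are assumptions; the README does not document the dashboard's stream endpoint or event names.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) for each dispatched Server-Sent Event.

    Handles "field: value" lines, ":" comments (keep-alives), multi-line
    data fields, and the blank line that dispatches a buffered event.
    """
    event_type = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        # "id" and "retry" fields are ignored in this sketch.
    # A final event with no terminating blank line is dropped, as the spec requires.


sample = [
    ": keep-alive",
    "event: aep",
    'data: {"id": "evt_1", "type": "task.created"}',
    "",
]
updates = [(name, json.loads(body)) for name, body in parse_sse(sample)]
```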
⚙️ CLI Toolkit
```bash
aep emit --type task.created --source agent://my-agent --session ses_123 --trace trc_456
aep session ses_123 --type task.created --q "search term"
aep export ses_123 --format json|csv --out export.json
npm run export -- --format jsonl|csv|parquet --compression none|gzip|brotli --sink local|s3 --all-tenants
aep workflow trc_456
aep validate events.json
```

📈 Observability
- Prometheus `/metrics` endpoint for monitoring
- Structured JSON logs with Pino
- Health checks (`/health`, `/ready`)
- Rejection logs with rejection reasons
🔐 Security & Isolation
- API key authentication (Bearer token format)
- Multi-tenant isolation (per-tenant scopes)
- Optional HMAC signing for event verification
- Dashboard token protection (optional in dev mode)
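The 401/403 behaviour listed in the response reference can be modelled as below. This is a hypothetical sketch, not the server's code: the in-memory `KEYS` table, the `ApiKey` record, `authorize()`, and the "Missing API key" message are invented, and real keys are managed through `/admin/keys` (see AUTH.md). It shows the order of checks: a missing, unknown, or revoked key is a 401, while a valid key used outside its tenant or scopes is a 403.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ApiKey:
    tenant_id: str
    scopes: FrozenSet[str]
    revoked: bool = False


# Hypothetical key table; real keys are minted via POST /admin/keys.
KEYS: Dict[str, ApiKey] = {
    "key_dev_rw": ApiKey("dev", frozenset({"read", "write"})),
    "key_dev_ro": ApiKey("dev", frozenset({"read"})),
    "key_old": ApiKey("dev", frozenset({"read", "write"}), revoked=True),
}


def authorize(authorization: Optional[str], scope: str,
              resource_tenant: str) -> Tuple[int, str]:
    """Return (status, error) for a request needing `scope` on a tenant's data."""
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return 401, "Missing API key"  # invented message
    key = KEYS.get(authorization[len(prefix):])
    if key is None or key.revoked:
        return 401, "Invalid API key"
    # The tenant comes from the key itself, so touching another tenant is a 403.
    if key.tenant_id != resource_tenant or scope not in key.scopes:
        return 403, "Forbidden"
    return 200, ""
```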
Copy `.env.example` to `.env`. Key variables:

| Variable | Default | Dev | Prod |
|---|---|---|---|
| `PORT` | `8787` | Same | Same (behind TLS reverse proxy) |
| `DATABASE_PATH` | `./data/aep.db` | Local SQLite | Durable storage + backups |
| `DASHBOARD_TOKEN` | (unset) | Open (no auth) | REQUIRED |
| `ADMIN_TOKEN` | (unset) | Disabled | REQUIRED |
| `NODE_ENV` | (unset) | Optional | Set to `production` |
Development mode: Dashboard and read endpoints are open (rapid iteration, NOT for shared networks).
Production mode: All endpoints require auth. Deploy behind a TLS reverse proxy and use strong tokens (`openssl rand -hex 32`).
See AUTH.md for auth setup, SECURITY.md for hardening, and SETUP.md for troubleshooting.
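The dev/production split for the dashboard can be summarised as a small decision function. This is a hypothetical model assembled from the table and notes above: it assumes `NODE_ENV=production` is what selects production mode and that a wrong or missing token gets a 401. Check AUTH.md for the server's actual behaviour.

```python
import hmac
from typing import Mapping, Optional


def dashboard_status(env: Mapping[str, str], bearer: Optional[str]) -> int:
    """Status a dashboard/read request would get under this sketch's rules."""
    token = env.get("DASHBOARD_TOKEN", "")
    production = env.get("NODE_ENV") == "production"
    if not token:
        # Dev mode: open access. Production without a token: refuse (503).
        return 503 if production else 200
    # Once a token is set (dev or prod), callers must present it.
    if bearer is not None and hmac.compare_digest(bearer.encode(), token.encode()):
        return 200
    return 401  # assumed status for a missing or wrong token
```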
```bash
cp .env.example .env
docker compose up -d
```

To build and run the image directly (without Compose):

```bash
docker build -t aep-ingest .
docker run -p 8787:8787 \
  -e ADMIN_TOKEN=change-me \
  -e DASHBOARD_TOKEN=change-me \
  -v aep_data:/data \
  aep-ingest
```

Reference these common response structures when building clients and integrations.
202 Accepted: `POST /events` (async ingest)

```json
{ "accepted": true, "duplicate": false, "id": "evt_01HXYZ..." }
```

200 OK: `GET /sessions`
```json
{ "sessions": [ { "session_id": "ses_01HXYZ...", "created_at": "..." } ], "next_cursor": "..." }
```

200 OK: `GET /sessions/{sessionId}/events`
```json
{ "session_id": "ses_01HXYZ...", "events": [ { "id": "evt_...", "type": "task.created", ... } ] }
```

200 OK: `GET /sessions/{sessionId}/audit-bundle` and `GET /workflows/{traceId}/audit-bundle`

```json
{
  "aep_audit_version": "0.1.0",
  "manifest": { "scope": { "session_id": "ses_01HXYZ..." }, "event_count": 12, "content_digest": "…", "content_digest_alg": "sha256", "exported_at": "..." },
  "events": [ { "id": "evt_...", "type": "task.created", ... } ],
  "signature": { "alg": "hmac-sha256", "value": "…" }
}
```

Returns a tamper-evident, HMAC-signed audit bundle (Phase 14). Verify it offline with `aep audit verify <bundle.json>`. Append `?format=pdf` for a human-readable PDF report (the JSON bundle remains the verifiable artifact), or render one locally with `aep audit render <bundle.json>`. The server must have `AUDIT_SIGNING_SECRET` configured; otherwise these endpoints return 503.
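To see what "tamper-evident" means mechanically, the sketch below builds and checks a bundle with the same top-level shape. It is hypothetical: it assumes `content_digest` is SHA-256 over the canonical JSON of `events` and that the signature is HMAC-SHA256 over the canonical manifest, keyed with the value of `AUDIT_SIGNING_SECRET`. The real format may canonicalize or sign differently, so use `aep audit verify` for actual bundles.

```python
import hashlib
import hmac
import json
from typing import List


def _canon(obj) -> bytes:
    # Assumed canonical JSON: sorted keys, no whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def build_bundle(events: list, scope: dict, secret: bytes, exported_at: str) -> dict:
    manifest = {
        "scope": scope,
        "event_count": len(events),
        "content_digest": hashlib.sha256(_canon(events)).hexdigest(),
        "content_digest_alg": "sha256",
        "exported_at": exported_at,
    }
    value = hmac.new(secret, _canon(manifest), hashlib.sha256).hexdigest()
    return {"aep_audit_version": "0.1.0", "manifest": manifest, "events": events,
            "signature": {"alg": "hmac-sha256", "value": value}}


def verify_bundle(bundle: dict, secret: bytes) -> List[str]:
    """Return the failed checks; an empty list means the bundle is intact."""
    manifest, events = bundle["manifest"], bundle["events"]
    problems = []
    if manifest["event_count"] != len(events):
        problems.append("event_count mismatch")
    # Editing any event changes the digest...
    if hashlib.sha256(_canon(events)).hexdigest() != manifest["content_digest"]:
        problems.append("content_digest mismatch")
    # ...and rewriting the manifest to match breaks the HMAC without the secret.
    expected = hmac.new(secret, _canon(manifest), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["signature"]["value"]):
        problems.append("signature mismatch")
    return problems


bundle = build_bundle([{"id": "evt_1", "type": "task.created"}],
                      {"session_id": "ses_1"}, b"hypothetical-secret",
                      "2024-01-01T00:00:00Z")
```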
400 Bad Request: schema or validation failure

```json
{ "accepted": false, "errors": [ "/ must have required property 'session_id'", "/type must be one of: task.created, ..." ] }
```

401 Unauthorized: authentication failure (missing, invalid, or revoked API key)
```json
{ "error": "Invalid API key" }
```

See AUTH.md for details on key authentication and scoping.
403 Forbidden: insufficient permissions

```json
{ "error": "Forbidden" }
```

Typically indicates a cross-tenant access attempt or insufficient scopes for the requested operation.
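If you call the API over raw HTTP instead of an SDK, a small client needs to attach the Bearer key, map the status codes above to useful messages, and follow `next_cursor`. The sketch below uses only the standard library; `build_emit_request`, `explain`, and `iter_sessions` are hypothetical helpers, and because the README does not name the cursor query parameter, page fetching is left to a caller-supplied function. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx responses; its `.code` and `.read()` carry the status and JSON body.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional


def build_emit_request(base_url: str, api_key: str, event: dict) -> urllib.request.Request:
    """POST /events with a Bearer write key; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"{base_url}/events",
        data=json.dumps(event).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def explain(status: int, body: dict) -> str:
    """Map the documented status codes to a short, actionable message."""
    if status == 202:
        prefix = "duplicate of" if body.get("duplicate") else "accepted"
        return f"{prefix} {body.get('id')}"
    if status == 400:
        return "invalid event: " + "; ".join(body.get("errors", []))
    if status == 401:
        return "authentication failed: " + body.get("error", "")
    if status == 403:
        return "forbidden: wrong tenant or missing scope"
    return f"unexpected status {status}"


def iter_sessions(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[dict]:
    """Yield sessions page by page until `next_cursor` is empty."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("sessions", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break


request = build_emit_request("http://localhost:8787", "key_from_env",
                             {"type": "task.created"})
# Stand-in for real GET /sessions calls, keyed by cursor.
pages = {
    None: {"sessions": [{"session_id": "ses_1"}], "next_cursor": "c1"},
    "c1": {"sessions": [{"session_id": "ses_2"}], "next_cursor": None},
}
session_ids = [s["session_id"] for s in iter_sessions(pages.__getitem__)]
```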
| Resource | Purpose |
|---|---|
| OpenAPI Docs | Interactive API reference (Swagger UI) |
| openapi.json | Machine-readable OpenAPI 3.1 spec |
| sdks/python/README.md | Python SDK reference — install, quick start, API, exceptions |
| sdks/go/README.md | Go SDK reference — install, quick start, API, CLI, examples |
| AUTH.md | API key management, tenant scoping, HMAC signing |
| CONTRIBUTING.md | Development setup, code style, contribution workflow |
| SECURITY.md | Threat model, vulnerability disclosure, production deployment checklist |
| SETUP.md | Installation, configuration, troubleshooting |
| OPERATIONS.md | Operations & deployment: Postgres backend, projects/tiers/quotas, retention/pruning (cron + k8s CronJob), S3/cloud export (Phase 17) |
| CHANGELOG.md | Version history (Phases 1–17) and breaking changes |
| PRD.md | Product vision, roadmap, and success metrics (Phases 12+) |
| CODE_OF_CONDUCT.md | Community standards and expectations |
```text
┌──────────────────────────────────────────┐
│               Your Agents                │
│     JS · Python SDK · CLI · raw HTTP     │
└────────────────┬─────────────────────────┘
                 │ POST /events { type, source, session_id, trace_id, … }
                 ↓
┌─────────────────────────────────┐
│ AEP Ingest Server               │
│ - Validate (JSON Schema)        │
│ - Authenticate (Bearer token)   │
│ - Deduplicate (UUID + time)     │
│ - Sign (HMAC-SHA256)            │
│ - Store (SQLite)                │
└────────┬────────────────────────┘
         │
         ↓ Real-time SSE
┌─────────────────────────────────┐
│ Live Dashboard                  │
│ - Session timeline              │
│ - Causation DAG                 │
│ - Workflow tree                 │
│ - Metrics/rejection logs        │
└─────────────────────────────────┘
```
Key Guarantees:
- ✅ Causation chains: trace_id + parent_session_id preserve call hierarchy
- ✅ Deduplication: event UUID + timestamp prevent double-processing
- ✅ Authenticity: HMAC signatures verify event origin
- ✅ Tenant isolation: API keys scoped to tenants; cross-tenant access rejected
- ✅ Real-time visibility: SSE updates push to dashboard instantly
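The causation guarantee means a workflow can be reassembled from the events alone. As an illustration (not the dashboard's algorithm), this sketch groups one trace's sessions under their `parent_session_id` and prints an indented tree. It assumes each event carries `trace_id`, `session_id`, and optionally `parent_session_id` at the top level, and that a session's first event names its parent.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def session_tree(events: List[dict], trace_id: str) -> Dict[Optional[str], List[str]]:
    """Map each parent session to its child sessions within one trace.

    Root sessions (no parent_session_id) are listed under the key None.
    """
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    seen = set()
    for ev in events:
        if ev.get("trace_id") != trace_id or ev["session_id"] in seen:
            continue
        seen.add(ev["session_id"])
        children[ev.get("parent_session_id")].append(ev["session_id"])
    return dict(children)


def render(children: Dict[Optional[str], List[str]],
           node: Optional[str] = None, depth: int = 0) -> List[str]:
    """Depth-first, two-space-indented listing of the session hierarchy."""
    lines = []
    for sid in children.get(node, []):
        lines.append("  " * depth + sid)
        lines.extend(render(children, sid, depth + 1))
    return lines


events = [
    {"trace_id": "trc_1", "session_id": "ses_orch"},
    {"trace_id": "trc_1", "session_id": "ses_a", "parent_session_id": "ses_orch"},
    {"trace_id": "trc_1", "session_id": "ses_b", "parent_session_id": "ses_orch"},
    {"trace_id": "trc_1", "session_id": "ses_a", "parent_session_id": "ses_orch"},
    {"trace_id": "trc_2", "session_id": "ses_other"},
]
tree = session_tree(events, "trc_1")
lines = render(tree)
```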
JavaScript server (Node.js): 82 tests

```bash
npm test                  # full suite (55 unit + 27 integration)
npm run test:unit         # 55 unit tests (event protocol, validation, CLI)
npm run test:integration  # 27 integration tests (HTTP server flow)
npm run lint              # ESLint checks
```

Python SDK: 118 tests
```bash
cd sdks/python
pip install -e ".[dev]"
pytest tests/unit/          # 107 unit tests (no server needed)
pytest tests/integration/   # 11 integration tests (auto-skip if server is down)
```

Go SDK: 80+ tests
```bash
cd sdks/go
go test ./...   # 69+ unit tests + 11 integration tests (auto-skip if server is down)
```

Test Coverage:
- ✅ Event protocol validation, creation, signing (all 12 event types)
- ✅ JSON Schema validation with payload schema caching + TTL
- ✅ API endpoints (auth, rate limiting, deduplication, exports)
- ✅ Client libraries (sync + async, error handling, timeouts)
- ✅ Multi-tenant isolation (per-API-key scoping)
- ✅ HMAC-SHA256 signing and verification (constant-time)
- ✅ CLI argument parsing and command behavior
- ✅ Dashboard functionality (SSE, filtering, exports)
We welcome contributions! Here's how:
- Fork the repo
- Create a feature branch (`git checkout -b feature/my-feature`)
- Make your changes and write tests
- Lint & test (`npm run lint:fix && npm test`)
- Commit with clear messages
- Push and open a Pull Request
Areas we're looking for help:
- 📱 Mobile dashboard (React Native)
- 📈 Advanced metrics & analytics
- 🌍 Internationalization
- 📚 Docs & tutorials
See CONTRIBUTING.md for detailed guidelines.
- Questions? Open an issue with the `question` label or start a discussion
- Found a bug? Submit an issue with steps to reproduce
- Security issue? See SECURITY.md for responsible disclosure
- Have an idea? Start a discussion or open a feature request
- Community standards? Check out our CODE_OF_CONDUCT.md
MIT License: see LICENSE for details.
- JavaScript/TypeScript: Server + dashboard + CLI + docs
- Python SDK: `sdks/python/` · sync + async clients, validator, HMAC signing, demo
- Go SDK: `sdks/go/` · sync + async clients, validator, HMAC signing, CLI, demo
- Node.js / TypeScript SDK core: `sdks/node/` · `@surpradhan/aep` · event factory, validator, HMAC signing (cross-language parity), `fetch` client, dual ESM+CJS (Phase 12g PR1)
- Kubernetes operator: `operator/` · `AgentInstrumentation` CRD, sidecar-injection webhook, Helm chart
- OTEL (OpenTelemetry) bridge: `sdks/python/aep/otel/` · span-to-event mapper + `AEPSpanExporter`
- OTEL Collector plugin: `otelbridge/` · Collector exporter (span → AEP event) + ocb build config + demo
- LangGraph auto-instrumentation: `aep.instrument()` · zero-code patching for LangGraph workflows
- CrewAI auto-instrumentation: `aep.instrument()` · zero-code event-bus subscription for CrewAI crews (Phase 12c)
- AutoGen auto-instrumentation: `aep.instrument()` · zero-code `run_stream` tap for AutoGen AgentChat teams (Phase 12d)
- OpenAI Agents SDK auto-instrumentation: `aep.instrument()` · zero-code tracing-processor registration for `Runner.run` (Phase 12e)
- Anthropic Claude Agent SDK auto-instrumentation: `aep.instrument()` · zero-code hook injection for `query()` / `ClaudeSDKClient` (Phase 12f)
- Node.js / LangChain.js auto-instrumentation: `instrument()` · zero-code `CompiledStateGraph` callback injection for LangGraph (Phase 12g PR2)
- Vercel AI SDK: docs-only path through the OTEL bridge (`experimental_telemetry` → OTEL Collector → `otelbridge/` AEP exporter; see `docs/integrations/vercel-ai-sdk.md`)
- Advanced filtering & visualization in dashboard (Phase 15)
- Webhook integration for alerts (Phase 16)
- S3/cloud export for long-term storage (Phase 17: JSONL/CSV/Parquet, gzip/brotli, local + S3 sink, export-before-prune)
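JSONL exports hold one JSON object per line, which makes them easy to stream. A standard-library sketch of writing and reading a gzip-compressed JSONL file follows; it illustrates the format only and is not AEP's exporter. Use `npm run export -- --format jsonl --compression gzip` for real exports, whose exact fields and file naming are not described here; the sample events and the `export.jsonl.gz` file name are placeholders.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def write_jsonl_gz(events: Iterable[dict], path: Path) -> int:
    """Write one compact JSON object per line, gzip-compressed; return the count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event, separators=(",", ":")) + "\n")
            count += 1
    return count


def read_jsonl_gz(path: Path) -> Iterator[dict]:
    """Stream events back without loading the whole file into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


sample = [
    {"id": "evt_1", "type": "task.created", "session_id": "ses_1"},
    {"id": "evt_2", "type": "task.created", "session_id": "ses_2"},
]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "export.jsonl.gz"
    written = write_jsonl_gz(sample, out)
    restored = list(read_jsonl_gz(out))
```

Because each line is independent, a reader can process an export of any size in constant memory, which is why JSONL suits long-term archives better than one large JSON array.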
Made with ❤️ for the AI agent community · Star us on GitHub!