The live streaming dev server enables real-time trace evaluation for agent development. Agents stream OpenTelemetry spans via WebSocket as they execute, and results appear instantly in both terminal and browser.
agentevals serve --dev --port 8001
This starts:
- WebSocket server at ws://localhost:8001/ws/traces
- API at http://localhost:8001/api
- SSE endpoint at http://localhost:8001/stream/ui-updates
cd agentevals/ui
npm run dev
Navigate to http://localhost:5173 and select "I am developing an agent".
The recommended approach uses the async enable_streaming context manager:
from agentevals.streaming import enable_streaming
# Run this inside an async function (for example, one started with asyncio.run)
async with enable_streaming(
    ws_url="ws://localhost:8001/ws/traces",
    eval_set_id="my-eval-set",
) as session_id:
    # Your agent code runs normally; all OTel spans stream automatically
    agent = Agent(name="my-agent")
    result = agent.invoke("Do something")
For more control over the streaming lifecycle, use AgentEvalsStreamingProcessor directly. See examples/README.md for instrumentation patterns tailored to different frameworks (LangChain, Strands, Google ADK).
- Terminal: Session status prints when the agent finishes
- Browser: Live span tree builds as the agent executes
- Evaluation: Triggered from the UI after the session completes
Agent (any OTel-instrumented framework)
↓ WebSocket (OTLP/JSON spans + logs)
agentevals dev server
↓ SSE (real-time updates)
Browser UI
Agents emit OTel spans (and optionally logs for GenAI message content). The dev server receives them over WebSocket, incrementally extracts invocations, and pushes updates to the browser via Server-Sent Events.
See examples/README.md for details on supported instrumentation approaches (OTel GenAI semantic conventions, Google ADK, etc.).
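To make the WebSocket leg of the diagram concrete, here is a minimal sketch of how an agent-side client might build the JSON envelopes it sends. The field names follow the message examples documented on this page; the helper function names are hypothetical and are not part of the agentevals API, and the server may accept or require fields not shown here.

```python
import json


def otlp_string_attributes(attrs):
    """Convert a plain {key: str} dict into OTLP/JSON attribute entries."""
    return [{"key": k, "value": {"stringValue": v}} for k, v in attrs.items()]


def session_start_message(session_id, trace_id, eval_set_id, metadata=None):
    """Build the envelope that opens a streaming session (hypothetical helper)."""
    return json.dumps({
        "type": "session_start",
        "session_id": session_id,
        "trace_id": trace_id,
        "eval_set_id": eval_set_id,
        "metadata": metadata or {},
    })


def span_message(session_id, span):
    """Wrap an OTLP/JSON span dict in a 'span' envelope (hypothetical helper)."""
    return json.dumps({"type": "span", "session_id": session_id, "span": span})


def session_end_message(session_id):
    """Build the envelope that closes a streaming session (hypothetical helper)."""
    return json.dumps({"type": "session_end", "session_id": session_id})
```

In practice the enable_streaming context manager or AgentEvalsStreamingProcessor handles this for you; the sketch only illustrates what travels over the socket.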
Native OpenTelemetry format. The CLI auto-detects Jaeger vs OTLP from
file contents, so .json and .jsonl exports from Tempo, Jaeger, or
the OTel collector all work without a --format flag:
# Load any trace file directly; format is auto-detected
agentevals run trace.otlp.json --eval-set eval.json
Pass --format otlp-json (or jaeger-json) only as an override when
auto-detection fails on a non-standard export.
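As an illustration of content-based detection, the sketch below distinguishes the two formats by their top-level keys: OTLP/JSON documents carry resourceSpans, while Jaeger JSON exports wrap traces in a data array whose entries have a traceID. This is an assumed heuristic for explanation only, not the CLI's actual detection logic, and guess_trace_format is a hypothetical name.

```python
import json


def _parse_documents(text):
    """Parse text as a single JSON document, falling back to JSON Lines."""
    text = text.strip()
    try:
        return [json.loads(text)]
    except json.JSONDecodeError:
        docs = []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                docs.append(json.loads(line))
            except json.JSONDecodeError:
                return []  # Neither valid JSON nor valid JSON Lines.
        return docs


def guess_trace_format(text):
    """Return 'otlp-json', 'jaeger-json', or 'unknown' (heuristic sketch)."""
    for doc in _parse_documents(text):
        if not isinstance(doc, dict):
            continue
        if "resourceSpans" in doc:
            return "otlp-json"
        data = doc.get("data")
        if isinstance(data, list) and data and isinstance(data[0], dict) and "traceID" in data[0]:
            return "jaeger-json"
    return "unknown"
```

A result of "unknown" corresponds to the case where passing --format explicitly would be needed.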
The AgentEvalsStreamingProcessor is an OTel SpanProcessor that streams spans over WebSocket as they complete:
from agentevals.streaming.processor import AgentEvalsStreamingProcessor
processor = AgentEvalsStreamingProcessor(
    ws_url="ws://localhost:8001/ws/traces",
    session_id="my-session",
    trace_id="abc123",
)
await processor.connect(eval_set_id="my-eval")
# Register with your TracerProvider
tracer_provider.add_span_processor(processor)
# When done:
await processor.shutdown_async()
For GenAI agents that emit message content as OTel Logs, use AgentEvalsLogStreamingProcessor alongside it; see the langchain_agent example.
When a session ends (session_end message), the server:
- Enriches spans with log-based message content (if any)
- Extracts invocations (user/agent messages, tool calls, model info)
- Broadcasts session_complete to the browser UI via SSE
- Sends session_complete back to the agent via WebSocket
Evaluation is triggered separately from the UI or API.
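The enrichment step in the list above can be pictured as a join between logs and spans on their trace context. The following is a minimal sketch under that assumption, using the OTLP/JSON span and log shapes shown on this page. The function name and the copy-if-missing merge rule are illustrative assumptions, not the behavior of log_enrichment.py.

```python
def enrich_spans_with_logs(spans, logs):
    """Copy log attributes onto the span with the same (traceId, spanId).

    Mutates and returns the span dicts. Attributes already present on a
    span are left untouched (an assumed merge rule for this sketch).
    """
    by_context = {(s.get("traceId"), s.get("spanId")): s for s in spans}
    for log in logs:
        span = by_context.get((log.get("traceId"), log.get("spanId")))
        if span is None:
            continue  # No span shares this log's trace context; skip it.
        attrs = span.setdefault("attributes", [])
        present = {a["key"] for a in attrs}
        for attr in log.get("attributes", []):
            if attr["key"] not in present:
                attrs.append(attr)
                present.add(attr["key"])
    return spans
```

After this pass, downstream invocation extraction only has to read span attributes, regardless of whether the content arrived as a log or on the span itself.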
agentevals supports two mechanisms for receiving GenAI message content (gen_ai.input.messages, gen_ai.output.messages):
Log records (recommended). Instrumentation libraries like opentelemetry-instrumentation-openai-v2 emit message content as OTel log records correlated with spans via trace context. The server merges these back into spans during session completion (see log_enrichment.py).
Span events (legacy, supported for backward compatibility). Some frameworks (notably Strands with OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental) currently emit message content as span event attributes. The AgentEvalsStreamingProcessor promotes these attributes to span-level attributes so downstream converters see a uniform shape. This promotion happens in three places:
- streaming/processor.py for live WebSocket spans
- api/otlp_routes.py for OTLP HTTP reception
- loader/otlp.py for loading OTLP JSON files
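To illustrate what promotion means, here is a minimal sketch that operates on an OTLP/JSON span dict: it scans the span's events and copies the two GenAI message attributes up to the span level. The function name is hypothetical, and the rule that an existing span-level attribute wins over an event attribute is an assumption of this sketch rather than documented behavior.

```python
PROMOTED_KEYS = ("gen_ai.input.messages", "gen_ai.output.messages")


def promote_event_attributes(span):
    """Copy GenAI message attributes from span events to the span itself.

    Mutates and returns the span dict. A key already set at span level is
    not overwritten (assumed precedence for this sketch).
    """
    attrs = span.setdefault("attributes", [])
    present = {a["key"] for a in attrs}
    for event in span.get("events", []):
        for attr in event.get("attributes", []):
            key = attr["key"]
            if key in PROMOTED_KEYS and key not in present:
                attrs.append(attr)
                present.add(key)
    return span
```

Applying the same function at each ingestion point is one way to keep the three code paths consistent.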
The OTel community is deprecating span events in favor of log-based events emitted via the Logs API. As frameworks migrate, the log-based path will become the standard. The span event promotion logic will remain for backward compatibility with older instrumentation versions.
For a full overview of supported OTel conventions and migration guidance, see otel-compatibility.md.
Session start:
{
"type": "session_start",
"session_id": "session-abc123",
"trace_id": "3e289017...",
"eval_set_id": "my-eval",
"metadata": {}
}
Span (OTLP/JSON format):
{
"type": "span",
"session_id": "session-abc123",
"span": {
"traceId": "3e289017...",
"spanId": "1f9762ca...",
"name": "chat gpt-4o",
"startTimeUnixNano": "1771237534577907000",
"endTimeUnixNano": "1771237535012345000",
"attributes": [
{"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o"}}
]
}
}
Log (GenAI message content):
{
"type": "log",
"session_id": "session-abc123",
"log": {
"traceId": "3e289017...",
"spanId": "1f9762ca...",
"body": {"stringValue": "gen_ai.user.message"},
"attributes": [
{"key": "gen_ai.user.message", "value": {"stringValue": "{\"role\": \"user\", \"content\": \"Hello\"}"}}
]
}
}
Session end:
{
"type": "session_end",
"session_id": "session-abc123"
}
Session complete (with extracted invocations):
{
"type": "session_complete",
"invocations": [
{
"invocationId": "inv-1",
"userText": "Roll a 20-sided die",
"agentText": "I rolled a 20-sided die and got 13",
"toolCalls": [{"name": "roll_die", "args": {"sides": 20}}],
"modelInfo": {"model": "gpt-4o"}
}
]
}
Error (limit exceeded):
{
"type": "error",
"message": "Session has reached maximum span limit (10000)"
}
Real-time events for the browser UI:
const eventSource = new EventSource('http://localhost:8001/stream/ui-updates');
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
switch (data.type) {
case 'session_started':
// New session with metadata
break;
case 'span_received':
// New span — build trace tree incrementally
break;
case 'session_complete':
// Session ended — invocations extracted
break;
}
};
- Sessions are stored in-memory only (no database)
- Completed sessions expire after 2 hours (configurable)
- Maximum 100 sessions kept at a time
- Per-session limits: 10,000 spans and 5,000 logs
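The limits above can be modelled with a small in-memory store. The sketch below is an assumption-laden illustration, not the server's StreamingTraceManager: the class and method names are hypothetical, and evicting the oldest session when the cap is reached is an assumed policy that the text does not specify.

```python
import time
from collections import OrderedDict


class SessionStore:
    """In-memory session store with a TTL, a session cap, and a span cap."""

    def __init__(self, max_sessions=100, max_spans=10_000,
                 ttl_seconds=2 * 60 * 60, clock=time.monotonic):
        self._sessions = OrderedDict()  # session_id -> session dict, oldest first
        self._max_sessions = max_sessions
        self._max_spans = max_spans
        self._ttl = ttl_seconds
        self._clock = clock

    def add_span(self, session_id, span):
        session = self._get_or_create(session_id)
        if len(session["spans"]) >= self._max_spans:
            raise ValueError(
                f"Session has reached maximum span limit ({self._max_spans})"
            )
        session["spans"].append(span)

    def complete(self, session_id):
        """Mark a session complete; its expiry countdown starts now."""
        self._sessions[session_id]["completed_at"] = self._clock()

    def prune(self):
        """Drop completed sessions older than the TTL."""
        now = self._clock()
        expired = [
            sid for sid, s in self._sessions.items()
            if s["completed_at"] is not None and now - s["completed_at"] > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]

    def session_ids(self):
        return list(self._sessions)

    def _get_or_create(self, session_id):
        if session_id not in self._sessions:
            self.prune()
            # Assumed policy: evict the oldest session once the cap is hit.
            while len(self._sessions) >= self._max_sessions:
                self._sessions.popitem(last=False)
            self._sessions[session_id] = {"spans": [], "completed_at": None}
        return self._sessions[session_id]
```

The 5,000-log limit would follow the same pattern as add_span and is omitted to keep the sketch short.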
agentevals serve --dev \
--port 8001 \
--host 0.0.0.0 \
--eval-sets ./eval_sets/ \
--headless \
  -v

| Option | Default | Description |
|---|---|---|
| --dev | off | Enable WebSocket and live streaming |
| --port, -p | 8001 | Server port |
| --host | 0.0.0.0 | Host to bind |
| --eval-sets | none | Directory with eval set JSON files to pre-load |
| --headless | off | Run without browser (WebSocket only) |
| -v, --verbose | off | Increase verbosity (-v for INFO, -vv for DEBUG) |

# Terminal 1: Start dev server
agentevals serve --dev --port 8001
# Terminal 2: Start UI
cd ui && npm run dev
# Terminal 3: Run an example agent
python examples/dice_agent/main.py # Google ADK
python examples/langchain_agent/main.py # LangChain + GenAI semconv
python examples/strands_agent/main.py # Strands + GenAI semconv
Result:
- Agent executes normally
- Spans (and logs) stream to server in real-time
- UI shows live trace tree building with incremental invocation extraction
- Evaluation can be triggered from the UI after session completes
- Compare multiple sessions side-by-side
See examples/README.md for instrumentation setup for each framework.
Streaming support requires the streaming extras:
pip install "agentevals[streaming]"
This installs opentelemetry-sdk>=1.20.0. Agent code also needs websockets for the WebSocket connection.
- src/agentevals/streaming/processor.py: OTel SpanProcessor + LogProcessor for WebSocket streaming
- src/agentevals/streaming/ws_server.py: WebSocket handler + session management (StreamingTraceManager)
- src/agentevals/streaming/session.py: Session tracking (TraceSession)
- src/agentevals/streaming/incremental_processor.py: Incremental invocation extraction from spans/logs
- src/agentevals/streaming/__init__.py: enable_streaming() convenience function
- src/agentevals/loader/otlp.py: OTLP/JSON trace loader
- src/agentevals/utils/log_enrichment.py: Merges GenAI log content back into spans

- ui/src/components/streaming/LiveStreamingView.tsx: Live sessions UI
- ui/src/components/streaming/SessionCard.tsx: Individual session display
- ui/src/components/streaming/LiveConversationPanel.tsx: Real-time conversation view
All existing workflows continue to work:
- Trace files (Jaeger or OTLP, including Tempo exports) auto-detect by content: agentevals run trace.json --eval-set ...
- Pass --format only to override detection on non-standard exports.
- Web UI upload flow unchanged.