TokenScope

Agent observability from the network wire. A passive analyzer that watches LLM traffic on the wire and reconstructs what your agents are actually doing — tool calls, multi-step plans, where time is spent, where loops happen, who calls whom — without an SDK, sidecar, or proxy in the request path.

What it does

Most agent code looks fine on paper and falls apart in production: a tool call stalls, the planner loops between two states, a downstream service silently substitutes a different model. TokenScope reconstructs that behavior from the bytes on the wire — packet capture → HTTP / SSE parse → wire-API decode → semantic extraction → agent-turn assembly — and serves the result through a console that's organized around turns and sessions, not raw HTTP calls.

It reads post-TLS traffic — on the inference host, behind a TLS terminator, or fed in from a SPAN/TAP point via cloud-probe. Multi-call agent interactions (planner → tool → planner → tool …) stitch into a single addressable turn. Multi-leg proxy hops (litellm in front of vLLM/SGLang/haproxy) fold automatically. The pipeline never sits in the request path, so the observer can fail without breaking the calls being observed.

NIC / .pcap file / cloud-probe (ZMQ)
        │
        ▼
   capture → flow dispatcher (hash by 5-tuple)
        │
        ▼
   N parallel workers: HTTP/SSE parse → wire-API detection → semantic extraction
        │
        ▼
   turn tracker  +  metrics aggregator  +  storage sink
        │
        ▼
       DuckDB ─── REST API ─── React console (localhost:3000)

Same connection's packets always land on the same worker, so parsing state is local and lock-free. Multiple independent pipelines can run side-by-side — e.g., low-latency local capture isolated from bursty cloud-probe ingress.

Why not an SDK / proxy / OpenTelemetry?

Approach	In request path	Needs client cooperation	Sees full bodies	Reconstructs agent turns
SDK instrumentation	yes	every client must	yes	every client must emit
Reverse proxy (LiteLLM …)	yes	clients point at it	yes	per-call only
OpenTelemetry from server	yes	server must emit	partial	if the server tags it
TokenScope	no	no	yes¹	yes

¹ TLS-terminated traffic only — TokenScope sees plaintext HTTP. Install it where the traffic is already decrypted: on the inference host, behind the TLS terminator, or fed by cloud-probe from a SPAN/TAP point.

The trade-off is honest: you give up cross-cluster client tracing, you get a single passive evidence chain that can't break the call when the observer fails, that requires zero cooperation from the workloads being observed, and that assembles the agent narrative for you instead of leaving you to join calls into turns in your data warehouse.

What's in the box

Ingress

libpcap on a live interface
Replay from .pcap files (any speed)
ZMQ from cloud-probe for hosts you can't install on directly

Agent-turn reconstruction with named profiles for Claude CLI (Claude Code) and OpenAI Codex CLI, a generic profile for everything else, plus an experimental OpenClaw profile. Turns stitch multi-call agent interactions (tool call → tool result → planner → next tool, repeat) into a single addressable unit. The hero screenshot above is one such turn — 247 calls, ordered on the Timeline, drillable into the request/response of any single call.

Service topology — see the agent's call graph, not just the calls. The Services page's Path view shows your inference fleet as a directed graph: clients → litellm proxies → vLLM / SGLang backends, with edge thickness scaled by turn count. Proxy hops paired by the passive sweeper render as solid edges; heuristically-inferred hops (when the inbound client_ip matches a known service) render as dashed; anonymous client traffic is dotted. The classifier names what each endpoint actually serves — vLLM, SGLang, Ollama, llama.cpp, LiteLLM — from the bytes on the wire, not from configuration the operator told it.

Wire-API decoders

OpenAI Chat Completions (/v1/chat/completions)
OpenAI Responses (/v1/responses)
Anthropic Messages (/v1/messages)
Gemini AI Studio (generativelanguage.googleapis.com)

This covers OpenAI direct, Azure OpenAI, Anthropic direct, AWS Bedrock / GCP Vertex (Anthropic wire), Google Gemini, and any OpenAI-compatible local server — vLLM, SGLang, Ollama, llama.cpp's server, LM Studio, etc.

Per-call drill-down when you need it — every LLM call is also captured with structured request/response and the raw body. Stalled tool calls, malformed prompts, unexpected token counts: the evidence is on the page, not behind a re-run.

Metrics are framed first at the agent layer — turn count and duration distribution per agent kind, call count per turn, tool-call success rate — and then at the call layer: TTFT · E2E latency · TPOT · token throughput · call rate · active calls · call error rate · prompt-cache hit ratio. The Overview page is built around both. See glossary for what each means and why.

Storage in DuckDB (default, embedded, single-file) with per-table retention enabled out of the box. Pluggable backend trait — PostgreSQL and ClickHouse are designed but not yet wired.

Console at http://localhost:3000: overview · performance · usage · errors · services (table / path / model views) · agent turns · agent sessions · LLM calls (with full request/response body drill-down) · raw HTTP exchanges · pipeline-health debug views.

More console screenshots

Distribution: prebuilt static binaries for Linux musl (x86_64 + aarch64) and macOS (Intel + Apple Silicon). Web console is embedded in the binary — single artifact, no separate frontend deploy.

Who it's for

Agent developers — debug stalled tool calls, detect plan-loop / "no submit" failure modes, and see exactly which model+endpoint each turn hit, without modifying the agent or its SDK
AI platform / inference ops — see the real service-to-service topology your traffic flows through (clients → litellm → vLLM / SGLang), measure each hop independently, and catch silent model substitutions
FinOps & engineering managers — attribute spend across teams/repos/projects from real turns, not periodic SDK exports that can drift
Compliance & security — capture-once evidence chain of what crossed the wire, scoped per agent kind and per session

Quickstart

# Install (Linux/macOS, no sudo, user-local)
curl -fsSL https://raw.githubusercontent.com/Netis/TokenScope/main/install.sh \
  | INSTALL_DIR="$HOME/.local" sh

# Linux: grant capture privileges to the binary (no sudo at runtime)
sudo setcap cap_net_raw,cap_net_admin=eip ~/.local/bin/tokenscope

# Capture from a live interface
tokenscope -i eth0

# ...or replay a pcap (no privileges needed)
tokenscope --pcap-file capture.pcap --no-retention

Then open http://localhost:3000.

After a pcap finishes replaying, the process keeps the API/console available so you can browse the results — press Ctrl+C to exit, or pass --exit-after-drain for batch/CI use that exits as soon as the pipeline drains.

TokenScope sees plaintext HTTP. Install it where the traffic is already decrypted, such as on the inference host, behind a TLS terminator, or fed from a trusted packet source.

For systemd deployment, capability options, and uninstall, see docs/install.md.

Documentation

Install — one-line installer, systemd, capabilities
Configure — pipelines, sources, storage, retention
Glossary — what every metric means
Architecture — pipeline design and trade-offs
Mission — long-arc vision

Roadmap

The current surface is the foundation layer (Ops use cases). On the way:

Storage — PostgreSQL and ClickHouse backends (schemas already designed)
Wire APIs — more provider-specific extensions (Bedrock variants, Vertex non-Anthropic, etc.)

See docs/mission.md for the full ladder.

Contributing

Bug reports and PRs welcome. Before opening a PR, run:

just build all       # single binary with embedded console
just quality all     # rust fmt + clippy + ts lint + tsc
just test all        # cargo test (all crates)

Run just help for the full menu. Design docs under docs/design/ describe the per-module contract — read the relevant one before changing anything load-bearing.

License

Apache 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 225 Commits
.claude		.claude
.github/workflows		.github/workflows
console		console
docs		docs
scripts		scripts
server		server
testdata/pcaps		testdata/pcaps
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
VERSION		VERSION
install.sh		install.sh
justfile		justfile
project.yaml		project.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TokenScope

What it does

Why not an SDK / proxy / OpenTelemetry?

What's in the box

Who it's for

Quickstart

Documentation

Roadmap

Contributing

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TokenScope

What it does

Why not an SDK / proxy / OpenTelemetry?

What's in the box

Who it's for

Quickstart

Documentation

Roadmap

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages