Status: draft, awaiting community input. This is a research-backed shape, not a design we have committed to ship. The "Open decisions" at the bottom are the most useful place to weigh in — those answers materially change the implementation. Comments on the rest are very welcome too.
ReverbCode currently captures zero user analytics. We want to fix that, but want to do it in a way that's defensible from a privacy standpoint and consistent with the rest of the architecture (loopback-only daemon, durable facts + derived reads, ports/adapters, CDC via DB triggers). This post lays out a shape based on how four similar tools handle the same problem.
Why this exists
We want to understand:
Where the user is going — which features get reached, which paths get abandoned, what the typical lifecycle of a session looks like.
Where the user is getting stuck — drop-off in onboarding, failures that leave the user without a path forward, repeated attempts on the same action.
What crashed and when — daemon panics, adapter failures, frontend exceptions, with enough context to triage without a back-and-forth.
What a user does — high-fidelity stream of user actions (CLI invocations, UI clicks, spawns, sends, kills) without capturing the content of their work.
The ability to replay the recorded actions back, both for debugging and for reasoning about regressions.
Today the backend has zero user-analytics. It does have plenty of system observation: backend/internal/observe/ is the SCM/tracker poll loop, the change_log table already records every durable domain mutation via DB triggers, and the CDC poller broadcasts those to in-process subscribers. None of that surfaces user behavior.
Reference designs
Four similar tools were read end-to-end before writing this. The relevant mechanisms are summarised below; cited URLs at the bottom go to the exact sections.
Codex. [analytics] (anonymous, opt-out) + [otel] (rich, opt-in, log_user_prompt = false default). Dot-namespaced events (codex.api_request, codex.tool.call). Counter + histogram pairs per event. Project-local config cannot override telemetry keys.
Conductor. PostHog. Captures workspace created, model selected, message sent (metadata only), provider errors with message, unexpected errors with message + stack. Explicit "no session recordings". enterprise_data_privacy toggle in .conductor/settings.toml.
Borrowed: concrete event taxonomy; the no-session-recordings stance; repo-level enterprise toggle pattern.
share setting: "manual"/"auto"/"disabled" controls cloud-sync of conversations. MDM-deployable managed configs on macOS (.mobileconfig via Jamf/Kandji/FleetDM) sit at top priority.
Borrowed: tri-state user-facing setting; managed-config layer for enterprise enforcement (even before we have Mac signed builds).
Superset (superset/utils/log.py). AbstractEventLogger interface; EVENT_LOGGER config swaps backend at boot (DBEventLogger, StdOutEventLogger, statsd). Decorator-based instrumentation. Curated payload allowlist: only allowlisted keys reach the sink.
Borrowed: the sink interface, the allowlist-by-construction, the decorator pattern.
What we already have to build on
change_log (DB-triggered CDC, in backend/internal/storage/sqlite/migrations): every durable domain mutation is already captured chronologically with a monotonic sequence. This is half of "replay" already, for free.
backend/internal/cdc poller + in-process broadcaster: a tested fan-out primitive we can reuse to deliver telemetry events to multiple sinks.
Structured slog everywhere, with request IDs threaded through the chi router (in backend/internal/httpd/api.go).
Loopback-only HTTP API with an OpenAPI spec, regenerated from the apispec package — natural place to expose read endpoints over telemetry without re-inventing surface.
Single-writer SQLite pool: any telemetry persistence must respect this so triggers and reads stay consistent.
CLI is a thin client: the CLI cannot speak directly to a telemetry sink — it has to go through daemon HTTP, same as everything else.
Architectural constraints we must not violate
These are restated from AGENTS.md and docs/architecture.md because each rules out a "convenient" design that we'd otherwise reach for:
Daemon is loopback-only. Telemetry export to a remote collector must originate from the daemon, not be reachable from outside the daemon. We cannot bind an exporter to anything beyond 127.0.0.1.
CLI is a thin client. No direct sink writes from the CLI. CLI emits telemetry by calling a daemon endpoint, or via slog that the daemon picks up.
Don't store derived/display status. Same rule applies to events: persist durable facts, derive aggregates at read time. No daily-counter tables maintained by application code; that's a recipe for drift.
CDC events come from DB triggers into change_log. Do not bypass the trigger mechanism to write a parallel telemetry stream from store methods. New telemetry tables get their own triggers or a separate insert path that doesn't touch domain tables.
context.Context first for I/O. Sinks must be context-cancellable so shutdown is bounded.
No hand-edited sqlite/gen/*. Any new tables go through migrations/ + queries/ + npm run sqlc.
Proposal
Mental model: a fourth lane
The current model is OBSERVE → UPDATE → DERIVE / ACT. Telemetry adds a fourth lane.
The lane is parallel to "ACT" and reads from the same sources (lifecycle/PR managers, CLI command runners, HTTP handlers). It never writes back to the domain.
One sink interface, several backends
Add a single port in backend/internal/ports/telemetry.go:
// EventSink consumes structured telemetry events. Implementations must be
// non-blocking from the caller's perspective: a slow or failing sink must
// never stall a user action.
type EventSink interface {
	Emit(ctx context.Context, ev Event) // best-effort, non-blocking
	Close(ctx context.Context) error    // drain on shutdown
}
Implementations live under backend/internal/adapters/telemetry/:
noop — default. Discards. Zero cost.
localsqlite — appends to a new telemetry_event table behind the single-writer pool. Bounded retention (rolling N days, hard cap by row count). Read-only HTTP surface for the CLI and a future debug dashboard.
otlp — OTLP/HTTP exporter, batched and async. Modeled on Codex's [otel] shape. Mapped fields: events become OTel logs, durations become histograms.
posthog — optional, only if we decide to take the Conductor route. Mapped 1:1 from Event → PostHog capture(). Strict allowlist; no PII.
fanout — composes multiple sinks; used by the daemon wiring to fan to both localsqlite (always, when telemetry is enabled at all) and the user's chosen remote sink.
This is the Superset/Codex pattern: behaviour and policy in the wiring layer, not in the call sites. Call sites only see EventSink.
Two-tier user control (Codex pattern, adapted)
Codex split telemetry into two settings because the privacy posture is fundamentally different between anonymous counters and rich event traces. We should do the same. The defaults below are the privacy-first reading; the opposite is a viable position — see "Open decisions".
| Tier | Default | What it includes | Where it goes |
| --- | --- | --- | --- |
| metrics | off | anonymous counters + durations only, no per-user fields | localsqlite only unless remote is also enabled |
| events | off | full event records including user-action shape (still no content) | localsqlite only unless remote is also enabled |
| remote | off | upload metrics and/or events to a configured exporter | exporter URL must be explicitly set |
Configuration lives in the existing env-only config layer:
AO_TELEMETRY_METRICS=off|on # default off
AO_TELEMETRY_EVENTS=off|on # default off
AO_TELEMETRY_REMOTE=off|otlp|posthog
AO_TELEMETRY_OTLP_ENDPOINT=https://otel.example.com/v1/logs
AO_TELEMETRY_OTLP_HEADERS_JSON={"x-otlp-api-key":"…"}
AO_TELEMETRY_REDACT_BRANCH_NAMES=true # default true; project/branch names are sensitive
Per existing convention (no AO_HOST, etc.) these are env vars on the daemon, not flags. They can be inspected via ao doctor so the user can see what is actually configured.
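As a rough illustration of how the daemon might validate these variables at boot, the sketch below parses them into a struct and rejects a remote setting without an endpoint. TelemetryConfig, parseOnOff, and loadTelemetryConfig are hypothetical names, and accepting "true"/"false" via strconv.ParseBool for the redaction flag is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// TelemetryConfig is a hypothetical in-memory form of the AO_TELEMETRY_* variables.
type TelemetryConfig struct {
	Metrics           bool
	Events            bool
	Remote            string // "off", "otlp", or "posthog"
	OTLPEndpoint      string
	RedactBranchNames bool
}

// parseOnOff accepts "on" or "off"; an unset variable means off.
func parseOnOff(name, v string) (bool, error) {
	switch v {
	case "", "off":
		return false, nil
	case "on":
		return true, nil
	}
	return false, fmt.Errorf("%s: want on or off, got %q", name, v)
}

// loadTelemetryConfig reads the variables through getenv (os.Getenv in a
// real daemon) and applies the privacy-first defaults from the table above.
func loadTelemetryConfig(getenv func(string) string) (TelemetryConfig, error) {
	cfg := TelemetryConfig{Remote: "off", RedactBranchNames: true}
	var err error
	if cfg.Metrics, err = parseOnOff("AO_TELEMETRY_METRICS", getenv("AO_TELEMETRY_METRICS")); err != nil {
		return cfg, err
	}
	if cfg.Events, err = parseOnOff("AO_TELEMETRY_EVENTS", getenv("AO_TELEMETRY_EVENTS")); err != nil {
		return cfg, err
	}
	if v := getenv("AO_TELEMETRY_REDACT_BRANCH_NAMES"); v != "" {
		if cfg.RedactBranchNames, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("AO_TELEMETRY_REDACT_BRANCH_NAMES: %w", err)
		}
	}
	cfg.OTLPEndpoint = getenv("AO_TELEMETRY_OTLP_ENDPOINT")
	switch v := getenv("AO_TELEMETRY_REMOTE"); v {
	case "", "off":
		// remote export stays disabled
	case "otlp":
		if cfg.OTLPEndpoint == "" {
			return cfg, fmt.Errorf("AO_TELEMETRY_REMOTE=otlp requires AO_TELEMETRY_OTLP_ENDPOINT")
		}
		cfg.Remote = v
	case "posthog":
		cfg.Remote = v
	default:
		return cfg, fmt.Errorf("AO_TELEMETRY_REMOTE: unknown value %q", v)
	}
	return cfg, nil
}

func main() {
	env := map[string]string{"AO_TELEMETRY_METRICS": "on"}
	cfg, err := loadTelemetryConfig(func(k string) string { return env[k] })
	fmt.Println(cfg, err)
}
```

Passing getenv as a function keeps the parser testable without touching the process environment.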
Curated payload allowlist (Superset pattern)
Every event is a typed Go struct, not a map[string]any. The payload schema is the surface area we audit. A new event = a new struct + a new entry in the event-name constants. Free-form extra fields are not permitted at the call site:
type ProjectAddedEvent struct {
	ProjectIDHash string // sha256(project_id), not the id itself
	HasGitRemote  bool
	DurationMs    int64
}
The hashing is the same trick PostHog uses for "distinct_id": a stable opaque identifier that the daemon computes locally, so the raw ID never has to leave the machine. The backend stores the raw value alongside the hash so it can join for local debugging, but only the hash leaves the daemon.
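A minimal sketch of the hashing step, assuming a plain hex-encoded SHA-256 of the ID string. hashID is a hypothetical helper name and the sample project ID is made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ProjectAddedEvent repeats the typed payload shown above.
type ProjectAddedEvent struct {
	ProjectIDHash string
	HasGitRemote  bool
	DurationMs    int64
}

// hashID (hypothetical helper) returns the hex-encoded SHA-256 of id.
func hashID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

func main() {
	ev := ProjectAddedEvent{
		ProjectIDHash: hashID("proj_123"), // made-up project ID
		HasGitRemote:  true,
		DurationMs:    42,
	}
	fmt.Println(len(ev.ProjectIDHash)) // hex SHA-256 is always 64 characters
}
```

If IDs turn out to be short or guessable, an unsalted hash can be reversed by brute force; a per-install salt would be one mitigation, though the proposal does not specify one.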
Trust boundary for telemetry config
Following Codex: telemetry settings (AO_TELEMETRY_* and any future managed-config equivalent) are user-scope only. A project's checked-in .ao/settings file (if/when we add one) cannot turn on remote export or change the endpoint. This prevents a hostile repo from leaking events from anyone who clones it.
Crash bundles instead of always-on crash reporting
Conductor auto-uploads crash logs to PostHog with stack traces. That requires us to ship a stable identity, an upload endpoint, and a retention policy on day one. We can defer all of that and still solve the "what crashed and when" question with a CLI command:
ao bug-report # default: last 24h of events + config snapshot + redacted
ao bug-report --since=7d --include-prompts=false
This produces a single .zip in the cwd. The contents:
All events from telemetry_event in the window
change_log rows in the window (already durable, already redacted of content)
A redacted snapshot of running.json and the daemon version
The last N lines of the daemon's slog output
A manifest listing exactly what's included so the user can inspect before attaching to an issue
The user attaches the zip to a GitHub issue. We never auto-upload anything without an explicit opt-in.
Replay means event playback, not screen recording
Conductor is explicit: "we don't capture or store any session recordings." That is the right line for us too. The replay capability is event playback, not terminal/UI capture, for these reasons:
Terminal capture inevitably contains agent output, file diffs, prompts, and source code. Sending that anywhere — even to ourselves — is a hard problem we should not take on yet.
change_log + the new telemetry_event table together already give us a chronological, durable, replayable history of what the user did and what the system observed.
Replay against a fresh DB in a test harness is straightforward when the events are durable facts; impossible when they are pixel buffers.
The replay tool is a separate, small thing:
ao replay <bug-report.zip> # spins up an isolated daemon against a temp DB
# and feeds the recorded events through the same
# ingest paths.
This is achievable because everything in the backend already flows through the ports/adapters boundary, so injecting fakes for the runtime/workspace/agent adapters is the existing test pattern.
Event taxonomy (mapped to the five questions)
The names below are the initial set. Each has a typed struct; each is a distinct line item we can debate. All event names are dot-namespaced under ao.<domain>.<verb>.
"Where the user is going" (navigation + funnel)
| Event | Trigger |
| --- | --- |
| ao.daemon.started | daemon boot |
| ao.cli.invoked | every CLI command runs (name only, never argv content) |
| ao.onboarding.first_project_added | first time a project row is created on this install |
| ao.onboarding.first_session_spawned | first ever session spawn |
| ao.onboarding.first_pr_observed | first PR row written by the PR manager |
| ao.onboarding.first_merge | first session that observes a merged PR |
These are exactly the lifecycle waypoints the docs already call out. Aggregated, they answer "how far do new users get."
"Where the user is getting stuck"
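| Event | Trigger |
| --- | --- |
| ao.cli.exit_2 | usage error path (we already exit 2 for these) |
| ao.cli.repeated_failure | same command fails ≥3× within 5min |
| ao.daemon.error_envelope | every API error response (status, code, request_id; no body) |
| ao.spawn.failed | |
| ao.adapter.unavailable | |
| ao.lifecycle.session_terminated_unexpected | |
Pattern matches Conductor's "provider returned an error" + "unexpected error" shape.
"What crashed and when"
| Event | Trigger |
| --- | --- |
| ao.daemon.panic | … Recoverer middleware or in any tracked goroutine |
| ao.daemon.shutdown_unclean | |
| ao.adapter.panic | |
Stack traces are included only when events tier is on, and only for daemon code (never user/agent code).
"What a user does"
The CLI verbs are the natural unit. One event per verb. The payload is a typed struct with allowlisted fields.
| Event | Payload |
| --- | --- |
| ao.project.added | project_id_hash, has_git_remote, duration_ms |
| ao.session.spawned | session_id_hash, agent_kind, runtime_kind, from_pr_branch (bool) |
| ao.session.killed | session_id_hash, reason ∈ {user, reaper, merged} |
| ao.session.restored | session_id_hash |
| ao.send | session_id_hash, body_len_chars (length only, never text) |
| ao.terminal.opened | session_id_hash |
| ao.doctor.run | failing_checks_count, os, arch, daemon_version |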
"Replay it back"
Covered by ao bug-report + ao replay above. No additional events.
What we will not capture
Stating these explicitly so they don't quietly creep in via PR review:
Prompts, agent output, terminal scrollback. Length and counts only.
File paths, diff contents. A session's identifying fact off-machine is the hash, never the working-tree path or branch name.
Project / branch / PR titles. All redacted by default (AO_TELEMETRY_REDACT_BRANCH_NAMES=true). An enterprise user who wants names for self-hosted dashboards can opt back in for their own collector — but not the default.
Anything that travels between the user and their AI provider. Same line Conductor draws: we are not a network proxy for that traffic and we don't observe it.
IP addresses or hostnames at the application layer. PostHog/OTLP will see the source IP of the daemon's outbound HTTP request; that's unavoidable and must be documented.
Phasing (each step is a separately reviewable PR)
Plumbing only, default off. Add ports.EventSink, the noop and localsqlite adapters, the telemetry_event table + sqlc queries, the [telemetry] env config, and the new fourth lane wired into the daemon composition root. Instrument exactly two paths as a smoke test: daemon start/stop and ao.cli.invoked. No remote sinks yet. No CLI surface yet. This is the smallest "real but not load-bearing" PR.
Bug-report bundle. Implement ao bug-report over the daemon HTTP surface (new read endpoint that streams a zip). No upload — just the download. This is immediately useful for our own support workflow even if no events are wired beyond the smoke set.
Full event taxonomy + funnel events. Wire every event listed above through the existing services (session_manager, lifecycle, pr, doctor) at the points where the durable fact is already being written. Add tests that assert the event fires exactly once per fact (mirrors the change_log test style).
Remote sinks behind explicit opt-in. Add the otlp adapter, gated by AO_TELEMETRY_REMOTE=otlp + a non-empty endpoint. Optionally add posthog if the answer to Open Decision #2 below is PostHog.
Replay command. ao replay <bug-report.zip> runs against an isolated daemon instance with fake adapters. Useful for our own regression work; ship it later, no rush.
Open decisions (these are where input is most useful)
Default state. Should metrics default to on (Codex / Conductor — more data, harder enterprise sell) or off (privacy-first — slower product feedback loop)? Current lean is off until we have a published privacy notice; Codex's hybrid (metrics on, events off) is a reasonable middle ground if we can stand up a notice quickly.
Remote sink: OTLP vs PostHog. OTLP is vendor-neutral and matches the self-hosted-friendly posture of the project, but we get nothing for free — we have to stand up a collector and a dashboard. PostHog is turnkey and is what Conductor uses, but it's a vendor relationship with an attached privacy policy we'd have to publish. We can support both behind the same sink interface; the question is which one we wire as "blessed."
Scope: backend-only or renderer too? The frontend is still a placeholder. Backend-only is the lowest-risk first slice. Adding a renderer-side analytics.ts later is independent and can reuse the same event names over an existing daemon route.
Replay scope. Confirm we are explicitly choosing event playback only and not full terminal/UI capture. Conductor went the same way and called it out as a feature, not a limitation. Current lean: same.
Crash auto-upload. Bug-report bundles cover the "user files an issue" case. Do we additionally want the daemon to auto-upload daemon.panic events when remote is enabled? Codex does (under [otel]); Conductor does (under PostHog). Worth a separate decision because the answer changes whether we can ever drop the manual bug-report path.
References
Superset AbstractEventLogger: https://github.com/apache/superset/blob/master/superset/utils/log.py