feat: stable W3C-compatible correlation IDs in event envelope#46

Merged
scotthavird merged 5 commits into main from claude/add-correlation-ids-2IJol on May 10, 2026

Conversation

@scotthavird
Contributor

Summary

Adds W3C Trace Context-compatible trace_id / span_id / parent_span_id to every envelope the CLI emits, so server-side OTLP export can map directly without inventing IDs at export time.

This is purely additive — envelope_version bumped from 1.0 to 1.1, and the new correlation block is optional. Older servers ignore it.

Implements the PRD for v0.4.0.

What ships

  • internal/correlation: NewTraceID() / NewSpanID() using crypto/rand + hex.EncodeToString (no UUIDs, since UUID version bits violate W3C); rejects the all-zero IDs per spec.
  • Per-session persistence under ~/.config/promptconduit/traces/<session_id>.json so trace IDs survive across separate hook process invocations. Atomic writes via temp-file + rename. O_CREATE|O_EXCL resolves concurrent-hook races to a single trace ID. Probabilistic GC (~1%/fire) removes records older than 30 days.
  • Parent-span chains for Claude Code events:
    • PreToolUse → PostToolUse / PostToolUseFailure (keyed by tool_use_id)
    • SubagentStart → SubagentStop (keyed by subagent_id)
    • TaskCreated → TaskCompleted (keyed by task_id)
    • Elicitation → ElicitationResult (keyed by elicitation_id)
    • PreCompact → PostCompact (session-keyed)
    • UserPromptSubmit → Stop / StopFailure (agent response parent is the originating prompt)
    • SessionStart → SessionEnd
  • promptconduit debug trace <session_id> — prints the local trace tree (trace_id, recorded parent spans) for support and self-debugging.
  • Debug log line on each hook fire: correlation: trace=… span=… parent=….
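
For reviewers who want to see the ID generation concretely, here is a minimal sketch of the approach the first bullet describes: random bytes from crypto/rand, hex-encoded, with the all-zero value rejected. The names NewTraceID and NewSpanID come from this PR, but their signatures here, and the helpers newHexID and allZero, are assumptions for illustration and may not match internal/correlation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newHexID (hypothetical helper) returns n random bytes as lowercase hex.
// It retries on the all-zero value, which W3C Trace Context treats as invalid.
func newHexID(n int) (string, error) {
	buf := make([]byte, n)
	for {
		if _, err := rand.Read(buf); err != nil {
			return "", fmt.Errorf("read random bytes: %w", err)
		}
		if !allZero(buf) {
			return hex.EncodeToString(buf), nil
		}
	}
}

// allZero reports whether every byte in b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

// NewTraceID returns a 16-byte (32 hex character) trace ID.
// The (string, error) signature is an assumption.
func NewTraceID() (string, error) { return newHexID(16) }

// NewSpanID returns an 8-byte (16 hex character) span ID.
// The (string, error) signature is an assumption.
func NewSpanID() (string, error) { return newHexID(8) }

func main() {
	traceID, err := NewTraceID()
	if err != nil {
		panic(err)
	}
	spanID, err := NewSpanID()
	if err != nil {
		panic(err)
	}
	fmt.Printf("correlation: trace=%s span=%s\n", traceID, spanID)
}
```

hex.EncodeToString always emits lowercase, which is the form W3C Trace Context requires for both IDs.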

Non-goals (per PRD)

  • No OpenTelemetry SDK import. No OTLP wire format.
  • No traceparent header propagation between processes.
  • No back-population of historical events.
  • No client-side sampling.

Smoke test

SessionStart → root_span recorded
UserPromptSubmit → last_prompt_submit recorded
PreToolUse(toolu_abc) → tool_uses.toolu_abc recorded
PostToolUse(toolu_abc) → parent_span_id = PreToolUse's span_id ✓

All 4 events share the same trace_id.
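
The chaining exercised by this smoke test can be sketched roughly as follows. This is an in-memory illustration only: the sessionSpans type and the parentFor method are hypothetical, the field names loosely mirror the root_span / last_prompt_submit / tool_uses keys shown above, and the real implementation persists this state to disk between hook invocations. Events that do not close a chain get no parent in this sketch, which is an assumption.

```go
package main

import "fmt"

// sessionSpans is a hypothetical in-memory stand-in for the per-session
// record that remembers the opening span of each chain.
type sessionSpans struct {
	RootSpan         string
	LastPromptSubmit string
	ToolUses         map[string]string // tool_use_id -> PreToolUse span_id
}

// parentFor records spanID when the event opens a chain and returns the
// parent span ID when the event closes one. It returns "" otherwise.
func (s *sessionSpans) parentFor(event, toolUseID, spanID string) string {
	switch event {
	case "SessionStart":
		s.RootSpan = spanID
	case "UserPromptSubmit":
		s.LastPromptSubmit = spanID
	case "PreToolUse":
		s.ToolUses[toolUseID] = spanID
	case "PostToolUse", "PostToolUseFailure":
		return s.ToolUses[toolUseID]
	case "Stop", "StopFailure":
		return s.LastPromptSubmit
	case "SessionEnd":
		return s.RootSpan
	}
	return ""
}

func main() {
	s := &sessionSpans{ToolUses: map[string]string{}}
	// Replay the four smoke-test events with placeholder span IDs.
	steps := []struct{ event, toolUseID, spanID string }{
		{"SessionStart", "", "span-1"},
		{"UserPromptSubmit", "", "span-2"},
		{"PreToolUse", "toolu_abc", "span-3"},
		{"PostToolUse", "toolu_abc", "span-4"},
	}
	for _, st := range steps {
		parent := s.parentFor(st.event, st.toolUseID, st.spanID)
		fmt.Printf("%s span=%s parent=%q\n", st.event, st.spanID, parent)
	}
}
```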

Test plan

  • go test ./internal/correlation/... — unit tests (ID format, uniqueness over 100k, concurrent LoadOrCreateTrace collapses to one trace ID, corrupt spans file falls back to empty)
  • go test ./... — full suite passes
  • go vet ./... — clean
  • End-to-end hook smoke test with synthetic SessionStart / UserPromptSubmit / PreToolUse / PostToolUse — verified shared trace_id and correct parent chaining via debug trace
  • Server-side: confirm correlation.* fields are accepted (envelope_version 1.1 should be transparent to current normalizer)
  • Benchmark hook latency before/after on a real session (target: <1ms p99 overhead)

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa


Generated by Claude Code

claude added 5 commits May 10, 2026 20:35
Generate stable trace_id (16 bytes / 32 hex) and per-event span_id
(8 bytes / 16 hex) on every emitted envelope, with parent_span_id set
for known event chains (PreToolUse→PostToolUse, SubagentStart→SubagentStop,
TaskCreated→TaskCompleted, Elicitation→ElicitationResult,
PreCompact→PostCompact, UserPromptSubmit→Stop, SessionStart→SessionEnd).

Trace IDs persist per session under ~/.config/promptconduit/traces so
they remain stable across separate hook process invocations. Atomic
writes and O_CREATE|O_EXCL handle concurrent hook races. Probabilistic
GC removes records older than 30 days.

This is purely additive — envelope_version bumped to 1.1 and the new
field is optional, so older servers continue to work. The CLI does not
import the OpenTelemetry SDK; it only produces IDs an OTLP exporter
would later need.

Adds `promptconduit debug trace <session_id>` for inspecting locally
recorded trace state.

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa

O_CREATE|O_EXCL leaves a window where another process can see the file
exists but hasn't been written yet, so the read-after-EEXIST path
returned malformed JSON under load. Switch to write-tempfile + os.Link:
the target only becomes visible once it's fully populated, and Link
fails with EEXIST atomically when another writer won the race.

Verified with 20 -race iterations of TestLoadOrCreateTrace_Concurrent.

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa

Per Claude Code hook docs, PreToolUse/PostToolUse and the elicitation/
task events only carry session_id + transcript_path; rich identifiers
(tool_use_id, task_id, elicitation_id) live in the transcript JSONL.
The chain-keying remains correct for synthetic input and future Claude
Code versions, and SubagentStart/Stop still resolves via agent_id.

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa

Move git context and correlation IDs out of top-level envelope fields
and into a single `enrichment` block alongside new source/host/os/arch
fields. The CLI stays a thin client: it forwards the tool's raw
native_payload untouched, plus enrichment hints the server can use
when normalizing.

Envelope shape (1.2):
  {
    envelope_version, cli_version, tool, hook_event, captured_at,
    native_payload,           # raw, untouched
    attachments,              # multipart binary metadata
    enrichment: {             # CLI-computed context
      git, source, correlation, host, os, arch
    }
  }

Source provider is derived from the git remote URL (github / gitlab /
bitbucket / azure / codeberg / sourcehut), supports both SSH and
HTTPS forms; returns "" for unrecognized hosts. Server-side handles
all version translation per the thin-client design.
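
A minimal sketch of deriving the provider from a remote URL, handling HTTPS, ssh:// and scp-style SSH forms. The function names and the specific hostnames matched (github.com, gitlab.com, bitbucket.org, dev.azure.com, ssh.dev.azure.com, codeberg.org, git.sr.ht) are assumptions; the actual matching rules in this commit may differ, for example for self-hosted instances.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf (hypothetical) extracts the lowercase host from an HTTPS URL
// ("https://host/path"), an SSH URL ("ssh://git@host/path"), or an scp-style
// remote ("git@host:path"). It returns "" when no host can be found.
func hostOf(remote string) string {
	if strings.Contains(remote, "://") {
		u, err := url.Parse(remote)
		if err != nil {
			return ""
		}
		return strings.ToLower(u.Hostname())
	}
	// scp-style: optional "user@", then "host:path".
	rest := remote
	if at := strings.Index(rest, "@"); at >= 0 {
		rest = rest[at+1:]
	}
	colon := strings.Index(rest, ":")
	if colon < 0 {
		return ""
	}
	return strings.ToLower(rest[:colon])
}

// providerFor (hypothetical) maps a git remote URL to a provider name and
// returns "" for unrecognized hosts. The host list is an assumption.
func providerFor(remote string) string {
	switch hostOf(remote) {
	case "github.com":
		return "github"
	case "gitlab.com":
		return "gitlab"
	case "bitbucket.org":
		return "bitbucket"
	case "dev.azure.com", "ssh.dev.azure.com":
		return "azure"
	case "codeberg.org":
		return "codeberg"
	case "git.sr.ht":
		return "sourcehut"
	}
	return ""
}

func main() {
	for _, remote := range []string{
		"git@github.com:org/repo.git",
		"https://gitlab.com/org/repo.git",
		"ssh://git@codeberg.org/org/repo.git",
		"https://git.example.com/org/repo.git",
	} {
		fmt.Printf("%s -> %q\n", remote, providerFor(remote))
	}
}
```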

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa

Servers expecting the 1.0/1.1 envelope shape read git context and
correlation IDs from top-level fields. To avoid a deploy-ordering
risk between this CLI and the platform, restore those top-level
fields and have the envelope constructor mirror them automatically
from enrichment.git and enrichment.correlation.

Both shapes are now on the wire for one or two releases:
  - Old servers continue reading top-level git / correlation
  - New servers can begin reading enrichment.git / enrichment.correlation

Drop the legacy fields once all servers are upgraded — single
mirrorLegacyFields() removal.

Adds envelope_test.go covering mirroring, both-shapes-present, nil
enrichment, and version pinning.
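
The mirroring described above might look roughly like this. The struct layouts, JSON tags, the Git.Branch field, and the NewEnvelope constructor are hypothetical; only the mirrorLegacyFields name, the top-level git / correlation fields, the enrichment block, and the 1.2 version come from these commits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Git is a hypothetical placeholder for the git context block.
type Git struct {
	Branch string `json:"branch,omitempty"`
}

// Correlation carries the W3C-compatible IDs.
type Correlation struct {
	TraceID      string `json:"trace_id"`
	SpanID       string `json:"span_id"`
	ParentSpanID string `json:"parent_span_id,omitempty"`
}

// Enrichment is the CLI-computed context block (other fields omitted).
type Enrichment struct {
	Git         *Git         `json:"git,omitempty"`
	Correlation *Correlation `json:"correlation,omitempty"`
}

// Envelope carries both shapes during the transition (other fields omitted).
type Envelope struct {
	EnvelopeVersion string       `json:"envelope_version"`
	Git             *Git         `json:"git,omitempty"`         // legacy top-level copy
	Correlation     *Correlation `json:"correlation,omitempty"` // legacy top-level copy
	Enrichment      *Enrichment  `json:"enrichment,omitempty"`
}

// mirrorLegacyFields copies enrichment.git and enrichment.correlation to the
// top level so servers expecting the 1.0/1.1 shape keep working.
func (e *Envelope) mirrorLegacyFields() {
	if e.Enrichment == nil {
		return
	}
	e.Git = e.Enrichment.Git
	e.Correlation = e.Enrichment.Correlation
}

// NewEnvelope (hypothetical constructor) pins the version and mirrors fields.
func NewEnvelope(en *Enrichment) *Envelope {
	e := &Envelope{EnvelopeVersion: "1.2", Enrichment: en}
	e.mirrorLegacyFields()
	return e
}

func main() {
	env := NewEnvelope(&Enrichment{
		Git:         &Git{Branch: "main"},
		Correlation: &Correlation{TraceID: "trace-placeholder", SpanID: "span-placeholder"},
	})
	out, err := json.MarshalIndent(env, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the copy in a single method means that dropping the legacy shape later is one call-site removal, as the commit message notes.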

https://claude.ai/code/session_01XjfDVBhyo2F4NwTSvGA9aa
@scotthavird scotthavird marked this pull request as ready for review May 10, 2026 21:20
@scotthavird scotthavird merged commit 4c4b9fb into main May 10, 2026
1 check passed