
docs(telemetry): instrumentation conventions + snake_case attribute keys #996

Open
EhabY wants to merge 2 commits into main from chore/telemetry-conventions

EhabY (Collaborator) commented Jun 8, 2026

What

Establishes a shared style for local telemetry instrumentation so the in-flight telemetry branches can converge on one convention.

  • New src/instrumentation/CONVENTIONS.md (linked from CONTRIBUTING.md) covering:
    • the framework vs per-domain-instrumentation split,
    • threading: spans are passed explicitly, attributes are set imperatively (setProperty), and you never return a value purely to log it,
    • naming: events are domain.snake_case (past tense for point-in-time logs), and attribute keys follow the OTel convention — lowercase, . for hierarchy, _ for words, never camelCase,
    • properties vs measurements, outcomes/failures/aborts, and secret safety.
  • Brings existing instrumentation into line: renames caller-supplied property/measurement keys from camelCase to snake_case across auth, ssh, workspace, activation, websocket, and cli.download (e.g. cacheSource → cache_source, workspaceName → workspace_name, and Ms-suffixed keys such as maxBackoffMs → *_ms). The metrics export unit-suffix table is updated to match. Framework-managed result/durationMs (never emitted as attributes) are left as-is.
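A minimal TypeScript sketch of the threading and naming rules above, for reviewers skimming the conventions. Span, RecordingSpan, isConventionalKey, and connectToWorkspace are hypothetical names for illustration, not the extension's actual API, and the regular expression is only one reading of "lowercase, . for hierarchy, _ for words".

```typescript
// Hypothetical span shape for illustration; the real framework's types may differ.
interface Span {
  setProperty(key: string, value: string): void;
  setMeasurement(key: string, value: number): void;
}

// One reading of the OTel-style rule: lowercase segments joined by ".",
// words within a segment joined by "_", no camelCase.
const KEY_PATTERN =
  /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*(?:\.[a-z][a-z0-9]*(?:_[a-z0-9]+)*)*$/;

export function isConventionalKey(key: string): boolean {
  return KEY_PATTERN.test(key);
}

// Test double that records attributes and rejects keys that break the rule.
export class RecordingSpan implements Span {
  readonly properties = new Map<string, string>();
  readonly measurements = new Map<string, number>();

  setProperty(key: string, value: string): void {
    this.checkKey(key);
    this.properties.set(key, value);
  }

  setMeasurement(key: string, value: number): void {
    this.checkKey(key);
    this.measurements.set(key, value);
  }

  private checkKey(key: string): void {
    if (!isConventionalKey(key)) {
      throw new Error(`attribute key "${key}" does not follow the naming convention`);
    }
  }
}

// Hypothetical domain function: the span is an explicit parameter and
// attributes are set imperatively on it. The boolean is returned because the
// caller needs it; the count is recorded on the span, not returned to be logged.
export function connectToWorkspace(
  span: Span,
  workspaceName: string,
  agentCount: number,
): boolean {
  span.setProperty("workspace_name", workspaceName);
  span.setMeasurement("agent.count", agentCount);
  return agentCount > 0;
}
```

With a test double like RecordingSpan, a camelCase key such as cacheSource fails at the call site in unit tests; whether the real framework should also enforce the pattern at runtime is a separate choice.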

Why

Telemetry is local-only and unreleased, so unifying the naming now breaks no existing consumers. Merging this first gives the other telemetry branches a conformant base to rebase onto and a document to follow.

EhabY self-assigned this Jun 8, 2026
EhabY force-pushed the chore/telemetry-conventions branch 3 times, most recently from 13ac79c to df89efa June 8, 2026 15:06
EhabY requested review from andrewdennis117 and hugodutka and removed request for hugodutka June 8, 2026 16:05
…e keys

Add src/instrumentation/CONVENTIONS.md documenting how telemetry is structured
(explicit span threading, imperative setProperty, event/attribute naming,
namespace grouping, properties vs measurements) and link it from CONTRIBUTING.

Align existing instrumentation with the OTel naming convention:
- rename caller-supplied property/measurement keys from camelCase to snake_case
  (auth, ssh, workspace, activation, websocket, cli.download);
- strip unit suffixes from exported metric names into the OTLP unit field
  (latency_ms -> metric latency, unit ms) so Prometheus can append its own
  unit suffix without doubling it;
- group the token-refresh span under its namespace (auth.token_refreshed ->
  auth.token_refresh.completed) next to auth.token_refresh.deduped.

Framework-managed result/durationMs are left unchanged.
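A minimal sketch of the suffix stripping described in this commit, assuming a small lookup table from key suffix to unit. MetricDescriptor, UNIT_SUFFIXES, and toMetricDescriptor are hypothetical names; the table holds the _ms entry from the example plus an illustrative _bytes entry, and the PR's real table may differ.

```typescript
interface MetricDescriptor {
  name: string;
  unit: string;
}

// Hypothetical suffix table. Units use UCUM-style codes, as OTel recommends
// ("ms" for milliseconds, "By" for bytes).
const UNIT_SUFFIXES: ReadonlyArray<readonly [string, string]> = [
  ["_ms", "ms"],
  ["_bytes", "By"],
];

// Move a known unit suffix from the metric name into the unit field.
export function toMetricDescriptor(key: string): MetricDescriptor {
  for (const [suffix, unit] of UNIT_SUFFIXES) {
    // Require something before the suffix so a bare "_ms" is not emptied.
    if (key.endsWith(suffix) && key.length > suffix.length) {
      return { name: key.slice(0, -suffix.length), unit };
    }
  }
  // No known unit suffix: keep the name and leave the unit empty.
  return { name: key, unit: "" };
}
```

For example, toMetricDescriptor("latency_ms") yields name latency with unit ms, while a count measurement such as agent.count passes through unchanged.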
EhabY force-pushed the chore/telemetry-conventions branch from df89efa to ca8c7ec June 10, 2026 15:22
- rename the trace outcome hook fail() -> error() and the category
  fetch_failed -> fetch_error to match the recordError vocabulary
- name count measurements <entity>.count (agent.count, workspace.count,
  event.count, interval.count) per OTel, replacing flat _count keys
- set error.type on auth error spans (login/logout exceptions, auth_failed)
  and split auth outcomes so that errors carry error.type and aborts carry reason
- fold the general telemetry review findings into CONVENTIONS.md and trim
  for conciseness
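A minimal sketch of the outcome split in this commit: errors get error.type, aborts get reason, and success gets neither. Outcome, AttributeSink, applyOutcome, and errorTypeOf are hypothetical names. The "_OTHER" fallback comes from the OTel semantic conventions for error.type; deriving the type from Error.name is an assumption about how the extension might do it.

```typescript
// Hypothetical outcome model for illustration; not the extension's actual API.
type Outcome =
  | { kind: "success" }
  | { kind: "error"; errorType: string }
  | { kind: "aborted"; reason: string };

interface AttributeSink {
  setProperty(key: string, value: string): void;
}

// Low-cardinality error type: the error's name, else OTel's "_OTHER" fallback.
export function errorTypeOf(err: unknown): string {
  return err instanceof Error && err.name ? err.name : "_OTHER";
}

// Errors carry error.type, aborts carry reason, success carries neither.
export function applyOutcome(span: AttributeSink, outcome: Outcome): void {
  switch (outcome.kind) {
    case "success":
      return;
    case "error":
      span.setProperty("error.type", outcome.errorType);
      return;
    case "aborted":
      span.setProperty("reason", outcome.reason);
      return;
  }
}
```

A login exception would then be recorded as applyOutcome(span, { kind: "error", errorType: errorTypeOf(err) }), and a user cancelling the flow as an abort with a reason such as user_cancelled (a hypothetical value). Keeping the two attributes on separate branches means an aborted span never looks like a failure in error-rate queries.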

2 participants