Add --otel distributed tracing support #126

Merged: mkrueger merged 2 commits into main from dev/mkrueger/distributed-tracing on Jun 15, 2026

mkrueger (Contributor) commented on Jun 11, 2026

Summary

Adds distributed tracing to CosmosDBShell so requests carry a sampled W3C traceparent that external tracing systems (for example, the Cosmos DB emulator) can use to correlate shell activity. Optionally exports spans over OTLP.

This addresses the distributed-tracing request. File-based diagnostic logging (issue #122) is intentionally out of scope and will follow in a separate PR.

How it works

The Cosmos SDK only emits a traceparent with the sampled flag set (-01) when two things are true: the Azure SDK Azure.Experimental.EnableActivitySource switch is on, and a recorded (sampled) Activity is in scope. This PR wires both up:

  • Sets AppContext switch Azure.Experimental.EnableActivitySource before any CosmosClient is created.
  • Registers an OpenTelemetry TracerProvider with AlwaysOnSampler, listening to the Azure.Cosmos.Operation and CosmosDBShell activity sources, so activities are recorded.
  • Wraps each shell command in a CosmosDBShell root activity, so every command is one trace.
  • Sets CosmosClientTelemetryOptions.DisableDistributedTracing = false explicitly.
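
The effect of the first three steps can be sketched with base class library types only. This is a minimal illustration, not the PR's TracingBootstrap code: an ActivityListener stands in for the OpenTelemetry TracerProvider with AlwaysOnSampler, the command name "query" is hypothetical, and no CosmosClient is created. It assumes .NET 6 or later with top-level statements.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Switch name taken from the PR description; it has to be set before any
// CosmosClient is constructed.
AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);

// W3C is already the default ID format on .NET 5+; set here to be explicit.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;

// Stand-in for the TracerProvider + AlwaysOnSampler: record every activity
// started by the two sources the PR names.
using var listener = new ActivityListener
{
    ShouldListenTo = source =>
        source.Name is "Azure.Cosmos.Operation" or "CosmosDBShell",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

using var shellSource = new ActivitySource("CosmosDBShell");

// One root activity per shell command; "query" is a hypothetical command name.
string? traceparent;
using (Activity? root = shellSource.StartActivity("query", ActivityKind.Internal))
{
    // Code running here sees Activity.Current == root.
    traceparent = root?.Id; // W3C form: 00-<trace-id>-<span-id>-<flags>
}

Console.WriteLine(traceparent);
// Flags "01" mean the sampled bit is set.
Console.WriteLine(traceparent?.EndsWith("-01", StringComparison.Ordinal));
```

Without a listener that samples the CosmosDBShell source, StartActivity returns null and no recorded activity is in scope, which is why the provider registration and the per-command root activity are both needed.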

CLI

New --otel [endpoint] option (optional value, mirroring --mcp):

  • --otel: enable tracing (always emits a sampled traceparent). Spans are exported only if an OTLP endpoint is configured (see below); otherwise tracing is local-only with no export.
  • --otel <endpoint>: also export spans to that explicit OTLP endpoint.
  • When --otel is given without an endpoint, the endpoint falls back to the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable; if that is also unset, no spans are exported.

A malformed endpoint (from --otel <endpoint> or OTEL_EXPORTER_OTLP_ENDPOINT) now produces a clean error and a non-zero exit code instead of an unhandled exception.
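
The resolution order and the error path described above might look roughly like the following sketch. The helper names ResolveOtlpEndpoint and Run are hypothetical, the error text is a placeholder rather than the otel-error-invalid-endpoint string, and the rule that only absolute http/https URIs are valid is an assumption; the PR does not state its exact validation. It assumes .NET 6 or later with top-level statements.

```csharp
#nullable enable
using System;

// Hypothetical helper: explicit --otel value first, then the environment
// variable, otherwise null (tracing stays on, nothing is exported).
static Uri? ResolveOtlpEndpoint(string? cliValue, string? envValue)
{
    string? raw = !string.IsNullOrWhiteSpace(cliValue) ? cliValue
                : !string.IsNullOrWhiteSpace(envValue) ? envValue
                : null;
    if (raw is null)
    {
        return null;
    }

    // Assumption: only absolute http/https URIs are accepted.
    if (!Uri.TryCreate(raw, UriKind.Absolute, out Uri? uri)
        || (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
    {
        throw new FormatException($"Invalid OTLP endpoint: '{raw}'");
    }
    return uri;
}

// Hypothetical entry point: turns a malformed endpoint into a clean error
// message and a non-zero exit code instead of an unhandled exception.
static int Run(string? cliValue)
{
    try
    {
        Uri? endpoint = ResolveOtlpEndpoint(
            cliValue,
            Environment.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT"));
        Console.WriteLine(endpoint is null
            ? "tracing enabled, no export"
            : $"exporting spans to {endpoint}");
        return 0;
    }
    catch (FormatException ex)
    {
        Console.Error.WriteLine(ex.Message);
        return 1;
    }
}

Console.WriteLine(Run("http://localhost:4317")); // explicit endpoint
Console.WriteLine(Run("not a uri"));             // malformed endpoint
```

Resolving the endpoint once at startup keeps the failure at the command line, before any CosmosClient or exporter is created.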

Changes

  • --otel option + plumbing in Program.cs; help-Otel and otel-error-invalid-endpoint in en.ftl.
  • New Core/TracingBootstrap.cs (provider lifecycle + per-command activity source).
  • ShellInterpreter: enable SDK tracing, per-command root activity.
  • OpenTelemetry + OTLP exporter packages (1.15.3, the patched version clearing advisory NU1902).
  • README, docs/navigation.md, and Runtime/TracingBootstrapTests.cs.

Validation

  • Main project build: 0 warnings, 0 errors.
  • Full test suite: 1293 passed, 0 failed, 74 skipped (pre-existing emulator-dependent skips).

Out of scope

File logging / issue #122, OTEL metrics & logs signals, Azure Monitor / App Insights exporters, Direct-mode traceparent specifics.

Emit a sampled W3C traceparent on Cosmos DB requests so external tracing systems (e.g. the emulator) can correlate shell activity.

- New --otel [endpoint] option (optional value, like --mcp): bare --otel enables tracing; an endpoint (or OTEL_EXPORTER_OTLP_ENDPOINT) also exports spans via OTLP
- TracingBootstrap: sets the Azure.Experimental.EnableActivitySource switch and registers an AlwaysOn TracerProvider listening to Azure.Cosmos.Operation and CosmosDBShell sources so activities are recorded (traceparent flag -01)
- Each command runs inside a CosmosDBShell root activity
- CreateClientOptions sets DisableDistributedTracing=false explicitly
- Add OpenTelemetry + OTLP exporter packages (1.15.3)
- Help text, README, docs/navigation.md, and unit tests

Copilot AI (Contributor) left a comment

Pull request overview

Adds optional OpenTelemetry-based distributed tracing to CosmosDBShell via a new --otel [endpoint] startup option, enabling sampled W3C traceparent propagation on Cosmos SDK requests and (optionally) OTLP span export.

Changes:

  • Adds --otel [endpoint] CLI option, wiring initialization/disposal of tracing in Program.cs and help text in en.ftl.
  • Introduces TracingBootstrap to configure an OpenTelemetry TracerProvider and create per-command root activities.
  • Updates docs/README and adds unit tests covering tracing bootstrap behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| README.md | Documents the new --otel option in the feature list and options table. |
| docs/navigation.md | Adds --otel and OTEL_EXPORTER_OTLP_ENDPOINT to navigation docs + examples. |
| Directory.Packages.props | Pins OpenTelemetry + OTLP exporter package versions. |
| CosmosDBShell/Program.cs | Parses --otel [endpoint], initializes/disposes tracing, and adds the option to help output. |
| CosmosDBShell/lang/en.ftl | Adds localized help text for --otel. |
| CosmosDBShell/CosmosDBShell.csproj | Adds package references for OpenTelemetry and the OTLP exporter. |
| CosmosDBShell/Azure.Data.Cosmos.Shell.Core/TracingBootstrap.cs | New tracing bootstrapper (provider lifecycle + ActivitySource root activities). |
| CosmosDBShell/Azure.Data.Cosmos.Shell.Core/ShellInterpreter.cs | Wraps each executed command in a tracing activity; enables Cosmos SDK distributed tracing option. |
| CosmosDBShell.Tests/Runtime/TracingBootstrapTests.cs | Adds tests verifying activity creation/recording and the Azure SDK switch. |

Comment thread CosmosDBShell/Program.cs
Comment thread CosmosDBShell/Program.cs
Comment thread CosmosDBShell/Azure.Data.Cosmos.Shell.Core/TracingBootstrap.cs
mkrueger merged commit bc6586f into main on Jun 15, 2026
8 checks passed
mkrueger deleted the dev/mkrueger/distributed-tracing branch on June 15, 2026 06:50