
feat: add execution context attributes to telemetry spans #288

Open
pkosiec wants to merge 8 commits into main from pkosiec/obo-trace


pkosiec commented Apr 20, 2026

Summary

  • Adds execution.context, caller.id, and execution.obo_dev_fallback span attributes to the telemetry interceptor for on-behalf-of (OBO) observability
  • Fixes orphaned plugin.execute spans by preserving OTel context across the async generator boundary in executeStream()
  • Adds db.user attribute to lakebase query spans
  • Documents telemetry span attributes in execution-context.md and custom-plugins.md

Changes

Telemetry span attributes (plugin.execute span)

| Attribute | Type | Description |
| --- | --- | --- |
| execution.context | "user" \| "service" | Whether the operation runs as OBO (user) or as the service principal |
| caller.id | string | The user ID (OBO) or the service principal ID |
| execution.obo_dev_fallback | boolean | true when an OBO call falls back to the SP in dev mode (no x-forwarded-access-token) |
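For reviewers, a minimal sketch of how the three attributes relate to the identity helpers named in this PR (isInUserContext(), getCurrentUserId(), isDevOboFallback()). The ExecutionIdentity interface and executionAttributes() function are hypothetical, not the interceptor's actual code. It also assumes that execution.obo_dev_fallback is omitted (rather than set to false) outside the dev fallback path, and that the fallback path reports execution.context as "service" because the call really runs as the SP.

```typescript
// Hypothetical inputs: in the PR these come from isInUserContext(),
// getCurrentUserId() and isDevOboFallback() respectively.
interface ExecutionIdentity {
  inUserContext: boolean;
  currentUserId: string;
  devOboFallback: boolean;
}

type SpanAttributes = Record<string, string | boolean>;

// Build the plugin.execute span attributes from the caller's identity.
function executionAttributes(id: ExecutionIdentity): SpanAttributes {
  const attrs: SpanAttributes = {
    "execution.context": id.inUserContext ? "user" : "service",
    // Always the real identity, never a cache key such as "global".
    "caller.id": id.currentUserId,
  };
  // Assumption: the flag is only recorded on the dev fallback path.
  if (id.devOboFallback) {
    attrs["execution.obo_dev_fallback"] = true;
  }
  return attrs;
}
```

Keeping the mapping a pure function of the identity inputs makes the user, service, and dev fallback paths easy to unit test independently.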

OTel context propagation fix

executeStream() runs the interceptor chain inside an async generator, and OTel lost the parent HTTP span context at that boundary: spans were created but orphaned in separate traces. Fix: capture otelContext.active() before entering the generator and restore it with otelContext.with() inside.
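A minimal, self-contained sketch of the capture-and-restore pattern. It uses Node's AsyncLocalStorage directly (OTel's AsyncLocalStorageContextManager is built on it), with a plain string standing in for the active span; activeSpan, runInterceptors() and the two executeStream* functions are hypothetical stand-ins, not the plugin's code. The key point is that executeStreamFixed() is an ordinary function, so it reads the active context when it is called, while the body of an async generator only starts running when the consumer calls next().

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for the OTel context: the "active span" is just a name here.
const activeSpan = new AsyncLocalStorage<string | undefined>();

// Stand-in for the interceptor chain: reports the parent it sees after an await.
async function runInterceptors(): Promise<string | undefined> {
  await Promise.resolve();
  return activeSpan.getStore();
}

// Naive version: nothing in the body runs until the consumer calls next(),
// so the chain runs in whatever context is active at that point.
async function* executeStreamNaive(): AsyncGenerator<string | undefined> {
  yield await runInterceptors();
}

// Fixed version: a plain function captures the context eagerly
// (like otelContext.active()) and the generator restores it around the
// chain (like otelContext.with(parent, fn)).
function executeStreamFixed(): AsyncGenerator<string | undefined> {
  const parent = activeSpan.getStore();
  return (async function* () {
    yield await activeSpan.run(parent, runInterceptors);
  })();
}

// Drain a stream into an array.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of stream) out.push(item);
  return out;
}
```

Created inside activeSpan.run("http-request", ...) and drained under a different context (or none), executeStreamFixed() still reports "http-request" as its parent.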

OBO dev mode fallback

In dev mode without x-forwarded-access-token, asUser() returns a thin proxy that sets an OTel context key (DEV_OBO_FALLBACK_KEY). The telemetry interceptor reads this key to set execution.obo_dev_fallback: true. There is no mutable state: the flag is scoped automatically to each execution via the OTel context.
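A sketch of the proxy-plus-context-key pattern, again modeling OTel's immutable context with Node's AsyncLocalStorage and a symbol key (createContextKey() in @opentelemetry/api also returns a symbol). DEV_OBO_FALLBACK_KEY and isDevOboFallback() reuse names from this PR, but their bodies here, along with devFallbackProxy() and DemoPlugin, are hypothetical illustrations. The optional excluded set mirrors the EXCLUDED_FROM_PROXY behavior mentioned in the test commit.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Model of an immutable OTel-style context: symbol keys mapped to values.
type Ctx = ReadonlyMap<symbol, unknown>;
const storage = new AsyncLocalStorage<Ctx>();
const ROOT_CONTEXT: Ctx = new Map();

// Context key, analogous to createContextKey() in @opentelemetry/api.
const DEV_OBO_FALLBACK_KEY = Symbol("dev_obo_fallback");

function activeContext(): Ctx {
  return storage.getStore() ?? ROOT_CONTEXT;
}

// What the telemetry interceptor reads when setting execution.obo_dev_fallback.
function isDevOboFallback(): boolean {
  return activeContext().get(DEV_OBO_FALLBACK_KEY) === true;
}

// Hypothetical thin proxy returned by asUser() in dev mode: each method call
// runs in a copy of the active context with the flag set; nothing is mutated.
function devFallbackProxy<T extends object>(
  target: T,
  excluded: ReadonlySet<PropertyKey> = new Set(),
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== "function" || excluded.has(prop)) return value;
      return (...args: unknown[]) => {
        const flagged: Ctx = new Map(activeContext()).set(DEV_OBO_FALLBACK_KEY, true);
        // Call on the unwrapped object so internal this.method() calls are not re-wrapped.
        return storage.run(flagged, () => value.apply(obj, args));
      };
    },
  });
}

// Hypothetical plugin used for the demonstration.
class DemoPlugin {
  async query(): Promise<boolean> {
    await Promise.resolve(); // the flag survives awaits inside the call
    return isDevOboFallback();
  }
}
```

Because the flag lives in a per-call context rather than on the plugin instance, a plain call and a proxied call can run concurrently without seeing each other's flag.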

caller.id fix

caller.id previously used context.userKey, which is a cache key rather than an identity (analytics passes "global" for SP queries). It now uses getCurrentUserId(), which always returns the real identity.

Lakebase db.user

Adds db.user attribute to lakebase.query spans showing the PostgreSQL role used for the connection.

Arrow-result clarification

Added a comment clarifying that the arrow-result route intentionally bypasses the interceptor chain (it is a data download, not a query execution).

Test plan

  • Unit tests for all new attributes (user/service/dev fallback paths)
  • All 1578 tests pass
  • Full build pipeline passes
  • Manual: Grafana traces verified via Chrome DevTools MCP + Tempo API

Screenshots

Service Principal

[screenshot]

User (for deployed app, when we have the forwarded token)

[screenshot]

Local for OBO

[screenshot: obo-trace-screenshot]

Lakebase - DB user

[screenshot]

Plugin OBO coverage verification

All built-in plugins that use asUser() go through execute() / executeStream():

| Plugin | Method | File |
| --- | --- | --- |
| analytics | executeStream() | analytics.ts:187 |
| genie | executeStream() | genie.ts:124, :176, :228 |
| files | execute() | plugin.ts:445, :481, :551, :622, :658, :694, :782, :842, :887 |
| serving | execute() | serving.ts:297 |
| vector-search | execute() | vector-search.ts:98, :173, :240 |
| lakebase | N/A (OBO not supported yet) | N/A |

lakebase: a db.user attribute was added to lakebase.query spans showing the PostgreSQL role. execution.context/caller.id will be added when OBO (per-user pools) is implemented.

Not covered (by design):

  • analytics arrow-result (GET /arrow-result/:jobId): a data download endpoint that always runs as SP. Clarified with a comment.

pkosiec force-pushed the pkosiec/obo-trace branch from c5ada0c to 3914b27 on April 21, 2026 11:16
pkosiec added 7 commits April 21, 2026 16:30
Add `execution.context` and `caller.id` span attributes to the
telemetry interceptor, allowing traces to distinguish OBO (user)
from service principal code paths.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
…Stream

The TelemetryInterceptor spans were orphaned because OTel lost the
parent HTTP span context when crossing into the async generator.
Capture context.active() before the generator and restore it with
context.with() inside, so plugin.execute spans appear as children
of the HTTP request trace.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
When asUser() is called in dev mode without x-forwarded-access-token,
the telemetry span now includes execution.obo_dev_fallback: true to
distinguish intended OBO calls from regular service principal calls.

Uses OTel context key + thin proxy pattern to carry the flag without
mutable state — scoped automatically per execution and concurrent-safe.

Also documents telemetry span attributes in execution-context.md.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
…reservation

Add tests for previously uncovered behaviors:
- asUser() dev fallback Proxy wraps methods correctly and sets isDevOboFallback context
- EXCLUDED_FROM_PROXY methods bypass OBO fallback wrapping
- executeStream preserves parent OTel context across async generator boundary
- isDevOboFallback() returns false outside proxy context

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
The caller.id attribute was using context.userKey which is a cache key,
not always the real user ID. The analytics plugin passes "global" for SP
queries, so traces showed caller.id: "global" instead of the actual
service principal ID. Now uses getCurrentUserId() which always returns
the real identity.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
…ement

Add telemetry span attributes table to execution-context.md and note
that execute()/executeStream() is required for automatic instrumentation.
Update custom-plugins.md to link telemetry attributes from the execution
interceptors bullet.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
Add db.user attribute to lakebase.query telemetry spans so traces show
which PostgreSQL role executed the query. Also add a comment clarifying
that the arrow-result route intentionally bypasses the interceptor chain
(it's a data download, not a query execution).

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
pkosiec force-pushed the pkosiec/obo-trace branch 2 times, most recently from bef4d7a to 97aba49, on April 21, 2026 14:37
pkosiec marked this pull request as ready for review April 21, 2026 14:37
If getCurrentUserId() or isInUserContext() threw before this change,
span.end() was never called because the calls were outside the try/finally
block. Now all setAttribute calls are inside the try block, so the finally
block guarantees span cleanup on any error.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
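The ordering this commit fixes can be illustrated with a small sketch: every call that can throw, including identity lookups, sits inside the try block, so the finally clause always ends the span. SpanLike, RecordingSpan and intercept() are hypothetical stand-ins for the OTel span and the telemetry interceptor, not the PR's code.

```typescript
// Hypothetical minimal span interface (a real OTel Span has more methods).
interface SpanLike {
  setAttribute(key: string, value: string | boolean): void;
  end(): void;
}

// Records calls so the behavior can be checked.
class RecordingSpan implements SpanLike {
  readonly attributes = new Map<string, string | boolean>();
  ended = false;
  setAttribute(key: string, value: string | boolean): void {
    this.attributes.set(key, value);
  }
  end(): void {
    this.ended = true;
  }
}

// Hypothetical interceptor body: identity lookups happen inside try, so a
// throwing lookup still reaches finally and the span is always ended.
async function intercept<T>(
  span: SpanLike,
  getCallerId: () => string,
  next: () => Promise<T>,
): Promise<T> {
  try {
    span.setAttribute("caller.id", getCallerId());
    return await next();
  } finally {
    span.end();
  }
}
```

With the lookup moved before the try block instead, a throwing getCallerId() would skip finally and leave the span open.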