Propagate OTel context into spawned snapshot and move-in tasks #4149
`Task.Supervisor.async_nolink` / `start_child` start new Erlang processes that do not inherit the caller's OTel context. Spans created inside these tasks via `with_child_span` (e.g. `shape_snapshot.execute_for_shape` and its children) were therefore silently dropped on the initial-snapshot and move-in code paths, because `with_child_span` requires a parent span in the current process's context. Capture the context via `:otel_ctx.get_current()` before spawning and attach it inside the task closure with `:otel_ctx.attach/1`, mirroring the pattern already used for `state.otel_ctx` in the snapshotter's `handle_continue`.
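The capture-before-spawn, attach-inside-the-closure pattern described above can be sketched without any OTel dependency. This is a minimal illustration, not the code in this PR: it assumes only that the current OTel context lives in per-process state (the process dictionary is used here as a stand-in), and the module name `ContextPropagationSketch`, its functions, and the `:sketch_otel_ctx` key are all hypothetical.

```elixir
defmodule ContextPropagationSketch do
  # Hypothetical stand-in for the per-process slot that holds the OTel context.
  @ctx_key :sketch_otel_ctx

  # Stand-in for :otel_ctx.get_current/0: read this process's context.
  def get_current, do: Process.get(@ctx_key, %{})

  # Stand-in for :otel_ctx.attach/1: install a context and return a token
  # (the previous value) that detach/1 can use to restore it.
  def attach(ctx), do: Process.put(@ctx_key, ctx)

  # Stand-in for :otel_ctx.detach/1: restore whatever was there before.
  def detach(nil), do: Process.delete(@ctx_key)
  def detach(previous), do: Process.put(@ctx_key, previous)

  # The task runs in a new process, so it starts with an empty context.
  def run_without_propagation(supervisor) do
    supervisor
    |> Task.Supervisor.async_nolink(fn -> get_current() end)
    |> Task.await()
  end

  # Capture in the caller, attach inside the task closure, detach in `after`.
  def run_with_propagation(supervisor) do
    ctx = get_current()

    supervisor
    |> Task.Supervisor.async_nolink(fn ->
      token = attach(ctx)

      try do
        get_current()
      after
        detach(token)
      end
    end)
    |> Task.await()
  end
end

{:ok, supervisor} = Task.Supervisor.start_link()
ContextPropagationSketch.attach(%{parent_span: "shape_snapshot"})

# The spawned task sees an empty context: %{}
IO.inspect(ContextPropagationSketch.run_without_propagation(supervisor))

# The spawned task sees the caller's context: %{parent_span: "shape_snapshot"}
IO.inspect(ContextPropagationSketch.run_with_propagation(supervisor))
```

In the real code paths the stand-in functions correspond to `:otel_ctx.get_current()` and `:otel_ctx.attach/1`. The point of the sketch is only that the capture has to happen in the calling process, before the closure is handed to the supervisor.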
Claude Code Review
Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have): None.
Issue Conformance
No linked public issue. The PR description is detailed and provides sufficient context (root cause, affected code paths, fix approach, test plan). The implementation fully matches the described fix.
Review iteration: 4 | 2026-04-22
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #4149   +/-   ##
=======================================
  Coverage   89.20%   89.20%
=======================================
  Files          25       25
  Lines        2520     2520
  Branches      636      638    +2
=======================================
  Hits         2248     2248
  Misses        270      270
  Partials        2        2
Replace :otel_ctx.get_current/attach/detach in PartialModes with OpenTelemetry.get_current_context/1 and set_current_context/1, matching the pattern already used in shape_log_collector.ex and consumer.ex. The helper pair just propagates the current span + baggage into the new process, which is all these short-lived tasks need, so no detach dance is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow the same pattern as the previous commit and shape_log_collector/consumer: use the OpenTelemetry.get_current_context/1 and set_current_context/1 helpers instead of raw :otel_ctx.get_current/attach/detach. Drops the detach dance for both the handle_continue entry in Snapshotter and the nested Task in start_streaming_snapshot_from_db, and updates the producer in Shapes.get_or_create_shape_handle to capture the context via the same helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follows up on the reviewer's suggestion: `get_current_context/0` returns
a {span_ctx, baggage} tuple, not a map. Expose an `otel_ctx` @type on
the OpenTelemetry module and reference it from
`Consumer.initialize_shape_opts` so the spec matches the real shape of
the value being carried.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
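As an illustration of what a tuple-shaped context type could look like, here is a hedged sketch of a helper pair that carries `{span_ctx, baggage}` between processes. It is not Electric's actual module: `OtelCtxSketch` and its function bodies are assumptions, built only on `opentelemetry_api` calls (`:otel_tracer.current_span_ctx/0`, `:otel_tracer.set_current_span/1`, `:otel_baggage.get_all/0`, `:otel_baggage.set/1`). `Mix.install/1` is used purely so the sketch runs as a standalone script.

```elixir
# Assumption: the opentelemetry_api package is fetched for this script only.
Mix.install([{:opentelemetry_api, "~> 1.0"}])

defmodule OtelCtxSketch do
  # Hypothetical module name; the real helpers live in Electric's own
  # OpenTelemetry module and may differ in arity and behaviour.

  # The carried value is a two-element tuple, not a map.
  @type otel_ctx :: {:opentelemetry.span_ctx() | :undefined, :otel_baggage.t()}

  # Capture the current span context and baggage of the calling process.
  @spec get_current_context() :: otel_ctx()
  def get_current_context do
    {:otel_tracer.current_span_ctx(), :otel_baggage.get_all()}
  end

  # Install a previously captured span context and baggage in this process.
  @spec set_current_context(otel_ctx()) :: :ok
  def set_current_context({span_ctx, baggage}) do
    :otel_tracer.set_current_span(span_ctx)
    :otel_baggage.set(baggage)
    :ok
  end
end

# Typical use: capture in the caller, install at the top of the spawned task.
ctx = OtelCtxSketch.get_current_context()

task =
  Task.async(fn ->
    OtelCtxSketch.set_current_context(ctx)
    :otel_tracer.current_span_ctx()
  end)

Task.await(task)
```

A spec that references a type like `otel_ctx()` documents the tuple shape at the call site, which is the mismatch the reviewer's suggestion was about.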
Summary
Fixes a bug where telemetry spans defined with `OpenTelemetry.with_child_span` inside spawned tasks (initial-snapshot and move-in code paths) were silently dropped, hiding expected fine-grained spans such as `shape_snapshot.execute_for_shape`, `shape_snapshot.query_fn`, `shape_snapshot.checkout_wait`, `shape_snapshot.setup`, and `shape_snapshot.query` from Honeycomb on hosts whose traffic is dominated by initial snapshots.

Root cause
`with_child_span/4` only creates a span when there is already a parent span in the current Erlang process's OTel context. `Task.Supervisor.async_nolink` / `Task.Supervisor.start_child` start new processes that do not inherit the caller's OTel context, so `in_span_context?()` returns `false` and the whole span subtree is dropped. Three spawn sites were affected:
- `Electric.Shapes.Consumer.Snapshotter.start_streaming_snapshot_from_db/4`
- `Electric.Shapes.PartialModes.query_move_in_async/5`
- `Electric.Shapes.PartialModes.query_move_in/5`

(`PartialModes.query_subset/4` is called synchronously from an HTTP-request process that already has a parent span, so it was not affected.)

Fix
Capture the context via `:otel_ctx.get_current()` before each spawn and attach it inside the task closure with `:otel_ctx.attach/1` (detached in `after`). This mirrors the pattern already used for `state.otel_ctx` in `Snapshotter.handle_continue/2`.

Test plan
- `mix compile` clean
- `mix test test/electric/shapes/consumer_test.exs`: 29 passing
- `mix test test/electric/shapes/consumer/move_ins_test.exs test/electric/shapes/consumer/initial_snapshot_test.exs`: 60 passing
- `name = shape_snapshot.execute_for_shape AND shape.query_reason = "initial_snapshot"` returns rows in Honeycomb (previously 0 across all hosts over 24h)

Refs: https://github.com/electric-sql/alco-agent-tasks/issues/27
🤖 Generated with Claude Code