Skip to content

Propagate OTel context into spawned snapshot and move-in tasks#4149

Open
alco wants to merge 4 commits intomainfrom
erik/issue-27-otel-context-propagation
Open

Propagate OTel context into spawned snapshot and move-in tasks#4149
alco wants to merge 4 commits intomainfrom
erik/issue-27-otel-context-propagation

Conversation

@alco
Copy link
Copy Markdown
Member

@alco alco commented Apr 22, 2026

Summary

Fixes a bug where telemetry spans defined with OpenTelemetry.with_child_span inside spawned tasks (initial-snapshot and move-in code paths) were silently dropped, hiding expected fine-grained spans such as shape_snapshot.execute_for_shape, shape_snapshot.query_fn, shape_snapshot.checkout_wait, shape_snapshot.setup, and shape_snapshot.query from Honeycomb on hosts whose traffic is dominated by initial snapshots.

Root cause

with_child_span/4 only creates a span when there is already a parent span in the current Erlang process's OTel context. Task.Supervisor.async_nolink / Task.Supervisor.start_child start new processes that do not inherit the caller's OTel context, so in_span_context?() returns false and the whole span subtree is dropped. Three spawn sites were affected:

  • Electric.Shapes.Consumer.Snapshotter.start_streaming_snapshot_from_db/4
  • Electric.Shapes.PartialModes.query_move_in_async/5
  • Electric.Shapes.PartialModes.query_move_in/5

(PartialModes.query_subset/4 is called synchronously from an HTTP-request process that already has a parent span — it was not affected.)

Fix

Capture the context via :otel_ctx.get_current() before each spawn and attach it inside the task closure with :otel_ctx.attach/1 (detached in after). This mirrors the pattern already used for state.otel_ctx in Snapshotter.handle_continue/2.

Test plan

  • mix compile clean
  • mix test test/electric/shapes/consumer_test.exs — 29 passing
  • mix test test/electric/shapes/consumer/move_ins_test.exs test/electric/shapes/consumer/initial_snapshot_test.exs — 60 passing
  • After deploy: verify name = shape_snapshot.execute_for_shape AND shape.query_reason = "initial_snapshot" returns rows in Honeycomb (previously 0 across all hosts over 24h)

Refs: https://github.com/electric-sql/alco-agent-tasks/issues/27

🤖 Generated with Claude Code

`Task.Supervisor.async_nolink` / `start_child` start new Erlang processes
that do not inherit the caller's OTel context. Spans created inside these
tasks via `with_child_span` (e.g. `shape_snapshot.execute_for_shape` and
its children) were therefore silently dropped on the initial-snapshot and
move-in code paths, because `with_child_span` requires a parent span in
the current process's context.

Capture the context via `:otel_ctx.get_current()` before spawning and
attach it inside the task closure with `:otel_ctx.attach/1`, mirroring
the pattern already used for `state.otel_ctx` in the snapshotter's
`handle_continue`.
@alco alco added the claude label Apr 22, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

Claude Code Review

Summary

The new commit (b230e901b) addresses the last open suggestion from the previous review by adding a proper @type otel_ctx to the OpenTelemetry module and updating the consumer.ex typespec to use it. The PR is complete and has no remaining issues.

What's Working Well

  • Suggestion fully addressed: @type otel_ctx :: {span_ctx() | :undefined, :otel_baggage.t()} is now defined in open_telemetry.ex, span_ctx was promoted from @typep to @type so it can be referenced externally, and the initialize_shape_opts/0 typespec in consumer.ex now reads optional(:otel_ctx) => OpenTelemetry.otel_ctx() | nil — accurate and self-documenting.
  • @typedoc added: The new type includes a doc comment explaining its role, which is good practice for a public type that crosses module boundaries.
  • No new issues introduced: The commit is a pure typespec improvement; no behavioural changes.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

None.

Suggestions (Nice to Have)

None.

Issue Conformance

No linked public issue. The PR description is detailed and provides sufficient context (root cause, affected code paths, fix approach, test plan). The implementation fully matches the described fix.

Previous Review Status

  • Suggestion (map() | nil typespec stale in consumer.ex): Resolved in b230e901b. The otel_ctx type is now properly defined and used.

Review iteration: 4 | 2026-04-22

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.20%. Comparing base (388ec63) to head (b230e90).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4149   +/-   ##
=======================================
  Coverage   89.20%   89.20%           
=======================================
  Files          25       25           
  Lines        2520     2520           
  Branches      636      638    +2     
=======================================
  Hits         2248     2248           
  Misses        270      270           
  Partials        2        2           
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 94.30% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 89.20% <ø> (ø)
unit-tests 89.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

alco and others added 3 commits April 22, 2026 02:17
Replace :otel_ctx.get_current/attach/detach in PartialModes with
OpenTelemetry.get_current_context/1 and set_current_context/1, matching
the pattern already used in shape_log_collector.ex and consumer.ex.

The helper pair just propagates the current span + baggage into the new
process, which is all these short-lived tasks need — no detach dance
required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow the same pattern as the previous commit and
shape_log_collector/consumer: use OpenTelemetry.get_current_context/1
and set_current_context/1 helpers instead of raw :otel_ctx.
get_current/attach/detach. Drops the detach dance for both the
handle_continue entry in Snapshotter and the nested Task in
start_streaming_snapshot_from_db, and updates the producer in
Shapes.get_or_create_shape_handle to capture the context via the
same helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follows up on the reviewer's suggestion: `get_current_context/0` returns
a {span_ctx, baggage} tuple, not a map. Expose an `otel_ctx` @type on
the OpenTelemetry module and reference it from
`Consumer.initialize_shape_opts` so the spec matches the real shape of
the value being carried.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants