feat(telemetry): per-shape response_size.bytes histogram by alco · Pull Request #4144 · electric-sql/electric

alco · 2026-04-21T11:06:39Z

Summary

Adds a per-shape response size histogram so operators can attribute HTTP response payload volume to individual shapes.

- Emits a new `[:electric, :shape, :response_size]` telemetry event from `ServeShapePlug.end_telemetry_span/_`, tagged with `root_table`, `is_live` and `stack_id`.
- Registers it as a `Telemetry.Metrics.distribution` (`electric.shape.response_size.bytes`, unit `:byte`), which the existing `OtelMetricExporter` reporter publishes as an OTel histogram.
- Bytes are pulled from `conn.assigns[:streaming_bytes_sent]`, the same value already fed into the existing `[:electric, :plug, :serve_shape]` event; no new bookkeeping on the hot path.
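For readers unfamiliar with the two libraries, the emit-and-register pair described above could look roughly like the sketch below. It is not the code from this PR: the module name `ResponseSizeSketch`, the function names, and the `bytes` measurement key are assumptions for illustration, and the script pulls `:telemetry` and `:telemetry_metrics` with `Mix.install` so it can run standalone.

```elixir
# Assumption: run as a standalone script outside a Mix project.
Mix.install([{:telemetry, "~> 1.0"}, {:telemetry_metrics, "~> 1.0"}])

defmodule ResponseSizeSketch do
  # Event name taken from the PR description.
  @event [:electric, :shape, :response_size]

  # Emits the per-shape event. `bytes` falls back to 0 when nothing was streamed.
  # The `bytes` measurement key is an assumption, not confirmed by the PR text.
  def emit(bytes, root_table, is_live, stack_id) do
    :telemetry.execute(@event, %{bytes: bytes || 0}, %{
      root_table: root_table,
      is_live: is_live,
      stack_id: stack_id
    })
  end

  # Metric definition that a reporter could turn into a histogram.
  def metric do
    Telemetry.Metrics.distribution("electric.shape.response_size.bytes",
      event_name: @event,
      measurement: :bytes,
      unit: :byte,
      tags: [:root_table, :is_live, :stack_id]
    )
  end
end
```

Passing `event_name` and `measurement` explicitly keeps the metric name independent of the event name, which is one way to get the `.bytes` suffix without changing the event itself.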

Motivated by operational visibility work — without this, there is no way to tell which shape is responsible for a request-handler binary-memory spike.

Refs stratovolt#1466.

Why a new event rather than re-tagging the existing one

[:electric, :plug, :serve_shape] already feeds a latency distribution (with a keep filter dropping live requests) and span attributes. Layering high-cardinality shape tags onto it would risk affecting those consumers. A dedicated event keeps the intent and the tagging surface isolated.

Test plan

- `mix compile` in `packages/sync-service` and `packages/electric-telemetry`
- `mix test test/electric/plug/serve_shape_plug_test.exs` in `packages/sync-service`
- `mix test` in `packages/electric-telemetry`
- Manual check in a stack that the histogram shows up on the OTel exporter once a request has been served

Emit a new `[:electric, :shape, :response_size]` telemetry event from `ServeShapePlug.end_telemetry_span/_`, tagged with `root_table`, `is_live` and `stack_id`, and register it as a `distribution` metric (`electric.shape.response_size.bytes`, unit: :byte) so the OTel exporter publishes it as a histogram. This lets operators attribute response payload volume to individual shapes and distinguish initial-snapshot traffic from live long-polls without having to correlate with traces.

Session ID: 2026-04-21--sv1466--shape-response-size-metric

claude · 2026-04-21T11:10:14Z

Claude Code Review

Summary

This PR adds a per-shape electric.shape.response_size.bytes OTel histogram by emitting a new [:electric, :shape, :response_size] telemetry event from ServeShapePlug and registering it as a distribution in StackTelemetry. The approach is clean and well-motivated: it reuses the already-computed streaming_bytes_sent value and keeps the new event isolated from the existing [:electric, :plug, :serve_shape] event. Ready to merge with one test gap addressed.

What's Working Well

- Sound separation of concerns: the dedicated event avoids tainting the existing latency distribution with high-cardinality shape tags, which is the right call.
- No hot-path cost: `streaming_bytes_sent` is already tracked; the new event is a cheap metadata attachment.
- `get_root_table/2` fallback chain is well-designed: parsed params first, raw query params as fallback, `nil` as last resort.
- Changeset file included and covers both changed packages.
- Local variable extraction (`stack_id`, `bytes_sent`) makes `end_telemetry_span` cleaner.

Issues Found

Important (Should Fix)

Missing test coverage for the new telemetry event

File: packages/sync-service/test/electric/plug/serve_shape_plug_test.exs

The new [:electric, :shape, :response_size] event is not tested. The codebase has a clear pattern for testing telemetry events (see admission_control_test.exs:207-239). A test should verify that a served shape request emits the event with the correct measurements and metadata.

Minimal example following existing patterns:

```elixir
test "emits response_size telemetry event", %{...} do
  test_pid = self()
  ref = make_ref()
  handler_id = "test-response-size-#{inspect(ref)}"

  :telemetry.attach(
    handler_id,
    [:electric, :shape, :response_size],
    # Forward the event to the test process so it can be asserted on.
    fn _event, measurements, metadata, _config ->
      send(test_pid, {:response_size, measurements, metadata})
    end,
    nil
  )

  # Detach even when an assertion below fails.
  on_exit(fn -> :telemetry.detach(handler_id) end)

  # ... make a request ...

  assert_receive {:response_size, %{bytes: bytes}, %{root_table: table, is_live: false, stack_id: _}}
  assert is_integer(bytes) and bytes >= 0
  assert is_binary(table)
end
```

The mix test check in the test plan confirms existing tests pass but doesn't verify the new behaviour is tested.

Suggestions (Nice to Have)

get_live_mode/1 called twice

File: packages/sync-service/lib/electric/plug/serve_shape_plug.ex:306,322

stack_id and bytes_sent were extracted into local variables, but get_live_mode(assigns) is still called twice. Minor consistency nit:

```elixir
stack_id = get_in(conn.assigns, [:config, :stack_id])
bytes_sent = assigns[:streaming_bytes_sent] || 0
live = get_live_mode(assigns)   # extract once
```

nil root_table in metric labels

File: packages/sync-service/lib/electric/plug/serve_shape_plug.ex:350

When neither parsed params nor raw query params have a table key, get_root_table/2 returns nil. Depending on the OTel backend, this may surface as root_table="" or root_table="nil". An explicit fallback string avoids ambiguity:

```elixir
defp get_root_table(_assigns, _conn), do: "unknown"
```

Consider custom histogram buckets for byte measurements

File: packages/electric-telemetry/lib/electric/telemetry/stack_telemetry.ex:89

The new distribution uses default OTel histogram buckets (tuned for latency, e.g. [0, 5, 10, 25, 50, …, 10000]). For response sizes in bytes, virtually every response will land in the top bucket, reducing histogram utility. Custom buckets like 1 KB → 100 MB would make the metric much more actionable for operators. This is a follow-up consideration if the histogram turns out noisy in practice.
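As an illustration of the bucket suggestion, geometric boundaries covering roughly 1 KB to 100 MB can be generated with a few lines. This is a sketch only: `ByteBuckets` and its defaults are hypothetical, and how boundaries are handed to the exporter is not covered by this review, so that step is left out.

```elixir
defmodule ByteBuckets do
  @moduledoc "Hypothetical helper: geometric bucket boundaries for byte-sized histograms."

  # Returns boundaries starting at `min`, multiplying by `factor` each step,
  # and stopping before a value would exceed `max`.
  # Defaults (assumed, not from the PR): 1 KiB lower bound, 100 MiB upper bound, factor 4.
  def boundaries(min \\ 1024, max \\ 100 * 1024 * 1024, factor \\ 4)
      when is_integer(min) and min > 0 and is_integer(factor) and factor > 1 do
    min
    |> Stream.iterate(&(&1 * factor))
    |> Enum.take_while(&(&1 <= max))
  end
end
```

With the defaults this yields nine boundaries from 1 KiB up to 64 MiB; anything larger falls into the implicit overflow bucket. A smaller factor gives finer resolution at the cost of more buckets per time series.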

Issue Conformance

No linked issue in this repository (refs an external private issue stratovolt#1466). The PR description is clear and self-contained, explaining both what was added and why a separate event was chosen over augmenting the existing one. The implementation matches the stated scope.

The absence of a linked public issue is worth noting per project convention — flagging for awareness rather than blocking.

Review iteration: 1 | 2026-04-21

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2a4b6b5cf


chatgpt-codex-connector · 2026-04-21T11:10:21Z

```diff
+  defp get_root_table(_assigns, %Conn{query_params: %{"table" => table}}) when is_binary(table),
+    do: table
```


Stop labeling histogram with raw table query params

This fallback uses conn.query_params["table"] directly, so requests that fail validation still emit [:electric, :shape, :response_size] with attacker-controlled root_table values (the plug’s error paths still call halt/1, which ends telemetry). Because root_table is a metric tag, varying this query param can create unbounded time-series cardinality and put pressure on OTel/StatsD backends; this defeats the low-cardinality goal noted in the comment. Use only validated table values (or a bounded fallback like nil/"unknown") for metric tags.
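One possible shape for the bounded fallback is sketched below. It is an assumption-laden illustration rather than the plug's actual code: the module `RootTableTag` and the `:validated_params` assigns key are hypothetical stand-ins for wherever the plug keeps parameters that have passed validation.

```elixir
defmodule RootTableTag do
  # Single bounded label used whenever no validated table is available.
  @unknown "unknown"

  # Hypothetical assigns layout: `:validated_params` is only set after
  # request validation succeeds, so raw query params never reach the tag.
  def from_assigns(%{validated_params: %{table: table}}) when is_binary(table), do: table

  # Failed validation, missing params, or a non-binary table all collapse
  # into the same label, keeping tag cardinality bounded.
  def from_assigns(_assigns), do: @unknown
end
```

Because the tag is derived from assigns alone, a request that halts before validation contributes at most one extra label value no matter what the client sends in `table`.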


codecov · 2026-04-21T11:12:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.31%. Comparing base (365dd17) to head (e2a4b6b).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #4144      +/-   ##
==========================================
- Coverage   89.20%   85.31%   -3.90%
==========================================
  Files          25       40      +15
  Lines        2520     3077     +557
  Branches      636      641       +5
==========================================
+ Hits         2248     2625     +377
- Misses        270      450     +180
  Partials        2        2
```

| Flag | Coverage Δ | |
|---|---|---|
| electric-telemetry | `67.68% <ø> (?)` | |
| elixir | `67.68% <ø> (?)` | |
| packages/experimental | `87.73% <ø> (ø)` | |
| packages/react-hooks | `86.48% <ø> (ø)` | |
| packages/start | `82.83% <ø> (ø)` | |
| packages/typescript-client | `94.30% <ø> (ø)` | |
| packages/y-electric | `56.05% <ø> (ø)` | |
| typescript | `89.20% <ø> (ø)` | |
| unit-tests | `85.31% <ø> (-3.90%)` | ⬇️ |


erik-the-implementer added 2 commits April 21, 2026 13:06

chore: add agent task notes for sv1466

e2a4b6b

Session ID: 2026-04-21--sv1466--shape-response-size-metric

alco added the claude label Apr 21, 2026

chatgpt-codex-connector Bot reviewed Apr 21, 2026


alco marked this pull request as draft April 21, 2026 23:58

alco wants to merge 2 commits into `main` from `sv1466-response-size-metric`
