feat(telemetry): per-shape response_size.bytes histogram#4144
feat(telemetry): per-shape response_size.bytes histogram#4144
Conversation
Emit a new `[:electric, :shape, :response_size]` telemetry event from `ServeShapePlug.end_telemetry_span/_`, tagged with `root_table`, `is_live` and `stack_id`, and register it as a `distribution` metric (`electric.shape.response_size.bytes`, unit: :byte) so the OTel exporter publishes it as a histogram. This lets operators attribute response payload volume to individual shapes and distinguish initial-snapshot traffic from live long-polls without having to correlate with traces.
Session ID: 2026-04-21--sv1466--shape-response-size-metric
Claude Code ReviewSummaryThis PR adds a per-shape What's Working Well
Issues FoundImportant (Should Fix)Missing test coverage for the new telemetry event File: The new Minimal example following existing patterns: test "emits response_size telemetry event", %{...} do
test_pid = self()
ref = make_ref()
handler_id = "test-response-size-#{inspect(ref)}"
:telemetry.attach(
handler_id,
[:electric, :shape, :response_size],
fn _event, measurements, metadata, _config ->
send(test_pid, {:response_size, measurements, metadata})
end,
nil
)
# ... make a request ...
assert_receive {:response_size, %{bytes: bytes}, %{root_table: table, is_live: false, stack_id: _}}
assert is_integer(bytes) and bytes >= 0
assert is_binary(table)
:telemetry.detach(handler_id)
endThe Suggestions (Nice to Have)
File:
stack_id = get_in(conn.assigns, [:config, :stack_id])
bytes_sent = assigns[:streaming_bytes_sent] || 0
live = get_live_mode(assigns) # extract once
File: When neither parsed params nor raw query params have a defp get_root_table(_assigns, _conn), do: "unknown"Consider custom histogram buckets for byte measurements File: The new distribution uses default OTel histogram buckets (tuned for latency, e.g. [0, 5, 10, 25, 50, …, 10000]). For response sizes in bytes, virtually every response will land in the top bucket, reducing histogram utility. Custom buckets like 1 KB → 100 MB would make the metric much more actionable for operators. This is a follow-up consideration if the histogram turns out noisy in practice. Issue ConformanceNo linked issue in this repository (refs an external private issue The absence of a linked public issue is worth noting per project convention — flagging for awareness rather than blocking. Review iteration: 1 | 2026-04-21 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2a4b6b5cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| defp get_root_table(_assigns, %Conn{query_params: %{"table" => table}}) when is_binary(table), | ||
| do: table |
There was a problem hiding this comment.
Stop labeling histogram with raw table query params
This fallback uses conn.query_params["table"] directly, so requests that fail validation still emit [:electric, :shape, :response_size] with attacker-controlled root_table values (the plug’s error paths still call halt/1, which ends telemetry). Because root_table is a metric tag, varying this query param can create unbounded time-series cardinality and put pressure on Otel/StatsD backends; this defeats the low-cardinality goal noted in the comment. Use only validated table values (or a bounded fallback like nil/"unknown") for metric tags.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4144 +/- ##
==========================================
- Coverage 89.20% 85.31% -3.90%
==========================================
Files 25 40 +15
Lines 2520 3077 +557
Branches 636 641 +5
==========================================
+ Hits 2248 2625 +377
- Misses 270 450 +180
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
Adds a per-shape response size histogram so operators can attribute HTTP response payload volume to individual shapes.
[:electric, :shape, :response_size]telemetry event fromServeShapePlug.end_telemetry_span/_, tagged withroot_table,is_liveandstack_id.Telemetry.Metrics.distribution(electric.shape.response_size.bytes, unit:byte), which the existingOtelMetricExporterreporter publishes as an OTel histogram.conn.assigns[:streaming_bytes_sent], the same value already fed into the existing[:electric, :plug, :serve_shape]event; no new bookkeeping on the hot path.Motivated by operational visibility work — without this, there is no way to tell which shape is responsible for a request-handler binary-memory spike.
Refs stratovolt#1466.
Why a new event rather than re-tagging the existing one
[:electric, :plug, :serve_shape]already feeds a latency distribution (with akeepfilter dropping live requests) and span attributes. Layering high-cardinality shape tags onto it would risk affecting those consumers. A dedicated event keeps the intent and the tagging surface isolated.Test plan
mix compileinpackages/sync-serviceandpackages/electric-telemetrymix test test/electric/plug/serve_shape_plug_test.exsinpackages/sync-servicemix testinpackages/electric-telemetry