feat(metrics): collect client stats periodically, not only on scrape#113
Draft
miotte wants to merge 1 commit into
Draft
feat(metrics): collect client stats periodically, not only on scrape#113miotte wants to merge 1 commit into
miotte wants to merge 1 commit into
Conversation
c416010 to
f5e0d2c
Compare
Per-client gauges were refreshed only when /metrics was scraped, so a client that connected and disconnected between two scrapes never showed up at all: the on-scrape gather runs at scrape time, finds the client already gone, and so never sets its gauge. Add a periodic collector (collectClientStats) started from Serve that gathers connected-client stats every 15s, independent of scrapes. A client connected when a tick fires has its gauge written to the sink; because the sink expires entries only after 60s (not on disconnect), that value survives to be emitted on the next scrape even though the client has since disconnected. This is the only path by which a client whose whole lifetime falls between two scrapes is recorded at all. Note this does NOT improve freshness for connected clients: the pre-existing on-scrape gather already re-sets every connected client immediately before the sink is collected, so scraped values are already current and the ticker's writes for them are overwritten before any read. The benefit is therefore conditional on the scrape interval. It narrows the short-lived-client miss window from the scrape interval down to min(15s, scrape_interval): it helps when Prometheus scrapes less often than every 15s (e.g. 30s or 60s), and is effectively a no-op when the scrape interval is <=15s, since the on-scrape gather then captures the same clients at equal-or-finer granularity. The 15s constant is a fixed guess at 'shorter than a typical scrape'; it cannot see the actual scrape interval. Minor tradeoff: because ticks keep refreshing each gauge's updatedAt, a disconnected client's last value can linger slightly longer before expiring than under scrape-only collection, bounded by the 60s Expiration either way. Collection remains intentionally incomplete: a client whose entire lifetime falls between two ticks is still never recorded. The per-client gauges are a best-effort, cardinality-bounded snapshot of currently- connected clients - the ClientIdentifier label embeds a per-connection hash, so departed clients must age out via the sink Expiration rather than accumulate forever. Promoting these to per-client counters to catch short-lived clients would defeat that bound and grow cardinality without limit. Where exact throughput is needed it is already captured per message by the aggregate counters arke_recvmsg_total / arke_sendmsg_total in the gRPC interceptors. Replaces the FIXME with notes describing the sampling strategy, its known gaps, and why they are acceptable. Fixes #111 Signed-off-by: Michael Otteni <MichaelGOtteni@gmail.com>
This was referenced May 31, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #111
Per-client gauges were refreshed only on
/metricsscrape, so a client that connected and disconnected between two scrapes was never recorded. Adds a periodiccollectClientStats(15s) started fromServeso a short-lived client is captured when a tick fires, independent of scrape timing.