feat(metrics): collect client stats periodically, not only on scrape by miotte · Pull Request #113 · sassoftware/arke

miotte · 2026-05-31T18:26:56Z

Fixes #111

Per-client gauges were refreshed only on /metrics scrape, so a client that connected and disconnected between two scrapes was never recorded. Adds a periodic collectClientStats (15s) started from Serve so a short-lived client is captured when a tick fires, independent of scrape timing.
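A minimal sketch of the periodic-collection idea, for readers who want to see the shape of the change. It is not the code in this PR: `registry`, `gauges`, `collectOnce`, and `collectPeriodically` are hypothetical stand-ins, and only the 15s interval comes from the description above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// registry is a hypothetical stand-in for the server's set of connected clients.
type registry struct {
	mu      sync.Mutex
	clients map[string]bool
}

func (r *registry) connect(id string)    { r.mu.Lock(); r.clients[id] = true; r.mu.Unlock() }
func (r *registry) disconnect(id string) { r.mu.Lock(); delete(r.clients, id); r.mu.Unlock() }

// snapshot returns the IDs of the clients connected right now.
func (r *registry) snapshot() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids := make([]string, 0, len(r.clients))
	for id := range r.clients {
		ids = append(ids, id)
	}
	return ids
}

// gauges is a hypothetical stand-in for the metrics sink: last value per client.
type gauges struct {
	mu     sync.Mutex
	values map[string]float64
}

func (g *gauges) set(id string, v float64) { g.mu.Lock(); g.values[id] = v; g.mu.Unlock() }

func (g *gauges) recorded(id string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	_, ok := g.values[id]
	return ok
}

// collectOnce writes one gauge sample for every currently connected client.
func collectOnce(r *registry, g *gauges) {
	for _, id := range r.snapshot() {
		g.set(id, 1)
	}
}

// collectPeriodically calls collectOnce on every tick until ctx is cancelled.
func collectPeriodically(ctx context.Context, interval time.Duration, r *registry, g *gauges) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			collectOnce(r, g)
		}
	}
}

func main() {
	r := &registry{clients: map[string]bool{}}
	g := &gauges{values: map[string]float64{}}

	// Deterministic walk-through: call collectOnce where a tick would fire.
	r.connect("client-a")
	collectOnce(r, g) // a tick fires while client-a is connected
	r.disconnect("client-a")

	r.connect("client-b")
	r.disconnect("client-b") // gone before any tick or scrape

	fmt.Println("client-a recorded:", g.recorded("client-a")) // true
	fmt.Println("client-b recorded:", g.recorded("client-b")) // false

	// Background wiring, started and stopped immediately just to show the shape.
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() { collectPeriodically(ctx, 15*time.Second, r, g); close(done) }()
	cancel()
	<-done
}
```

The walk-through calls `collectOnce` by hand so the outcome does not depend on timer scheduling; `client-b` shows the gap that remains when a whole lifetime falls between two ticks.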

Per-client gauges were refreshed only when /metrics was scraped, so a client that connected and disconnected between two scrapes never showed up at all: the on-scrape gather runs at scrape time, finds the client already gone, and so never sets its gauge.

Add a periodic collector (collectClientStats) started from Serve that gathers connected-client stats every 15s, independent of scrapes. A client connected when a tick fires has its gauge written to the sink; because the sink expires entries only after 60s (not on disconnect), that value survives to be emitted on the next scrape even though the client has since disconnected. This is the only path by which a client whose whole lifetime falls between two scrapes is recorded at all.

Note this does NOT improve freshness for connected clients: the pre-existing on-scrape gather already re-sets every connected client immediately before the sink is collected, so scraped values are already current and the ticker's writes for them are overwritten before any read.

The benefit is therefore conditional on the scrape interval. It narrows the short-lived-client miss window from the scrape interval down to min(15s, scrape_interval): it helps when Prometheus scrapes less often than every 15s (e.g. 30s or 60s), and is effectively a no-op when the scrape interval is <=15s, since the on-scrape gather then captures the same clients at equal-or-finer granularity. The 15s constant is a fixed guess at 'shorter than a typical scrape'; it cannot see the actual scrape interval.

Minor tradeoff: because ticks keep refreshing each gauge's updatedAt, a disconnected client's last value can linger slightly longer before expiring than under scrape-only collection, bounded by the 60s Expiration either way.

Collection remains intentionally incomplete: a client whose entire lifetime falls between two ticks is still never recorded. The per-client gauges are a best-effort, cardinality-bounded snapshot of currently connected clients: the ClientIdentifier label embeds a per-connection hash, so departed clients must age out via the sink Expiration rather than accumulate forever. Promoting these to per-client counters to catch short-lived clients would defeat that bound and grow cardinality without limit. Where exact throughput is needed it is already captured per message by the aggregate counters arke_recvmsg_total / arke_sendmsg_total in the gRPC interceptors.

Replaces the FIXME with notes describing the sampling strategy, its known gaps, and why they are acceptable.

Fixes #111

Signed-off-by: Michael Otteni <MichaelGOtteni@gmail.com>

miotte requested review from bithckr and dlawregiets as code owners May 31, 2026 18:26

miotte marked this pull request as draft May 31, 2026 18:58

miotte force-pushed the miotte-pr11 branch 2 times, most recently from c416010 to f5e0d2c May 31, 2026 19:30

miotte force-pushed the miotte-pr11 branch from f5e0d2c to 04dc8e4 May 31, 2026 19:54

This was referenced May 31, 2026

fix(server): make MonitorHealthChan resilient to slow and closed notifiers #115

Draft

Client stats only collected on metrics scrape; short-lived clients can be missed #111

Open

miotte wants to merge 1 commit into main from miotte-pr11
