-
Notifications
You must be signed in to change notification settings - Fork 4k
ts: update the recording of changefeed child metrics #158889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
ts: update the recording of changefeed child metrics #158889
Conversation
dhartunian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhartunian reviewed 2 of 2 files at r1, 4 of 4 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @jasonlmfong)
-- commits line 4 at r1:
nit: commit message wrap.
-- commits line 12 at r2:
can you add more detail about why a bunch of histogram wrangling code is necessary? Seems like it's related to child histograms not exposing their internals (because they weren't needed before) and the manual work involved in calculating the percentiles, right?
nit: commit message wrap.
-- commits line 18 at r3:
nit: grammar. also consider rewording. "memoize child metric name transformation" might be clearer.
-- commits line 20 at r3:
Be clear about what motivated the introduction of the cache, and what you expect it to improve. This way someone can understand why this code is here in the future. Otherwise, it seems unmotivated on the surface especially since it's not linked to a ticket.
pkg/server/status/recorder_test.go line 1170 at r1 (raw file):
{"5metrics_100children", 5, 100}, {"10metrics_100children", 10, 100}, {"10metrics_1024children", 10, 1024},
what is the purpose of doing different combos for benchmarks? shouldn't 1 benchmark scale to different number of child metrics?
pkg/server/status/recorder.go line 197 at r3 (raw file):
// childMetricNameCache caches the encoded names for child metrics to avoid // rebuilding them on every recording. Uses sync.Map for lock-free reads. childMetricNameCache sync.Map
linter will tell you this, but you need to use syncutil.Map which supports generics.
pkg/server/status/recorder.go line 476 at r3 (raw file):
timestampNanos: now.UnixNano(), } recorder.recordChangefeedChildMetrics(&data, &mr.childMetricNameCache)
did you consider putting the childMetricNameCache in the registryRecorder struct so you don't have to pass it around?
pkg/server/status/recorder.go line 960 at r3 (raw file):
func hashLabels(labels []*prometheusgo.LabelPair) uint64 { var h uint64 = 14695981039346656037 // FNV-1a offset basis const prime = 1099511628211
surprising to me that you have to do this manually. is there no util or stdlib thing that can accumulate a hash from a bunch of strings? you should use hash/maphash or hash/fnv.
pkg/server/status/recorder.go line 1122 at r3 (raw file):
} else { metricName = prom.GetName(false /* useStaticLabels */) + metric.EncodeLabeledName(childMetric) }
can you extract the shared logic for the hash lookup? seems identical as the histogram one above.
pkg/server/status/recorder_test.go line 366 at r2 (raw file):
// Check for changefeed.stage.downstream_client_send.latency with scope labels if strings.Contains(ts.Name, "cr.node.changefeed.stage.downstream_client_send.latency") {
this is a weird construction for the test to conditionally evaluate so many varieties of cases. can this be encoded as a set of test case structs?
pkg/server/status/recorder.go line 1037 at r2 (raw file):
} childMetricsCount++ })
this code feels like it should be implemented by the child metric structs as a method, no? or is it specialized to this output. maybe this is just how we do non-child histogram recording?
jasonlmfong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
pkg/server/status/recorder.go line 476 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
did you consider putting the childMetricNameCache in the
registryRecorderstruct so you don't have to pass it around?
the registryRecorder is being recreated when we call GetTimeSeriesData() so the cache doesn't live until the next iteration
jasonlmfong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
pkg/server/status/recorder.go line 1037 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
this code feels like it should be implemented by the child metric structs as a method, no? or is it specialized to this output. maybe this is just how we do non-child histogram recording?
we also do non-child histogram recordings in a similar fashion, it's under the extractValue() function
8116a2e to
cb3c2c5
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
This change adds an explicit list of metrics which we perform child metrics collection for. This helps bound the performance impact even when the clsuter setting is turned on. Epic: CRDB-55079 Release: None
When child metrics collection was performed, it only picked up gauges and counters. This commit adds the ability to record histograms as well. This was done through exposing child histograms snapshots which were previously not accessible. Epic: CRDB-55079 Release: None
Curently the child metric name transformations are being performed every polling cycle. The memoization will ensure every subsequent poll does not need to allocate memory and cpu to transform the names. Epic: CRDB-55079 Release: None
cb3c2c5 to
6bf6467
Compare
ts: limit the recording of child metrics to a small set of metrics
This change adds an explicit list of metrics which we perform child metrics collection for.
This helps bound the performance impact even when the clsuter setting is turned on.
Epic: CRDB-55079
Release: None
--
ts: record histogram child metrics
When child metrics collection was performed, it only picked up gauges and counters.
This commit adds the ability to record histograms as well.
This was done through exposing child histograms snapshots which were previously not accessible.
Epic: CRDB-55079
Release: None
--
ts: memoize child metric name transformation
Curently the child metric name transformations are being performed every polling cycle.
The memoization will ensure every subsequent poll does not need to allocate memory and cpu to transform the names.
Epic: CRDB-55079
Release: None