From 8faa8b177b726e77feb1bbabf0bb74127cadb551 Mon Sep 17 00:00:00 2001
From: Dustin Cote <dustin.cote@temporal.io>
Date: Wed, 17 Jun 2026 10:14:24 -0400
Subject: [PATCH] Clarify v0-vs-v1 latency metric semantics and low-traffic
 percentiles

Two OpenMetrics doc gaps surfaced by a customer alerting false alarm after
migrating from the v0 query endpoint to v1 OpenMetrics:

- Migration guide: add a caution that v0 service_latency_sum/count is an
  average (a mean, not a percentile) and _bucket is a count, not a percentile.
  Comparing either against v1 _p95/_p99 typically reports higher values for
  identical traffic. Includes safe-migration steps and a pointer to the p99
  latency SLO.
- Metrics reference: add a note that percentile metrics on low-traffic
  namespaces are computed from small per-minute samples, so a single slow
  request dominates p50/p95/p99. Recommends gating latency alerts on a
  minimum request count, and notes that pre-calculated percentiles cannot be
  re-aggregated into an accurate longer-window percentile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .../metrics/openmetrics/metrics-reference.mdx |  8 ++++++++
 .../metrics/openmetrics/migration-guide.mdx   | 20 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/docs/cloud/metrics/openmetrics/metrics-reference.mdx b/docs/cloud/metrics/openmetrics/metrics-reference.mdx
index da8682d920..e00a3fa272 100644
--- a/docs/cloud/metrics/openmetrics/metrics-reference.mdx
+++ b/docs/cloud/metrics/openmetrics/metrics-reference.mdx
@@ -40,6 +40,14 @@ All metrics are stored as 1 minute aggregates. Rate metrics are therefore per-se
 
 :::
 
+:::note Percentile metrics on low-traffic namespaces
+
+Percentile metrics (`*_p50`, `*_p95`, `*_p99`) are calculated from the requests observed in each 1-minute aggregation window. On a namespace with few requests per minute, that sample is small, so a single slow request can dominate the tail percentiles, and with only one or two requests in a window p50, p95, and p99 all converge toward the slowest observed request. Tail percentiles generally need roughly 20 or more samples per window before they are statistically meaningful (and p99 needs more than p95); below that, values vary widely.
+
+For example, a low-volume namespace that starts one Workflow every few minutes can report a `temporal_cloud_v1_service_latency_p95` of several hundred milliseconds for `StartWorkflowExecution` that reflects a single request, not systemic latency. When alerting on percentile latency for low-traffic namespaces, gate the alert on a minimum request count (for example, [`temporal_cloud_v1_service_request_count`](#temporal_cloud_v1_service_request_count)) so that windows with too few samples don't trigger it. These percentiles are pre-calculated per 1-minute window and cannot be re-aggregated into an accurate longer-window percentile, so widening your evaluation window does not by itself make a sparse sample meaningful.
+
+:::
+
 ### Common Labels
 
 All metrics include these base labels:
diff --git a/docs/cloud/metrics/openmetrics/migration-guide.mdx b/docs/cloud/metrics/openmetrics/migration-guide.mdx
index 32b3c86a33..f69386a6e7 100644
--- a/docs/cloud/metrics/openmetrics/migration-guide.mdx
+++ b/docs/cloud/metrics/openmetrics/migration-guide.mdx
@@ -137,6 +137,26 @@ accurately aggregated_. For example:
 - ✅ Can still view individual namespace/task queue percentiles accurately
 - ✅ More accurate percentile calculations for individual series, especially with outliers
 
+:::caution Don't compare a v0 average against a v1 percentile
+
+The v0 latency metrics are a histogram, not a percentile. Dividing `temporal_cloud_v0_service_latency_sum` by
+`temporal_cloud_v0_service_latency_count` yields an **average** (a mean, only loosely comparable to a p50), and a single
+`temporal_cloud_v0_service_latency_bucket{le="..."}` series only **counts** requests at or below a threshold. Neither is
+a p95 or p99.
+
+If your v0 alert compared an average (or a raw `_sum` / `_bucket` value) against a latency threshold, switching to
+`temporal_cloud_v1_service_latency_p95` / `_p99` typically reports higher values for identical traffic. This is a
+measurement change, not a latency regression.
+
+To migrate latency alerts safely:
+
+- Compare like-for-like. To reproduce a former average-based alert, start on `temporal_cloud_v1_service_latency_p50`,
+  then move to `_p95` / `_p99` deliberately once you have set an appropriate threshold.
+- Confirm which percentile your SLO targets. The Temporal Cloud [latency SLO](/cloud/service-availability#latency) is a
+  **p99**; because p95 never exceeds p99 for the same window, a p95 alert on a p99 threshold fires later than the SLO.
+
+:::
+
 ### 4\. Authentication Setup
 
 **Before**: mTLS certificates with customer-specific endpoint