feat(telemetry): local telemetry playground using injected OTel sidecar (#342) #286
udsmicrosoft wants to merge 3 commits into documentdb:main
Conversation
Review — PR #286: Local Telemetry Playground + Monitoring Docs

This PR has two parts: (1) a code change to the sidecar injector adding OTel resource attributes, and (2) documentation + playground files for monitoring. Reviewing both.

Code Change:
| Check | Result |
|---|---|
| Env var ordering (`POD_NAME` before `OTEL_RESOURCE_ATTRIBUTES`) | ✅ Correct — K8s resolves `$(VAR)` in declaration order |
| Downward API `metadata.name` field path | ✅ Valid |
| OTel semantic convention `service.instance.id` | ✅ Matches OTel spec |
| Insertion point (after `OTEL_EXPORTER_OTLP_ENDPOINT`, before credentials) | ✅ Clean |
| Existing tests cover this code path | ❌ No lifecycle_test.go found — see Major #3 |
The code change is correct and well-placed.
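For reference, a minimal sketch of the resulting env block on the gateway container — reconstructed from the checks above, not copied from lifecycle.go:

```yaml
# Illustrative only; the real injection happens in Go code, not a manifest.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://localhost:4317
  - name: POD_NAME                  # Downward API
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: OTEL_RESOURCE_ATTRIBUTES  # $(POD_NAME) resolves only because
    value: service.instance.id=$(POD_NAME)  # POD_NAME is declared earlier
```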
🔴 Critical (1)
1. PostgreSQL receiver will fail to authenticate — password is "unused"
The OTel collector config contains:
```yaml
postgresql:
  endpoint: documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432
  username: postgres
  password: "unused"
```

CNPG generates and manages the postgres superuser password in a Kubernetes Secret (typically `<cluster-name>-superuser`). The hardcoded "unused" password will cause the postgresql receiver to fail SQL authentication, meaning all PostgreSQL-level metrics (replication lag, backends, DB size, commits, rollbacks) documented in metrics.md will not actually be collected.
Fix options:

- Mount the CNPG superuser secret into the OTel collector pod and reference it via `${env:PG_MONITOR_PASSWORD}`
- Create a dedicated monitoring role with limited permissions and a known password
- Add a setup step to the README that creates the monitoring credentials

This also applies to `PG_MONITOR_USER: postgres` / `PG_MONITOR_PASSWORD: unused` in the collector deployment env vars.
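A sketch of the first fix option; the secret name `documentdb-preview-superuser` is an assumption based on CNPG's `<cluster-name>-superuser` convention and should be verified against the actual cluster:

```yaml
# Collector Deployment fragment (sketch, not the PR's actual manifest).
env:
  - name: PG_MONITOR_USER
    value: postgres
  - name: PG_MONITOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: documentdb-preview-superuser   # assumed CNPG secret name
        key: password
```

The collector config can then reference it as `${env:PG_MONITOR_PASSWORD}`.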
🟠 Major (3)
2. Monitoring docs missing front matter (title, description, tags)
Both overview.md and metrics.md lack YAML front matter. Per the documentation standards, every page should include:
```yaml
---
title: Monitoring Overview
description: How to monitor DocumentDB clusters using OpenTelemetry, Prometheus, and Grafana.
tags:
  - monitoring
  - observability
  - metrics
---
```

This affects search ranking, AI discoverability, and consistency with other documentation pages.
3. No tests for the lifecycle.go code change
The sidecar injector's lifecycle.go has no test file. The new POD_NAME and OTEL_RESOURCE_ATTRIBUTES env vars are injected into every DocumentDB gateway container — a functional change that affects all deployments. While the change is straightforward, a unit test would prevent regression (e.g., if someone reorders the env vars, breaking $(POD_NAME) resolution).
Suggestion: Add a test in operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go that verifies:
- `POD_NAME` env var is present with `FieldRef` to `metadata.name`
- `OTEL_RESOURCE_ATTRIBUTES` is present and contains `service.instance.id=$(POD_NAME)`
- `POD_NAME` appears before `OTEL_RESOURCE_ATTRIBUTES` in the env slice
4. Traffic generator and observability stack contain hardcoded passwords
Multiple files use plaintext passwords:
- `documentdb-ha.yaml`: `password: DemoPassword100`
- `traffic-generator.yaml`: `-p DemoPassword100`
- `observability-stack.yaml`: `GF_SECURITY_ADMIN_PASSWORD: admin`, `GF_AUTH_ANONYMOUS_ORG_ROLE: Admin`
For a playground/demo this is acceptable, but:

- The Grafana anonymous admin access (`GF_AUTH_ANONYMOUS_ENABLED: true` + `GF_AUTH_ANONYMOUS_ORG_ROLE: Admin`) should have a comment warning this is for demo use only
- Consider adding a `⚠️ DO NOT USE IN PRODUCTION` header comment in `observability-stack.yaml`
🟡 Minor (5)
5. Gateway metric names cannot be verified against source code
The docs reference metrics like db_client_operations_total, gateway_client_connections_active, db_client_connection_active, etc. These are emitted by the gateway binary (separate repo), not the operator Go code. I cannot verify these metric names against this repository.
Recommendation: Add a note in metrics.md indicating these metric names are from the DocumentDB Gateway and may change between versions.
6. Collector deployed as Deployment but kubeletstats only covers one node
The kubeletstats receiver with K8S_NODE_NAME env var only scrapes the node where the collector pod runs. In the 4-node Kind cluster (1 control-plane + 3 workers), you'll miss kubelet metrics from 3 of 4 nodes.
Recommendation: Add a note in the observability-stack.yaml or overview.md that kubeletstats coverage is limited in Deployment mode (vs DaemonSet).
7. Kind config uses containerd v1 registry path
The kind-config.yaml uses [plugins."io.containerd.grpc.v1.cri".registry]. With kindest/node:v1.35.0 and containerd 2.x, this path may have changed to io.containerd.cri.v1.images. Verify against the target Kind version.
8. K8S_VERSION defaults to v1.35.0 in setup-kind.sh
This pins to a very specific Kind image. If the image isn't available, the script fails. Consider defaulting to a known-good version or adding a version check.
9. The documentdb-preview-rw endpoint is hardcoded in collector config
The OTel collector config hardcodes documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432. This only works for a cluster named documentdb-preview in namespace documentdb-preview-ns. Document this coupling or make it configurable via env vars.
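One way to decouple this, sketched with hypothetical `PG_HOST`/`PG_PORT` env vars (`${env:VAR}` substitution is supported by the OTel Collector config loader):

```yaml
# Sketch: assemble the endpoint from env vars set on the collector
# Deployment instead of hardcoding the service DNS name.
receivers:
  postgresql:
    endpoint: ${env:PG_HOST}:${env:PG_PORT}   # hypothetical env vars
    username: ${env:PG_MONITOR_USER}
    password: ${env:PG_MONITOR_PASSWORD}
```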
✅ What's Good
- Clean code change: The `POD_NAME`/`OTEL_RESOURCE_ATTRIBUTES` injection is well-implemented with correct K8s env var ordering
- Complete observability stack: Tempo (traces) + Loki (logs) + Prometheus (metrics) + Grafana (dashboards) + OTel Collector — full three-pillar observability
- ExternalName service bridge: Clever cross-namespace OTLP routing from DocumentDB namespace to observability namespace
- Comprehensive metrics reference: `metrics.md` covers container, gateway, operator, and PostgreSQL metrics with useful PromQL examples
- Architecture diagram: Clear ASCII art showing data flow from pods → collector → backends → Grafana
- OTel naming table: OpenTelemetry ↔ Prometheus metric name mapping table is very helpful
- Traffic generator: Well-designed mongosh script with varied CRUD operations, periodic cleanup, and error handling
- mkdocs.yml change is clean: Only adds nav entries, no extension conflicts with other PRs
- Grafana dashboard JSON: Pre-built 1528-line dashboard — users get instant value
- RBAC for collector: Proper ClusterRole with nodes/stats, pods, services, metrics access
Summary
| Severity | Count | Items |
|---|---|---|
| 🔴 Critical | 1 | PostgreSQL receiver auth will fail (password: "unused") |
| 🟠 Major | 3 | Missing front matter, no lifecycle tests, hardcoded passwords need warnings |
| 🟡 Minor | 5 | Unverifiable gateway metrics, kubeletstats single-node, containerd v2, K8s version pin, hardcoded endpoint |
| ✅ | — | Code change correct, complete stack, clean mkdocs, good architecture |
Verdict: The code change to lifecycle.go is correct and adds genuine value (per-instance metric attribution). The observability stack is comprehensive and well-designed. Fix the PostgreSQL receiver authentication (Critical #1) — without it, half the documented metrics won't actually be collected. Add front matter to the docs and ideally a unit test for the env var injection.
Pull request overview
Adds a local “telemetry playground” for running a DocumentDB HA demo on Kind with a full observability stack, and introduces monitoring documentation pages (with MkDocs nav updates) describing the telemetry/metrics architecture and references. It also updates the CNPG sidecar injector to set per-pod OpenTelemetry resource attributes for better per-instance attribution.
Changes:

- Add `documentdb-playground/telemetry/local/demo` (Kind setup/deploy/teardown scripts, k8s manifests, traffic generator, Grafana dashboards).
- Add Monitoring docs (`overview.md`, `metrics.md`) and update `mkdocs.yml` nav + Mermaid fenced blocks support.
- Update sidecar injector to inject `POD_NAME` and `OTEL_RESOURCE_ATTRIBUTES` (and enable tracing via env var).
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go | Injects additional OTEL-related env vars into the gateway sidecar for per-instance attribution. |
| mkdocs.yml | Adds Monitoring section to nav and enables Mermaid code fences via pymdownx.superfences. |
| documentdb-playground/telemetry/local/scripts/teardown.sh | Adds Kind cluster teardown + proxy container cleanup. |
| documentdb-playground/telemetry/local/scripts/setup-kind.sh | Adds Kind cluster + local registry bootstrap script. |
| documentdb-playground/telemetry/local/scripts/deploy.sh | Adds one-command deploy flow for observability stack + DocumentDB + traffic. |
| documentdb-playground/telemetry/local/k8s/traffic/traffic-generator.yaml | Adds RW/RO traffic generator Jobs and gateway Services for demo load. |
| documentdb-playground/telemetry/local/k8s/observability/tempo.yaml | Adds Tempo deployment/service for traces backend. |
| documentdb-playground/telemetry/local/k8s/observability/prometheus.yaml | Adds Prometheus deployment/service and scrape config for the collector. |
| documentdb-playground/telemetry/local/k8s/observability/otel-collector.yaml | Adds OTel Collector deployment, RBAC, and pipelines for metrics/traces/logs. |
| documentdb-playground/telemetry/local/k8s/observability/namespace.yaml | Adds observability namespace manifest. |
| documentdb-playground/telemetry/local/k8s/observability/loki.yaml | Adds Loki deployment/service for logs backend. |
| documentdb-playground/telemetry/local/k8s/observability/grafana.yaml | Adds Grafana deployment/service plus datasource & dashboard provisioning. |
| documentdb-playground/telemetry/local/k8s/documentdb/collector-bridge.yaml | Adds ExternalName bridge Service so gateway OTLP endpoint resolves across namespaces. |
| documentdb-playground/telemetry/local/k8s/documentdb/cluster.yaml | Adds demo DocumentDB namespace, credentials, and DocumentDB CR for 3-instance HA. |
| documentdb-playground/telemetry/local/dashboards/internals.json | Adds “Internals” Grafana dashboard JSON. |
| documentdb-playground/telemetry/local/dashboards/gateway.json | Adds “Gateway” Grafana dashboard JSON. |
| documentdb-playground/telemetry/local/README.md | Documents local playground usage and architecture (diagram + steps). |
| docs/operator-public-documentation/preview/monitoring/overview.md | Adds monitoring architecture + setup/verification guidance. |
| docs/operator-public-documentation/preview/monitoring/metrics.md | Adds metrics reference and PromQL examples across signal sources. |
> - **Helm 3** — [install](https://helm.sh/docs/intro/install/)
> - **jq** — for credential copying
>
> !!! important "Gateway OTEL support required"

These docs don't use the mkbuild stuff, so the `!!! important` tag doesn't work here.
> ./scripts/validate.sh
>
> # Tear down
> ./scripts/teardown.sh

Maybe this could have a separate heading for cleanup instead of being in quickstart.

I see you do have a section for it, so I think it can be removed here.
WentingWu666666 left a comment
docs/operator-public-documentation/preview/monitoring/overview.md (lines 23-30)
Instead of asking users to build their own gateway image from a specific branch, it would be better to publish a pre-built gateway image (e.g., to ghcr.io/documentdb/...) that includes the OTEL instrumentation. Requiring users to build the image themselves adds friction and is error-prone; most users won't have the build toolchain set up for the gateway. Could we provide a ready-to-use image with telemetry support, or at least track an issue to do so before GA?
Great work on the documentation, dashboards, and overall polish! The metrics reference and alerting rules are really nice. I'd like to discuss the collector architecture before we merge. The current design uses a central Deployment + DaemonSet, which has some limitations for production:

1. Replica blind spot: the collector's `postgresql` receiver targets only the `-rw` (primary) service, so replica instances aren't scraped.
2. Hardcoded endpoint: the PG endpoint `documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432` only works for one cluster name and namespace.
3. Cross-namespace coupling: the ExternalName bridge service is needed because the collector lives in a different namespace. This adds operational complexity and breaks if the CR is renamed.

Alternative: OTel Collector as a sidecar (injected by the CNPG plugin). I've been prototyping this approach.

Scope suggestion: for the initial merge, I'd suggest we reduce scope to VM/container metrics only (CPU, memory, network, filesystem).

Happy to walk through the sidecar POC and discuss tradeoffs; the docs and dashboards from this PR would pair perfectly with the sidecar approach.
…sidecar

- Operator: add OTLP gRPC :4317 receiver to embedded base_config.yaml and include it in the generated metrics pipeline.
- Sidecar-injector: when monitoring is enabled, set POD_NAME (downward API) and OTEL_RESOURCE_ATTRIBUTES=service.instance.id=$(POD_NAME) on the gateway container alongside the existing OTEL_EXPORTER_OTLP_ENDPOINT.
- Playground: remove central OTel Collector, per-node DaemonSet, per-instance Services, ExternalName bridge, and the monitoring PG role.
- Cluster manifest now sets spec.monitoring.enabled=true so each pod gets an in-pod otel-collector container; Prometheus discovers them via the pod annotations the injector adds.
- Prometheus config switched to pod-annotation discovery + native kubelet / cAdvisor scrape; deploy.sh installs operator chart from this branch.
- Dashboards remapped to cAdvisor metric names; Postgres-internal panels removed (deferred to a follow-up that extends base_config.yaml SQL).
- Public monitoring docs and playground README rewritten for the sidecar architecture; Postgres metrics explicitly marked deferred.

Signed-off-by: urismiley <urismiley@microsoft.com>
…review

- lifecycle.go: inject OTEL_METRICS_ENABLED=true on gateway container (pgmongo gates OTel init behind it); extract injectGatewayOTelEnv helper with name-based dedup; add lifecycle_test.go (5 cases).
- base_config.yaml: bind OTLP receiver to 127.0.0.1:4317.
- cluster.yaml: pin gateway image with OTel support; sidecar prom port 9188.
- dashboards: pivot Instance variable on pod label (Prometheus rewrites service.instance.id to exported_instance).
- validate.sh: poll up to 120s for db_client_operations_total and fail red.
- README/prometheus.yaml: document 9187/9188 split; add Gateway image section.
- .gitignore: ignore generated operator/src/documentdb-chart/.

Signed-off-by: urismiley <urismiley@microsoft.com>
Suggestion: Add a `kubeletstats` receiver to the sidecar collector.

Currently, per-container CPU/memory metrics (cAdvisor) are only available via the Prometheus kubelet scrape path. Since the OTel Collector sidecar already supports both push (OTLP exporter) and pull (Prometheus exporter on `:9188`), container resource metrics could flow through it as well:

```yaml
receivers:
  otlp: ...        # gateway app metrics (existing)
  kubeletstats:    # container resource metrics (new)
    auth_type: serviceAccount
    collection_interval: 30s
    metric_groups: [pod, container]
```

This gives users pod- and container-level resource metrics from the same collector. The only additional requirement is RBAC for the pod's ServiceAccount to read `nodes/stats`. With this change, the sidecar would cover both application and container metrics.
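A minimal sketch of that RBAC piece, with an illustrative role name (the central collector's ClusterRole in this PR already grants similar access):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-sidecar-kubeletstats    # illustrative name
rules:
  - apiGroups: [""]
    resources: ["nodes/stats"]       # kubelet stats API read by kubeletstats
    verbs: ["get"]
```

A matching ClusterRoleBinding to the DocumentDB pod's ServiceAccount would complete it.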
Suggestion: Split this PR into smaller, focused PRs by metric category.

This PR covers a lot of ground (2900+ lines, 21 files). The operator core changes (sidecar injection, OTel base_config, lifecycle env var injection) look complete. I'd suggest landing that as-is and then adding metrics support incrementally by category.

Each can be its own PR with matching docs, dashboards, and playground updates. Easier to review, easier to revert, and it unblocks the foundational pieces while the rest iterates.
Summary
Adds a self-contained local telemetry playground (Kind + Prometheus + Grafana + dashboards + traffic generators) and aligns it with the injected OTel Collector sidecar introduced by #342.
Architecture (re-aligned on #342)
Every DocumentDB pod runs a per-pod OTel Collector sidecar injected by the CNPG sidecar-injector plugin (`spec.monitoring.enabled=true`). There is no central OTel Collector Deployment and no per-node DaemonSet in this stack:

- `otlp` receiver on `127.0.0.1:4317` → re-exported via the `prometheus` exporter on `:9188`.
- The CNPG instance-manager's metrics on `:9187` are scraped directly.
operator/)internal/otel/base_config.yaml— adds anotlpreceiver bound to127.0.0.1:4317(loopback only; gateway and collector share pod netns) and wires it into themetricspipeline alongside the existingsqlqueryreceiver.cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go— when injecting the sidecar, also adds OTel env vars to the gateway container:POD_NAME(downward API)OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317OTEL_METRICS_ENABLED=true← required: the gateway gates OTel init off by default and checks this env varOTEL_RESOURCE_ATTRIBUTES=service.instance.id=$(POD_NAME)Injection is idempotent across CREATE and PATCH via name-based dedup — without this, repeated reconciles double-append entries and CNPG fails with
Pod is invalid: spec: Forbidden: pod updates may not change fields other than ....New
lifecycle_test.gocovers: all-appended, metrics-enabled-present, preserves-existing, idempotent re-invocation, no-gateway-container.internal/otel/config_test.goextended for the newotlpreceiver and pipeline wiring.Playground (
documentdb-playground/telemetry/local/)One-command deploy:
./scripts/deploy.sh— Kind cluster + cert-manager + operator (from this branch) + observability stack + DocumentDB HA + traffic generators.Dashboards' Instance variable pivots on the
podlabel (set by the collector'sresourceprocessor) — Prometheus's scrapeinstancelabel clobbers OTel'sservice.instance.id.Sidecar Prometheus port is 9188 (not the conventional 9187, which collides with the CNPG instance-manager in the same pod netns).
validate.shpolls up to 120s fordb_client_operations_totaland fails red if absent — surfaces gateway-image issues clearly instead of warning.The default upstream gateway image does not yet emit OTLP
db_client_*metrics — that instrumentation lives in pgmongo and has not been published in an upstream release. The playground'scluster.yamltherefore pins:Once an upstream release ships with OTel metrics, drop the pin to fall back to the operator default. README §Gateway image documents the build recipe (pgmongo
oss→ emulator-shape image → re-wrap with this repo'sDockerfile_gateway_public_imagefor the K8s slim shape).Documentation (
docs/.../monitoring/)base_config.yaml, ServiceMonitor examplesmkdocs.ymlnavigation