Appinsights implementation #237
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| # Application Insights Telemetry Configuration | ||
|
|
||
| This document describes how to configure Application Insights telemetry collection for the DocumentDB Kubernetes Operator. | ||
|
|
||
| ## Overview | ||
|
|
||
| The DocumentDB Operator can send telemetry data to Azure Application Insights to help monitor operator health, track cluster lifecycle events, and diagnose issues. All telemetry is designed with privacy in mind - no personally identifiable information (PII) is collected. | ||
|
|
||
| ## Configuration | ||
|
|
||
| ### Environment Variables | ||
|
|
||
| Configure telemetry by setting these environment variables in the operator deployment: | ||
|
|
||
| | Variable | Description | Required | | ||
| |----------|-------------|----------| | ||
| | `APPINSIGHTS_INSTRUMENTATIONKEY` | Application Insights instrumentation key | Yes (or connection string) | | ||
| | `APPLICATIONINSIGHTS_CONNECTION_STRING` | Application Insights connection string (alternative to instrumentation key) | Yes (or instrumentation key) | | ||
| | `DOCUMENTDB_TELEMETRY_ENABLED` | Set to `false` to disable telemetry collection | No (default: `true`) | | ||
|
|
||
| > **Note on naming convention:** `APPINSIGHTS_INSTRUMENTATIONKEY` and `APPLICATIONINSIGHTS_CONNECTION_STRING` are the | ||
| > [official Microsoft Application Insights SDK environment variable names](https://learn.microsoft.com/en-us/azure/azure-monitor/app/connection-strings). | ||
| > The naming difference (`APPINSIGHTS_` vs `APPLICATIONINSIGHTS_`) reflects Microsoft's SDK conventions, not an inconsistency in this project. | ||
|
|
||
| ### Helm Chart Configuration | ||
|
|
||
| When installing via Helm, you can configure telemetry in your values.yaml: | ||
|
|
||
| ```yaml | ||
| # values.yaml | ||
| telemetry: | ||
| enabled: true | ||
| instrumentationKey: "YOUR-INSTRUMENTATION-KEY-HERE" | ||
| # Or use connection string: | ||
| # connectionString: "InstrumentationKey=xxx;IngestionEndpoint=https://..." | ||
| # Or use an existing secret containing APPINSIGHTS_INSTRUMENTATIONKEY / APPLICATIONINSIGHTS_CONNECTION_STRING: | ||
| # existingSecret: "documentdb-operator-telemetry" | ||
| ``` | ||
|
|
||
| ### Kubernetes Secret | ||
|
|
||
| For production deployments, store the instrumentation key in a Kubernetes secret: | ||
|
|
||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: Secret | ||
| metadata: | ||
| name: documentdb-operator-telemetry | ||
| namespace: documentdb-system | ||
| type: Opaque | ||
| stringData: | ||
| APPINSIGHTS_INSTRUMENTATIONKEY: "YOUR-INSTRUMENTATION-KEY-HERE" | ||
|
Collaborator
Another underscore would give better separation: APPINSIGHTS_INSTRUMENTATION_KEY
Collaborator
Author
See above comment. |
||
| ``` | ||
|
|
||
| Then reference it in the operator deployment: | ||
|
|
||
| ```yaml | ||
| envFrom: | ||
| - secretRef: | ||
| name: documentdb-operator-telemetry | ||
| ``` | ||
|
|
||
| ## Privacy & Data Collection | ||
|
|
||
| ### What We Collect | ||
|
|
||
| The operator collects anonymous, aggregated telemetry data including: | ||
|
|
||
| - **Operator lifecycle**: Startup events, health status, version information | ||
| - **Cluster operations**: Create, update, delete events (with timing metrics) | ||
| - **Backup operations**: Backup creation, completion, and expiration events | ||
| - **Error tracking**: Categorized errors (no raw error messages with sensitive data) | ||
| - **Performance metrics**: Reconciliation duration, API call latency | ||
|
|
||
| ### What We DON'T Collect | ||
|
|
||
| To protect your privacy, we explicitly do NOT collect: | ||
|
|
||
| - Cluster names, namespace names, or any user-provided resource names | ||
| - Connection strings, passwords, or credentials | ||
| - IP addresses or hostnames | ||
| - Storage class names (may contain organizational information) | ||
| - Raw error messages (only categorized error types) | ||
| - Container image names | ||
|
|
||
| ### Privacy Protection Mechanisms | ||
|
|
||
| 1. **GUIDs Instead of Names**: All resources are identified by auto-generated GUIDs stored in annotations (`telemetry.documentdb.io/cluster-id`) | ||
| 2. **Hashed Namespaces**: Namespace names are SHA-256 hashed before transmission | ||
| 3. **Categorized Data**: Values like PVC sizes are categorized (small/medium/large) instead of exact values | ||
| 4. **Error Sanitization**: Error messages are stripped of potential PII and truncated | ||
|
|
||
| ## Disabling Telemetry | ||
|
|
||
| To completely disable telemetry collection: | ||
|
|
||
| 1. **Via environment variable**: | ||
| ```yaml | ||
| env: | ||
| - name: DOCUMENTDB_TELEMETRY_ENABLED | ||
| value: "false" | ||
| ``` | ||
|
|
||
| 2. **Via Helm** (at install time): | ||
| ```yaml | ||
| telemetry: | ||
| enabled: false | ||
| ``` | ||
|
Comment on lines +103 to +108
Collaborator
Can they turn off telemetry on an already provisioned/running cluster?
Collaborator
Author
Yes, users can run `helm upgrade --set telemetry.enabled=false`. I can add this to the documentation if needed. |
||
|
|
||
| 3. **Via Helm upgrade** (on an already running cluster): | ||
| ```bash | ||
| helm upgrade documentdb-operator ./operator/documentdb-helm-chart \ | ||
| --namespace documentdb-operator \ | ||
| --set telemetry.enabled=false | ||
| ``` | ||
| This restarts the operator pod with telemetry disabled. Running DocumentDB clusters see no data loss or downtime; only the operator pod restarts. | ||
|
|
||
| 4. **Omit the instrumentation key**: If neither `APPINSIGHTS_INSTRUMENTATIONKEY` nor `APPLICATIONINSIGHTS_CONNECTION_STRING` is set, telemetry is automatically disabled. | ||
|
|
||
| ## Telemetry Events Reference | ||
|
|
||
| See [appinsights-metrics.md](appinsights-metrics.md) for the complete specification of all telemetry events and metrics collected. | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Telemetry Not Being Sent | ||
|
|
||
| 1. Verify the instrumentation key is correctly configured: | ||
| ```bash | ||
| kubectl get deployment documentdb-operator -n documentdb-system -o yaml | grep -A5 APPINSIGHTS | ||
| ``` | ||
|
|
||
| 2. Check operator logs for telemetry initialization: | ||
| ```bash | ||
| kubectl logs -n documentdb-system -l app=documentdb-operator | grep -i telemetry | ||
| ``` | ||
|
|
||
| 3. Verify network connectivity to the Application Insights ingestion endpoint (`dc.services.visualstudio.com` by default, or the `IngestionEndpoint` host from your connection string). | ||
|
|
||
| ### High Cardinality Warnings | ||
|
|
||
| If you see warnings about high cardinality dimensions, this indicates too many unique values for a dimension. The telemetry system automatically samples high-frequency events to mitigate this. | ||
|
|
||
| ## Support | ||
|
|
||
| For issues related to telemetry collection, please open an issue on the [GitHub repository](https://github.com/documentdb/documentdb-kubernetes-operator/issues). | ||
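
The "Privacy Protection Mechanisms" list in the proposed document names two transformations that are easy to illustrate: SHA-256 hashing of namespace names and bucketing of PVC sizes. The following is a minimal Go sketch of what such helpers might look like, not the PR's actual implementation. The function names `hashNamespace` and `categorizePVCSize` and the 50 GiB / 500 GiB thresholds are assumptions made for illustration; the document only states that sizes are reported as small/medium/large.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// gib is one gibibyte in bytes.
const gib = int64(1) << 30

// hashNamespace returns the hex-encoded SHA-256 digest of a namespace name,
// so the raw name never leaves the cluster. (Hypothetical helper name.)
func hashNamespace(namespace string) string {
	sum := sha256.Sum256([]byte(namespace))
	return hex.EncodeToString(sum[:])
}

// categorizePVCSize maps an exact size in bytes to a coarse bucket.
// The thresholds are illustrative assumptions, not values from the PR.
func categorizePVCSize(sizeBytes int64) string {
	switch {
	case sizeBytes < 50*gib:
		return "small"
	case sizeBytes < 500*gib:
		return "medium"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(hashNamespace("team-payments"))
	fmt.Println(categorizePVCSize(10 * gib))
	fmt.Println(categorizePVCSize(2048 * gib))
}
```

One design point worth weighing: an unsalted hash of a short, guessable name can be reversed by hashing candidate names, so a per-installation salt may be worth considering if namespace names are treated as sensitive.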
APPINSIGHTS vs APPLICATIONINSIGHTS. We might stick to one name.
These are actually official Microsoft names that the AppInsights Go SDK expects. I included both because a user could use either one to pass in their own string. I added a section in the design doc that explains this a bit more.
APPINSIGHTS_INSTRUMENTATIONKEY: Takes a bare instrumentation key (just the GUID, e.g., f5614a64-9358-44db-b19b-18a2eb54f623). This is the legacy method from the original Application Insights SDK. The Go SDK we use (microsoft/ApplicationInsights-Go) reads this variable.
APPLICATIONINSIGHTS_CONNECTION_STRING: Takes a full connection string (e.g., InstrumentationKey=f5614a64-...;IngestionEndpoint=https://westus2.in.applicationinsights.azure.com/). This is the modern connection-string method described at https://learn.microsoft.com/en-us/azure/azure-monitor/app/connection-strings
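
To make the relationship between the three variables concrete, here is a rough Go sketch of how an operator could resolve them at startup. It is an illustration under stated assumptions, not the code in this PR: the `telemetryConfig` type, the `parseConnectionString` and `resolveTelemetryConfig` helpers, and the choice to prefer the connection string over the bare key when both are set are all hypothetical. Only the variable names and the "no key means disabled" behavior come from the document.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// telemetryConfig is a hypothetical container for the resolved settings.
type telemetryConfig struct {
	Enabled            bool
	InstrumentationKey string
	IngestionEndpoint  string // empty when only a bare key was supplied
}

// parseConnectionString splits "Key=Value;Key=Value" pairs into a map
// keyed by lower-cased field name. Malformed segments are skipped.
func parseConnectionString(cs string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(cs, ";") {
		key, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok || key == "" {
			continue
		}
		fields[strings.ToLower(key)] = value
	}
	return fields
}

// resolveTelemetryConfig reads the settings through getenv so the logic
// can be exercised without touching the real process environment.
func resolveTelemetryConfig(getenv func(string) string) telemetryConfig {
	// An explicit opt-out wins over everything else.
	if strings.EqualFold(getenv("DOCUMENTDB_TELEMETRY_ENABLED"), "false") {
		return telemetryConfig{}
	}
	// Assumption: the connection string takes precedence over the bare key.
	if cs := getenv("APPLICATIONINSIGHTS_CONNECTION_STRING"); cs != "" {
		fields := parseConnectionString(cs)
		if key := fields["instrumentationkey"]; key != "" {
			return telemetryConfig{
				Enabled:            true,
				InstrumentationKey: key,
				IngestionEndpoint:  fields["ingestionendpoint"],
			}
		}
	}
	if key := getenv("APPINSIGHTS_INSTRUMENTATIONKEY"); key != "" {
		return telemetryConfig{Enabled: true, InstrumentationKey: key}
	}
	// Neither variable is set: telemetry stays disabled.
	return telemetryConfig{}
}

func main() {
	cfg := resolveTelemetryConfig(os.Getenv)
	fmt.Printf("telemetry enabled: %t\n", cfg.Enabled)
}
```

Passing `os.Getenv` as a function value keeps the resolution logic testable with a plain map in unit tests.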
The Go SDK this PR depends on (github.com/microsoft/ApplicationInsights-Go) is archived and explicitly unsupported:
"This SDK is NOT currently maintained or supported by Microsoft. Azure Monitor only provides support when using our supported SDKs, and this SDK does not yet meet that standard."
Known gaps listed by Microsoft themselves:
Building a new telemetry feature on an archived, unsupported dependency is risky: any bugs or security issues won't get upstream fixes, and it may break with future App Insights backend changes.
Recommended alternative: OpenTelemetry Go SDK + OTel Collector sidecar
Microsoft's recommended path for Go is the standard OpenTelemetry Go SDK (go.opentelemetry.io/otel) + an OTel Collector with the Azure Monitor exporter. This is:
We already have an OTel Collector sidecar architecture in progress — see PR #286 by Uri, where I've left a review comment recommending this approach. Your telemetry PR would build on top of that foundation: the operator emits custom events/metrics via the OTel SDK, and the collector routes them to App Insights. This unifies the telemetry story — one SDK (OTel) for both user-facing monitoring and product telemetry.
I'd suggest coordinating with Uri to get PR #286 merged first. If needed, I can help create a simpler PR to add the collector sidecar to unblock your work here.