| If you don't yet know how much telemetry your workloads generate, start with [How to evaluate telemetry volume](/how-to/configure-and-tune/evaluate-telemetry-volume). | ||
|
|
||
| Follow the [storage best practices](/reference/storage) to set up a distributed storage backend with a replication factor of 3. | ||
| Do **not** use `hostPath` storage in production. |
Perhaps add a brief explanation of why not to use `hostPath`?
| Only the Grafana and Traefik charms support auth. | ||
| For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana. |
One minor suggestion:
| Only the Grafana and Traefik charms support auth. | |
| For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana. | |
| Only the Grafana and Traefik charms support authentication. | |
| To expose Grafana publicly, deploy two Traefik charms: one for internal connections and another for external access to provide ingress. |
Pull request overview
Adds a new installation guide under the “Deploy and manage” docs, and updates related navigation/style guidance in the documentation set.
Changes:
- Add `How to install COS` guide with preparation checklist and initial deployment sections.
- Update the deploy-and-manage index to include the new install page and reorganize the Upgrades section.
- Expand telemetry-volume guidance with manual evaluation steps for metrics/logs; update docs review prompt to specify US English spelling.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `docs/how-to/deploy-and-manage/install.md` | New installation guide content for COS/COS Lite (prep + initial deployment outline). |
| `docs/how-to/deploy-and-manage/index.md` | Adds the install page to the Deploy toctree and moves Upgrades into its own section. |
| `docs/how-to/configure-and-tune/evaluate-telemetry-volume.md` | Adds manual evaluation subsections for metrics/logs, but introduces/contains formatting issues that can break rendering. |
| `.github/prompts/review-docs.prompt.md` | Updates docs review checklist to specify US English spelling. |
Comments suppressed due to low confidence (1)
docs/how-to/configure-and-tune/evaluate-telemetry-volume.md:66
- The code fence closing delimiter here has four backticks instead of three, which will break Markdown rendering for the rest of the page. Make the opening/closing code fence delimiters match.
```
sum(rate(loki_distributor_bytes_received_total[5m]) / 1e9 * 60*60*24)
sum(rate(loki_distributor_lines_received_total[5m]) * 60)
```
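As a rough worked example of the arithmetic behind these two queries, the sketch below converts a per-second ingestion rate into GB per day and lines per minute. It assumes the multiplier in the first query is meant to be `60*60*24` (seconds per day); the function names and the sample rates are hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # assumed meaning of the multiplier in the first query


def gb_per_day(bytes_per_second: float) -> float:
    """Scale a bytes/second rate to gigabytes (1e9 bytes) per day."""
    return bytes_per_second / 1e9 * SECONDS_PER_DAY


def lines_per_minute(lines_per_second: float) -> float:
    """Scale a lines/second rate to lines per minute."""
    return lines_per_second * 60


# Hypothetical rates, as they might be returned by rate(...[5m]).
daily_gb = gb_per_day(1_000_000)      # 1 MB/s of log bytes
minute_lines = lines_per_minute(250)  # 250 log lines/s
print(f"{daily_gb:.1f} GB/day, {minute_lines:.0f} lines/min")
```

Under these assumptions, a sustained 1 MB/s of log ingestion comes to roughly 86.4 GB per day, which is the kind of figure storage sizing would start from.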
|
|
||
| ### COS flavor | ||
|
|
||
| The [flavor of COS](/explanation/overview/what-is-cos) to install depends on your use-case. |
| ## Create Terraform plan | ||
|
|
||
| ```hcl | ||
| module ... { ... } | ||
| ``` |
|
|
||
| ## Deploy COS Alerter | ||
|
|
||
| COS Alerter is a watchdog service for COS you should deploy on a physically different cloud.
| curl -sf localhost:8080/metrics | grep -v "^# " | wc -l | ||
| ``` | ||
|
|
||
| This will give you the number of timeseries that will be created for the workload, per unit. |
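A minimal Python sketch of the same count, assuming the endpoint returns Prometheus text exposition format. The sample payload and the `count_series` helper are made up for illustration; unlike the `grep | wc` pipeline, this version also skips blank lines.

```python
def count_series(exposition_text: str) -> int:
    """Count sample lines, ignoring blank lines and '# ' comment lines (HELP/TYPE)."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.startswith("# ")
    )


# Hypothetical payload standing in for the body returned by the /metrics endpoint.
sample = """\
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
"""

series_per_unit = count_series(sample)
print(series_per_unit)
```

Multiplying the per-unit figure by the expected number of units gives a first estimate of the time series the workload would add.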
| Have your deployment sending all metrics to Prometheus (or Mimir) and inspect the 48hr plot for `count({__name__=~".+"})`. | ||
| The raw data can also be obtained by querying the Prometheus `query` endpoint directly: | ||
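One way to build such a request is sketched below. It uses the Prometheus HTTP API instant-query path `/api/v1/query`; the base URL `http://localhost:9090` and the helper name `instant_query_url` are assumptions for illustration, not values taken from the guide.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression, URL-encoding the query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Assumed address; replace with the Prometheus (or Mimir query frontend) address in use.
url = instant_query_url("http://localhost:9090", 'count({__name__=~".+"})')
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON, with the series count in the `data.result` field.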
|
|
| --- | ||
|
|
||
| # How to install COS | ||
|
|
|
|
||
| Before deploying COS or COS Lite, work through the items below. | ||
|
|
||
| ### COS flavor |
Please sync with @michaeldmitry about the flavor because we had a chat incl. Simme and Pietro about this. Just to make sure we are aligned cross-team.
| ```hcl | ||
| module ... { ... } | ||
| ``` | ||
|
|
Suggested change:
| ### Revision pins | |
| Deploying COS without revision pins, per component, will deploy the latest charms revisions in-track. Any subsequent Terraform plans will experience the same behaviour i.e., keeping COS up-to-date. However, if you require more stability, it is advised to pin the charm revisions of all components. |
Signed-off-by: Michael Thamm <mike.thamm@canonical.com>
| Only the Grafana and Traefik charms support auth. | ||
| For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana. |
Can we elaborate on the need for this, or add a topology diagram? I do not understand when or why to do this.
| subgraph pc["Public cloud"] | ||
| cos-alerter["COS Alerter"] | ||
| end |
Do we really need COS Alerter here? For the purposes of the install guide, this diagram would be informative even without having COS Alerter here.
| If you want to install on edge devices, then COS Lite is likely the right choice; otherwise | ||
| you should probably go with "full" COS. |
Suggested change:
| If you want to install on edge devices, then COS Lite is likely the right choice; otherwise | |
| you should probably go with "full" COS. | |
| If you want to install on edge devices, want to rely on local storage, or do not need high availability, then COS Lite is likely the right choice; otherwise | |
| you should probably go with "full" COS, sometimes referred to as COS HA. |
| The [flavor of COS](/explanation/overview/what-is-cos) to install depends on your use-case. | ||
| If you want to install on edge devices, then COS Lite is likely the right choice; otherwise | ||
| you should probably go with "full" COS. | ||
|
|
||
| ```{mermaid} | ||
| graph LR | ||
|
|
||
| subgraph env["Monitored environment"] | ||
| opentelemetry-collector | ||
| end | ||
|
|
||
| subgraph k8s["K8s cluster"] | ||
| COS | ||
| end | ||
|
|
||
| subgraph pc["Public cloud"] | ||
| cos-alerter["COS Alerter"] | ||
| end | ||
|
|
||
| subgraph storage["Storage cluster"] | ||
| S3 | ||
| end | ||
|
|
||
| opentelemetry-collector ---|telemetry| COS | ||
| COS --- S3 | ||
| COS --- cos-alerter | ||
| ``` |
We explain which COS flavour to choose, and then we show a general diagram of just COS... this is weird.
| ```{mermaid} | ||
| graph LR | ||
|
|
||
| subgraph env["Monitored environment"] |
Suggested change:
| subgraph env["Monitored environment"] | |
| subgraph env["Observed environment"] |
| Review the [networking best practices](/reference/networking) and ensure: | ||
|
|
||
| - A load balancer (for example, MetalLB) is available to give Traefik a stable IP. | ||
| - Egress is open for Charmhub, the Juju OCI registry, and Snapcraft. |
Shouldn't we specify the URLs?
|
|
||
| You should bootstrap a dedicated Juju controller and model just for COS. | ||
|
|
||
| ## Create Terraform plan |
Suggested change:
| ## Create Terraform plan | |
| ## Terraform plan | |
| Create a `main.tf` file like this one: |
|
|
||
| ### Revision pins | ||
|
|
||
| Deploying COS without revision pins, per component, will deploy the latest charms revisions in-track. Any subsequent Terraform plans will experience the same behaviour i.e., keeping COS up-to-date. However, if you require more stability, it is advised to pin the charm revisions of all components. |
Shouldn't we recommend pinning to stable if stability is a requirement?
We do have `risk = "stable"` in the TF module, but the problem is that if you deploy stable today you might get revision 123 and tomorrow you get 124. Both are on the stable risk, but there is no guarantee that today's COS is the same as tomorrow's. That is what I was trying to clarify with this section.
Fixes #288
This PR adds an install guide.