feat(doc): add install guide #289

Draft
sed-i wants to merge 13 commits into main from feature/install-guide
Conversation

Contributor

@sed-i sed-i commented Apr 21, 2026

Fixes #288

This PR adds an install guide.

Comment thread docs/how-to/deploy-and-manage/install.md Outdated
If you don't yet know how much telemetry your workloads generate, start with [How to evaluate telemetry volume](/how-to/configure-and-tune/evaluate-telemetry-volume).

Follow the [storage best practices](/reference/storage) to set up a distributed storage backend with a replication factor of 3.
Do **not** use `hostPath` storage in production.
Contributor

Perhaps a brief explanation about "why not to use hostPath"?

Comment on lines +68 to +69
Only the Grafana and Traefik charms support auth.
For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana.
Contributor

One minor suggestion:

Suggested change
- Only the Grafana and Traefik charms support auth.
- For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana.
+ Only the Grafana and Traefik charms support authentication.
+ To expose Grafana publicly, deploy two Traefik charms: one for internal connections and another for external access to provide ingress.

Contributor

Copilot AI left a comment

Pull request overview

Adds a new installation guide under the “Deploy and manage” docs, and updates related navigation/style guidance in the documentation set.

Changes:

  • Add How to install COS guide with preparation checklist and initial deployment sections.
  • Update the deploy-and-manage index to include the new install page and reorganize the Upgrades section.
  • Expand telemetry-volume guidance with manual evaluation steps for metrics/logs; update docs review prompt to specify US English spelling.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| docs/how-to/deploy-and-manage/install.md | New installation guide content for COS/COS Lite (prep + initial deployment outline). |
| docs/how-to/deploy-and-manage/index.md | Adds the install page to the Deploy toctree and moves Upgrades into its own section. |
| docs/how-to/configure-and-tune/evaluate-telemetry-volume.md | Adds manual evaluation subsections for metrics/logs, but introduces/contains formatting issues that can break rendering. |
| .github/prompts/review-docs.prompt.md | Updates docs review checklist to specify US English spelling. |
Comments suppressed due to low confidence (1)

docs/how-to/configure-and-tune/evaluate-telemetry-volume.md:66

  • The code fence closing delimiter here has four backticks instead of three, which will break Markdown rendering for the rest of the page. Make the opening/closing code fence delimiters match.

```
sum(rate(loki_distributor_bytes_received_total[5m]) / 1e9 * 60*60*24)
sum(rate(loki_distributor_lines_received_total[5m]) * 60)
```
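
The arithmetic behind the two quoted queries can be sketched in Python. This is only an illustration, assuming the multiplier in the first query is the number of seconds per day (60 × 60 × 24 = 86400) and that the second query converts a per-second line rate into lines per minute. The function names and the sample rates are hypothetical and are not part of the PR.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def bytes_per_second_to_gb_per_day(rate_bytes_per_s: float) -> float:
    """Scale a per-second byte rate to gigabytes (1e9 bytes) per day."""
    return rate_bytes_per_s / 1e9 * SECONDS_PER_DAY


def lines_per_second_to_lines_per_minute(rate_lines_per_s: float) -> float:
    """Scale a per-second log-line rate to lines per minute."""
    return rate_lines_per_s * 60


if __name__ == "__main__":
    # Hypothetical ingestion rates, as rate(...[5m]) would report them.
    print(bytes_per_second_to_gb_per_day(50_000))      # GB per day
    print(lines_per_second_to_lines_per_minute(10.0))  # lines per minute
```
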
### COS flavor

The [flavor of COS](/explanation/overview/what-is-cos) to install depends on your use-case.
Comment on lines +75 to +79
## Create Terraform plan

```hcl
module ... { ... }
```

## Deploy COS Alerter

COS Alerter is a watchdog service for COS you should deploy on a physically different cloud.

```
curl -sf localhost:8080/metrics | grep -v "^# " | wc -l
```

This will give you the number of timeseries that will be created for the workload, per unit.
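
The same count can be sketched in Python for cases where the metrics output has already been saved to a string. This is a minimal illustration, not part of the PR: the function name and the sample payload are hypothetical, and unlike the `grep -v "^# " | wc -l` pipeline it also skips blank lines.

```python
def count_series(exposition_text: str) -> int:
    """Count sample lines in Prometheus text exposition output.

    Lines starting with "# " (HELP/TYPE comments) are skipped, mirroring
    `grep -v "^# "`. Blank lines are skipped as well, which the shell
    pipeline would have counted.
    """
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.startswith("# ")
    )


# Hypothetical scrape output, for illustration only.
SAMPLE = """# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
# HELP up Whether the target is up.
# TYPE up gauge
up 1
"""

if __name__ == "__main__":
    print(count_series(SAMPLE))
```
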
Comment on lines 29 to 31
Have your deployment sending all metrics to Prometheus (or Mimir) and inspect the 48hr plot for `count({__name__=~".+"})`.
The raw data can also be obtained by querying the Prometheus `query` endpoint directly:
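
As a hedged sketch of what such a direct query could look like, the Python below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and reads the series count out of a response body. The base URL `http://localhost:9090`, the function names, and the sample response are assumptions made for illustration; the snippet does not perform any network call and is not the example from the PR.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_count(response_body: str) -> int:
    """Read the first sample value from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        return 0
    # Each element carries "value": [unix_timestamp, "value as a string"].
    return int(float(result[0]["value"][1]))


# Hypothetical response body, shaped like an instant-vector result.
SAMPLE_RESPONSE = json.dumps(
    {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {}, "value": [1714560000.0, "12345"]}],
        },
    }
)

if __name__ == "__main__":
    # Assumed local Prometheus address; adjust for a real deployment.
    print(build_query_url("http://localhost:9090", 'count({__name__=~".+"})'))
    print(extract_count(SAMPLE_RESPONSE))
```
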

---

# How to install COS


Before deploying COS or COS Lite, work through the items below.

### COS flavor
Contributor

Please sync with @michaeldmitry about the flavor; we had a chat about this that included Simme and Pietro. Just to make sure we are aligned across teams.

Comment thread docs/how-to/deploy-and-manage/install.md Outdated
```hcl
module ... { ... }
```

Contributor

@MichaelThamm MichaelThamm May 1, 2026

Suggested change
### Revision pins
Deploying COS without revision pins, per component, will deploy the latest charms revisions in-track. Any subsequent Terraform plans will experience the same behaviour i.e., keeping COS up-to-date. However, if you require more stability, it is advised to pin the charm revisions of all components.

Signed-off-by: Michael Thamm <mike.thamm@canonical.com>
Comment on lines +68 to +69
Only the Grafana and Traefik charms support auth.
For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana.
Contributor

Can we elaborate on the need for this, or add a topology diagram? I do not understand when or why to do this.

Comment on lines +30 to +32
subgraph pc["Public cloud"]
cos-alerter["COS Alerter"]
end
Contributor

Do we really need COS Alerter here? For the purposes of the install guide, this diagram would be informative even without having COS Alerter here.

Comment on lines +16 to +17
If you want to install on edge devices, then COS Lite is likely the right choice; otherwise
you should probably go with "full" COS.
Contributor

Suggested change
- If you want to install on edge devices, then COS Lite is likely the right choice; otherwise
- you should probably go with "full" COS.
+ If you want to install on edge devices, want to rely on local storage, or do not need high availability, then COS Lite is likely the right choice; otherwise
+ you should probably go with "full" COS, sometimes referred to as COS HA.

Comment on lines +15 to +41
The [flavor of COS](/explanation/overview/what-is-cos) to install depends on your use-case.
If you want to install on edge devices, then COS Lite is likely the right choice; otherwise
you should probably go with "full" COS.

```{mermaid}
graph LR

subgraph env["Monitored environment"]
opentelemetry-collector
end

subgraph k8s["K8s cluster"]
COS
end

subgraph pc["Public cloud"]
cos-alerter["COS Alerter"]
end

subgraph storage["Storage cluster"]
S3
end

opentelemetry-collector ---|telemetry| COS
COS --- S3
COS --- cos-alerter
```
Contributor

We explain which COS flavour to choose, and then we show a general diagram of just COS... this is weird.

```{mermaid}
graph LR

subgraph env["Monitored environment"]
Contributor

Suggested change
- subgraph env["Monitored environment"]
+ subgraph env["Observed environment"]

Review the [networking best practices](/reference/networking) and ensure:

- A load balancer (for example, MetalLB) is available to give Traefik a stable IP.
- Egress is open for Charmhub, the Juju OCI registry, and Snapcraft.
Contributor

Shouldn't we specify the URLs?


You should bootstrap a dedicated Juju controller and model just for COS.

## Create Terraform plan
Contributor

Suggested change
- ## Create Terraform plan
+ ## Terraform plan
+ Create a `main.tf` file like this one:


### Revision pins

Deploying COS without revision pins, per component, will deploy the latest charms revisions in-track. Any subsequent Terraform plans will experience the same behaviour i.e., keeping COS up-to-date. However, if you require more stability, it is advised to pin the charm revisions of all components.
Contributor

Shouldn't we recommend pinning to stable if stability is a requirement?

Contributor

We do have `risk = "stable"` in the TF module. The problem is that if you deploy stable today you might get revision 123, and tomorrow you might get 124. Both are on the stable risk, so there is no guarantee that today's COS is the same as tomorrow's. That is what I was trying to clarify with this section.

Development

Successfully merging this pull request may close these issues.

Create a doc for COS day-2 charm revisions updated to latest in track

5 participants