ClickHouse · mintlify · Jun 23, 2026
diff --git a/products/kubernetes-operator/guides/configuration.mdx b/products/kubernetes-operator/guides/configuration.mdx
diff --git a/products/kubernetes-operator/guides/introduction.mdx b/products/kubernetes-operator/guides/introduction.mdx
@@ -10,6 +10,7 @@ doc_type: 'guide'
 This document provides an overview of key concepts and usage patterns for the ClickHouse Operator.
 
 ## What is the ClickHouse Operator {#what-is-the-clickhouse-operator}
+
 The ClickHouse Operator is a Kubernetes operator that automates the deployment and management of ClickHouse clusters on Kubernetes. Built using the operator pattern, it extends the Kubernetes API with custom resources that represent ClickHouse clusters and their dependencies.
 
 The operator handles:
@@ -21,9 +22,11 @@ The operator handles:
 - Storage provisioning
 
 ## Custom resources {#custom-resources}
+
 The operator provides two main custom resource definitions (CRDs):
 
 ### ClickHouseCluster {#clickhousecluster}
+
 Represents a ClickHouse database cluster with configurable replicas and shards.
 
 ```yaml
@@ -43,6 +46,7 @@ spec:
 ```
 
 ### KeeperCluster {#keepercluster}
+
 Represents a ClickHouse Keeper cluster for distributed coordination (ZooKeeper replacement).
 
 ```yaml
@@ -59,11 +63,14 @@ spec:
 ```
 
 ## Coordination {#coordination}
+
 ### ClickHouse Keeper is required {#clickhouse-keeper-is-required}
+
 Every ClickHouseCluster requires a ClickHouse Keeper cluster for distributed coordination.
 The Keeper cluster must be referenced in the ClickHouseCluster spec using `keeperClusterRef`. By default the operator looks in the ClickHouseCluster namespace, but you can also set `keeperClusterRef.namespace` to point at a KeeperCluster in another watched namespace.
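 
 As a minimal sketch of a cross-namespace reference, the fragment below points a ClickHouseCluster at a KeeperCluster in a different watched namespace. The `apiVersion`, the resource names, the namespaces, and the `name` field under `keeperClusterRef` are illustrative assumptions rather than values taken from this guide; check the API reference for the exact schema.
 
 ```yaml
 # Hypothetical names throughout; verify field names against the API reference.
 apiVersion: clickhouse.com/v1alpha1   # assumed API group/version
 kind: ClickHouseCluster
 metadata:
   name: analytics             # hypothetical cluster name
   namespace: clickhouse       # hypothetical namespace
 spec:
   keeperClusterRef:
     name: analytics-keeper    # assumed field: name of the KeeperCluster
     namespace: coordination   # optional; omit to use the ClickHouseCluster's own namespace
 ```
 
 Omitting `keeperClusterRef.namespace` keeps the default lookup in the ClickHouseCluster namespace.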
 
 ### One-to-One Keeper relationship {#one-to-one-keeper-relationship}
+
 Each ClickHouseCluster must have its own dedicated KeeperCluster. You can't share a single KeeperCluster between multiple ClickHouseClusters.
 
 **Why?** The operator automatically generates a unique authentication key for each ClickHouseCluster to access its Keeper. This key is stored in a Secret and can't be shared.
@@ -86,9 +93,11 @@ When recreating a cluster:
 To avoid authentication errors, either delete the Persistent Volumes manually or recreate both clusters together with fresh storage.
 
 ## Schema Replication {#schema-replication}
+
 The ClickHouse Operator automatically replicates database definitions across all replicas in a cluster.
 
 ### What Gets Replicated {#what-gets-replicated}
+
 The operator synchronizes:
 - [Replicated](/reference/engines/database-engines/replicated) database definitions
 - Integration database engines (PostgreSQL, MySQL, etc.)
@@ -99,6 +108,7 @@ The operator does **not** synchronize:
 - Table data (handled by ClickHouse replication)
 
 ### Recommended: Use Replicated database engine {#recommended-use-replicated-database-engine}
+
 <Tip>
 **Best practice**
 
@@ -118,18 +128,22 @@ CREATE DATABASE my_database ON CLUSTER 'default' ENGINE = Replicated;
 ```
 
 ### Avoid non-Replicated engines {#avoid-non-replicated-engines}
+
 Non-replicated database engines (Atomic, Lazy, SQLite, Ordinary) require manual schema management:
 - Tables must be created individually on each replica
 - Schema drift can occur between nodes
 - The operator can't automatically sync new replicas
 
 ### Disable schema replication {#disable-schema-replication}
+
 To disable automatic schema replication, set `spec.settings.enableDatabaseSync` to `false` in the ClickHouseCluster resource.
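 
 A minimal sketch of that setting, shown as a fragment of a ClickHouseCluster manifest with the rest of the spec omitted:
 
 ```yaml
 # Fragment only: merge into an existing ClickHouseCluster spec.
 spec:
   settings:
     enableDatabaseSync: false   # turn off operator-driven schema replication
 ```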
 
 ## Storage management {#storage-management}
+
 The operator manages storage through Kubernetes PersistentVolumeClaims (PVCs).
 
 ### Data volume configuration {#data-volume-configuration}
+
 Specify storage requirements in `dataVolumeClaimSpec`:
 
 ```yaml
@@ -142,6 +156,7 @@ spec:
 ```
 
 ### Storage Lifecycle {#storage-lifecycle}
+
 - **Creation**: PVCs are created automatically with the cluster
 - **Expansion**: Supported if the StorageClass allows volume expansion
 - **Retention**: PVCs are **not** deleted automatically on cluster deletion
@@ -161,6 +176,7 @@ kubectl delete pvc -l app.kubernetes.io/instance=my-cluster-clickhouse
 ```
 
 ## Default configuration highlights {#default-configuration-highlights}
+
 * **Pre-configured Cluster:** Cluster named 'default' containing all ClickHouse nodes.
 * **Default macros:** Some useful macros are pre-defined:
   - `{cluster}`: Cluster name (`default`)
@@ -170,5 +186,6 @@ kubectl delete pvc -l app.kubernetes.io/instance=my-cluster-clickhouse
 * **Replicated storage for User-Defined Functions (UDFs)**
 
 ## Next steps {#next-steps}
+
 - [Configuration Guide](/products/kubernetes-operator/guides/configuration) - Detailed configuration options
 - [API Reference](/products/kubernetes-operator/reference/api-reference) - Complete API documentation
diff --git a/products/kubernetes-operator/guides/monitoring.mdx b/products/kubernetes-operator/guides/monitoring.mdx
@@ -1,5 +1,5 @@
 ---
-position: 3
+position: 4
 slug: /clickhouse-operator/guides/monitoring
 title: Monitoring the ClickHouse Operator
 keywords: ['kubernetes', 'prometheus', 'monitoring', 'metrics']
@@ -16,6 +16,7 @@ This guide is about the **operator process itself** (the controller manager). Fo
 </Note>
 
 ## Endpoints {#endpoints}
+
 The operator process exposes two HTTP endpoints inside the manager pod:
 
 | Endpoint | Default port | Path | Purpose |
@@ -28,6 +29,7 @@ The metrics endpoint is **off by default** when running the operator binary dire
 The health probe endpoint is always on; the deployment template wires `/healthz` and `/readyz` to the pod's liveness and readiness probes on port `8081`.
 
 ## Operator binary flags {#operator-binary-flags}
+
 The relevant `manager` flags (defined in [`cmd/main.go`](https://github.com/ClickHouse/clickhouse-operator/blob/main/cmd/main.go)):
 
 | Flag | Default | Description |
@@ -46,6 +48,7 @@ The `8443` (HTTPS) / `8080` (HTTP) convention in the flag's help text is only a
 </Note>
 
 ## Enable metrics via Helm {#enable-metrics-via-helm}
+
 The chart already creates a `Service` for the metrics port and, optionally, a `ServiceMonitor` for prometheus-operator.
 
 The metrics endpoint itself is on by default (`metrics.enable: true`, port `8080`, served over HTTPS via `metrics.secure: true`). The only setting you typically need to flip is `prometheus.enable` to have the chart create a `ServiceMonitor` for you:
@@ -90,6 +93,7 @@ After install the chart creates:
 - `ClusterRole/<resource-prefix>-metrics-reader` — non-resource URL `/metrics` with `get` verb.
 
 ## Securing the metrics endpoint {#securing-the-metrics-endpoint}
+
 When `metrics.secure: true` the metrics server enforces TLS **and** Kubernetes authentication/authorization on every scrape. Scrapers must:
 
 1. Present a valid Kubernetes bearer token.
@@ -131,6 +135,7 @@ If you see `401 Unauthorized` or `403 Forbidden` from the metrics endpoint, the
 </Warning>
 
 ## ServiceMonitor reference {#servicemonitor-reference}
+
 The chart renders a ServiceMonitor of this shape when `prometheus.enable: true`:
 
 ```yaml
@@ -168,9 +173,11 @@ spec:
 If cert-manager is not available in your cluster, set `tlsConfig.insecureSkipVerify: true` and rely on bearer-token authentication only; the chart already does this when `certManager.enable: false`.
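 
 For a hand-written ServiceMonitor, a sketch of where that override sits is shown below. It is a fragment, not the chart's rendered output: the other endpoint fields (port, path, authorization) are omitted and must come from your own manifest.
 
 ```yaml
 # Fragment of a ServiceMonitor spec; other endpoint fields omitted.
 spec:
   endpoints:
     - scheme: https
       tlsConfig:
         insecureSkipVerify: true   # skip verification of the metrics server certificate
 ```
 
 Skipping verification removes protection against a spoofed endpoint, so scrapes then depend on bearer-token authentication alone.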
 
 ## Standalone Prometheus example {#standalone-prometheus-example}
+
 If you do not use kube-prometheus-stack, the repository ships a self-contained example at [`examples/prometheus_secure_metrics_scraper.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/prometheus_secure_metrics_scraper.yaml). It creates a ServiceAccount, the necessary RBAC, and a `Prometheus` CR that selects the operator's ServiceMonitor.
 
 ## Health probe endpoints {#health-probe-endpoints}
+
 | Path | Used by | Returns |
 |---|---|---|
 | `/healthz` | Kubernetes liveness probe | `200 OK` as long as the probe server is listening. |
@@ -198,9 +205,11 @@ readinessProbe:
 A repeatedly failing probe usually means the probe server itself never came up — for example, the manager exited early during startup. Check the manager logs for `unable to start manager`, RBAC failures, or `cache did not sync` errors.
 
 ## Metrics catalog {#metrics-catalog}
+
 The operator does not register custom Prometheus collectors. Everything below is exposed by the underlying `controller-runtime` and `client-go` libraries. The most useful series, grouped by purpose:
 
 ### Reconciliation activity {#reconciliation-activity}
+
 | Metric | Type | Labels |
 |---|---|---|
 | `controller_runtime_reconcile_total` | counter | `controller`, `result` (`success` / `error` / `requeue` / `requeue_after`) |
@@ -212,6 +221,7 @@ The operator does not register custom Prometheus collectors. Everything below is
 The `controller` label is derived by `controller-runtime` from the resource type registered with `For(...)`. With the current code in `internal/controller/clickhouse` and `internal/controller/keeper` this resolves to `clickhousecluster` and `keepercluster` respectively. If you have customized the operator, verify with a one-time scrape of `/metrics`.
 
 ### Work queue {#work-queue}
+
 | Metric | Type | Labels |
 |---|---|---|
 | `workqueue_depth` | gauge | `name`, `controller`, `priority` |
@@ -225,21 +235,25 @@ The `controller` label is derived by `controller-runtime` from the resource type
 The `name` and `controller` labels carry the same value (the controller name).
 
 ### API server traffic {#api-server-traffic}
+
 | Metric | Type | Labels |
 |---|---|---|
 | `rest_client_requests_total` | counter | `code`, `method`, `host` |
 
 ### Leader election {#leader-election}
+
 | Metric | Type | Labels |
 |---|---|---|
 | `leader_election_master_status` | gauge | `name` (= `d4ceba06.clickhouse.com`) |
 
 The Helm chart enables `--leader-elect` by default, so this metric is present in standard Helm installs. When running the binary directly without the flag, the metric is absent.
 
 ### Runtime {#runtime}
+
 Standard Go process and runtime collectors — `go_goroutines`, `go_memstats_*`, `process_cpu_seconds_total`, `process_resident_memory_bytes`, etc.
 
 ## Useful PromQL queries {#useful-promql-queries}
+
 ### Health overview
 
 ```promql
@@ -282,6 +296,7 @@ sum(leader_election_master_status{name="d4ceba06.clickhouse.com"})
 ```
 
 ## Suggested alerts {#suggested-alerts}
+
 A starting point for a PrometheusRule (tune the thresholds for your environment):
 
 ```yaml
@@ -330,6 +345,7 @@ groups:
 The last rule is only meaningful when leader election is enabled.
 
 ## Verifying the setup {#verifying-the-setup}
+
 A quick end-to-end check, assuming the chart was installed in `clickhouse-operator-system`:
 
 ```bash
@@ -357,5 +373,6 @@ kubectl -n $NS run curl-metrics --rm -it --restart=Never \
 If the scrape returns metrics in the Prometheus exposition format, the endpoint and RBAC are correctly wired.
 
 ## Related guides {#related-guides}
+
 - [Installation](/products/kubernetes-operator/install/helm) — Helm values relevant to monitoring.
 - [Configuration](/products/kubernetes-operator/guides/configuration) — TLS configuration shared with the metrics server.
diff --git a/products/kubernetes-operator/guides/scaling.mdx b/products/kubernetes-operator/guides/scaling.mdx
@@ -1,5 +1,5 @@
 ---
-position: 4
+position: 5
 slug: /clickhouse-operator/guides/scaling
 title: Scaling clusters
 keywords: ['kubernetes', 'scaling', 'replicas', 'shards', 'keeper', 'quorum']
@@ -16,6 +16,7 @@ A `ClickHouseCluster` always needs a Keeper, referenced through the required `sp
 </Note>
 
 ## Scaling replicas {#scaling-replicas}
+
 `spec.replicas` sets the number of replicas in every shard. Each replica runs in its own StatefulSet named `<cluster>-clickhouse-<shard>-<replica>`, so a cluster with `shards: 2` and `replicas: 3` runs six StatefulSets.
 
 Raise or lower the count in place:
@@ -30,6 +31,7 @@ spec:
 On scale up the operator creates the new per-replica StatefulSets, waits for each pod to become ready, and then synchronizes the schema to the new replicas (see [Automatic schema sync](#automatic-schema-sync)). On scale down it removes the surplus StatefulSets and cleans up the stale replicated-database replica registrations the removed replicas left behind.
 
 ## Scaling shards {#scaling-shards}
+
 `spec.shards` sets the number of shards. Each new shard adds a full set of per-replica StatefulSets, and the operator creates one [PodDisruptionBudget per shard](/products/kubernetes-operator/guides/configuration#pod-disruption-budgets) so a disruption in one shard cannot count against another.
 
 ```yaml
@@ -41,6 +43,7 @@ spec:
 Each shard holds a distinct slice of the data, and the operator does not copy or move rows between shards. A `Distributed` table or an explicit routing scheme decides which shard a row lands on, so adding a shard gives new writes somewhere to land without touching the rows already stored in the existing shards.
 
 ## Automatic schema sync {#automatic-schema-sync}
+
 When `spec.settings.enableDatabaseSync` is `true` (the default), the operator keeps the schema aligned as the topology changes:
 
 - **On scale up** — once at least two replicas are ready, the operator replicates the database definitions to the newly created replicas, so a fresh replica joins with the same `Replicated` and integration databases as the rest of the cluster.
@@ -51,6 +54,7 @@ This covers `Replicated` databases and integration database engines. It does not
 Set `enableDatabaseSync: false` to turn the behavior off, for example when an external tool owns schema propagation. The operator then reports the `SchemaSyncDisabled` reason on the `SchemaInSync` condition.
 
 ## Conditions to watch {#scaling-conditions}
+
 Inspect progress on the Custom Resource while a scale operation runs:
 
 ```bash
@@ -72,6 +76,7 @@ kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p'
 A scale operation is complete when `ClusterSizeAligned` reports `UpToDate`, `SchemaInSync` reports `ReplicasInSync`, and `Ready` reports `AllShardsReady`.
 
 ## Scaling Keeper {#scaling-keeper}
+
 A `KeeperCluster` runs a Raft quorum, so the operator changes its membership **one replica at a time** and only while the cluster is in a stable state. This protects the quorum: a `2F+1` cluster tolerates `F` members down, so a 3-node cluster keeps working with one member missing and a 5-node cluster with two.
 
 ```yaml