Example of viewing a trace (right panel) from a logs query (left panel)
@@ -262,4 +262,4 @@ This is a list of all macros available in the plugin:
| `$__timeInterval(columnName)` | Replaced by a function calculating the interval based on window size in seconds. | `toStartOfInterval(toDateTime(columnName), INTERVAL 20 second)` |
| `$__timeInterval_ms(columnName)` | Replaced by a function calculating the interval based on window size in milliseconds. | `toStartOfInterval(toDateTime64(columnName, 3), INTERVAL 20 millisecond)` |
| `$__interval_s` | Replaced by the dashboard interval in seconds. | `20` |
-| `$__conditionalAll(condition, $templateVar)` | Replaced by the first parameter when the template variable in the second parameter does not select every value. Replaced by the 1=1 when the template variable selects every value. | `condition` or `1=1` |
+| `$__conditionalAll(condition, $templateVar)` | Replaced by the first parameter when the template variable in the second parameter doesn't select every value. Replaced by `1=1` when the template variable selects every value. | `condition` or `1=1` |
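As an illustration, the `$__conditionalAll` macro keeps multi-value template variables from breaking the `WHERE` clause when "All" is selected. A sketch of the expansion (table, column, and variable names are hypothetical):

```sql
-- Hypothetical dashboard query; $service is a multi-value template variable.
SELECT count()
FROM logs
WHERE $__conditionalAll(service IN ($service), $service)

-- With specific values selected, the macro expands to the condition itself:
--   WHERE service IN ('api', 'web')
-- With "All" selected, it expands to a no-op:
--   WHERE 1=1
```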
diff --git a/docs/integrations/data-visualization/looker-and-clickhouse.md b/docs/integrations/data-visualization/looker-and-clickhouse.md
index 9fba107b3b8..729d67a104b 100644
--- a/docs/integrations/data-visualization/looker-and-clickhouse.md
+++ b/docs/integrations/data-visualization/looker-and-clickhouse.md
@@ -39,7 +39,7 @@ Choose a name for your data source, and select `ClickHouse` from the dialect dro
-If you are using ClickHouse Cloud or your deployment requires SSL, make sure you have SSL turned on in the additional settings.
+If you're using ClickHouse Cloud or your deployment requires SSL, make sure you have SSL turned on in the additional settings.
@@ -54,7 +54,7 @@ Now you should be able to attach ClickHouse Datasource to your Looker project.
## 3. Known limitations {#3-known-limitations}
1. The following data types are handled as strings by default:
- * Array - serialization does not work as expected due to the JDBC driver limitations
+ * Array - serialization doesn't work as expected due to JDBC driver limitations
* Decimal* - can be changed to number in the model
* LowCardinality(...) - can be changed to a proper type in the model
* Enum8, Enum16
@@ -69,5 +69,5 @@ Now you should be able to attach ClickHouse Datasource to your Looker project.
* Polygon
* Point
* Ring
-2. [Symmetric aggregate feature](https://cloud.google.com/looker/docs/reference/param-explore-symmetric-aggregates) is not supported
-3. [Full outer join](https://cloud.google.com/looker/docs/reference/param-explore-join-type#full_outer) is not yet implemented in the driver
+2. [Symmetric aggregate feature](https://cloud.google.com/looker/docs/reference/param-explore-symmetric-aggregates) isn't supported
+3. [Full outer join](https://cloud.google.com/looker/docs/reference/param-explore-join-type#full_outer) isn't yet implemented in the driver
diff --git a/docs/integrations/data-visualization/metabase-and-clickhouse.md b/docs/integrations/data-visualization/metabase-and-clickhouse.md
index c588332d812..9dc3c4c3901 100644
--- a/docs/integrations/data-visualization/metabase-and-clickhouse.md
+++ b/docs/integrations/data-visualization/metabase-and-clickhouse.md
@@ -38,7 +38,7 @@ In this guide you will ask some questions of your ClickHouse data with Metabase
:::tip Add some data
-If you do not have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
+If you don't have a dataset to work with, you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
@@ -46,7 +46,7 @@ If you do not have a dataset to work with you can add one of the examples. This
## 2. Download the ClickHouse plugin for Metabase {#2--download-the-clickhouse-plugin-for-metabase}
-1. If you do not have a `plugins` folder, create one as a subfolder of where you have `metabase.jar` saved.
+1. If you don't have a `plugins` folder, create one as a subfolder of where you have `metabase.jar` saved.
2. The plugin is a JAR file named `clickhouse.metabase-driver.jar`. Download the latest version of the JAR file at
https://github.com/clickhouse/metabase-clickhouse-driver/releases/latest
diff --git a/docs/integrations/data-visualization/powerbi-and-clickhouse.md b/docs/integrations/data-visualization/powerbi-and-clickhouse.md
index d1856d706ad..ed932d1a033 100644
--- a/docs/integrations/data-visualization/powerbi-and-clickhouse.md
+++ b/docs/integrations/data-visualization/powerbi-and-clickhouse.md
@@ -188,7 +188,7 @@ Fill in the connection details.
:::note
-If you are using a deployment that has SSL enabled (e.g. ClickHouse Cloud or a self-managed instance), in the `SSLMode` field you should supply `require`.
+If you're using a deployment that has SSL enabled (e.g. ClickHouse Cloud or a self-managed instance), in the `SSLMode` field you should supply `require`.
- `Host` should always have the protocol (i.e. `http://` or `https://`) omitted.
- `Timeout` is an integer representing seconds. Default value: `30 seconds`.
@@ -215,7 +215,7 @@ Select your previously created data source from the list.
:::note
-If you did not specify credentials during the data source creation, you will be prompted to specify username and password.
+If you didn't specify credentials during the data source creation, you will be prompted to specify a username and password.
:::
diff --git a/docs/integrations/data-visualization/splunk-and-clickhouse.md b/docs/integrations/data-visualization/splunk-and-clickhouse.md
index 4adb4b619c5..6755b579a95 100644
--- a/docs/integrations/data-visualization/splunk-and-clickhouse.md
+++ b/docs/integrations/data-visualization/splunk-and-clickhouse.md
@@ -34,13 +34,13 @@ Looking to store ClickHouse audit logs to Splunk? Follow the ["Storing ClickHous
Splunk is a popular technology for security and observability. It is also a powerful search and dashboarding engine. There are hundreds of Splunk apps available to address different use cases.
-For ClickHouse specifically, we are leveraging the [Splunk DB Connect App](https://splunkbase.splunk.com/app/2686) which has a simple integration to the highly performant ClickHouse JDBC driver to query tables in ClickHouse directly.
+For ClickHouse specifically, we're leveraging the [Splunk DB Connect App](https://splunkbase.splunk.com/app/2686), which has a simple integration with the highly performant ClickHouse JDBC driver to query tables in ClickHouse directly.
-The ideal use case for this integration is when you are using ClickHouse for large data sources such as NetFlow, Avro or Protobuf binary data, DNS, VPC flow logs, and other OTEL logs that can be shared with your team on Splunk to search and create dashboards. By using this approach, the data is not ingested into the Splunk index layer and is simply queried directly from ClickHouse similarly to other visualization integrations such as [Metabase](https://www.metabase.com/) or [Superset](https://superset.apache.org/).
+The ideal use case for this integration is when you're using ClickHouse for large data sources such as NetFlow, Avro or Protobuf binary data, DNS, VPC flow logs, and other OTEL logs that can be shared with your team on Splunk to search and create dashboards. By using this approach, the data isn't ingested into the Splunk index layer and is simply queried directly from ClickHouse similarly to other visualization integrations such as [Metabase](https://www.metabase.com/) or [Superset](https://superset.apache.org/).
## Goal {#goal}
-In this guide, we will use the ClickHouse JDBC driver to connect ClickHouse to Splunk. We will install a local version of Splunk Enterprise but we are not indexing any data. Instead, we are using the search functions through the DB Connect query engine.
+In this guide, we will use the ClickHouse JDBC driver to connect ClickHouse to Splunk. We will install a local version of Splunk Enterprise, but we're not indexing any data. Instead, we're using the search functions through the DB Connect query engine.
With this guide, you will be able to create a dashboard connected to ClickHouse similar to this:
@@ -133,7 +133,7 @@ If you receive an error, make sure that you have added the IP address of your Sp
We will now run a SQL query to test that everything works.
-Select your connection details in the SQL Explorer from the DataLab section of the DB Connect App. We are using the `trips` table for this demo:
+Select your connection details in the SQL Explorer from the DataLab section of the DB Connect App. We're using the `trips` table for this demo:
diff --git a/docs/integrations/data-visualization/superset-and-clickhouse.md b/docs/integrations/data-visualization/superset-and-clickhouse.md
index 224bb5aac53..d6b9e4504b8 100644
--- a/docs/integrations/data-visualization/superset-and-clickhouse.md
+++ b/docs/integrations/data-visualization/superset-and-clickhouse.md
@@ -42,7 +42,7 @@ In this guide you will build a dashboard in Superset with data from a ClickHouse
:::tip Add some data
-If you do not have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
+If you don't have a dataset to work with, you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
@@ -93,11 +93,11 @@ If you do not have a dataset to work with you can add one of the examples. This
-3. Click the **ADD** button at the bottom of the dialog window and your table appears in the list of datasets. You are ready to build a dashboard and analyze your ClickHouse data!
+3. Click the **ADD** button at the bottom of the dialog window and your table appears in the list of datasets. You're ready to build a dashboard and analyze your ClickHouse data!
## 5. Creating charts and a dashboard in Superset {#5--creating-charts-and-a-dashboard-in-superset}
-If you are familiar with Superset, then you will feel right at home with this next section. If you are new to Superset, well...it's like a lot of the other cool visualization tools out there in the world - it doesn't take long to get started, but the details and nuances get learned over time as you use the tool.
+If you're familiar with Superset, then you will feel right at home with this next section. If you're new to Superset, well...it's like a lot of the other cool visualization tools out there in the world - it doesn't take long to get started, but the details and nuances get learned over time as you use the tool.
1. You start with a dashboard. From the top menu in Superset, select **Dashboards**. Click the button in the upper-right to add a new dashboard. The following dashboard is named **UK property prices**:
diff --git a/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md b/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
index 881c602a8f0..3ac210e6fda 100644
--- a/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
+++ b/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
@@ -17,7 +17,7 @@ integration:
- In Live mode the MEDIAN() and PERCENTILE() functions (since connector v0.1.3 release) use the [ClickHouse quantile()() function](/sql-reference/aggregate-functions/reference/quantile/), which significantly speeds up the calculation, but uses sampling. If you want to get accurate calculation results, then use functions `MEDIAN_EXACT()` and `PERCENTILE_EXACT()` (based on [quantileExact()()](/sql-reference/aggregate-functions/reference/quantileexact/)).
- In Extract mode you can't use MEDIAN_EXACT() and PERCENTILE_EXACT() because MEDIAN() and PERCENTILE() are always accurate (and slow).
## Additional functions for calculated fields in Live mode {#additional-functions-for-calculated-fields-in-live-mode}
-ClickHouse has a huge number of functions that can be used for data analysis — much more than Tableau supports. For the convenience of users, we have added new functions that are available for use in Live mode when creating Calculated Fields. Unfortunately, it is not possible to add descriptions to these functions in the Tableau interface, so we will add a description for them right here.
+ClickHouse has a huge number of functions that can be used for data analysis — much more than Tableau supports. For the convenience of users, we have added new functions that are available for use in Live mode when creating Calculated Fields. Unfortunately, it isn't possible to add descriptions to these functions in the Tableau interface, so we will add a description for them right here.
- **[`-If` Aggregation Combinator](/sql-reference/aggregate-functions/combinators/#-if)** *(added in v0.2.3)* - allows to have Row-Level Filters right in the Aggregate Calculation. `SUM_IF(), AVG_IF(), COUNT_IF(), MIN_IF() & MAX_IF()` functions have been added.
- **`BAR([my_int], [min_val_int], [max_val_int], [bar_string_length_int])`** *(added in v0.2.1)* — Forget about boring bar charts! Use `BAR()` function instead (equivalent of [`bar()`](/sql-reference/functions/other-functions#bar) in ClickHouse). For example, this calculated field returns nice bars as String:
```text
@@ -44,7 +44,7 @@ ClickHouse has a huge number of functions that can be used for data analysis —
- **`KURTOSIS([my_number])`** — Computes the sample kurtosis of a sequence. Equivalent of [`kurtSamp()`](/sql-reference/aggregate-functions/reference/kurtsamp).
- **`KURTOSISP([my_number])`** — Computes the kurtosis of a sequence. The equivalent of [`kurtPop()`](/sql-reference/aggregate-functions/reference/kurtpop).
- **`MEDIAN_EXACT([my_number])`** *(added in v0.1.3)* — Exactly computes the median of a numeric data sequence. Equivalent of [`quantileExact(0.5)(...)`](/sql-reference/aggregate-functions/reference/quantileexact).
-- **`MOD([my_number_1], [my_number_2])`** — Calculates the remainder after division. If arguments are floating-point numbers, they are pre-converted to integers by dropping the decimal portion. Equivalent of [`modulo()`](/sql-reference/functions/arithmetic-functions/#modulo).
+- **`MOD([my_number_1], [my_number_2])`** — Calculates the remainder after division. If arguments are floating-point numbers, they're pre-converted to integers by dropping the decimal portion. Equivalent of [`modulo()`](/sql-reference/functions/arithmetic-functions/#modulo).
- **`PERCENTILE_EXACT([my_number], [level_float])`** *(added in v0.1.3)* — Exactly computes the percentile of a numeric data sequence. The recommended level range is [0.01, 0.99]. Equivalent of [`quantileExact()()`](/sql-reference/aggregate-functions/reference/quantileexact).
- **`PROPER([my_string])`** *(added in v0.2.5)* - Converts a text string so the first letter of each word is capitalized and the remaining letters are in lowercase. Spaces and non-alphanumeric characters such as punctuation also act as separators. For example:
```text
diff --git a/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md b/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
index b051b07afb5..af2ddc8fc7b 100644
--- a/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
+++ b/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
@@ -120,7 +120,7 @@ could change, but for now you must use **default** as the database.)
-You are now ready to build some visualizations in Tableau!
+You're now ready to build some visualizations in Tableau!
## Building Visualizations in Tableau {#building-visualizations-in-tableau}
@@ -152,7 +152,7 @@ Now that we have a ClickHouse data source configured in Tableau, let's visualize
Not a very exciting line chart, but the dataset was generated by a script and built for testing query performance, so
-you will notice there is not a lot of variations in the simulated orders of the TCPD data.
+you will notice there isn't a lot of variation in the simulated orders of the TCPD data.
6. Suppose you want to know the average order amount (in dollars) by quarter and also by shipping mode (air, mail, ship,
truck, etc.):
diff --git a/docs/integrations/data-visualization/tableau/tableau-connection-tips.md b/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
index 7c4d6846bf2..df262062ad9 100644
--- a/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
+++ b/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
@@ -31,7 +31,7 @@ SET my_setting=value;
In 99% of cases you don't need the Advanced tab, for the remaining 1% you can use the following settings:
- **Custom Connection Parameters**. By default, `socket_timeout` is already specified, this parameter may need to be changed if some extracts are updated for a very long time. The value of this parameter is specified in milliseconds. The rest of the parameters can be found [here](https://github.com/ClickHouse/clickhouse-jdbc/blob/master/clickhouse-client/src/main/java/com/clickhouse/client/config/ClickHouseClientOption.java), add them in this field separated by commas
- **JDBC Driver custom_http_params**. This field allows you to drop some parameters into the ClickHouse connection string by passing values to the [`custom_http_params` parameter of the driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). For example, this is how `session_id` is specified when the *Set Session ID* checkbox is activated
-- **JDBC Driver `typeMappings`**. This field allows you to [pass a list of ClickHouse data type mappings to Java data types used by the JDBC driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). The connector automatically displays large Integers as strings thanks to this parameter, you can change this by passing your mapping set *(I do not know why)* using
+- **JDBC Driver `typeMappings`**. This field allows you to [pass a list of ClickHouse data type mappings to Java data types used by the JDBC driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). The connector automatically displays large Integers as strings thanks to this parameter, you can change this by passing your mapping set *(I don't know why)* using
```text
UInt256=java.lang.Double,Int256=java.lang.Double
```
@@ -55,4 +55,4 @@ However, such fields are most often used to find the number of unique values *(I
```text
COUNTD([myUInt256]) // Works well too!
```
-When using the data preview (View data) of a table with UInt64 fields, an error does not appear now.
+When using the data preview (View data) of a table with UInt64 fields, an error no longer appears.
diff --git a/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md b/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
index 39392fd2eae..0ac10b454c4 100644
--- a/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
+++ b/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
@@ -62,7 +62,7 @@ NB: if you want to use Tableau Online in combination with Tableau Desktop and sh
## Connecting Tableau Online to ClickHouse (cloud or on-premise setup with SSL) {#connecting-tableau-online-to-clickhouse-cloud-or-on-premise-setup-with-ssl}
-As it is not possible to provide the SSL certificates via the Tableau Online MySQL connection setup wizard,
+As it isn't possible to provide the SSL certificates via the Tableau Online MySQL connection setup wizard,
the only way is to use Tableau Desktop to set the connection up, and then export it to Tableau Online. This process is, however, pretty straightforward.
Run Tableau Desktop on a Windows or Mac machine, and select "Connect" -> "To a Server" -> "MySQL".
@@ -104,4 +104,4 @@ Finally, click "Publish", and your datasource with embedded credentials will be
## Known limitations (ClickHouse 23.11) {#known-limitations-clickhouse-2311}
-All the known limitations has been fixed in ClickHouse `23.11`. If you encounter any other incompatibilities, please do not hesitate to [contact us](https://clickhouse.com/company/contact) or create a [new issue](https://github.com/ClickHouse/ClickHouse/issues).
+All the known limitations have been fixed in ClickHouse `23.11`. If you encounter any other incompatibilities, please don't hesitate to [contact us](https://clickhouse.com/company/contact) or create a [new issue](https://github.com/ClickHouse/ClickHouse/issues).
diff --git a/docs/integrations/index.mdx b/docs/integrations/index.mdx
index aa44a96dbd4..6681926f6d8 100644
--- a/docs/integrations/index.mdx
+++ b/docs/integrations/index.mdx
@@ -18,7 +18,7 @@ ClickHouse integrations are organized by their support level:
| Tier | Description |
|------|-------------|
-| Core integrations | Built or maintained by ClickHouse, they are supported by ClickHouse and live in the ClickHouse GitHub organization |
+| Core integrations | Built or maintained by ClickHouse, they're supported by ClickHouse and live in the ClickHouse GitHub organization |
| Partner integrations | Built or maintained, and supported by, third-party software vendors |
| Community integrations | Built or maintained and supported by community members. No direct support is available besides the public GitHub repositories and community Slack channels |
diff --git a/docs/integrations/interfaces/http.md b/docs/integrations/interfaces/http.md
index ce74e574a3e..a005bc78ac3 100644
--- a/docs/integrations/interfaces/http.md
+++ b/docs/integrations/interfaces/http.md
@@ -92,7 +92,7 @@ curl 'http://localhost:8123/?query=SELECT%201'
```
In this example wget is used with the `-nv` (non-verbose) and `-O-` parameters to output the result to the terminal.
-In this case it is not necessary to use URL encoding for the space:
+In this case it isn't necessary to use URL encoding for the space:
```bash title="command"
wget -nv -O- 'http://localhost:8123/?query=SELECT 1'
@@ -125,7 +125,7 @@ X-ClickHouse-Exception-Tag: dngjzjnxkvlwkeua
```
As you can see, the `curl` command is somewhat inconvenient in that spaces must be URL escaped.
-Although `wget` escapes everything itself, we do not recommend using it because it does not work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
+Although `wget` escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
```bash
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@@ -277,7 +277,7 @@ To delete the table:
$ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @-
```
-For successful requests that do not return a data table, an empty response body is returned.
+For successful requests that don't return a data table, an empty response body is returned.
## Compression {#compression}
@@ -376,7 +376,7 @@ echo 'SELECT 1' | curl 'http://user:password@localhost:8123/' -d @-
2. In the `user` and `password` URL parameters
:::warning
-We do not recommend using this method as the parameter might be logged by web proxy and cached in the browser
+We don't recommend using this method as the parameter might be logged by a web proxy and cached in the browser
:::
For example:
@@ -393,7 +393,7 @@ For example:
echo 'SELECT 1' | curl -H 'X-ClickHouse-User: user' -H 'X-ClickHouse-Key: password' 'http://localhost:8123/' -d @-
```
-If the user name is not specified, then the `default` name is used. If the password is not specified, then an empty password is used.
+If the user name isn't specified, then the `default` name is used. If the password isn't specified, then an empty password is used.
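The credential-passing options above differ only in where the values end up in the request; a small sketch using Python's standard library to build each variant (host and credentials are placeholders):

```python
from urllib.parse import quote, urlencode

user, password = "default", "secret"  # placeholder credentials

# Option 1: HTTP Basic Auth embedded in the URL.
basic = f"http://{quote(user, safe='')}:{quote(password, safe='')}@localhost:8123/"

# Option 2: `user`/`password` URL parameters (not recommended: proxies and
# browsers may log or cache the query string, password included).
params = "http://localhost:8123/?" + urlencode({"user": user, "password": password})

# Option 3: X-ClickHouse-User / X-ClickHouse-Key request headers,
# e.g. with curl: -H 'X-ClickHouse-User: default' -H 'X-ClickHouse-Key: secret'
headers = {"X-ClickHouse-User": user, "X-ClickHouse-Key": password}

print(params)  # http://localhost:8123/?user=default&password=secret
```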
You can also use the URL parameters to specify any settings for processing a single query or entire profiles of settings.
For example:
@@ -450,7 +450,7 @@ The possible header fields are:
| `elapsed_ns` | Query runtime in nanoseconds.|
| `memory_usage` | Memory in bytes used by the query. (**Available from v25.11**) |
-Running requests do not stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server-side, and using the network might be ineffective.
+Running requests don't stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server-side, and using the network might be ineffective.
The following optional parameters exist:
@@ -473,7 +473,7 @@ The following settings can be used:
`buffer_size` determines the number of bytes in the result to buffer in the server memory. If a result body is larger than this threshold, the buffer is written to the HTTP channel, and the remaining data is sent directly to the HTTP channel.
-To ensure that the entire response is buffered, set `wait_end_of_query=1`. In this case, the data that is not stored in memory will be buffered in a temporary server file.
+To ensure that the entire response is buffered, set `wait_end_of_query=1`. In this case, the data that isn't stored in memory will be buffered in a temporary server file.
For example:
@@ -490,7 +490,7 @@ Use buffering to avoid situations where a query processing error occurred after
This feature was added in ClickHouse 24.4.
In specific scenarios, setting the granted role first might be required before executing the statement itself.
-However, it is not possible to send `SET ROLE` and the statement together, as multi-statements are not allowed:
+However, it isn't possible to send `SET ROLE` and the statement together, as multi-statements aren't allowed:
```bash
curl -sS "http://localhost:8123" --data-binary "SET ROLE my_role;SELECT * FROM my_table;"
@@ -520,7 +520,7 @@ In this case, `?role=my_role&role=my_other_role` works similarly to executing `S
## HTTP response codes caveats {#http_response_codes_caveats}
-Because of limitations of the HTTP protocol, a HTTP 200 response code does not guarantee that a query was successful.
+Because of limitations of the HTTP protocol, an HTTP 200 response code doesn't guarantee that a query was successful.
Here is an example:
@@ -537,7 +537,7 @@ The reason for this behavior is the nature of the HTTP protocol. The HTTP header
This behavior is independent of the format used, whether it's `Native`, `TSV`, or `JSON`; the error message will always be in the middle of the response stream.
-You can mitigate this problem by enabling `wait_end_of_query=1` ([Response Buffering](#response-buffering)). In this case, sending of the HTTP header is delayed until the entire query is resolved. This however, does not completely solve the problem because the result must still fit within the [`http_response_buffer_size`](../../operations/settings/settings.md#http_response_buffer_size), and other settings like [`send_progress_in_http_headers`](../../operations/settings/settings.md#send_progress_in_http_headers) can interfere with the delay of the header.
+You can mitigate this problem by enabling `wait_end_of_query=1` ([Response Buffering](#response-buffering)). In this case, sending of the HTTP header is delayed until the entire query is resolved. This, however, doesn't completely solve the problem because the result must still fit within the [`http_response_buffer_size`](../../operations/settings/settings.md#http_response_buffer_size), and other settings like [`send_progress_in_http_headers`](../../operations/settings/settings.md#send_progress_in_http_headers) can interfere with the delay of the header.
:::tip
The only way to catch all errors is to analyze the HTTP body before parsing it using the required format.
@@ -638,7 +638,7 @@ curl -sS "http://localhost:8123?param_arg1=abc%09123" -d "SELECT splitByChar('\t
Code: 457. DB::Exception: Value abc 123 cannot be parsed as String for query parameter 'arg1' because it isn't parsed completely: only 3 of 7 bytes was parsed: abc. (BAD_QUERY_PARAMETER) (version 23.4.1.869 (official build))
```
-If you are using URL parameters, you will need to encode the `\t` as `%5C%09`. For example:
+If you're using URL parameters, you will need to encode the `\t` as `%5C%09`. For example:
```bash
curl -sS "http://localhost:8123?param_arg1=abc%5C%09123" -d "SELECT splitByChar('\t', {arg1:String})"
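# Sanity check for the encoding above: Python's stdlib quote() yields the
# same %5C%09 for a backslash followed by a tab (helper invocation for
# illustration, not part of the original docs):
python3 -c 'from urllib.parse import quote; print(quote("\\\t", safe=""))'   # prints %5C%09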
@@ -737,20 +737,20 @@ Configuration options for `http_handlers` work as follows.
Each of these are discussed below:
- `method` is responsible for matching the method part of the HTTP request. `method` fully conforms to the definition of [`method`]
- (https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it is not defined in the
- configuration file, it does not match the method portion of the HTTP request.
+ (https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it isn't defined in the
+ configuration file, it doesn't match the method portion of the HTTP request.
- `url` is responsible for matching the URL part (path and query string) of the HTTP request.
If the `url` prefixed with `regex:` it expects [RE2](https://github.com/google/re2)'s regular expressions.
- It is an optional configuration. If it is not defined in the configuration file, it does not match the URL portion of the HTTP request.
+ It is an optional configuration. If it isn't defined in the configuration file, it doesn't match the URL portion of the HTTP request.
- `full_url` same as `url`, but, includes complete URL, i.e. `schema://host:port/path?query_string`.
- Note, ClickHouse does not support "virtual hosts", so the `host` is an IP address (and not the value of `Host` header).
+ Note, ClickHouse doesn't support "virtual hosts", so the `host` is an IP address (and not the value of `Host` header).
- `empty_query_string` - ensures that there is no query string (`?query_string`) in the request
- `headers` are responsible for matching the header part of the HTTP request. It is compatible with RE2's regular expressions. It is an optional
- configuration. If it is not defined in the configuration file, it does not match the header portion of the HTTP request.
+ configuration. If it isn't defined in the configuration file, it doesn't match the header portion of the HTTP request.
- `handler` contains the main processing part.
@@ -770,7 +770,7 @@ Each of these are discussed below:
- `response_content` — use with `static` type, response content sent to client, when using the prefix 'file://' or 'config://', find the content
from the file or configuration sends to client.
- `user` - user to execute the query from (default user is `default`).
- **Note**, you do not need to specify password for this user.
+ **Note**, you don't need to specify password for this user.
The configuration methods for different `type`s are discussed next.
@@ -856,7 +856,7 @@ In one `predefined_query_handler` only one `query` is supported.
In `dynamic_query_handler`, the query is written in the form of parameter of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. `query_param_name` can be configured in `dynamic_query_handler`.
-ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query` . It is an optional configuration. If there is no definition in the configuration file, the parameter is not passed in.
+ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query`. It is an optional configuration. If there is no definition in the configuration file, the parameter isn't passed in.
To experiment with this functionality, the following example defines the values of [`max_threads`](../../operations/settings/settings.md#max_threads) and `max_final_threads` and `queries` whether the settings were set successfully.
diff --git a/docs/integrations/interfaces/mysql.md b/docs/integrations/interfaces/mysql.md
index af6e90d375e..6aa905ad455 100644
--- a/docs/integrations/interfaces/mysql.md
+++ b/docs/integrations/interfaces/mysql.md
@@ -16,24 +16,24 @@ import mysql3 from '@site/static/images/interfaces/mysql3.png';
# MySQL Interface
-ClickHouse supports the MySQL wire protocol. This allows certain clients that do not have native ClickHouse connectors leverage the MySQL protocol instead, and it has been validated with the following BI tools:
+ClickHouse supports the MySQL wire protocol. This allows certain clients that don't have native ClickHouse connectors to leverage the MySQL protocol instead, and it has been validated with the following BI tools:
- [Looker Studio](../data-visualization/looker-studio-and-clickhouse.md)
- [Tableau Online](../integrations/tableau-online)
- [QuickSight](../integrations/quicksight)
-If you are trying other untested clients or integrations, keep in mind that there could be the following limitations:
+If you're trying other untested clients or integrations, keep in mind that there could be the following limitations:
- SSL implementation might not be fully compatible; there could be potential [TLS SNI](https://www.cloudflare.com/learning/ssl/what-is-sni/) issues.
-- A particular tool might require dialect features (e.g., MySQL-specific functions or settings) that are not implemented yet.
+- A particular tool might require dialect features (e.g., MySQL-specific functions or settings) that aren't implemented yet.
-If there is a native driver available (e.g., [DBeaver](../integrations/dbeaver)), it is always preferred to use it instead of the MySQL interface. Additionally, while most of the MySQL language clients should work fine, MySQL interface is not guaranteed to be a drop-in replacement for a codebase with existing MySQL queries.
+If there is a native driver available (e.g., [DBeaver](../integrations/dbeaver)), it is always preferred to use it instead of the MySQL interface. Additionally, while most of the MySQL language clients should work fine, the MySQL interface isn't guaranteed to be a drop-in replacement for a codebase with existing MySQL queries.
-If your use case involves a particular tool that does not have a native ClickHouse driver, and you would like to use it via the MySQL interface and you found certain incompatibilities - please [create an issue](https://github.com/ClickHouse/ClickHouse/issues) in the ClickHouse repository.
+If your use case involves a particular tool that doesn't have a native ClickHouse driver, you would like to use it via the MySQL interface, and you found certain incompatibilities - please [create an issue](https://github.com/ClickHouse/ClickHouse/issues) in the ClickHouse repository.
::::note
To support the SQL dialect of above BI tools better, ClickHouse's MySQL interface implicitly runs SELECT queries with setting [prefer_column_name_to_alias = 1](/operations/settings/settings#prefer_column_name_to_alias).
-This cannot be turned off and it can lead in rare edge cases to different behavior between queries sent to ClickHouse's normal and MySQL query interfaces.
+This can't be turned off, and in rare edge cases it can lead to different behavior between queries sent to ClickHouse's normal and MySQL query interfaces.
::::
## Enabling the MySQL Interface On ClickHouse Cloud {#enabling-the-mysql-interface-on-clickhouse-cloud}
@@ -163,7 +163,7 @@ If user password is specified using [SHA256](/sql-reference/functions/hash-funct
Restrictions:
-- prepared queries are not supported
+- prepared queries aren't supported
- some data types are sent as strings
diff --git a/docs/integrations/interfaces/postgresql.md b/docs/integrations/interfaces/postgresql.md
index 888b4fedb91..f32d414f4b5 100644
--- a/docs/integrations/interfaces/postgresql.md
+++ b/docs/integrations/interfaces/postgresql.md
@@ -17,7 +17,7 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
Check out our [Managed Postgres](/docs/cloud/managed-postgres) service. Backed by NVMe storage that is physically collocated with compute, it delivers up to 10x faster performance for workloads that are disk-bound compared to alternatives using network-attached storage like EBS and allows you to replicate your Postgres data to ClickHouse using the Postgres CDC connector in ClickPipes.
:::
-ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that is not already directly supported by ClickHouse (for example, Amazon Redshift).
+ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect to ClickHouse from a PostgreSQL client application that isn't already directly supported by ClickHouse (for example, Amazon Redshift).
To enable the PostgreSQL wire protocol, add the [postgresql_port](/operations/server-configuration-parameters/settings#postgresql_port) setting to your server's configuration file. For example, you could define the port in a new XML file in your `config.d` folder:
@@ -48,7 +48,7 @@ psql -p 9005 -h 127.0.0.1 -U alice default
```
:::note
-The `psql` client requires a login with a password, so you will not be able connect using the `default` user with no password. Either assign a password to the `default` user, or login as a different user.
+The `psql` client requires a login with a password, so you won't be able to connect using the `default` user with no password. Either assign a password to the `default` user, or log in as a different user.
:::
The `psql` client prompts for the password:
diff --git a/docs/integrations/interfaces/prometheus.md b/docs/integrations/interfaces/prometheus.md
index b58934239b4..16c4dfdc419 100644
--- a/docs/integrations/interfaces/prometheus.md
+++ b/docs/integrations/interfaces/prometheus.md
@@ -12,7 +12,7 @@ doc_type: 'reference'
## Exposing metrics {#expose}
:::note
-If you are using ClickHouse Cloud, you can expose metrics to Prometheus using the [Prometheus Integration](/integrations/prometheus).
+If you're using ClickHouse Cloud, you can expose metrics to Prometheus using the [Prometheus Integration](/integrations/prometheus).
:::
ClickHouse can expose its own metrics for scraping from Prometheus:
diff --git a/docs/integrations/interfaces/ssh.md b/docs/integrations/interfaces/ssh.md
index 2b5f59e5f9a..40f0f63cc98 100644
--- a/docs/integrations/interfaces/ssh.md
+++ b/docs/integrations/interfaces/ssh.md
@@ -25,7 +25,7 @@ After creating a [database user identified by an SSH key](/knowledgebase/how-to-
CREATE USER abcuser IDENTIFIED WITH ssh_key BY KEY '
' TYPE 'ssh-ed25519';
```
-You are able to use this key to connect to a ClickHouse server. It will open a pseudoterminal (PTY) with an interactive session of clickhouse-client.
+You're able to use this key to connect to a ClickHouse server. It will open a pseudoterminal (PTY) with an interactive session of clickhouse-client.
```bash
> ssh -i ~/test_ssh/id_ed25519 abcuser@localhost -p 9022
@@ -83,7 +83,7 @@ ssh -o "StrictHostKeyChecking no" user@host
## Configuring embedded client {#configuring-embedded-client}
-You are able to pass options to an embedded client similar to the ordinary `clickhouse-client`, but with a few limitations.
+You're able to pass options to an embedded client similar to the ordinary `clickhouse-client`, but with a few limitations.
Since this is an SSH protocol, the only way to pass parameters to the target host is through environment variables.
For example setting the `format` can be done this way:
@@ -97,7 +97,7 @@ For example setting the `format` can be done this way:
└───┘
```
-You are able to change any user-level setting this way and additionally pass most of the ordinary `clickhouse-client` options (except ones which don't make sense in this setup.)
+You're able to change any user-level setting this way and additionally pass most of the ordinary `clickhouse-client` options (except ones which don't make sense in this setup).
Important:
diff --git a/docs/integrations/interfaces/tcp.md b/docs/integrations/interfaces/tcp.md
index dfcd68595bf..c916c835c39 100644
--- a/docs/integrations/interfaces/tcp.md
+++ b/docs/integrations/interfaces/tcp.md
@@ -9,4 +9,4 @@ doc_type: 'reference'
# Native interface (TCP)
-The native protocol is used in the [command-line client](/interfaces/cli), for inter-server communication during distributed query processing, and also in other C++ programs. Unfortunately, native ClickHouse protocol does not have formal specification yet, but it can be reverse-engineered from ClickHouse source code (starting [around here](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)) and/or by intercepting and analyzing TCP traffic.
+The native protocol is used in the [command-line client](/interfaces/cli), for inter-server communication during distributed query processing, and also in other C++ programs. Unfortunately, the native ClickHouse protocol doesn't have a formal specification yet, but it can be reverse-engineered from ClickHouse source code (starting [around here](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)) and/or by intercepting and analyzing TCP traffic.
diff --git a/docs/integrations/language-clients/cpp.md b/docs/integrations/language-clients/cpp.md
index 800a2fc1a2a..c3eaa3b30a1 100644
--- a/docs/integrations/language-clients/cpp.md
+++ b/docs/integrations/language-clients/cpp.md
@@ -61,7 +61,7 @@ target_link_libraries(your-target PRIVATE clickhouse-cpp-lib)
### Setting the client object {#example-setup-client}
Create a `Client` instance to establish a connection to ClickHouse. The following example
-demonstrates connecting to a local ClickHouse instance, where no password is required and SSL is not
+demonstrates connecting to a local ClickHouse instance, where no password is required and SSL isn't
enabled.
```cpp
@@ -88,7 +88,7 @@ clickhouse::Client client{
### Creating tables and running queries without data {#example-create-table}
-To execute a query that does not return any data, such as creating tables, use the `Execute` method.
+To execute a query that doesn't return any data, such as creating tables, use the `Execute` method.
The same approach applies to other statements like `ALTER TABLE`, `DROP`, etc..
```cpp
diff --git a/docs/integrations/language-clients/csharp.md b/docs/integrations/language-clients/csharp.md
index 15d65a90886..8c97b07bb07 100644
--- a/docs/integrations/language-clients/csharp.md
+++ b/docs/integrations/language-clients/csharp.md
@@ -139,7 +139,7 @@ The `ClickHouseConnection` class normally allows for parallel operation (multipl
| Roles | `IReadOnlyList` | Empty | `Roles` | Comma-separated ClickHouse roles (e.g., `Roles=admin,reader`) |
:::note
-When using a connection string to set custom settings, use the `set_` prefix, e.g. "set_max_threads=4". When using a ClickHouseClientSettings object, do not use the `set_` prefix.
+When using a connection string to set custom settings, use the `set_` prefix, e.g. "set_max_threads=4". When using a ClickHouseClientSettings object, don't use the `set_` prefix.
For a full list of available settings, see [here](https://clickhouse.com/docs/operations/settings/settings).
:::
@@ -176,7 +176,7 @@ Choose **C#**. Connection details are displayed below.
-If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
+If you're using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
Using a connection string:
@@ -290,7 +290,7 @@ Console.WriteLine($"Rows written: {bulkCopy.RowsWritten}");
* Column names can be optionally provided via `ColumnNames` property if source data has fewer columns than target table.
* Configurable parameters: `Columns`, `BatchSize`, `MaxDegreeOfParallelism`.
* Before copying, a `SELECT * FROM LIMIT 0` query is performed to get information about target table structure. Types of provided objects must reasonably match the target table.
-* Sessions are not compatible with parallel insertion. Connection passed to `ClickHouseBulkCopy` must have sessions disabled, or `MaxDegreeOfParallelism` must be set to `1`.
+* Sessions aren't compatible with parallel insertion. Connection passed to `ClickHouseBulkCopy` must have sessions disabled, or `MaxDegreeOfParallelism` must be set to `1`.
:::
---
@@ -364,7 +364,7 @@ Console.WriteLine($"QueryId: {command.QueryId}");
```
:::tip
-If you are overriding the `QueryId` parameter, you need to ensure its uniqueness for every call. A random GUID is a good choice.
+If you're overriding the `QueryId` parameter, you need to ensure its uniqueness for every call. A random GUID is a good choice.
:::
---
@@ -416,9 +416,9 @@ For additional practical usage examples, see the [examples directory](https://gi
`ClickHouse.Driver` uses `System.Net.Http.HttpClient` under the hood. `HttpClient` has a per-endpoint connection pool. As a consequence:
-* A `ClickHouseConnection` object does not have 1:1 mapping to TCP connections - multiple database sessions will be multiplexed through several TCP connections per server.
+* A `ClickHouseConnection` object doesn't have a 1:1 mapping to TCP connections - multiple database sessions will be multiplexed through several TCP connections per server.
* `ClickHouseConnection` objects can be long-lived; the actual TCP connections underneath will be recycled by the connection pool.
-* Let `HttpClient` manage connection pooling internally. Do not pool `ClickHouseConnection` objects yourself.
+* Let `HttpClient` manage connection pooling internally. Don't pool `ClickHouseConnection` objects yourself.
* Connections can stay alive after `ClickHouseConnection` object was disposed.
* This behavior can be tweaked by passing a custom `HttpClientFactory` or `HttpClient` with custom `HttpClientHandler`.
@@ -467,7 +467,7 @@ settings.CustomSettings["wait_for_async_insert"] = 1; // Recommended: wait for f
| `wait_for_async_insert=0` | Insert returns immediately when data is buffered. No guarantee data will be persisted. | Only when data loss is acceptable |
:::warning
-With `wait_for_async_insert=0`, errors only surface during flush and cannot be traced back to the original insert. The client also provides no backpressure, risking server overload.
+With `wait_for_async_insert=0`, errors only surface during flush and can't be traced back to the original insert. The client also provides no backpressure, risking server overload.
:::
**Key settings:**
@@ -761,7 +761,7 @@ There is an important difference between HTTP parameter binding and bulk copy wh
**Bulk Copy** knows the target column's timezone and correctly interprets `Unspecified` values in that timezone.
-**HTTP Parameters** do not automatically know the column timezone. You must specify it in the parameter type hint:
+**HTTP Parameters** don't automatically know the column timezone. You must specify it in the parameter type hint:
```csharp
// CORRECT: Timezone in type hint
@@ -868,7 +868,7 @@ await bulkCopy.WriteToServerAsync(new[] { row1, row2 });
## Logging and diagnostics {#logging-and-diagnostics}
-The ClickHouse .NET client integrates with the `Microsoft.Extensions.Logging` abstractions to offer lightweight, opt-in logging. When enabled, the driver emits structured messages for connection lifecycle events, command execution, transport operations, and bulk copy uploads. Logging is entirely optional—applications that do not configure a logger continue to run without additional overhead.
+The ClickHouse .NET client integrates with the `Microsoft.Extensions.Logging` abstractions to offer lightweight, opt-in logging. When enabled, the driver emits structured messages for connection lifecycle events, command execution, transport operations, and bulk copy uploads. Logging is entirely optional—applications that don't configure a logger continue to run without additional overhead.
### Quick start {#logging-quick-start}
@@ -996,7 +996,7 @@ This will log:
### Debug mode: network tracing and diagnostics {#logging-debugmode}
-To help with diagnosing networking issues, the driver library includes a helper that enables low-level tracing of .NET networking internals. To enable it you must pass a LoggerFactory with the level set to Trace, and set EnableDebugMode to true (or manually enable it via the `ClickHouse.Driver.Diagnostic.TraceHelper` class). Events will be logged to the `ClickHouse.Driver.NetTrace` category. Warning: this will generate extremely verbose logs, and impact performance. It is not recommended to enable debug mode in production.
+To help with diagnosing networking issues, the driver library includes a helper that enables low-level tracing of .NET networking internals. To enable it you must pass a LoggerFactory with the level set to Trace, and set EnableDebugMode to true (or manually enable it via the `ClickHouse.Driver.Diagnostic.TraceHelper` class). Events will be logged to the `ClickHouse.Driver.NetTrace` category. Warning: this will generate extremely verbose logs, and impact performance. It isn't recommended to enable debug mode in production.
```csharp
var loggerFactory = LoggerFactory.Create(builder =>
@@ -1123,7 +1123,7 @@ await connection.OpenAsync();
:::note
Important considerations when providing a custom HttpClient
-- **Automatic decompression**: You must enable `AutomaticDecompression` if compression is not disabled (compression is enabled by default).
+- **Automatic decompression**: You must enable `AutomaticDecompression` if compression isn't disabled (compression is enabled by default).
- **Idle timeout**: Set `PooledConnectionIdleTimeout` smaller than the server's `keep_alive_timeout` (10 seconds for ClickHouse Cloud) to avoid connection errors from half-open connections.
:::
@@ -1131,7 +1131,7 @@ Important considerations when providing a custom HttpClient
### Dapper {#orm-support-dapper}
-`ClickHouse.Driver` can be used with Dapper, but anonymous objects are not supported.
+`ClickHouse.Driver` can be used with Dapper, but anonymous objects aren't supported.
**Working example:**
@@ -1219,7 +1219,7 @@ Entity Framework Core is currently not supported.
### AggregateFunction columns {#aggregatefunction-columns}
-Columns of type `AggregateFunction(...)` cannot be queried or inserted directly.
+Columns of type `AggregateFunction(...)` can't be queried or inserted directly.
To insert:
diff --git a/docs/integrations/language-clients/go/index.md b/docs/integrations/language-clients/go/index.md
index 5815d68b8ba..fabfefa0415 100644
--- a/docs/integrations/language-clients/go/index.md
+++ b/docs/integrations/language-clients/go/index.md
@@ -198,7 +198,7 @@ Both interfaces encode data using the [native format](/native-protocol/basics.md
## Installation {#installation}
-v1 of the driver is deprecated and will not reach feature updates or support for new ClickHouse types. You should migrate to v2, which offers superior performance.
+v1 of the driver is deprecated and won't reach feature updates or support for new ClickHouse types. You should migrate to v2, which offers superior performance.
To install the 2.x version of the client, add the package to your go.mod file:
@@ -251,7 +251,7 @@ The client is released independently of ClickHouse. 2.x represents the current m
The client supports:
-- All currently supported versions of ClickHouse as recorded [here](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md). As ClickHouse versions are no longer supported they are also no longer actively tested against client releases.
+- All currently supported versions of ClickHouse as recorded [here](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md). Once ClickHouse versions are no longer supported, they're also no longer actively tested against client releases.
- All versions of ClickHouse 2 years from the release date of the client. Note only LTS versions are actively tested.
#### Golang compatibility {#golang-compatibility}
@@ -267,7 +267,7 @@ All code examples for the ClickHouse Client API can be found [here](https://gith
### Connecting {#connecting}
-The following example, which returns the server version, demonstrates connecting to ClickHouse - assuming ClickHouse is not secured and accessible with the default user.
+The following example, which returns the server version, demonstrates connecting to ClickHouse - assuming ClickHouse isn't secured and is accessible with the default user.
Note we use the default native port to connect.
@@ -402,7 +402,7 @@ fmt.Println(v.String())
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/ssl.go)
-This minimal `TLS.Config` is normally sufficient to connect to the secure native port (normally 9440) on a ClickHouse server. If the ClickHouse server does not have a valid certificate (expired, wrong hostname, not signed by a publicly recognized root Certificate Authority), `InsecureSkipVerify` can be true, but this is strongly discouraged.
+This minimal `TLS.Config` is normally sufficient to connect to the secure native port (normally 9440) on a ClickHouse server. If the ClickHouse server doesn't have a valid certificate (expired, wrong hostname, not signed by a publicly recognized root Certificate Authority), `InsecureSkipVerify` can be true, but this is strongly discouraged.
```go
conn, err := clickhouse.Open(&clickhouse.Options{
@@ -501,7 +501,7 @@ if err != nil {
### Execution {#execution}
-Arbitrary statements can be executed via the `Exec` method. This is useful for DDL and simple statements. It should not be used for larger inserts or query iterations.
+Arbitrary statements can be executed via the `Exec` method. This is useful for DDL and simple statements. It shouldn't be used for larger inserts or query iterations.
```go
conn.Exec(context.Background(), `DROP TABLE IF EXISTS example`)
@@ -586,7 +586,7 @@ return batch.Send()
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/batch.go)
-Recommendations for ClickHouse apply [here](/guides/inserting-data#best-practices-for-inserts). Batches should not be shared across go-routines - construct a separate batch per routine.
+Recommendations for ClickHouse apply [here](/guides/inserting-data#best-practices-for-inserts). Batches shouldn't be shared across go-routines - construct a separate batch per routine.
From the above example, note the need for variable types to align with the column type when appending rows. While the mapping is usually obvious, this interface tries to be flexible, and types will be converted provided no precision loss is incurred. For example, the following demonstrates inserting a string into a datetime64.
@@ -657,7 +657,7 @@ return rows.Err()
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/query_rows.go)
-Note in both cases, we are required to pass a pointer to the variables we wish to serialize the respective column values into. These must be passed in the order specified in the `SELECT` statement - by default, the order of column declaration will be used in the event of a `SELECT *` as shown above.
+Note in both cases, we're required to pass a pointer to the variables we wish to serialize the respective column values into. These must be passed in the order specified in the `SELECT` statement - by default, the order of column declaration will be used in the event of a `SELECT *` as shown above.
Similar to insertion, the Scan method requires the target variables to be of an appropriate type. This again aims to be flexible, with types converted where possible, provided no precision loss is possible, e.g., the above example shows a UUID column being read into a string variable. For a full list of supported go types for each Column type, see [Type Conversions](#type-conversions).
@@ -814,7 +814,7 @@ for i := 0; i < 1_000; i++ {
### Type conversions {#type-conversions}
-The client aims to be as flexible as possible concerning accepting variable types for both insertion and marshaling of responses. In most cases, an equivalent Golang type exists for a ClickHouse column type, e.g., [UInt64](/sql-reference/data-types/int-uint/) to [uint64](https://pkg.go.dev/builtin#uint64). These logical mappings should always be supported. You may wish to utilize variable types that can be inserted into columns or used to receive a response if the conversion of either the variable or received data takes place first. The client aims to support these conversions transparently, so users do not need to convert their data to align precisely before insertion and to provide flexible marshaling at query time. This transparent conversion does not allow for precision loss. For example, a uint32 cannot be used to receive data from a UInt64 column. Conversely, a string can be inserted into a datetime64 field provided it meets the format requirements.
+The client aims to be as flexible as possible concerning accepting variable types for both insertion and marshaling of responses. In most cases, an equivalent Golang type exists for a ClickHouse column type, e.g., [UInt64](/sql-reference/data-types/int-uint/) to [uint64](https://pkg.go.dev/builtin#uint64). These logical mappings should always be supported. You may wish to utilize variable types that can be inserted into columns or used to receive a response if the conversion of either the variable or received data takes place first. The client aims to support these conversions transparently, so users don't need to convert their data to align precisely before insertion and to provide flexible marshaling at query time. This transparent conversion doesn't allow for precision loss. For example, a uint32 can't be used to receive data from a UInt64 column. Conversely, a string can be inserted into a datetime64 field provided it meets the format requirements.
The type conversions currently supported for primitive types are captured [here](https://github.com/ClickHouse/clickhouse-go/blob/main/TYPES.md).
@@ -832,7 +832,7 @@ Handling of timezone information depends on the ClickHouse type and whether the
* At **insert** time the value is sent to ClickHouse in UNIX timestamp format. If no time zone is provided, the client will assume the client's local time zone. `time.Time{}` or `sql.NullTime` will be converted to epoch accordingly.
* At **select** time the timezone of the column will be used if set when returning a `time.Time` value. If not, the timezone of the server will be used.
* **Date/Date32**
- * At **insert** time, the timezone of any date is considered when converting the date to a unix timestamp, i.e., it will be offset by the timezone prior to storage as a date, as Date types have no locale in ClickHouse. If this is not specified in a string value, the local timezone will be used.
+ * At **insert** time, the timezone of any date is considered when converting the date to a unix timestamp, i.e., it will be offset by the timezone prior to storage as a date, as Date types have no locale in ClickHouse. If this isn't specified in a string value, the local timezone will be used.
* At **select** time, dates are scanned into `time.Time{}` or `sql.NullTime{}` instances will be returned without timezone information.
#### Array {#array}
@@ -1108,7 +1108,7 @@ rows.Close()
[Full Example - `flatten_tested=0`](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/nested.go#L28-L118)
-If the default value of 1 is used for `flatten_nested`, nested columns are flattened to separate arrays. This requires using nested slices for insertion and retrieval. While arbitrary levels of nesting may work, this is not officially supported.
+If the default value of 1 is used for `flatten_nested`, nested columns are flattened to separate arrays. This requires using nested slices for insertion and retrieval. While arbitrary levels of nesting may work, this isn't officially supported.
```go
conn, err := GetNativeConnection(nil, nil, nil)
@@ -1312,7 +1312,7 @@ if err = conn.QueryRow(ctx, "SELECT * FROM example").Scan(&col1, &col2); err !=
Due to Go's lack of a built-in Decimal type, we recommend using the third-party package [github.com/shopspring/decimal](https://github.com/shopspring/decimal) to work with Decimal types natively without modifying your original queries.
:::note
-You may be tempted to use Float instead to avoid third-party dependencies. However, be aware that [Float types in ClickHouse are not recommended when accurate values are required](https://clickhouse.com/docs/sql-reference/data-types/float).
+You may be tempted to use Float instead to avoid third-party dependencies. However, be aware that [Float types in ClickHouse aren't recommended when accurate values are required](https://clickhouse.com/docs/sql-reference/data-types/float).
If you still choose to use Go's built-in Float type on the client side, you must explicitly convert Decimal to Float using the [toFloat64() function](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#toFloat64) or [its variants](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#toFloat64OrZero) in your ClickHouse queries. Be aware that this conversion may result in loss of precision.
:::
@@ -1744,7 +1744,7 @@ rows.Close()
### Dynamic scanning {#dynamic-scanning}
-You may need to read tables for which they do not know the schema or type of the fields being returned. This is common in cases where ad-hoc data analysis is performed or generic tooling is written. To achieve this, column-type information is available on query responses. This can be used with Go reflection to create runtime instances of correctly typed variables which can be passed to Scan.
+You may need to read tables for which you don't know the schema or type of the fields being returned. This is common in cases where ad-hoc data analysis is performed or generic tooling is written. To achieve this, column-type information is available on query responses. This can be used with Go reflection to create runtime instances of correctly typed variables which can be passed to Scan.
```go
const query = `
@@ -1883,7 +1883,7 @@ Full details on exploiting tracing can be found under [OpenTelemetry support](/o
## Database/SQL API {#databasesql-api}
-The `database/sql` or "standard" API allows you to use the client in scenarios where application code should be agnostic of the underlying databases by conforming to a standard interface. This comes at some expense - additional layers of abstraction and indirection and primitives which are not necessarily aligned with ClickHouse. These costs are, however, typically acceptable in scenarios where tooling needs to connect to multiple databases.
+The `database/sql` or "standard" API allows you to use the client in scenarios where application code should be agnostic of the underlying databases by conforming to a standard interface. This comes at some expense - additional layers of abstraction and indirection and primitives which aren't necessarily aligned with ClickHouse. These costs are, however, typically acceptable in scenarios where tooling needs to connect to multiple databases.
Additionally, this client supports using HTTP as the transport layer - data will still be encoded in the native format for optimal performance.
@@ -1893,7 +1893,7 @@ Full code examples for the standard API can be found [here](https://github.com/C
### Connecting {#connecting-1}
-Connection can be achieved either via a DSN string with the format `clickhouse://:?=` and `Open` method or via the `clickhouse.OpenDB` method. The latter is not part of the `database/sql` specification but returns a `sql.DB` instance. This method provides functionality such as profiling, for which there are no obvious means of exposing through the `database/sql` specification.
+Connection can be achieved either via a DSN string with the format `clickhouse://<host>:<port>?<query_option>=<value>` and the `Open` method, or via the `clickhouse.OpenDB` method. The latter isn't part of the `database/sql` specification but returns a `sql.DB` instance. This method provides functionality, such as profiling, for which there is no obvious means of exposure through the `database/sql` specification.
```go
func Connect() error {
@@ -2168,7 +2168,7 @@ _, err = conn.Exec("INSERT INTO example VALUES (1, 'test-1')")
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/std/exec.go)
-This method does not support receiving a context - by default, it executes with the background context. You can use `ExecContext` if this is needed - see [Using Context](#using-context).
+This method doesn't support receiving a context - by default, it executes with the background context. You can use `ExecContext` if this is needed - see [Using Context](#using-context).
### Batch Insert {#batch-insert-1}
@@ -2305,7 +2305,7 @@ Unless stated, complex type handling should be the same as the [ClickHouse API](
#### Maps {#maps}
-Unlike the ClickHouse API, the standard API requires maps to be strongly typed at scan type. For example, you cannot pass a `map[string]interface{}` for a `Map(String,String)` field and must use a `map[string]string` instead. An `interface{}` variable will always be compatible and can be used for more complex structures. Structs are not supported at read time.
+Unlike the ClickHouse API, the standard API requires maps to be strongly typed at scan time. For example, you can't pass a `map[string]interface{}` for a `Map(String,String)` field and must use a `map[string]string` instead. An `interface{}` variable will always be compatible and can be used for more complex structures. Structs aren't supported at read time.
```go
var (
@@ -2579,7 +2579,7 @@ if err := rows.Err(); err != nil {
### Dynamic scanning {#dynamic-scanning-1}
-Similar to the [ClickHouse API](#dynamic-scanning), column type information is available to allow you to create runtime instances of correctly typed variables which can be passed to Scan. This allows columns to be read where the type is not known.
+Similar to the [ClickHouse API](#dynamic-scanning), column type information is available to allow you to create runtime instances of correctly typed variables which can be passed to Scan. This allows columns to be read where the type isn't known.
```go
const query = `
@@ -2695,7 +2695,7 @@ fmt.Printf("external_table_1 UNION external_table_2: %d\n", count)
### Open telemetry {#open-telemetry-1}
-ClickHouse allows a [trace context](/operations/opentelemetry/) to be passed as part of the native protocol. The client allows a Span to be created via the function `clickhouse.withSpan` and passed via the Context to achieve this. This is not supported when HTTP is used as transport.
+ClickHouse allows a [trace context](/operations/opentelemetry/) to be passed as part of the native protocol. The client allows a Span to be created via the function `clickhouse.withSpan` and passed via the Context to achieve this. This isn't supported when HTTP is used as transport.
```go
var count uint64
diff --git a/docs/integrations/language-clients/java/client/client.mdx b/docs/integrations/language-clients/java/client/client.mdx
index 0dff8f9721c..7f6c4a71d4c 100644
--- a/docs/integrations/language-clients/java/client/client.mdx
+++ b/docs/integrations/language-clients/java/client/client.mdx
@@ -114,7 +114,7 @@ Client client = new Client.Builder()
```
:::note
-SSL Authentication may be hard to troubleshoot on production because many errors from SSL libraries provide not enough information. For example, if client certificate and key do not match then server will terminate connection immediately (in case of HTTP it will be connection initiation stage where no HTTP requests are send so no response is sent).
+SSL authentication may be hard to troubleshoot in production because many errors from SSL libraries provide too little information. For example, if the client certificate and key don't match, the server will terminate the connection immediately (for HTTP this happens at the connection initiation stage, where no HTTP requests are sent, so no response is sent).
Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) to verify certificates and keys:
- check key integrity: `openssl rsa -in [key-file.key] -check -noout`
@@ -127,7 +127,7 @@ Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) t
## Configuration {#configuration}
All settings are defined by instance methods (a.k.a. configuration methods) that make the scope and context of each value clear.
-Major configuration parameters are defined in one scope (client or operation) and do not override each other.
+Major configuration parameters are defined in one scope (client or operation) and don't override each other.
Configuration is defined during client creation. See `com.clickhouse.client.api.Client.Builder`.
@@ -499,14 +499,14 @@ Configuration options for insert operations.
| `serverSetting(String name, String value)` | Sets individual server settings for an operation. |
| `serverSetting(String name, Collection values)` | Sets individual server settings with multiple values for an operation. Items of the collection should be `String` values. |
| `setDBRoles(Collection dbRoles)` | Sets DB roles to be set before executing an operation. Items of the collection should be `String` values. |
-| `setOption(String option, Object value)` | Sets a configuration option in raw format. This is not a server setting. |
+| `setOption(String option, Object value)` | Sets a configuration option in raw format. This isn't a server setting. |
### InsertResponse {#insertresponse}
Response object that holds the result of an insert operation. It is only available if the client got a response from the server.
:::note
-This object should be closed as soon as possible to release a connection because the connection cannot be re-used until all data of previous response is fully read.
+This object should be closed as soon as possible to release a connection because the connection can't be re-used until all data of the previous response is fully read.
:::
| Method | Description |
@@ -659,21 +659,21 @@ Configuration options for query operations.
|----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| `setQueryId(String queryId)` | Sets query ID that will be assigned to the operation. |
| `setFormat(ClickHouseFormat format)` | Sets response format. See `RowBinaryWithNamesAndTypes` for the full list. |
-| `setMaxExecutionTime(Integer maxExecutionTime)` | Sets operation execution time on server. Will not affect read timeout. |
+| `setMaxExecutionTime(Integer maxExecutionTime)` | Sets the operation execution time limit on the server. Won't affect the read timeout. |
| `waitEndOfQuery(Boolean waitEndOfQuery)` | Requests the server to wait for the end of the query before sending a response. |
| `setUseServerTimeZone(Boolean useServerTimeZone)` | Server timezone (see client config) will be used to parse date/time types in the result of an operation. Default `false`. |
| `setUseTimeZone(String timeZone)` | Requests server to use `timeZone` for time conversion. See [session_timezone](/operations/settings/settings#session_timezone). |
| `serverSetting(String name, String value)` | Sets individual server settings for an operation. |
| `serverSetting(String name, Collection values)` | Sets individual server settings with multiple values for an operation. Items of the collection should be `String` values. |
| `setDBRoles(Collection dbRoles)` | Sets DB roles to be set before executing an operation. Items of the collection should be `String` values. |
-| `setOption(String option, Object value)` | Sets a configuration option in raw format. This is not a server setting. |
+| `setOption(String option, Object value)` | Sets a configuration option in raw format. This isn't a server setting. |
### QueryResponse {#queryresponse}
Response object that holds the result of query execution. It is only available if the client got a response from the server.
:::note
+This object should be closed as soon as possible to release a connection because the connection can't be re-used until all data of the previous response is fully read.
+This object should be closed as soon as possible to release a connection because the connection can't be re-used until all data of previous response is fully read.
:::
| Method | Description |
@@ -1374,7 +1374,7 @@ Here is a list of options to configure load balancing:
| Property | Default | Description |
|-----------------------|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| load_balancing_policy | `""` | The load-balancing policy can be one of: `firstAlive` - request is sent to the first healthy node from the managed node list`random` - request is sent to a random node from the managed node list `roundRobin` - request is sent to each node from the managed node list, in turn.full qualified class name implementing `ClickHouseLoadBalancingPolicy` - custom load balancing policyIf it is not specified the request is sent to the first node from the managed node list |
+| load_balancing_policy | `""` | The load-balancing policy can be one of: `firstAlive` - the request is sent to the first healthy node from the managed node list; `random` - the request is sent to a random node from the managed node list; `roundRobin` - the request is sent to each node from the managed node list, in turn; or a fully qualified class name implementing `ClickHouseLoadBalancingPolicy` - a custom load-balancing policy. If it isn't specified, the request is sent to the first node from the managed node list |
| load_balancing_tags | `""` | Load balancing tags for filtering out nodes. Requests are sent only to nodes that have the specified tags |
| health_check_interval | `0` | Health check interval in milliseconds, zero or negative value means one-time. |
| health_check_method | `ClickHouseHealthCheckMethod.SELECT_ONE` | Health check method. Can be one of: `ClickHouseHealthCheckMethod.SELECT_ONE` - check with a `select 1` query; `ClickHouseHealthCheckMethod.PING` - protocol-specific check, which is generally faster |
diff --git a/docs/integrations/language-clients/java/index.md b/docs/integrations/language-clients/java/index.md
index 3d0cecff088..1ad78306519 100644
--- a/docs/integrations/language-clients/java/index.md
+++ b/docs/integrations/language-clients/java/index.md
@@ -84,7 +84,7 @@ Java Client was developed far back in 2015. Its codebase became very hard to mai
[ClickHouse Data Types](/sql-reference/data-types)
:::note
-- AggregatedFunction - :warning: does not support `SELECT * FROM table ...`
+- AggregateFunction - :warning: doesn't support `SELECT * FROM table ...`
- Decimal - `SET output_format_decimal_trailing_zeros=1` in 21.9+ for consistency
- Enum - can be treated as both string and integer
- UInt64 - mapped to `long` in client-v1
@@ -129,7 +129,7 @@ JDBC Drive inherits same features as underlying client implementation. Other JDB
### Logging {#logging}
Our Java language client uses [SLF4J](https://www.slf4j.org/) for logging. You can use any SLF4J-compatible logging framework, such as `Logback` or `Log4j`.
-For example, if you are using Maven you could add the following dependency to your `pom.xml` file:
+For example, if you're using Maven you could add the following dependency to your `pom.xml` file:
```xml title="pom.xml"
@@ -158,7 +158,7 @@ For example, if you are using Maven you could add the following dependency to yo
#### Configuring logging {#configuring-logging}
-This is going to depend on the logging framework you are using. For example, if you are using `Logback`, you could configure logging in a file called `logback.xml`:
+This is going to depend on the logging framework you're using. For example, if you're using `Logback`, you could configure logging in a file called `logback.xml`:
```xml title="logback.xml"
diff --git a/docs/integrations/language-clients/java/jdbc/jdbc.mdx b/docs/integrations/language-clients/java/jdbc/jdbc.mdx
index 7f778be1d37..ef484c25afa 100644
--- a/docs/integrations/language-clients/java/jdbc/jdbc.mdx
+++ b/docs/integrations/language-clients/java/jdbc/jdbc.mdx
@@ -91,14 +91,14 @@ You can use the old JDBC implementation by setting the `clickhouse.jdbc.v1` prop
There are a few things to note about the URL syntax:
- **only** one endpoint is allowed in the URL
-- protocol should be specified when it is not the default one - 'HTTP'
-- port should be specified when it is not the default one '8123'
-- driver do not guess the protocol from the port, you need to specify it explicitly
-- `ssl` parameter is not required when protocol is specified.
+- protocol should be specified when it isn't the default one ('HTTP')
+- port should be specified when it isn't the default one ('8123')
+- the driver doesn't guess the protocol from the port; you need to specify it explicitly
+- the `ssl` parameter isn't required when the protocol is specified.
### Connection Properties
Main configuration parameters are defined in the [java client](/integrations/language-clients/java/client#client-configuration). They should be passed
-as is to the driver. Driver has some own properties that are not part of the client configuration they are listed below.
+as is to the driver. The driver has some properties of its own that aren't part of the client configuration; they're listed below.
**Driver properties**:
| Property | Default | Description |
@@ -191,7 +191,7 @@ There are few ways to change the mapping:
- `rs.getObject(1, Float64.class)` will return `Float64` value of `Int8` column.
- `rs.getLong(1)` will return `Long` value of `Int8` column.
- `rs.getByte(1)` can return `Byte` value of `Int16` column if it fits into `Byte`.
-- conversion from wider to narrower type is not recommend because of data coruption risk.
+- conversion from a wider to a narrower type isn't recommended because of the data-corruption risk.
- `Bool` type acts as number, too.
- All number types can be read as `java.lang.String`.
@@ -235,7 +235,7 @@ There are few ways to change the mapping:
- `rs.getObject(1, java.time.LocalDate.class)` will return `java.time.LocalDate` value of `Date` column.
- `rs.getObject(1, java.time.LocalDateTime.class)` will return `java.time.LocalDateTime` value of `DateTime` column.
- `rs.getObject(1, java.time.LocalTime.class)` will return `java.time.LocalTime` value of `Time` column.
-- `Date`, `Date32`, `Time`, `Time64` is not affected by the timezone of the server.
+- `Date`, `Date32`, `Time`, `Time64` aren't affected by the timezone of the server.
- `DateTime`, `DateTime64` are affected by the timezone of the server or session timezone.
- `DateTime` and `DateTime64` can be retrieved as `ZonedDateTime` by using `getObject(colIndex, ZonedDateTime.class)`.
@@ -250,10 +250,10 @@ There are few ways to change the mapping:
- `Array` is mapped to `java.sql.Array` by default to be compatible with JDBC. This is also done to give more information about returned array value. Useful for type inference.
- `Array` implements `getResultSet()` method to return `java.sql.ResultSet` with the same content as the original array.
-- Collection types should not be read as `java.lang.String` because it is not a valid way to represent the data (Ex. there is no quoting for string values in array).
+- Collection types shouldn't be read as `java.lang.String` because it isn't a valid way to represent the data (e.g., there is no quoting for string values in an array).
- `Map` is mapped to `JAVA_OBJECT` because value can be read only with `getObject(columnIndex, Class)` method.
- - `Map` is not a `java.sql.Struct` because it doesn't have named columns.
-- `Tuple` is mapped to `Object[]` because it can contain different types and using `List` is not valid.
+ - `Map` isn't a `java.sql.Struct` because it doesn't have named columns.
+- `Tuple` is mapped to `Object[]` because it can contain different types and using `List` isn't valid.
- `Tuple` can be read as `Array` by using `getObject(columnIndex, Array.class)` method. In this case `Array#baseTypeName` will return `Tuple` column definition.
@@ -282,9 +282,9 @@ There are few ways to change the mapping:
| AggregateFunction | OTHER | (binary representation) |
| SimpleAggregateFunction | (wrapped type) | (wrapped class) |
-- `UUID` is not JDBC standard type. However it is part of JDK. By default `java.util.UUID` is returned on `getObject()` method.
+- `UUID` isn't a JDBC standard type. However, it is part of the JDK. By default `java.util.UUID` is returned by the `getObject()` method.
- `UUID` can be read/written as `String` by using `getObject(columnIndex, String.class)` method.
-- `IPv4` and `IPv6` are not JDBC standard types. However they are part of JDK. By default `java.net.Inet4Address` and `java.net.Inet6Address` are returned on `getObject()` method.
+- `IPv4` and `IPv6` aren't JDBC standard types. However, they're part of the JDK. By default `java.net.Inet4Address` and `java.net.Inet6Address` are returned by the `getObject()` method.
- `IPv4` and `IPv6` can be read/written as `String` by using `getObject(columnIndex, String.class)` method.
### Handling Dates, Times, and Timezones {#handling-dates-times-and-timezones}
@@ -444,7 +444,7 @@ properties.setProperty("socket_keepalive", "true");
| Streaming Data With `PreparedStatement` | Supported | Not supported |
- JDBC V2 is implemented to be more lightweight and some features were removed.
- - Streaming Data is not supported in JDBC V2 because it is not part of the JDBC spec and Java.
+ - Streaming Data isn't supported in JDBC V2 because it isn't part of the JDBC spec or standard Java.
- JDBC V2 expects explicit configuration. No failover defaults.
- Protocol should be specified in the URL. No implicit protocol detection using port numbers.
@@ -457,7 +457,7 @@ There are only two enums:
Connection properties are parsed in the following way:
- URL is parsed first for properties. They override all other properties.
-- Driver properties are not passed to the client.
+- Driver properties aren't passed to the client.
- Endpoints (host, port, protocol) are parsed from the URL.
Example:
@@ -643,7 +643,7 @@ Latest JDBC (0.7.2) version uses Client-V1
-Since version `0.5.0`, we are using Apache HTTP Client that's packed the Client. Since there is not a shared version of the package, you need to add a logger as a dependency.
+Since version `0.5.0`, we're using the Apache HTTP Client that's packed with the Client. Since there isn't a shared version of the package, you need to add a logger as a dependency.
@@ -690,7 +690,7 @@ Since version `0.5.0`, we are using Apache HTTP Client that's packed the Client.
| Property | Default | Description |
| ------------------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `continueBatchOnError` | `false` | Whether to continue batch processing when error occurred |
-| `createDatabaseIfNotExist` | `false` | Whether to create database if it does not exist |
+| `createDatabaseIfNotExist` | `false` | Whether to create database if it doesn't exist |
| `custom_http_headers` | | comma separated custom http headers, for example: `User-Agent=client1,X-Gateway-Id=123` |
| `custom_http_params` | | comma separated custom http query parameters, for example: `extremes=0,max_result_rows=100` |
| `nullAsDefault` | `0` | `0` - treat null value as is and throw exception when inserting null into non-nullable column; `1` - treat null value as is and disable null-check for inserting; `2` - replace null to default value of corresponding data type for both query and insert |
@@ -705,7 +705,7 @@ Note: please refer to [JDBC specific configuration](https://github.com/ClickHous
The JDBC driver supports the same data formats as the client library does.
:::note
-- AggregatedFunction - :warning: does not support `SELECT * FROM table ...`
+- AggregateFunction - :warning: doesn't support `SELECT * FROM table ...`
- Decimal - `SET output_format_decimal_trailing_zeros=1` in 21.9+ for consistency
- Enum - can be treated as both string and integer
- UInt64 - mapped to `long` (in client-v1)
@@ -960,7 +960,7 @@ properties.setProperty("socket_keepalive", "true");
```
:::note
-Currently, you must use Apache HTTP Client library when setting the socket keep-alive, as the other two HTTP client libraries supported by `clickhouse-java` do not allow setting socket options. For a detailed guide, see [Configuring HTTP library](#v07-configuring-http-library).
+Currently, you must use Apache HTTP Client library when setting the socket keep-alive, as the other two HTTP client libraries supported by `clickhouse-java` don't allow setting socket options. For a detailed guide, see [Configuring HTTP library](#v07-configuring-http-library).
:::
Alternatively, you can add equivalent parameters to the JDBC URL.
diff --git a/docs/integrations/language-clients/js.md b/docs/integrations/language-clients/js.md
index cfc255cce99..4899ca8a27e 100644
--- a/docs/integrations/language-clients/js.md
+++ b/docs/integrations/language-clients/js.md
@@ -70,7 +70,7 @@ npm i @clickhouse/client-web
|----------------|------------|
| 1.12.0 | 24.8+ |
-Likely, the client will work with the older versions, too; however, this is best-effort support and is not guaranteed. If you have a ClickHouse version older than 23.3, please refer to [ClickHouse security policy](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md) and consider upgrading.
+The client will likely work with older versions, too; however, this is best-effort support and isn't guaranteed. If you have a ClickHouse version older than 23.3, please refer to [ClickHouse security policy](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md) and consider upgrading.
## Examples {#examples}
@@ -227,7 +227,7 @@ Every method that sends a query or a statement (`command`, `exec`, `insert`, `se
if it is enabled in the [server configuration](/operations/server-configuration-parameters/settings), or cancel long-running queries (see [the example](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/cancel_query.ts)). If necessary, `query_id` can be overridden by the user in `command`/`query`/`exec`/`insert` methods params.
:::tip
-If you are overriding the `query_id` parameter, you need to ensure its uniqueness for every call. A random UUID is a good choice.
+If you're overriding the `query_id` parameter, you need to ensure its uniqueness for every call. A random UUID is a good choice.
:::
### Base parameters for all client methods {#base-parameters-for-all-client-methods}
@@ -277,7 +277,7 @@ interface ClickHouseClient {
See also: [Base parameters for all client methods](./js.md#base-parameters-for-all-client-methods).
:::tip
-Do not specify the FORMAT clause in `query`, use `format` parameter instead.
+Don't specify the FORMAT clause in `query`, use `format` parameter instead.
:::
#### Result set and row abstractions {#result-set-and-row-abstractions}
@@ -288,14 +288,14 @@ Node.js `ResultSet` implementation uses `Stream.Readable` under the hood, while
You can consume the `ResultSet` by calling either `text` or `json` methods on `ResultSet` and load the entire set of rows returned by the query into memory.
-You should start consuming the `ResultSet` as soon as possible, as it holds the response stream open and consequently keeps the underlying connection busy. The client does not buffer the incoming data to avoid potential excessive memory usage by the application.
+You should start consuming the `ResultSet` as soon as possible, as it holds the response stream open and consequently keeps the underlying connection busy. The client doesn't buffer the incoming data to avoid potential excessive memory usage by the application.
Alternatively, if the result is too large to fit into memory at once, you can call the `stream` method and process the data in streaming mode. Each response chunk will be transformed into a relatively small array of rows instead (the size of this array depends on the size of the particular chunk the client receives from the server, as it may vary, and on the size of an individual row), one chunk at a time.
Please refer to the list of the [supported data formats](./js.md#supported-data-formats) to determine what the best format is for streaming in your case. For example, if you want to stream JSON objects, you could choose [JSONEachRow](/interfaces/formats/JSONEachRow), and each row will be parsed as a JS object, or, perhaps, a more compact [JSONCompactColumns](/interfaces/formats/JSONCompactColumns) format that will result in each row being a compact array of values. See also: [streaming files](./js.md#streaming-files-nodejs-only).
:::important
-If the `ResultSet` or its stream is not fully consumed, it will be destroyed after the `request_timeout` period of inactivity.
+If the `ResultSet` or its stream isn't fully consumed, it will be destroyed after the `request_timeout` period of inactivity.
:::
```ts
@@ -437,9 +437,9 @@ interface ClickHouseClient {
}
```
-The return type is minimal, as we do not expect any data to be returned from the server and drain the response stream immediately.
+The return type is minimal, as we don't expect any data to be returned from the server and drain the response stream immediately.
-If an empty array was provided to the insert method, the insert statement will not be sent to the server; instead, the method will immediately resolve with `{ query_id: '...', executed: false }`. If the `query_id` was not provided in the method params in this case, it will be an empty string in the result, as returning a random UUID generated by the client could be confusing, as the query with such `query_id` won't exist in the `system.query_log` table.
+If an empty array was provided to the insert method, the insert statement won't be sent to the server; instead, the method will immediately resolve with `{ query_id: '...', executed: false }`. If the `query_id` wasn't provided in the method params in this case, it will be an empty string in the result, as returning a random UUID generated by the client could be confusing, as the query with such `query_id` won't exist in the `system.query_log` table.
If the insert statement was sent to the server, the `executed` flag will be `true`.
@@ -475,7 +475,7 @@ interface InsertParams extends BaseQueryParams {
See also: [Base parameters for all client methods](./js.md#base-parameters-for-all-client-methods).
:::important
-A request canceled with `abort_signal` does not guarantee that data insertion did not take place, as the server could have received some of the streamed data before the cancellation.
+A request canceled with `abort_signal` doesn't guarantee that data insertion didn't take place, as the server could have received some of the streamed data before the cancellation.
:::
**Example:** (Node.js/Web) Insert an array of values.
@@ -558,7 +558,7 @@ await client.insert({
#### Web version limitations {#web-version-limitations}
Currently, inserts in `@clickhouse/client-web` only work with `Array` and `JSON*` formats.
-Inserting streams is not supported in the web version yet due to poor browser compatibility.
+Inserting streams isn't supported in the web version yet due to poor browser compatibility.
Consequently, the `InsertParams` interface for the web version looks slightly different from the Node.js version,
as `values` are limited to the `ReadonlyArray` type only:
@@ -584,7 +584,7 @@ This is a subject to change in the future. See also: [Base parameters for all cl
### Command method {#command-method}
-It can be used for statements that do not have any output, when the format clause is not applicable, or when you are not interested in the response at all. An example of such a statement can be `CREATE TABLE` or `ALTER TABLE`.
+It can be used for statements that don't have any output, when the format clause isn't applicable, or when you're not interested in the response at all. An example of such a statement can be `CREATE TABLE` or `ALTER TABLE`.
Should be awaited.
@@ -649,13 +649,13 @@ await client.command({
```
:::important
-A request cancelled with `abort_signal` does not guarantee that the statement wasn't executed by the server.
+A request cancelled with `abort_signal` doesn't guarantee that the statement wasn't executed by the server.
:::
### Exec method {#exec-method}
-If you have a custom query that does not fit into `query`/`insert`,
-and you are interested in the result, you can use `exec` as an alternative to `command`.
+If you have a custom query that doesn't fit into `query`/`insert`,
+and you're interested in the result, you can use `exec` as an alternative to `command`.
`exec` returns a readable stream that MUST be consumed or destroyed on the application side.
@@ -729,7 +729,7 @@ interface ClickHouseClient {
Ping might be a useful tool to check if the server is available when the application starts, especially with ClickHouse Cloud, where an instance might be idling and will wake up after a ping: in that case, you might want to retry it a few times with a delay in between.
-Note that by default, Node.js version uses the `/ping` endpoint, while the Web version uses a simple `SELECT 1` query to achieve a similar result, as the `/ping` endpoint does not support CORS.
+Note that by default, Node.js version uses the `/ping` endpoint, while the Web version uses a simple `SELECT 1` query to achieve a similar result, as the `/ping` endpoint doesn't support CORS.
**Example:** (Node.js/Web) A simple ping to the ClickHouse server instance. NB: for the Web version, captured errors will be different.
[Source code](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/ping.ts).
@@ -782,7 +782,7 @@ There might be confusion between JSON as a general format and [ClickHouse JSON f
The client supports streaming JSON objects with formats such as [JSONEachRow](/interfaces/formats/JSONEachRow) (see the table overview for other streaming-friendly formats; see also the `select_streaming_` [examples in the client repository](https://github.com/ClickHouse/clickhouse-js/tree/main/examples/node)).
-It's only that formats like [ClickHouse JSON](/interfaces/formats/JSON) and a few others are represented as a single object in the response and cannot be streamed by the client.
+It's only that formats like [ClickHouse JSON](/interfaces/formats/JSON) and a few others are represented as a single object in the response and can't be streamed by the client.
:::
| Format | Input (array) | Input (object) | Input/Output (Stream) | Output (JSON) | Output (text) |
@@ -884,7 +884,7 @@ await client.insert({
})
```
-However, if you are using `DateTime` or `DateTime64` columns, you can use both strings and JS Date objects. JS Date objects can be passed to `insert` as-is with `date_time_input_format` set to `best_effort`. See this [example](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/insert_js_dates.ts) for more details.
+However, if you're using `DateTime` or `DateTime64` columns, you can use both strings and JS Date objects. JS Date objects can be passed to `insert` as-is with `date_time_input_format` set to `best_effort`. See this [example](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/insert_js_dates.ts) for more details.
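As an alternative to relying on `best_effort` parsing, a JS Date can also be formatted into a `DateTime64(3)`-compatible string on the client side before inserting. The formatter below is a hedged sketch (UTC only, millisecond precision) and not part of the client API:

```typescript
// Format a JS Date as a ClickHouse DateTime64(3)-compatible string in UTC,
// e.g. '2024-01-15 12:30:00.000'. Assumes the column/session timezone is UTC.
function toDateTime64String(d: Date): string {
  const pad = (n: number, w = 2) => String(n).padStart(w, '0')
  return (
    `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} ` +
    `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}.` +
    pad(d.getUTCMilliseconds(), 3)
  )
}
```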
### Decimal\* types caveats {#decimal-types-caveats}
@@ -1148,14 +1148,14 @@ The client enables Keep-Alive in the underlying HTTP agent by default, meaning t
`keep_alive.idle_socket_ttl` should be set a fair bit lower than the server/LB configuration. The main reason is that HTTP/1.1 allows the server to close sockets without notifying the client; if the server or the load balancer closes the connection _before_ the client does, the client could try to reuse the closed socket, resulting in a `socket hang up` error.
-If you are modifying `keep_alive.idle_socket_ttl`, keep in mind that it should be always in sync with your server/LB Keep-Alive configuration, and it should be **always lower** than that, ensuring that the server never closes the open connection first.
+If you're modifying `keep_alive.idle_socket_ttl`, keep in mind that it should always be in sync with your server/LB Keep-Alive configuration and **always lower** than that, ensuring that the server never closes an open connection first.
#### Adjusting `idle_socket_ttl` {#adjusting-idle_socket_ttl}
The client sets `keep_alive.idle_socket_ttl` to 2500 milliseconds, as it can be considered the safest default; on the server side `keep_alive_timeout` might be set to [as low as 3 seconds in ClickHouse versions prior to 23.11](https://github.com/ClickHouse/ClickHouse/commit/1685cdcb89fe110b45497c7ff27ce73cc03e82d1) without `config.xml` modifications.
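The "client TTL strictly below server timeout" rule can be expressed as a tiny helper. The 500 ms safety margin here is an illustrative assumption, not a documented client constant; note that with the pre-23.11 server minimum of 3 seconds it reproduces the client's 2500 ms default.

```typescript
// Derive a client-side idle socket TTL (ms) from the server's keep_alive_timeout
// (seconds), leaving a safety margin so the client always closes sockets first.
// The margin value is an assumption chosen for illustration.
function safeIdleSocketTtl(serverKeepAliveSeconds: number, marginMs = 500): number {
  return Math.max(0, serverKeepAliveSeconds * 1000 - marginMs)
}
```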
:::warning
-If you are happy with the performance and do not experience any issues, it is recommended to **not** increase the value of `keep_alive.idle_socket_ttl` setting, as it might lead to potential "Socket hang-up" errors; additionally, if your application sends a lot of queries and there is not a lot of downtime between them, the default value should be sufficient, as the sockets will not be idling for a long enough time, and the client will keep them in the pool.
+If you're happy with the performance and don't experience any issues, it is recommended **not** to increase the value of the `keep_alive.idle_socket_ttl` setting, as that might lead to "Socket hang-up" errors; additionally, if your application sends a lot of queries and there isn't a lot of downtime between them, the default value should be sufficient, as the sockets won't be idling for long enough and the client will keep them in the pool.
:::
You can find the correct Keep-Alive timeout value in the server response headers by running the following command:
@@ -1175,7 +1175,7 @@ In this case, `keep_alive_timeout` is 10 seconds, and you could try increasing `
#### Troubleshooting {#troubleshooting}
-If you are experiencing `socket hang up` errors even when using the latest version of the client, there are the following options to resolve this issue:
+If you're experiencing `socket hang up` errors even when using the latest version of the client, try the following options to resolve the issue:
* Enable logs with at least `WARN` log level. This will allow for checking if there is an unconsumed or a dangling stream in the application code: the transport layer will log it on the WARN level, as that could potentially lead to the socket being closed by the server. You can enable logging in the client configuration as follows:
@@ -1206,9 +1206,9 @@ If you are experiencing `socket hang up` errors even when using the latest versi
```
Keep in mind, however, that the total size of the received headers has a 16 KB limit in recent Node.js versions; after a certain number of progress headers is received (around 70-80 in our tests), an exception will be generated.
- It is also possible to use an entirely different approach, avoiding wait time on the wire completely; it could be done by leveraging HTTP interface "feature" that mutations are not cancelled when the connection is lost. See [this example (part 2)](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/long_running_queries_timeouts.ts) for more details.
+ It is also possible to use an entirely different approach, avoiding wait time on the wire completely; it can be done by leveraging the HTTP interface "feature" that mutations aren't cancelled when the connection is lost. See [this example (part 2)](https://github.com/ClickHouse/clickhouse-js/blob/main/examples/long_running_queries_timeouts.ts) for more details.
-* Keep-Alive feature can be disabled entirely. In this case, client will also add `Connection: close` header to every request, and the underlying HTTP agent will not reuse the connections. `keep_alive.idle_socket_ttl` setting will be ignored, as there will be no idling sockets. This will result in additional overhead, as a new connection will be established for every request.
+* The Keep-Alive feature can be disabled entirely. In this case, the client will also add a `Connection: close` header to every request, and the underlying HTTP agent won't reuse the connections. The `keep_alive.idle_socket_ttl` setting will be ignored, as there will be no idling sockets. This will result in additional overhead, as a new connection will be established for every request.
```ts
const client = createClient({
@@ -1220,7 +1220,7 @@ If you are experiencing `socket hang up` errors even when using the latest versi
### Read-only users {#read-only-users}
-When using the client with a [readonly=1 user](/operations/settings/permissions-for-queries#readonly), the response compression cannot be enabled, as it requires `enable_http_compression` setting. The following configuration will result in an error:
+When using the client with a [readonly=1 user](/operations/settings/permissions-for-queries#readonly), the response compression can't be enabled, as it requires the `enable_http_compression` setting. The following configuration will result in an error:
```ts
const client = createClient({
@@ -1258,7 +1258,7 @@ const client = createClient({
### Custom HTTP/HTTPS agent (experimental, Node.js only) {#custom-httphttps-agent-experimental-nodejs-only}
:::warning
-This is an experimental feature that may change in backwards-incompatible ways in the future releases. The default implementation and settings the client provides should be sufficient for most use cases. Use this feature only if you are sure that you need it.
+This is an experimental feature that may change in backwards-incompatible ways in future releases. The default implementation and settings the client provides should be sufficient for most use cases. Use this feature only if you're sure that you need it.
:::
By default, the client will configure the underlying HTTP(s) agent using the settings provided in the client configuration (such as `max_open_connections`, `keep_alive.enabled`, `tls`), which will handle the connections to the ClickHouse server. Additionally, if TLS certificates are used, the underlying agent will be configured with the necessary certificates, and the correct TLS auth headers will be enforced.
@@ -1266,7 +1266,7 @@ By default, the client will configure the underlying HTTP(s) agent using the set
After 1.2.0, it is possible to provide a custom HTTP(s) agent to the client, replacing the default underlying one. This can be useful with tricky network configurations. The following conditions apply if a custom agent is provided:
- The `max_open_connections` and `tls` options will have _no effect_ and will be ignored by the client, as they are part of the underlying agent configuration.
- `keep_alive.enabled` will only regulate the default value of the `Connection` header (`true` -> `Connection: keep-alive`, `false` -> `Connection: close`).
-- While the idle keep-alive socket management will still work (as it is not tied to the agent but to a particular socket itself), it is now possible to disable it entirely by setting the `keep_alive.idle_socket_ttl` value to `0`.
+- While the idle keep-alive socket management will still work (as it isn't tied to the agent but to a particular socket itself), it is now possible to disable it entirely by setting the `keep_alive.idle_socket_ttl` value to `0`.
#### Custom agent usage examples {#custom-agent-usage-examples}
@@ -1350,7 +1350,7 @@ With certificates _and_ a custom _HTTPS_ Agent, it is likely necessary to disabl
## Tips for performance optimizations {#tips-for-performance-optimizations}
- To reduce application memory consumption, consider using streams for large inserts (e.g. from files) and selects when applicable. For event listeners and similar use cases, [async inserts](/optimize/asynchronous-inserts) could be another good option, allowing you to minimize, or even completely avoid, batching on the client side. Async insert examples are available in the [client repository](https://github.com/ClickHouse/clickhouse-js/tree/main/examples), with `async_insert_` as the file name prefix.
-- The client does not enable request or response compression by default. However, when selecting or inserting large datasets, you could consider enabling it via `ClickHouseClientConfigOptions.compression` (either for just `request` or `response`, or both).
+- The client doesn't enable request or response compression by default. However, when selecting or inserting large datasets, you could consider enabling it via `ClickHouseClientConfigOptions.compression` (either for just `request` or `response`, or both).
- Compression has a significant performance penalty. Enabling it for `request` or `response` will negatively impact the speed of inserts or selects, respectively, but will reduce the amount of network traffic transferred by the application.
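As a sketch, enabling compression in the client configuration might look like the following (assuming `createClient` from `@clickhouse/client`; the URL is a hypothetical local server):

```typescript
import { createClient } from '@clickhouse/client'

// Illustrative configuration: compress responses (useful for large selects)
// while leaving request compression off, trading some CPU for less traffic.
const client = createClient({
  url: 'http://localhost:8123', // assumed local server
  compression: {
    response: true,
    request: false,
  },
})
```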
## Contact us {#contact-us}
diff --git a/docs/integrations/language-clients/python/additional-options.md b/docs/integrations/language-clients/python/additional-options.md
index 1b63cc5b444..a3e01c5be68 100644
--- a/docs/integrations/language-clients/python/additional-options.md
+++ b/docs/integrations/language-clients/python/additional-options.md
@@ -14,7 +14,7 @@ ClickHouse Connect provides a number of additional options for advanced use case
## Global settings {#global-settings}
-There are a small number of settings that control ClickHouse Connect behavior globally. They are accessed from the top level `common` package:
+There are a small number of settings that control ClickHouse Connect behavior globally. They're accessed from the top-level `common` package:
```python
from clickhouse_connect import common
@@ -25,7 +25,7 @@ common.get_setting('invalid_setting_action')
```
:::note
-These common settings `autogenerate_session_id`, `product_name`, and `readonly` should _always_ be modified before creating a client with the `clickhouse_connect.get_client` method. Changing these settings after client creation does not affect the behavior of existing clients.
+These common settings `autogenerate_session_id`, `product_name`, and `readonly` should _always_ be modified before creating a client with the `clickhouse_connect.get_client` method. Changing these settings after client creation doesn't affect the behavior of existing clients.
:::
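As a sketch, adjusting one of these settings before creating any clients might look like the following (the setting value and host are illustrative):

```python
import clickhouse_connect
from clickhouse_connect import common

# Global settings must be changed before the first get_client call;
# clients created earlier are unaffected by later changes.
common.set_setting('autogenerate_session_id', False)

client = clickhouse_connect.get_client(host='localhost')  # assumed local server
```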
The following global settings are currently defined:
@@ -69,14 +69,14 @@ To use a SOCKS proxy, you can send a `urllib3` `SOCKSProxyManager` as the `pool_
## "Old" JSON data type {#old-json-data-type}
-The experimental `Object` (or `Object('json')`) data type is deprecated and should be avoided in a production environment. ClickHouse Connect continues to provide limited support for the data type for backward compatibility. Note that this support does not include queries that are expected to return "top level" or "parent" JSON values as dictionaries or the equivalent, and such queries will result in an exception.
+The experimental `Object` (or `Object('json')`) data type is deprecated and should be avoided in a production environment. ClickHouse Connect continues to provide limited support for the data type for backward compatibility. Note that this support doesn't include queries that are expected to return "top level" or "parent" JSON values as dictionaries or the equivalent, and such queries will result in an exception.
## "New" Variant/Dynamic/JSON datatypes (experimental feature) {#new-variantdynamicjson-datatypes-experimental-feature}
Beginning with the 0.8.0 release, `clickhouse-connect` provides experimental support for the new (also experimental) ClickHouse types Variant, Dynamic, and JSON.
### Usage notes {#usage-notes}
-- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other forms of JSON data are not supported.
+- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other forms of JSON data aren't supported.
- Queries using subcolumns/paths for these types will return the type of the subcolumn.
- See the main ClickHouse [documentation](https://clickhouse.com/docs) for other usage notes.
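The two accepted input shapes for a JSON column (a Python dict, or a JSON string encoding an object) can be illustrated with the standard library alone; the table name in the commented call is hypothetical:

```python
import json

# Two equivalent row values for a JSON column: a Python dict, or a JSON
# string containing a JSON object `{}`. Top-level arrays/scalars are not accepted.
as_dict = {'user': {'id': 42, 'tags': ['a', 'b']}}
as_string = json.dumps(as_dict)

# Both rows would carry the same JSON object, e.g. (hypothetical table `t`):
# client.insert('t', [[as_dict], [as_string]], column_names=['j'])
```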
@@ -86,4 +86,4 @@ Beginning with the 0.8.0 release, `clickhouse-connect` provides experimental sup
- Due to internal format changes, `clickhouse-connect` is only compatible with Variant types beginning with the ClickHouse 24.7 release
- Returned JSON objects will only contain the `max_dynamic_paths` number of elements (which defaults to 1024). This will be fixed in a future release.
- Inserts into `Dynamic` columns will always insert the String representation of the Python value. This will be fixed in a future release, once https://github.com/ClickHouse/ClickHouse/issues/70395 has been fixed.
-- The implementation for the new types has not been optimized in C code, so performance may be somewhat slower than for simpler, established data types.
+- The implementation for the new types hasn't been optimized in C code, so performance may be somewhat slower than for simpler, established data types.
diff --git a/docs/integrations/language-clients/python/advanced-inserting.md b/docs/integrations/language-clients/python/advanced-inserting.md
index e5fb625c8b2..fccb4d768a6 100644
--- a/docs/integrations/language-clients/python/advanced-inserting.md
+++ b/docs/integrations/language-clients/python/advanced-inserting.md
@@ -29,7 +29,7 @@ assert qr.row_count == 4
assert qr[0][0] == 4
```
-`InsertContext`s include mutable state that is updated during the insert process, so they are not thread safe.
+`InsertContext`s include mutable state that is updated during the insert process, so they're not thread safe.
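Because of this, in multi-threaded code each thread should build its own context rather than share one; a hedged sketch of that pattern (table and data are hypothetical, and a live server would be required to run it):

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost')  # assumed local server

def insert_batch(rows):
    # Build a fresh InsertContext per call/thread instead of sharing one,
    # since contexts carry mutable state updated during the insert.
    ic = client.create_insert_context(table='example_table',
                                      column_names=['id', 'value'],
                                      data=rows)
    client.insert(context=ic)
```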
### Write formats {#write-formats}
Write formats are currently implemented for a limited number of types. In most cases, ClickHouse Connect will attempt to automatically determine the correct write format for a column by checking the type of the first (non-null) data value. For example, if inserting into a `DateTime` column and the first insert value of the column is a Python integer, ClickHouse Connect will directly insert the integer value under the assumption that it's actually an epoch second.
@@ -70,12 +70,12 @@ In most cases, it is unnecessary to override the write format for a data type, b
ClickHouse Connect provides specialized insert methods for common data formats:
-- `insert_df` -- Insert a Pandas DataFrame. Instead of a Python Sequence of Sequences `data` argument, the second parameter of this method requires a `df` argument that must be a Pandas DataFrame instance. ClickHouse Connect automatically processes the DataFrame as a column oriented datasource, so the `column_oriented` parameter is not required or available.
+- `insert_df` -- Insert a Pandas DataFrame. Instead of a Python Sequence of Sequences `data` argument, the second parameter of this method requires a `df` argument that must be a Pandas DataFrame instance. ClickHouse Connect automatically processes the DataFrame as a column-oriented data source, so the `column_oriented` parameter isn't required or available.
- `insert_arrow` -- Insert a PyArrow Table. ClickHouse Connect passes the Arrow table unmodified to the ClickHouse server for processing, so only the `database` and `settings` arguments are available in addition to `table` and `arrow_table`.
-- `insert_df_arrow` -- Insert an arrow-backed Pandas DataFrame or a Polars DataFrame. ClickHouse Connect will automatically determine if the DataFrame is a Pandas or Polars type. If Pandas, validation will be performed to ensure that each column's dtype backend is Arrow-based and an error will be raised if any are not.
+- `insert_df_arrow` -- Insert an arrow-backed Pandas DataFrame or a Polars DataFrame. ClickHouse Connect will automatically determine if the DataFrame is a Pandas or Polars type. If Pandas, validation will be performed to ensure that each column's dtype backend is Arrow-based and an error will be raised if any aren't.
:::note
-A NumPy array is a valid Sequence of Sequences and can be used as the `data` argument to the main `insert` method, so a specialized method is not required.
+A NumPy array is a valid Sequence of Sequences and can be used as the `data` argument to the main `insert` method, so a specialized method isn't required.
:::
#### Pandas DataFrame insert {#pandas-dataframe-insert}
@@ -234,7 +234,7 @@ The `clickhouse_connect.driver.tools` package includes the `insert_file` method
| client | Client | *Required* | The `driver.Client` used to perform the insert |
| table | str | *Required* | The ClickHouse table to insert into. The full table name (including database) is permitted. |
| file_path | str | *Required* | The native file system path to the data file |
-| fmt | str | CSV, CSVWithNames | The ClickHouse Input Format of the file. CSVWithNames is assumed if `column_names` is not provided |
+| fmt | str | CSV, CSVWithNames | The ClickHouse Input Format of the file. CSVWithNames is assumed if `column_names` isn't provided |
| column_names | Sequence of str | *None* | A list of column names in the data file. Not required for formats that include column names |
| database | str | *None* | Database of the table. Ignored if the table is fully qualified. If not specified, the insert will use the client database |
| settings | dict | *None* | See [settings description](driver-api.md#settings-argument). |
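Put together, a call using these parameters might look like the following sketch (the host, table name, and file path are hypothetical, and a running server is assumed):

```python
import clickhouse_connect
from clickhouse_connect.driver.tools import insert_file

client = clickhouse_connect.get_client(host='localhost')  # assumed local server

# column_names is omitted, so CSVWithNames is assumed and the header row
# of the file supplies the column names.
insert_file(client, 'example_table', '/data/example.csv')
```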
diff --git a/docs/integrations/language-clients/python/advanced-querying.md b/docs/integrations/language-clients/python/advanced-querying.md
index de999c94e0f..9f3f347d5ee 100644
--- a/docs/integrations/language-clients/python/advanced-querying.md
+++ b/docs/integrations/language-clients/python/advanced-querying.md
@@ -29,7 +29,7 @@ result = test_client.query(context=qc)
assert result.result_set[1][0] == 'first_value2'
```
-Note that `QueryContext`s are not thread safe, but a copy can be obtained in a multi-threaded environment by calling the `QueryContext.updated_copy` method.
+Note that `QueryContext`s aren't thread safe, but a copy can be obtained in a multi-threaded environment by calling the `QueryContext.updated_copy` method.
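A hedged sketch of that pattern (the query and parameter values are hypothetical; a live server is assumed):

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost')  # assumed local server

qc = client.create_query_context(query='SELECT * FROM example_table WHERE id = {v:Int32}',
                                 parameters={'v': 1})

# In another thread, take a copy instead of mutating the shared context:
qc_copy = qc.updated_copy(parameters={'v': 2})
result = client.query(context=qc_copy)
```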
## Streaming queries {#streaming-queries}
@@ -46,14 +46,14 @@ The ClickHouse Connect Client provides multiple methods for retrieving data as a
Each of these methods returns a `ContextStream` object that must be opened via a `with` statement to start consuming the stream.
### Data blocks {#data-blocks}
-ClickHouse Connect processes all data from the primary `query` method as a stream of blocks received from the ClickHouse server. These blocks are transmitted in the custom "Native" format to and from ClickHouse. A "block" is simply a sequence of columns of binary data, where each column contains an equal number of data values of the specified data type. (As a columnar database, ClickHouse stores this data in a similar form.) The size of a block returned from a query is governed by two user settings that can be set at several levels (user profile, user, session, or query). They are:
+ClickHouse Connect processes all data from the primary `query` method as a stream of blocks received from the ClickHouse server. These blocks are transmitted in the custom "Native" format to and from ClickHouse. A "block" is simply a sequence of columns of binary data, where each column contains an equal number of data values of the specified data type. (As a columnar database, ClickHouse stores this data in a similar form.) The size of a block returned from a query is governed by two user settings that can be set at several levels (user profile, user, session, or query). They're:
- [max_block_size](/operations/settings/settings#max_block_size) -- Limit on the size of the block in rows. Default 65536.
- [preferred_block_size_bytes](/operations/settings/settings#preferred_block_size_bytes) -- Soft limit on the size of the block in bytes. Default 1,000,000.
Regardless of the `preferred_block_size_bytes` setting, each block will never contain more than `max_block_size` rows. Depending on the type of query, the actual blocks returned can be of any size. For example, queries to a distributed table covering many shards may contain smaller blocks retrieved directly from each shard.
-When using one of the Client `query_*_stream` methods, results are returned on a block by block basis. ClickHouse Connect only loads a single block at a time. This allows processing large amounts of data without the need to load all of a large result set into memory. Note the application should be prepared to process any number of blocks and the exact size of each block cannot be controlled.
+When using one of the Client `query_*_stream` methods, results are returned on a block-by-block basis. ClickHouse Connect only loads a single block at a time. This allows processing large amounts of data without loading an entire large result set into memory. Note that the application should be prepared to process any number of blocks, and the exact size of each block can't be controlled.
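As a back-of-the-envelope illustration of how `max_block_size` bounds the stream, a 1,000,000-row result with the default `max_block_size` of 65,536 arrives in at least 16 blocks:

```python
import math

# Minimum number of blocks needed to deliver `rows` results when each block
# holds at most `max_block_size` rows (actual blocks may well be smaller).
def min_block_count(rows: int, max_block_size: int = 65536) -> int:
    return math.ceil(rows / max_block_size)
```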
### HTTP data buffer for slow processing {#http-data-buffer-for-slow-processing}
@@ -306,10 +306,10 @@ with client.query_df_arrow_stream(
```
#### Notes and caveats {#notes-and-caveats}
-- Arrow type mapping: When returning data in Arrow format, ClickHouse maps types to the closest supported Arrow types. Some ClickHouse types do not have a native Arrow equivalent and are returned as raw bytes in Arrow fields (usually `BINARY` or `FIXED_SIZE_BINARY`).
+- Arrow type mapping: When returning data in Arrow format, ClickHouse maps types to the closest supported Arrow types. Some ClickHouse types don't have a native Arrow equivalent and are returned as raw bytes in Arrow fields (usually `BINARY` or `FIXED_SIZE_BINARY`).
- Examples: `IPv4` is represented as Arrow `UINT32`; `IPv6` and large integers (`Int128/UInt128/Int256/UInt256`) are often represented as `FIXED_SIZE_BINARY`/`BINARY` with raw bytes.
- In these cases, the DataFrame column will contain byte values backed by the Arrow field; it is up to the client code to interpret/convert those bytes according to ClickHouse semantics.
-- Unsupported Arrow data types (e.g., UUID/ENUM as true Arrow types) are not emitted; values are represented using the closest supported Arrow type (often as binary bytes) for output.
+- Unsupported Arrow data types (e.g., UUID/ENUM as true Arrow types) aren't emitted; values are represented using the closest supported Arrow type (often as binary bytes) for output.
- Pandas requirement: Arrow-backed dtypes require pandas 2.x. For older pandas versions, use `query_df` (non-Arrow) instead.
- Strings vs binary: The `use_strings` option (when supported by the server setting `output_format_arrow_string_as_string`) controls whether ClickHouse `String` columns are returned as Arrow strings or as binary.
@@ -353,7 +353,7 @@ The key takeaway: application code must handle these conversions based on the ca
## Read formats {#read-formats}
-Read formats control the data types of values returned from the client `query`, `query_np`, and `query_df` methods. (The `raw_query` and `query_arrow` do not modify incoming data from ClickHouse, so format control does not apply.) For example, if the read format for a UUID is changed from the default `native` format to the alternative `string` format, a ClickHouse query of `UUID` column will be returned as string values (using the standard 8-4-4-4-12 RFC 1422 format) instead of Python UUID objects.
+Read formats control the data types of values returned from the client `query`, `query_np`, and `query_df` methods. (The `raw_query` and `query_arrow` methods don't modify incoming data from ClickHouse, so format control doesn't apply.) For example, if the read format for a UUID is changed from the default `native` format to the alternative `string` format, a ClickHouse query of a `UUID` column will be returned as string values (using the standard 8-4-4-4-12 RFC 4122 format) instead of Python UUID objects.
The "data type" argument for any formatting function can include wildcards. The format is a single lower case string.
@@ -385,13 +385,13 @@ client.query('SELECT device_id, dev_address, gw_address from devices', column_fo
| ClickHouse Type | Native Python Type | Read Formats | Comments |
|-----------------------|-------------------------|-------------------|-------------------------------------------------------------------------------------------------------------------|
| Int[8-64], UInt[8-32] | int | - | |
-| UInt64 | int | signed | Superset does not currently handle large unsigned UInt64 values |
+| UInt64 | int | signed | Superset doesn't currently handle large unsigned UInt64 values |
| [U]Int[128,256] | int | string | Pandas and NumPy int values are 64 bits maximum, so these can be returned as strings |
| BFloat16 | float | - | All Python floats are 64 bits internally |
| Float32 | float | - | All Python floats are 64 bits internally |
| Float64 | float | - | |
| Decimal | decimal.Decimal | - | |
-| String | string | bytes | ClickHouse String columns have no inherent encoding, so they are also used for variable length binary data |
+| String | string | bytes | ClickHouse String columns have no inherent encoding, so they're also used for variable length binary data |
| FixedString | bytes | string | FixedStrings are fixed size byte arrays, but sometimes are treated as Python strings |
| Enum[8,16] | string | string, int | Python enums don't accept empty strings, so all enums are rendered as either strings or the underlying int value. |
| Date | datetime.date | int | ClickHouse stores Dates as days since 01/01/1970. This value is available as an int |
@@ -446,9 +446,9 @@ There are multiple mechanisms for applying a time zone to ClickHouse DateTime an
When using time zone aware data types in queries (in particular the Python `datetime.datetime` object), `clickhouse-connect` applies a client-side time zone using the following precedence rules:
1. If the query method parameter `column_tzs` is specified for the query, the specific column time zone is applied
-2. If the ClickHouse column has timezone metadata (i.e., it is a type like DateTime64(3, 'America/Denver')), the ClickHouse column timezone is applied. (Note this timezone metadata is not available to clickhouse-connect for DateTime columns prior to ClickHouse version 23.2)
+2. If the ClickHouse column has timezone metadata (i.e., it is a type like `DateTime64(3, 'America/Denver')`), the ClickHouse column timezone is applied. (Note this timezone metadata isn't available to clickhouse-connect for DateTime columns prior to ClickHouse version 23.2)
3. If the query method parameter `query_tz` is specified for the query, the "query timezone" is applied.
-4. If a timezone setting is applied to the query or session, that timezone is applied. (This functionality is not yet released in the ClickHouse server)
+4. If a timezone setting is applied to the query or session, that timezone is applied. (This functionality isn't yet released in the ClickHouse server)
5. Finally, if the client `apply_server_timezone` parameter has been set to True (the default), the ClickHouse server timezone is applied.
Note that if the applied timezone based on these rules is UTC, `clickhouse-connect` will _always_ return a time zone naive Python `datetime.datetime` object. Additional timezone information can then be added to this timezone naive object by the application code if desired.
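The precedence rules above amount to a "first match wins" lookup, which can be sketched as a plain function. This resolver is purely illustrative and not part of the `clickhouse-connect` API:

```python
from typing import Optional

def resolve_timezone(
    column_tz: Optional[str] = None,            # rule 1: per-column query parameter
    column_metadata_tz: Optional[str] = None,   # rule 2: tz baked into the column type
    query_tz: Optional[str] = None,             # rule 3: query_tz parameter
    session_tz: Optional[str] = None,           # rule 4: timezone setting on query/session
    server_tz: Optional[str] = None,            # rule 5: server tz (apply_server_timezone)
) -> Optional[str]:
    """Return the first applicable time zone, mirroring the documented precedence."""
    for tz in (column_tz, column_metadata_tz, query_tz, session_tz, server_tz):
        if tz is not None:
            return tz
    return None
```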
diff --git a/docs/integrations/language-clients/python/advanced-usage.md b/docs/integrations/language-clients/python/advanced-usage.md
index 5832d0b0100..918f1968932 100644
--- a/docs/integrations/language-clients/python/advanced-usage.md
+++ b/docs/integrations/language-clients/python/advanced-usage.md
@@ -12,7 +12,7 @@ doc_type: 'reference'
## Raw API {#raw-api}
-For use cases which do not require transformation between ClickHouse data and native or third party data types and structures, the ClickHouse Connect client provides methods for direct usage of the ClickHouse connection.
+For use cases that don't require transformation between ClickHouse data and native or third-party data types and structures, the ClickHouse Connect client provides methods for direct usage of the ClickHouse connection.
### Client `raw_query` method {#client-rawquery-method}
@@ -39,7 +39,7 @@ The `Client.raw_insert` method allows direct inserts of `bytes` objects or `byte
| Parameter | Type | Default | Description |
|--------------|----------------------------------------|------------|---------------------------------------------------------------------------------------------|
| table | str | *Required* | Either the simple or database qualified table name |
-| column_names | Sequence[str] | *None* | Column names for the insert block. Required if the `fmt` parameter does not include names |
+| column_names | Sequence[str] | *None* | Column names for the insert block. Required if the `fmt` parameter doesn't include names |
| insert_block | str, bytes, Generator[bytes], BinaryIO | *Required* | Data to insert. Strings will be encoded with the client encoding. |
| settings | dict | *None* | See [settings description](driver-api.md#settings-argument). |
| fmt | str | *None* | ClickHouse Input Format of the `insert_block` bytes. (ClickHouse uses TSV if not specified) |
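For the default TabSeparated format, an `insert_block` can be assembled by hand. The encoder below is a deliberately naive sketch (no ClickHouse TSV escaping of tabs, newlines, or backslashes inside values), and the commented `raw_insert` call uses a hypothetical table:

```python
def to_tsv_block(rows) -> bytes:
    # Naive TabSeparated encoder for simple scalar values; real data would
    # need ClickHouse TSV escaping for tabs, newlines, and backslashes.
    lines = ('\t'.join(str(v) for v in row) for row in rows)
    return ('\n'.join(lines) + '\n').encode('utf-8')

# Hypothetical usage:
# client.raw_insert('example_table', column_names=['id', 'name'],
#                   insert_block=to_tsv_block([(1, 'a'), (2, 'b')]))
```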
@@ -80,7 +80,7 @@ Similarly, you could save data in [TabSeparated](/interfaces/formats/TabSeparate
ClickHouse Connect works well in multithreaded, multiprocess, and event-loop-driven/asynchronous applications. All query and insert processing occurs within a single thread, so operations are generally thread-safe. (Parallel processing of some operations at a low level is a possible future enhancement to overcome the performance penalty of a single thread, but even in that case thread safety will be maintained.)
-Because each query or insert executed maintains state in its own `QueryContext` or `InsertContext` object, respectively, these helper objects are not thread-safe, and they should not be shared between multiple processing streams. See the additional discussion about context objects in the [QueryContexts](advanced-querying.md#querycontexts) and [InsertContexts](advanced-inserting.md#insertcontexts) sections.
+Because each query or insert executed maintains state in its own `QueryContext` or `InsertContext` object, respectively, these helper objects aren't thread-safe, and they shouldn't be shared between multiple processing streams. See the additional discussion about context objects in the [QueryContexts](advanced-querying.md#querycontexts) and [InsertContexts](advanced-inserting.md#insertcontexts) sections.
Additionally, in an application that has two or more queries and/or inserts "in flight" at the same time, there are two further considerations to keep in mind. The first is the ClickHouse "session" associated with the query/insert, and the second is the HTTP connection pool used by ClickHouse Connect Client instances.
@@ -106,7 +106,7 @@ async def main():
asyncio.run(main())
```
-`AsyncClient` has the same methods with the same parameters as the standard `Client`, but they are coroutines when applicable. Internally, these methods from the `Client` that perform I/O operations are wrapped in a [run_in_executor](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) call.
+`AsyncClient` has the same methods with the same parameters as the standard `Client`, but they're coroutines when applicable. Internally, these methods from the `Client` that perform I/O operations are wrapped in a [run_in_executor](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) call.
Multithreaded performance will increase when using the `AsyncClient` wrapper, as the execution threads and the GIL will be released while waiting for I/O operations to complete.
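The `run_in_executor` pattern used internally can be sketched with plain `asyncio`, substituting a stand-in function for the blocking `Client` I/O call (the function and its return value here are illustrative, not part of the driver API):

```python
import asyncio
import time

def blocking_query():
    # Stand-in for a blocking Client.query call that performs network I/O.
    time.sleep(0.01)
    return [(1,)]

async def main():
    loop = asyncio.get_running_loop()
    # Same pattern AsyncClient uses: offload the blocking call to an executor
    # thread so the event loop (and the GIL) is free while I/O is in flight.
    return await loop.run_in_executor(None, blocking_query)

result = asyncio.run(main())
```

While `blocking_query` sleeps in the executor thread, other coroutines on the event loop continue to run, which is the source of the multithreaded performance gain described above.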
@@ -120,9 +120,9 @@ Each ClickHouse query occurs within the context of a ClickHouse "session". Sessi
- To associate specific ClickHouse settings with multiple queries (see the [user settings](/operations/settings/settings.md)). The ClickHouse `SET` command is used to change the settings for the scope of a user session.
- To track [temporary tables.](/sql-reference/statements/create/table#temporary-tables)
-By default, each query executed with a ClickHouse Connect `Client` instance uses that client's session ID. `SET` statements and temporary tables work as expected when using a single client. However, the ClickHouse server does not allow concurrent queries within the same session (the client will raise a `ProgrammingError` if attempted). For applications that execute concurrent queries, use one of the following patterns:
+By default, each query executed with a ClickHouse Connect `Client` instance uses that client's session ID. `SET` statements and temporary tables work as expected when using a single client. However, the ClickHouse server doesn't allow concurrent queries within the same session (the client will raise a `ProgrammingError` if attempted). For applications that execute concurrent queries, use one of the following patterns:
1. Create a separate `Client` instance for each thread/process/event handler that needs session isolation. This preserves per-client session state (temporary tables and `SET` values).
-2. Use a unique `session_id` for each query via the `settings` argument when calling `query`, `command`, or `insert`, if you do not require shared session state.
+2. Use a unique `session_id` for each query via the `settings` argument when calling `query`, `command`, or `insert`, if you don't require shared session state.
3. Disable sessions on a shared client by setting `autogenerate_session_id=False` before creating the client (or pass it directly to `get_client`).
```python
@@ -135,7 +135,7 @@ client = clickhouse_connect.get_client(host='somehost.com', user='dbuser', passw
Alternatively, pass `autogenerate_session_id=False` directly to `get_client(...)`.
-In this case ClickHouse Connect does not send a `session_id`; the server does not treat separate requests as belonging to the same session. Temporary tables and session-level settings will not persist across requests.
+In this case ClickHouse Connect doesn't send a `session_id`; the server doesn't treat separate requests as belonging to the same session. Temporary tables and session-level settings won't persist across requests.
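Pattern 2 above (a unique `session_id` per query) can be sketched as follows. The query text is hypothetical, and the actual call needs a live server, so it is commented out:

```python
import uuid

# Give each concurrent query its own session so the server never sees two
# in-flight queries sharing one session_id.
settings = {'session_id': str(uuid.uuid4())}

# Against a live server (hypothetical table name):
# client.query('SELECT count() FROM events', settings=settings)
```

Note that with a per-query `session_id`, `SET` statements and temporary tables from one query are not visible to the next, exactly as with `autogenerate_session_id=False`.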
## Customizing the HTTP connection pool {#customizing-the-http-connection-pool}
diff --git a/docs/integrations/language-clients/python/driver-api.md b/docs/integrations/language-clients/python/driver-api.md
index 2d092535581..8e76cccc592 100644
--- a/docs/integrations/language-clients/python/driver-api.md
+++ b/docs/integrations/language-clients/python/driver-api.md
@@ -13,7 +13,7 @@ doc_type: 'reference'
:::note
Passing keyword arguments is recommended for most api methods given the number of possible arguments, most of which are optional.
-*Methods not documented here are not considered part of the API, and may be removed or changed.*
+*Methods not documented here aren't considered part of the API, and may be removed or changed.*
:::
## Client Initialization {#client-initialization}
@@ -33,8 +33,8 @@ The `clickhouse_connect.driver.client` class provides the primary interface betw
| secure | bool | False | Use HTTPS/TLS. This overrides inferred values from the interface or port arguments. |
| dsn | str | *None* | A string in standard DSN (Data Source Name) format. Other connection values (such as host or user) will be extracted from this string if not set otherwise. |
| compress | bool or str | True | Enable compression for ClickHouse HTTP inserts and query results. See [Additional Options (Compression)](additional-options.md#compression) |
-| query_limit | int | 0 (unlimited) | Maximum number of rows to return for any `query` response. Set this to zero to return unlimited rows. Note that large query limits may result in out of memory exceptions if results are not streamed, as all results are loaded into memory at once. |
-| query_retries | int | 2 | Maximum number of retries for a `query` request. Only "retryable" HTTP responses will be retried. `command` or `insert` requests are not automatically retried by the driver to prevent unintended duplicate requests. |
+| query_limit | int | 0 (unlimited) | Maximum number of rows to return for any `query` response. Set this to zero to return unlimited rows. Note that large query limits may result in out of memory exceptions if results aren't streamed, as all results are loaded into memory at once. |
+| query_retries | int | 2 | Maximum number of retries for a `query` request. Only "retryable" HTTP responses will be retried. `command` or `insert` requests aren't automatically retried by the driver to prevent unintended duplicate requests. |
| connect_timeout | int | 10 | HTTP connection timeout in seconds. |
| send_receive_timeout | int | 300 | Send/receive timeout for the HTTP connection in seconds. |
| client_name | str | *None* | client_name prepended to the HTTP User Agent header. Set this to track client queries in the ClickHouse system.query_log. |
@@ -53,15 +53,15 @@ The `clickhouse_connect.driver.client` class provides the primary interface betw
| Parameter | Type | Default | Description |
|------------------|------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| verify | bool | True | Validate the ClickHouse server TLS/SSL certificate (hostname, expiration, etc.) if using HTTPS/TLS. |
-| ca_cert | str | *None* | If *verify*=*True*, the file path to Certificate Authority root to validate ClickHouse server certificate, in .pem format. Ignored if verify is False. This is not necessary if the ClickHouse server certificate is a globally trusted root as verified by the operating system. |
+| ca_cert          | str  | *None*  | If *verify*=*True*, the file path to the Certificate Authority root used to validate the ClickHouse server certificate, in .pem format. Ignored if verify is False. This isn't necessary if the ClickHouse server certificate is a globally trusted root as verified by the operating system. |
| client_cert | str | *None* | File path to a TLS Client certificate in .pem format (for mutual TLS authentication). The file should contain a full certificate chain, including any intermediate certificates. |
-| client_cert_key | str | *None* | File path to the private key for the Client Certificate. Required if the private key is not included the Client Certificate key file. |
+| client_cert_key  | str  | *None*  | File path to the private key for the Client Certificate. Required if the private key isn't included in the Client Certificate file. |
| server_host_name | str | *None* | The ClickHouse server hostname as identified by the CN or SNI of its TLS certificate. Set this to avoid SSL errors when connecting through a proxy or tunnel with a different hostname |
-| tls_mode | str | *None* | Controls advanced TLS behavior. `proxy` and `strict` do not invoke ClickHouse mutual TLS connection, but do send client cert and key. `mutual` assumes ClickHouse mutual TLS auth with a client certificate. *None*/default behavior is `mutual` |
+| tls_mode | str | *None* | Controls advanced TLS behavior. `proxy` and `strict` don't invoke ClickHouse mutual TLS connection, but do send client cert and key. `mutual` assumes ClickHouse mutual TLS auth with a client certificate. *None*/default behavior is `mutual` |
### Settings argument {#settings-argument}
-Finally, the `settings` argument to `get_client` is used to pass additional ClickHouse settings to the server for each client request. Note that in most cases, users with *readonly*=*1* access cannot alter settings sent with a query, so ClickHouse Connect will drop such settings in the final request and log a warning. The following settings apply only to HTTP queries/sessions used by ClickHouse Connect, and are not documented as general ClickHouse settings.
+Finally, the `settings` argument to `get_client` is used to pass additional ClickHouse settings to the server for each client request. Note that in most cases, users with *readonly*=*1* access can't alter settings sent with a query, so ClickHouse Connect will drop such settings in the final request and log a warning. The following settings apply only to HTTP queries/sessions used by ClickHouse Connect, and aren't documented as general ClickHouse settings.
| Setting | Description |
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -276,14 +276,14 @@ WHERE date >= '2022-10-01 15:20:05'
```
:::warning
-Server-side binding is only supported (by the ClickHouse server) for `SELECT` queries. It does not work for `ALTER`, `DELETE`, `INSERT`, or other types of queries. This may change in the future; see https://github.com/ClickHouse/ClickHouse/issues/42092.
+Server-side binding is only supported (by the ClickHouse server) for `SELECT` queries. It doesn't work for `ALTER`, `DELETE`, `INSERT`, or other types of queries. This may change in the future; see https://github.com/ClickHouse/ClickHouse/issues/42092.
:::
#### Client-side binding {#client-side-binding}
ClickHouse Connect also supports client-side parameter binding, which can allow more flexibility in generating templated SQL queries. For client-side binding, the `parameters` argument should be a dictionary or a sequence. Client-side binding uses the Python ["printf" style](https://docs.python.org/3/library/stdtypes.html#old-string-formatting) string formatting for parameter substitution.
-Note that unlike server-side binding, client-side binding does not work for database identifiers such as database, table, or column names, since Python-style formatting cannot distinguish between the different types of strings, and they need to be formatted differently (backticks or double quotes for database identifiers, single quotes for data values).
+Note that unlike server-side binding, client-side binding doesn't work for database identifiers such as database, table, or column names, since Python-style formatting can't distinguish between the different types of strings, and they need to be formatted differently (backticks or double quotes for database identifiers, single quotes for data values).
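The substitution mechanics can be illustrated with a simplified sketch. The `format_value` helper below is hypothetical (the real driver applies type-aware formatting and escaping for dates, arrays, and so on); only the `%(name)s` placeholder style matches the driver's behavior:

```python
# Simplified sketch of client-side binding with a parameters dictionary.
def format_value(v):
    # Hypothetical stand-in for the driver's type-aware value formatting.
    if isinstance(v, str):
        return "'" + v.replace("'", "\\'") + "'"
    return str(v)

query = 'SELECT * FROM events WHERE event_type = %(type)s AND run_id > %(run)s'
parameters = {'type': 'click', 'run': 25}
bound = query % {k: format_value(v) for k, v in parameters.items()}
# bound: SELECT * FROM events WHERE event_type = 'click' AND run_id > 25
```

This also shows why identifiers cannot be bound this way: every substituted string is wrapped in single quotes as a data value, which is invalid syntax for a table or column name.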
- Example with Python Dictionary, DateTime value and string escaping
@@ -358,7 +358,7 @@ client.query("SELECT event_type, sum(timeout) FROM event_errors WHERE event_time
## Client `command` Method {#client-command-method}
-Use the `Client.command` method to send SQL queries to the ClickHouse server that do not normally return data or that return a single primitive or array value rather than a full dataset. This method takes the following parameters:
+Use the `Client.command` method to send SQL queries to the ClickHouse server that don't normally return data or that return a single primitive or array value rather than a full dataset. This method takes the following parameters:
| Parameter | Type | Default | Description |
|---------------|------------------|------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -593,9 +593,9 @@ The base `query` method returns a `QueryResult` object with the following public
- `summary` -- Any data returned by the `X-ClickHouse-Summary` HTTP response header
- `first_item` -- A convenience property for retrieving the first row of the response as a dictionary (keys are column names)
- `first_row` -- A convenience property to return the first row of the result
-- `column_block_stream` -- A generator of query results in column oriented format. This property should not be referenced directly (see below).
-- `row_block_stream` -- A generator of query results in row oriented format. This property should not be referenced directly (see below).
-- `rows_stream` -- A generator of query results that yields a single row per invocation. This property should not be referenced directly (see below).
+- `column_block_stream` -- A generator of query results in column oriented format. This property shouldn't be referenced directly (see below).
+- `row_block_stream` -- A generator of query results in row oriented format. This property shouldn't be referenced directly (see below).
+- `rows_stream` -- A generator of query results that yields a single row per invocation. This property shouldn't be referenced directly (see below).
- `summary` -- As described under the `command` method, a dictionary of summary information returned by ClickHouse
The `*_stream` properties return a Python Context that can be used as an iterator for the returned data. They should only be accessed indirectly using the Client `*_stream` methods.
@@ -632,7 +632,7 @@ This method returns a "query summary" dictionary as described under the "command
For specialized insert methods that work with Pandas DataFrames, PyArrow Tables, and Arrow-backed DataFrames, see [Advanced Inserting (Specialized Insert Methods)](advanced-inserting.md#specialized-insert-methods).
:::note
-A NumPy array is a valid Sequence of Sequences and can be used as the `data` argument to the main `insert` method, so a specialized method is not required.
+A NumPy array is a valid Sequence of Sequences and can be used as the `data` argument to the main `insert` method, so a specialized method isn't required.
:::
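The note above can be sketched as follows: a 2-D NumPy array indexes row by row like a sequence of sequences, so it can be passed straight to `insert` (the table and column names here are hypothetical, and the call itself requires a live server):

```python
import numpy as np

# A 2-D object array behaves as a Sequence of Sequences: data[i] yields a row.
data = np.array([[1, 'a'], [2, 'b']], dtype=object)

# Against a live server (hypothetical names):
# client.insert('my_table', data, column_names=['id', 'name'])
```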
### Examples {#examples}
diff --git a/docs/integrations/language-clients/python/index.md b/docs/integrations/language-clients/python/index.md
index c8d3f550d45..218c305d029 100644
--- a/docs/integrations/language-clients/python/index.md
+++ b/docs/integrations/language-clients/python/index.md
@@ -23,7 +23,7 @@ ClickHouse Connect is a core database driver providing interoperability with a w
- The main interface is the `Client` object in the package `clickhouse_connect.driver`. That core package also includes assorted helper classes and utility functions used for communicating with the ClickHouse server and "context" implementations for advanced management of insert and select queries.
- The `clickhouse_connect.datatypes` package provides a base implementation and subclasses for all non-experimental ClickHouse datatypes. Its primary functionality is serialization and deserialization of ClickHouse data into the ClickHouse "Native" binary columnar format, used to achieve the most efficient transport between ClickHouse and client applications.
- The Cython/C classes in the `clickhouse_connect.cdriver` package optimize some of the most common serializations and deserializations for significantly improved performance over pure Python.
-- There is a [SQLAlchemy](https://www.sqlalchemy.org/) dialect in the package `clickhouse_connect.cc_sqlalchemy` which is built off of the `datatypes` and `dbi` packages. This implementation supports SQLAlchemy Core functionality including `SELECT` queries with `JOIN`s (`INNER`, `LEFT OUTER`, `FULL OUTER`, `CROSS`), `WHERE` clauses, `ORDER BY`, `LIMIT`/`OFFSET`, `DISTINCT` operations, lightweight `DELETE` statements with `WHERE` conditions, table reflection, and basic DDL operations (`CREATE TABLE`, `CREATE`/`DROP DATABASE`). While it does not support advanced ORM features or advanced DDL features, it provides robust query capabilities suitable for most analytical workloads against ClickHouse's OLAP-oriented database.
+- There is a [SQLAlchemy](https://www.sqlalchemy.org/) dialect in the package `clickhouse_connect.cc_sqlalchemy` which is built off of the `datatypes` and `dbi` packages. This implementation supports SQLAlchemy Core functionality including `SELECT` queries with `JOIN`s (`INNER`, `LEFT OUTER`, `FULL OUTER`, `CROSS`), `WHERE` clauses, `ORDER BY`, `LIMIT`/`OFFSET`, `DISTINCT` operations, lightweight `DELETE` statements with `WHERE` conditions, table reflection, and basic DDL operations (`CREATE TABLE`, `CREATE`/`DROP DATABASE`). While it doesn't support advanced ORM features or advanced DDL features, it provides robust query capabilities suitable for most analytical workloads against ClickHouse's OLAP-oriented database.
- The core driver and [ClickHouse Connect SQLAlchemy](sqlalchemy.md) implementation are the preferred method for connecting ClickHouse to Apache Superset. Use the `ClickHouse Connect` database connection, or `clickhousedb` SQLAlchemy dialect connection string.
This documentation is current as of the clickhouse-connect release 0.9.2.
@@ -46,7 +46,7 @@ The official ClickHouse Connect Python driver uses the HTTP protocol for communi
¹ClickHouse Connect has been explicitly tested against the listed platforms. In addition, untested binary wheels (with C optimization) are built for all architectures supported by the excellent [`cibuildwheel`](https://cibuildwheel.readthedocs.io/en/stable/) project. Finally, because ClickHouse Connect can also run as pure Python, the source installation should work on any recent Python installation.
-²SQLAlchemy support is limited to Core functionality (queries, basic DDL). ORM features are not supported. See [SQLAlchemy Integration Support](sqlalchemy.md) docs for details.
+²SQLAlchemy support is limited to Core functionality (queries, basic DDL). ORM features aren't supported. See [SQLAlchemy Integration Support](sqlalchemy.md) docs for details.
³ClickHouse Connect generally works well with versions outside the officially supported range.
diff --git a/docs/integrations/language-clients/python/sqlalchemy.md b/docs/integrations/language-clients/python/sqlalchemy.md
index 3252d336488..dc341be0a17 100644
--- a/docs/integrations/language-clients/python/sqlalchemy.md
+++ b/docs/integrations/language-clients/python/sqlalchemy.md
@@ -131,8 +131,8 @@ with Session(engine) as session:
## Scope and limitations {#scope-and-limitations}
- Core focus: Enable SQLAlchemy Core features like `SELECT` with `JOIN`s (`INNER`, `LEFT OUTER`, `FULL OUTER`, `CROSS`), `WHERE`, `ORDER BY`, `LIMIT`/`OFFSET`, and `DISTINCT`.
- `DELETE` with `WHERE` only: The dialect supports lightweight `DELETE` but requires an explicit `WHERE` clause to avoid accidental full-table deletes. To clear a table, use `TRUNCATE TABLE`.
-- No `UPDATE`: ClickHouse is append-optimized. The dialect does not implement `UPDATE`. If you need to change data, apply transformations upstream and re-insert, or use explicit text SQL (for example, `ALTER TABLE ... UPDATE`) at your own risk.
-- DDL and reflection: Creating databases and tables is supported, and reflection returns column types and table engine metadata. Traditional PK/FK/index metadata is not present because ClickHouse does not enforce those constraints.
-- ORM scope: Declarative models and inserts via `Session.add(...)`/`bulk_save_objects(...)` work for convenience. Advanced ORM features (relationship management, unit-of-work updates, cascading, eager/lazy loading semantics) are not supported.
-- Primary key semantics: `Column(..., primary_key=True)` is used by SQLAlchemy for object identity only. It does not create a server-side constraint in ClickHouse. Define `ORDER BY` (and optional `PRIMARY KEY`) via table engines (for example, `MergeTree(order_by=...)`).
-- Transactions and server features: Two-phase transactions, sequences, `RETURNING`, and advanced isolation levels are not supported. `engine.begin()` provides a Python context manager for grouping statements but performs no actual transaction control (commit/rollback are no-ops).
+- No `UPDATE`: ClickHouse is append-optimized. The dialect doesn't implement `UPDATE`. If you need to change data, apply transformations upstream and re-insert, or use explicit text SQL (for example, `ALTER TABLE ... UPDATE`) at your own risk.
+- DDL and reflection: Creating databases and tables is supported, and reflection returns column types and table engine metadata. Traditional PK/FK/index metadata isn't present because ClickHouse doesn't enforce those constraints.
+- ORM scope: Declarative models and inserts via `Session.add(...)`/`bulk_save_objects(...)` work for convenience. Advanced ORM features (relationship management, unit-of-work updates, cascading, eager/lazy loading semantics) aren't supported.
+- Primary key semantics: `Column(..., primary_key=True)` is used by SQLAlchemy for object identity only. It doesn't create a server-side constraint in ClickHouse. Define `ORDER BY` (and optional `PRIMARY KEY`) via table engines (for example, `MergeTree(order_by=...)`).
+- Transactions and server features: Two-phase transactions, sequences, `RETURNING`, and advanced isolation levels aren't supported. `engine.begin()` provides a Python context manager for grouping statements but performs no actual transaction control (commit/rollback are no-ops).
diff --git a/docs/integrations/language-clients/rust.md b/docs/integrations/language-clients/rust.md
index 015ab80a4db..677140d6f45 100644
--- a/docs/integrations/language-clients/rust.md
+++ b/docs/integrations/language-clients/rust.md
@@ -44,7 +44,7 @@ See also: [crates.io page](https://crates.io/crates/clickhouse).
* `lz4` (enabled by default) — enables `Compression::Lz4` and `Compression::Lz4Hc(_)` variants. If enabled, `Compression::Lz4` is used by default for all queries except for `WATCH`.
* `native-tls` — supports urls with the `HTTPS` schema via `hyper-tls`, which links against OpenSSL.
-* `rustls-tls` — supports urls with the `HTTPS` schema via `hyper-rustls`, which does not link against OpenSSL.
+* `rustls-tls` — supports URLs with the `HTTPS` scheme via `hyper-rustls`, which doesn't link against OpenSSL.
* `inserter` — enables `client.inserter()`.
* `test-util` — adds mocks. See [the example](https://github.com/ClickHouse/clickhouse-rs/tree/main/examples/mock.rs). Use it only in `dev-dependencies`.
* `watch` — enables `client.watch` functionality. See the corresponding section for details.
@@ -61,7 +61,7 @@ If both are enabled, the `rustls-tls` feature will take precedence.
The client is compatible with the LTS or newer versions of ClickHouse, as well as ClickHouse Cloud.
ClickHouse server older than v22.6 handles RowBinary [incorrectly in some rare cases](https://github.com/ClickHouse/ClickHouse/issues/37420).
-You could use v0.11+ and enable `wa-37420` feature to solve this problem. Note: this feature should not be used with newer ClickHouse versions.
+You can use v0.11+ and enable the `wa-37420` feature to work around this problem. Note: this feature shouldn't be used with newer ClickHouse versions.
## Examples {#examples}
@@ -220,7 +220,7 @@ inserter.end().await?;
* All rows between `commit()` calls are inserted in the same `INSERT` statement.
:::warning
-Do not forget to flush if you want to terminate/finalize inserting:
+Don't forget to flush if you want to terminate/finalize inserting:
```rust
inserter.end().await?;
```
@@ -291,14 +291,14 @@ let client = Client::default()
```
:::danger
-With clustered deployments, due to lack of "sticky sessions", you need to be connected to a _particular cluster node_ in order to properly utilize this feature, cause, for example, a round-robin load-balancer will not guarantee that the consequent requests will be processed by the same ClickHouse node.
+With clustered deployments, due to the lack of "sticky sessions", you need to be connected to a _particular cluster node_ to use this feature properly, because, for example, a round-robin load-balancer won't guarantee that subsequent requests are processed by the same ClickHouse node.
:::
See also: [session_id example](https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/session_id.rs) in the client repo.
### Custom HTTP headers {#custom-http-headers}
-If you are using proxy authentication or need to pass custom headers, you can do it like this:
+If you're using proxy authentication or need to pass custom headers, you can do it like this:
```rust
let client = Client::default()
@@ -345,7 +345,7 @@ See also the additional examples:
:::
* `(U)Int(8|16|32|64|128)` maps to/from corresponding `(u|i)(8|16|32|64|128)` types or newtypes around them.
-* `(U)Int256` are not supported directly, but there is [a workaround for it](https://github.com/ClickHouse/clickhouse-rs/issues/48).
+* `(U)Int256` aren't supported directly, but there is [a workaround for it](https://github.com/ClickHouse/clickhouse-rs/issues/48).
* `Float(32|64)` maps to/from corresponding `f(32|64)` or newtypes around them.
* `Decimal(32|64|128)` maps to/from corresponding `i(32|64|128)` or newtypes around them. It's more convenient to use [`fixnum`](https://github.com/loyd/fixnum) or another implementation of signed fixed-point numbers.
* `Boolean` maps to/from `bool` or newtypes around it.
@@ -549,7 +549,7 @@ struct EventLog {
## Known limitations {#known-limitations}
* `Variant`, `Dynamic`, (new) `JSON` data types aren't supported yet.
-* Server-side parameter binding is not supported yet; see [this issue](https://github.com/ClickHouse/clickhouse-rs/issues/142) for tracking.
+* Server-side parameter binding isn't supported yet; see [this issue](https://github.com/ClickHouse/clickhouse-rs/issues/142) for tracking.
## Contact us {#contact-us}
diff --git a/docs/integrations/sql-clients/datagrip.md b/docs/integrations/sql-clients/datagrip.md
index f3ea58595e0..c26925de1d5 100644
--- a/docs/integrations/sql-clients/datagrip.md
+++ b/docs/integrations/sql-clients/datagrip.md
@@ -46,7 +46,7 @@ DataGrip is available at https://www.jetbrains.com/datagrip/
- Switch to the **Drivers** tab and load the ClickHouse driver
- DataGrip does not ship with drivers in order to minimize the download size. On the **Drivers** tab
+ DataGrip doesn't ship with drivers in order to minimize the download size. On the **Drivers** tab
Select **ClickHouse** from the **Complete Support** list, and expand the **+** sign. Choose the **Latest stable** driver from the **Provided Driver** option:
diff --git a/docs/integrations/sql-clients/dbeaver.md b/docs/integrations/sql-clients/dbeaver.md
index 78762348fbd..f2fa3559faa 100644
--- a/docs/integrations/sql-clients/dbeaver.md
+++ b/docs/integrations/sql-clients/dbeaver.md
@@ -56,7 +56,7 @@ DBeaver is available at https://dbeaver.io/download/
-- By default the **SSL > Use SSL** property will be unset, if you are connecting to ClickHouse Cloud or a server that requires SSL on the HTTP port, then set **SSL > Use SSL** on:
+- By default, the **SSL > Use SSL** property will be unset. If you're connecting to ClickHouse Cloud or a server that requires SSL on the HTTP port, set **SSL > Use SSL** on:
@@ -64,7 +64,7 @@ DBeaver is available at https://dbeaver.io/download/
-If DBeaver detects that you do not have the ClickHouse driver installed it will offer to download them for you:
+If DBeaver detects that you don't have the ClickHouse driver installed, it will offer to download it for you:
diff --git a/docs/integrations/sql-clients/dbvisualizer.md b/docs/integrations/sql-clients/dbvisualizer.md
index a53d9430323..cd12b884689 100644
--- a/docs/integrations/sql-clients/dbvisualizer.md
+++ b/docs/integrations/sql-clients/dbvisualizer.md
@@ -45,7 +45,7 @@ To connect a database with DbVisualizer, you must first create and setup a Datab
4. Leave the **Database Type** as **Auto Detect**.
-5. If the selected driver in **Driver Type** is marked with a green check mark then it is ready to use. If it is not marked with a green check mark, you may have to configure the driver in the **Driver Manager**.
+5. If the selected driver in **Driver Type** is marked with a green check mark, it is ready to use. If it isn't marked with a green check mark, you may have to configure the driver in the **Driver Manager**.
6. Enter information about the database server in the remaining fields.
diff --git a/docs/integrations/sql-clients/marimo.md b/docs/integrations/sql-clients/marimo.md
index 7eb62338d89..01016d43ebd 100644
--- a/docs/integrations/sql-clients/marimo.md
+++ b/docs/integrations/sql-clients/marimo.md
@@ -115,7 +115,7 @@ SELECT * FROM trips LIMIT 1000;
-Now, you are able to view the results in a dataframe. I would like to visualize the most expensive drop-offs from a given pickup location. marimo provides several UI components to help you. I will use a dropdown to select the location and altair for charting.
+Now you can view the results in a dataframe. I would like to visualize the most expensive drop-offs from a given pickup location. marimo provides several UI components to help you. I will use a dropdown to select the location and altair for charting.
diff --git a/docs/integrations/sql-clients/qstudio.md b/docs/integrations/sql-clients/qstudio.md
index de6ec1e0870..319a887b6bd 100644
--- a/docs/integrations/sql-clients/qstudio.md
+++ b/docs/integrations/sql-clients/qstudio.md
@@ -54,7 +54,7 @@ QStudio is available at https://www.timestored.com/qstudio/download/
Password: `XXXXXXXXXXX`
4. Click Add
-If QStudio detects that you do not have the ClickHouse JDBC driver installed, it will offer to download them for you:
+If QStudio detects that you don't have the ClickHouse JDBC driver installed, it will offer to download it for you:
## 4. Query ClickHouse {#4-query-clickhouse}
diff --git a/docs/integrations/sql-clients/sql-console.md b/docs/integrations/sql-clients/sql-console.md
index 70243bea994..9e6eb846b7c 100644
--- a/docs/integrations/sql-clients/sql-console.md
+++ b/docs/integrations/sql-clients/sql-console.md
@@ -118,7 +118,7 @@ The SQL console can convert your sorts and filters directly into queries with on
:::note
-Filters and sorts are not mandatory when using the 'Create Query' feature.
+Filters and sorts aren't mandatory when using the 'Create Query' feature.
:::
You can learn more about querying in the SQL console by reading the (link) query documentation.
@@ -310,7 +310,7 @@ After a query is executed, you can quickly search through the returned result se
-Note: Any field matching the inputted value will be returned. For example, the third record in the above screenshot does not match 'breakfast' in the `by` field, but the `text` field does:
+Note: Any field matching the entered value will be returned. For example, the third record in the above screenshot doesn't match 'breakfast' in the `by` field, but the `text` field does:
diff --git a/docs/integrations/tools/data-integration/pg_clickhouse/introduction.md b/docs/integrations/tools/data-integration/pg_clickhouse/introduction.md
index 69beb436364..8485e36be66 100644
--- a/docs/integrations/tools/data-integration/pg_clickhouse/introduction.md
+++ b/docs/integrations/tools/data-integration/pg_clickhouse/introduction.md
@@ -131,7 +131,7 @@ sudo make install
diff --git a/docs/kubernetes-operator/02_install/olm.mdx b/docs/kubernetes-operator/02_install/olm.mdx
index 023d01b8267..6b0b5f8af2c 100644
--- a/docs/kubernetes-operator/02_install/olm.mdx
+++ b/docs/kubernetes-operator/02_install/olm.mdx
@@ -18,7 +18,7 @@ This guide covers installing the ClickHouse Operator using Operator Lifecycle Ma
## Install OLM {#install-olm}
-If OLM is not already installed in your cluster, install it:
+If OLM isn't already installed in your cluster, install it:
```bash
# Check if OLM is installed
diff --git a/docs/kubernetes-operator/03_guides/01_introduction.mdx b/docs/kubernetes-operator/03_guides/01_introduction.mdx
index ddc521fc75f..b2173bc129c 100644
--- a/docs/kubernetes-operator/03_guides/01_introduction.mdx
+++ b/docs/kubernetes-operator/03_guides/01_introduction.mdx
@@ -73,16 +73,16 @@ The Keeper cluster must be referenced in the ClickHouseCluster spec using `keepe
### One-to-One Keeper relationship {#one-to-one-keeper-relationship}
-Each ClickHouseCluster must have its own dedicated KeeperCluster. You cannot share a single KeeperCluster between multiple ClickHouseClusters.
+Each ClickHouseCluster must have its own dedicated KeeperCluster. You can't share a single KeeperCluster between multiple ClickHouseClusters.
-**Why?** The operator automatically generates a unique authentication key for each ClickHouseCluster to access its Keeper. This key is stored in a Secret and cannot be shared.
+**Why?** The operator automatically generates a unique authentication key for each ClickHouseCluster to access its Keeper. This key is stored in a Secret and can't be shared.
**Consequences**:
-- Multiple ClickHouseClusters cannot reference the same KeeperCluster
+- Multiple ClickHouseClusters can't reference the same KeeperCluster
- Recreating a ClickHouseCluster requires recreating its KeeperCluster
:::note
-Persistent Volumes are not deleted automatically when ClickHouseCluster or KeeperCluster resources are deleted.
+Persistent Volumes aren't deleted automatically when ClickHouseCluster or KeeperCluster resources are deleted.
:::
When recreating a cluster:
@@ -132,7 +132,7 @@ CREATE DATABASE my_database ON CLUSTER 'default' ENGINE = Replicated;
Non-replicated database engines (Atomic, Lazy, SQLite, Ordinary) require manual schema management:
- Tables must be created individually on each replica
- Schema drift can occur between nodes
-- Operator cannot automatically sync new replicas
+- Operator can't automatically sync new replicas
### Disable schema replication {#disable-schema-replication}
diff --git a/docs/kubernetes-operator/03_guides/02_configuration.mdx b/docs/kubernetes-operator/03_guides/02_configuration.mdx
index 288899e5855..16c8f2f5825 100644
--- a/docs/kubernetes-operator/03_guides/02_configuration.mdx
+++ b/docs/kubernetes-operator/03_guides/02_configuration.mdx
@@ -251,7 +251,7 @@ spec:
```
:::note
-It is not recommended to use ConfigMap to store plain text passwords.
+Storing plain-text passwords in a ConfigMap isn't recommended.
:::
Create the secret:
diff --git a/docs/managing-data/core-concepts/academic_overview.mdx b/docs/managing-data/core-concepts/academic_overview.mdx
index 30426c72d1d..8a3eb4e3725 100644
--- a/docs/managing-data/core-concepts/academic_overview.mdx
+++ b/docs/managing-data/core-concepts/academic_overview.mdx
@@ -23,7 +23,7 @@ import image_12 from '@site/static/images/managing-data/core-concepts/_vldb2024_
import image_13 from '@site/static/images/managing-data/core-concepts/_vldb2024_10_Figure_13.png'
import Image from '@theme/IdealImage';
-{/* needed as docusaurus cannot resolve links to span ids, we need a custom span */}
+{/* needed as docusaurus can't resolve links to span ids, we need a custom span */}
export function Anchor(props) {
useBrokenLinks().collectAnchor(props.id);
return ;
@@ -97,7 +97,7 @@ Figure 3: Inserts and merges for ^^MergeTree^^*-engine tables.
Compared to LSM trees [\[58\]](#page-13-7) and their implementation in various databases [\[13,](#page-12-6) [26,](#page-12-7) [56\]](#page-13-8), ClickHouse treats all ^^parts^^ as equal instead of arranging them in a hierarchy. As a result, merges are no longer limited to ^^parts^^ in the same level. Since this also forgoes the implicit chronological ordering of ^^parts^^, alternative mechanisms for updates and deletes not based on tombstones are required (see Section [3.4)](#page-4-0). ClickHouse writes inserts directly to disk while other LSM-tree-based stores typically use write-ahead logging (see Section [3.7)](#page-5-1).
-A part corresponds to a directory on disk, containing one file for each column. As an optimization, the columns of a small part (smaller than 10 MB by default) are stored consecutively in a single file to increase the spatial locality for reads and writes. The rows of a part are further logically divided into groups of 8192 records, called granules. A ^^granule^^ represents the smallest indivisible data unit processed by the scan and index lookup operators in ClickHouse. Reads and writes of on-disk data are, however, not performed at the ^^granule^^ level but at the granularity of blocks, which combine multiple neighboring granules within a column. New blocks are formed based on a configurable byte size per ^^block^^ (by default 1 MB), i.e., the number of granules in a ^^block^^ is variable and depends on the column's data type and distribution. Blocks are furthermore compressed to reduce their size and I/O costs. By default, ClickHouse employs LZ4 [\[75\]](#page-13-9) as a general-purpose compression algorithm, but users can also specify specialized codecs like Gorilla [\[63\]](#page-13-10) or FPC [\[12\]](#page-12-8) for floating-point data. Compression algorithms can also be chained. For example, it is possible to first reduce logical redundancy in numeric values using delta coding [\[23\]](#page-12-9), then perform heavy-weight compression, and finally encrypt the data using an AES codec. Blocks are decompressed on-the-fly when they are loaded from disk into memory. To enable fast random access to individual granules despite compression, ClickHouse additionally stores for each column a mapping that associates every ^^granule^^ id with the offset of its containing compressed ^^block^^ in the column file and the offset of the ^^granule^^ in the uncompressed ^^block^^.
+A part corresponds to a directory on disk, containing one file for each column. As an optimization, the columns of a small part (smaller than 10 MB by default) are stored consecutively in a single file to increase the spatial locality for reads and writes. The rows of a part are further logically divided into groups of 8192 records, called granules. A ^^granule^^ represents the smallest indivisible data unit processed by the scan and index lookup operators in ClickHouse. Reads and writes of on-disk data are, however, not performed at the ^^granule^^ level but at the granularity of blocks, which combine multiple neighboring granules within a column. New blocks are formed based on a configurable byte size per ^^block^^ (by default 1 MB), i.e., the number of granules in a ^^block^^ is variable and depends on the column's data type and distribution. Blocks are furthermore compressed to reduce their size and I/O costs. By default, ClickHouse employs LZ4 [\[75\]](#page-13-9) as a general-purpose compression algorithm, but users can also specify specialized codecs like Gorilla [\[63\]](#page-13-10) or FPC [\[12\]](#page-12-8) for floating-point data. Compression algorithms can also be chained. For example, it is possible to first reduce logical redundancy in numeric values using delta coding [\[23\]](#page-12-9), then perform heavy-weight compression, and finally encrypt the data using an AES codec. Blocks are decompressed on-the-fly when they're loaded from disk into memory. To enable fast random access to individual granules despite compression, ClickHouse additionally stores for each column a mapping that associates every ^^granule^^ id with the offset of its containing compressed ^^block^^ in the column file and the offset of the ^^granule^^ in the uncompressed ^^block^^.
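The granule-to-offset mapping described above can be sketched as follows. This is illustrative only: the sizes, names, and in-memory layout are made up, and list indices stand in for file offsets; real granules hold 8192 rows and real blocks are compressed.

```python
GRANULE_ROWS = 4  # real default is 8192; tiny here for illustration

column = list(range(12))               # 12 rows -> 3 granules
blocks = [column[0:8], column[8:12]]   # pretend blocks span 2 and 1 granules

# Per-column mapping: granule id -> (containing block, offset of the
# granule inside the uncompressed block). In ClickHouse the first entry
# would be the byte offset of the compressed block in the column file.
granule_map = {0: (0, 0), 1: (0, 4), 2: (1, 0)}

def read_granule(gid):
    blk_idx, in_blk = granule_map[gid]
    block = blocks[blk_idx]            # decompression would happen here
    return block[in_blk:in_blk + GRANULE_ROWS]

print(read_granule(1))  # rows 4..7, without touching the second block
```

The point of the mapping is exactly this two-step lookup: find and decompress only the one block containing the granule, then slice the granule out of it.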
Columns can further be ^^dictionary^^-encoded [\[2,](#page-12-10) [77,](#page-13-11) [81\]](#page-13-12) or made nullable using two special wrapper data types: LowCardinality(T) replaces the original column values by integer ids and thus significantly reduces the storage overhead for data with few unique values. Nullable(T) adds an internal bitmap to column T, representing whether column values are NULL or not.
@@ -117,15 +117,15 @@ Figure 4: Evaluating filters with a ^^primary key^^ index.
Second, users can create **table projections**, i.e., alternative versions of a table that contain the same rows sorted by a different ^^primary key^^ [\[71\]](#page-13-13). Projections make it possible to speed up queries that filter on columns other than the main table's ^^primary key^^ at the cost of increased overhead for inserts, merges, and space consumption. By default, projections are populated lazily only from ^^parts^^ newly inserted into the main table but not from existing ^^parts^^ unless the user materializes the ^^projection^^ in full. The query optimizer chooses between reading from the main table or a ^^projection^^ based on estimated I/O costs. If no ^^projection^^ exists for a part, query execution falls back to the corresponding main table part.
-Third, **skipping indices** provide a lightweight alternative to projections. The idea of skipping indices is to store small amounts of metadata at the level of multiple consecutive granules which allows to avoid scanning irrelevant rows. Skipping indices can be created for arbitrary index expressions and using a configurable granularity, i.e. number of granules in a ^^skipping index^^ block. Available ^^skipping index^^ types include: 1. Min-max indices [\[51\]](#page-13-14), storing the minimum and maximum values of the index expression for each index ^^block^^. This index type works well for locally clustered data with small absolute ranges, e.g. loosely sorted data. 2. Set indices, storing a configurable number of unique index ^^block^^ values. These indexes are best used with data with a small local cardinality, i.e. "clumped together" values. 3. Bloom filter indices [\[9\]](#page-12-14) build for row, token, or n-gram values with a configurable false positive rate. These indices support text search [\[73\]](#page-13-15), but unlike min-max and set indices, they cannot be used for range or negative predicates.
+Third, **skipping indices** provide a lightweight alternative to projections. The idea of skipping indices is to store small amounts of metadata at the level of multiple consecutive granules, which makes it possible to avoid scanning irrelevant rows. Skipping indices can be created for arbitrary index expressions with a configurable granularity, i.e. the number of granules in a ^^skipping index^^ block. Available ^^skipping index^^ types include: 1. Min-max indices [\[51\]](#page-13-14), storing the minimum and maximum values of the index expression for each index ^^block^^. This index type works well for locally clustered data with small absolute ranges, e.g. loosely sorted data. 2. Set indices, storing a configurable number of unique index ^^block^^ values. These indices work best for data with small local cardinality, i.e. "clumped together" values. 3. Bloom filter indices [\[9\]](#page-12-14) built for row, token, or n-gram values with a configurable false positive rate. These indices support text search [\[73\]](#page-13-15), but unlike min-max and set indices, they can't be used for range or negative predicates.
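A minimal sketch of the min-max variant, under illustrative assumptions (tiny blocks, one integer column, a range predicate):

```python
BLOCK = 4  # granules per skipping-index block, shrunk for illustration
values = [10, 11, 12, 13, 95, 96, 97, 98, 14, 15, 16, 17]
blocks = [values[i:i + BLOCK] for i in range(0, len(values), BLOCK)]

# Build: store (min, max) of the index expression per index block.
minmax = [(min(b), max(b)) for b in blocks]

def scan_range(lo, hi):
    """Scan only blocks whose [min, max] overlaps [lo, hi]."""
    hits, blocks_read = [], 0
    for (mn, mx), block in zip(minmax, blocks):
        if mx < lo or mn > hi:
            continue  # block skipped without reading its rows
        blocks_read += 1
        hits.extend(v for v in block if lo <= v <= hi)
    return hits, blocks_read

hits, blocks_read = scan_range(90, 100)
print(hits, blocks_read)
```

On this loosely clustered data the predicate touches one of three blocks; on uniformly shuffled data every block's [min, max] would overlap the range and nothing could be skipped, which is why min-max indices suit locally clustered data.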
### 3.3 Merge-time Data Transformation \{#3-3-merge-time-data-transformation\}
-Business intelligence and observability use cases often need to handle data generated at constantly high rates or in bursts. Also, recently generated data is typically more relevant for meaningful real-time insights than historical data. Such use cases require databases to sustain high data ingestion rates while continuously reducing the volume of historical data through techniques like aggregation or data aging. ClickHouse allows a continuous incremental transformation of existing data using different merge strategies. Merge-time data transformation does not compromise the performance of INSERT statements, but it cannot guarantee that tables never contain unwanted (e.g. outdated or non-aggregated) values. If necessary, all merge-time transformations can be applied at query time by specifying the keyword FINAL in SELECT statements.
+Business intelligence and observability use cases often need to handle data generated at constantly high rates or in bursts. Also, recently generated data is typically more relevant for meaningful real-time insights than historical data. Such use cases require databases to sustain high data ingestion rates while continuously reducing the volume of historical data through techniques like aggregation or data aging. ClickHouse allows a continuous incremental transformation of existing data using different merge strategies. Merge-time data transformation doesn't compromise the performance of INSERT statements, but it can't guarantee that tables never contain unwanted (e.g. outdated or non-aggregated) values. If necessary, all merge-time transformations can be applied at query time by specifying the keyword FINAL in SELECT statements.
**Replacing merges** retain only the most recently inserted version of a tuple based on the creation timestamp of its containing part, older versions are deleted. Tuples are considered equivalent if they have the same ^^primary key^^ column values. For explicit control over which tuple is preserved, it is also possible to specify a special version column for comparison. Replacing merges are commonly used as a merge-time update mechanism (normally in use cases where updates are frequent), or as an alternative to insert-time data deduplication (Section [3.5)](#page-5-2).
-**Aggregating merges** collapse rows with equal ^^primary key^^ column values into an aggregated row. Non-^^primary key^^ columns must be of a partial aggregation state that holds the summary values. Two partial aggregation states, e.g. a sum and a count for avg(), are combined into a new partial aggregation state. Aggregating merges are typically used in materialized views instead of normal tables. Materialized views are populated based on a transformation query against a source table. Unlike other databases, ClickHouse does not refresh materialized views periodically with the entire content of the source table. Materialized views are rather updated incrementally with the result of the transformation query when a new part is inserted into the source table.
+**Aggregating merges** collapse rows with equal ^^primary key^^ column values into an aggregated row. Non-^^primary key^^ columns must be of a partial aggregation state that holds the summary values. Two partial aggregation states, e.g. a sum and a count for avg(), are combined into a new partial aggregation state. Aggregating merges are typically used in materialized views instead of normal tables. Materialized views are populated based on a transformation query against a source table. Unlike other databases, ClickHouse doesn't refresh materialized views periodically with the entire content of the source table. Materialized views are rather updated incrementally with the result of the transformation query when a new part is inserted into the source table.
[Figure 5](#page-4-1) shows a ^^materialized view^^ defined on a table with page impression statistics. For new ^^parts^^ inserted into the source table, the transformation query computes the maximum and average latencies, grouped by region, and inserts the result into a ^^materialized view^^. Aggregation functions avg() and max() with extension -State return partial aggregation states instead of actual results. An aggregating merge defined for the ^^materialized view^^ continuously combines partial aggregation states in different ^^parts^^. To obtain the final result, users consolidate the partial aggregation states in the ^^materialized view^^ using avg() and max() with the -Merge extension.
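The -State/-Merge mechanics for avg() can be sketched like this; the function names are illustrative, not ClickHouse's API:

```python
# A partial aggregation state for avg() is a (sum, count) pair.

def avg_state(rows):
    return (sum(rows), len(rows))       # -State: one state per part

def merge_states(a, b):
    return (a[0] + b[0], a[1] + b[1])   # combined during an aggregating merge

def avg_finalize(state):
    s, c = state
    return s / c                        # -Merge step at query time

part1 = avg_state([10.0, 20.0])   # state from one inserted part
part2 = avg_state([30.0])         # state from a later part
merged = merge_states(part1, part2)
print(avg_finalize(merged))
```

Note why the state must be (sum, count) rather than the average itself: two averages can't be combined without their row counts, whereas sums and counts merge associatively in any order across parts.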
@@ -176,7 +176,7 @@ Three optimizations to speed up synchronization exist: First, new nodes added to
### 3.7 ACID Compliance \{#3-7-acid-compliance\}
-To maximize the performance of concurrent read and write operations, ClickHouse avoids latching as much as possible. Queries are executed against a snapshot of all ^^parts^^ in all involved tables created at the beginning of the query. This ensures that new ^^parts^^ inserted by parallel INSERTs or merges (Section [3.1)](#page-2-2) do not participate in execution. To prevent ^^parts^^ from being modified or removed simultaneously (Section [3.4)](#page-4-0), the reference count of the processed ^^parts^^ is incremented for the duration of the query. Formally, this corresponds to snapshot isolation realized by an MVCC variant [\[6\]](#page-12-18) based on versioned ^^parts^^. As a result, statements are generally not ACID-compliant except for the rare case that concurrent writes at the time the snapshot is taken each affect only a single part.
+To maximize the performance of concurrent read and write operations, ClickHouse avoids latching as much as possible. Queries are executed against a snapshot of all ^^parts^^ in all involved tables created at the beginning of the query. This ensures that new ^^parts^^ inserted by parallel INSERTs or merges (Section [3.1)](#page-2-2) don't participate in execution. To prevent ^^parts^^ from being modified or removed simultaneously (Section [3.4)](#page-4-0), the reference count of the processed ^^parts^^ is incremented for the duration of the query. Formally, this corresponds to snapshot isolation realized by an MVCC variant [\[6\]](#page-12-18) based on versioned ^^parts^^. As a result, statements are generally not ACID-compliant except for the rare case that concurrent writes at the time the snapshot is taken each affect only a single part.
In practice, most of ClickHouse's write-heavy decision making use cases even tolerate a small risk of losing new data in case of a power outage. The database takes advantage of this by not forcing a commit (fsync) of newly inserted ^^parts^^ to disk by default, allowing the kernel to batch writes at the cost of forgoing ^^atomicity^^.
@@ -226,7 +226,7 @@ This section presents selected key performance optimizations applied to differen
**Query compilation**. ClickHouse employs [query compilation based on LLVM](https://clickhou.se/jit) to dynamically fuse adjacent plan operators [\[38,](#page-12-22) [53\]](#page-13-0). For example, the expression a * b + c + 1 can be combined into a single operator instead of three operators. Besides expressions, ClickHouse also employs compilation to evaluate multiple aggregation functions at once (i.e. for GROUP BY) and for sorting with more than one sort key. Query compilation decreases the number of virtual calls, keeps data in registers or CPU caches, and helps the branch predictor as less code needs to execute. Additionally, runtime compilation enables a rich set of optimizations, such as logical optimizations and peephole optimizations implemented in compilers, and gives access to the fastest locally available CPU instructions. The compilation is initiated only when the same regular, aggregation, or sorting expression is executed by different queries more than a configurable number of times. Compiled query operators are cached and can be reused by future queries.[7]
-**^^Primary key^^ index evaluation**. ClickHouse evaluates WHERE conditions using the ^^primary key^^ index if a subset of filter clauses in the condition's conjunctive normal form constitutes a prefix of the ^^primary key^^ columns. The ^^primary key^^ index is analyzed left-to-right on lexicographically sorted ranges of key values. Filter clauses corresponding to a ^^primary key^^ column are evaluated using ternary logic - they are all true, all false, or mixed true/false for the values in the range. In the latter case, the range is split into sub-ranges which are analyzed recursively. Additional optimizations exist for functions in filter conditions. First, functions have traits describing their monotonicity, e.g, toDayOfMonth(date) is piecewise monotonic within a month. Monotonicity traits allow to infer if a function produces sorted results on sorted input key value ranges. Second, some functions can compute the preimage of a given function result. This is used to replace comparisons of constants with function calls on the key columns by comparing the key column value with the preimage. For example, toYear(k) = 2024 can be replaced by k >= 2024-01-01 && k < 2025-01-01.
+**^^Primary key^^ index evaluation**. ClickHouse evaluates WHERE conditions using the ^^primary key^^ index if a subset of filter clauses in the condition's conjunctive normal form constitutes a prefix of the ^^primary key^^ columns. The ^^primary key^^ index is analyzed left-to-right on lexicographically sorted ranges of key values. Filter clauses corresponding to a ^^primary key^^ column are evaluated using ternary logic - they're all true, all false, or mixed true/false for the values in the range. In the latter case, the range is split into sub-ranges which are analyzed recursively. Additional optimizations exist for functions in filter conditions. First, functions have traits describing their monotonicity, e.g., toDayOfMonth(date) is piecewise monotonic within a month. Monotonicity traits make it possible to infer whether a function produces sorted results on sorted input key value ranges. Second, some functions can compute the preimage of a given function result. This is used to replace comparisons of constants with function calls on the key columns by comparing the key column value with the preimage. For example, toYear(k) = 2024 can be replaced by k >= 2024-01-01 && k < 2025-01-01.
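The toYear() preimage rewrite in the example above can be sketched as a simple equivalence check (a sketch of the idea, not ClickHouse's implementation):

```python
from datetime import date

def to_year(k):
    return k.year

def preimage_of_year(year):
    # All dates k with to_year(k) == year form one half-open range,
    # which a sorted primary key index can evaluate directly.
    return date(year, 1, 1), date(year + 1, 1, 1)

lo, hi = preimage_of_year(2024)

# The rewrite is only valid if both forms agree for every key value.
for k in (date(2023, 12, 31), date(2024, 1, 1), date(2024, 6, 15), date(2025, 1, 1)):
    assert (to_year(k) == 2024) == (lo <= k < hi)
```

The payoff is that the range form never needs toYear() evaluated per row: it compares raw key values against two constants on already-sorted ranges.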
**Data skipping**. ClickHouse tries to avoid data reads at query runtime using the data structures presented in Section [3.2.](#page-3-0) Additionally, filters on different columns are evaluated sequentially in order of descending estimated selectivity based on heuristics and (optional) column statistics. Only data chunks that contain at least one matching row are passed to the next predicate. This gradually decreases the amount of read data and the number of computations to be performed from predicate to predicate. The optimization is only applied when at least one highly selective predicate is present; otherwise, the latency of the query would deteriorate compared to an evaluation of all predicates in parallel.
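A toy sketch of this sequential, selectivity-ordered evaluation; the chunk contents and selectivity estimates are made up:

```python
# Chunks of a column as produced by the scan; predicates carry an
# estimated selectivity (fraction of rows they are expected to drop).
chunks = [[1, 2, 3], [100, 101, 102], [4, 103, 5]]

predicates = [
    (lambda v: v % 2 == 1, 0.5),
    (lambda v: v >= 100, 0.9),   # drops most rows -> should run first
]
# Evaluate the most selective predicate first.
predicates.sort(key=lambda p: p[1], reverse=True)

surviving = chunks
for pred, _ in predicates:
    surviving = [
        [v for v in c if pred(v)]
        for c in surviving
        if any(pred(v) for v in c)   # chunks with no match never reach
    ]                                # the next predicate
print(surviving)
```

After the first predicate only two of three chunks remain, so the second predicate inspects less data; with the order reversed, all three chunks would have been evaluated twice.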
@@ -251,7 +251,7 @@ Figure 9: Parallel hash join with three hash table partitions.
### 4.5 Workload Isolation \{#4-5-workload-isolation\}
-ClickHouse offers concurrency control, memory usage limits, and I/O scheduling, enabling users to isolate queries into workload classes. By setting limits on shared resources (CPU cores, DRAM, disk and network I/O) for specific workload classes, it ensures these queries do not affect other critical business queries.
+ClickHouse offers concurrency control, memory usage limits, and I/O scheduling, enabling users to isolate queries into workload classes. By setting limits on shared resources (CPU cores, DRAM, disk and network I/O) for specific workload classes, it ensures these queries don't affect other critical business queries.
Concurrency control prevents thread oversubscription in scenarios with a high number of concurrent queries. More specifically, the number of worker threads per query is adjusted dynamically based on a specified ratio to the number of available CPU cores.
@@ -263,7 +263,7 @@ Lastly, I/O scheduling allows users to restrict local and remote disk accesses f
Real-time decision-making applications often depend on efficient and low-latency access to data in multiple locations. Two approaches exist to make external data available in an OLAP database. With push-based data access, a third-party component bridges the database with external data stores. One example of this are specialized extract-transform-load (ETL) tools which push remote data to the destination system. In the pull-based model, the database itself connects to remote data sources and pulls data for querying into local tables or exports data to remote systems. While push-based approaches are more versatile and common, they entail a larger architectural footprint and scalability bottleneck. In contrast, remote connectivity directly in the database offers interesting capabilities, such as joins between local and remote data, while keeping the overall architecture simple and reducing the time to insight.
-The rest of the section explores pull-based data integration methods in ClickHouse, aimed to access data in remote locations. We note that the idea of remote connectivity in SQL databases is not new. For example, the SQL/MED standard [\[35\]](#page-12-25), introduced in 2001 and implemented by PostgreSQL since 2011 [\[65\]](#page-13-21), proposes foreign data wrappers as a unified interface for managing external data. Maximum interoperability with other data stores and storage formats is one of ClickHouse's design goals. As of March 2024, ClickHouse offers to the best of our knowledge the most built-in data integration options across all analytical databases.
+The rest of the section explores pull-based data integration methods in ClickHouse, aimed at accessing data in remote locations. We note that the idea of remote connectivity in SQL databases isn't new. For example, the SQL/MED standard [\[35\]](#page-12-25), introduced in 2001 and implemented by PostgreSQL since 2011 [\[65\]](#page-13-21), proposes foreign data wrappers as a unified interface for managing external data. Maximum interoperability with other data stores and storage formats is one of ClickHouse's design goals. As of March 2024, ClickHouse offers, to the best of our knowledge, the most built-in data integration options across all analytical databases.
External Connectivity. ClickHouse provides [50+](https://clickhou.se/query-integrations) integration table functions and engines for connectivity with external systems and storage locations, including ODBC, MySQL, PostgreSQL, SQLite, Kafka, Hive, MongoDB, Redis, S3/GCP/Azure object stores and various data lakes. We break them further down into categories shown by the following bonus figure (not part of the original vldb paper).
@@ -283,7 +283,7 @@ Third, **dictionaries** can be populated using arbitrary queries against almost
Data Formats. To interact with third-party systems, modern analytical databases must also be able to process data in any format. Besides its native format, ClickHouse supports [90+](https://clickhou.se/query-formats) formats, including CSV, JSON, Parquet, Avro, ORC, Arrow, and Protobuf. Each format can be an input format (which ClickHouse can read), an output format (which ClickHouse can export), or both. Some analytics-oriented formats like Parquet are also integrated with query processing, i.e., the optimizer can exploit embedded statistics, and filters are evaluated directly on compressed data.
-Compatibility interfaces. Besides its native binary wire protocol and HTTP, clients can interact with ClickHouse over MySQL or PostgreSQL wire-protocol-compatible interfaces. This compatibility feature is useful to enable access from proprietary applications (e.g. certain business intelligence tools), where vendors have not yet implemented native ClickHouse connectivity.
+Compatibility interfaces. Besides its native binary wire protocol and HTTP, clients can interact with ClickHouse over MySQL or PostgreSQL wire-protocol-compatible interfaces. This compatibility feature is useful to enable access from proprietary applications (e.g. certain business intelligence tools), where vendors haven't yet implemented native ClickHouse connectivity.
## 6 PERFORMANCE AS A FEATURE \{#6-performance-as-a-feature\}
@@ -309,7 +309,7 @@ While benchmarking has been criticized for being not realistic enough [\[10,](#p
Filter and aggregation queries on denormalized fact tables historically represent the primary use case of ClickHouse. We report runtimes of ClickBench, a typical workload of this kind that simulates ad-hoc and periodic reporting queries used in clickstream and traffic analysis. The benchmark consists of 43 queries against a table with 100 million anonymized page hits, sourced from one of the web's largest analytics platforms. An online dashboard [\[17\]](#page-12-28) shows measurements (cold/hot runtimes, data import time, on-disk size) for over 45 commercial and research databases as of June 2024. Results are submitted by independent contributors based on the publicly available data set and queries [\[16\]](#page-12-29). The queries test sequential and index scan access paths and routinely expose CPU-, IO-, or memory-bound relational operators.
-[Figure 10](#page-10-0) shows the total relative cold and hot runtimes for sequentially executing all ClickBench queries in databases frequently used for analytics. The measurements were taken on a single-node AWS EC2 c6a.4xlarge instance with 16 vCPUs, 32 GB RAM, and 5000 IOPS / 1000 MiB/s disk. Comparable systems were used for Redshift ([ra3.4xlarge](https://clickhou.se/redshift-sizes), 12 vCPUs, 96 GB RAM) and Snowfake ([warehouse size S](https://clickhou.se/snowflake-sizes): 2x8 vCPUs, 2x16 GB RAM). The physical database design is tuned only lightly, for example, we specify primary keys, but do not change the compression of individual columns, create projections, or skipping indexes. We also flush the Linux page cache prior to each cold query run, but do not adjust database or operating system knobs. For every query, the fastest runtime across databases is used as a baseline. Relative query runtimes for other databases are calculated as ( + 10)/(_ + 10). The total relative runtime for a database is the geometric mean of the per-query ratios. While the research database Umbra [\[54\]](#page-13-25) achieves the best overall hot runtime, ClickHouse outperforms all other production-grade databases for hot and cold runtimes.
+[Figure 10](#page-10-0) shows the total relative cold and hot runtimes for sequentially executing all ClickBench queries in databases frequently used for analytics. The measurements were taken on a single-node AWS EC2 c6a.4xlarge instance with 16 vCPUs, 32 GB RAM, and 5000 IOPS / 1000 MiB/s disk. Comparable systems were used for Redshift ([ra3.4xlarge](https://clickhou.se/redshift-sizes), 12 vCPUs, 96 GB RAM) and Snowflake ([warehouse size S](https://clickhou.se/snowflake-sizes): 2x8 vCPUs, 2x16 GB RAM). The physical database design is tuned only lightly, for example, we specify primary keys, but don't change the compression of individual columns, create projections, or add skipping indexes. We also flush the Linux page cache prior to each cold query run, but don't adjust database or operating system knobs. For every query, the fastest runtime across databases is used as a baseline. Relative query runtimes for other databases are calculated as (runtime + 10)/(baseline runtime + 10). The total relative runtime for a database is the geometric mean of the per-query ratios. While the research database Umbra [\[54\]](#page-13-25) achieves the best overall hot runtime, ClickHouse outperforms all other production-grade databases for hot and cold runtimes.
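The scoring scheme can be sketched as follows, with made-up runtimes; this assumes the per-query baseline is the minimum runtime across databases and the +10 additive constant described above:

```python
import math

# Illustrative runtimes (seconds) for 3 queries on 2 databases.
runtimes = {"db_a": [0.5, 2.0, 8.0], "db_b": [1.0, 1.0, 16.0]}

n_queries = 3
# Baseline: fastest runtime across databases, per query.
best = [min(runtimes[db][q] for db in runtimes) for q in range(n_queries)]

def total_relative(db):
    # Per-query ratio (t + 10) / (t_best + 10), combined by geometric mean.
    ratios = [(t + 10) / (b + 10) for t, b in zip(runtimes[db], best)]
    return math.prod(ratios) ** (1 / len(ratios))

for db in runtimes:
    print(db, round(total_relative(db), 3))
```

The +10 constant damps the influence of very short queries (a 0.1 s vs 0.2 s difference barely moves the ratio), and the geometric mean keeps one outlier query from dominating the total.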
@@ -325,7 +325,7 @@ Figure 11: Relative hot runtimes of VersionsBench 2018-2024.
#### 6.2.2 Normalized tables \{#6-2-2-normalized-tables\}
-In classical warehousing, data is often modeled using star or snowflake schemas. We present runtimes of TPC-H queries (scale factor 100) but remark that normalized tables are an emerging use case for ClickHouse. [Figure 12](#page-10-6) shows the hot runtimes of the TPC-H queries based on the parallel hash join algorithm described in Section [4.4.](#page-7-0) The measurements were taken on a single-node AWS EC2 c6i.16xlarge instance with 64 vCPUs, 128 GB RAM, and 5000 IOPS / 1000 MiB/s disk. The fastest of five runs was recorded. For reference, we performed the same measurements in a Snowflake system of comparable size (warehouse size L, 8x8 vCPUs, 8x16 GB RAM). The results of eleven queries are excluded from the table: Queries Q2, Q4, Q13, Q17, and Q20-22 include correlated subqueries which are not supported as of ClickHouse v24.6. Queries Q7-Q9 and Q19 depend on extended plan-level optimizations for joins such as join reordering and join predicate pushdown (both missing as of ClickHouse v24.6) to achieve viable runtimes. Automatic subquery decorrelation and better optimizer support for joins are planned for implementation in 2024 [\[18\]](#page-12-33). Out of the remaining 11 queries, 5 (6) queries executed faster in ClickHouse (Snowflake). As aforementioned optimizations are known to be critical for performance [\[27\]](#page-12-34), we expect them to improve runtimes of these queries further once implemented.
+In classical warehousing, data is often modeled using star or snowflake schemas. We present runtimes of TPC-H queries (scale factor 100) but remark that normalized tables are an emerging use case for ClickHouse. [Figure 12](#page-10-6) shows the hot runtimes of the TPC-H queries based on the parallel hash join algorithm described in Section [4.4.](#page-7-0) The measurements were taken on a single-node AWS EC2 c6i.16xlarge instance with 64 vCPUs, 128 GB RAM, and 5000 IOPS / 1000 MiB/s disk. The fastest of five runs was recorded. For reference, we performed the same measurements in a Snowflake system of comparable size (warehouse size L, 8x8 vCPUs, 8x16 GB RAM). The results of eleven queries are excluded from the table: Queries Q2, Q4, Q13, Q17, and Q20-22 include correlated subqueries which aren't supported as of ClickHouse v24.6. Queries Q7-Q9 and Q19 depend on extended plan-level optimizations for joins such as join reordering and join predicate pushdown (both missing as of ClickHouse v24.6) to achieve viable runtimes. Automatic subquery decorrelation and better optimizer support for joins are planned for implementation in 2024 [\[18\]](#page-12-33). Out of the remaining 11 queries, 5 (6) queries executed faster in ClickHouse (Snowflake). As aforementioned optimizations are known to be critical for performance [\[27\]](#page-12-34), we expect them to improve runtimes of these queries further once implemented.
@@ -339,7 +339,7 @@ The most similar databases to ClickHouse, in terms of goals and design principle
Snowflake [\[22\]](#page-12-37) is a popular proprietary cloud data warehouse based on a shared-disk architecture. Its approach of dividing tables into micro-partitions is similar to the concept of ^^parts^^ in ClickHouse. Snowflake uses hybrid PAX pages [\[3\]](#page-12-41) for persistence, whereas ClickHouse's storage format is strictly columnar. Snowflake also emphasizes local caching and data pruning using automatically created lightweight indexes [\[31,](#page-12-13) [51\]](#page-13-14) as a source for good performance. Similar to primary keys in ClickHouse, users may optionally create clustered indexes to co-locate data with the same values.
-Photon [\[5\]](#page-12-39) and Velox [\[62\]](#page-13-32) are query execution engines designed to be used as components in complex data management systems. Both systems are passed query plans as input, which are then executed on the local node over Parquet (Photon) or Arrow (Velox) files [\[46\]](#page-13-34). ClickHouse is able to consume and generate data in these generic formats but prefers its native file format for storage. While Velox and Photon do not optimize the query plan (Velox performs basic expression optimizations), they utilize runtime adaptivity techniques, such as dynamically switching compute kernels depending on the data characteristics. Similarly, plan operators in ClickHouse
+Photon [\[5\]](#page-12-39) and Velox [\[62\]](#page-13-32) are query execution engines designed to be used as components in complex data management systems. Both systems are passed query plans as input, which are then executed on the local node over Parquet (Photon) or Arrow (Velox) files [\[46\]](#page-13-34). ClickHouse is able to consume and generate data in these generic formats but prefers its native file format for storage. While Velox and Photon don't optimize the query plan (Velox performs basic expression optimizations), they utilize runtime adaptivity techniques, such as dynamically switching compute kernels depending on the data characteristics. Similarly, plan operators in ClickHouse
can create other operators at runtime, primarily to switch to external aggregation or join operators, based on the query memory consumption. The Photon paper notes that code-generating designs [\[38,](#page-12-22) [41,](#page-12-42) [53\]](#page-13-0) are harder to develop and debug than interpreted vectorized designs [\[11\]](#page-12-0). The (experimental) support for code generation in Velox builds and links a shared library produced from runtime-generated C++ code, whereas ClickHouse interacts directly with LLVM's on-request compilation API.
diff --git a/docs/managing-data/core-concepts/index.md b/docs/managing-data/core-concepts/index.md
index 0746062a97f..5cd8a066921 100644
--- a/docs/managing-data/core-concepts/index.md
+++ b/docs/managing-data/core-concepts/index.md
@@ -12,8 +12,8 @@ you will learn some of the core concepts of how ClickHouse works.
| Page | Description |
|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Table parts](./parts.md) | Learn what table parts are in ClickHouse. |
-| [Table partitions](./partitions.mdx) | Learn what table partitions are and what they are used for. |
-| [Table part merges](./merges.mdx) | Learn what table part merges are and what they are used for. |
-| [Table shards and replicas](./shards.mdx) | Learn what table shards and replicas are and what they are used for. |
+| [Table partitions](./partitions.mdx) | Learn what table partitions are and what they're used for. |
+| [Table part merges](./merges.mdx) | Learn what table part merges are and what they're used for. |
+| [Table shards and replicas](./shards.mdx) | Learn what table shards and replicas are and what they're used for. |
| [Primary indexes](./primary-indexes.mdx) | Introduces ClickHouse's sparse primary index and how it helps efficiently skip unnecessary data during query execution. Explains how the index is built and used, with examples and tools for observing its effect. Links to a deep dive for advanced use cases and best practices. |
| [Architectural Overview](./academic_overview.mdx) | A concise academic overview of all components of the ClickHouse architecture, based on our VLDB 2024 scientific paper. |
diff --git a/docs/managing-data/core-concepts/merges.mdx b/docs/managing-data/core-concepts/merges.mdx
index aee140251f3..5f82d95d793 100644
--- a/docs/managing-data/core-concepts/merges.mdx
+++ b/docs/managing-data/core-concepts/merges.mdx
@@ -36,7 +36,7 @@ The following diagram sketches this background merge process:
-The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. ^^Parts^^ that were merged into larger ^^parts^^ are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged ^^parts^^. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
+The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and hasn't been merged yet. ^^Parts^^ that were merged into larger ^^parts^^ are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged ^^parts^^. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
## Monitoring merges \{#monitoring-merges\}
@@ -105,7 +105,7 @@ Note that increasing the number of CPU cores and the size of RAM allows to incre
## Memory optimized merges \{#memory-optimized-merges\}
-ClickHouse does not necessarily load all ^^parts^^ to be merged into memory at once, as sketched in the [previous example](/merges#concurrent-merges). Based on several [factors](https://github.com/ClickHouse/ClickHouse/blob/bf37120c925ed846ae5cd72cd51e6340bebd2918/src/Storages/MergeTree/MergeTreeSettings.cpp#L210), and to reduce memory consumption (sacrificing merge speed), so-called [vertical merging](https://github.com/ClickHouse/ClickHouse/blob/bf37120c925ed846ae5cd72cd51e6340bebd2918/src/Storages/MergeTree/MergeTreeSettings.cpp#L209) loads and merges ^^parts^^ by chunks of blocks instead of in one go.
+ClickHouse doesn't necessarily load all ^^parts^^ to be merged into memory at once, as sketched in the [previous example](/merges#concurrent-merges). Based on several [factors](https://github.com/ClickHouse/ClickHouse/blob/bf37120c925ed846ae5cd72cd51e6340bebd2918/src/Storages/MergeTree/MergeTreeSettings.cpp#L210), and to reduce memory consumption (sacrificing merge speed), so-called [vertical merging](https://github.com/ClickHouse/ClickHouse/blob/bf37120c925ed846ae5cd72cd51e6340bebd2918/src/Storages/MergeTree/MergeTreeSettings.cpp#L209) loads and merges ^^parts^^ by chunks of blocks instead of in one go.
## Merge mechanics \{#merge-mechanics\}
diff --git a/docs/managing-data/core-concepts/partitions.mdx b/docs/managing-data/core-concepts/partitions.mdx
index 85b67e1c9de..9ab82940933 100644
--- a/docs/managing-data/core-concepts/partitions.mdx
+++ b/docs/managing-data/core-concepts/partitions.mdx
@@ -123,7 +123,7 @@ TTL date + INTERVAL 12 MONTH TO VOLUME 'slow_but_cheap';
### Query optimization \{#query-optimization\}
-Partitions can assist with query performance, but this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it, as shown in the example query below.
+Partitions can assist with query performance, but this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is typically only useful if the partitioning key isn't in the primary key and you're filtering by it, as shown in the example query below.
```sql runnable
SELECT MAX(price) AS highest_price
@@ -133,7 +133,7 @@ WHERE date >= '2020-12-01'
AND town = 'LONDON';
```
-The query runs over our example table from above and [calculates](https://sql.clickhouse.com/?query=U0VMRUNUIE1BWChwcmljZSkgQVMgaGlnaGVzdF9wcmljZQpGUk9NIHVrLnVrX3ByaWNlX3BhaWRfc2ltcGxlX3BhcnRpdGlvbmVkCldIRVJFIGRhdGUgPj0gJzIwMjAtMTItMDEnCiAgQU5EIGRhdGUgPD0gJzIwMjAtMTItMzEnCiAgQU5EIHRvd24gPSAnTE9ORE9OJzs&run_query=true&tab=results) the highest price of all sold properties in London in December 2020 by filtering on both a column (`date`) used in the table's partition key and on a column (`town`) used in the table's primary key (and `date` is not part of the primary key).
+The query runs over our example table from above and [calculates](https://sql.clickhouse.com/?query=U0VMRUNUIE1BWChwcmljZSkgQVMgaGlnaGVzdF9wcmljZQpGUk9NIHVrLnVrX3ByaWNlX3BhaWRfc2ltcGxlX3BhcnRpdGlvbmVkCldIRVJFIGRhdGUgPj0gJzIwMjAtMTItMDEnCiAgQU5EIGRhdGUgPD0gJzIwMjAtMTItMzEnCiAgQU5EIHRvd24gPSAnTE9ORE9OJzs&run_query=true&tab=results) the highest price of all sold properties in London in December 2020 by filtering on both a column (`date`) used in the table's partition key and on a column (`town`) used in the table's primary key (and `date` isn't part of the primary key).
ClickHouse processes that query by applying a sequence of pruning techniques to avoid evaluating irrelevant data:
diff --git a/docs/managing-data/core-concepts/parts.md b/docs/managing-data/core-concepts/parts.md
index 839425a55ac..5e1987f3d23 100644
--- a/docs/managing-data/core-concepts/parts.md
+++ b/docs/managing-data/core-concepts/parts.md
@@ -99,4 +99,4 @@ ORDER BY name ASC;
4. │ all_6_11_1 │ 1 │ 6459763 │
└─────────────┴───────┴─────────┘
```
-The merge level is incremented by one with each additional merge on the part. A level of 0 indicates this is a new part that has not been merged yet.
+The merge level is incremented by one with each additional merge on the part. A level of 0 indicates this is a new part that hasn't been merged yet.
diff --git a/docs/managing-data/core-concepts/shards.mdx b/docs/managing-data/core-concepts/shards.mdx
index b86e1f452d8..c50c48cb0c2 100644
--- a/docs/managing-data/core-concepts/shards.mdx
+++ b/docs/managing-data/core-concepts/shards.mdx
@@ -101,7 +101,7 @@ Query processing works similarly to setups without replicas, with only a single
② The ^^Distributed table^^ forwards the query to one ^^replica^^ from each ^^shard^^, where each ClickHouse server hosting the selected ^^replica^^ computes its local query result in parallel.
-The rest works the [same](#select-forwarding) as in setups without replicas and is not shown in the diagram above. The ClickHouse server hosting the initially targeted ^^distributed table^^ collects all local results, merges them into the final global result, and returns it to the query sender.
+The rest works the [same](#select-forwarding) as in setups without replicas and isn't shown in the diagram above. The ClickHouse server hosting the initially targeted ^^distributed table^^ collects all local results, merges them into the final global result, and returns it to the query sender.
Note that ClickHouse allows configuring the query forwarding strategy for ②. By default—unlike in the diagram above—the ^^distributed table^^ [prefers](/docs/operations/settings/settings#prefer_localhost_replica) a local ^^replica^^ if available, but other load balancing [strategies](/docs/operations/settings/settings#load_balancing) can be used.
diff --git a/docs/managing-data/deleting-data/delete_mutations.mdx b/docs/managing-data/deleting-data/delete_mutations.mdx
index 31515935f6a..7f7d404737a 100644
--- a/docs/managing-data/deleting-data/delete_mutations.mdx
+++ b/docs/managing-data/deleting-data/delete_mutations.mdx
@@ -8,7 +8,7 @@ keywords: ['delete mutations', 'ALTER TABLE DELETE', 'data mutations', 'data par
doc_type: 'reference'
---
-Delete mutations refers to `ALTER` queries that manipulate table data through delete. Most notably they are queries like `ALTER TABLE DELETE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
+Delete mutations refer to `ALTER` queries that manipulate table data through deletes. Most notably they're queries like `ALTER TABLE DELETE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
:::info
For deletes, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/guides/replacing-merge-tree) or [CollapsingMergeTree](/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
diff --git a/docs/managing-data/deleting-data/overview.mdx b/docs/managing-data/deleting-data/overview.mdx
index 9f82b12654a..b30f54662ec 100644
--- a/docs/managing-data/deleting-data/overview.mdx
+++ b/docs/managing-data/deleting-data/overview.mdx
@@ -19,7 +19,7 @@ Here is a summary of the different ways to delete data in ClickHouse:
## Lightweight deletes \{#lightweight-deletes\}
-Lightweight deletes cause rows to be immediately marked as deleted such that they can be automatically filtered out of all subsequent `SELECT` queries. Subsequent removal of these deleted rows occurs during natural merge cycles and thus incurs less I/O. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted. If you need to guarantee that data is deleted, consider the above mutation command.
+Lightweight deletes cause rows to be immediately marked as deleted such that they can be automatically filtered out of all subsequent `SELECT` queries. Subsequent removal of these deleted rows occurs during natural merge cycles and thus incurs less I/O. As a result, it is possible that for an unspecified period, data isn't actually deleted from storage and is only marked as deleted. If you need to guarantee that data is deleted, consider the above mutation command.
```sql
-- delete all data from 2018 with a lightweight delete. Not recommended.
@@ -43,7 +43,7 @@ Delete mutations can be issued through a `ALTER TABLE ... DELETE` command e.g.
ALTER TABLE posts DELETE WHERE toYear(CreationDate) = 2018
```
-These can be executed either synchronously (by default if non-replicated) or asynchronously (determined by the [mutations_sync](/operations/settings/settings#mutations_sync) setting). These are extremely IO-heavy, rewriting all the parts that match the `WHERE` expression. There is no atomicity to this process - parts are substituted for mutated parts as soon as they are ready, and a `SELECT` query that starts executing during a mutation will see data from parts that have already been mutated along with data from parts that have not been mutated yet. Users can track the state of the progress via the [system.mutations](/operations/system-tables/mutations#monitoring-mutations) table. These are I/O intense operations and should be used sparingly as they can impact cluster `SELECT` performance.
+These can be executed either synchronously (by default if non-replicated) or asynchronously (determined by the [mutations_sync](/operations/settings/settings#mutations_sync) setting). These are extremely I/O-heavy, rewriting all the parts that match the `WHERE` expression. There is no atomicity to this process - parts are substituted for mutated parts as soon as they're ready, and a `SELECT` query that starts executing during a mutation will see data from parts that have already been mutated along with data from parts that haven't been mutated yet. Users can track the state of the progress via the [system.mutations](/operations/system-tables/mutations#monitoring-mutations) table. These are I/O-intensive operations and should be used sparingly as they can impact cluster `SELECT` performance.
Read more about [delete mutations](/sql-reference/statements/alter/delete).
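As a sketch (the `posts` table name is just carried over from the example above), in-flight mutations can be inspected via the `system.mutations` table:

```sql
-- List unfinished mutations on the posts table, with remaining work
-- (parts_to_do) and the last error, if any (latest_fail_reason).
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE table = 'posts' AND NOT is_done;
```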
diff --git a/docs/managing-data/truncate.md b/docs/managing-data/truncate.md
index cb21faf10a5..bacf79dce98 100644
--- a/docs/managing-data/truncate.md
+++ b/docs/managing-data/truncate.md
@@ -8,7 +8,7 @@ doc_type: 'reference'
keywords: ['truncate', 'delete data', 'remove data', 'clear table', 'table maintenance']
---
-Truncate allows the data in a table or database to be removed, while preserving their existence. This is a lightweight operation which cannot be reversed.
+Truncate allows the data in a table or database to be removed, while preserving the table or database itself. This is a lightweight operation which can't be reversed.
import Truncate from '@site/docs/sql-reference/statements/truncate.md';
diff --git a/docs/managing-data/updating-data/update_mutations.mdx b/docs/managing-data/updating-data/update_mutations.mdx
index 41dcf182a76..8536f5c22cc 100644
--- a/docs/managing-data/updating-data/update_mutations.mdx
+++ b/docs/managing-data/updating-data/update_mutations.mdx
@@ -8,7 +8,7 @@ doc_type: 'reference'
keywords: ['update', 'mutations', 'alter table', 'data manipulation', 'modify data']
---
-Update mutations refers to `ALTER` queries that manipulate table data through updates. Most notably they are queries like `ALTER TABLE UPDATE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
+Update mutations refer to `ALTER` queries that manipulate table data through updates. Most notably they're queries like `ALTER TABLE UPDATE`, etc. Performing such queries will produce new mutated versions of the data parts. This means that such statements would trigger a rewrite of whole data parts for all data that was inserted before the mutation, translating to a large amount of write requests.
:::info
For updates, you can avoid these large amounts of write requests by using specialised table engines like [ReplacingMergeTree](/guides/replacing-merge-tree) or [CollapsingMergeTree](/engines/table-engines/mergetree-family/collapsingmergetree) instead of the default MergeTree table engine.
diff --git a/docs/materialized-view/incremental-materialized-view.md b/docs/materialized-view/incremental-materialized-view.md
index 3080f912977..0dc53c7f191 100644
--- a/docs/materialized-view/incremental-materialized-view.md
+++ b/docs/materialized-view/incremental-materialized-view.md
@@ -14,11 +14,11 @@ import Image from '@theme/IdealImage';
Incremental Materialized Views (Materialized Views) allow you to shift the cost of computation from query time to insert time, resulting in faster `SELECT` queries.
-Unlike in transactional databases like Postgres, a ClickHouse materialized view is just a trigger that runs a query on blocks of data as they are inserted into a table. The result of this query is inserted into a second "target" table. Should more rows be inserted, results will again be sent to the target table where the intermediate results will be updated and merged. This merged result is the equivalent of running the query over all of the original data.
+Unlike in transactional databases like Postgres, a ClickHouse materialized view is just a trigger that runs a query on blocks of data as they're inserted into a table. The result of this query is inserted into a second "target" table. Should more rows be inserted, results will again be sent to the target table where the intermediate results will be updated and merged. This merged result is the equivalent of running the query over all of the original data.
The principal motivation for Materialized Views is that the results inserted into the target table represent the results of an aggregation, filtering, or transformation on rows. These results will often be a smaller representation of the original data (a partial sketch in the case of aggregations). This, along with the resulting query for reading the results from the target table being simple, ensures query times are faster than if the same computation was performed on the original data, shifting computation (and thus query latency) from query time to insert time.
-Materialized views in ClickHouse are updated in real time as data flows into the table they are based on, functioning more like continually updating indexes. This is in contrast to other databases where Materialized Views are typically static snapshots of a query that must be refreshed (similar to ClickHouse [Refreshable Materialized Views](/sql-reference/statements/create/view#refreshable-materialized-view)).
+Materialized views in ClickHouse are updated in real time as data flows into the table they're based on, functioning more like continually updating indexes. This is in contrast to other databases where Materialized Views are typically static snapshots of a query that must be refreshed (similar to ClickHouse [Refreshable Materialized Views](/sql-reference/statements/create/view#refreshable-materialized-view)).
@@ -209,13 +209,13 @@ Peak memory usage: 658.84 MiB.
As before, we can create a materialized view which executes the above query as new posts are inserted into our `posts` table.
-For the purposes of example, and to avoid loading the posts data from S3, we will create a duplicate table `posts_null` with the same schema as `posts`. However, this table will not store any data and simply be used by the materialized view when rows are inserted. To prevent storage of data, we can use the [`Null` table engine type](/engines/table-engines/special/null).
+For the purposes of this example, and to avoid loading the posts data from S3, we will create a duplicate table `posts_null` with the same schema as `posts`. However, this table won't store any data and will simply be used by the materialized view when rows are inserted. To prevent storage of data, we can use the [`Null` table engine type](/engines/table-engines/special/null).
```sql
CREATE TABLE posts_null AS posts ENGINE = Null
```
-The Null table engine is a powerful optimization - think of it as `/dev/null`. Our materialized view will compute and store our summary statistics when our `posts_null` table receives rows at insert time - it's just a trigger. However, the raw data will not be stored. While in our case, we probably still want to store the original posts, this approach can be used to compute aggregates while avoiding storage overhead of the raw data.
+The Null table engine is a powerful optimization - think of it as `/dev/null`. Our materialized view will compute and store our summary statistics when our `posts_null` table receives rows at insert time - it's just a trigger. However, the raw data won't be stored. While in our case, we probably still want to store the original posts, this approach can be used to compute aggregates while avoiding storage overhead of the raw data.
The materialized view thus becomes:
@@ -292,7 +292,7 @@ CREATE MATERIALIZED VIEW posts_mv TO posts AS
### Lookup table {#lookup-table}
-You should consider their access patterns when choosing a ClickHouse ordering key. Columns which are frequently used in filter and aggregation clauses should be used. This can be restrictive for scenarios where users have more diverse access patterns which cannot be encapsulated in a single set of columns. For example, consider the following `comments` table:
+You should consider your access patterns when choosing a ClickHouse ordering key. Columns which are frequently used in filter and aggregation clauses should be used. This can be restrictive for scenarios where users have more diverse access patterns which can't be encapsulated in a single set of columns. For example, consider the following `comments` table:
```sql
CREATE TABLE comments
@@ -378,13 +378,13 @@ For more information see the guide ["Cascading materialized views"](https://clic
The following applies to Incremental Materialized Views only. Refreshable Materialized Views execute their query periodically over the full target dataset and fully support JOINs. Consider using them for complex JOINs if a reduction in result freshness can be tolerated.
:::
-Incremental Materialized views in ClickHouse fully support `JOIN` operations, but with one crucial constraint: **the materialized view only triggers on inserts to the source table (the left-most table in the query).** Right-side tables in JOINs do not trigger updates, even if their data changes. This behavior is especially important when building **Incremental** Materialized Views, where data is aggregated or transformed during insert time.
+Incremental Materialized views in ClickHouse fully support `JOIN` operations, but with one crucial constraint: **the materialized view only triggers on inserts to the source table (the left-most table in the query).** Right-side tables in JOINs don't trigger updates, even if their data changes. This behavior is especially important when building **Incremental** Materialized Views, where data is aggregated or transformed during insert time.
-When an Incremental materialized view is defined using a `JOIN`, the left-most table in the `SELECT` query acts as the source. When new rows are inserted into this table, ClickHouse executes the materialized view query *only* with those newly inserted rows. Right-side tables in the JOIN are read in full during this execution, but changes to them alone do not trigger the view.
+When an Incremental materialized view is defined using a `JOIN`, the left-most table in the `SELECT` query acts as the source. When new rows are inserted into this table, ClickHouse executes the materialized view query *only* with those newly inserted rows. Right-side tables in the JOIN are read in full during this execution, but changes to them alone don't trigger the view.
This behavior makes JOINs in Materialized Views similar to a snapshot join against static dimension data.
-This works well for enriching data with reference or dimension tables. However, any updates to the right-side tables (e.g., user metadata) will not retroactively update the materialized view. To see updated data, new inserts must arrive in the source table.
+This works well for enriching data with reference or dimension tables. However, any updates to the right-side tables (e.g., user metadata) won't retroactively update the materialized view. To see updated data, new inserts must arrive in the source table.
### Example {#materialized-views-and-joins-example}
@@ -558,7 +558,7 @@ Note, however, that this result is incorrect.
### Best practices for JOINs in materialized views {#join-best-practices}
-- **Use the left-most table as the trigger.** Only the table on the left side of the `SELECT` statement triggers the materialized view. Changes to right-side tables will not trigger updates.
+- **Use the left-most table as the trigger.** Only the table on the left side of the `SELECT` statement triggers the materialized view. Changes to right-side tables won't trigger updates.
- **Pre-insert joined data.** Ensure that data in joined tables exists before inserting rows into the source table. The JOIN is evaluated at insert time, so missing data will result in unmatched rows or nulls.
@@ -725,7 +725,7 @@ In the above operation, only one row is retrieved from the users table for the u
`UNION ALL` queries are commonly used to combine data from multiple source tables into a single result set.
-While `UNION ALL` is not directly supported in Incremental Materialized Views, you can achieve the same outcome by creating a separate materialized view for each `SELECT` branch and writing their results to a shared target table.
+While `UNION ALL` isn't directly supported in Incremental Materialized Views, you can achieve the same outcome by creating a separate materialized view for each `SELECT` branch and writing their results to a shared target table.
For our example, we'll use the Stack Overflow dataset. Consider the `badges` and `comments` tables below, which represent the badges earned by a user and the comments they make on posts:
@@ -858,7 +858,7 @@ GROUP BY UserId
1 row in set. Elapsed: 0.005 sec.
```
-Inserts into the `badges` table will not trigger the view, causing `user_activity` to not receive updates:
+Inserts into the `badges` table won't trigger the view, causing `user_activity` to not receive updates:
```sql
INSERT INTO badges VALUES (53505058, 2936484, 'gingerwizard', now(), 'Gold', 0);
@@ -1071,7 +1071,7 @@ ORDER BY now ASC
3 rows in set. Elapsed: 0.004 sec.
```
-Although our ordering of the arrival of rows from each view is the same, this is not guaranteed - as illustrated by the similarity of each row's insert time. Also note the improved insert performance.
+Although our ordering of the arrival of rows from each view is the same, this isn't guaranteed - as illustrated by the similarity of each row's insert time. Also note the improved insert performance.
### When to use parallel processing {#materialized-views-when-to-use-parallel}
@@ -1099,8 +1099,8 @@ Leave it disabled when:
**Non-recursive** Common Table Expressions (CTEs) are supported in Materialized Views.
-:::note Common Table Expressions **are not** materialized
-ClickHouse does not materialize CTEs; instead, it substitutes the CTE definition directly into the query, which can lead to multiple evaluations of the same expression (if the CTE is used more than once).
+:::note Common Table Expressions **aren't** materialized
+ClickHouse doesn't materialize CTEs; instead, it substitutes the CTE definition directly into the query, which can lead to multiple evaluations of the same expression (if the CTE is used more than once).
:::
Consider the following example which computes daily activity for each post type.
@@ -1179,7 +1179,7 @@ LIMIT 10
Peak memory usage: 989.53 KiB.
```
-In ClickHouse, CTEs are inlined which means they are effectively copy-pasted into the query during optimization and **not** materialized. This means:
+In ClickHouse, CTEs are inlined which means they're effectively copy-pasted into the query during optimization and **not** materialized. This means:
- If your CTE references a different table from the source table (i.e., the one the materialized view is attached to), and is used in a `JOIN` or `IN` clause, it will behave like a subquery or join, not a trigger.
- The materialized view will still only trigger on inserts into the main source table, but the CTE will be re-executed on every insert, which may cause unnecessary overhead, especially if the referenced table is large.
@@ -1193,6 +1193,6 @@ WITH recent_users AS (
SELECT * FROM stackoverflow.posts WHERE OwnerUserId IN (SELECT Id FROM recent_users)
```
-In this case, the users CTE is re-evaluated on every insert into posts, and the materialized view will not update when new users are inserted - only when posts are.
+In this case, the users CTE is re-evaluated on every insert into posts, and the materialized view won't update when new users are inserted - only when posts are.
Generally, use CTEs for logic that operates on the same source table the materialized view is attached to or ensure that referenced tables are small and unlikely to cause performance bottlenecks. Alternatively, consider [the same optimizations as JOINs with Materialized Views](/materialized-view/incremental-materialized-view#join-best-practices).
diff --git a/docs/materialized-view/refreshable-materialized-view.md b/docs/materialized-view/refreshable-materialized-view.md
index a5229a42707..27fad007870 100644
--- a/docs/materialized-view/refreshable-materialized-view.md
+++ b/docs/materialized-view/refreshable-materialized-view.md
@@ -23,7 +23,7 @@ You can also see the following video:
ClickHouse incremental materialized views are enormously powerful and typically scale much better than the approach used by refreshable materialized views, especially in cases where an aggregate over a single table needs to be performed. By only computing the aggregation over each block of data as it is inserted and merging the incremental states in the final table, the query only ever executes on a subset of the data. This method scales to potentially petabytes of data and is usually the preferred method.
-However, there are use cases where this incremental process is not required or is not applicable. Some problems are either incompatible with an incremental approach or don't require real-time updates, with a periodic rebuild being more appropriate. For example, you may want to regularly perform a complete re-computation of a view over the full dataset because it uses a complex join, which is incompatible with an incremental approach.
+However, there are use cases where this incremental process isn't required or isn't applicable. Some problems are either incompatible with an incremental approach or don't require real-time updates, with a periodic rebuild being more appropriate. For example, you may want to regularly perform a complete re-computation of a view over the full dataset because it uses a complex join, which is incompatible with an incremental approach.
> Refreshable materialized views can run batch processes performing tasks such as denormalization. Dependencies can be created between refreshable materialized views such that one view depends on the results of another and only executes once it is complete. This can replace scheduled workflows or simple DAGs such as a [dbt](https://www.getdbt.com/) job. To find out more about how to set dependencies between refreshable materialized views go to [CREATE VIEW](/sql-reference/statements/create/view#refresh-dependencies), `Dependencies` section.
diff --git a/docs/native-protocol/basics.md b/docs/native-protocol/basics.md
index dd40fcc55bf..f3c9e564025 100644
--- a/docs/native-protocol/basics.md
+++ b/docs/native-protocol/basics.md
@@ -26,7 +26,7 @@ For lengths, packet codes and other cases the *unsigned varint* encoding is used
Use [binary.PutUvarint](https://pkg.go.dev/encoding/binary#PutUvarint) and [binary.ReadUvarint](https://pkg.go.dev/encoding/binary#ReadUvarint).
:::note
-*Signed* varint is not used.
+*Signed* varint isn't used.
:::
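
The unsigned varint scheme above is the standard LEB128-style encoding (7 payload bits per byte, high bit set on every byte except the last) that Go's `binary.PutUvarint`/`binary.ReadUvarint` implement. A minimal Python sketch, for readers not using Go:

```python
def put_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint:
    7 bits per byte, continuation bit (0x80) set on all but the last byte."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def read_uvarint(data: bytes) -> tuple[int, int]:
    """Decode an unsigned varint; returns (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, b in enumerate(data):
        value |= (b & 0x7F) << shift
        if b < 0x80:          # continuation bit clear: last byte
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

For example, `300` encodes to the two bytes `0xAC 0x02`.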
## String {#string}
diff --git a/docs/native-protocol/client.md b/docs/native-protocol/client.md
index 709156f0ca1..2fb35b4c780 100644
--- a/docs/native-protocol/client.md
+++ b/docs/native-protocol/client.md
@@ -22,7 +22,7 @@ The `Data` can be compressed.
## Hello {#hello}
-For example, we are `Go Client` v1.10 that supports `54451` protocol version and
+For example, we're `Go Client` v1.10 that supports `54451` protocol version and
want to connect to `default` database with `default` user and `secret` password.
| field | type | value | description |
@@ -40,7 +40,7 @@ want to connect to `default` database with `default` user and `secret` password.
The protocol version is the TCP protocol version of the client.
Usually it is equal to the latest compatible server revision, but
-should not be confused with it.
+shouldn't be confused with it.
### Defaults {#defaults}
diff --git a/docs/native-protocol/hash.md b/docs/native-protocol/hash.md
index df6d2ceaf85..dc7eb7c1772 100644
--- a/docs/native-protocol/hash.md
+++ b/docs/native-protocol/hash.md
@@ -14,10 +14,10 @@ ClickHouse uses **one of the previous** versions of [CityHash from Google](https
:::info
CityHash has changed the algorithm after we have added it into ClickHouse.
-CityHash documentation specifically notes that the user should not rely on
-specific hash values and should not save it anywhere or use it as a sharding key.
+CityHash documentation specifically notes that the user shouldn't rely on
+specific hash values and shouldn't save it anywhere or use it as a sharding key.
-But as we exposed this function to the user, we had to fix the version of CityHash (to 1.0.2). And now we guarantee that the behaviour of CityHash functions available in SQL will not change.
+But as we exposed this function to the user, we had to fix the version of CityHash (to 1.0.2). And now we guarantee that the behaviour of CityHash functions available in SQL won't change.
— Alexey Milovidov
:::
@@ -26,7 +26,7 @@ But as we exposed this function to the user, we had to fix the version of CityHa
The current version of Google's CityHash [differs](https://github.com/ClickHouse/ClickHouse/issues/8354) from the ClickHouse `cityHash64` variant.
-Don't use `farmHash64` to get Google's CityHash value! [FarmHash](https://opensource.googleblog.com/2014/03/introducing-farmhash.html) is a successor to CityHash, but they are not fully compatible.
+Don't use `farmHash64` to get Google's CityHash value! [FarmHash](https://opensource.googleblog.com/2014/03/introducing-farmhash.html) is a successor to CityHash, but they're not fully compatible.
| String | ClickHouse64 | CityHash64 | FarmHash64 |
|------------------------------------------------------------|----------------------|---------------------|----------------------|
diff --git a/docs/operations_/backup_restore/00_overview.md b/docs/operations_/backup_restore/00_overview.md
index f92deee0453..13bef531609 100644
--- a/docs/operations_/backup_restore/00_overview.md
+++ b/docs/operations_/backup_restore/00_overview.md
@@ -17,7 +17,7 @@ in the sidebar.
## Introduction {#introduction}
-While [replication](/engines/table-engines/mergetree-family/replication) provides protection from hardware failures, it does not
+While [replication](/engines/table-engines/mergetree-family/replication) provides protection from hardware failures, it doesn't
protect against human errors: accidental deletion of data, deletion of the wrong
table or a table on the wrong cluster, and software bugs that result in incorrect
data processing or data corruption.
@@ -25,7 +25,7 @@ data processing or data corruption.
In many cases mistakes like these will affect all replicas. ClickHouse has built-in
safeguards to prevent some types of mistakes, for example, by [default](/operations/settings/settings#max_table_size_to_drop)
you can't just drop tables with a `MergeTree` family engine containing more than
-50 Gb of data. However, these safeguards do not cover all possible cases and
+50 Gb of data. However, these safeguards don't cover all possible cases and
problems can still occur.
To effectively mitigate possible human errors, you should carefully prepare a
@@ -41,7 +41,7 @@ shortcomings.
:::note
Keep in mind that if you backed something up and never tried to restore it,
-chances are that the restore will not work properly when you actually need it (or at
+chances are that the restore won't work properly when you actually need it (or at
least it will take longer than the business can tolerate). So whatever backup
approach you choose, make sure to automate the restore process as well, and practice
it on a spare ClickHouse cluster regularly.
@@ -86,7 +86,7 @@ Depending on your needs, you may want to use:
`BACKUP` and `RESTORE` commands can also be marked `ASYNC`. In this case, the
backup command returns immediately, and the backup process runs in the background.
-If the commands are not marked `ASYNC`, the backup process is synchronous and
+If the commands aren't marked `ASYNC`, the backup process is synchronous and
the command blocks until the backup completes.
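
With `ASYNC`, the client gets the backup's id back immediately and can poll for completion. A hedged sketch of that polling loop, where `execute` is a hypothetical stand-in for your ClickHouse client and the status values assume the `system.backups` table's terminal states:

```python
import time

def wait_for_backup(execute, backup_id, poll_s=1.0, timeout_s=600.0):
    """Poll until an async backup reaches a terminal status.
    `execute` is a placeholder for a real client call returning the
    status string for the given backup id."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = execute(
            "SELECT status FROM system.backups WHERE id = %s", (backup_id,)
        )
        if status in ("BACKUP_CREATED", "BACKUP_FAILED"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("backup did not finish in time")
```

The same shape works for `RESTORE ... ASYNC`, with the corresponding restore statuses.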
## Concurrent vs non-concurrent backups {#concurrent-vs-non-concurrent}
@@ -162,7 +162,7 @@ the cluster's overall setup.
This functionality only works for configurations managed through SQL commands
(referred to as ["SQL-driven Access Control and Account Management"](/operations/access-rights#enabling-access-control)).
Access configurations defined in ClickHouse server configuration files (e.g. `users.xml`)
-are not included in backups and cannot be restored through this method.
+aren't included in backups and can't be restored through this method.
## General syntax {#syntax}
diff --git a/docs/operations_/backup_restore/01_local_disk.md b/docs/operations_/backup_restore/01_local_disk.md
index 8cde1f6ad60..7c1cc34d6eb 100644
--- a/docs/operations_/backup_restore/01_local_disk.md
+++ b/docs/operations_/backup_restore/01_local_disk.md
@@ -94,7 +94,7 @@ RESTORE TABLE data AS data_restored FROM Disk('s3_plain', 'cloud_backup');
```
:::note
-- This disk should not be used for `MergeTree` itself, only for `BACKUP`/`RESTORE`
+- This disk shouldn't be used for `MergeTree` itself, only for `BACKUP`/`RESTORE`
- If your tables are backed by S3 storage and the types of the disks are different,
it doesn't use `CopyObject` calls to copy parts to the destination bucket, instead,
it downloads and uploads them, which is very inefficient. In this case prefer using
@@ -223,7 +223,7 @@ SETTINGS password='qwerty'
### Backups as tar archives {#backups-as-tar-archives}
Backups can be stored not only as zip archives, but also as tar archives.
-The functionality is the same as for zip, except that password protection is not
+The functionality is the same as for zip, except that password protection isn't
supported for tar archives. Additionally, tar archives support a variety of
compression methods.
diff --git a/docs/operations_/backup_restore/02_s3_endpoint.md b/docs/operations_/backup_restore/02_s3_endpoint.md
index 5c07dd6f390..fda43479933 100644
--- a/docs/operations_/backup_restore/02_s3_endpoint.md
+++ b/docs/operations_/backup_restore/02_s3_endpoint.md
@@ -107,7 +107,7 @@ LIMIT 100
#### Take an incremental backup {#take-an-incremental-backup}
-This backup command is similar to the base backup, but adds `SETTINGS base_backup` and the location of the base backup. Note that the destination for the incremental backup is not the same directory as the base, it is the same endpoint with a different target directory within the bucket. The base backup is in `my_backup`, and the incremental will be written to `my_incremental`:
+This backup command is similar to the base backup, but adds `SETTINGS base_backup` and the location of the base backup. Note that the destination for the incremental backup isn't the same directory as the base; it's the same endpoint with a different target directory within the bucket. The base backup is in `my_backup`, and the incremental will be written to `my_incremental`:
```sql
BACKUP TABLE test_db.test_table TO S3(
diff --git a/docs/operations_/backup_restore/04_alternative_methods.md b/docs/operations_/backup_restore/04_alternative_methods.md
index 3175c4fde61..8dedb188dc1 100644
--- a/docs/operations_/backup_restore/04_alternative_methods.md
+++ b/docs/operations_/backup_restore/04_alternative_methods.md
@@ -38,9 +38,9 @@ might work as well.
ClickHouse allows using the `ALTER TABLE ... FREEZE PARTITION ...` query to create
a local copy of table partitions. This is implemented using hardlinks to the `/var/lib/clickhouse/shadow/`
-folder, so it usually does not consume extra disk space for old data. The created
-copies of files are not handled by ClickHouse server, so you can just leave them there:
-you will have a simple backup that does not require any additional external system,
+folder, so it usually doesn't consume extra disk space for old data. The created
+copies of files aren't handled by ClickHouse server, so you can just leave them there:
+you will have a simple backup that doesn't require any additional external system,
but it will still be prone to hardware issues. For this reason, it's better to
remotely copy them to another location and then remove the local copies.
Distributed filesystems and object stores are still good options for this,
diff --git a/docs/tips-and-tricks/cost-optimization.md b/docs/tips-and-tricks/cost-optimization.md
index d2c1717c20b..781274729e0 100644
--- a/docs/tips-and-tricks/cost-optimization.md
+++ b/docs/tips-and-tricks/cost-optimization.md
@@ -83,7 +83,7 @@ This architecture preserves the user experience - people still see meaningful la
- Lower network transfer costs for large result sets
:::note
-This is a an example specifically used for Microsoft Clarity's data scenario. If you have all your data in ClickHouse or do not have constraints against moving data to ClickHouse, try using [dictionaries](/dictionary) instead.
+This is an example specifically used for Microsoft Clarity's data scenario. If you have all your data in ClickHouse or don't have constraints against moving data to ClickHouse, try using [dictionaries](/dictionary) instead.
:::
## Video sources {#video-sources}
diff --git a/docs/tips-and-tricks/debugging-insights.md b/docs/tips-and-tricks/debugging-insights.md
index 727ff17b2ac..39a42eda295 100644
--- a/docs/tips-and-tricks/debugging-insights.md
+++ b/docs/tips-and-tricks/debugging-insights.md
@@ -92,7 +92,7 @@ If you're using AWS, you should be aware that default general purpose EBS volume
### Too many parts error {#too-many-parts-error}
-Small frequent inserts create performance problems. The community has identified that insert rates above 10 per second often trigger "too many parts" errors because ClickHouse cannot merge parts fast enough.
+Small frequent inserts create performance problems. The community has identified that insert rates above 10 per second often trigger "too many parts" errors because ClickHouse can't merge parts fast enough.
**Solutions:**
- Batch data using 30-second or 200MB thresholds
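
The batching recommendation above can be implemented client-side with a small buffer that flushes on whichever threshold is hit first. A sketch under assumptions: the `flush` callback (standing in for one bulk `INSERT`) is hypothetical, and the defaults mirror the 30-second / 200MB guidance:

```python
import time

class InsertBatcher:
    """Buffer rows and flush when the batch is older than max_age_s
    or larger than max_bytes, turning many small inserts into one."""

    def __init__(self, flush, max_age_s=30.0, max_bytes=200 * 1024 * 1024):
        self.flush = flush          # called with the list of buffered rows
        self.max_age_s = max_age_s
        self.max_bytes = max_bytes
        self.rows = []
        self.size = 0
        self.started = None

    def add(self, row: bytes) -> None:
        if self.started is None:
            self.started = time.monotonic()
        self.rows.append(row)
        self.size += len(row)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.started >= self.max_age_s):
            self._flush()

    def _flush(self) -> None:
        if self.rows:
            self.flush(self.rows)   # one bulk insert instead of many small ones
        self.rows, self.size, self.started = [], 0, None
```

A real deployment would also flush on shutdown and on a background timer so an idle batch doesn't sit past the age threshold.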
diff --git a/docs/tips-and-tricks/success-stories.md b/docs/tips-and-tricks/success-stories.md
index 3e2befb7065..85499375662 100644
--- a/docs/tips-and-tricks/success-stories.md
+++ b/docs/tips-and-tricks/success-stories.md
@@ -30,7 +30,7 @@ These stories showcase how companies found success by using ClickHouse for their
## ClickHouse as rate limiter {#clickhouse-rate-limiter}
-When Craigslist needed to add tier-one rate limiting to protect their users, they faced the same decision every engineering team encounters - follow conventional wisdom and use Redis, or explore something different. Brad Lhotsky, working at Craigslist, knew Redis was the standard choice - virtually every rate limiting tutorial and example online uses Redis for good reason. It has rich primitives for rate limiting operations, well-established patterns, and proven track record. But Craigslist's experience with Redis wasn't matching the textbook examples. *"Our experience with Redis is not like what you've seen in the movies... there are a lot of weird maintenance issues that we've hit where we reboot a node in a Redis cluster and some latency spike hits the front end."* For a small team that values maintenance simplicity, these operational headaches were becoming a real problem.
+When Craigslist needed to add tier-one rate limiting to protect their users, they faced the same decision every engineering team encounters - follow conventional wisdom and use Redis, or explore something different. Brad Lhotsky, working at Craigslist, knew Redis was the standard choice - virtually every rate limiting tutorial and example online uses Redis for good reason. It has rich primitives for rate limiting operations, well-established patterns, and proven track record. But Craigslist's experience with Redis wasn't matching the textbook examples. *"Our experience with Redis isn't like what you've seen in the movies... there are a lot of weird maintenance issues that we've hit where we reboot a node in a Redis cluster and some latency spike hits the front end."* For a small team that values maintenance simplicity, these operational headaches were becoming a real problem.
So when Brad was approached with the rate limiting requirements, he took a different approach: *"I asked my boss, 'What do you think of this idea? Maybe I can try this with ClickHouse?'"* The idea was unconventional - using an analytical database for what's typically a caching layer problem - but it addressed their core requirements: fail open, impose no latency penalties, and be maintenance-safe for a small team. The solution leveraged their existing infrastructure where access logs were already flowing into ClickHouse via Kafka. Instead of maintaining a separate Redis cluster, they could analyze request patterns directly from the access log data and inject rate limiting rules into their existing ACL API. The approach meant slightly higher latency than Redis, which *"is kind of cheating by instantiating that data set upfront"* rather than doing real-time aggregate queries, but the queries still completed in under 100 milliseconds.
diff --git a/docs/tools-and-utilities/static-files-disk-uploader.md b/docs/tools-and-utilities/static-files-disk-uploader.md
index b842be2ff83..226d27fc482 100644
--- a/docs/tools-and-utilities/static-files-disk-uploader.md
+++ b/docs/tools-and-utilities/static-files-disk-uploader.md
@@ -10,7 +10,7 @@ doc_type: 'guide'
Outputs a data directory containing metadata for a specified ClickHouse table. This metadata can be used to create a ClickHouse table on a different server containing a read-only dataset backed by a `web` disk.
-Do not use this tool to migrate data. Instead, use the [`BACKUP` and `RESTORE` commands](/operations/backup/overview).
+Don't use this tool to migrate data. Instead, use the [`BACKUP` and `RESTORE` commands](/operations/backup/overview).
## Usage {#usage}
diff --git a/docs/tutorial.md b/docs/tutorial.md
index e6ca2551e3a..e088ecd7608 100644
--- a/docs/tutorial.md
+++ b/docs/tutorial.md
@@ -427,7 +427,7 @@ Here's an excerpt from the CSV file you're using in table format. The `LocationI
SELECT dictHas('taxi_zone_dictionary', 132)
```
-6. The following query returns 0 because 4567 is not a value of `LocationID` in the dictionary:
+6. The following query returns 0 because 4567 isn't a value of `LocationID` in the dictionary:
```sql
SELECT dictHas('taxi_zone_dictionary', 4567)
```
@@ -489,7 +489,7 @@ Write some queries that join the `taxi_zone_dictionary` with your `trips` table.
```
:::note
- Notice the output of the above `JOIN` query is the same as the query before it that used `dictGetOrDefault` (except that the `Unknown` values are not included). Behind the scenes, ClickHouse is actually calling the `dictGet` function for the `taxi_zone_dictionary` dictionary, but the `JOIN` syntax is more familiar for SQL developers.
+ Notice the output of the above `JOIN` query is the same as the query before it that used `dictGetOrDefault` (except that the `Unknown` values aren't included). Behind the scenes, ClickHouse is actually calling the `dictGet` function for the `taxi_zone_dictionary` dictionary, but the `JOIN` syntax is more familiar for SQL developers.
:::
2. This query returns rows for the 1000 trips with the highest tip amount, then performs an inner join of each row with the dictionary:
diff --git a/docs/use-cases/AI_ML/MCP/03_librechat.md b/docs/use-cases/AI_ML/MCP/03_librechat.md
index d691196ce8f..288bac0f8df 100644
--- a/docs/use-cases/AI_ML/MCP/03_librechat.md
+++ b/docs/use-cases/AI_ML/MCP/03_librechat.md
@@ -171,7 +171,7 @@ Once installed, you can run a model like this:
ollama run qwen3:32b
```
-This will pull the model to your local machine if it is not present.
+This will pull the model to your local machine if it isn't present.
For a list of models see the [Ollama library](https://ollama.com/library)
@@ -229,11 +229,11 @@ What datasets do you have access to?
:::note
-If the MCP server option does not appear in the LibreChat UI,
+If the MCP server option doesn't appear in the LibreChat UI,
check that the proper permissions are set in your `librechat.yaml` file.
:::
-If `use` is set to `false` in the `mcpServers` section under `interface`, the MCP selection dropdown will not appear in chat:
+If `use` is set to `false` in the `mcpServers` section under `interface`, the MCP selection dropdown won't appear in chat:
```yml title="librechat.yaml"
interface:
diff --git a/docs/use-cases/AI_ML/MCP/06_ollama.md b/docs/use-cases/AI_ML/MCP/06_ollama.md
index 8b286158b11..1670dcc5de7 100644
--- a/docs/use-cases/AI_ML/MCP/06_ollama.md
+++ b/docs/use-cases/AI_ML/MCP/06_ollama.md
@@ -36,7 +36,7 @@ Once installed, you can pull a model down to your machine like this:
ollama pull qwen3:8b
```
-This will pull the model to your local machine if it is not present.
+This will pull the model to your local machine if it isn't present.
Once it's downloaded, you can run the model like this:
```bash
diff --git a/docs/use-cases/AI_ML/data-exploration/jupyter-notebook.md b/docs/use-cases/AI_ML/data-exploration/jupyter-notebook.md
index 4aa3dbb8540..f7100589ca8 100644
--- a/docs/use-cases/AI_ML/data-exploration/jupyter-notebook.md
+++ b/docs/use-cases/AI_ML/data-exploration/jupyter-notebook.md
@@ -113,7 +113,7 @@ print(result)
With the UK price paid data set up and chDB up and running in a Jupyter notebook, we can now get started exploring our data.
-Let's imagine we are interested in checking how price has changed with time for a specific area in the UK such as the capital city, London.
+Let's imagine we're interested in checking how price has changed with time for a specific area in the UK such as the capital city, London.
ClickHouse's [`remoteSecure`](/sql-reference/table-functions/remote) function allows you to easily retrieve the data from ClickHouse Cloud.
You can instruct chDB to return this data in process as a Pandas data frame - which is a convenient and familiar way of working with data.
@@ -153,7 +153,7 @@ df.head()
```
In the snippet above, `chdb.query(query, "DataFrame")` runs the specified query and outputs the result to the terminal as a Pandas DataFrame.
-In the query we are using the `remoteSecure` function to connect to ClickHouse Cloud.
+In the query we're using the `remoteSecure` function to connect to ClickHouse Cloud.
The `remoteSecure` function takes as parameters:
- a connection string
- the name of the database and table to use
@@ -265,7 +265,7 @@ ORDER BY year;
-Although we are missing data from 2020 onwards, we can plot the two datasets against each other for the years 1995 to 2019.
+Although we're missing data from 2020 onwards, we can plot the two datasets against each other for the years 1995 to 2019.
In a new cell run the following command:
```python
diff --git a/docs/use-cases/AI_ML/data-exploration/marimo-notebook.md b/docs/use-cases/AI_ML/data-exploration/marimo-notebook.md
index 2b490a817b1..b9251d42587 100644
--- a/docs/use-cases/AI_ML/data-exploration/marimo-notebook.md
+++ b/docs/use-cases/AI_ML/data-exploration/marimo-notebook.md
@@ -125,7 +125,7 @@ You should see the result shown underneath the cell you just ran:
## Exploring the data {#exploring-the-data}
With the UK price paid data set up and chDB up and running in a Marimo notebook, we can now get started exploring our data.
-Let's imagine we are interested in checking how price has changed with time for a specific area in the UK such as the capital city, London.
+Let's imagine we're interested in checking how price has changed with time for a specific area in the UK such as the capital city, London.
ClickHouse's [`remoteSecure`](/docs/sql-reference/table-functions/remote) function allows you to easily retrieve the data from ClickHouse Cloud.
You can instruct chDB to return this data in process as a Pandas data frame - which is a convenient and familiar way of working with data.
@@ -156,7 +156,7 @@ df.head()
In the snippet above, `chdb.query(query, "DataFrame")` runs the specified query and outputs the result as a Pandas DataFrame.
-In the query we are using the [`remoteSecure`](/sql-reference/table-functions/remote) function to connect to ClickHouse Cloud.
+In the query we're using the [`remoteSecure`](/sql-reference/table-functions/remote) function to connect to ClickHouse Cloud.
The `remoteSecure` function takes as parameters:
- a connection string
diff --git a/docs/use-cases/data_lake/glue_catalog.md b/docs/use-cases/data_lake/glue_catalog.md
index 0324fc8548b..2a1fa5cfa4d 100644
--- a/docs/use-cases/data_lake/glue_catalog.md
+++ b/docs/use-cases/data_lake/glue_catalog.md
@@ -67,7 +67,7 @@ SHOW TABLES;
└────────────────────────────────────────┘
```
-You can see above that some tables above are not Iceberg tables, for instance
+You can see that some of the tables above aren't Iceberg tables, for instance
`iceberg-benchmark.hitsparquet`. You won't be able to query these as only Iceberg
is currently supported.
diff --git a/docs/use-cases/data_lake/onelake_catalog.md b/docs/use-cases/data_lake/onelake_catalog.md
index ce70f351e50..5196342b441 100644
--- a/docs/use-cases/data_lake/onelake_catalog.md
+++ b/docs/use-cases/data_lake/onelake_catalog.md
@@ -34,7 +34,7 @@ Before querying your table in Microsoft Fabric, you'll need to collect the follo
Your warehouse ID is your Workspace ID.
-For Data Item ID - we recommend using your Lakehouse ID. In our testing, it does not work with a Warehouse ID.
+For Data Item ID - we recommend using your Lakehouse ID. In our testing, it doesn't work with a Warehouse ID.
See [Microsoft OneLake's documentation](http://learn.microsoft.com/en-us/fabric/onelake/table-apis/table-apis-overview#prerequisites) for help finding these values.
diff --git a/docs/use-cases/observability/build-your-own/grafana.md b/docs/use-cases/observability/build-your-own/grafana.md
index a457a16919a..50adb5d7b02 100644
--- a/docs/use-cases/observability/build-your-own/grafana.md
+++ b/docs/use-cases/observability/build-your-own/grafana.md
@@ -44,7 +44,7 @@ Once configured you can navigate to [Grafana Explore](https://grafana.com/docs/g
## Logs {#logs}
-If adhering to the Grafana requirements for logs, you can select `Query Type: Log` in the query builder and click `Run Query`. The query builder will formulate a query to list the logs and ensure they are rendered e.g.
+If adhering to the Grafana requirements for logs, you can select `Query Type: Log` in the query builder and click `Run Query`. The query builder will formulate a query to list the logs and ensure they're rendered e.g.
```sql
SELECT Timestamp as timestamp, Body as body, SeverityText as level, TraceId as traceID FROM "default"."otel_logs" WHERE ( timestamp >= $__fromTime AND timestamp <= $__toTime ) ORDER BY timestamp DESC LIMIT 1000
diff --git a/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md b/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md
index 65475a5a84f..e802de32717 100644
--- a/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md
+++ b/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md
@@ -22,14 +22,14 @@ Any Observability solution requires a means of collecting and exporting logs and
"OpenTelemetry is an Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs."
-Unlike ClickHouse or Prometheus, OpenTelemetry is not an observability backend and rather focuses on the generation, collection, management, and export of telemetry data. While the initial goal of OpenTelemetry was to allow you to instrument your applications or systems using language-specific SDKs easily, it has expanded to include the collection of logs through the OpenTelemetry collector - an agent or proxy that receives, processes, and exports telemetry data.
+Unlike ClickHouse or Prometheus, OpenTelemetry isn't an observability backend but rather focuses on the generation, collection, management, and export of telemetry data. While the initial goal of OpenTelemetry was to allow you to instrument your applications or systems using language-specific SDKs easily, it has expanded to include the collection of logs through the OpenTelemetry collector - an agent or proxy that receives, processes, and exports telemetry data.
## ClickHouse relevant components {#clickhouse-relevant-components}
OpenTelemetry consists of a number of components. As well as providing a data and API specification, standardized protocol, and naming conventions for fields/columns, OTel provides two capabilities which are fundamental to building an Observability solution with ClickHouse:
- The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) is a proxy that receives, processes, and exports telemetry data. A ClickHouse-powered solution uses this component for both log collection and event processing prior to batching and inserting.
-- [Language SDKs](https://opentelemetry.io/docs/languages/) that implement the specification, APIs, and export of telemetry data. These SDKs effectively ensure traces are correctly recorded within an application's code, generating constituent spans and ensuring context is propagated across services through metadata - thus formulating distributed traces and ensuring spans can be correlated. These SDKs are complemented by an ecosystem that automatically implements common libraries and frameworks, thus meaning the user is not required to change their code and obtains out-of-the-box instrumentation.
+- [Language SDKs](https://opentelemetry.io/docs/languages/) that implement the specification, APIs, and export of telemetry data. These SDKs effectively ensure traces are correctly recorded within an application's code, generating constituent spans and ensuring context is propagated across services through metadata - thus formulating distributed traces and ensuring spans can be correlated. These SDKs are complemented by an ecosystem that automatically implements common libraries and frameworks, thus meaning the user isn't required to change their code and obtains out-of-the-box instrumentation.
A ClickHouse-powered Observability solution exploits both of these tools.
@@ -53,7 +53,7 @@ In order to collect logs and insert them into ClickHouse, we recommend using the
- **Agent** - Agent instances collect data at the edge e.g. on servers or on Kubernetes nodes, or receive events directly from applications - instrumented with an OpenTelemetry SDK. In the latter case, the agent instance runs with the application or on the same host as the application (such as a sidecar or a DaemonSet). Agents can either send their data directly to ClickHouse or to a gateway instance. In the former case, this is referred to as [Agent deployment pattern](https://opentelemetry.io/docs/collector/deployment/agent/).
- **Gateway** - Gateway instances provide a standalone service (for example, a deployment in Kubernetes), typically per cluster, per data center, or per region. These receive events from applications (or other collectors as agents) via a single OTLP endpoint. Typically, a set of gateway instances are deployed, with an out-of-the-box load balancer used to distribute the load amongst them. If all agents and applications send their signals to this single endpoint, it is often referred to as a [Gateway deployment pattern](https://opentelemetry.io/docs/collector/deployment/gateway/).
-Below we assume a simple agent collector, sending its events directly to ClickHouse. See [Scaling with Gateways](#scaling-with-gateways) for further details on using gateways and when they are applicable.
+Below we assume a simple agent collector, sending its events directly to ClickHouse. See [Scaling with Gateways](#scaling-with-gateways) for further details on using gateways and when they're applicable.
### Collecting logs {#collecting-logs}
@@ -283,7 +283,7 @@ As demonstrated in the earlier example of setting the timestamp for a log event,
- A [memory_limiter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md) is used to prevent out of memory situations on the collector. See [Estimating Resources](#estimating-resources) for recommendations.
- Any processor that does enrichment based on context. For example, the [Kubernetes Attributes Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor) allows the automatic setting of spans, metrics, and logs resource attributes with k8s metadata e.g. enriching events with their source pod id.
- [Tail or head sampling](https://opentelemetry.io/docs/concepts/sampling/) if required for traces.
- - [Basic filtering](https://opentelemetry.io/docs/collector/transforming-telemetry/) - Dropping events that are not required if this cannot be done via operator (see below).
+ - [Basic filtering](https://opentelemetry.io/docs/collector/transforming-telemetry/) - Dropping events that aren't required if this can't be done via operator (see below).
- [Batching](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor) - essential when working with ClickHouse to ensure data is sent in batches. See ["Exporting to ClickHouse"](#exporting-to-clickhouse).
- **Operators** - [Operators](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/README.md) provide the most basic unit of processing available at the receiver. Basic parsing is supported, allowing fields such as the Severity and Timestamp to be set. JSON and regex parsing are supported here along with event filtering and basic transformations. We recommend performing event filtering here.
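As an illustration, a filelog receiver might combine operators for parsing and early filtering like this (a minimal sketch; the file path, field names, timestamp layout, and filter expression are assumptions to adapt to your log format):

```yaml
receivers:
  filelog:
    include:
      - /var/log/app/*.log
    operators:
      # Parse each JSON-formatted log line and set standard fields
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        severity:
          parse_from: attributes.level
      # Drop debug-level events at the receiver, before any further processing
      - type: filter
        expr: 'attributes.level == "debug"'
```

Filtering here, rather than in a processor, discards unwanted events at the earliest possible point in the pipeline.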
@@ -399,14 +399,14 @@ Note the following key settings:
- **pipelines** - The above configuration highlights the use of [pipelines](https://opentelemetry.io/docs/collector/configuration/#pipelines), consisting of a set of receivers, processors and exporters with one for logs and traces.
- **endpoint** - Communication with ClickHouse is configured via the `endpoint` parameter. The connection string `tcp://localhost:9000?dial_timeout=10s&compress=lz4&async_insert=1` causes communication to occur over TCP. If you prefer HTTP for traffic-switching reasons, modify this connection string as described [here](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/clickhouseexporter/README.md#configuration-options). Full connection details, with the ability to specify a username and password within this connection string, are described [here](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/clickhouseexporter/README.md#configuration-options).
-**Important:** Note the above connection string enables both compression (lz4) as well as asynchronous inserts. We recommend both are always enabled. See [Batching](#batching) for further details on asynchronous inserts. Compression should always be specified and will not by default be enabled by default on older versions of the exporter.
+**Important:** Note the above connection string enables both compression (lz4) and asynchronous inserts. We recommend both are always enabled. See [Batching](#batching) for further details on asynchronous inserts. Compression should always be specified explicitly, as it isn't enabled by default on older versions of the exporter.
- **ttl** - the value here determines how long data is retained. Further details in "Managing data". This should be specified as a time unit in hours e.g. 72h. We disable TTL in the example below since our data is from 2019 and will be removed by ClickHouse immediately if inserted.
- **traces_table_name** and **logs_table_name** - determines the name of the logs and traces table.
- **create_schema** - determines if tables are created with the default schemas on startup. Defaults to true for getting started. You should set it to false and define your own schema.
- **database** - target database.
- **retry_on_failure** - settings to determine whether failed batches should be retried.
-- **batch** - a batch processor ensures events are sent as batches. We recommend a value of around 5000 with a timeout of 5s. Whichever of these is reached first will initiate a batch to be flushed to the exporter. Lowering these values will mean a lower latency pipeline with data available for querying sooner, at the expense of more connections and batches sent to ClickHouse. This is not recommended if you are not using [asynchronous inserts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) as it may cause issues with [too many parts](https://clickhouse.com/blog/common-getting-started-issues-with-clickhouse#1-too-many-parts) in ClickHouse. Conversely, if you are using asynchronous inserts these availability data for querying will also be dependent on asynchronous insert settings - although data will still be flushed from the connector sooner. See [Batching](#batching) for more details.
+- **batch** - a batch processor ensures events are sent as batches. We recommend a value of around 5000 with a timeout of 5s. Whichever of these is reached first will initiate a batch to be flushed to the exporter. Lowering these values will mean a lower latency pipeline with data available for querying sooner, at the expense of more connections and batches sent to ClickHouse. This isn't recommended if you're not using [asynchronous inserts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) as it may cause issues with [too many parts](https://clickhouse.com/blog/common-getting-started-issues-with-clickhouse#1-too-many-parts) in ClickHouse. Conversely, if you're using asynchronous inserts, the availability of this data for querying will also depend on the asynchronous insert settings - although data will still be flushed from the connector sooner. See [Batching](#batching) for more details.
- **sending_queue** - controls the size of the sending queue. Each item in the queue contains a batch. If this queue is exceeded e.g. due to ClickHouse being unreachable but events continue to arrive, batches will be dropped.
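Putting the settings above together, a minimal exporter and pipeline configuration might look like the following (a sketch, not a complete production configuration; the endpoint, TTL, queue size, and table names are assumptions to adapt to your deployment):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Flush whichever comes first: 5000 events or 5 seconds
  batch:
    timeout: 5s
    send_batch_size: 5000

exporters:
  clickhouse:
    # lz4 compression and asynchronous inserts enabled, as recommended above
    endpoint: tcp://localhost:9000?dial_timeout=10s&compress=lz4&async_insert=1
    database: default
    ttl: 72h
    create_schema: false
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      queue_size: 100

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouse]
```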
Assuming you have extracted the structured log file and have a [local instance of ClickHouse](/install) running (with default authentication), you can run this configuration via the command:
@@ -533,7 +533,7 @@ A few important notes on this schema:
- By default, the table is partitioned by date via `PARTITION BY toDate(Timestamp)`. This makes it efficient to drop data that expires.
- The TTL is set via `TTL toDateTime(Timestamp) + toIntervalDay(3)` and corresponds to the value set in the collector configuration. [`ttl_only_drop_parts=1`](/operations/settings/merge-tree-settings#ttl_only_drop_parts) means only whole parts are dropped when all the contained rows have expired. This is more efficient than dropping rows within parts, which incurs an expensive delete. We recommend this always be set. See [Data management with TTL](/observability/managing-data#data-management-with-ttl-time-to-live) for more details.
-- The table uses the classic [`MergeTree` engine](/engines/table-engines/mergetree-family/mergetree). This is recommended for logs and traces and should not need to be changed.
+- The table uses the classic [`MergeTree` engine](/engines/table-engines/mergetree-family/mergetree). This is recommended for logs and traces and shouldn't need to be changed.
- The table is ordered by `ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)`. This means queries will be optimized for filters on `ServiceName`, `SeverityText`, `Timestamp` and `TraceId` - earlier columns in the list will filter faster than later ones e.g. filtering by `ServiceName` will be significantly faster than filtering by `TraceId`. You should modify this ordering according to your expected access patterns - see [Choosing a primary key](/use-cases/observability/schema-design#choosing-a-primary-ordering-key).
- The above schema applies `ZSTD(1)` to columns. This offers the best compression for logs. You can increase the ZSTD compression level (above the default of 1) for better compression, although this is rarely beneficial. Increasing this value will incur greater CPU overhead at insert time (during compression), although decompression (and thus queries) should remain comparable. See [here](https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema) for further details. Additional [delta encoding](/sql-reference/statements/create/table#delta) is applied to the Timestamp with the aim of reducing its size on disk.
- Note how [`ResourceAttributes`](https://opentelemetry.io/docs/specs/otel/resource/sdk/), [`LogAttributes`](https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-attributes) and [`ScopeAttributes`](https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-instrumentationscope) are maps. It's important to understand the differences between them. See ["Using maps"](/use-cases/observability/schema-design#using-maps) for how to access these maps and optimize accessing keys within them.
@@ -604,21 +604,21 @@ We recommend users use the [batch processor](https://github.com/open-telemetry/o
Typically, users are forced to send smaller batches when the throughput of a collector is low, and yet they still expect data to reach ClickHouse within a minimum end-to-end latency. In this case, small batches are sent when the `timeout` of the batch processor expires. This can cause problems and is when asynchronous inserts are required. This case typically arises when **collectors in the agent role are configured to send directly to ClickHouse**. Gateways, by acting as aggregators, can alleviate this problem - see [Scaling with Gateways](#scaling-with-gateways).
-If large batches cannot be guaranteed, you can delegate batching to ClickHouse using [Asynchronous Inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts). With asynchronous inserts, data is inserted into a buffer first and then written to the database storage later or asynchronously respectively.
+If large batches can't be guaranteed, you can delegate batching to ClickHouse using [Asynchronous Inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts). With asynchronous inserts, data is inserted into a buffer first and then written to the database storage asynchronously at a later point.
-With [enabled asynchronous inserts](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), when ClickHouse ① receives an insert query, the query's data is ② immediately written into an in-memory buffer first. When ③ the next buffer flush takes place, the buffer's data is [sorted](/guides/best-practices/sparse-primary-indexes#data-is-stored-on-disk-ordered-by-primary-key-columns) and written as a part to the database storage. Note, that the data is not searchable by queries before being flushed to the database storage; the buffer flush is [configurable](/optimize/asynchronous-inserts).
+With [enabled asynchronous inserts](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), when ClickHouse ① receives an insert query, the query's data is ② immediately written into an in-memory buffer first. When ③ the next buffer flush takes place, the buffer's data is [sorted](/guides/best-practices/sparse-primary-indexes#data-is-stored-on-disk-ordered-by-primary-key-columns) and written as a part to the database storage. Note that the data isn't searchable by queries before being flushed to the database storage; the buffer flush is [configurable](/optimize/asynchronous-inserts).
To enable asynchronous inserts for the collector, add `async_insert=1` to the connection string. We recommend users use `wait_for_async_insert=1` (the default) to get delivery guarantees - see [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) for further details.
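For example, extending the earlier connection string with these two settings might look like this (a sketch; the host and port are placeholders):

```yaml
exporters:
  clickhouse:
    # async_insert delegates batching to ClickHouse; wait_for_async_insert=1
    # makes the insert acknowledge only after the buffer is flushed
    endpoint: tcp://localhost:9000?dial_timeout=10s&compress=lz4&async_insert=1&wait_for_async_insert=1
```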
Data from an async insert is inserted once the ClickHouse buffer is flushed. This occurs either after the [`async_insert_max_data_size`](/operations/settings/settings#async_insert_max_data_size) is exceeded or after [`async_insert_busy_timeout_ms`](/operations/settings/settings#async_insert_busy_timeout_ms) milliseconds since the first INSERT query. If `async_insert_stale_timeout_ms` is set to a non-zero value, the data is inserted `async_insert_stale_timeout_ms` milliseconds after the last query. You can tune these settings to control the end-to-end latency of your pipeline. Further settings which can be used to tune buffer flushing are documented [here](/operations/settings/settings#async_insert). Generally, defaults are appropriate.
:::note Consider Adaptive Asynchronous Inserts
-In cases where a low number of agents are in use, with low throughput but strict end-to-end latency requirements, [adaptive asynchronous inserts](https://clickhouse.com/blog/clickhouse-release-24-02#adaptive-asynchronous-inserts) may be useful. Generally, these are not applicable to high throughput Observability use cases, as seen with ClickHouse.
+In cases where a low number of agents are in use, with low throughput but strict end-to-end latency requirements, [adaptive asynchronous inserts](https://clickhouse.com/blog/clickhouse-release-24-02#adaptive-asynchronous-inserts) may be useful. Generally, these aren't applicable to high throughput Observability use cases, as seen with ClickHouse.
:::
-Finally, the previous deduplication behavior associated with synchronous inserts into ClickHouse is not enabled by default when using asynchronous inserts. If required, see the setting [`async_insert_deduplicate`](/operations/settings/settings#async_insert_deduplicate).
+Finally, the previous deduplication behavior associated with synchronous inserts into ClickHouse isn't enabled by default when using asynchronous inserts. If required, see the setting [`async_insert_deduplicate`](/operations/settings/settings#async_insert_deduplicate).
Full details on configuring this feature can be found [here](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), with a deep dive [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse).
@@ -632,7 +632,7 @@ In an agent only architecture, users deploy the OTel collector as agents to the
-This architecture is appropriate for small to medium-sized deployments. Its principal advantage is it does not require additional hardware and keeps the total resource footprint of the ClickHouse observability solution minimal, with a simple mapping between applications and collectors.
+This architecture is appropriate for small to medium-sized deployments. Its principal advantage is it doesn't require additional hardware and keeps the total resource footprint of the ClickHouse observability solution minimal, with a simple mapping between applications and collectors.
You should consider migrating to a Gateway-based architecture once the number of agents exceeds several hundred. This architecture has several disadvantages which make it challenging to scale:
@@ -730,11 +730,11 @@ For an example of managing larger gateway-based architectures with associated le
### Adding Kafka {#adding-kafka}
-Readers may notice the above architectures do not use Kafka as a message queue.
+Readers may notice the above architectures don't use Kafka as a message queue.
Using a Kafka queue as a message buffer is a popular design pattern seen in logging architectures and was popularized by the ELK stack. It provides a few benefits; principally, it helps provide stronger message delivery guarantees and helps deal with backpressure. Messages are sent from collection agents to Kafka and written to disk. In theory, a clustered Kafka instance should provide a high throughput message buffer since it incurs less computational overhead to write data linearly to disk than parse and process a message – in Elastic, for example, the tokenization and indexing incurs significant overhead. By moving data away from the agents, you also incur less risk of losing messages as a result of log rotation at the source. Finally, it offers some message reply and cross-region replication capabilities, which might be attractive for some use cases.
-However, ClickHouse can handle inserting data very quickly - millions of rows per second on moderate hardware. Back pressure from ClickHouse is **rare**. Often, leveraging a Kafka queue means more architectural complexity and cost. If you can embrace the principle that logs do not need the same delivery guarantees as bank transactions and other mission-critical data, we recommend avoiding the complexity of Kafka.
+However, ClickHouse can handle inserting data very quickly - millions of rows per second on moderate hardware. Back pressure from ClickHouse is **rare**. Often, leveraging a Kafka queue means more architectural complexity and cost. If you can embrace the principle that logs don't need the same delivery guarantees as bank transactions and other mission-critical data, we recommend avoiding the complexity of Kafka.
However, if you require high delivery guarantees or the ability to replay data (potentially to multiple sources), Kafka can be a useful architectural addition.
diff --git a/docs/use-cases/observability/build-your-own/introduction.md b/docs/use-cases/observability/build-your-own/introduction.md
index e12b53f746a..ad3d46f7723 100644
--- a/docs/use-cases/observability/build-your-own/introduction.md
+++ b/docs/use-cases/observability/build-your-own/introduction.md
@@ -17,7 +17,7 @@ import Image from '@theme/IdealImage';
This guide is for you if you're looking to build your own SQL-based Observability solution using ClickHouse, focusing on logs and traces. It covers all aspects of building your own solution, including considerations for ingestion, optimizing schemas for your access patterns, and extracting structure from unstructured logs.
-ClickHouse alone is not an out-of-the-box solution for Observability. It can, however, be used as a highly efficient storage engine for Observability data, capable of unrivaled compression rates and lightning-fast query response times. In order for you to use ClickHouse within an Observability solution, both a user interface and data collection framework are required. We currently recommend using **Grafana** for visualization of Observability signals and **OpenTelemetry** for data collection (both are officially supported integrations).
+ClickHouse alone isn't an out-of-the-box solution for Observability. It can, however, be used as a highly efficient storage engine for Observability data, capable of unrivaled compression rates and lightning-fast query response times. In order for you to use ClickHouse within an Observability solution, both a user interface and data collection framework are required. We currently recommend using **Grafana** for visualization of Observability signals and **OpenTelemetry** for data collection (both are officially supported integrations).
@@ -53,16 +53,16 @@ SQL-based observability is for you if:
- You or your team(s) are familiar with SQL (or want to learn it)
- You prefer adhering to open standards like OpenTelemetry to avoid lock-in and achieve extensibility.
-- You are willing to run an ecosystem fueled by open-source innovation from collection to storage and visualization.
+- You're willing to run an ecosystem fueled by open-source innovation from collection to storage and visualization.
- You envision some growth to medium or large volumes of observability data under management (or even very large volumes)
- You want to be in control of the TCO (total cost of ownership) and avoid spiraling observability costs.
- You can't or don't want to get stuck with small data retention periods for your observability data just to manage the costs.
SQL-based observability may not be for you if:
-- Learning (or generating!) SQL is not appealing to you or your team(s).
-- You are looking for a packaged, end-to-end observability experience.
-- Your observability data volumes are too small to make any significant difference (e.g. <150 GiB) and are not forecasted to grow.
+- Learning (or generating!) SQL isn't appealing to you or your team(s).
+- You're looking for a packaged, end-to-end observability experience.
+- Your observability data volumes are too small to make any significant difference (e.g. <150 GiB) and aren't forecasted to grow.
- Your use case is metrics-heavy and needs PromQL. In that case, you can still use ClickHouse for logs and tracing beside Prometheus for metrics, unifying it at the presentation layer with Grafana.
- You prefer to wait for the ecosystem to mature more and SQL-based observability to get more turnkey.
diff --git a/docs/use-cases/observability/build-your-own/managing-data.md b/docs/use-cases/observability/build-your-own/managing-data.md
index 546c76fa581..68f289ba6a8 100644
--- a/docs/use-cases/observability/build-your-own/managing-data.md
+++ b/docs/use-cases/observability/build-your-own/managing-data.md
@@ -154,7 +154,7 @@ We explore both of these in detail below.
### Query performance {#query-performance}
-While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries which need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts). The benefit of targeting a single partition will be even less pronounced to non-existent if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the data, e.g., partitioning by day, with most queries in the last day. See [here](https://medium.com/datadenys/using-partitions-in-clickhouse-3ea0decb89c4) for an example of this behavior.
+While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key isn't in the primary key and you're filtering by it. However, queries which need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts). The benefit of targeting a single partition will be even less pronounced to non-existent if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the data, e.g., partitioning by day, with most queries in the last day. See [here](https://medium.com/datadenys/using-partitions-in-clickhouse-3ea0decb89c4) for an example of this behavior.
## Data management with TTL (Time-to-live) {#data-management-with-ttl-time-to-live}
@@ -185,14 +185,14 @@ SETTINGS ttl_only_drop_parts = 1
By default, data with an expired TTL is removed when ClickHouse [merges data parts](/engines/table-engines/mergetree-family/mergetree#mergetree-data-storage). When ClickHouse detects that data is expired, it performs an off-schedule merge.
:::note Scheduled TTLs
-TTLs are not applied immediately but rather on a schedule, as noted above. The MergeTree table setting `merge_with_ttl_timeout` sets the minimum delay in seconds before repeating a merge with delete TTL. The default value is 14400 seconds (4 hours). But that is just the minimum delay, it can take longer until a TTL merge is triggered. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources. A TTL expiration can be forced using the command `ALTER TABLE my_table MATERIALIZE TTL`.
+TTLs aren't applied immediately but rather on a schedule, as noted above. The MergeTree table setting `merge_with_ttl_timeout` sets the minimum delay in seconds before repeating a merge with delete TTL. The default value is 14400 seconds (4 hours). But that is just the minimum delay; it can take longer until a TTL merge is triggered. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources. A TTL expiration can be forced using the command `ALTER TABLE my_table MATERIALIZE TTL`.
:::
**Important: We recommend using the setting [`ttl_only_drop_parts=1`](/operations/settings/merge-tree-settings#ttl_only_drop_parts)** (applied by the default schema). When this setting is enabled, ClickHouse drops a whole part when all rows in it are expired. Dropping whole parts instead of partially cleaning TTL-expired rows (achieved through resource-intensive mutations when `ttl_only_drop_parts=0`) allows shorter `merge_with_ttl_timeout` times and lower impact on system performance. If data is partitioned by the same unit at which you perform TTL expiration, e.g. day, parts will naturally only contain data from the defined interval. This will ensure `ttl_only_drop_parts=1` can be efficiently applied.
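The recommendations above can be combined in a schema sketch like the following (the table and columns are illustrative, not the full default schema):

```sql
CREATE TABLE otel_logs_ttl_example
(
    Timestamp DateTime,
    ServiceName LowCardinality(String),
    Body String
)
ENGINE = MergeTree
-- Partitioning by day aligns parts with the TTL unit, so whole parts expire together
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, Timestamp)
TTL Timestamp + toIntervalDay(3)
-- Drop whole expired parts rather than mutating rows within parts
SETTINGS ttl_only_drop_parts = 1
```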
### Column level TTL {#column-level-ttl}
-The above example expires data at the table level. You can also expire data at the column level. As data ages, this can be used to drop columns whose value in investigations does not justify their resource overhead to retain. For example, we recommend retaining the `Body` column in case new dynamic metadata is added that has not been extracted at insert time, e.g., a new Kubernetes label. After a period e.g. 1 month, it might be obvious that this additional metadata is not useful - thus limiting the value in retaining the `Body` column.
+The above example expires data at the table level. You can also expire data at the column level. As data ages, this can be used to drop columns whose value in investigations doesn't justify their resource overhead to retain. For example, we recommend retaining the `Body` column in case new dynamic metadata is added that hasn't been extracted at insert time, e.g., a new Kubernetes label. After a period e.g. 1 month, it might be obvious that this additional metadata isn't useful - thus limiting the value in retaining the `Body` column.
Below, we show how the `Body` column can be dropped after 30 days.
@@ -208,7 +208,7 @@ ORDER BY (ServiceName, Timestamp)
```
:::note
-Specifying a column level TTL requires users to specify their own schema. This cannot be specified in the OTel collector.
+Specifying a column level TTL requires users to specify their own schema. This can't be specified in the OTel collector.
:::
## Recompressing data {#recompressing-data}
@@ -254,7 +254,7 @@ Further details and examples on configuring TTL can be found [here](/engines/tab
In ClickHouse, you may create storage tiers on different disks, e.g. hot/recent data on SSD and older data backed by S3. This architecture allows less expensive storage to be used for older data, which has higher query SLAs due to its infrequent use in investigations.
:::note Not relevant to ClickHouse Cloud
-ClickHouse Cloud uses a single copy of the data that is backed on S3, with SSD-backed node caches. Storage tiers in ClickHouse Cloud, therefore, are not required.
+ClickHouse Cloud uses a single copy of the data, backed by S3, with SSD-backed node caches. Storage tiers in ClickHouse Cloud, therefore, aren't required.
:::
The creation of storage tiers requires users to create disks, which are then used to formulate storage policies, with volumes that can be specified during table creation. Data can be automatically moved between disks based on fill rates, part sizes, and volume priorities. Further details can be found [here](/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-multiple-volumes).
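As a sketch, a hot/cold setup might be declared in the server configuration like this (the disk names, paths, bucket endpoint, and move threshold are placeholders, not a production configuration):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <hot_ssd>
                <path>/mnt/ssd/clickhouse/</path>
            </hot_ssd>
            <cold_s3>
                <type>s3</type>
                <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
                <!-- Credentials sourced from the environment; keys can also be set inline -->
                <use_environment_credentials>true</use_environment_credentials>
            </cold_s3>
        </disks>
        <policies>
            <hot_cold>
                <volumes>
                    <hot>
                        <disk>hot_ssd</disk>
                    </hot>
                    <cold>
                        <disk>cold_s3</disk>
                    </cold>
                </volumes>
                <!-- Start moving parts to the next volume when the hot disk is 80% full -->
                <move_factor>0.2</move_factor>
            </hot_cold>
        </volumes_placeholder_removed>
    </storage_configuration>
</clickhouse>
```

A table then opts into the policy at creation time, e.g. with `SETTINGS storage_policy = 'hot_cold'`.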
@@ -269,7 +269,7 @@ In order to avoid downtime during schema changes, users have several options, wh
### Use default values {#use-default-values}
-Columns can be added to the schema using [`DEFAULT` values](/sql-reference/statements/create/table#default). The specified default will be used if it is not specified during the INSERT.
+Columns can be added to the schema using [`DEFAULT` values](/sql-reference/statements/create/table#default). The specified default will be used if it isn't specified during the INSERT.
Schema changes can be made prior to modifying any materialized view transformation logic or OTel collector configuration, which causes these new columns to be sent.
@@ -326,7 +326,7 @@ ALTER TABLE otel_logs_v2
(ADD COLUMN `Size` UInt64 DEFAULT JSONExtractUInt(Body, 'size'))
```
-In the above example, we specify the default as the `size` key in `LogAttributes` (this will be 0 if it doesn't exist). This means queries that access this column for rows that do not have the value inserted must access the Map and will, therefore, be slower. We could easily also specify this as a constant, e.g. 0, reducing the cost of subsequent queries against rows that do not have the value. Querying this table shows the value is populated as expected from the Map:
+In the above example, we specify the default as the `size` key in `LogAttributes` (this will be 0 if it doesn't exist). This means queries that access this column for rows that don't have the value inserted must access the Map and will, therefore, be slower. We could easily also specify this as a constant, e.g. 0, reducing the cost of subsequent queries against rows that don't have the value. Querying this table shows the value is populated as expected from the Map:
```sql
SELECT Size
diff --git a/docs/use-cases/observability/build-your-own/schema-design.md b/docs/use-cases/observability/build-your-own/schema-design.md
index dd30b4fc4e0..3b358725c01 100644
--- a/docs/use-cases/observability/build-your-own/schema-design.md
+++ b/docs/use-cases/observability/build-your-own/schema-design.md
@@ -19,9 +19,9 @@ We recommend users always create their own schema for logs and traces for the fo
- **Choosing a primary key** - The default schemas use an `ORDER BY` which is optimized for specific access patterns. It is unlikely your access patterns will align with this.
- **Extracting structure** - You may wish to extract new columns from the existing columns e.g. the `Body` column. This can be done using materialized columns (and materialized views in more complex cases). This requires schema changes.
-- **Optimizing Maps** - The default schemas use the Map type for the storage of attributes. These columns allow the storage of arbitrary metadata. While an essential capability, as metadata from events is often not defined up front and therefore can't otherwise be stored in a strongly typed database like ClickHouse, access to the map keys and their values is not as efficient as access to a normal column. We address this by modifying the schema and ensuring the most commonly accessed map keys are top-level columns - see ["Extracting structure with SQL"](#extracting-structure-with-sql). This requires a schema change.
+- **Optimizing Maps** - The default schemas use the Map type for the storage of attributes. These columns allow the storage of arbitrary metadata. While an essential capability, as metadata from events is often not defined up front and therefore can't otherwise be stored in a strongly typed database like ClickHouse, access to the map keys and their values isn't as efficient as access to a normal column. We address this by modifying the schema and ensuring the most commonly accessed map keys are top-level columns - see ["Extracting structure with SQL"](#extracting-structure-with-sql). This requires a schema change.
- **Simplify map key access** - Accessing keys in maps requires a more verbose syntax. You can mitigate this with aliases. See ["Using Aliases"](#using-aliases) to simplify queries.
-- **Secondary indices** - The default schema uses secondary indices for speeding up access to Maps and accelerating text queries. These are typically not required and incur additional disk space. They can be used but should be tested to ensure they are required. See ["Secondary / Data Skipping indices"](#secondarydata-skipping-indices).
+- **Secondary indices** - The default schema uses secondary indices for speeding up access to Maps and accelerating text queries. These are typically not required and incur additional disk space. They can be used but should be tested to ensure they're required. See ["Secondary / Data Skipping indices"](#secondarydata-skipping-indices).
- **Using Codecs** - You may wish to customize codecs for columns if they understand the anticipated data and have evidence this improves compression.
_We describe each of the above use cases in detail below._
@@ -75,10 +75,10 @@ Peak memory usage: 153.71 MiB.
Note the use of the map syntax here e.g. `LogAttributes['request_path']`, and the [`path` function](/sql-reference/functions/url-functions#path) for stripping query parameters from the URL.
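As a minimal sketch (assuming the column names from the default OTel schema), a query combining the map syntax with `path` might look like:

```sql
-- Hypothetical example: top request paths, stripping query parameters
SELECT
    path(LogAttributes['request_path']) AS url_path,
    count() AS c
FROM otel_logs
GROUP BY url_path
ORDER BY c DESC
LIMIT 5
```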
-If the user has not enabled JSON parsing in the collector, then `LogAttributes` will be empty, forcing us to use [JSON functions](/sql-reference/functions/json-functions) to extract the columns from the String `Body`.
+If the user hasn't enabled JSON parsing in the collector, then `LogAttributes` will be empty, forcing us to use [JSON functions](/sql-reference/functions/json-functions) to extract the columns from the String `Body`.
:::note Prefer ClickHouse for parsing
-We generally recommend users perform JSON parsing in ClickHouse of structured logs. We are confident ClickHouse is the fastest JSON parsing implementation. However, we recognize you may wish to send logs to other sources and not have this logic reside in SQL.
+We generally recommend performing JSON parsing of structured logs in ClickHouse. We're confident ClickHouse is the fastest JSON parsing implementation. However, we recognize you may wish to send logs to other sources and not have this logic reside in SQL.
:::
```sql
@@ -156,7 +156,7 @@ You may also perform processing using OTel Collector processors and operators as
### Materialized columns {#materialized-columns}
-Materialized columns offer the simplest solution to extract structure from other columns. Values of such columns are always calculated at insert time and cannot be specified in INSERT queries.
+Materialized columns offer the simplest solution to extract structure from other columns. Values of such columns are always calculated at insert time and can't be specified in INSERT queries.
:::note Overhead
Materialized columns incur additional storage overhead as the values are extracted to new columns on disk at insert time.
@@ -164,7 +164,7 @@ Materialized columns incur additional storage overhead as the values are extract
Materialized columns support any ClickHouse expression and can exploit any of the analytical functions for [processing strings](/sql-reference/functions/string-functions) (including [regex and searching](/sql-reference/functions/string-search-functions)) and [urls](/sql-reference/functions/url-functions), performing [type conversions](/sql-reference/functions/type-conversion-functions), [extracting values from JSON](/sql-reference/functions/json-functions) or [mathematical operations](/sql-reference/functions/math-functions).
-We recommend materialized columns for basic processing. They are especially useful for extracting values from maps, promoting them to root columns, and performing type conversions. They are often most useful when used in very basic schemas or in conjunction with materialized views. Consider the following schema for logs from which the JSON has been extracted to the `LogAttributes` column by the collector:
+We recommend materialized columns for basic processing. They're especially useful for extracting values from maps, promoting them to root columns, and performing type conversions. They're often most useful when used in very basic schemas or in conjunction with materialized views. Consider the following schema for logs from which the JSON has been extracted to the `LogAttributes` column by the collector:
```sql
CREATE TABLE otel_logs
@@ -225,12 +225,12 @@ Materialized columns will, by default, not be returned in a `SELECT *`. This is
[Materialized views](/materialized-views) provide a more powerful means of applying SQL filtering and transformations to logs and traces.
-Materialized Views allow you to shift the cost of computation from query time to insert time. A ClickHouse materialized view is just a trigger that runs a query on blocks of data as they are inserted into a table. The results of this query are inserted into a second "target" table.
+Materialized Views allow you to shift the cost of computation from query time to insert time. A ClickHouse materialized view is just a trigger that runs a query on blocks of data as they're inserted into a table. The results of this query are inserted into a second "target" table.
:::note Real-time updates
-Materialized views in ClickHouse are updated in real time as data flows into the table they are based on, functioning more like continually updating indexes. In contrast, in other databases materialized views are typically static snapshots of a query that must be refreshed (similar to ClickHouse Refreshable Materialized Views).
+Materialized views in ClickHouse are updated in real time as data flows into the table they're based on, functioning more like continually updating indexes. In contrast, in other databases materialized views are typically static snapshots of a query that must be refreshed (similar to ClickHouse Refreshable Materialized Views).
:::
The query associated with the materialized view can theoretically be any query, including an aggregation although [limitations exist with Joins](https://clickhouse.com/blog/using-materialized-views-in-clickhouse#materialized-views-and-joins). For the transformations and filtering workloads required for logs and traces, you can consider any `SELECT` statement to be possible.
@@ -260,7 +260,7 @@ CREATE TABLE otel_logs
) ENGINE = Null
```
-The Null table engine is a powerful optimization - think of it as `/dev/null`. This table will not store any data, but any attached materialized views will still be executed over inserted rows before they are discarded.
+The Null table engine is a powerful optimization - think of it as `/dev/null`. This table won't store any data, but any attached materialized views will still be executed over inserted rows before they're discarded.
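A minimal sketch of this pattern (the table and view names here are illustrative):

```sql
-- Rows inserted into otel_logs_staging are never stored; the materialized
-- view transforms them into otel_logs_clean before they are discarded
CREATE TABLE otel_logs_staging
(
    `Body` String
) ENGINE = Null;

CREATE TABLE otel_logs_clean
(
    `Body` String
) ENGINE = MergeTree
ORDER BY tuple();

CREATE MATERIALIZED VIEW otel_logs_mv TO otel_logs_clean AS
SELECT Body
FROM otel_logs_staging;
```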
Consider the following query. This transforms our rows into a format we wish to preserve, extracting all columns from `LogAttributes` (we assume this has been set by the collector using the `json_parser` operator), setting the `SeverityText` and `SeverityNumber` (based on some simple conditions and definition of [these columns](https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-severitytext)). In this case we also only select the columns we know will be populated - ignoring columns such as the `TraceId`, `SpanId` and `TraceFlags`.
@@ -310,10 +310,10 @@ SeverityNumber: 9
1 row in set. Elapsed: 0.027 sec.
```
-We also extract the `Body` column above - in case additional attributes are added later that are not extracted by our SQL. This column should compress well in ClickHouse and will be rarely accessed, thus not impacting query performance. Finally, we reduce the Timestamp to a DateTime (to save space - see ["Optimizing Types"](#optimizing-types)) with a cast.
+We also extract the `Body` column above - in case additional attributes are added later that aren't extracted by our SQL. This column should compress well in ClickHouse and will be rarely accessed, thus not impacting query performance. Finally, we reduce the Timestamp to a DateTime (to save space - see ["Optimizing Types"](#optimizing-types)) with a cast.
:::note Conditionals
-Note the use of [conditionals](/sql-reference/functions/conditional-functions) above for extracting the `SeverityText` and `SeverityNumber`. These are extremely useful for formulating complex conditions and checking if values are set in maps - we naively assume all keys exist in `LogAttributes`. We recommend users become familiar with them - they are your friend in log parsing in addition to functions for handling [null values](/sql-reference/functions/functions-for-nulls)!
+Note the use of [conditionals](/sql-reference/functions/conditional-functions) above for extracting the `SeverityText` and `SeverityNumber`. These are extremely useful for formulating complex conditions and checking if values are set in maps - we naively assume all keys exist in `LogAttributes`. We recommend users become familiar with them - they're your friend in log parsing in addition to functions for handling [null values](/sql-reference/functions/functions-for-nulls)!
:::
We require a table to receive these results. The below target table matches the above query:
@@ -438,8 +438,8 @@ FROM otel_logs
The above materialized views rely on implicit casting - especially in the case of using the `LogAttributes` map. ClickHouse will often transparently cast the extracted value to the target table type, reducing the syntax required. However, we recommend users always test their views by using the view's `SELECT` statement with an [`INSERT INTO`](/sql-reference/statements/insert-into) statement with a target table using the same schema. This should confirm that types are correctly handled. Special attention should be given to the following cases:
- If a key doesn't exist in a map, an empty string will be returned. In the case of numerics, you will need to map these to an appropriate value. This can be achieved with [conditionals](/sql-reference/functions/conditional-functions) e.g. `if(LogAttributes['status'] = '', 200, LogAttributes['status'])` or [cast functions](/sql-reference/functions/type-conversion-functions) if default values are acceptable e.g. `toUInt8OrDefault(LogAttributes['status'])`
-- Some types will not always be cast e.g. string representations of numerics will not be cast to enum values.
-- JSON extract functions return default values for their type if a value is not found. Ensure these values make sense!
+- Some types won't always be cast e.g. string representations of numerics won't be cast to enum values.
+- JSON extract functions return default values for their type if a value isn't found. Ensure these values make sense!
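For example, a sketch of handling the empty-string case (assuming the `otel_logs` schema used earlier):

```sql
-- A missing 'status' key yields '', not NULL - handle it explicitly
SELECT
    if(LogAttributes['status'] = '', 200, toUInt16OrZero(LogAttributes['status'])) AS StatusOrDefault,
    toUInt16OrZero(LogAttributes['status']) AS StatusOrZero
FROM otel_logs
LIMIT 5
```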
:::note Avoid Nullable
Avoid using [Nullable](/sql-reference/data-types/nullable) in ClickHouse for Observability data. In logs and traces, it is rarely necessary to distinguish between empty and null. This feature incurs an additional storage overhead and will negatively impact query performance. See [here](/data-modeling/schema-design#optimizing-types) for further details.
@@ -461,7 +461,7 @@ Some simple rules can be applied to help choose an ordering key. The following c
Once you have identified the subset of columns for the ordering key, they must be declared in a specific order. This order can significantly influence both the efficiency of the filtering on secondary key columns in queries and the compression ratio for the table's data files. In general, it is **best to order the keys in ascending order of cardinality**. This should be balanced against the fact that filtering on columns that appear later in the ordering key will be less efficient than filtering on those that appear earlier in the tuple. Balance these behaviors and consider your access patterns. Most importantly, test variants. For further understanding of ordering keys and how to optimize them, we recommend [this article](/guides/best-practices/sparse-primary-indexes).
:::note Structure first
-We recommend deciding on your ordering keys once you have structured your logs. Do not use keys in attribute maps for the ordering key or JSON extraction expressions. Ensure you have your ordering keys as root columns in your table.
+We recommend deciding on your ordering keys once you have structured your logs. Don't use keys in attribute maps for the ordering key or JSON extraction expressions. Ensure you have your ordering keys as root columns in your table.
:::
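For example, a sketch of an ordering key built from root columns in ascending order of cardinality (the column selection here is illustrative, not a recommendation):

```sql
CREATE TABLE otel_logs_ordered
(
    `ServiceName` LowCardinality(String),
    `Status` UInt16,
    `Timestamp` DateTime,
    `Body` String
)
ENGINE = MergeTree
-- low-cardinality ServiceName first, then Status, then Timestamp
ORDER BY (ServiceName, Status, toUnixTimestamp(Timestamp))
```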
## Using maps {#using-maps}
@@ -491,7 +491,7 @@ We don't recommend using dots in Map column names and may deprecate its use. Use
Querying map types is slower than querying normal columns - see ["Accelerating queries"](#accelerating-queries). In addition, it's more syntactically complicated and can be cumbersome to write. To address this latter issue we recommend using Alias columns.
-ALIAS columns are calculated at query time and are not stored in the table. Therefore, it is impossible to INSERT a value into a column of this type. Using aliases we can reference map keys and simplify syntax, transparently expose map entries as a normal column. Consider the following example:
+ALIAS columns are calculated at query time and aren't stored in the table. Therefore, it is impossible to INSERT a value into a column of this type. Using aliases, we can reference map keys and simplify syntax, transparently exposing map entries as normal columns. Consider the following example:
```sql
CREATE TABLE otel_logs
@@ -593,7 +593,7 @@ Users interested in accelerating joins with dictionaries can find further detail
Dictionaries can be used for enriching datasets at query time or insert time. Each of these approaches has its respective pros and cons. In summary:
-- **Insert time** - This is typically appropriate if the enrichment value does not change and exists in an external source which can be used to populate the dictionary. In this case, enriching the row at insert time avoids the query time lookup to the dictionary. This comes at the cost of insert performance as well as an additional storage overhead, as enriched values will be stored as columns.
+- **Insert time** - This is typically appropriate if the enrichment value doesn't change and exists in an external source which can be used to populate the dictionary. In this case, enriching the row at insert time avoids the query time lookup to the dictionary. This comes at the cost of insert performance as well as an additional storage overhead, as enriched values will be stored as columns.
- **Query time** - If values in a dictionary change frequently, query time lookups are often more applicable. This avoids needing to update columns (and rewrite data) if mapped values change. This flexibility comes at the expense of a query time lookup cost. This query time cost is typically appreciable if a lookup is required for many rows, e.g. using a dictionary lookup in a filter clause. For result enrichment, i.e. in the `SELECT`, this overhead is typically not appreciable.
We recommend that users familiarize themselves with the basics of dictionaries. Dictionaries provide an in-memory lookup table from which values can be retrieved using dedicated [specialist functions](/sql-reference/functions/ext-dict-functions#dictGetAll).
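A sketch of query-time enrichment (the dictionary name and attribute are hypothetical, assuming a complex-key dictionary keyed on service name):

```sql
SELECT
    ServiceName,
    -- look up the owning team for each service at query time
    dictGet('service_owners_dict', 'team', tuple(ServiceName)) AS owning_team,
    count() AS events
FROM otel_logs
GROUP BY ServiceName, owning_team
```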
@@ -1228,7 +1228,7 @@ Note we use a `GROUP BY` here instead of using `FINAL`.
### Using Materialized views (incremental) for fast lookups {#using-materialized-views-incremental--for-fast-lookups}
-You should consider their access patterns when choosing the ClickHouse ordering key with the columns that are frequently used in filter and aggregation clauses. This can be restrictive in Observability use cases, where users have more diverse access patterns that cannot be encapsulated in a single set of columns. This is best illustrated in an example built into the default OTel schemas. Consider the default schema for the traces:
+When choosing the ClickHouse ordering key, you should consider your access patterns and use the columns that are frequently used in filter and aggregation clauses. This can be restrictive in Observability use cases, where users have more diverse access patterns that can't be encapsulated in a single set of columns. This is best illustrated in an example built into the default OTel schemas. Consider the default schema for the traces:
```sql
CREATE TABLE otel_traces
@@ -1267,7 +1267,7 @@ PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
```
-This schema is optimized for filtering by `ServiceName`, `SpanName`, and `Timestamp`. In tracing, users also need the ability to perform lookups by a specific `TraceId` and retrieving the associated trace's spans. While this is present in the ordering key, its position at the end means [filtering will not be as efficient](/guides/best-practices/sparse-primary-indexes#ordering-key-columns-efficiently) and likely means significant amounts of data will need to be scanned when retrieving a single trace.
+This schema is optimized for filtering by `ServiceName`, `SpanName`, and `Timestamp`. In tracing, users also need the ability to perform lookups by a specific `TraceId` and retrieving the associated trace's spans. While this is present in the ordering key, its position at the end means [filtering won't be as efficient](/guides/best-practices/sparse-primary-indexes#ordering-key-columns-efficiently) and likely means significant amounts of data will need to be scanned when retrieving a single trace.
The OTel collector also installs a materialized view and associated table to address this challenge. The table and view are shown below:
@@ -1338,12 +1338,12 @@ In previous sections, we explore how materialized views can be used in ClickHous
We provided an example where the materialized view sends rows to a target table with a different ordering key than the original table receiving inserts in order to optimize for lookups by trace id.
-Projections can be used to address the same problem, allowing the user to optimize for queries on a column that are not part of the primary key.
+Projections can be used to address the same problem, allowing the user to optimize for queries on columns that aren't part of the primary key.
In theory, this capability can be used to provide multiple ordering keys for a table, with one distinct disadvantage: data duplication. Specifically, data will need to be written in the order of the main primary key in addition to the order specified for each projection. This will slow inserts and consume more disk space.
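A minimal sketch of adding a projection for `TraceId` lookups (the names are illustrative; the default schemas solve this with a materialized view instead):

```sql
ALTER TABLE otel_traces
    ADD PROJECTION trace_id_lookup
    (
        SELECT * ORDER BY TraceId
    );

-- build the projection for existing parts
ALTER TABLE otel_traces MATERIALIZE PROJECTION trace_id_lookup;
```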
:::note Projections vs Materialized Views
-Projections offer many of the same capabilities as materialized views, but should be used sparingly with the latter often preferred. You should understand the drawbacks and when they are appropriate. For example, while projections can be used for pre-computing aggregations we recommend users use Materialized views for this.
+Projections offer many of the same capabilities as materialized views, but should be used sparingly, with the latter often preferred. You should understand the drawbacks and when they're appropriate. For example, while projections can be used for pre-computing aggregations, we recommend materialized views for this.
:::
@@ -1441,13 +1441,13 @@ In the above example, we specify the columns used in the earlier query in the pr
### Secondary/data skipping indices {#secondarydata-skipping-indices}
-No matter how well the primary key is tuned in ClickHouse, some queries will inevitably require full table scans. While this can be mitigated using Materialized views (and projections for some queries), these require additional maintenance and users to be aware of their availability in order to ensure they are exploited. While traditional relational databases solve this with secondary indexes, these are ineffective in column-oriented databases like ClickHouse. Instead, ClickHouse uses "Skip" indexes, which can significantly improve query performance by allowing the database to skip over large data chunks with no matching values.
+No matter how well the primary key is tuned in ClickHouse, some queries will inevitably require full table scans. While this can be mitigated using Materialized views (and projections for some queries), these require additional maintenance, and users must be aware of them in order to ensure they're exploited. While traditional relational databases solve this with secondary indexes, these are ineffective in column-oriented databases like ClickHouse. Instead, ClickHouse uses "Skip" indexes, which can significantly improve query performance by allowing the database to skip over large data chunks with no matching values.
-The default OTel schemas use secondary indices in an attempt to accelerate access to map access. While we find these to be generally ineffective and do not recommend copying them into your custom schema, skipping indices can still be useful.
+The default OTel schemas use secondary indices in an attempt to accelerate map access. While we find these to be generally ineffective and don't recommend copying them into your custom schema, skipping indices can still be useful.
You should read and understand the [guide to secondary indices](/optimize/skipping-indexes) before attempting to apply them.
-**In general, they are effective when a strong correlation exists between the primary key and the targeted, non-primary column/expression and users are looking up rare values i.e. those which do not occur in many granules.**
+**In general, they're effective when a strong correlation exists between the primary key and the targeted, non-primary column/expression and users are looking up rare values i.e. those which don't occur in many granules.**
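As a sketch, adding and building a skip index looks like the following (the column and index type are illustrative and should be validated against your own data):

```sql
ALTER TABLE otel_logs
    ADD INDEX idx_status Status TYPE minmax GRANULARITY 1;

-- build the index for existing parts
ALTER TABLE otel_logs MATERIALIZE INDEX idx_status;
```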
### Bloom filters for text search {#bloom-filters-for-text-search}
@@ -1478,7 +1478,7 @@ SELECT ngrams('https://www.zanbil.ir/m/filter/b113', 3)
```
:::note Inverted indices
-ClickHouse also has experimental support for inverted indices as a secondary index. We do not currently recommend these for logging datasets but anticipate they will replace token-based bloom filters when they are production-ready.
+ClickHouse also has experimental support for inverted indices as a secondary index. We don't currently recommend these for logging datasets but anticipate they will replace token-based bloom filters when they're production-ready.
:::
For the purposes of this example we use the structured logs dataset. Suppose we wish to count logs where the `Referer` column contains `ultra`.
@@ -1523,7 +1523,7 @@ ENGINE = MergeTree
ORDER BY (Timestamp)
```
-The index `ngrambf_v1(3, 10000, 3, 7)` here takes four parameters. The last of these (value 7) represents a seed. The others represent the ngram size (3), the value `m` (filter size), and the number of hash functions `k` (7). `k` and `m` require tuning and will be based on the number of unique ngrams/tokens and the probability the filter results in a true negative - thus confirming a value is not present in a granule. We recommend [these functions](/engines/table-engines/mergetree-family/mergetree#bloom-filter) to help establish these values.
+The index `ngrambf_v1(3, 10000, 3, 7)` here takes four parameters. The last of these (value 7) represents a seed. The others represent the ngram size (3), the filter size `m` (10000), and the number of hash functions `k` (3). `k` and `m` require tuning and will be based on the number of unique ngrams/tokens and the probability the filter results in a true negative - thus confirming a value isn't present in a granule. We recommend [these functions](/engines/table-engines/mergetree-family/mergetree#bloom-filter) to help establish these values.
If tuned correctly, the speedup here can be significant:
diff --git a/docs/use-cases/observability/clickstack/alerts.md b/docs/use-cases/observability/clickstack/alerts.md
index caf3a60b2c8..4c37b91d11e 100644
--- a/docs/use-cases/observability/clickstack/alerts.md
+++ b/docs/use-cases/observability/clickstack/alerts.md
@@ -46,7 +46,7 @@ To create a search alert:
-For an alert to be created for a search, the search must be saved. You can either create the alert for an existing saved search or save the search during the alert creation process. In the example below, we assume the search is not saved.
+For an alert to be created for a search, the search must be saved. You can either create the alert for an existing saved search or save the search during the alert creation process. In the example below, we assume the search isn't saved.
#### Open alert creation dialog {#open-dialog}
@@ -176,7 +176,7 @@ From this view, you can see all alerts that have been created and are currently
This view also displays the alert evaluation history. Alerts are evaluated on a recurring time interval (defined by the period/duration set during alert creation). During each evaluation, HyperDX queries your data to check whether the alert condition is met:
- **Red bar**: The threshold condition was met during this evaluation and the alert fired (notification sent)
-- **Green bar**: The alert was evaluated but the threshold condition was not met (no notification sent)
+- **Green bar**: The alert was evaluated but the threshold condition wasn't met (no notification sent)
Each evaluation is independent - the alert checks the data for that time window and fires only if the condition is true at that moment.
diff --git a/docs/use-cases/observability/clickstack/config.md b/docs/use-cases/observability/clickstack/config.md
index 1a56a469e43..98cb54de7ad 100644
--- a/docs/use-cases/observability/clickstack/config.md
+++ b/docs/use-cases/observability/clickstack/config.md
@@ -210,7 +210,7 @@ Highlighted Attributes and Highlighted Trace Attributes can be configured on Log
- Highlighted Attributes are columns or expressions which are displayed for each log or span, when viewing log or span details.
- Highlighted Trace Attributes are columns or expressions which are queried from each log or span in a trace, and displayed above the trace waterfall.
-These attributes are defined in the source configuration and can be arbitrary SQL expressions. If the SQL expression returns a value that is in the format of a URL, then the attribute will be displayed as a link. Empty values are not displayed.
+These attributes are defined in the source configuration and can be arbitrary SQL expressions. If the SQL expression returns a value formatted as a URL, the attribute will be displayed as a link. Empty values aren't displayed.
For example, this trace source has been configured with a Highlighted Attribute and a Highlighted Trace Attribute:
diff --git a/docs/use-cases/observability/clickstack/dashboards.md b/docs/use-cases/observability/clickstack/dashboards.md
index ba4179ea523..08828d711ff 100644
--- a/docs/use-cases/observability/clickstack/dashboards.md
+++ b/docs/use-cases/observability/clickstack/dashboards.md
@@ -95,7 +95,7 @@ Select `Dashboards` from the left menu.
By default, dashboards are temporary to support ad-hoc investigations.
-If using your own HyperDX instance you can ensure this dashboard can later be saved, by clicking `Create New Saved Dashboard`. This option will not be available if using the read-only environment [play-clickstack.clickhouse.com](https://play-clickstack.clickhouse.com).
+If using your own HyperDX instance, you can ensure this dashboard can later be saved by clicking `Create New Saved Dashboard`. This option won't be available if using the read-only environment [play-clickstack.clickhouse.com](https://play-clickstack.clickhouse.com).
### Create a visualization – average request time by service {#create-a-tile}
diff --git a/docs/use-cases/observability/clickstack/deployment/_snippets/_navigate_managed.md b/docs/use-cases/observability/clickstack/deployment/_snippets/_navigate_managed.md
index fe305ea22eb..73bc20f701a 100644
--- a/docs/use-cases/observability/clickstack/deployment/_snippets/_navigate_managed.md
+++ b/docs/use-cases/observability/clickstack/deployment/_snippets/_navigate_managed.md
@@ -17,7 +17,7 @@ Data sources will be pre-created for any OpenTelemetry data.
-If you are using Vector, you will need to create your own data sources. You will be prompted to create one on your first login. Below we show an example configuration for a logs data source.
+If you're using Vector, you will need to create your own data sources. You will be prompted to create one on your first login. Below we show an example configuration for a logs data source.
@@ -25,7 +25,7 @@ This configuration assumes an Nginx-style schema with a `time_local` column used
We also recommend updating the `Default SELECT` to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
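A sketch of such an expression (the Nginx-style column names are assumptions about the underlying schema):

```sql
-- Reconstruct an Nginx-style log line from individual columns
concat(
    remote_addr, ' - ', remote_user, ' [', toString(time_local), '] "',
    request, '" ', toString(status), ' ', toString(body_bytes_sent)
)
```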
For other possible options, see the [configuration reference](/use-cases/observability/clickstack/config).
diff --git a/docs/use-cases/observability/clickstack/deployment/_snippets/_select_provider.md b/docs/use-cases/observability/clickstack/deployment/_snippets/_select_provider.md
index 8432830f52a..e23d09be5f4 100644
--- a/docs/use-cases/observability/clickstack/deployment/_snippets/_select_provider.md
+++ b/docs/use-cases/observability/clickstack/deployment/_snippets/_select_provider.md
@@ -27,7 +27,7 @@ These recommendations are based on the following assumptions:
- Data volume refers to **uncompressed ingest volume** per month and applies to both logs and traces.
- Query patterns are typical for observability use cases, with most queries targeting **recent data**, usually the last 24 hours.
- Ingestion is relatively **uniform across the month**. If you expect bursty traffic or spikes, you should provision additional headroom.
-- Storage is handled separately via ClickHouse Cloud object storage and is not a limiting factor for retention. We assume data retained for longer periods is infrequently accessed.
+- Storage is handled separately via ClickHouse Cloud object storage and isn't a limiting factor for retention. We assume data retained for longer periods is infrequently accessed.
More compute may be required for access patterns that regularly query longer time ranges, perform heavy aggregations, or support a high number of concurrent users.
diff --git a/docs/use-cases/observability/clickstack/deployment/index.md b/docs/use-cases/observability/clickstack/deployment/index.md
index 8a50d3d0114..1424f19a47e 100644
--- a/docs/use-cases/observability/clickstack/deployment/index.md
+++ b/docs/use-cases/observability/clickstack/deployment/index.md
@@ -10,7 +10,7 @@ keywords: ['ClickStack', 'observability']
ClickStack provides multiple deployment options to suit various use cases.
-Each of the deployment options are summarized below. The [Getting Started Guide](/use-cases/observability/clickstack/getting-started) specifically demonstrates options 1 and 2. They are included here for completeness.
+Each of the deployment options is summarized below. The [Getting Started Guide](/use-cases/observability/clickstack/getting-started) specifically demonstrates options 1 and 2. They're included here for completeness.
| Name | Description | Suitable For | Limitations | Example Link |
|------------------|----------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
diff --git a/docs/use-cases/observability/clickstack/deployment/managed.md b/docs/use-cases/observability/clickstack/deployment/managed.md
index c5821053011..83ce1dfad59 100644
--- a/docs/use-cases/observability/clickstack/deployment/managed.md
+++ b/docs/use-cases/observability/clickstack/deployment/managed.md
@@ -112,7 +112,7 @@ From the ClickHouse Cloud landing page, select the service for which you wish to
:::important Estimating resources
This guide assumes you have provisioned sufficient resources to handle the volume of observability data you plan to ingest and query with ClickStack. To estimate the required resources, refer to the [production guide](/use-cases/observability/clickstack/production#estimating-resources).
-If your ClickHouse service already hosts existing workloads, such as real-time application analytics, we recommend creating a child service using [ClickHouse Cloud's warehouses feature](/cloud/reference/warehouses) to isolate the observability workload. This ensures your existing applications are not disrupted, while keeping the datasets accessible from both services.
+If your ClickHouse service already hosts existing workloads, such as real-time application analytics, we recommend creating a child service using [ClickHouse Cloud's warehouses feature](/cloud/reference/warehouses) to isolate the observability workload. This ensures your existing applications aren't disrupted, while keeping the datasets accessible from both services.
:::
@@ -284,7 +284,7 @@ The configuration above assumes an Nginx-style schema with a `time_local` column
We also recommend updating the `Default SELECT` to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
For other possible options, see the [configuration reference](/use-cases/observability/clickstack/config#hyperdx).
@@ -310,7 +310,7 @@ Once the source is configured, click "Save" and begin exploring your data.
3. Set the appropriate permission level for each user:
- **Service Admin → Full Access** - Required for enabling alerts
- **Service Read Only → Read Only** - Can view observability data and create dashboards
- - **No access** - Cannot access HyperDX
+ - **No access** - Can't access HyperDX
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/all-in-one.md b/docs/use-cases/observability/clickstack/deployment/oss/all-in-one.md
index ca47de7fa20..94d07d092bf 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/all-in-one.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/all-in-one.md
@@ -83,9 +83,9 @@ docker run \
## Deploying to production {#deploying-to-production}
-This option should not be deployed to production for the following reasons:
+This option shouldn't be deployed to production for the following reasons:
-- **Non-persistent storage:** All data is stored using the Docker native overlay filesystem. This setup does not support performance at scale, and data will be lost if the container is removed or restarted - unless users [mount the required file paths](#persisting-data-and-settings).
+- **Non-persistent storage:** All data is stored using the Docker native overlay filesystem. This setup doesn't support performance at scale, and data will be lost if the container is removed or restarted - unless users [mount the required file paths](#persisting-data-and-settings).
- **Lack of component isolation:** All components run within a single Docker container. This prevents independent scaling and monitoring and applies any `cgroup` limits globally to all processes. As a result, components may compete for CPU and memory.
## Customizing ports {#customizing-ports-deploy}
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/docker-compose.md b/docs/use-cases/observability/clickstack/deployment/oss/docker-compose.md
index cf05e53a226..5d005d7782b 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/docker-compose.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/docker-compose.md
@@ -37,7 +37,7 @@ These ports enable integrations with a variety of telemetry sources and make the
* Local testing
* Proof of concepts
-* Production deployments where fault tolerance is not required and a single server is sufficient to host all ClickHouse data
+* Production deployments where fault tolerance isn't required and a single server is sufficient to host all ClickHouse data
* When deploying ClickStack but hosting ClickHouse separately e.g. using ClickHouse Cloud.
## Deployment steps {#deployment-steps}
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/helm/helm-configuration.md b/docs/use-cases/observability/clickstack/deployment/oss/helm/helm-configuration.md
index 65bcae02926..e4ca6bd2cdc 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/helm/helm-configuration.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/helm/helm-configuration.md
@@ -165,10 +165,10 @@ spec:
**Path and rewrite configuration:**
- For Next.js and other SPAs, always use a regex path and rewrite annotation as shown above
-- Do not use just `path: /` without a rewrite, as this will break static asset serving
+- Don't use just `path: /` without a rewrite, as this will break static asset serving
**Mismatched `frontendUrl` and `ingress.host`:**
-- If these do not match, you may experience issues with cookies, redirects, and asset loading
+- If these don't match, you may experience issues with cookies, redirects, and asset loading
**TLS misconfiguration:**
- Ensure your TLS secret is valid and referenced correctly in the ingress
@@ -214,7 +214,7 @@ hyperdx:
- The regex path rule allows you to route all OTLP signals (traces, metrics, logs) through a single rule
:::note
-If you do not need to expose the OTEL collector externally, you can skip this configuration. For most users, the general ingress setup is sufficient.
+If you don't need to expose the OTEL collector externally, you can skip this configuration. For most users, the general ingress setup is sufficient.
:::
## Troubleshooting ingress {#troubleshooting-ingress}
@@ -243,7 +243,7 @@ curl -I https://hyperdx.yourdomain.com/_next/static/chunks/main-xxxx.js
- Look for errors like `Unexpected token <` in the console (indicates HTML returned for JS)
**Check for path rewrites:**
-- Ensure the ingress is not stripping or incorrectly rewriting asset paths
+- Ensure the ingress isn't stripping or incorrectly rewriting asset paths
**Clear browser and CDN cache:**
- After changes, clear your browser cache and any CDN/proxy cache to avoid stale assets
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/helm/helm.md b/docs/use-cases/observability/clickstack/deployment/oss/helm/helm.md
index e7dcb0c825a..38b2e9a025c 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/helm/helm.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/helm/helm.md
@@ -15,7 +15,7 @@ import hyperdx_login from '@site/static/images/use-cases/observability/hyperdx-l
import JSONSupport from '@site/docs/use-cases/observability/clickstack/deployment/_snippets/_json_support.md';
:::warning Chart Migration
-If you are currently using the `hdx-oss-v2` chart, please migrate to the `clickstack` chart. The `hdx-oss-v2` chart is in maintenance mode and will no longer receive new features. All new development is focused on the `clickstack` chart, which provides the same functionality with improved naming and better organization.
+If you're currently using the `hdx-oss-v2` chart, please migrate to the `clickstack` chart. The `hdx-oss-v2` chart is in maintenance mode and will no longer receive new features. All new development is focused on the `clickstack` chart, which provides the same functionality with improved naming and better organization.
:::
The helm chart for ClickStack can be found [here](https://github.com/ClickHouse/ClickStack-helm-charts) and is the **recommended** method for production deployments.
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/hyperdx-only.md b/docs/use-cases/observability/clickstack/deployment/oss/hyperdx-only.md
index 33a19bad753..b2a64d11dcb 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/hyperdx-only.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/hyperdx-only.md
@@ -71,7 +71,7 @@ You can modify the [Docker Compose configuration](/use-cases/observability/click
## ClickStack OpenTelemetry collector {#otel-collector}
-Even if you are managing your own OpenTelemetry collector, independent of the other components in the stack, we still recommend using the ClickStack distribution of the collector. This ensures the default schema is used and best practices for ingestion are applied.
+Even if you're managing your own OpenTelemetry collector, independent of the other components in the stack, we still recommend using the ClickStack distribution of the collector. This ensures the default schema is used and best practices for ingestion are applied.
For details on deploying and configuring a standalone collector see ["Ingesting with OpenTelemetry"](/use-cases/observability/clickstack/ingesting-data/otel-collector#modifying-otel-collector-configuration).
diff --git a/docs/use-cases/observability/clickstack/deployment/oss/local-mode-only.md b/docs/use-cases/observability/clickstack/deployment/oss/local-mode-only.md
index f5f99dec91a..485eb33a87a 100644
--- a/docs/use-cases/observability/clickstack/deployment/oss/local-mode-only.md
+++ b/docs/use-cases/observability/clickstack/deployment/oss/local-mode-only.md
@@ -46,7 +46,7 @@ docker run -p 8080:8080 clickhouse/clickstack-local:latest
Visit [http://localhost:8080](http://localhost:8080) to access the HyperDX UI.
-**You will not be prompted to create a user, as authentication is not enabled in this deployment mode.**
+**You won't be prompted to create a user, as authentication isn't enabled in this deployment mode.**
Connect to your own external ClickHouse cluster e.g. ClickHouse Cloud.
diff --git a/docs/use-cases/observability/clickstack/event_patterns.md b/docs/use-cases/observability/clickstack/event_patterns.md
index 15ed4eccbd6..2f4a9ed3dad 100644
--- a/docs/use-cases/observability/clickstack/event_patterns.md
+++ b/docs/use-cases/observability/clickstack/event_patterns.md
@@ -35,6 +35,6 @@ This provides an alternative to the default **Results Table** which allows you t
Event patterns are most effective when applied to **narrowed subsets** of your data. For example, filtering down to a single service before enabling event patterns will usually surface more relevant and interesting messages than applying patterns across thousands of services at once.
-They are also particularly powerful for summarizing error messages, where repeated errors with varying IDs or payloads are grouped into concise clusters.
+They're also particularly powerful for summarizing error messages, where repeated errors with varying IDs or payloads are grouped into concise clusters.
For a live example, see how event patterns are used in the [Remote Demo Dataset](/use-cases/observability/clickstack/getting-started/remote-demo-data#identify-error-patterns).
diff --git a/docs/use-cases/observability/clickstack/example-datasets/kubernetes.md b/docs/use-cases/observability/clickstack/example-datasets/kubernetes.md
index a91d87db73e..9fc61beaa33 100644
--- a/docs/use-cases/observability/clickstack/example-datasets/kubernetes.md
+++ b/docs/use-cases/observability/clickstack/example-datasets/kubernetes.md
@@ -164,7 +164,7 @@ helm install myrelease --set clickhouse.enabled=false --set
If you'd rather use Managed ClickStack, you can deploy ClickStack and [disable the included ClickHouse](https://clickhouse.com/docs/use-cases/observability/clickstack/deployment/helm#using-clickhouse-cloud).
:::note
-The chart currently always deploys both HyperDX and MongoDB. While these components offer an alternative access path, they are not integrated with ClickHouse Cloud authentication. These components are intended for administrators in this deployment model, [providing access to the secure ingestion key](#retrieve-ingestion-api-key) needed to ingest through the deployed OTel collector, but should not be exposed to end users.
+The chart currently always deploys both HyperDX and MongoDB. While these components offer an alternative access path, they're not integrated with ClickHouse Cloud authentication. These components are intended for administrators in this deployment model, [providing access to the secure ingestion key](#retrieve-ingestion-api-key) needed to ingest through the deployed OTel collector, but shouldn't be exposed to end users.
:::
```shell
@@ -196,7 +196,7 @@ my-hyperdx-hdx-oss-v2-otel-collector-64cf698f5c-8s7qj 1/1 Running 0
Even when using Managed ClickStack, the local HyperDX instance deployed in the Kubernetes cluster is still required. It provides an ingestion key managed by the OpAMP server bundled with HyperDX, which secures ingestion through the deployed OTel collector - a capability not currently available in Managed ClickStack.
:::
-For security, the service uses `ClusterIP` and is not exposed externally by default.
+For security, the service uses `ClusterIP` and isn't exposed externally by default.
To access the HyperDX UI, port forward from 3000 to the local port 8080.
@@ -469,7 +469,7 @@ Navigate to your HyperDX UI - either using your Kubernetes-deployed instance or
Managed ClickStack
-If using Managed ClickStack, simply log in to your ClickHouse Cloud service and select "ClickStack" from the left menu. You will be automatically authenticated and will not need to create a user.
+If using Managed ClickStack, simply log in to your ClickHouse Cloud service and select "ClickStack" from the left menu. You will be automatically authenticated and won't need to create a user.
Data sources for logs, metrics and traces will be pre-created for you.
@@ -489,7 +489,7 @@ kubectl port-forward \
```
:::note ClickStack in production
-In production, we recommend using an ingress with TLS if you are not using Managed ClickStack. For example:
+In production, we recommend using an ingress with TLS if you're not using Managed ClickStack. For example:
```shell
helm upgrade my-hyperdx hyperdx/hdx-oss-v2 \
diff --git a/docs/use-cases/observability/clickstack/example-datasets/local-data.md b/docs/use-cases/observability/clickstack/example-datasets/local-data.md
index 6746d8e3ba4..86d68259f0b 100644
--- a/docs/use-cases/observability/clickstack/example-datasets/local-data.md
+++ b/docs/use-cases/observability/clickstack/example-datasets/local-data.md
@@ -267,7 +267,7 @@ docker run -d --name clickstack \
```
:::note Root user
-We run the collector as the root user to access all system logs—this is necessary to capture logs from protected paths on Linux-based systems. However, this approach is not recommended for production. In production environments, the OpenTelemetry Collector should be deployed as a local agent with only the minimal permissions required to access the intended log sources.
+We run the collector as the root user to access all system logs—this is necessary to capture logs from protected paths on Linux-based systems. However, this approach isn't recommended for production. In production environments, the OpenTelemetry Collector should be deployed as a local agent with only the minimal permissions required to access the intended log sources.
Note that we mount the host's `/var/log` to `/host/var/log` inside the container to avoid conflicts with the container's own log files.
:::
diff --git a/docs/use-cases/observability/clickstack/example-datasets/remote-demo-data.md b/docs/use-cases/observability/clickstack/example-datasets/remote-demo-data.md
index a5f79bd99de..16850532c01 100644
--- a/docs/use-cases/observability/clickstack/example-datasets/remote-demo-data.md
+++ b/docs/use-cases/observability/clickstack/example-datasets/remote-demo-data.md
@@ -45,7 +45,7 @@ import DemoArchitecture from '@site/docs/use-cases/observability/clickstack/exam
This guide uses a sample dataset hosted on the public ClickHouse playground at [sql.clickhouse.com](https://sql.clickhouse.com), which you can connect to from your local ClickStack deployment.
:::warning Not supported with Managed ClickStack
-Remote databases are not supported when using Managed ClickStack. This dataset is therefore not supported.
+Remote databases aren't supported when using Managed ClickStack. This dataset is therefore not supported.
:::
It contains approximately 40 hours of data captured from the ClickHouse version of the official OpenTelemetry (OTel) demo. The data is replayed nightly with timestamps adjusted to the current time window, allowing users to explore system behavior using HyperDX's integrated logs, traces, and metrics.
@@ -161,7 +161,7 @@ Select the `Infrastructure` tab to view the metrics associated with the underlyi
-The issue does not seem to infrastructure related - no metrics have appreciably changed over the time period: either before or after the error. Close the infrastructure tab.
+The issue doesn't seem to be infrastructure related - no metrics have appreciably changed over the time period, either before or after the error. Close the infrastructure tab.
### Explore a trace {#explore-a-trace}
@@ -284,7 +284,7 @@ In summary, by exploring logs, traces and finally metrics we have concluded:
### Using sessions {#using-sessions}
-Sessions allow us to replay the user experience, offering a visual account of how an error occurred from the user's perspective. While not typically used to diagnose root causes, they are valuable for confirming issues reported to customer support and can serve as a starting point for deeper investigation.
+Sessions allow us to replay the user experience, offering a visual account of how an error occurred from the user's perspective. While not typically used to diagnose root causes, they're valuable for confirming issues reported to customer support and can serve as a starting point for deeper investigation.
In HyperDX, sessions are linked to traces and logs, providing a complete view of the underlying cause.
@@ -306,7 +306,7 @@ If we scroll to the bottom of the spans we can see a `500` error associated with
-Selecting the span we can confirm this was caused by an internal error. By clicking the `Trace` tab and scrolling though the connected spans, we are able to confirm the customer indeed was a victim of our cache issue.
+Selecting the span we can confirm this was caused by an internal error. By clicking the `Trace` tab and scrolling through the connected spans, we're able to confirm the customer was indeed a victim of our cache issue.
diff --git a/docs/use-cases/observability/clickstack/example-datasets/sample-data.md b/docs/use-cases/observability/clickstack/example-datasets/sample-data.md
index c4f17d63182..78d6028cd10 100644
--- a/docs/use-cases/observability/clickstack/example-datasets/sample-data.md
+++ b/docs/use-cases/observability/clickstack/example-datasets/sample-data.md
@@ -93,7 +93,7 @@ done
```
This simulates OTLP log, trace, and metric sources sending data to the OTel collector. In production, these sources may be language clients or even other OTel collectors.
-Returning to the `Search` view, you should see that data has started to load (adjust the time frame to the `Last 1 hour` if the data does not render):
+Returning to the `Search` view, you should see that data has started to load (adjust the time frame to the `Last 1 hour` if the data doesn't render):
@@ -229,7 +229,7 @@ done
This simulates OTLP log, trace, and metric sources sending data to the OTel collector. In production, these sources may be language clients or even other OTel collectors.
-Returning to the `Search` view, you should see that data has started to load (adjust the time frame to the `Last 1 hour` if the data does not render):
+Returning to the `Search` view, you should see that data has started to load (adjust the time frame to the `Last 1 hour` if the data doesn't render):
diff --git a/docs/use-cases/observability/clickstack/getting-started/managed.md b/docs/use-cases/observability/clickstack/getting-started/managed.md
index e80b6bac967..3d5b1a449bc 100644
--- a/docs/use-cases/observability/clickstack/getting-started/managed.md
+++ b/docs/use-cases/observability/clickstack/getting-started/managed.md
@@ -55,7 +55,7 @@ Once your service has been provisioned, ensure the service is selected and c
## Next Steps {#next-steps}
:::important[Record default credentials]
-If you have not recorded your default credentials during the above steps, navigate to the service and select `Connect`, recording the password and HTTP/native endpoints. Store these admin credentials securely, which can be reused in further guides.
+If you haven't recorded your default credentials during the above steps, navigate to the service and select `Connect`, recording the password and HTTP/native endpoints. Store these admin credentials securely, which can be reused in further guides.
:::
diff --git a/docs/use-cases/observability/clickstack/getting-started/oss.md b/docs/use-cases/observability/clickstack/getting-started/oss.md
index c64d4de52ce..7a4d05646ff 100644
--- a/docs/use-cases/observability/clickstack/getting-started/oss.md
+++ b/docs/use-cases/observability/clickstack/getting-started/oss.md
@@ -102,9 +102,9 @@ Alternatively, you can connect to a demo cluster where you can explore a larger
Local mode is a way to deploy HyperDX without needing to authenticate.
-**Authentication is not supported**.
+**Authentication isn't supported**.
-This mode is intended to be used for quick testing, development, demos and debugging use cases where authentication and settings persistence is not necessary.
+This mode is intended for quick testing, development, demos, and debugging use cases where authentication and settings persistence aren't necessary.
For further details on this deployment model, see ["Local Mode Only"](/use-cases/observability/clickstack/deployment/local-mode-only).
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/_snippets/_extending_config.md b/docs/use-cases/observability/clickstack/ingesting-data/_snippets/_extending_config.md
index b7e8277c5e3..317e8c81af5 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/_snippets/_extending_config.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/_snippets/_extending_config.md
@@ -76,7 +76,7 @@ docker run -d \
```
:::note
-You only define new receivers, processors, and pipelines in the custom config. The base processors (`memory_limiter`, `batch`) and exporters (`clickhouse`) are already defined—reference them by name. The custom configuration is merged with the base configuration and cannot override existing components.
+You only define new receivers, processors, and pipelines in the custom config. The base processors (`memory_limiter`, `batch`) and exporters (`clickhouse`) are already defined—reference them by name. The custom configuration is merged with the base configuration and can't override existing components.
:::
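
To illustrate how such a custom fragment can reference the base components by name, here is a minimal sketch (the `filelog` receiver, its path, and the pipeline name are hypothetical examples, not part of the base configuration):

```yaml
# Hypothetical custom fragment: adds a filelog receiver and a new logs pipeline.
receivers:
  filelog:
    include: [/var/log/app/*.log]  # illustrative path
service:
  pipelines:
    logs/custom:
      receivers: [filelog]
      processors: [memory_limiter, batch]  # referenced from the base config
      exporters: [clickhouse]              # referenced from the base config
```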
For more complex configurations, refer to the [default ClickStack collector configuration](https://github.com/hyperdxio/hyperdx/blob/main/docker/otel-collector/config.yaml) and the [ClickHouse exporter documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/clickhouseexporter/README.md#configuration-options).
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/collector.md b/docs/use-cases/observability/clickstack/ingesting-data/collector.md
index ddcdf78deec..c81d4701f5f 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/collector.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/collector.md
@@ -110,7 +110,7 @@ With Docker Compose, modify the collector configuration using the same environme
-If you are managing your own OpenTelemetry collector in a standalone deployment - such as when using the HyperDX-only distribution - we [recommend still using the official ClickStack distribution of the collector](/use-cases/observability/clickstack/deployment/hyperdx-only#otel-collector) for the gateway role where possible, but if you choose to bring your own, ensure it includes the [ClickHouse exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter).
+If you're managing your own OpenTelemetry collector in a standalone deployment - such as when using the HyperDX-only distribution - we [recommend still using the official ClickStack distribution of the collector](/use-cases/observability/clickstack/deployment/hyperdx-only#otel-collector) for the gateway role where possible, but if you choose to bring your own, ensure it includes the [ClickHouse exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter).
To deploy the ClickStack distribution of the OTel connector in a standalone mode, run the following docker command:
@@ -191,7 +191,7 @@ With Docker Compose, modify the collector configuration using the same environme
-By default, the ClickStack OpenTelemetry Collector is not secured when deployed outside of the Open Source distributions and does not require authentication on its OTLP ports.
+By default, the ClickStack OpenTelemetry Collector isn't secured when deployed outside of the Open Source distributions and doesn't require authentication on its OTLP ports.
To secure ingestion, specify an authentication token when deploying the collector using the `OTLP_AUTH_TOKEN` environment variable. For example:
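
As a sketch of what this might look like for a standalone collector container (the image name, ports, and token value are placeholders - adapt them to your deployment):

```shell
# Illustrative only: start the collector with an ingestion auth token.
# Clients sending OTLP data must present the same token.
docker run -d \
  -e OTLP_AUTH_TOKEN='<your-secret-token>' \
  -p 4317:4317 -p 4318:4318 \
  <clickstack-otel-collector-image>
```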
@@ -262,7 +262,7 @@ This assumes the collector has been configured to use the database `otel`. This
## Processing - filtering, transforming, and enriching {#processing-filtering-transforming-enriching}
-Users will invariably want to filter, transform, and enrich event messages during ingestion. Since the configuration for the ClickStack connector cannot be modified, we recommend users who need further event filtering and processing either:
+Users will invariably want to filter, transform, and enrich event messages during ingestion. Since the configuration for the ClickStack connector can't be modified, we recommend users who need further event filtering and processing either:
- Deploy their own version of the OTel collector performing filtering and processing, sending events to the ClickStack collector via OTLP for ingestion into ClickHouse.
- Deploy their own version of the OTel collector and send events directly to ClickHouse using the ClickHouse exporter.
@@ -276,7 +276,7 @@ OpenTelemetry supports the following processing and filtering features you can l
- A [memory_limiter](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md) is used to prevent out of memory situations on the collector. See [Estimating Resources](#estimating-resources) for recommendations.
- Any processor that does enrichment based on context. For example, the [Kubernetes Attributes Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor) allows the automatic setting of spans, metrics, and logs resource attributes with k8s metadata e.g. enriching events with their source pod id.
- [Tail or head sampling](https://opentelemetry.io/docs/concepts/sampling/) if required for traces.
-- [Basic filtering](https://opentelemetry.io/docs/collector/transforming-telemetry/) - Dropping events that are not required if this cannot be done via operator (see below).
+- [Basic filtering](https://opentelemetry.io/docs/collector/transforming-telemetry/) - Dropping events that aren't required if this can't be done via operator (see below).
- [Batching](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor) - essential when working with ClickHouse to ensure data is sent in batches. See ["Optimizing inserts"](#optimizing-inserts).
- **Operators** - [Operators](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/README.md) provide the most basic unit of processing available at the receiver. Basic parsing is supported, allowing fields such as the Severity and Timestamp to be set. JSON and regex parsing are supported here along with event filtering and basic transformations. We recommend performing event filtering here.
@@ -361,21 +361,21 @@ For this reason, the ClickStack distribution of the OTel collector uses the [bat
Typically, users are forced to send smaller batches when the throughput of a collector is low, yet they still expect data to reach ClickHouse within a minimum end-to-end latency. In this case, small batches are sent when the `timeout` of the batch processor expires. This can cause problems, and it is in this situation that asynchronous inserts are required. This issue is rare if you're sending data to the ClickStack collector acting as a Gateway - by acting as aggregators, they alleviate this problem - see [Collector roles](#collector-roles).
-If large batches cannot be guaranteed, you can delegate batching to ClickHouse using [Asynchronous Inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts). With asynchronous inserts, data is inserted into a buffer first and then written to the database storage later or asynchronously respectively.
+If large batches can't be guaranteed, you can delegate batching to ClickHouse using [Asynchronous Inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts). With asynchronous inserts, data is first inserted into a buffer and then written to the database storage later, asynchronously.
-With [asynchronous inserts enabled](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), when ClickHouse ① receives an insert query, the query's data is ② immediately written into an in-memory buffer first. When ③ the next buffer flush takes place, the buffer's data is [sorted](/guides/best-practices/sparse-primary-indexes#data-is-stored-on-disk-ordered-by-primary-key-columns) and written as a part to the database storage. Note, that the data is not searchable by queries before being flushed to the database storage; the buffer flush is [configurable](/optimize/asynchronous-inserts).
+With [asynchronous inserts enabled](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), when ClickHouse ① receives an insert query, the query's data is ② immediately written into an in-memory buffer. When ③ the next buffer flush takes place, the buffer's data is [sorted](/guides/best-practices/sparse-primary-indexes#data-is-stored-on-disk-ordered-by-primary-key-columns) and written as a part to the database storage. Note that the data isn't searchable by queries before being flushed to the database storage; the buffer flush is [configurable](/optimize/asynchronous-inserts).
To enable asynchronous inserts for the collector, add `async_insert=1` to the connection string. We recommend users use `wait_for_async_insert=1` (the default) to get delivery guarantees - see [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) for further details.
Data from an async insert is inserted once the ClickHouse buffer is flushed. This occurs either after [`async_insert_max_data_size`](/operations/settings/settings#async_insert_max_data_size) is exceeded or [`async_insert_busy_timeout_ms`](/operations/settings/settings#async_insert_busy_timeout_ms) milliseconds after the first INSERT query. If `async_insert_stale_timeout_ms` is set to a non-zero value, the data is inserted `async_insert_stale_timeout_ms` milliseconds after the last query. You can tune these settings to control the end-to-end latency of your pipeline. Further settings that can be used to tune buffer flushing are documented [here](/operations/settings/settings#async_insert). Generally, defaults are appropriate.
:::note Consider Adaptive Asynchronous Inserts
-In cases where a low number of agents are in use, with low throughput but strict end-to-end latency requirements, [adaptive asynchronous inserts](https://clickhouse.com/blog/clickhouse-release-24-02#adaptive-asynchronous-inserts) may be useful. Generally, these are not applicable to high throughput Observability use cases, as seen with ClickHouse.
+In cases where a small number of agents are in use, with low throughput but strict end-to-end latency requirements, [adaptive asynchronous inserts](https://clickhouse.com/blog/clickhouse-release-24-02#adaptive-asynchronous-inserts) may be useful. Generally, these aren't applicable to high throughput Observability use cases, as seen with ClickHouse.
:::
-Finally, the previous deduplication behavior associated with synchronous inserts into ClickHouse is not enabled by default when using asynchronous inserts. If required, see the setting [`async_insert_deduplicate`](/operations/settings/settings#async_insert_deduplicate).
+Finally, the previous deduplication behavior associated with synchronous inserts into ClickHouse isn't enabled by default when using asynchronous inserts. If required, see the setting [`async_insert_deduplicate`](/operations/settings/settings#async_insert_deduplicate).
Full details on configuring this feature can be found on this [docs page](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), or with a deep dive [blog post](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse).
@@ -389,11 +389,11 @@ The objective of this architecture is to offload computationally intensive proce
### Adding Kafka {#adding-kafka}
-Readers may notice the above architectures do not use Kafka as a message queue.
+Readers may notice the above architectures don't use Kafka as a message queue.
Using a Kafka queue as a message buffer is a popular design pattern seen in logging architectures and was popularized by the ELK stack. It provides a few benefits: principally, it helps provide stronger message delivery guarantees and helps deal with backpressure. Messages are sent from collection agents to Kafka and written to disk. In theory, a clustered Kafka instance should provide a high throughput message buffer since it incurs less computational overhead to write data linearly to disk than parse and process a message. In Elastic, for example, tokenization and indexing incurs significant overhead. By moving data away from the agents, you also incur less risk of losing messages as a result of log rotation at the source. Finally, it offers some message reply and cross-region replication capabilities, which might be attractive for some use cases.
-However, ClickHouse can handle inserting data very quickly - millions of rows per second on moderate hardware. Backpressure from ClickHouse is rare. Often, leveraging a Kafka queue means more architectural complexity and cost. If you can embrace the principle that logs do not need the same delivery guarantees as bank transactions and other mission-critical data, we recommend avoiding the complexity of Kafka.
+However, ClickHouse can handle inserting data very quickly - millions of rows per second on moderate hardware. Backpressure from ClickHouse is rare. Often, leveraging a Kafka queue means more architectural complexity and cost. If you can embrace the principle that logs don't need the same delivery guarantees as bank transactions and other mission-critical data, we recommend avoiding the complexity of Kafka.
However, if you require high delivery guarantees or the ability to replay data (potentially to multiple sources), Kafka can be a useful architectural addition.
@@ -525,5 +525,5 @@ INSERT INTO otel_metrics SELECT * FROM otel_metrics_map;
```
:::warning
-Recommended only for datasets smaller than ~10 billion rows. Data previously stored with the Map type did not preserve type precision (all values were strings). As a result, this old data will appear as strings in the new schema until it ages out, requiring some casting on the frontend. Type for new data will be preserved with the JSON type.
+Recommended only for datasets smaller than ~10 billion rows. Data previously stored with the Map type didn't preserve type precision (all values were strings). As a result, this old data will appear as strings in the new schema until it ages out, requiring some casting on the frontend. The type of new data will be preserved with the JSON type.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/aws-lambda.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/aws-lambda.md
index c0acfa8ecc4..ee38576d50a 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/aws-lambda.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/aws-lambda.md
@@ -46,7 +46,7 @@ This section covers configuring your existing AWS Lambda functions to send logs
#### Choose the appropriate Rotel Lambda Extension layer {#choose-layer}
Choose the Lambda layer that matches your Lambda runtime architecture. The `{version}` field
-is dependent on the AWS region that you are deploying into. Check the [releases](https://github.com/streamfold/rotel-lambda-extension/releases)
+depends on the AWS region that you're deploying into. Check the [releases](https://github.com/streamfold/rotel-lambda-extension/releases)
page for the latest version numbers that correspond to your region.
| Architecture | ARN |
@@ -187,7 +187,7 @@ For Parameter Store:
```
:::note
-AWS API calls for secret retrieval add 100-150ms to cold start latency. Secrets are retrieved in batches (up to 10) and only on initialization, so subsequent invocations are not impacted.
+AWS API calls for secret retrieval add 100-150ms to cold start latency. Secrets are retrieved in batches (up to 10) and only on initialization, so subsequent invocations aren't impacted.
:::
#### Test the integration {#test-integration}
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/cloudwatch.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/cloudwatch.md
index d3045da7cf3..c1ee6cc1fb6 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/cloudwatch.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/cloudwatch.md
@@ -58,7 +58,7 @@ If you would like to test the integration before configuring your production set
- AWS credentials with appropriate IAM permissions
:::note
-Unlike file-based log integrations (nginx, Redis), CloudWatch requires running a separate OpenTelemetry Collector that polls the CloudWatch API. This collector cannot run inside ClickStack's all-in-one image as it needs AWS credentials and API access.
+Unlike file-based log integrations (nginx, Redis), CloudWatch requires running a separate OpenTelemetry Collector that polls the CloudWatch API. This collector can't run inside ClickStack's all-in-one image as it needs AWS credentials and API access.
:::
@@ -217,7 +217,7 @@ For more configuration options, see the [CloudWatch receiver documentation](http
- Log group names/prefixes → Your actual CloudWatch log groups
:::note
-The CloudWatch receiver only fetches logs from recent time windows (based on `poll_interval`). When first started, it begins from the current time. Historical logs are not retrieved by default.
+The CloudWatch receiver only fetches logs from recent time windows (based on `poll_interval`). When first started, it begins from the current time. Historical logs aren't retrieved by default.
:::
#### Start the collector {#start-collector}
@@ -359,7 +359,7 @@ The dashboard will be created with all visualizations pre-configured:
:::note
-For the demo dataset, set the time range to **2025-12-07 00:00:00 - 2025-12-08 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-12-07 00:00:00 - 2025-12-08 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
@@ -389,7 +389,7 @@ docker compose logs otel-collector
Common errors:
- `The security token included in the request is invalid`: Credentials are invalid or expired. For temporary credentials (SSO), ensure `AWS_SESSION_TOKEN` is set.
- `operation error CloudWatch Logs: FilterLogEvents, AccessDeniedException`: IAM permissions are insufficient
-- `failed to refresh cached credentials, no EC2 IMDS role found`: AWS credentials environment variables are not set
+- `failed to refresh cached credentials, no EC2 IMDS role found`: AWS credentials environment variables aren't set
- `connection refused`: ClickStack endpoint is unreachable
**Verify CloudWatch log groups exist and have recent logs:**
@@ -409,7 +409,7 @@ aws logs filter-log-events \
**The CloudWatch receiver starts from "now" by default:**
-When the collector first starts, it creates a checkpoint at the current time and only fetches logs after that point. Historical logs are not retrieved.
+When the collector first starts, it creates a checkpoint at the current time and only fetches logs after that point. Historical logs aren't retrieved.
**To collect recent historical logs:**
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/ec2-host-logs.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/ec2-host-logs.md
index 943c59f2888..c8fb57ff3e4 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/ec2-host-logs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/ec2-host-logs.md
@@ -70,7 +70,7 @@ curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-da
You should see your instance ID, region, and instance type. If these commands fail, verify:
- The instance metadata service is enabled
-- IMDSv2 is not blocked by security groups or network ACLs
+- IMDSv2 isn't blocked by security groups or network ACLs
- You're running these commands from the EC2 instance itself
:::note
@@ -461,7 +461,7 @@ You can filter dashboard visualizations by EC2 context:
- `host.id:i-0abc123def456` - Logs from specific instance
:::note
-For the demo dataset, set the time range to **2025-11-11 00:00:00 - 2025-11-12 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-11 00:00:00 - 2025-11-12 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
@@ -481,7 +481,7 @@ curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-da
If this fails, verify:
- The instance metadata service is enabled
-- IMDSv2 is not blocked by security groups
+- IMDSv2 isn't blocked by security groups
- You're running the collector on the EC2 instance itself
**Check collector logs for metadata errors:**
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/generic-host-logs.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/generic-host-logs.md
index 9e45b9ceebd..2303f25c596 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/generic-host-logs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/host-logs/generic-host-logs.md
@@ -399,7 +399,7 @@ Key visualizations include:
- Service restart activity
:::note
-For the demo dataset, set the time range to **2025-11-11 00:00:00 - 2025-11-12 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-11 00:00:00 - 2025-11-12 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kafka-metrics.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kafka-metrics.md
index 1ac02036ad5..7b8014c8144 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kafka-metrics.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kafka-metrics.md
@@ -288,7 +288,7 @@ The dashboard will be created with all visualizations pre-configured:
:::note
-For the demo dataset, set the time range to **2025-11-05 16:00:00 - 2025-11-06 16:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-05 16:00:00 - 2025-11-06 16:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kubernetes.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kubernetes.md
index be4ea611a82..92a60f05d53 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kubernetes.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/kubernetes.md
@@ -46,7 +46,7 @@ kubectl create configmap -n=otel-demo otel-config-vars --from-literal=YOUR_OTEL_
### Creating the DaemonSet configuration {#creating-the-daemonset-configuration}
-The DaemonSet will collect logs and metrics from each node in the cluster but will not collect Kubernetes events or cluster-wide metrics.
+The DaemonSet will collect logs and metrics from each node in the cluster but won't collect Kubernetes events or cluster-wide metrics.
Download the DaemonSet manifest:
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/mysql.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/mysql.md
index d59460d25dc..e54a20756a5 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/mysql.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/mysql.md
@@ -373,7 +373,7 @@ The dashboard will be created with all visualizations pre-configured.
:::note
-For the demo dataset, set the time range to **2025-11-14 00:00:00 - 2025-11-15 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-14 00:00:00 - 2025-11-15 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-logs.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-logs.md
index bb67ecca47f..162508e7efc 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-logs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-logs.md
@@ -290,7 +290,7 @@ To help you get started monitoring nginx with ClickStack, we provide essential v
#### The dashboard will be created with all visualizations pre-configured {#created-dashboard}
:::note
-For the demo dataset, set the time range to **2025-10-20 11:00:00 - 2025-10-21 11:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-10-20 11:00:00 - 2025-10-21 11:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-traces.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-traces.md
index b991ca41436..969c13b2503 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-traces.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nginx-traces.md
@@ -273,7 +273,7 @@ To help you get started monitoring traces with ClickStack, we provide essential
#### The dashboard will be created with all visualizations pre-configured. {#created-dashboard}
:::note
-For the demo dataset, set the time range to **2025-10-26 13:00:00 - 2025-10-27 13:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-10-26 13:00:00 - 2025-10-27 13:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nodejs-traces.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nodejs-traces.md
index a4c6a222741..3b7f2805ce2 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nodejs-traces.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/nodejs-traces.md
@@ -191,7 +191,7 @@ To help you get started monitoring Node.js application performance, we provide a
:::note
-For the demo dataset, set the time range to **2025-10-26 13:00:00 - 2025-10-27 13:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-10-26 13:00:00 - 2025-10-27 13:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
@@ -209,7 +209,7 @@ curl -X POST http://localhost:4318/v1/traces \
-d @nodejs-traces-sample.json
```
-This is a known issue that occurs when using the demo approach via curl and does not affect instrumented production applications.
+This is a known issue that occurs when using the demo approach via curl and doesn't affect instrumented production applications.
### No traces appearing in HyperDX {#no-traces}
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-logs.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-logs.md
index 2cfa2364fb4..3f84d6cd848 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-logs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-logs.md
@@ -330,7 +330,7 @@ The dashboard will be created with all visualizations pre-configured:
:::note
-For the demo dataset, set the time range to **2025-11-10 00:00:00 - 2025-11-11 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-10 00:00:00 - 2025-11-11 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-metrics.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-metrics.md
index bc43edd703f..684d8aa5b50 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-metrics.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/postgres-metrics.md
@@ -218,7 +218,7 @@ The dashboard will be created with all visualizations pre-configured:
:::note
-For the demo dataset, set the time range to **2025-11-10 00:00:00 - 2025-11-11 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-11-10 00:00:00 - 2025-11-11 00:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-logs.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-logs.md
index 5cb5823c728..f53eaa73efc 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-logs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-logs.md
@@ -293,7 +293,7 @@ To help you get started monitoring Redis with ClickStack, we provide essential v
#### The dashboard will be created with all visualizations pre-configured {#created-dashboard}
:::note
-For the demo dataset, set the time range to **2025-10-27 10:00:00 - 2025-10-28 10:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-10-27 10:00:00 - 2025-10-28 10:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-metrics.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-metrics.md
index c14e2aefc33..11922e72abd 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-metrics.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/redis-metrics.md
@@ -310,7 +310,7 @@ The dashboard will be created with all visualizations pre-configured:
:::note
-For the demo dataset, set the time range to **2025-10-20 05:00:00 - 2025-10-21 05:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard will not have a time range specified by default.
+For the demo dataset, set the time range to **2025-10-20 05:00:00 - 2025-10-21 05:00:00 (UTC)** (adjust based on your local timezone). The imported dashboard won't have a time range specified by default.
:::
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/systemd.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/systemd.md
index 0e250d49258..5193e1659d8 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/systemd.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/systemd.md
@@ -143,7 +143,7 @@ EOF
#### Deploy with Docker Compose {#deploy-docker-compose}
:::note
-The `journald` receiver requires the `journalctl` binary to read journal files. The official `otel/opentelemetry-collector-contrib` image does not include `journalctl` by default.
+The `journald` receiver requires the `journalctl` binary to read journal files. The official `otel/opentelemetry-collector-contrib` image doesn't include `journalctl` by default.
For containerized deployments, you can either install the collector directly on the host or build a custom image with systemd utilities. See the [troubleshooting section](#journalctl-not-found) for details.
:::
@@ -348,7 +348,7 @@ docker logs otel-collector | grep -i "error\|journald" | tail -20
If you see `exec: "journalctl": executable file not found in $PATH`:
-The `otel/opentelemetry-collector-contrib` image does not include `journalctl`. You can either:
+The `otel/opentelemetry-collector-contrib` image doesn't include `journalctl`. You can either:
1. **Install the collector on the host**:
```bash
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/temporal.md b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/temporal.md
index b634820d45d..1a0a3b58b9f 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/temporal.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/integration-examples/temporal.md
@@ -120,7 +120,7 @@ To enable custom collector configuration in your existing ClickStack deployment,
3. Mount the `temporal.key` file at `/etc/otelcol-contrib/temporal.key`
4. Ensure network connectivity between ClickStack and Temporal
-All commands assume they are executed from the sample directory as where `temporal-metrics.yaml` and `temporal.key` are stored.
+All commands assume they're executed from the sample directory where `temporal-metrics.yaml` and `temporal.key` are stored.
##### Option 1: Docker Compose {#docker-compose}
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/opentelemetry.md b/docs/use-cases/observability/clickstack/ingesting-data/opentelemetry.md
index edf133472d5..4b4019b19d2 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/opentelemetry.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/opentelemetry.md
@@ -99,7 +99,7 @@ The ClickStack OpenTelemetry collector is included in most ClickStack distributi
The ClickStack OTel collector can also be deployed standalone, independent of other components of the stack.
-If you're using the [HyperDX-only](/use-cases/observability/clickstack/deployment/hyperdx-only) distribution, you are responsible for delivering data into ClickHouse yourself. This can be done by:
+If you're using the [HyperDX-only](/use-cases/observability/clickstack/deployment/hyperdx-only) distribution, you're responsible for delivering data into ClickHouse yourself. This can be done by:
- Running your own OpenTelemetry collector and pointing it at ClickHouse - see below.
- Sending directly to ClickHouse using alternative tooling, such as [Vector](https://vector.dev/), [Fluentd](https://www.fluentd.org/) etc, or even the default [OTel contrib collector distribution](https://github.com/open-telemetry/opentelemetry-collector-contrib).
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/schemas.md b/docs/use-cases/observability/clickstack/ingesting-data/schemas.md
index 9da15445b1b..e5859f97179 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/schemas.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/schemas.md
@@ -230,7 +230,7 @@ ORDER BY (ServiceName, MetricName, Attributes, toUnixTimestamp64Nano(TimeUnix))
### Exponential histograms {#exponential-histograms}
:::note
-HyperDX does not support fetching/displaying exponential histogram metrics yet. You may configure them in the metrics source but future support is forthcoming.
+HyperDX doesn't support fetching/displaying exponential histogram metrics yet. You may configure them in the metrics source; support is forthcoming.
:::
```sql
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/sdks/browser.md b/docs/use-cases/observability/clickstack/ingesting-data/sdks/browser.md
index 1416b34826b..9651d1213bf 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/sdks/browser.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/sdks/browser.md
@@ -65,7 +65,7 @@ You can also include and install the script via a script tag as opposed to
installing via NPM. This will expose the `HyperDX` global variable and can be
used in the same way as the NPM package.
-This is recommended if your site is not currently built using a bundler.
+This is recommended if your site isn't currently built using a bundler.
```html
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/sdks/index.md b/docs/use-cases/observability/clickstack/ingesting-data/sdks/index.md
index af34d805536..0d6ecdbd18e 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/sdks/index.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/sdks/index.md
@@ -47,7 +47,7 @@ While ClickStack offers its own language SDKs with enhanced telemetry and featur
## Securing with API key {#securing-api-key}
:::Not required for Managed ClickStack
-The API key is not required for managed ClickStack.
+The API key isn't required for managed ClickStack.
:::
In order to send data to ClickStack via the OTel collector, SDKs will need to specify an ingestion API key. This can either be set using an `init` function in the SDK or an `OTEL_EXPORTER_OTLP_HEADERS` environment variable:
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/sdks/nestjs.md b/docs/use-cases/observability/clickstack/ingesting-data/sdks/nestjs.md
index cbc78e4bf77..61bf0eee424 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/sdks/nestjs.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/sdks/nestjs.md
@@ -72,12 +72,12 @@ export class CatsController {
### Replacing the Nest logger (also for bootstrapping) {#replacing-the-nest-logger}
:::note Important
-By doing this, you give up the dependency injection, meaning that `forRoot` and `forRootAsync` are not needed and shouldn't be used. Remove them from your main module.
+By doing this, you give up the dependency injection, meaning that `forRoot` and `forRootAsync` aren't needed and shouldn't be used. Remove them from your main module.
:::
Using the dependency injection has one minor drawback. Nest has to bootstrap the
application first (instantiating modules and providers, injecting dependencies,
-etc.) and during this process the instance of `HyperDXNestLogger` is not yet
+etc.) and during this process the instance of `HyperDXNestLogger` isn't yet
available, which means that Nest falls back to the internal logger.
One solution is to create the logger outside of the application lifecycle, using
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/sdks/python.md b/docs/use-cases/observability/clickstack/ingesting-data/sdks/python.md
index 55366a4f3e7..d06f95646cc 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/sdks/python.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/sdks/python.md
@@ -74,7 +74,7 @@ Now you can run the application with the OpenTelemetry Python agent (`openteleme
opentelemetry-instrument python app.py
```
-#### If you are using `Gunicorn`, `uWSGI` or `uvicorn` {#using-uvicorn-gunicorn-uwsgi}
+#### If you're using `Gunicorn`, `uWSGI` or `uvicorn` {#using-uvicorn-gunicorn-uwsgi}
In this case, the OpenTelemetry Python agent will require additional changes to work.
@@ -105,7 +105,7 @@ def init_tracing():
-OpenTelemetry [currently does not work](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/385) with `uvicorn` run using the `--reload`
+OpenTelemetry [currently doesn't work](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/385) with `uvicorn` run using the `--reload`
flag or with multi-workers (`--workers`). We recommend disabling those flags while testing, or using Gunicorn.
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/sdks/ruby.md b/docs/use-cases/observability/clickstack/ingesting-data/sdks/ruby.md
index 3805951da74..30d78792dbc 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/sdks/ruby.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/sdks/ruby.md
@@ -40,7 +40,7 @@ bundle add opentelemetry-sdk opentelemetry-instrumentation-all opentelemetry-exp
Next, you'll need to initialize the OpenTelemetry tracing instrumentation
and configure the log message formatter for Rails logger so that logs can be
-tied back to traces automatically. Without the custom formatter, logs will not
+tied back to traces automatically. Without the custom formatter, logs won't
be automatically correlated together in ClickStack.
In `config/initializers` folder, create a file called `hyperdx.rb` and add the
diff --git a/docs/use-cases/observability/clickstack/ingesting-data/vector.md b/docs/use-cases/observability/clickstack/ingesting-data/vector.md
index dad66860036..bb88af3fcf7 100644
--- a/docs/use-cases/observability/clickstack/ingesting-data/vector.md
+++ b/docs/use-cases/observability/clickstack/ingesting-data/vector.md
@@ -26,7 +26,7 @@ import TabItem from '@theme/TabItem';
When using Vector with ClickStack, users are responsible for defining their own schemas. These schemas may follow OpenTelemetry conventions, but they can also be entirely custom, representing user-defined event structures. In practice, Vector ingestion is most commonly used for **logs**, where users want full control over parsing and enrichment before data is written to ClickHouse.
-This guide focuses on onboarding data into ClickStack using Vector for both ClickStack Open Source and Managed ClickStack. For simplicity, it does not cover Vector sources or pipeline configuration in depth. Instead, it focuses on configuring the **sink** that writes data into ClickHouse and ensuring the resulting schema is compatible with ClickStack.
+This guide focuses on onboarding data into ClickStack using Vector for both ClickStack Open Source and Managed ClickStack. For simplicity, it doesn't cover Vector sources or pipeline configuration in depth. Instead, it focuses on configuring the **sink** that writes data into ClickHouse and ensuring the resulting schema is compatible with ClickStack.
The only strict requirement for ClickStack, whether using the open-source or managed deployment, is that the data includes a **timestamp column** (or equivalent time field), which can be declared when configuring the data source in the ClickStack UI.
@@ -102,7 +102,7 @@ sinks:
By default, we recommend using the **`json_each_row`** format, which encodes each event as a single JSON object per row. This is the default and recommended format for ClickStack when ingesting JSON data, and should be preferred over alternative formats such as JSON objects encoded as strings.
-The ClickHouse sink also supports **Arrow stream encoding** (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing is not supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
+The ClickHouse sink also supports **Arrow stream encoding** (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing isn't supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
We recommend reviewing the available sink configuration options in the [Vector documentation](https://vector.dev/docs/reference/configuration/sinks/clickhouse):
@@ -126,7 +126,7 @@ The configuration above assumes an Nginx-style schema with a `time_local` column
We also recommend updating the `Default SELECT` to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
For other possible options, see the [configuration reference](/use-cases/observability/clickstack/config).
@@ -210,7 +210,7 @@ sinks:
By default, we recommend using the **`json_each_row`** format, which encodes each event as a single JSON object per row. This is the default and recommended format for ClickStack when ingesting JSON data, and should be preferred over alternative formats such as JSON objects encoded as strings.
-The ClickHouse sink also supports **Arrow stream encoding** (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing is not supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
+The ClickHouse sink also supports **Arrow stream encoding** (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing isn't supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
We recommend reviewing the available sink configuration options in the [Vector documentation](https://vector.dev/docs/reference/configuration/sinks/clickhouse):
@@ -234,7 +234,7 @@ The configuration above assumes an Nginx-style schema with a `time_local` column
We also recommend updating the `Default SELECT` to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
For other possible options, see the [configuration reference](/use-cases/observability/clickstack/config).
@@ -375,7 +375,7 @@ The configuration assumes the Nginx schema with a `time_local` column used as th
We have also specified the default select to be `time_local, remote_addr, status, request`, which defines which columns are returned in the logs view.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined as the SQL expression:
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined as the SQL expression:
```sql
concat(
@@ -527,7 +527,7 @@ The configuration assumes the Nginx schema with a `time_local` column used as th
We have also specified the default select to be `time_local, remote_addr, status, request`, which defines which columns are returned in the logs view.
-In the example above, a `Body` column does not exist in the data. Instead, it is defined as the SQL expression:
+In the example above, a `Body` column doesn't exist in the data. Instead, it is defined as the SQL expression:
```sql
concat(
diff --git a/docs/use-cases/observability/clickstack/managing/admin.md b/docs/use-cases/observability/clickstack/managing/admin.md
index 723ada4ffb7..add4acc96d0 100644
--- a/docs/use-cases/observability/clickstack/managing/admin.md
+++ b/docs/use-cases/observability/clickstack/managing/admin.md
@@ -9,7 +9,7 @@ keywords: ['clickstack', 'admin']
Most administrative tasks in ClickStack are performed directly on the underlying ClickHouse database. Users deploying ClickStack should be familiar with ClickHouse concepts and administration basics.
-Administrative operations typically involve executing DDL statements. The available options depend on whether you are using Managed ClickStack or ClickStack Open Source.
+Administrative operations typically involve executing DDL statements. The available options depend on whether you're using Managed ClickStack or ClickStack Open Source.
## ClickStack Open Source {#clickstack-oss}
diff --git a/docs/use-cases/observability/clickstack/managing/materialized_views.md b/docs/use-cases/observability/clickstack/managing/materialized_views.md
index dcb42de1662..f230095b93f 100644
--- a/docs/use-cases/observability/clickstack/managing/materialized_views.md
+++ b/docs/use-cases/observability/clickstack/managing/materialized_views.md
@@ -59,7 +59,7 @@ This threshold is expected to increase in future releases.
A single materialized view can compute multiple metrics for different groupings, for example, minimum, maximum, and p95 duration per service name over one-minute buckets. This allows a single view to serve many visualizations rather than just one. Consolidating metrics into shared views is therefore important to maximize the value of each view and ensure it's reused across dashboards and workflows.
:::
-Before proceeding further, you are recommended to familiarize yourself with materialized views in ClickHouse in more depth.
+Before proceeding further, we recommend familiarizing yourself with materialized views in ClickHouse in more depth.
See our guide on [Incremental materialized views](/materialized-view/incremental-materialized-view) for additional details.
## Selecting visualizations for acceleration {#selecting-visualizatons-for-acceleration}
@@ -404,7 +404,7 @@ In summary, backfilling is often not worth the cost and operational risk. It sho
### Backfilling approaches {#backfilling-approaches}
:::note Avoid POPULATE
-Using the [POPULATE](/sql-reference/statements/create/view#materialized-view) command is not recommended for backfilling materialized views for anything other than small datasets where ingest is paused. This operator can miss rows inserted into its source table, with the materialized view created after the populate hash is finished. Furthermore, this populate runs against all data and is vulnerable to interruptions or memory limits on large datasets.
+Using the [POPULATE](/sql-reference/statements/create/view#materialized-view) command isn't recommended for backfilling materialized views for anything other than small datasets where ingest is paused. This command can miss rows inserted into the source table after the materialized view is created but before the populate has finished. Furthermore, the populate runs against all data and is vulnerable to interruptions or memory limits on large datasets.
:::
Suppose you want to backfill a materialized view corresponding to the following aggregation, which computes per-minute metrics grouped by service name and status code:
diff --git a/docs/use-cases/observability/clickstack/managing/performance_tuning.md b/docs/use-cases/observability/clickstack/managing/performance_tuning.md
index ec260384cb8..2d3b0795cb1 100644
--- a/docs/use-cases/observability/clickstack/managing/performance_tuning.md
+++ b/docs/use-cases/observability/clickstack/managing/performance_tuning.md
@@ -25,7 +25,7 @@ The optimizations are presented in a deliberate order, starting with the simples
Before applying any of the optimizations described in this guide, it's important to be familiar with a few core ClickHouse concepts.
-In ClickStack, each **data source maps directly to one or more ClickHouse tables**. When using OpenTelemetry, ClickStack creates and manages a set of default tables that store logs, traces, and metrics data. If you are using custom schemas or managing your own tables, you may already be familiar with these concepts. However, if you are simply sending data via the OpenTelemetry Collector, these tables are created automatically, and are where all optimizations described below will be applied.
+In ClickStack, each **data source maps directly to one or more ClickHouse tables**. When using OpenTelemetry, ClickStack creates and manages a set of default tables that store logs, traces, and metrics data. If you're using custom schemas or managing your own tables, you may already be familiar with these concepts. However, if you're simply sending data via the OpenTelemetry Collector, these tables are created automatically, and are where all optimizations described below will be applied.
| Data type | Table |
|----------------------------------|------------------------------------------------------------------------------------------------------------------------|
@@ -57,7 +57,7 @@ At a minimum, you should understand the following ClickHouse fundamentals:
These concepts are central to ClickHouse performance. They determine how data is written, how it's structured on disk, and how efficiently ClickHouse can skip reading data at query time. Every optimization in this guide, whether materialized columns, skip indexes, primary keys, projections, or materialized views, builds on these core mechanisms.
-You are recommended to review the following ClickHouse documentation before undertaking any tuning:
+We recommend reviewing the following ClickHouse documentation before undertaking any tuning:
- [Creating tables in ClickHouse](/guides/creating-tables) - A simple introduction to tables.
- [Parts](/parts)
@@ -526,7 +526,7 @@ On identifying the subset of columns for the ordering key, they must be declared
### Changing the primary key {#changing-the-primary-key}
-If you are confident of your access patterns prior to data ingestion, simply drop and re-create the table for the relevant data type.
+If you're confident of your access patterns prior to data ingestion, simply drop and re-create the table for the relevant data type.
The example below shows a simple way to create a new logs table with the existing schema, but with a new primary key that includes the column `SeverityText` before the `ServiceName`.
@@ -743,7 +743,7 @@ ORDER BY t;
Queries that don't constrain `TraceId`, or that primarily filter on other dimensions that aren't leading in the projection’s ordering key, typically won't benefit (and may read via the base layout instead).
:::note
-Projections can also store aggregations (similar to materialized views). In ClickStack, projection-based aggregations are not generally recommended because selection depends on the ClickHouse analyzer, and usage can be harder to control and reason about. Instead, prefer explicit materialized views that ClickStack can register and select intentionally at the application layer.
+Projections can also store aggregations (similar to materialized views). In ClickStack, projection-based aggregations aren't generally recommended because selection depends on the ClickHouse analyzer, and usage can be harder to control and reason about. Instead, prefer explicit materialized views that ClickStack can register and select intentionally at the application layer.
:::
In practice, projections are best suited for workflows where you frequently pivot from a broader search to a trace-centric drill down (for example, fetching all spans for a specific TraceId).
@@ -764,7 +764,7 @@ For a deeper background, see:
:::note[Lightweight projections are Beta for ClickStack]
-`_part_offset-based` lightweight projections are not recommended for ClickStack workloads. While they reduce storage and write I/O, they can introduce more random access at query time, and their production behavior at the observability scale is still being evaluated. This recommendation may change as the feature matures and we gain more operational data.
+`_part_offset`-based lightweight projections aren't recommended for ClickStack workloads. While they reduce storage and write I/O, they can introduce more random access at query time, and their production behavior at observability scale is still being evaluated. This recommendation may change as the feature matures and we gain more operational data.
:::
Newer ClickHouse versions also support more lightweight projections that store only the projection sorting key plus a `_part_offset` pointer into the base table, rather than duplicating full rows. This can greatly reduce storage overhead, and recent improvements enable granule-level pruning, making them behave more like true secondary indexes. See:
diff --git a/docs/use-cases/observability/clickstack/managing/production.md b/docs/use-cases/observability/clickstack/managing/production.md
index 82cd284d847..97fdd9d2d38 100644
--- a/docs/use-cases/observability/clickstack/managing/production.md
+++ b/docs/use-cases/observability/clickstack/managing/production.md
@@ -36,7 +36,7 @@ For production deployments, [Managed ClickStack](/use-cases/observability/clicks
### Secure ingestion {#secure-ingestion-managed}
-By default, the ClickStack OpenTelemetry Collector is not secured when deployed outside of the Open Source distributions and does not require authentication on its OTLP ports.
+By default, the ClickStack OpenTelemetry Collector isn't secured when deployed outside of the Open Source distributions and doesn't require authentication on its OTLP ports.
To secure ingestion, specify an authentication token when deploying the collector using the `OTLP_AUTH_TOKEN` environment variable. See ["Securing the collector"](/use-cases/observability/clickstack/ingesting-data/otel-collector#securing-the-collector) for further details.
@@ -57,7 +57,7 @@ These recommendations are based on the following assumptions:
- Data volume refers to **uncompressed ingest volume** per month and applies to both logs and traces.
- Query patterns are typical for observability use cases, with most queries targeting **recent data**, usually the last 24 hours.
- Ingestion is relatively **uniform across the month**. If you expect bursty traffic or spikes, you should provision additional headroom.
-- Storage is handled separately via ClickHouse Cloud object storage and is not a limiting factor for retention. We assume data retained for longer periods is infrequently accessed.
+- Storage is handled separately via ClickHouse Cloud object storage and isn't a limiting factor for retention. We assume data retained for longer periods is infrequently accessed.
More compute may be required for access patterns that regularly query longer time ranges, perform heavy aggregations, or support a high number of concurrent users.
@@ -77,7 +77,7 @@ These values are **estimates only** and should be used as an initial baseline. A
#### Isolating observability workloads {#isolating-workloads}
-If you are adding ClickStack to an **existing ClickHouse Cloud service** that already supports other workloads, such as real-time application analytics, isolating observability traffic is strongly recommended.
+If you're adding ClickStack to an **existing ClickHouse Cloud service** that already supports other workloads, such as real-time application analytics, isolating observability traffic is strongly recommended.
Use [**Managed Warehouses**](/cloud/reference/warehouses) to create a **child service** dedicated to ClickStack. This allows you to:
@@ -172,7 +172,7 @@ Users managing their own ClickHouse instance should adhere to the following best
#### Security best practices {#self-managed-security}
-If you are managing your own ClickHouse instance, it's essential to enable **TLS**, enforce authentication, and follow best practices for hardening access. See [this blog post](https://www.wiz.io/blog/clickhouse-and-wiz) for context on real-world misconfigurations and how to avoid them.
+If you're managing your own ClickHouse instance, it's essential to enable **TLS**, enforce authentication, and follow best practices for hardening access. See [this blog post](https://www.wiz.io/blog/clickhouse-and-wiz) for context on real-world misconfigurations and how to avoid them.
ClickHouse OSS provides robust security features out of the box. However, these require configuration:
@@ -198,7 +198,7 @@ The ClickHouse user for the ClickStack UI only needs to be a `readonly` user wit
- `cancel_http_readonly_queries_on_client_close`
- `wait_end_of_query`
-By default, the `default` user in both OSS and ClickHouse Cloud will have these permissions available however you are recommended to create a new user with these permissions.
+By default, the `default` user in both OSS and ClickHouse Cloud will have these permissions available; however, we recommend creating a new user with these permissions.
### Configure Time To Live (TTL) {#configure-ttl}
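Retention in ClickStack is typically enforced with ClickHouse table TTLs. A hedged sketch is shown below; the `otel_logs` table and `TimestampTime` column are the defaults created by the ClickStack OTel collector, so adjust both names if you use custom schemas:

```sql
-- Drop log rows older than 30 days; adapt the table and column to your schema.
ALTER TABLE otel_logs
    MODIFY TTL TimestampTime + INTERVAL 30 DAY;
```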
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/concepts.md b/docs/use-cases/observability/clickstack/migration/elastic/concepts.md
index 080a4b98af3..bf8b8c6719a 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/concepts.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/concepts.md
@@ -62,7 +62,7 @@ Elasticsearch is known for its schema flexibility through [dynamic mappings](htt
ClickHouse's approach to type flexibility is more transparent and controlled. Unlike Elasticsearch, where type conflicts can cause ingestion errors, ClickHouse allows mixed-type data in [`Variant`](/sql-reference/data-types/variant) columns and supports schema evolution through the use of the [`JSON`](/integrations/data-formats/json/overview) type.
-If not using [`JSON`](/integrations/data-formats/json/overview), the schema is statically-defined. If values are not provided for a row, they will either be defined as [`Nullable`](/sql-reference/data-types/nullable) (not used in ClickStack) or revert to the default value for the type e.g. empty value for `String`.
+If not using [`JSON`](/integrations/data-formats/json/overview), the schema is statically defined. If values aren't provided for a row, they will either be stored as [`Nullable`](/sql-reference/data-types/nullable) (not used in ClickStack) or revert to the default value for the type, e.g. an empty value for `String`.
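The default-value behavior can be illustrated with a small example (the table and column names are illustrative):

```sql
CREATE TABLE example (s String, n Nullable(String)) ENGINE = Memory;
-- Insert a row that provides no value for `s`:
INSERT INTO example (n) VALUES (NULL);
-- `s` reverts to the type's default (an empty string), while `n` stays NULL:
SELECT s = '' AS s_is_default, isNull(n) AS n_is_null FROM example;
```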
### Ingestion and transformation {#ingestion-and-transformation}
@@ -96,7 +96,7 @@ Elasticsearch recommends sizing shards to around [50 GB or 200 million documents
Elasticsearch indexes all fields into [**inverted indices**](https://www.elastic.co/docs/manage-data/data-store/index-basics) for fast search, optionally using [**doc values**](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/doc-values) for aggregations, sorting and scripted field access. Numeric and geo fields use [Block K-D trees](https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf) for searches on geospatial data and numeric and date ranges.
-Importantly, Elasticsearch stores the full original document in [`_source`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field) (compressed with `LZ4`, `Deflate` or `ZSTD`), while ClickHouse does not store a separate document representation. Data is reconstructed from columns at query time, saving storage space. This same capability is possible for Elasticsearch using [Synthetic `_source`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#synthetic-source), with some [restrictions](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#synthetic-source-restrictions). Disabling of `_source` also has [implications](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#include-exclude) which don't apply to ClickHouse.
+Importantly, Elasticsearch stores the full original document in [`_source`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field) (compressed with `LZ4`, `Deflate` or `ZSTD`), while ClickHouse doesn't store a separate document representation. Data is reconstructed from columns at query time, saving storage space. This same capability is possible for Elasticsearch using [Synthetic `_source`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#synthetic-source), with some [restrictions](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#synthetic-source-restrictions). Disabling of `_source` also has [implications](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#include-exclude) which don't apply to ClickHouse.
In Elasticsearch, [index mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) (equivalent to table schemas in ClickHouse) control the type of fields and the data structures used for this persistence and querying.
@@ -104,9 +104,9 @@ ClickHouse, by contrast, is **column-oriented** — every column is stored indep
-ClickHouse also supports [**skip indexes**](/optimize/skipping-indexes), which accelerate filtering by precomputing index data for selected columns. These must be explicitly defined but can significantly improve performance. Additionally, ClickHouse lets you specify [compression codecs](/use-cases/observability/schema-design#using-codecs) and compression algorithms per column — something Elasticsearch does not support (its [compression](https://www.elastic.co/docs/reference/elasticsearch/index-settings/index-modules) only applies to `_source` JSON storage).
+ClickHouse also supports [**skip indexes**](/optimize/skipping-indexes), which accelerate filtering by precomputing index data for selected columns. These must be explicitly defined but can significantly improve performance. Additionally, ClickHouse lets you specify [compression codecs](/use-cases/observability/schema-design#using-codecs) and compression algorithms per column — something Elasticsearch doesn't support (its [compression](https://www.elastic.co/docs/reference/elasticsearch/index-settings/index-modules) only applies to `_source` JSON storage).
-ClickHouse also supports sharding, but its model is designed to favor **vertical scaling**. A single shard can store **trillions of rows** and continues to perform efficiently as long as memory, CPU, and disk permit. Unlike Elasticsearch, there is **no hard row limit** per shard. Shards in ClickHouse are logical — effectively individual tables — and do not require partitioning unless the dataset exceeds the capacity of a single node. This typically occurs due to disk size constraints, with sharding ① introduced only when horizontal scale-out is necessary - reducing complexity and overhead. In this case, similar to Elasticsearch, a shard will hold a subset of the data. The data within a single shard is organized as a collection of ② immutable data parts containing ③ several data structures.
+ClickHouse also supports sharding, but its model is designed to favor **vertical scaling**. A single shard can store **trillions of rows** and continues to perform efficiently as long as memory, CPU, and disk permit. Unlike Elasticsearch, there is **no hard row limit** per shard. Shards in ClickHouse are logical — effectively individual tables — and don't require partitioning unless the dataset exceeds the capacity of a single node. This typically occurs due to disk size constraints, with sharding ① introduced only when horizontal scale-out is necessary, reducing complexity and overhead. In this case, similar to Elasticsearch, a shard will hold a subset of the data. The data within a single shard is organized as a collection of ② immutable data parts containing ③ several data structures.
Processing within a ClickHouse shard is **fully parallelized**, and users are encouraged to scale vertically to avoid the network costs associated with moving data across nodes.
@@ -135,7 +135,7 @@ Ultimately, ClickHouse favors simplicity and performance at scale by minimizing
### Deduplication and routing {#deduplication-and-routing}
-Elasticsearch de-duplicates documents based on their `_id`, routing them to shards accordingly. ClickHouse does not store a default row identifier but supports **insert-time deduplication**, allowing users to retry failed inserts safely. For more control, `ReplacingMergeTree` and other table engines enable deduplication by specific columns.
+Elasticsearch de-duplicates documents based on their `_id`, routing them to shards accordingly. ClickHouse doesn't store a default row identifier but supports **insert-time deduplication**, allowing users to retry failed inserts safely. For more control, `ReplacingMergeTree` and other table engines enable deduplication by specific columns.
Index routing in Elasticsearch ensures specific documents are always routed to specific shards. In ClickHouse, you can define **shard keys** or use `Distributed` tables to achieve similar data locality.
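Column-based deduplication with `ReplacingMergeTree` can be sketched as follows (table and column names are illustrative). Rows sharing the same `ORDER BY` key are collapsed during background merges, keeping the row with the highest version column:

```sql
CREATE TABLE events
(
    id String,
    updated_at DateTime,
    payload String
)
ENGINE = ReplacingMergeTree(updated_at)  -- keep the row with the latest updated_at per id
ORDER BY id;
```

Note that deduplication only happens eventually, during merges; querying with `SELECT ... FINAL` forces it at read time.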
@@ -187,7 +187,7 @@ Elasticsearch and ClickHouse take fundamentally different approaches to managing
#### Index lifecycle management vs native TTL {#lifecycle-vs-ttl}
-In Elasticsearch, long-term data management is handled through **Index Lifecycle Management (ILM)** and **Data Streams**. These features allow you to define policies that govern when indices are rolled over (e.g. after reaching a certain size or age), when older indices are moved to lower-cost storage (e.g. warm or cold tiers), and when they are ultimately deleted. This is necessary because Elasticsearch does **not support re-sharding**, and shards cannot grow indefinitely without performance degradation. To manage shard sizes and support efficient deletion, new indices must be created periodically and old ones removed — effectively rotating data at the index level.
+In Elasticsearch, long-term data management is handled through **Index Lifecycle Management (ILM)** and **Data Streams**. These features allow you to define policies that govern when indices are rolled over (e.g. after reaching a certain size or age), when older indices are moved to lower-cost storage (e.g. warm or cold tiers), and when they're ultimately deleted. This is necessary because Elasticsearch does **not support re-sharding**, and shards can't grow indefinitely without performance degradation. To manage shard sizes and support efficient deletion, new indices must be created periodically and old ones removed — effectively rotating data at the index level.
ClickHouse takes a different approach. Data is typically stored in a **single table** and managed using **TTL (time-to-live) expressions** at the column or partition level. Data can be **partitioned by date**, allowing efficient deletion without the need to create new tables or perform index rollovers. As data ages and meets the TTL condition, ClickHouse will automatically remove it — no additional infrastructure is required to manage rotation.
@@ -239,7 +239,7 @@ This model is particularly powerful for observability use cases where you need t
ClickHouse and Elasticsearch take fundamentally different approaches to lakehouse integration. ClickHouse is a fully-fledged query execution engine capable of executing queries over lakehouse formats such as [Iceberg](/sql-reference/table-functions/iceberg) and [Delta Lake](/sql-reference/table-functions/deltalake), as well as integrating with data lake catalogs such as [AWS Glue](/use-cases/data-lake/glue-catalog) and [Unity catalog](/use-cases/data-lake/unity-catalog). These formats rely on efficient querying of [Parquet](/interfaces/formats/Parquet) files, which ClickHouse fully supports. ClickHouse can read both Iceberg and Delta Lake tables directly, enabling seamless integration with modern data lake architectures.
-In contrast, Elasticsearch is tightly coupled to its internal data format and Lucene-based storage engine. It cannot directly query lakehouse formats or Parquet files, limiting its ability to participate in modern data lake architectures. Elasticsearch requires data to be transformed and loaded into its proprietary format before it can be queried.
+In contrast, Elasticsearch is tightly coupled to its internal data format and Lucene-based storage engine. It can't directly query lakehouse formats or Parquet files, limiting its ability to participate in modern data lake architectures. Elasticsearch requires data to be transformed and loaded into its proprietary format before it can be queried.
ClickHouse's lakehouse capabilities extend beyond just reading data:
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/intro.md b/docs/use-cases/observability/clickstack/migration/elastic/intro.md
index 431c0e59a4f..7bbbc1d463f 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/intro.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/intro.md
@@ -19,14 +19,14 @@ Before beginning a migration, it's important to understand the tradeoffs between
You should consider moving to ClickStack if:
-- You are ingesting large volumes of observability data and find Elastic cost-prohibitive due to inefficient compression and poor resource utilization. ClickStack can reduce storage and compute costs significantly — offering at least 10x compression on raw data.
+- You're ingesting large volumes of observability data and find Elastic cost-prohibitive due to inefficient compression and poor resource utilization. ClickStack can reduce storage and compute costs significantly — offering at least 10x compression on raw data.
- You experience poor search performance at scale or face ingestion bottlenecks.
- You want to correlate observability signals with business data using SQL, unifying observability and analytics workflows.
-- You are committed to OpenTelemetry and want to avoid vendor lock-in.
+- You're committed to OpenTelemetry and want to avoid vendor lock-in.
- You want to take advantage of the separation of storage and compute in ClickHouse Cloud, enabling virtually unlimited scale — paying only for ingestion compute and object storage during idle periods.
However, ClickStack may not be suitable if:
- You use observability data primarily for security use cases and need a SIEM-focused product.
- Universal profiling is a critical part of your workflow.
-- You require a business intelligence (BI) dashboarding platform. ClickStack intentionally has opinionated visual workflows for SREs and developers and is not designed as a Business Intelligence (BI) tool. For equivalent capabilities, we recommend using [Grafana with the ClickHouse plugin](/integrations/grafana) or [Superset](/integrations/superset).
+- You require a business intelligence (BI) dashboarding platform. ClickStack intentionally has opinionated visual workflows for SREs and developers and isn't designed as a Business Intelligence (BI) tool. For equivalent capabilities, we recommend using [Grafana with the ClickHouse plugin](/integrations/grafana) or [Superset](/integrations/superset).
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/migrating-agents.md b/docs/use-cases/observability/clickstack/migration/elastic/migrating-agents.md
index e9654091200..29d477120ad 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/migrating-agents.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/migrating-agents.md
@@ -23,7 +23,7 @@ The Elastic Stack provides a number of Observability data collection agents. Spe
- The [Beats family](https://www.elastic.co/beats) - such as [Filebeat](https://www.elastic.co/beats/filebeat), [Metricbeat](https://www.elastic.co/beats/metricbeat), and [Packetbeat](https://www.elastic.co/beats/packetbeat) - all based on the `libbeat` library. These Beats support [sending data to Elasticsearch, Kafka, Redis, or Logstash](https://www.elastic.co/docs/reference/beats/filebeat/configuring-output) over the Lumberjack protocol.
- The [`Elastic Agent`](https://www.elastic.co/elastic-agent) provides a unified agent capable of collecting logs, metrics, and traces. This agent can be centrally managed via the [Elastic Fleet Server](https://www.elastic.co/docs/reference/fleet/manage-elastic-agents-in-fleet) and supports output to Elasticsearch, Logstash, Kafka, or Redis.
-- Elastic also provides a distribution of the [OpenTelemetry Collector - EDOT](https://www.elastic.co/docs/reference/opentelemetry). While it currently cannot be orchestrated by the Fleet Server, it offers a more flexible and open path if you're migrating to ClickStack.
+- Elastic also provides a distribution of the [OpenTelemetry Collector - EDOT](https://www.elastic.co/docs/reference/opentelemetry). While it currently can't be orchestrated by the Fleet Server, it offers a more flexible and open path if you're migrating to ClickStack.
The best migration path depends on the agent(s) currently in use. In the sections that follow, we document migration options for each major agent type. Our goal is to minimize friction and, where possible, allow you to continue using your existing agents during the transition.
@@ -368,7 +368,7 @@ sources:
The Elastic Agent includes an embedded EDOT Collector that allows you to instrument your applications and infrastructure once and send data to multiple vendors and backends.
:::note Agent integrations and orchestration
-Users running the EDOT collector distributed with Elastic Agent will not be able to exploit the [existing integrations offered by the agent](https://www.elastic.co/docs/reference/fleet/manage-integrations). Additionally, the collector cannot be centrally managed by Fleet - forcing the user to run the [agent in standalone mode](https://www.elastic.co/docs/reference/fleet/configure-standalone-elastic-agents), managing configuration themselves.
+Users running the EDOT collector distributed with Elastic Agent won't be able to exploit the [existing integrations offered by the agent](https://www.elastic.co/docs/reference/fleet/manage-integrations). Additionally, the collector can't be centrally managed by Fleet - forcing the user to run the [agent in standalone mode](https://www.elastic.co/docs/reference/fleet/configure-standalone-elastic-agents), managing configuration themselves.
:::
To run the Elastic Agent with the EDOT collector, see the [official Elastic guide](https://www.elastic.co/docs/reference/fleet/otel-agent-transform). Rather than configuring the Elastic endpoint, as indicated in the guide, remove existing `exporters` and configure the OTLP output - sending data to the ClickStack OpenTelemetry collector. For example, the configuration for the exporters becomes:
@@ -385,7 +385,7 @@ exporters:
```
:::note Managed ClickStack
-By default, an API ingestion key is not required if running an OpenTelemetry Collector standalone for Managed ClickStack. Ingestion can be secured however, by specifying an OTLP auth token. See ["Securing the collector"](/use-cases/observability/clickstack/ingesting-data/otel-collector#securing-the-collector).
+By default, an API ingestion key isn't required if running an OpenTelemetry Collector standalone for Managed ClickStack. Ingestion can, however, be secured by specifying an OTLP auth token. See ["Securing the collector"](/use-cases/observability/clickstack/ingesting-data/otel-collector#securing-the-collector).
:::
The `YOUR_INGESTION_API_KEY` here is produced by ClickStack. You can find the key in the ClickStack UI under `Team Settings → API Keys`.
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/migrating-data.md b/docs/use-cases/observability/clickstack/migration/elastic/migrating-data.md
index d3e3cfb85ef..f9d710dfea8 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/migrating-data.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/migrating-data.md
@@ -20,7 +20,7 @@ When migrating from Elastic to ClickStack for observability use cases, we recomm
3. **Simplified migration**: no need for complex data transfer tools or processes to move historical data between systems.
:::note Migrating data
-We demonstrate an approach for migrating essential data from Elasticsearch to ClickHouse in the section ["Migrating data"](#migrating-data). This should not be used for larger datasets as it is rarely performant - limited by the ability for Elasticsearch to export efficiently, with only JSON format supported.
+We demonstrate an approach for migrating essential data from Elasticsearch to ClickHouse in the section ["Migrating data"](#migrating-data). This shouldn't be used for larger datasets, as it is rarely performant - limited by Elasticsearch's ability to export data efficiently, with only the JSON format supported.
:::
### Implementation steps {#implementation-steps}
@@ -573,7 +573,7 @@ Where possible, we recommend running both ClickHouse, Elasticsearch, and `elasti
### Install ClickHouse client {#install-clickhouse-client}
-Ensure ClickHouse is [installed on the server](/install) on which `elasticdump` is located. **Do not start a ClickHouse server** - these steps only require the client.
+Ensure ClickHouse is [installed on the server](/install) on which `elasticdump` is located. **Don't start a ClickHouse server** - these steps only require the client.
### Stream data {#stream-data}
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/migrating-sdks.md b/docs/use-cases/observability/clickstack/migration/elastic/migrating-sdks.md
index a27d7a4a9c8..b925987e367 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/migrating-sdks.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/migrating-sdks.md
@@ -18,10 +18,10 @@ The Elastic Stack provides two types of language SDKs for instrumenting applicat
1. **[Elastic Official APM agents](https://www.elastic.co/docs/reference/apm-agents/)** – These are built specifically for use with the Elastic Stack. There is currently no direct migration path for these SDKs. Applications using them will need to be re-instrumented using the corresponding [ClickStack SDKs](/use-cases/observability/clickstack/sdks).
-2. **[Elastic Distributions of OpenTelemetry (EDOT SDKs)](https://www.elastic.co/docs/reference/opentelemetry/edot-sdks/)** – These are Elastic's distributions of the standard OpenTelemetry SDKs, available for .NET, Java, Node.js, PHP, and Python. If your application is already using an EDOT SDK, you do not need to re-instrument your code. Instead, you can simply reconfigure the SDK to export telemetry data to the OTLP Collector included in ClickStack. See ["Migrating EDOT SDKs"](#migrating-edot-sdks) for further details.
+2. **[Elastic Distributions of OpenTelemetry (EDOT SDKs)](https://www.elastic.co/docs/reference/opentelemetry/edot-sdks/)** – These are Elastic's distributions of the standard OpenTelemetry SDKs, available for .NET, Java, Node.js, PHP, and Python. If your application is already using an EDOT SDK, you don't need to re-instrument your code. Instead, you can simply reconfigure the SDK to export telemetry data to the OTLP Collector included in ClickStack. See ["Migrating EDOT SDKs"](#migrating-edot-sdks) for further details.
:::note Use ClickStack SDKs where possible
-While standard OpenTelemetry SDKs are supported, we strongly recommend using the [**ClickStack-distributed SDKs**](/use-cases/observability/clickstack/sdks) for each language. These distributions include additional instrumentation, enhanced defaults, and custom extensions designed to work seamlessly with the ClickStack pipeline and UI. By using the ClickStack SDKs, you can unlock advanced features such as exception stack traces that are not available with vanilla OpenTelemetry or EDOT SDKs.
+While standard OpenTelemetry SDKs are supported, we strongly recommend using the [**ClickStack-distributed SDKs**](/use-cases/observability/clickstack/sdks) for each language. These distributions include additional instrumentation, enhanced defaults, and custom extensions designed to work seamlessly with the ClickStack pipeline and UI. By using the ClickStack SDKs, you can unlock advanced features such as exception stack traces that aren't available with vanilla OpenTelemetry or EDOT SDKs.
:::
## Migrating EDOT SDKs {#migrating-edot-sdks}
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/search.md b/docs/use-cases/observability/clickstack/migration/elastic/search.md
index fad3e5654f2..8abfc4a7f16 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/search.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/search.md
@@ -16,7 +16,7 @@ import hyperdx_sql from '@site/static/images/use-cases/observability/hyperdx-sql
## Search in ClickStack and Elastic {#search-in-clickstack-and-elastic}
-ClickHouse is a SQL-native engine, designed from the ground up for high-performance analytical workloads. In contrast, Elasticsearch provides a SQL-like interface, transpiling SQL into the underlying Elasticsearch query DSL — meaning it is not a first-class citizen, and [feature parity](https://www.elastic.co/docs/explore-analyze/query-filter/languages/sql-limitations) is limited.
+ClickHouse is a SQL-native engine, designed from the ground up for high-performance analytical workloads. In contrast, Elasticsearch provides a SQL-like interface, transpiling SQL into the underlying Elasticsearch query DSL — meaning it isn't a first-class citizen, and [feature parity](https://www.elastic.co/docs/explore-analyze/query-filter/languages/sql-limitations) is limited.
ClickHouse not only supports full SQL but extends it with a range of observability-focused functions, such as [`argMax`](/sql-reference/aggregate-functions/reference/argmax), [`histogram`](/sql-reference/aggregate-functions/parametric-functions#histogram), and [`quantileTiming`](/sql-reference/aggregate-functions/reference/quantiletiming), that simplify querying structured logs, metrics, and traces.
@@ -44,11 +44,11 @@ Both ClickStack and Elasticsearch provide flexible query languages to enable int
| AND conditions | `error AND db` | `error AND db` | Both translate to intersection; no difference in user syntax. |
| Negation | `NOT error` or `-error` | `NOT error` or `-error` | Supported identically; ClickStack converts to SQL `NOT ILIKE`. |
| Grouping | `(error OR fail) AND db` | `(error OR fail) AND db` | Standard Boolean grouping in both. |
-| Wildcards | `error*` or `*fail*` | `error*`, `*fail*` | ClickStack supports leading/trailing wildcards; ES disables leading wildcards by default for perf. Wildcards within terms are not supported, e.g., `f*ail.` Wildcards must be applied with a field match.|
-| Ranges (numeric/date) | `duration:[100 TO 200]` | `duration:[100 TO 200]` | ClickStack uses SQL `BETWEEN`; Elasticsearch expands to range queries. Unbounded `*` in ranges are not supported e.g. `duration:[100 TO *]`. If needed use `Unbounded ranges` below.|
+| Wildcards | `error*` or `*fail*` | `error*`, `*fail*` | ClickStack supports leading/trailing wildcards; ES disables leading wildcards by default for perf. Wildcards within terms aren't supported, e.g., `f*ail`. Wildcards must be applied with a field match.|
+| Ranges (numeric/date) | `duration:[100 TO 200]` | `duration:[100 TO 200]` | ClickStack uses SQL `BETWEEN`; Elasticsearch expands to range queries. An unbounded `*` in ranges isn't supported, e.g. `duration:[100 TO *]`. If needed, use `Unbounded ranges` below.|
| Unbounded ranges (numeric/date) | `duration:>10` or `duration:>=10` | `duration:>10` or `duration:>=10` | ClickStack uses standard SQL operators|
-| Inclusive/exclusive | `duration:{100 TO 200}` (exclusive) | Same | Curly brackets denote exclusive bounds. `*` in ranges are not supported. e.g. `duration:[100 TO *]`|
-| Exists check | N/A | `_exists_:user` or `field:*` | `_exists_` is not supported. Use `LogAttributes.log.file.path: *` for `Map` columns e.g. `LogAttributes`. For root columns, these have to exist and will have a default value if not included in the event. To search for default values or missing columns use the same syntax as Elasticsearch ` ServiceName:*` or `ServiceName != ''`. |
+| Inclusive/exclusive | `duration:{100 TO 200}` (exclusive) | Same | Curly brackets denote exclusive bounds. `*` in ranges isn't supported, e.g. `duration:[100 TO *]`.|
+| Exists check | N/A | `_exists_:user` or `field:*` | `_exists_` isn't supported. Use `LogAttributes.log.file.path: *` for `Map` columns, e.g. `LogAttributes`. Root columns always exist and are given a default value if not included in the event. To search for default values or missing columns, use the same syntax as Elasticsearch: `ServiceName:*` or `ServiceName != ''`. |
| Regex | `match` function | `name:/joh?n(ath[oa]n)/` | Not currently supported in Lucene syntax. You can use SQL and the [`match`](/sql-reference/functions/string-search-functions#match) function or other [string search functions](/sql-reference/functions/string-search-functions).|
| Fuzzy match | `editDistance('quikc', field) = 1` | `quikc~` | Not currently supported in Lucene syntax. Distance functions can be used in SQL e.g. `editDistance('rror', SeverityText) = 1` or [other similarity functions](/sql-reference/functions/string-functions#jaroSimilarity). |
| Proximity search | Not supported | `"fox quick"~5` | Not currently supported in Lucene syntax. |
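For instance, the regex and fuzzy-match rows above can be expressed in SQL mode along these lines (the `otel_logs` table and `Body`/`SeverityText` columns follow the default ClickStack schema used elsewhere in this guide):

```sql
-- Regex match against the log body (Lucene regex syntax isn't supported)
SELECT Timestamp, Body
FROM otel_logs
WHERE match(Body, 'joh?n(ath[oa]n)');

-- Fuzzy match: severity text within edit distance 1 of 'rror'
SELECT count()
FROM otel_logs
WHERE editDistance('rror', SeverityText) = 1;
```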
@@ -58,14 +58,14 @@ Both ClickStack and Elasticsearch provide flexible query languages to enable int
## Exists/missing differences {#empty-value-differences}
-Unlike Elasticsearch, where a field can be entirely omitted from an event and therefore truly "not exist," ClickHouse requires all columns in a table schema to exist. If a field is not provided in an insert event:
+Unlike Elasticsearch, where a field can be entirely omitted from an event and therefore truly "not exist," ClickHouse requires all columns in a table schema to exist. If a field isn't provided in an insert event:
- For [`Nullable`](/sql-reference/data-types/nullable) fields, it will be set to `NULL`.
- For non-nullable fields (the default), it will be populated with a default value (often an empty string, 0, or equivalent).
In ClickStack, we use the latter as [`Nullable`](/sql-reference/data-types/nullable) is [not recommended](/optimize/avoid-nullable-columns).
-This behavior means that checking whether a field "exists”" in the Elasticsearch sense is not directly supported.
+This behavior means that checking whether a field "exists" in the Elasticsearch sense isn't directly supported.
Instead, you can use `field:*` or `field != ''` to check for the presence of a non-empty value. It is thus not possible to distinguish between truly missing and explicitly empty fields.
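These presence checks translate to SQL along the following lines (a sketch against the default ClickStack schema; the attribute key is illustrative):

```sql
-- Map column: key present with a non-empty value
SELECT count()
FROM otel_logs
WHERE LogAttributes['log.file.path'] != '';

-- Root column: non-default (non-empty) value
SELECT count()
FROM otel_logs
WHERE ServiceName != '';
```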
diff --git a/docs/use-cases/observability/clickstack/migration/elastic/types.md b/docs/use-cases/observability/clickstack/migration/elastic/types.md
index fbe1cbc1e92..08529b9a287 100644
--- a/docs/use-cases/observability/clickstack/migration/elastic/types.md
+++ b/docs/use-cases/observability/clickstack/migration/elastic/types.md
@@ -25,7 +25,7 @@ Elasticsearch and ClickHouse support a wide variety of data types, but their und
| `unsigned_long` | [`UInt64`](/sql-reference/data-types/int-uint) | Unsigned 64-bit integer. |
| `double` | [`Float64`](/sql-reference/data-types/float) | 64-bit floating-point. |
| `float` | [`Float32`](/sql-reference/data-types/float) | 32-bit floating-point. |
-| `half_float` | [`Float32`](/sql-reference/data-types/float) or [`BFloat16`](/sql-reference/data-types/float) | Closest equivalent. ClickHouse does not have a 16-bit float. ClickHouse has a `BFloat16`- this is different from Half-float IEE-754: half-float offers higher precision with a smaller range, while bfloat16 sacrifices precision for a wider range, making it better suited for machine learning workloads. |
+| `half_float` | [`Float32`](/sql-reference/data-types/float) or [`BFloat16`](/sql-reference/data-types/float) | Closest equivalent. ClickHouse doesn't have a 16-bit float. ClickHouse has a `BFloat16` - this is different from IEEE 754 half-float: half-float offers higher precision with a smaller range, while bfloat16 sacrifices precision for a wider range, making it better suited for machine learning workloads. |
| `scaled_float` | [`Decimal(x, y)`](/sql-reference/data-types/decimal) | Store fixed-point numeric values. |
| `date` | [`DateTime`](/sql-reference/data-types/datetime) | Equivalent date types with second precision. |
| `date_nanos` | [`DateTime64`](/sql-reference/data-types/datetime64) | ClickHouse supports nanosecond precision with `DateTime64(9)`. |
@@ -49,7 +49,7 @@ Elasticsearch and ClickHouse support a wide variety of data types, but their und
| `geo_point` | [`Tuple(Float64, Float64)`](/sql-reference/data-types/tuple) or [`Point`](/sql-reference/data-types/geo#point) | Use tuple of (latitude, longitude). [`Point`](/sql-reference/data-types/geo#point) is available as a ClickHouse type. |
| `geo_shape`, `shape` | [`Ring`](/sql-reference/data-types/geo#ring), [`LineString`](/sql-reference/data-types/geo#linestring), [`MultiLineString`](/sql-reference/data-types/geo#multilinestring), [`Polygon`](/sql-reference/data-types/geo#polygon), [`MultiPolygon`](/sql-reference/data-types/geo#multipolygon) | Native support for geo shapes and spatial indexing. |
| `percolator` | NA | No concept of indexing queries. Use standard SQL + Incremental Materialized Views instead. |
-| `version` | [`String`](/sql-reference/data-types/string) | ClickHouse does not have a native version type. Store versions as strings and use custom UDFs functions to perform semantic comparisons if needed. Consider normalizing to numeric formats if range queries are required. |
+| `version` | [`String`](/sql-reference/data-types/string) | ClickHouse doesn't have a native version type. Store versions as strings and use custom UDFs to perform semantic comparisons if needed. Consider normalizing to numeric formats if range queries are required. |
### Notes {#notes}
@@ -57,9 +57,9 @@ Elasticsearch and ClickHouse support a wide variety of data types, but their und
- **Multi-fields**: Elasticsearch allows indexing the [same field multiple ways](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/multi-fields#_multi_fields_with_multiple_analyzers) (e.g., both `text` and `keyword`). In ClickHouse, this pattern must be modeled using separate columns or views.
- **Map and JSON Types** - In ClickHouse, the [`Map`](/sql-reference/data-types/map) type is commonly used to model dynamic key-value structures such as `resourceAttributes` and `logAttributes`. This type enables flexible schema-less ingestion by allowing arbitrary keys to be added at runtime — similar in spirit to JSON objects in Elasticsearch. However, there are important limitations to consider:
- - **Uniform value types**: ClickHouse [`Map`](/sql-reference/data-types/map) columns must have a consistent value type (e.g., `Map(String, String)`). Mixed-type values are not supported without coercion.
+ - **Uniform value types**: ClickHouse [`Map`](/sql-reference/data-types/map) columns must have a consistent value type (e.g., `Map(String, String)`). Mixed-type values aren't supported without coercion.
- **Performance cost**: accessing any key in a [`Map`](/sql-reference/data-types/map) requires loading the entire map into memory, which can be suboptimal for performance.
- - **No subcolumns**: unlike JSON, keys in a [`Map`](/sql-reference/data-types/map) are not represented as true subcolumns, which limits ClickHouse’s ability to index, compress, and query efficiently.
+ - **No subcolumns**: unlike JSON, keys in a [`Map`](/sql-reference/data-types/map) aren't represented as true subcolumns, which limits ClickHouse’s ability to index, compress, and query efficiently.
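These `Map` characteristics can be seen in a minimal sketch (table and key names are illustrative):

```sql
-- Map requires one uniform value type; reading any key
-- loads the entire map for the row into memory.
CREATE TABLE events
(
    Timestamp     DateTime,
    LogAttributes Map(String, String)
)
ENGINE = MergeTree
ORDER BY Timestamp;

-- Key access: convenient, but not a true subcolumn read
SELECT LogAttributes['k8s.pod.name'] FROM events;
```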
Because of these limitations, ClickStack is migrating away from [`Map`](/sql-reference/data-types/map) in favor of ClickHouse's enhanced [`JSON`](/sql-reference/data-types/newjson) type. The [`JSON`](/sql-reference/data-types/newjson) type addresses many of the shortcomings of `Map`:
diff --git a/docs/use-cases/observability/clickstack/overview.md b/docs/use-cases/observability/clickstack/overview.md
index 65a64795ad2..b532014b2c7 100644
--- a/docs/use-cases/observability/clickstack/overview.md
+++ b/docs/use-cases/observability/clickstack/overview.md
@@ -106,7 +106,7 @@ Managed ClickStack consists of the following components:
Users run an OpenTelemetry Collector that receives telemetry data from their applications and infrastructure. This collector forwards data via OTLP to ClickHouse Cloud. While any standards-compliant OpenTelemetry Collector can be used, we strongly recommend the **ClickStack distribution**, which is preconfigured and optimized for ClickHouse ingestion and works out of the box with ClickStack schemas.
3. **ClickHouse Cloud**
- ClickHouse is fully managed in ClickHouse Cloud, serving as the storage and query engine for all observability data. Users do not need to manage clusters, upgrades, or operational concerns.
+ ClickHouse is fully managed in ClickHouse Cloud, serving as the storage and query engine for all observability data. Users don't need to manage clusters, upgrades, or operational concerns.
Managed ClickStack provides several key benefits:
diff --git a/docs/use-cases/observability/clickstack/search.md b/docs/use-cases/observability/clickstack/search.md
index 39eac39da7f..61d06dce3cd 100644
--- a/docs/use-cases/observability/clickstack/search.md
+++ b/docs/use-cases/observability/clickstack/search.md
@@ -23,7 +23,7 @@ as well.
### Natural language search syntax {#natural-language-syntax}
-- Searches are not case sensitive
+- Searches aren't case sensitive
- Searches match by whole word by default (ex. `Error` will match `Error here`
but not `Errors here`). You can surround a word by wildcards to match partial
words (ex. `*Error*` will match `AnyError` and `AnyErrors`)
@@ -59,14 +59,14 @@ as well.
### SQL search syntax {#sql-syntax}
You can optionally toggle search inputs to be in SQL mode. This will accept any valid
-SQL WHERE clause for searching. This is useful for complex queries that cannot be
+SQL WHERE clause for searching. This is useful for complex queries that can't be
expressed in Lucene syntax.
### Select statement {#select-statement}
To specify the columns to display in the search results, you can use the `SELECT`
input. This is a SQL SELECT expression for the columns to select in the search page.
-Aliases are not supported at this time (ex. you can not use `column as "alias"`).
+Aliases aren't supported at this time (ex. you cannot use `column as "alias"`).
## Saved searches {#saved-searches}
diff --git a/docs/use-cases/observability/clickstack/ttl.md b/docs/use-cases/observability/clickstack/ttl.md
index 191df59c375..a0df32474bd 100644
--- a/docs/use-cases/observability/clickstack/ttl.md
+++ b/docs/use-cases/observability/clickstack/ttl.md
@@ -71,7 +71,7 @@ We recommend always using the setting [`ttl_only_drop_parts=1`](/operations/sett
By default, data with an expired TTL is removed when ClickHouse [merges data parts](/engines/table-engines/mergetree-family/mergetree#mergetree-data-storage). When ClickHouse detects that data is expired, it performs an off-schedule merge.
:::note TTL schedule
-TTLs are not applied immediately but rather on a schedule, as noted above. The MergeTree table setting `merge_with_ttl_timeout` sets the minimum delay in seconds before repeating a merge with delete TTL. The default value is 14400 seconds (4 hours). But that is just the minimum delay; it can take longer until a TTL merge is triggered. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources. A TTL expiration can be forced using the command `ALTER TABLE my_table MATERIALIZE TTL`.
+TTLs aren't applied immediately but rather on a schedule, as noted above. The MergeTree table setting `merge_with_ttl_timeout` sets the minimum delay in seconds before repeating a merge with delete TTL. The default value is 14400 seconds (4 hours). But that is just the minimum delay; it can take longer until a TTL merge is triggered. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources. A TTL expiration can be forced using the command `ALTER TABLE my_table MATERIALIZE TTL`.
:::
## Modifying TTL {#modifying-ttl}
@@ -85,7 +85,7 @@ ALTER TABLE default.otel_logs
MODIFY TTL TimestampTime + toIntervalDay(7);
```
-2. **Modify the OTel collector**. The ClickStack OpenTelemetry collector creates tables in ClickHouse if they do not exist. This is achieved via the ClickHouse exporter, which itself exposes a `ttl` parameter used for controlling the default TTL expression e.g.
+2. **Modify the OTel collector**. The ClickStack OpenTelemetry collector creates tables in ClickHouse if they don't exist. This is achieved via the ClickHouse exporter, which itself exposes a `ttl` parameter used for controlling the default TTL expression e.g.
```yaml
exporters:
@@ -96,7 +96,7 @@ exporters:
### Column level TTL {#column-level-ttl}
-The above examples expire data at a table level. You can also expire data at a column level. As data ages, this can be used to drop columns whose value in investigations does not justify their resource overhead to retain. For example, we recommend retaining the `Body` column in case new dynamic metadata is added that has not been extracted at insert time, e.g., a new Kubernetes label. After a period e.g. 1 month, it might be obvious that this additional metadata is not useful - thus limiting the value in retaining the `Body` column.
+The above examples expire data at a table level. You can also expire data at a column level. As data ages, this can be used to drop columns whose value in investigations doesn't justify the resource overhead of retaining them. For example, we recommend retaining the `Body` column in case new dynamic metadata is added that hasn't been extracted at insert time, e.g., a new Kubernetes label. After a period, e.g. 1 month, it might be obvious that this additional metadata isn't useful - thus limiting the value in retaining the `Body` column.
Below, we show how the `Body` column can be dropped after 30 days.
@@ -112,5 +112,5 @@ ORDER BY (ServiceName, Timestamp)
```
:::note
-Specifying a column level TTL requires users to specify their own schema. This cannot be specified in the OTel collector.
+Specifying a column level TTL requires users to specify their own schema. This can't be specified in the OTel collector.
:::
diff --git a/docs/use-cases/observability/cloud-monitoring.md b/docs/use-cases/observability/cloud-monitoring.md
index 84f9c708ec1..efbfbf0e445 100644
--- a/docs/use-cases/observability/cloud-monitoring.md
+++ b/docs/use-cases/observability/cloud-monitoring.md
@@ -25,7 +25,7 @@ ClickHouse Cloud provides comprehensive monitoring through built-in dashboard in
- **Advanced Dashboard**: The main dashboard interface accessible via Monitoring → Advanced dashboard provides real-time visibility into query rates, resource usage, system health, and storage performance. This dashboard doesn't require separate authentication, won't prevent instances from idling, and doesn't add query load to your production system. Each visualization is powered by customizable SQL queries, with out-of-the-box charts grouped into ClickHouse-specific, system health, and Cloud-specific metrics. You can extend monitoring by creating custom queries directly in the SQL console.
:::note
-Accessing these metrics does not issue a query to the underlying service and will not wake idle services.
+Accessing these metrics doesn't issue a query to the underlying service and won't wake idle services.
:::
@@ -58,7 +58,7 @@ The organization-level endpoint federates metrics from all services, while per-s
- Cached metric delivery: Uses materialized views refreshed every minute to minimize query load on production systems
:::note
-This approach respects service idling behavior, allowing for cost optimization when services are not actively processing queries. This API endpoint relies on ClickHouse Cloud API credentials. For complete endpoint configuration details, see the cloud [Prometheus documentation](/integrations/prometheus).
+This approach respects service idling behavior, allowing for cost optimization when services aren't actively processing queries. This API endpoint relies on ClickHouse Cloud API credentials. For complete endpoint configuration details, see the cloud [Prometheus documentation](/integrations/prometheus).
:::
diff --git a/docs/use-cases/time-series/04_storage-efficiency.md b/docs/use-cases/time-series/04_storage-efficiency.md
index 11f97ab08a8..a1dac9f9111 100644
--- a/docs/use-cases/time-series/04_storage-efficiency.md
+++ b/docs/use-cases/time-series/04_storage-efficiency.md
@@ -59,7 +59,7 @@ ALTER TABLE wikistat
MODIFY COLUMN `hits` UInt32;
```
-This will reduce the size of this column in memory by at least a factor of two. Note that the size on disk will remain unchanged due to compression. But be careful, pick data types that are not too small!
+This will reduce the size of this column in memory by at least a factor of two. Note that the size on disk will remain unchanged due to compression. But be careful, pick data types that aren't too small!
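To verify the effect of a type change, you can inspect per-column storage from the `system.columns` table (a read-only sketch against the `wikistat` table from above):

```sql
-- Compare compressed vs. uncompressed size per column
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'wikistat';
```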
## Specialized codecs {#time-series-specialized-codecs}