From 5eb70d12a76ac667c022909af679736816772f2a Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 5 Feb 2026 12:17:32 +0000
Subject: [PATCH 1/7] Initial plan
From 2b9b4fa9fbb216b5cb35c16fe1c49b332c899371 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 5 Feb 2026 12:21:31 +0000
Subject: [PATCH 2/7] Convert formal language to informal contractions in
documentation
Co-authored-by: Blargian <41984034+Blargian@users.noreply.github.com>
---
.../_S3_authentication_and_bucket.md | 2 +-
.../_add_remote_ip_access_list_detail.md | 2 +-
.../_clickhouse_mysql_cloud_setup.mdx | 2 +-
...irect_observability_integration_options.md | 2 +-
docs/_snippets/_gather_your_details_http.mdx | 2 +-
docs/_snippets/_gather_your_details_native.md | 2 +-
.../_replication-sharding-terminology.md | 2 +-
.../_snippets/_self_managed_only_automated.md | 2 +-
.../_self_managed_only_no_roadmap.md | 2 +-
.../_self_managed_only_not_applicable.md | 2 +-
docs/_snippets/_self_managed_only_roadmap.md | 2 +-
.../amazon-s3/_1-data-source.md | 2 +-
docs/_snippets/compatibility.mdx | 4 +-
.../beta-and-experimental-features.md | 8 +--
docs/about-us/distinctive-features.md | 12 ++--
docs/about-us/history.md | 8 +--
docs/about-us/support.md | 2 +-
.../_snippets/_async_inserts.md | 6 +-
.../_snippets/_avoid_mutations.md | 4 +-
.../_snippets/_avoid_optimize_final.md | 4 +-
.../_snippets/_when-to-use-json.md | 4 +-
docs/best-practices/choosing_a_primary_key.md | 4 +-
docs/best-practices/json_type.md | 4 +-
.../best-practices/minimize_optimize_joins.md | 8 +--
docs/best-practices/partitioning_keys.mdx | 6 +-
docs/best-practices/select_data_type.md | 12 ++--
.../selecting_an_insert_strategy.md | 6 +-
.../sizing-and-hardware-recommendations.md | 8 +--
docs/best-practices/use_materialized_views.md | 8 +--
.../using_data_skipping_indices.md | 4 +-
docs/chdb/api/python.md | 64 +++++++++----------
docs/cloud/_snippets/_clickpipes_faq.md | 4 +-
docs/cloud/features/01_cloud_tiers.md | 4 +-
.../03_sql_console_features/01_sql-console.md | 4 +-
.../03_query-endpoints.md | 2 +-
.../automatic_scaling/01_auto_scaling.md | 20 +++---
.../automatic_scaling/02_make_before_break.md | 4 +-
.../replica-aware-routing.md | 6 +-
.../04_infrastructure/shared-catalog.md | 6 +-
.../04_infrastructure/shared-merge-tree.md | 8 +--
.../features/04_infrastructure/warehouses.md | 14 ++--
.../features/05_admin_features/api/openapi.md | 4 +-
.../features/05_admin_features/api/postman.md | 6 +-
.../features/05_admin_features/upgrades.md | 18 +++---
.../07_monitoring/advanced_dashboard.md | 4 +-
.../features/07_monitoring/notifications.md | 4 +-
.../features/07_monitoring/prometheus.md | 8 +--
docs/cloud/features/08_backups.md | 8 +--
.../features/09_AI_ML/AI_chat_overview.md | 2 +-
.../guides/AI_ML/AIChat/using_AI_chat.md | 4 +-
.../guides/SQL_console/query-endpoints.md | 2 +-
.../backups/01_review-and-restore-backups.md | 8 +--
.../01_export-backups-to-own-cloud-account.md | 2 +-
.../02_backup_restore_from_ui.md | 8 +--
.../03_backup_restore_using_commands.md | 8 +--
.../guides/best_practices/usagelimits.md | 2 +-
docs/cloud/guides/cloud-compatibility.md | 26 ++++----
.../data_sources/01_cloud-endpoints-api.md | 2 +-
.../02_accessing-s3-data-securely.md | 4 +-
.../03_accessing-gcs-data-securely.md | 4 +-
.../01_deployment_options/byoc/01_overview.md | 6 +-
.../byoc/02_architecture.md | 2 +-
.../byoc/03_onboarding/01_standard.md | 8 +--
.../byoc/03_onboarding/02_customization.md | 4 +-
.../byoc/05_configuration.md | 2 +-
.../byoc/06_observability.md | 4 +-
.../byoc/07_operations.md | 2 +-
.../byoc/08_reference/01_faq.md | 4 +-
.../byoc/08_reference/03_network_security.md | 2 +-
docs/cloud/guides/production-readiness.md | 6 +-
.../01_manage-my-account.md | 8 +--
.../02_manage-cloud-users.md | 4 +-
.../04_manage-database-users.md | 8 +--
.../04_saml-sso-setup.md | 18 +++---
.../05_saml-sso-removal.md | 4 +-
.../06_common-access-management-queries.md | 4 +-
.../02_connectivity/01_setting-ip-filters.md | 2 +-
.../private_networking/02_aws-privatelink.md | 10 +--
.../03_gcp-private-service-connect.md | 18 +++---
.../04_azure-privatelink.md | 10 +--
docs/cloud/guides/security/03_data-masking.md | 10 +--
docs/cloud/guides/security/04_cmek.md | 2 +-
.../05_audit_logging/02_database-audit-log.md | 2 +-
.../guides/security/05_cmek_migration.md | 2 +-
docs/cloud/managed-postgres/connection.md | 4 +-
docs/cloud/managed-postgres/faq.md | 2 +-
.../managed-postgres/high-availability.md | 8 +--
.../migrations/logical-replication.md | 6 +-
.../managed-postgres/migrations/peerdb.md | 12 ++--
.../migrations/pg_dump-pg_restore.md | 8 +--
docs/cloud/managed-postgres/security.md | 2 +-
.../02_use_cases/03_data_warehousing.md | 2 +-
.../01_machine_learning.md | 2 +-
.../02_postgres/01_overview.md | 2 +-
.../02_postgres/appendix.md | 16 ++---
.../03_migration_guide_part3.md | 18 +++---
.../03_bigquery/01_overview.md | 6 +-
.../02_migrating-to-clickhouse-cloud.md | 16 ++---
.../03_bigquery/03_loading-data.md | 4 +-
.../03_sql_translation_reference.md | 6 +-
.../03_sql_translation_reference.md | 4 +-
...1_clickhouse-to-cloud_with_remotesecure.md | 4 +-
.../02_oss_to_cloud_backups.md | 4 +-
.../03_object-storage-to-clickhouse.md | 2 +-
.../03_billing/01_billing_overview.md | 14 ++--
.../aws-marketplace-committed.md | 10 +--
.../02_marketplace/aws-marketplace-payg.md | 10 +--
.../azure-marketplace-committed.md | 10 +--
.../02_marketplace/azure-marketplace-payg.md | 12 ++--
.../gcp-marketplace-committed.md | 10 +--
.../02_marketplace/gcp-marketplace-payg.md | 10 +--
.../migrate-marketplace-payg-committed.md | 8 +--
.../03_billing/02_marketplace/overview.md | 8 +--
.../03_clickpipes/clickpipes_for_cdc.md | 4 +-
.../03_billing/05_payment-thresholds.md | 2 +-
docs/cloud/reference/05_supported-regions.md | 6 +-
.../reference/10_personal-data-access.md | 6 +-
docs/cloud/reference/11_account-close.md | 10 +--
docs/cloud/reference/data-resiliency.md | 8 +--
docs/concepts/glossary.md | 8 +--
docs/concepts/olap.md | 4 +-
docs/concepts/why-clickhouse-is-so-fast.mdx | 8 +--
.../compression-in-clickhouse.md | 6 +-
docs/data-modeling/backfilling.md | 20 +++---
docs/data-modeling/denormalization.md | 6 +-
.../projections/1_projections.md | 12 ++--
...2_materialized-views-versus-projections.md | 26 ++++----
docs/data-modeling/schema-design.md | 20 +++---
docs/deployment-guides/parallel-replicas.mdx | 24 +++----
.../01_1_shard_2_replicas.md | 4 +-
.../02_2_shards_1_replica.md | 6 +-
.../03_2_shards_2_replicas.md | 6 +-
.../_snippets/_server_parameter_table.mdx | 2 +-
docs/deployment-guides/terminology.md | 2 +-
docs/dictionary/index.md | 2 +-
docs/faq/general/dependencies.md | 2 +-
docs/faq/general/distributed-join.md | 4 +-
docs/faq/general/ne-tormozit.md | 6 +-
docs/faq/general/olap.md | 4 +-
docs/faq/general/who-is-using-clickhouse.md | 4 +-
docs/faq/index.md | 2 +-
docs/faq/operations/delete-old-data.md | 2 +-
docs/faq/operations/production.md | 16 ++---
.../example-datasets/amazon-reviews.md | 2 +-
.../example-datasets/cell-towers.md | 8 +--
.../example-datasets/covid19.md | 2 +-
.../example-datasets/environmental-sensors.md | 2 +-
.../example-datasets/foursquare-os-places.md | 2 +-
.../example-datasets/github.md | 6 +-
.../example-datasets/hacker-news.md | 2 +-
.../getting-started/example-datasets/menus.md | 4 +-
docs/getting-started/example-datasets/noaa.md | 4 +-
.../example-datasets/nypd_complaint_data.md | 14 ++--
.../example-datasets/stackoverflow.md | 4 +-
.../getting-started/example-datasets/tpcds.md | 14 ++--
docs/getting-started/example-datasets/tpch.md | 6 +-
.../example-datasets/youtube-dislikes.md | 4 +-
.../install/_snippets/_deb_install.md | 4 +-
.../install/_snippets/_docker.md | 4 +-
.../install/_snippets/_linux_tar_install.md | 2 +-
.../install/_snippets/_macos.md | 4 +-
.../install/_snippets/_quick_install.md | 4 +-
.../install/_snippets/_rpm_install.md | 4 +-
.../install/_snippets/_windows_install.md | 2 +-
docs/getting-started/install/advanced.md | 2 +-
docs/getting-started/playground.md | 4 +-
docs/getting-started/quick-start/cloud.mdx | 8 +--
docs/getting-started/quick-start/oss.mdx | 10 +--
...ormance_optimizations_table_of_contents.md | 2 +-
.../best-practices/query-optimization.md | 10 +--
.../guides/best-practices/skipping-indexes.md | 18 +++---
.../best-practices/sparse-primary-indexes.md | 48 +++++++-------
.../developer/cascading-materialized-views.md | 10 +--
.../deduplicating-inserts-on-retries.md | 22 +++----
docs/guides/developer/deduplication.md | 24 +++----
.../developer/dynamic-column-selection.md | 2 +-
docs/guides/developer/index.md | 2 +-
docs/guides/developer/on-fly-mutations.md | 6 +-
docs/guides/developer/replacing-merge-tree.md | 16 ++---
...ored-procedures-and-prepared-statements.md | 18 +++---
.../developer/time-series-filling-gaps.md | 2 +-
docs/guides/developer/ttl.md | 4 +-
...nding-query-execution-with-the-analyzer.md | 14 ++--
.../avgState.md | 4 +-
.../minSimpleState.md | 4 +-
.../sumSimpleState.md | 2 +-
docs/guides/joining-tables.md | 6 +-
docs/guides/separation-storage-compute.md | 10 +--
docs/guides/sre/keeper/index.md | 54 ++++++++--------
docs/guides/sre/network-ports.md | 2 +-
docs/guides/sre/scaling-clusters.md | 8 +--
.../sre/tls/configuring-tls-acme-client.md | 10 +--
docs/guides/sre/tls/configuring-tls.md | 6 +-
.../sre/user-management/configuring-ldap.md | 2 +-
docs/guides/sre/user-management/index.md | 18 +++---
.../sre/user-management/ssl-user-auth.md | 2 +-
docs/guides/starter_guides/creating-tables.md | 2 +-
docs/guides/starter_guides/inserting-data.md | 8 +--
docs/guides/starter_guides/mutations.md | 4 +-
.../starter_guides/working_with_joins.md | 2 +-
docs/guides/starter_guides/writing-queries.md | 2 +-
docs/guides/troubleshooting.md | 22 +++----
.../data-ingestion/apache-spark/databricks.md | 2 +-
.../apache-spark/spark-native-connector.md | 10 +--
.../using_azureblobstorage.md | 2 +-
.../using_http_interface.md | 4 +-
.../clickpipes/aws-privatelink.md | 22 +++----
.../data-ingestion/clickpipes/index.md | 2 +-
.../clickpipes/kafka/02_schema-registries.md | 10 +--
.../clickpipes/kafka/03_reference.md | 10 +--
.../clickpipes/kafka/04_best_practices.md | 12 ++--
.../data-ingestion/clickpipes/kafka/05_faq.md | 2 +-
.../clickpipes/kinesis/01_overview.md | 8 +--
.../clickpipes/mongodb/controlling_sync.md | 2 +-
.../clickpipes/mongodb/index.md | 6 +-
.../clickpipes/mongodb/lifecycle.md | 2 +-
.../clickpipes/mongodb/source/atlas.md | 4 +-
.../clickpipes/mongodb/source/documentdb.md | 4 +-
.../clickpipes/mongodb/source/generic.md | 2 +-
.../clickpipes/mongodb/table_resync.md | 2 +-
.../clickpipes/mysql/controlling_sync.md | 2 +-
.../data-ingestion/clickpipes/mysql/faq.md | 10 +--
.../data-ingestion/clickpipes/mysql/index.md | 8 +--
.../clickpipes/mysql/lifecycle.md | 2 +-
.../clickpipes/mysql/parallel_initial_load.md | 6 +-
.../clickpipes/mysql/schema-changes.md | 4 +-
.../source/azure-flexible-server-mysql.md | 6 +-
.../clickpipes/mysql/source/generic.md | 2 +-
.../clickpipes/mysql/source/generic_maria.md | 2 +-
.../clickpipes/mysql/source/rds_maria.md | 2 +-
.../clickpipes/mysql/table_resync.md | 2 +-
.../object-storage/amazon-s3/01_overview.md | 16 ++---
.../azure-blob-storage/01_overview.md | 10 +--
.../google-cloud-storage/01_overview.md | 8 +--
.../clickpipes/postgres/auth.md | 2 +-
.../clickpipes/postgres/controlling_sync.md | 4 +-
.../clickpipes/postgres/deduplication.md | 4 +-
.../data-ingestion/clickpipes/postgres/faq.md | 48 +++++++-------
.../clickpipes/postgres/index.md | 12 ++--
.../clickpipes/postgres/lifecycle.md | 4 +-
.../clickpipes/postgres/ordering_keys.md | 4 +-
.../postgres/parallel_initial_load.md | 8 +--
.../clickpipes/postgres/pause_and_resume.md | 2 +-
.../postgres/postgres_generated_columns.md | 8 +--
.../clickpipes/postgres/schema-changes.md | 2 +-
.../clickpipes/postgres/source/alloydb.md | 4 +-
.../source/azure-flexible-server-postgres.md | 4 +-
.../clickpipes/postgres/source/generic.md | 6 +-
.../postgres/source/google-cloudsql.md | 2 +-
.../clickpipes/postgres/source/planetscale.md | 2 +-
.../clickpipes/postgres/source/rds.md | 2 +-
.../clickpipes/postgres/source/supabase.md | 2 +-
.../clickpipes/postgres/source/timescale.md | 4 +-
.../clickpipes/postgres/table_resync.md | 2 +-
.../clickpipes/postgres/toast.md | 8 +--
.../data-ingestion/data-formats/binary.md | 2 +-
.../data-ingestion/data-formats/csv-tsv.md | 2 +-
.../data-ingestion/data-formats/intro.md | 2 +-
.../data-formats/json/formats.md | 2 +-
.../data-formats/json/inference.md | 14 ++--
.../data-formats/json/loading.md | 4 +-
.../data-ingestion/data-formats/json/other.md | 6 +-
.../data-formats/json/schema.md | 18 +++---
.../data-formats/templates-regex.md | 2 +-
.../data-ingestion/dbms/dynamodb/index.md | 2 +-
.../dbms/jdbc-with-clickhouse.md | 12 ++--
.../postgresql/connecting-to-postgresql.md | 4 +-
.../integrations/data-ingestion/emqx/index.md | 10 +--
.../dbt/features-and-configurations.md | 50 +++++++--------
.../data-ingestion/etl-tools/dbt/guides.md | 10 +--
.../data-ingestion/etl-tools/dbt/index.md | 16 ++---
.../etl-tools/dlt-and-clickhouse.md | 14 ++--
.../data-ingestion/etl-tools/estuary.md | 2 +-
.../etl-tools/nifi-and-clickhouse.md | 2 +-
.../etl-tools/vector-to-clickhouse.md | 10 +--
docs/integrations/data-ingestion/gcs/index.md | 12 ++--
.../kafka/confluent/confluent-cloud.md | 4 +-
.../kafka/confluent/custom-connector.md | 4 +-
.../kafka/confluent/kafka-connect-http.md | 10 +--
.../data-ingestion/kafka/index.md | 6 +-
.../kafka/kafka-clickhouse-connect-sink.md | 12 ++--
.../kafka/kafka-connect-jdbc.md | 18 +++---
.../kafka/kafka-table-engine.md | 24 +++----
.../data-ingestion/kafka/kafka-vector.md | 8 +--
.../data-ingestion/kafka/msk/index.md | 6 +-
.../redshift/_snippets/_migration_guide.md | 6 +-
.../s3/creating-an-s3-iam-role-and-bucket.md | 2 +-
docs/integrations/data-ingestion/s3/index.md | 40 ++++++------
.../data-ingestion/s3/performance.md | 12 ++--
.../streamkap/sql-server-clickhouse.md | 2 +-
.../streamkap/streamkap-and-clickhouse.md | 2 +-
docs/integrations/data-sources/mysql.md | 2 +-
.../astrato-and-clickhouse.md | 2 +-
.../chartbrew-and-clickhouse.md | 2 +-
.../explo-and-clickhouse.md | 2 +-
.../holistics-and-clickhouse.md | 2 +-
.../luzmo-and-clickhouse.md | 2 +-
.../mitzu-and-clickhouse.md | 8 +--
.../zingdata-and-clickhouse.md | 2 +-
.../data-visualization/grafana/config.md | 6 +-
.../data-visualization/grafana/index.md | 14 ++--
.../grafana/query-builder.md | 4 +-
.../looker-and-clickhouse.md | 8 +--
.../metabase-and-clickhouse.md | 4 +-
.../powerbi-and-clickhouse.md | 4 +-
.../splunk-and-clickhouse.md | 8 +--
.../superset-and-clickhouse.md | 6 +-
.../tableau/tableau-analysis-tips.md | 4 +-
.../tableau/tableau-and-clickhouse.md | 4 +-
.../tableau/tableau-connection-tips.md | 4 +-
.../tableau/tableau-online-and-clickhouse.md | 4 +-
docs/integrations/index.mdx | 2 +-
docs/integrations/interfaces/http.md | 36 +++++------
docs/integrations/interfaces/mysql.md | 14 ++--
docs/integrations/interfaces/postgresql.md | 4 +-
docs/integrations/interfaces/prometheus.md | 2 +-
docs/integrations/interfaces/ssh.md | 6 +-
docs/integrations/interfaces/tcp.md | 2 +-
docs/integrations/language-clients/cpp.md | 4 +-
docs/integrations/language-clients/csharp.md | 26 ++++----
.../integrations/language-clients/go/index.md | 36 +++++------
.../language-clients/java/client/client.mdx | 16 ++---
.../language-clients/java/index.md | 6 +-
.../language-clients/java/jdbc/jdbc.mdx | 38 +++++------
docs/integrations/language-clients/js.md | 50 +++++++--------
.../python/additional-options.md | 10 +--
.../python/advanced-inserting.md | 10 +--
.../python/advanced-querying.md | 20 +++---
.../language-clients/python/advanced-usage.md | 14 ++--
.../language-clients/python/driver-api.md | 28 ++++----
.../language-clients/python/index.md | 4 +-
.../language-clients/python/sqlalchemy.md | 10 +--
docs/integrations/language-clients/rust.md | 14 ++--
docs/integrations/sql-clients/datagrip.md | 2 +-
docs/integrations/sql-clients/dbeaver.md | 4 +-
docs/integrations/sql-clients/dbvisualizer.md | 2 +-
docs/integrations/sql-clients/marimo.md | 2 +-
docs/integrations/sql-clients/qstudio.md | 2 +-
docs/integrations/sql-clients/sql-console.md | 4 +-
.../pg_clickhouse/introduction.md | 4 +-
.../pg_clickhouse/reference.md | 20 +++---
.../pg_clickhouse/tutorial.md | 2 +-
.../tools/data-integration/retool/index.md | 2 +-
docs/intro.md | 16 ++---
docs/kubernetes-operator/02_install/olm.mdx | 2 +-
.../03_guides/01_introduction.mdx | 10 +--
.../03_guides/02_configuration.mdx | 2 +-
.../core-concepts/academic_overview.mdx | 26 ++++----
docs/managing-data/core-concepts/index.md | 6 +-
docs/managing-data/core-concepts/merges.mdx | 4 +-
.../core-concepts/partitions.mdx | 4 +-
docs/managing-data/core-concepts/parts.md | 2 +-
docs/managing-data/core-concepts/shards.mdx | 2 +-
.../deleting-data/delete_mutations.mdx | 2 +-
docs/managing-data/deleting-data/overview.mdx | 4 +-
docs/managing-data/truncate.md | 2 +-
docs/managing-data/updating-data/overview.mdx | 8 +--
.../updating-data/update_mutations.mdx | 2 +-
.../incremental-materialized-view.md | 32 +++++-----
.../refreshable-materialized-view.md | 2 +-
docs/native-protocol/basics.md | 2 +-
docs/native-protocol/client.md | 4 +-
docs/native-protocol/hash.md | 8 +--
.../operations_/backup_restore/00_overview.md | 10 +--
.../backup_restore/01_local_disk.md | 4 +-
.../backup_restore/02_s3_endpoint.md | 2 +-
.../backup_restore/04_alternative_methods.md | 6 +-
docs/tips-and-tricks/cost-optimization.md | 2 +-
docs/tips-and-tricks/debugging-insights.md | 2 +-
docs/tips-and-tricks/success-stories.md | 2 +-
.../static-files-disk-uploader.md | 2 +-
docs/tutorial.md | 4 +-
docs/use-cases/AI_ML/MCP/03_librechat.md | 6 +-
docs/use-cases/AI_ML/MCP/06_ollama.md | 2 +-
.../data-exploration/jupyter-notebook.md | 6 +-
.../AI_ML/data-exploration/marimo-notebook.md | 4 +-
docs/use-cases/data_lake/glue_catalog.md | 2 +-
docs/use-cases/data_lake/onelake_catalog.md | 2 +-
.../observability/build-your-own/grafana.md | 2 +-
.../integrating-opentelemetry.md | 28 ++++----
.../build-your-own/introduction.md | 10 +--
.../build-your-own/managing-data.md | 14 ++--
.../build-your-own/schema-design.md | 50 +++++++--------
.../observability/clickstack/alerts.md | 4 +-
.../observability/clickstack/config.md | 2 +-
.../observability/clickstack/dashboards.md | 2 +-
.../deployment/_snippets/_navigate_managed.md | 4 +-
.../deployment/_snippets/_select_provider.md | 2 +-
.../clickstack/deployment/index.md | 2 +-
.../clickstack/deployment/managed.md | 6 +-
.../clickstack/deployment/oss/all-in-one.md | 4 +-
.../deployment/oss/docker-compose.md | 2 +-
.../deployment/oss/helm/helm-configuration.md | 8 +--
.../clickstack/deployment/oss/helm/helm.md | 2 +-
.../clickstack/deployment/oss/hyperdx-only.md | 2 +-
.../deployment/oss/local-mode-only.md | 2 +-
.../clickstack/event_patterns.md | 2 +-
.../clickstack/example-datasets/kubernetes.md | 8 +--
.../clickstack/example-datasets/local-data.md | 2 +-
.../example-datasets/remote-demo-data.md | 8 +--
.../example-datasets/sample-data.md | 4 +-
.../clickstack/getting-started/managed.md | 2 +-
.../clickstack/getting-started/oss.md | 6 +-
.../_snippets/_extending_config.md | 2 +-
.../clickstack/ingesting-data/collector.md | 22 +++----
.../integration-examples/aws-lambda.md | 4 +-
.../integration-examples/cloudwatch.md | 10 +--
.../host-logs/ec2-host-logs.md | 6 +-
.../host-logs/generic-host-logs.md | 2 +-
.../integration-examples/kafka-metrics.md | 2 +-
.../integration-examples/kubernetes.md | 2 +-
.../integration-examples/mysql.md | 2 +-
.../integration-examples/nginx-logs.md | 2 +-
.../integration-examples/nginx-traces.md | 2 +-
.../integration-examples/nodejs-traces.md | 4 +-
.../integration-examples/postgres-logs.md | 2 +-
.../integration-examples/postgres-metrics.md | 2 +-
.../integration-examples/redis-logs.md | 2 +-
.../integration-examples/redis-metrics.md | 2 +-
.../integration-examples/systemd.md | 4 +-
.../integration-examples/temporal.md | 2 +-
.../ingesting-data/opentelemetry.md | 2 +-
.../clickstack/ingesting-data/schemas.md | 2 +-
.../clickstack/ingesting-data/sdks/browser.md | 2 +-
.../clickstack/ingesting-data/sdks/index.md | 2 +-
.../clickstack/ingesting-data/sdks/nestjs.md | 4 +-
.../clickstack/ingesting-data/sdks/python.md | 4 +-
.../clickstack/ingesting-data/sdks/ruby.md | 2 +-
.../clickstack/ingesting-data/vector.md | 14 ++--
.../clickstack/managing/admin.md | 2 +-
.../clickstack/managing/materialized_views.md | 4 +-
.../clickstack/managing/performance_tuning.md | 10 +--
.../clickstack/managing/production.md | 10 +--
.../clickstack/migration/elastic/concepts.md | 14 ++--
.../clickstack/migration/elastic/intro.md | 6 +-
.../migration/elastic/migrating-agents.md | 6 +-
.../migration/elastic/migrating-data.md | 4 +-
.../migration/elastic/migrating-sdks.md | 4 +-
.../clickstack/migration/elastic/search.md | 14 ++--
.../clickstack/migration/elastic/types.md | 8 +--
.../observability/clickstack/overview.md | 2 +-
.../observability/clickstack/search.md | 6 +-
.../use-cases/observability/clickstack/ttl.md | 8 +--
.../observability/cloud-monitoring.md | 4 +-
.../time-series/04_storage-efficiency.md | 2 +-
445 files changed, 1611 insertions(+), 1611 deletions(-)
diff --git a/docs/_snippets/_S3_authentication_and_bucket.md b/docs/_snippets/_S3_authentication_and_bucket.md
index fc38449e674..c8af9968c7e 100644
--- a/docs/_snippets/_S3_authentication_and_bucket.md
+++ b/docs/_snippets/_S3_authentication_and_bucket.md
@@ -81,7 +81,7 @@ Save the keys somewhere else; this is the only time that the secret access key w
The bucket name must be unique across AWS, not just the organization, or it will emit an error.
:::
-3. Leave `Block all Public Access` enabled; public access is not needed.
+3. Leave `Block all Public Access` enabled; public access isn't needed.
diff --git a/docs/_snippets/_add_remote_ip_access_list_detail.md b/docs/_snippets/_add_remote_ip_access_list_detail.md
index fccdb2467ee..c1471abb08c 100644
--- a/docs/_snippets/_add_remote_ip_access_list_detail.md
+++ b/docs/_snippets/_add_remote_ip_access_list_detail.md
@@ -5,7 +5,7 @@ import ip_allow_list_add_current_ip from '@site/static/images/_snippets/ip-allow
Manage your IP Access List
-From your ClickHouse Cloud services list choose the service that you will work with and switch to **Settings**. If the IP Access List does not contain the IP Address or range of the remote system that needs to connect to your ClickHouse Cloud service, then you can resolve the problem with **Add IPs**:
+From your ClickHouse Cloud services list choose the service that you will work with and switch to **Settings**. If the IP Access List doesn't contain the IP Address or range of the remote system that needs to connect to your ClickHouse Cloud service, then you can resolve the problem with **Add IPs**:
diff --git a/docs/_snippets/_clickhouse_mysql_cloud_setup.mdx b/docs/_snippets/_clickhouse_mysql_cloud_setup.mdx
index 7fb5acc458b..948a24ddc65 100644
--- a/docs/_snippets/_clickhouse_mysql_cloud_setup.mdx
+++ b/docs/_snippets/_clickhouse_mysql_cloud_setup.mdx
@@ -51,7 +51,7 @@ ClickHouse Cloud automatically creates a `mysql4<subdomain>` user that shares the same password as the `default` user.
The `<subdomain>` portion corresponds to the first part of your ClickHouse Cloud hostname.
This username format is required for compatibility with tools that establish secure connections but don't include [SNI (Server Name Indication)](https://www.cloudflare.com/learning/ssl/what-is-sni) data in their TLS handshake.
-Without SNI information, the system cannot perform proper internal routing, so the subdomain hint embedded in the username provides the necessary routing information.
+Without SNI information, the system can't perform proper internal routing, so the subdomain hint embedded in the username provides the necessary routing information.
The MySQL console client is an example of a tool that requires this.
:::tip
diff --git a/docs/_snippets/_direct_observability_integration_options.md b/docs/_snippets/_direct_observability_integration_options.md
index e57b07635d2..95005a2d102 100644
--- a/docs/_snippets/_direct_observability_integration_options.md
+++ b/docs/_snippets/_direct_observability_integration_options.md
@@ -11,7 +11,7 @@ For plugin installation and configuration details, see the ClickHouse [data sour
Datadog offers a ClickHouse Monitoring plugin for its agent, which queries system tables directly. This integration provides comprehensive database monitoring with cluster awareness through clusterAllReplicas functionality.
:::note
-This integration is not recommended for ClickHouse Cloud deployments due to incompatibility with cost-optimizing idle behavior and operational limitations of the cloud proxy layer.
+This integration isn't recommended for ClickHouse Cloud deployments due to incompatibility with cost-optimizing idle behavior and operational limitations of the cloud proxy layer.
:::
### Using system tables directly {#system-tables}
diff --git a/docs/_snippets/_gather_your_details_http.mdx b/docs/_snippets/_gather_your_details_http.mdx
index f5517f41b91..0b0a9cac42e 100644
--- a/docs/_snippets/_gather_your_details_http.mdx
+++ b/docs/_snippets/_gather_your_details_http.mdx
@@ -19,4 +19,4 @@ Choose **HTTPS**. Connection details are displayed in an example `curl` command.
-If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
+If you're using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
diff --git a/docs/_snippets/_gather_your_details_native.md b/docs/_snippets/_gather_your_details_native.md
index 73bf893a979..f166aad2b45 100644
--- a/docs/_snippets/_gather_your_details_native.md
+++ b/docs/_snippets/_gather_your_details_native.md
@@ -19,4 +19,4 @@ Choose **Native**, and the details are available in an example `clickhouse-clien
-If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
+If you're using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
diff --git a/docs/_snippets/_replication-sharding-terminology.md b/docs/_snippets/_replication-sharding-terminology.md
index e463d9b6815..8c8ec50967e 100644
--- a/docs/_snippets/_replication-sharding-terminology.md
+++ b/docs/_snippets/_replication-sharding-terminology.md
@@ -3,7 +3,7 @@
A copy of data. ClickHouse always has at least one copy of your data, and so the minimum number of **replicas** is one. This is an important detail, you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
### Shard {#shard}
-A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the **sharding key**, and is defined when you create the distributed table. The sharding key can be random or as an output of a [hash function](/sql-reference/functions/hash-functions). The deployment examples involving sharding will use `rand()` as the sharding key, and will provide further information on when and how to choose a different sharding key.
+A subset of data. ClickHouse always has at least one shard for your data, so if you don't split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the **sharding key**, and is defined when you create the distributed table. The sharding key can be random or the output of a [hash function](/sql-reference/functions/hash-functions). The deployment examples involving sharding will use `rand()` as the sharding key, and will provide further information on when and how to choose a different sharding key.
### Distributed coordination {#distributed-coordination}
ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper.
diff --git a/docs/_snippets/_self_managed_only_automated.md b/docs/_snippets/_self_managed_only_automated.md
index 2ade570522f..8abcc94ed56 100644
--- a/docs/_snippets/_self_managed_only_automated.md
+++ b/docs/_snippets/_self_managed_only_automated.md
@@ -3,5 +3,5 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
:::note
-This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is automated in ClickHouse Cloud services.
+This page isn't applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is automated in ClickHouse Cloud services.
:::
diff --git a/docs/_snippets/_self_managed_only_no_roadmap.md b/docs/_snippets/_self_managed_only_no_roadmap.md
index 7329e67558b..9aca2e9a891 100644
--- a/docs/_snippets/_self_managed_only_no_roadmap.md
+++ b/docs/_snippets/_self_managed_only_no_roadmap.md
@@ -4,6 +4,6 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
:::note
-This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here is not available in ClickHouse Cloud services.
+This page isn't applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here isn't available in ClickHouse Cloud services.
See the ClickHouse [Cloud Compatibility](/whats-new/cloud-compatibility) guide for more information.
:::
diff --git a/docs/_snippets/_self_managed_only_not_applicable.md b/docs/_snippets/_self_managed_only_not_applicable.md
index 98c1d25088c..98a767fa5c7 100644
--- a/docs/_snippets/_self_managed_only_not_applicable.md
+++ b/docs/_snippets/_self_managed_only_not_applicable.md
@@ -3,5 +3,5 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
:::note
-This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is only necessary in self-managed ClickHouse deployments.
+This page isn't applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is only necessary in self-managed ClickHouse deployments.
:::
diff --git a/docs/_snippets/_self_managed_only_roadmap.md b/docs/_snippets/_self_managed_only_roadmap.md
index 6c8ba523487..8b4dd58f1b2 100644
--- a/docs/_snippets/_self_managed_only_roadmap.md
+++ b/docs/_snippets/_self_managed_only_roadmap.md
@@ -3,6 +3,6 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
:::note
-This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here is not yet available in ClickHouse Cloud services.
+This page isn't applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here isn't yet available in ClickHouse Cloud services.
See the ClickHouse [Cloud Compatibility](/whats-new/cloud-compatibility#roadmap) guide for more information.
:::
diff --git a/docs/_snippets/clickpipes/object-storage/amazon-s3/_1-data-source.md b/docs/_snippets/clickpipes/object-storage/amazon-s3/_1-data-source.md
index d4b1ab448dc..8c3ddd9a553 100644
--- a/docs/_snippets/clickpipes/object-storage/amazon-s3/_1-data-source.md
+++ b/docs/_snippets/clickpipes/object-storage/amazon-s3/_1-data-source.md
@@ -6,5 +6,5 @@ import Image from '@theme/IdealImage';
:::tip
- Due to differences in URL formats and API implementations across object storage service providers, not all S3-compatible services are supported out-of-the-box. If you're running into issues with a service that is not listed under [supported data sources](/integrations/clickpipes/object-storage/s3/overview#supported-data-sources), please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
+ Due to differences in URL formats and API implementations across object storage service providers, not all S3-compatible services are supported out-of-the-box. If you're running into issues with a service that isn't listed under [supported data sources](/integrations/clickpipes/object-storage/s3/overview#supported-data-sources), please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
:::
diff --git a/docs/_snippets/compatibility.mdx b/docs/_snippets/compatibility.mdx
index 4fa4b2f43dd..887b99a2c7a 100644
--- a/docs/_snippets/compatibility.mdx
+++ b/docs/_snippets/compatibility.mdx
@@ -1,6 +1,6 @@
:::note Compatibility
-If you are noticing differences in behavior between your self-hosted ClickHouse deployment and your ClickHouse Cloud service,
+If you're noticing differences in behavior between your self-hosted ClickHouse deployment and your ClickHouse Cloud service,
it may be related to the [compatibility setting](/operations/settings/settings#compatibility).
-In Cloud, compatibility is set when a service is created and does not change at the service level to ensure that clients get consistent behavior even as the service upgrades.
+In Cloud, compatibility is set when a service is created and doesn't change at the service level to ensure that clients get consistent behavior even as the service upgrades.
If you wish to change compatibility, you may request to do so via [support](https://clickhouse.com/support/program).
:::
\ No newline at end of file
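
The compatibility setting mentioned in this note can be inspected from SQL. A minimal sketch (run against your own service):

```sql
-- Check the compatibility level your service currently runs with
SELECT value
FROM system.settings
WHERE name = 'compatibility';
```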
diff --git a/docs/about-us/beta-and-experimental-features.md b/docs/about-us/beta-and-experimental-features.md
index 8979927d1ea..f72efc017d3 100644
--- a/docs/about-us/beta-and-experimental-features.md
+++ b/docs/about-us/beta-and-experimental-features.md
@@ -11,7 +11,7 @@ Because ClickHouse is open-source, it receives many contributions not only from
Due to the uncertainty of when features are classified as generally available, we delineate features into two categories: **Beta** and **Experimental**.
-**Beta** features are officially supported by the ClickHouse team. **Experimental** features are early prototypes driven by either the ClickHouse team or the community and are not officially supported.
+**Beta** features are officially supported by the ClickHouse team. **Experimental** features are early prototypes driven by either the ClickHouse team or the community and aren't officially supported.
The sections below explicitly describe the properties of **Beta** and **Experimental** features:
@@ -34,14 +34,14 @@ Note: please be sure to be using a current version of the ClickHouse [compatibil
- Can introduce breaking changes
-- Functionality may change in the feature
+- Functionality may change in the future
- Need to be deliberately enabled
-- The ClickHouse team **does not support** experimental features
+- The ClickHouse team **doesn't support** experimental features
- May lack important functionality and documentation
-- Cannot be enabled in the cloud
+- Can't be enabled in the cloud
Please note: no additional experimental features are allowed to be enabled in ClickHouse Cloud other than those listed above as Beta.
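
As a sketch of the "deliberately enabled" point above: experimental features are gated behind per-session settings (the specific setting name below is illustrative, not a recommendation):

```sql
-- Experimental features are off by default and must be enabled explicitly
SET allow_experimental_json_type = 1;
```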
diff --git a/docs/about-us/distinctive-features.md b/docs/about-us/distinctive-features.md
index c90b05bd416..fbad166dd51 100644
--- a/docs/about-us/distinctive-features.md
+++ b/docs/about-us/distinctive-features.md
@@ -14,13 +14,13 @@ doc_type: 'guide'
In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed, or this strongly affects the CPU use. It is essential to store data compactly (without any "garbage") even when uncompressed since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
-This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput of around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.
+This is in contrast to systems that can store values of different columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput of around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.
Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
## Data compression {#data-compression}
-Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance.
+Some column-oriented DBMSs don't use data compression. However, data compression plays a key role in achieving excellent performance.
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allows ClickHouse to compete with and outperform more niche databases, like time-series ones.
@@ -46,11 +46,11 @@ ClickHouse supports [a declarative query language](/sql-reference/) based on SQL
Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), the [JOIN](../sql-reference/statements/select/join.md) clause, the [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.
-Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future.
+Correlated (dependent) subqueries aren't supported at the time of writing but might become available in the future.
## Vector computation engine {#vector-engine}
-Data is not only stored by columns but is processed by vectors (parts of columns), which allows achieving high CPU efficiency.
+Data isn't only stored by columns but also processed by vectors (parts of columns), which enables high CPU efficiency.
## Real-time data inserts {#real-time-data-updates}
@@ -62,11 +62,11 @@ Having data physically sorted by primary key makes it possible to extract data b
## Secondary indexes {#secondary-indexes}
-Unlike other database management systems, secondary indexes in ClickHouse do not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts would not match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes).
+Unlike other database management systems, secondary indexes in ClickHouse don't point to specific rows or row ranges. Instead, they let the database know in advance that no rows in some data parts would match the query filtering conditions, so those parts aren't read at all; hence they're called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes).
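
A data skipping index can be declared like this (a sketch with hypothetical table and column names):

```sql
-- Skip granules whose bloom filter proves the URL can't match
ALTER TABLE hits ADD INDEX url_idx URL TYPE bloom_filter GRANULARITY 4;

-- Build the index for parts that already exist
ALTER TABLE hits MATERIALIZE INDEX url_idx;
```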
## Suitable for online queries {#suitable-for-online-queries}
-Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").
+Most OLAP database management systems don't aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").
In ClickHouse, "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment when the user interface page is loading — in other words, *online*.
diff --git a/docs/about-us/history.md b/docs/about-us/history.md
index c6d38917dc2..726afbd4b40 100644
--- a/docs/about-us/history.md
+++ b/docs/about-us/history.md
@@ -44,17 +44,17 @@ However, data aggregation comes with a lot of limitations:
- The user can't make custom reports.
- When aggregating over a large number of distinct keys, the data volume is barely reduced, so aggregation is useless.
- For a large number of reports, there are too many aggregation variations (combinatorial explosion).
-- When aggregating keys with high cardinality (such as URLs), the volume of data is not reduced by much (less than twofold).
+- When aggregating keys with high cardinality (such as URLs), the volume of data isn't reduced by much (less than twofold).
- For this reason, the volume of data with aggregation might grow instead of shrink.
-- Users do not view all the reports we generate for them. A large portion of those calculations are useless.
+- Users don't view all the reports we generate for them. A large portion of those calculations are useless.
- The logical integrity of the data may be violated for various aggregations.
-If we do not aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.
+If we don't aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.
However, with aggregation, a significant part of the work is taken offline and completed relatively calmly. In contrast, online calculations require calculating as fast as possible, since the user is waiting for the result.
Yandex.Metrica has a specialized system for aggregating data called Metrage, which was used for the majority of reports.
Starting in 2009, Yandex.Metrica also used a specialized OLAP database for non-aggregated data called OLAPServer, which was previously used for the report builder.
-OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.
+OLAPServer worked well for non-aggregated data, but it had many restrictions that didn't allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer isn't a DBMS, but a specialized DB.
The initial goal for ClickHouse was to remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, but over the years, it has grown into a general-purpose database management system suitable for a wide range of analytical tasks.
diff --git a/docs/about-us/support.md b/docs/about-us/support.md
index ee4f69af2ea..9f0c68bccf4 100644
--- a/docs/about-us/support.md
+++ b/docs/about-us/support.md
@@ -17,7 +17,7 @@ ClickHouse provides support services for our ClickHouse Cloud users and customer
You can also subscribe to our [status page](https://status.clickhouse.com) to get notified quickly about any incidents affecting our platform.
:::note
-Please note that only subscription customers have a service level agreement on support incidents. If you are not currently a ClickHouse Cloud user – while we will try to answer your question, we'd encourage you to go instead to one of our community resources:
+Please note that only subscription customers have a service level agreement on support incidents. If you're not currently a ClickHouse Cloud user, we'll try to answer your question, but we'd encourage you to use one of our community resources instead:
- [ClickHouse community Slack channel](https://clickhouse.com/slack)
- [Other community options](https://github.com/ClickHouse/ClickHouse/blob/master/README.md#useful-links)
diff --git a/docs/best-practices/_snippets/_async_inserts.md b/docs/best-practices/_snippets/_async_inserts.md
index 1a7d07c6aab..c6482f0a623 100644
--- a/docs/best-practices/_snippets/_async_inserts.md
+++ b/docs/best-practices/_snippets/_async_inserts.md
@@ -4,7 +4,7 @@ import async_inserts from '@site/static/images/bestpractices/async_inserts.png';
Asynchronous inserts in ClickHouse provide a powerful alternative when client-side batching isn't feasible. This is especially valuable in observability workloads, where hundreds or thousands of agents send data continuously—logs, metrics, traces—often in small, real-time payloads. Buffering data client-side in these environments increases complexity, requiring a centralized queue to ensure sufficiently large batches can be sent.
:::note
-Sending many small batches in synchronous mode is not recommended, leading to many parts being created. This will lead to poor query performance and ["too many part"](/knowledgebase/exception-too-many-parts) errors.
+Sending many small batches in synchronous mode isn't recommended, as it creates many parts. This leads to poor query performance and ["too many parts"](/knowledgebase/exception-too-many-parts) errors.
:::
Asynchronous inserts shift batching responsibility from the client to the server by writing incoming data to an in-memory buffer, then flushing it to storage based on configurable thresholds. This approach significantly reduces part creation overhead, lowers CPU usage, and ensures ingestion remains efficient—even under high concurrency.
@@ -19,7 +19,7 @@ When enabled (1), inserts are buffered and only written to disk once one of the
(2) a time threshold elapses (async_insert_busy_timeout_ms) or
(3) a maximum number of insert queries accumulate (async_insert_max_query_number).
-This batching process is invisible to clients and helps ClickHouse efficiently merge insert traffic from multiple sources. However, until a flush occurs, the data cannot be queried. Importantly, there are multiple buffers per insert shape and settings combination, and in clusters, buffers are maintained per node—enabling fine-grained control across multi-tenant environments. Insert mechanics are otherwise identical to those described for [synchronous inserts](/best-practices/selecting-an-insert-strategy#synchronous-inserts-by-default).
+This batching process is invisible to clients and helps ClickHouse efficiently merge insert traffic from multiple sources. However, until a flush occurs, the data can't be queried. Importantly, there are multiple buffers per insert shape and settings combination, and in clusters, buffers are maintained per node—enabling fine-grained control across multi-tenant environments. Insert mechanics are otherwise identical to those described for [synchronous inserts](/best-practices/selecting-an-insert-strategy#synchronous-inserts-by-default).
### Choosing a return mode {#choosing-a-return-mode}
@@ -39,7 +39,7 @@ Our strong recommendation is to use `async_insert=1,wait_for_async_insert=1` if
### Deduplication and reliability {#deduplication-and-reliability}
-By default, ClickHouse performs automatic deduplication for synchronous inserts, which makes retries safe in failure scenarios. However, this is disabled for asynchronous inserts unless explicitly enabled (this should not be enabled if you have dependent materialized views—[see issue](https://github.com/ClickHouse/ClickHouse/issues/66003)).
+By default, ClickHouse performs automatic deduplication for synchronous inserts, which makes retries safe in failure scenarios. However, this is disabled for asynchronous inserts unless explicitly enabled (this shouldn't be enabled if you have dependent materialized views—[see issue](https://github.com/ClickHouse/ClickHouse/issues/66003)).
In practice, if deduplication is turned on and the same insert is retried—due to, for instance, a timeout or network drop—ClickHouse can safely ignore the duplicate. This helps maintain idempotency and avoids double-writing data. Still, it's worth noting that insert validation and schema parsing happen only during buffer flush—so errors (like type mismatches) will only surface at that point.
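
The settings discussed above can be applied per insert. A minimal sketch (the table name and values are hypothetical):

```sql
-- Buffer on the server, and wait for the flush before acknowledging the insert
INSERT INTO logs SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'INFO', 'agent heartbeat');
```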
diff --git a/docs/best-practices/_snippets/_avoid_mutations.md b/docs/best-practices/_snippets/_avoid_mutations.md
index 462570b403a..132fcc502a4 100644
--- a/docs/best-practices/_snippets/_avoid_mutations.md
+++ b/docs/best-practices/_snippets/_avoid_mutations.md
@@ -1,4 +1,4 @@
-In ClickHouse, **mutations** refer to operations that modify or delete existing data in a table—typically using `ALTER TABLE ... DELETE` or `ALTER TABLE ... UPDATE`. While these statements may appear similar to standard SQL operations, they are fundamentally different under the hood.
+In ClickHouse, **mutations** refer to operations that modify or delete existing data in a table—typically using `ALTER TABLE ... DELETE` or `ALTER TABLE ... UPDATE`. While these statements may appear similar to standard SQL operations, they're fundamentally different under the hood.
Rather than modifying rows in place, mutations in ClickHouse are asynchronous background processes that rewrite entire [data parts](/parts) affected by the change. This approach is necessary due to ClickHouse's column-oriented, immutable storage model, and it can lead to significant I/O and resource usage.
@@ -10,7 +10,7 @@ For large datasets, this can produce a substantial spike in disk I/O and degrade
For how to monitor the number of active or queued mutations refer to the following [knowledge base article](/knowledgebase/view_number_of_active_mutations).
:::
-Mutations are **totally ordered**: they apply to data inserted before the mutation was issued, while newer data remains unaffected. They do not block inserts but can still overlap with other ongoing queries. A SELECT running during a mutation may read a mix of mutated and unmutated parts, which can lead to inconsistent views of the data during execution. ClickHouse executes mutations in parallel per part, which can further intensify memory and CPU usage, especially when complex subqueries (like x IN (SELECT ...)) are involved.
+Mutations are **totally ordered**: they apply to data inserted before the mutation was issued, while newer data remains unaffected. They don't block inserts but can still overlap with other ongoing queries. A `SELECT` running during a mutation may read a mix of mutated and unmutated parts, which can lead to inconsistent views of the data during execution. ClickHouse executes mutations in parallel per part, which can further intensify memory and CPU usage, especially when complex subqueries (like `x IN (SELECT ...)`) are involved.
As a rule, **avoid frequent or large-scale mutations**, especially on high-volume tables. Instead, use alternative table engines such as [ReplacingMergeTree](/guides/replacing-merge-tree) or [CollapsingMergeTree](/engines/table-engines/mergetree-family/collapsingmergetree), which are designed to handle data corrections more efficiently at query time or during merges. If mutations are absolutely necessary, monitor them carefully using the system.mutations table and use `KILL MUTATION` if a process is stuck or misbehaving. Misusing mutations can lead to degraded performance, excessive storage churn, and potential service instability—so apply them with caution and sparingly.
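
Monitoring and cancelling mutations as described above might look like the following (the database, table, and mutation ID are hypothetical):

```sql
-- List mutations that are still running or queued
SELECT database, table, mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE is_done = 0;

-- Cancel a stuck or misbehaving mutation
KILL MUTATION WHERE database = 'default' AND table = 'events' AND mutation_id = 'mutation_42.txt';
```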
diff --git a/docs/best-practices/_snippets/_avoid_optimize_final.md b/docs/best-practices/_snippets/_avoid_optimize_final.md
index 5d0fe74ea71..749bcac7a00 100644
--- a/docs/best-practices/_snippets/_avoid_optimize_final.md
+++ b/docs/best-practices/_snippets/_avoid_optimize_final.md
@@ -3,7 +3,7 @@ import simple_merges from '@site/static/images/bestpractices/simple_merges.png';
ClickHouse tables using the **MergeTree engine** store data on disk as **immutable parts**, which are created every time data is inserted.
-Each insert creates a new part containing sorted, compressed column files, along with metadata like indexes and checksums. For a detailed description of part structures and how they are formed we recommend this [guide](/parts).
+Each insert creates a new part containing sorted, compressed column files, along with metadata like indexes and checksums. For a detailed description of part structures and how they're formed, we recommend this [guide](/parts).
Over time, background processes merge smaller parts into larger ones to reduce fragmentation and improve query performance.
@@ -19,7 +19,7 @@ OPTIMIZE TABLE
FINAL;
resource intensive operations which may impact cluster performance.
:::note OPTIMIZE FINAL vs FINAL
-`OPTIMIZE FINAL` is not the same as `FINAL`, which is sometimes necessary to use
+`OPTIMIZE FINAL` isn't the same as `FINAL`, which is sometimes necessary to use
to get results without duplicates, such as with the `ReplacingMergeTree`. Generally,
`FINAL` is okay to use if your queries are filtering on the same columns as those
in your primary key.
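
For example, query-time deduplication with `FINAL` on a `ReplacingMergeTree` table (a sketch with hypothetical names):

```sql
-- FINAL merges duplicate rows at query time instead of rewriting parts
SELECT *
FROM events FINAL
WHERE id = 42;
```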
diff --git a/docs/best-practices/_snippets/_when-to-use-json.md b/docs/best-practices/_snippets/_when-to-use-json.md
index 77c12f436fb..63d8405becc 100644
--- a/docs/best-practices/_snippets/_when-to-use-json.md
+++ b/docs/best-practices/_snippets/_when-to-use-json.md
@@ -12,7 +12,7 @@ The `JSON` type is designed for querying, filtering, and aggregating specific fi
- Your data has a dynamic or unpredictable structure with varying keys across documents
- Field types or schemas change over time or vary between records
-- You need to query, filter, or aggregate on specific paths within JSON objects whose structure you cannot predict upfront
+- You need to query, filter, or aggregate on specific paths within JSON objects whose structure you can't predict upfront
- Your use case involves semi-structured data like logs, events, or user-generated content with inconsistent schemas
### Use a `String` column (or structured types) when: {#use-string-type}
@@ -22,7 +22,7 @@ The `JSON` type is designed for querying, filtering, and aggregating specific fi
- The `JSON` is simply a transport/storage format, not analyzed within ClickHouse
:::tip
-If `JSON` is an opaque document that is not analyzed inside the database, and only stored and retrieved back, it should be stored as a `String` field. The `JSON` type's benefits only materialize when you need to efficiently query, filter, or aggregate on specific fields within dynamic `JSON` structures.
+If `JSON` is an opaque document that isn't analyzed inside the database, and only stored and retrieved back, it should be stored as a `String` field. The `JSON` type's benefits only materialize when you need to efficiently query, filter, or aggregate on specific fields within dynamic `JSON` structures.
You can also mix approaches—use standard columns for predictable top-level fields and a `JSON` column for dynamic sections of the payload.
:::
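
The mixed approach from the tip might look like this (table and column names are hypothetical):

```sql
-- Typed columns for predictable top-level fields, JSON for the dynamic payload
CREATE TABLE events
(
    timestamp DateTime,
    user_id UInt64,
    payload JSON
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp);
```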
diff --git a/docs/best-practices/choosing_a_primary_key.md b/docs/best-practices/choosing_a_primary_key.md
index 3275ad897da..bb3f188498e 100644
--- a/docs/best-practices/choosing_a_primary_key.md
+++ b/docs/best-practices/choosing_a_primary_key.md
@@ -25,7 +25,7 @@ Choosing an effective primary key in ClickHouse is crucial for query performance
Some simple rules can be applied to help choose an ordering key. The following can sometimes be in conflict, so consider these in order. **You can identify a number of keys from this process, with 4-5 typically sufficient**:
:::note Important
-Ordering keys must be defined on table creation and cannot be added. Additional ordering can be added to a table after (or before) data insertion through a feature known as projections. Be aware these result in data duplication. Further details [here](/sql-reference/statements/alter/projection).
+Ordering keys must be defined on table creation and can't be added. Additional ordering can be added to a table after (or before) data insertion through a feature known as projections. Be aware these result in data duplication. Further details [here](/sql-reference/statements/alter/projection).
:::
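
The projection feature referenced in the note can be sketched as follows (table and column names are hypothetical):

```sql
-- Add an alternative ordering after table creation
ALTER TABLE posts ADD PROJECTION by_user
(
    SELECT * ORDER BY UserId
);

-- Build the projection for existing data (duplicates the projected data on disk)
ALTER TABLE posts MATERIALIZE PROJECTION by_user;
```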
## Example {#example}
@@ -168,7 +168,7 @@ Additionally, we visualize how the sparse index prunes all row blocks that can't
:::note
-All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they are included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
+All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they're included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
:::
A complete advanced guide on choosing primary keys can be found [here](/guides/best-practices/sparse-primary-indexes).
diff --git a/docs/best-practices/json_type.md b/docs/best-practices/json_type.md
index 5c58a39fa0a..6eb8737d8d1 100644
--- a/docs/best-practices/json_type.md
+++ b/docs/best-practices/json_type.md
@@ -29,7 +29,7 @@ Type hints offer more than just a way to avoid unnecessary type inference—they
## Advanced features {#advanced-features}
-* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a subcolumn.
+* JSON columns **can be used in primary keys** like any other columns. Codecs can't be specified for a subcolumn.
* They support introspection via functions like [`JSONAllPathsWithTypes()` and `JSONDynamicPaths()`](/sql-reference/data-types/newjson#introspection-functions).
* You can read nested sub-objects using the `.^` syntax.
* Query syntax may differ from standard SQL and may require special casting or operators for nested fields.
@@ -113,7 +113,7 @@ Consider the [arXiv dataset](https://www.kaggle.com/datasets/Cornell-University/
}
```
-While the JSON here is complex, with nested structures, it is predictable. The number and type of the fields will not change. While we could use the JSON type for this example, we can also just define the structure explicitly using [Tuples](/sql-reference/data-types/tuple) and [Nested](/sql-reference/data-types/nested-data-structures/nested) types:
+While the JSON here is complex, with nested structures, it's predictable. The number and type of the fields won't change. While we could use the JSON type for this example, we can also just define the structure explicitly using [Tuples](/sql-reference/data-types/tuple) and [Nested](/sql-reference/data-types/nested-data-structures/nested) types:
```sql
CREATE TABLE arxiv
diff --git a/docs/best-practices/minimize_optimize_joins.md b/docs/best-practices/minimize_optimize_joins.md
index c3bec49726e..7868e92b66c 100644
--- a/docs/best-practices/minimize_optimize_joins.md
+++ b/docs/best-practices/minimize_optimize_joins.md
@@ -17,7 +17,7 @@ ClickHouse supports a wide variety of JOIN types and algorithms, and JOIN perfor
In general, denormalize when:
- Tables change infrequently or when batch refreshes are acceptable.
-- Relationships are not many-to-many or not excessively high in cardinality.
+- Relationships aren't many-to-many or excessively high in cardinality.
- Only a limited subset of the columns will be queried, i.e. certain columns can be excluded from denormalization.
- You have the capability to shift processing out of ClickHouse into upstream systems like Flink, where real-time enrichment or flattening can be managed.
@@ -31,15 +31,15 @@ When JOINs are required, ensure you're using **at least version 24.12 and prefer
Follow these best practices to improve JOIN performance:
-* **Avoid Cartesian products**: If a value on the left-hand side matches multiple values on the right-hand side, the JOIN will return multiple rows — the so-called Cartesian product. If your use case doesn't need all matches from the right-hand side but just any single match, you can use `ANY` JOINs (e.g. `LEFT ANY JOIN`). They are faster and use less memory than regular JOINs.
-* **Reduce the sizes of JOINed tables**: The runtime and memory consumption of JOINs grows proportionally with the sizes of the left and right tables. To reduce the amount of processed data by the JOIN, add additional filter conditions in the `WHERE` or `JOIN ON` clauses of the query. ClickHouse pushes filter conditions as deep as possible down in the query plan, usually before JOINs. If the filters are not pushed down automatically (for any reason), rewrite one side of the JOIN as a sub-query to force pushdown.
+* **Avoid Cartesian products**: If a value on the left-hand side matches multiple values on the right-hand side, the JOIN will return multiple rows — the so-called Cartesian product. If your use case doesn't need all matches from the right-hand side but just any single match, you can use `ANY` JOINs (e.g. `LEFT ANY JOIN`). They're faster and use less memory than regular JOINs.
+* **Reduce the sizes of JOINed tables**: The runtime and memory consumption of JOINs grows proportionally with the sizes of the left and right tables. To reduce the amount of processed data by the JOIN, add additional filter conditions in the `WHERE` or `JOIN ON` clauses of the query. ClickHouse pushes filter conditions as deep as possible down in the query plan, usually before JOINs. If the filters aren't pushed down automatically (for any reason), rewrite one side of the JOIN as a sub-query to force pushdown.
* **Use direct JOINs via dictionaries if appropriate**: Standard JOINs in ClickHouse are executed in two phases: a build phase which iterates the right-hand side to build a hash table, followed by a probe phase which iterates the left-hand side to find matching join partners via hash table lookups. If the right-hand side is a [dictionary](/dictionary) or another table engine with key-value characteristics (e.g. [EmbeddedRocksDB](/engines/table-engines/integrations/embedded-rocksdb) or the [Join table engine](/engines/table-engines/special/join)), then ClickHouse can use the "direct" join algorithm, which effectively removes the need to build a hash table, speeding up query processing. This works for `INNER` and `LEFT OUTER` JOINs and is preferred for real-time analytical workloads.
* **Utilize table sorting for JOINs**: Each table in ClickHouse is sorted by the table's primary key columns. It is possible to exploit the table's sorting by using so-called sort-merge JOIN algorithms like `full_sorting_merge` and `partial_merge`. Unlike standard JOIN algorithms based on hash tables (see below, `parallel_hash`, `hash`, `grace_hash`), sort-merge JOIN algorithms first sort and then merge both tables. If the query JOINs both tables by their respective primary key columns, then sort-merge has an optimization which omits the sort step, saving processing time and overhead.
* **Avoid disk-spilling JOINs**: Intermediate states of JOINs (e.g. hash tables) can become so big that they no longer fit into main memory. In this situation, ClickHouse will return an out-of-memory error by default. Some join algorithms (see below), for example [`grace_hash`](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2), [`partial_merge`](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3) and [`full_sorting_merge`](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3), are able to spill intermediate states to disk and continue query execution. These join algorithms should nevertheless be used with care as disk access can significantly slow down join processing. We instead recommend optimizing the JOIN query in other ways to reduce the size of intermediate states.
* **Default values as no-match markers in outer JOINs**: Left/right/full outer joins include all values from the left/right/both tables. If no join partner is found in the other table for some value, ClickHouse replaces the join partner by a special marker. The SQL standard mandates that databases use NULL as such a marker. In ClickHouse, this requires wrapping the result column in Nullable, creating an additional memory and performance overhead. As an alternative, you can configure the setting `join_use_nulls = 0` and use the default value of the result column data type as the marker.
:::note Use dictionaries carefully
-When using dictionaries for JOINs in ClickHouse, it's important to understand that dictionaries, by design, do not allow duplicate keys. During data loading, any duplicate keys are silently deduplicated—only the last loaded value for a given key is retained. This behavior makes dictionaries ideal for one-to-one or many-to-one relationships where only the latest or authoritative value is needed. However, using a dictionary for a one-to-many or many-to-many relationship (e.g. joining roles to actors where an actor can have multiple roles) will result in silent data loss, as all but one of the matching rows will be discarded. As a result, dictionaries are not suitable for scenarios requiring full relational fidelity across multiple matches.
+When using dictionaries for JOINs in ClickHouse, it's important to understand that dictionaries, by design, don't allow duplicate keys. During data loading, any duplicate keys are silently deduplicated—only the last loaded value for a given key is retained. This behavior makes dictionaries ideal for one-to-one or many-to-one relationships where only the latest or authoritative value is needed. However, using a dictionary for a one-to-many or many-to-many relationship (e.g. joining roles to actors where an actor can have multiple roles) will result in silent data loss, as all but one of the matching rows will be discarded. As a result, dictionaries aren't suitable for scenarios requiring full relational fidelity across multiple matches.
:::
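A direct JOIN along the lines described above could be sketched as follows, assuming a hypothetical `roles` table and an `actor_names` dictionary keyed by `actor_id` (all names here are illustrative, not from a real schema):

```sql
-- Hypothetical dictionary with key-value characteristics; duplicate keys
-- would be silently deduplicated, so this suits many-to-one lookups only.
CREATE DICTIONARY actor_names
(
    actor_id UInt64,
    name String
)
PRIMARY KEY actor_id
SOURCE(CLICKHOUSE(TABLE 'actors_raw'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 300);

-- With join_algorithm = 'direct', ClickHouse can skip the hash-table
-- build phase and probe the dictionary directly.
SELECT r.role, d.name
FROM roles AS r
LEFT JOIN actor_names AS d ON r.actor_id = d.actor_id
SETTINGS join_algorithm = 'direct';
```

As noted above, the direct algorithm applies only to `INNER` and `LEFT OUTER` JOINs.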
## Choosing the correct JOIN algorithm {#choosing-the-right-join-algorithm}
diff --git a/docs/best-practices/partitioning_keys.mdx b/docs/best-practices/partitioning_keys.mdx
index a2403823f69..a3f0ee2daea 100644
--- a/docs/best-practices/partitioning_keys.mdx
+++ b/docs/best-practices/partitioning_keys.mdx
@@ -13,7 +13,7 @@ import partitions from '@site/static/images/bestpractices/partitions.png';
import merges_with_partitions from '@site/static/images/bestpractices/merges_with_partitions.png';
:::note A data management technique
-Partitioning is primarily a data management technique and not a query optimization tool, and while it can improve performance in specific workloads, it should not be the first mechanism used to accelerate queries; the partitioning key must be chosen carefully, with a clear understanding of its implications, and only applied when it aligns with data life cycle needs or well-understood access patterns.
+Partitioning is primarily a data management technique and not a query optimization tool, and while it can improve performance in specific workloads, it shouldn't be the first mechanism used to accelerate queries; the partitioning key must be chosen carefully, with a clear understanding of its implications, and only applied when it aligns with data life cycle needs or well-understood access patterns.
:::
In ClickHouse, partitioning organizes data into logical segments based on a specified key. This is defined using the `PARTITION BY` clause at table creation time and is commonly used to group rows by time intervals, categories, or other business-relevant dimensions. Each unique value of the partitioning expression forms its own physical partition on disk, and ClickHouse stores data in separate parts for each of these values. Partitioning improves data management, simplifies retention policies, and can help with certain query patterns.
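As a minimal sketch of the above, partitioning by month might look like this (the `events` table and its columns are hypothetical):

```sql
CREATE TABLE events
(
    ts DateTime,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toStartOfMonth(ts)
ORDER BY (user_id, ts);

-- Dropping a whole month is a cheap metadata operation -
-- the data-management benefit partitioning exists for.
ALTER TABLE events DROP PARTITION '2024-01-01';
```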
@@ -51,7 +51,7 @@ Partitioning is a powerful tool for managing large datasets in ClickHouse, espec
While partitioning can improve query performance for some workloads, it can also negatively impact response time.
-If the partitioning key is not in the primary key and you are filtering by it, users may see an improvement in query performance with partitioning. See [here](/partitions#query-optimization) for an example.
+If the partitioning key isn't in the primary key and you're filtering by it, you may see an improvement in query performance with partitioning. See [here](/partitions#query-optimization) for an example.

Conversely, if queries need to access data across partitions, performance may be negatively impacted due to a higher number of total parts. For this reason, users should understand their access patterns before considering partitioning as a query optimization technique.
@@ -61,7 +61,7 @@ In summary, users should primarily think of partitioning as a data management te
Importantly, a higher number of parts will negatively affect query performance. ClickHouse will therefore respond to inserts with a [“too many parts”](/knowledgebase/exception-too-many-parts) error if the number of parts exceeds specified limits either in [total](/operations/settings/merge-tree-settings#max_parts_in_total) or [per partition](/operations/settings/merge-tree-settings#parts_to_throw_insert).
-Choosing the right **cardinality** for the partitioning key is critical. A high-cardinality partitioning key - where the number of distinct partition values is large - can lead to a proliferation of data parts. Since ClickHouse does not merge parts across partitions, too many partitions will result in too many unmerged parts, eventually triggering the “Too many parts” error. [Merges are essential](/merges) for reducing storage fragmentation and optimizing query speed, but with high-cardinality partitions, that merge potential is lost.
+Choosing the right **cardinality** for the partitioning key is critical. A high-cardinality partitioning key - where the number of distinct partition values is large - can lead to a proliferation of data parts. Since ClickHouse doesn't merge parts across partitions, too many partitions will result in too many unmerged parts, eventually triggering the “Too many parts” error. [Merges are essential](/merges) for reducing storage fragmentation and optimizing query speed, but with high-cardinality partitions, that merge potential is lost.
By contrast, a **low-cardinality partitioning key** - with fewer than 100 - 1,000 distinct values - is usually optimal. It enables efficient part merging, keeps metadata overhead low, and avoids excessive object creation in storage. In addition, ClickHouse automatically builds MinMax indexes on partition columns, which can significantly speed up queries that filter on those columns. For example, filtering by month when the table is partitioned by `toStartOfMonth(date)` allows the engine to skip irrelevant partitions and their parts entirely.
diff --git a/docs/best-practices/select_data_type.md b/docs/best-practices/select_data_type.md
index 1288d4f484d..c854d71fee8 100644
--- a/docs/best-practices/select_data_type.md
+++ b/docs/best-practices/select_data_type.md
@@ -26,7 +26,7 @@ Some straightforward guidelines can significantly enhance the schema:
* **Leverage LowCardinality and Specialized Types:** For columns with fewer than approximately 10,000 unique values, use LowCardinality types to significantly reduce storage through dictionary encoding. Similarly, use FixedString only when the column values are strictly fixed-length strings (e.g., country or currency codes), and prefer Enum types for columns with a finite set of possible values to enable efficient storage and built-in data validation.
-* **Enums for data validation:** The Enum type can be used to efficiently encode enumerated types. Enums can either be 8 or 16 bits, depending on the number of unique values they are required to store. Consider using this if you need either the associated validation at insert time (undeclared values will be rejected) or wish to perform queries which exploit a natural ordering in the Enum values e.g. imagine a feedback column containing user responses Enum(':(' = 1, ':|' = 2, ':)' = 3).
+* **Enums for data validation:** The Enum type can be used to efficiently encode enumerated types. Enums can either be 8 or 16 bits, depending on the number of unique values they're required to store. Consider using this if you need either the associated validation at insert time (undeclared values will be rejected) or wish to perform queries which exploit a natural ordering in the Enum values e.g. imagine a feedback column containing user responses Enum(':(' = 1, ':|' = 2, ':)' = 3).
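The feedback example above could be declared as follows (a sketch; the table name is illustrative):

```sql
CREATE TABLE feedback
(
    user_id UInt64,
    response Enum8(':(' = 1, ':|' = 2, ':)' = 3)
)
ENGINE = MergeTree
ORDER BY user_id;

-- Inserting an undeclared value such as ':D' is rejected at insert time,
-- and the natural ordering of the Enum values can be exploited in queries:
SELECT count() FROM feedback WHERE response > ':|';
```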
## Example {#example}
@@ -78,7 +78,7 @@ By applying our early simple rules to our posts table, we can identify an optima
|------------------------|------------|------------------------------------------------------------------------|----------------|--------|----------------------------------------------------------------------------------------------|------------------------------------------|
| `PostTypeId` | Yes | 1, 8 | 8 | No | | `Enum('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8)` |
| `AcceptedAnswerId`     | Yes        | 0, 78285170                                                             | 12282094       | Yes    | Differentiate Null from 0                                                                    | UInt32                                   |
-| `CreationDate` | No | 2008-07-31 21:42:52.667000000, 2024-03-31 23:59:17.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
+| `CreationDate` | No | 2008-07-31 21:42:52.667000000, 2024-03-31 23:59:17.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
| `Score` | Yes | -217, 34970 | 3236 | No | | Int32 |
| `ViewCount` | Yes | 2, 13962748 | 170867 | No | | UInt32 |
| `Body` | No | - | - | No | | String |
@@ -86,8 +86,8 @@ By applying our early simple rules to our posts table, we can identify an optima
| `OwnerDisplayName` | No | - | 181251 | Yes | Consider Null to be empty string | String |
| `LastEditorUserId`     | Yes        | -1, 9999993                                                             | 1104694        | Yes    | 0 is an unused value and can be used for Nulls                                               | Int32                                    |
| `LastEditorDisplayName` | No | - | 70952 | Yes | Consider Null to be an empty string. Tested LowCardinality and no benefit | String |
-| `LastEditDate` | No | 2008-08-01 13:24:35.051000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
-| `LastActivityDate` | No | 2008-08-01 12:19:17.417000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
+| `LastEditDate` | No | 2008-08-01 13:24:35.051000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
+| `LastActivityDate` | No | 2008-08-01 12:19:17.417000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
| `Title` | No | - | - | No | Consider Null to be an empty string | String |
| `Tags` | No | - | - | No | Consider Null to be an empty string | String |
| `AnswerCount`          | Yes        | 0, 518                                                                  | 216            | No     | Consider Null and 0 to be the same                                                           | UInt16                                   |
@@ -95,8 +95,8 @@ By applying our early simple rules to our posts table, we can identify an optima
| `FavoriteCount`        | Yes        | 0, 225                                                                  | 6              | Yes    | Consider Null and 0 to be the same                                                           | UInt8                                    |
| `ContentLicense` | No | - | 3 | No | LowCardinality outperforms FixedString | LowCardinality(String) |
| `ParentId` | No | - | 20696028 | Yes | Consider Null to be an empty string | String |
-| `CommunityOwnedDate` | No | 2008-08-12 04:59:35.017000000, 2024-04-01 05:36:41.380000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity is not required, use DateTime | DateTime |
-| `ClosedDate` | No | 2008-09-04 20:56:44, 2024-04-06 18:49:25.393000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity is not required, use DateTime | DateTime |
+| `CommunityOwnedDate` | No | 2008-08-12 04:59:35.017000000, 2024-04-01 05:36:41.380000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity isn't required, use DateTime | DateTime |
+| `ClosedDate` | No | 2008-09-04 20:56:44, 2024-04-06 18:49:25.393000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity isn't required, use DateTime | DateTime |
:::note Tip
Identifying the type for a column relies on understanding its numeric range and number of unique values. To find the range of all columns, and the number of distinct values, you can use the simple query `SELECT * APPLY min, * APPLY max, * APPLY uniq FROM table FORMAT Vertical`. We recommend performing this over a smaller subset of the data as this can be expensive.
diff --git a/docs/best-practices/selecting_an_insert_strategy.md b/docs/best-practices/selecting_an_insert_strategy.md
index dfbcdc728e2..d864ef0f300 100644
--- a/docs/best-practices/selecting_an_insert_strategy.md
+++ b/docs/best-practices/selecting_an_insert_strategy.md
@@ -18,7 +18,7 @@ import BulkInserts from '@site/docs/best-practices/_snippets/_bulk_inserts.md';
Efficient data ingestion forms the basis of high-performance ClickHouse deployments. Selecting the right insert strategy can dramatically impact throughput, cost, and reliability. This section outlines best practices, tradeoffs, and configuration options to help you make the right decision for your workload.
:::note
-The following assumes you are pushing data to ClickHouse via a client. If you are pulling data into ClickHouse e.g. using built in table functions such as [s3](/sql-reference/table-functions/s3) and [gcs](/sql-reference/table-functions/gcs), we recommend our guide ["Optimizing for S3 Insert and Read Performance"](/integrations/s3/performance).
+The following assumes you're pushing data to ClickHouse via a client. If you're pulling data into ClickHouse, e.g. using built-in table functions such as [s3](/sql-reference/table-functions/s3) and [gcs](/sql-reference/table-functions/gcs), we recommend our guide ["Optimizing for S3 Insert and Read Performance"](/integrations/s3/performance).
:::
## Synchronous inserts by default {#synchronous-inserts-by-default}
@@ -49,7 +49,7 @@ The data is ⑤ transmitted to a ClickHouse network interface—either the [nati
After ⑥ receiving the data, ClickHouse ⑦ decompresses it if compression was used, then ⑧ parses it from the originally sent format.
-Using the values from that formatted data and the target table's [DDL](/sql-reference/statements/create/table) statement, ClickHouse ⑨ builds an in-memory [block](/development/architecture#block) in the MergeTree format, ⑩ [sorts](/parts#what-are-table-parts-in-clickhouse) rows by the primary key columns if they are not already pre-sorted, ⑪ creates a [sparse primary index](/guides/best-practices/sparse-primary-indexes), ⑫ applies [per-column compression](/parts#what-are-table-parts-in-clickhouse), and ⑬ writes the data as a new ⑭ [data part](/parts) to disk.
+Using the values from that formatted data and the target table's [DDL](/sql-reference/statements/create/table) statement, ClickHouse ⑨ builds an in-memory [block](/development/architecture#block) in the MergeTree format, ⑩ [sorts](/parts#what-are-table-parts-in-clickhouse) rows by the primary key columns if they're not already pre-sorted, ⑪ creates a [sparse primary index](/guides/best-practices/sparse-primary-indexes), ⑫ applies [per-column compression](/parts#what-are-table-parts-in-clickhouse), and ⑬ writes the data as a new ⑭ [data part](/parts) to disk.
### Batch inserts if synchronous {#batch-inserts-if-synchronous}
@@ -144,6 +144,6 @@ Unlike many traditional databases, ClickHouse also supports an HTTP interface. *
This is often preferable to ClickHouse's native protocol as it allows traffic to be easily switched with load balancers. We expect only small differences in insert performance compared to the native protocol, which incurs slightly less overhead.
-However, it lacks the native protocol's deeper integration and cannot perform client-side optimizations like materialized value computation or automatic conversion to Native format. While HTTP inserts can still be compressed using standard HTTP headers (e.g. `Content-Encoding: lz4`), the compression is applied to the entire payload rather than individual data blocks. This interface is often preferred in environments where protocol simplicity, load balancing, or broad format compatibility is more important than raw performance.
+However, it lacks the native protocol's deeper integration and can't perform client-side optimizations like materialized value computation or automatic conversion to Native format. While HTTP inserts can still be compressed using standard HTTP headers (e.g. `Content-Encoding: lz4`), the compression is applied to the entire payload rather than individual data blocks. This interface is often preferred in environments where protocol simplicity, load balancing, or broad format compatibility is more important than raw performance.
For a more detailed description of these interfaces see [here](/interfaces/overview).
diff --git a/docs/best-practices/sizing-and-hardware-recommendations.md b/docs/best-practices/sizing-and-hardware-recommendations.md
index 03fe792c5bd..91c4746e8a4 100644
--- a/docs/best-practices/sizing-and-hardware-recommendations.md
+++ b/docs/best-practices/sizing-and-hardware-recommendations.md
@@ -51,7 +51,7 @@ For workloads that need to optimize for concurrency (100+ queries per second), w
**Data warehousing use case**
-For data warehousing workloads and ad-hoc analytical queries, we recommend the [R-type series](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) from AWS or the equivalent offering from your cloud provider as they are memory optimized.
+For data warehousing workloads and ad-hoc analytical queries, we recommend the [R-type series](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) from AWS or the equivalent offering from your cloud provider as they're memory optimized.
---
@@ -82,9 +82,9 @@ If your use case is sensitive to price, lower amounts of memory will work as it
### What should the memory-to-storage ratio be? {#what-should-the-memory-to-storage-ratio-be}
-For low data volumes, a 1:1 memory-to-storage ratio is acceptable but total memory should not be below 8GB.
+For low data volumes, a 1:1 memory-to-storage ratio is acceptable, but total memory shouldn't be below 8GB.
-For use cases with long retention periods for your data or with high data volumes, we recommend a 1:100 to 1:130 memory-to-storage ratio. For example, 100GB of RAM per replica if you are storing 10TB of data.
+For use cases with long retention periods for your data or with high data volumes, we recommend a 1:100 to 1:130 memory-to-storage ratio. For example, 100GB of RAM per replica if you're storing 10TB of data.
For use cases with frequent access such as for customer-facing workloads, we recommend using more memory at a 1:30 to 1:50 memory-to-storage ratio.
@@ -92,7 +92,7 @@ For use cases with frequent access such as for customer-facing workloads, we rec
We recommend having at least three replicas per shard (or two replicas with [Amazon EBS](https://aws.amazon.com/ebs/)). Additionally, we suggest vertically scaling all replicas prior to adding additional replicas (horizontal scaling).
-ClickHouse does not automatically shard, and re-sharding your dataset will require significant compute resources. Therefore, we generally recommend using the largest server available to prevent having to re-shard your data in the future.
+ClickHouse doesn't automatically shard, and re-sharding your dataset will require significant compute resources. Therefore, we generally recommend using the largest server available to prevent having to re-shard your data in the future.
Consider using [ClickHouse Cloud](https://clickhouse.com/cloud) which scales automatically and allows you to easily control the number of replicas for your use case.
diff --git a/docs/best-practices/use_materialized_views.md b/docs/best-practices/use_materialized_views.md
index 0d3a78b1061..d0883375104 100644
--- a/docs/best-practices/use_materialized_views.md
+++ b/docs/best-practices/use_materialized_views.md
@@ -13,7 +13,7 @@ import Image from '@theme/IdealImage';
import incremental_materialized_view from '@site/static/images/bestpractices/incremental_materialized_view.gif';
import refreshable_materialized_view from '@site/static/images/bestpractices/refreshable_materialized_view.gif';
-ClickHouse supports two types of materialized views: [**incremental**](/materialized-view/incremental-materialized-view) and [**refreshable**](/materialized-view/refreshable-materialized-view). While both are designed to accelerate queries by pre-computing and storing results, they differ significantly in how and when the underlying queries are executed, what workloads they are suited for, and how data freshness is handled.
+ClickHouse supports two types of materialized views: [**incremental**](/materialized-view/incremental-materialized-view) and [**refreshable**](/materialized-view/refreshable-materialized-view). While both are designed to accelerate queries by pre-computing and storing results, they differ significantly in how and when the underlying queries are executed, what workloads they're suited for, and how data freshness is handled.
**You should consider materialized views for specific query patterns which need to be accelerated, assuming previous best practices [regarding type](/best-practices/select-data-types) and [primary key optimization](/best-practices/choosing-a-primary-key) have been performed.**
@@ -43,9 +43,9 @@ For examples of incremental materialized views see [here](/materialized-view/inc
Refreshable materialized views execute their queries periodically rather than incrementally, storing the query result set for rapid retrieval.
-They are most useful when query performance is critical (e.g. sub-millisecond latency) and slightly stale results are acceptable. Since the query is re-run in full, refreshable views are best suited to queries that are either relatively fast to compute or which can be computed at infrequent intervals (e.g. hourly), such as caching “top N” results or lookup tables.
+They're most useful when query performance is critical (e.g. sub-millisecond latency) and slightly stale results are acceptable. Since the query is re-run in full, refreshable views are best suited to queries that are either relatively fast to compute or which can be computed at infrequent intervals (e.g. hourly), such as caching “top N” results or lookup tables.
-Execution frequency should be tuned carefully to avoid excessive load on the system. Extremely complex queries which consume significant resources should be scheduled cautiously — these can cause overall cluster performance to degrade by impacting caches and consuming CPU and memory. The query should run relatively quickly compared to the refresh interval to avoid overloading your cluster. For example, do not schedule a view to be updated every 10 seconds if the query itself takes at least 10 seconds to compute.
+Execution frequency should be tuned carefully to avoid excessive load on the system. Extremely complex queries which consume significant resources should be scheduled cautiously — these can cause overall cluster performance to degrade by impacting caches and consuming CPU and memory. The query should run relatively quickly compared to the refresh interval to avoid overloading your cluster. For example, don't schedule a view to be updated every 10 seconds if the query itself takes at least 10 seconds to compute.
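A refreshable materialized view along these lines might be declared as follows (a sketch; the view and table names are hypothetical):

```sql
-- Re-runs the full query every hour and replaces the stored result -
-- suitable for a cached "top N" that tolerates slightly stale data.
CREATE MATERIALIZED VIEW top_products_mv
REFRESH EVERY 1 HOUR
TO top_products AS
SELECT product_id, count() AS views
FROM events
GROUP BY product_id
ORDER BY views DESC
LIMIT 10;
```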
## Summary {#summary}
@@ -53,7 +53,7 @@ In summary, use refreshable materialized views when:
- You need cached query results available instantly, and minor delays in freshness are acceptable.
- You need the top N for a query result set.
-- The size of the result set does not grow unbounded over time. This will cause performance of the target view to degrade.
+- The size of the result set doesn't grow unbounded over time. This will cause performance of the target view to degrade.
- You're performing complex joins or denormalization involving multiple tables, requiring updates whenever any source table changes.
- You're building batch workflows, denormalization tasks, or creating view dependencies similar to DBT DAGs.
diff --git a/docs/best-practices/using_data_skipping_indices.md b/docs/best-practices/using_data_skipping_indices.md
index b485bc0e6ea..9dacac05c0d 100644
--- a/docs/best-practices/using_data_skipping_indices.md
+++ b/docs/best-practices/using_data_skipping_indices.md
@@ -32,7 +32,7 @@ While powerful, skip indexes must be used with care. They only provide benefit w
**Effective skip index usage often depends on a strong correlation between the indexed column and the table's primary key, or inserting data in a way that groups similar values together.**
-In general, data skipping indices are best applied after ensuring proper primary key design and type optimization. They are particularly useful for:
+In general, data skipping indices are best applied after ensuring proper primary key design and type optimization. They're particularly useful for:
* Columns with high overall cardinality but low cardinality within a block.
* Rare values that are critical for search (e.g. error codes, specific IDs).
@@ -248,7 +248,7 @@ WHERE (CreationDate > '2009-01-01') AND (ViewCount > 10000000)
29 rows in set. Elapsed: 0.211 sec.
```
-We also show an animation how the minmax skipping index prunes all row blocks that cannot possibly contain matches for the `ViewCount` > 10,000,000 predicate in our example query:
+We also show an animation of how the minmax skipping index prunes all row blocks that can't possibly contain matches for the `ViewCount` > 10,000,000 predicate in our example query:
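A minmax skip index like the one discussed here can be added with standard DDL (a sketch; the index name and granularity value are illustrative):

```sql
ALTER TABLE posts ADD INDEX view_count_idx ViewCount TYPE minmax GRANULARITY 1;

-- Build the index for existing parts; newly inserted parts get it automatically.
ALTER TABLE posts MATERIALIZE INDEX view_count_idx;
```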
diff --git a/docs/chdb/api/python.md b/docs/chdb/api/python.md
index 99a7c15240c..e0ae9f3fc5d 100644
--- a/docs/chdb/api/python.md
+++ b/docs/chdb/api/python.md
@@ -180,7 +180,7 @@ chdb.to_arrowTable(res)
| Error type | Description |
|---------------|----------------------------------------|
-| `ImportError` | If pyarrow or pandas are not installed |
+| `ImportError` | If pyarrow or pandas aren't installed |
**Example**
@@ -223,7 +223,7 @@ chdb.to_df(r)
| Exception | Condition |
|---------------|----------------------------------------|
-| `ImportError` | If pyarrow or pandas are not installed |
+| `ImportError` | If pyarrow or pandas aren't installed |
**Example**
@@ -430,7 +430,7 @@ finally blocks or destructors.
Close the session and cleanup resources.
This method closes the underlying connection and resets the global session state.
-After calling this method, the session becomes invalid and cannot be used for
+After calling this method, the session becomes invalid and can't be used for
further queries.
**Syntax**
@@ -495,7 +495,7 @@ The exact return type depends on the format parameter:
| `ValueError` | If the SQL query is malformed |
:::note
-The “Debug” format is not supported and will be automatically converted
+The “Debug” format isn't supported and will be automatically converted
to “CSV” with a warning.
For debugging, use connection string parameters instead.
:::
@@ -578,7 +578,7 @@ send_query(sql, fmt='CSV') → StreamingResult
| `ValueError` | If the SQL query is malformed |
:::note
-The “Debug” format is not supported and will be automatically converted
+The “Debug” format isn't supported and will be automatically converted
to “CSV” with a warning. For debugging, use connection string parameters instead.
:::
@@ -653,7 +653,7 @@ The exact return type depends on the format parameter:
| `ValueError` | If the SQL query is malformed |
:::note
-The “Debug” format is not supported and will be automatically converted
+The “Debug” format isn't supported and will be automatically converted
to “CSV” with a warning. For debugging, use connection string parameters
instead.
:::
@@ -814,7 +814,7 @@ Close the connection and cleanup resources.
This method closes the database connection and cleans up any associated
resources including active cursors. After calling this method, the
-connection becomes invalid and cannot be used for further operations.
+connection becomes invalid and can't be used for further operations.
**Syntax**
@@ -929,7 +929,7 @@ query(query: str, format: str = 'CSV') → Any
| Exception | Condition |
|----------------|---------------------------------------------------|
| `RuntimeError` | If query execution fails |
-| `ImportError` | If required packages for format are not installed |
+| `ImportError` | If required packages for format aren't installed |
:::warning Warning
This method loads the entire result set into memory. For large
@@ -997,7 +997,7 @@ send_query(query: str, format: str = 'CSV') → StreamingResult
| Exception | Condition |
|----------------|---------------------------------------------------|
| `RuntimeError` | If query execution fails |
-| `ImportError` | If required packages for format are not installed |
+| `ImportError` | If required packages for format aren't installed |
:::note
Only the “Arrow” format supports the `record_batch()` method on the returned StreamingResult.
@@ -1052,7 +1052,7 @@ class chdb.state.sqlitelike.Cursor(connection)
Close the cursor and cleanup resources.
This method closes the cursor and cleans up any associated resources.
-After calling this method, the cursor becomes invalid and cannot be
+After calling this method, the cursor becomes invalid and can't be
used for further operations.
**Syntax**
@@ -1452,7 +1452,7 @@ chdb.state.sqlitelike.to_arrowTable(res)
| Exception | Condition |
|---------------|-------------------------------------------------|
-| `ImportError` | If pyarrow or pandas packages are not installed |
+| `ImportError` | If pyarrow or pandas packages aren't installed |
:::note
This function requires both pyarrow and pandas to be installed.
@@ -1509,7 +1509,7 @@ chdb.state.sqlitelike.to_df(r)
| Exception | Condition |
|---------------|-------------------------------------------------|
-| `ImportError` | If pyarrow or pandas packages are not installed |
+| `ImportError` | If pyarrow or pandas packages aren't installed |
:::note
This function uses multi-threading for the Arrow to Pandas conversion
@@ -1582,7 +1582,7 @@ chdb.dbapi.connect(*args, **kwargs)
| Exception | Condition |
|--------------------------------------|-------------------------------------|
-| [`err.Error`](#chdb-dbapi-err-error) | If connection cannot be established |
+| [`err.Error`](#chdb-dbapi-err-error) | If connection can't be established |
---
@@ -1694,7 +1694,7 @@ class chdb.dbapi.connections.Connection(path=None)
```
:::note
-ClickHouse does not support traditional transactions, so commit() and rollback()
+ClickHouse doesn't support traditional transactions, so commit() and rollback()
operations are no-ops but provided for DB-API compliance.
:::
@@ -1901,7 +1901,7 @@ Get the last query response.
:::note
This property is updated each time query() is called directly.
-It does not reflect queries executed through cursors.
+It doesn't reflect queries executed through cursors.
:::
---
@@ -1935,7 +1935,7 @@ The cursor provides methods for executing SQL statements, managing query results
and navigating through result sets. It supports parameter binding, bulk operations,
and follows DB-API 2.0 specifications.
-Do not create Cursor instances directly. Use `Connection.cursor()` instead.
+Don't create Cursor instances directly. Use `Connection.cursor()` instead.
```python
class chdb.dbapi.cursors.Cursor(connection)
@@ -1991,15 +1991,15 @@ callproc(procname, args=())
| `sequence` | The original args parameter (unmodified) |
:::note
-chDB/ClickHouse does not support stored procedures in the traditional sense.
-This method is provided for DB-API 2.0 compliance but does not perform
+chDB/ClickHouse doesn't support stored procedures in the traditional sense.
+This method is provided for DB-API 2.0 compliance but doesn't perform
any actual operation. Use execute() for all SQL operations.
:::
:::warning Compatibility
This is a placeholder implementation. Traditional stored procedure
features like OUT/INOUT parameters, multiple result sets, and server
-variables are not supported by the underlying ClickHouse engine.
+variables aren't supported by the underlying ClickHouse engine.
:::
---
@@ -2146,7 +2146,7 @@ fetchall()
| Exception | Condition |
|--------------------------------------------------------|----------------------------------------|
-| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If execute() has not been called first |
+| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If execute() hasn't been called first |
:::warning Warning
This method can consume large amounts of memory for big result sets.
@@ -2189,7 +2189,7 @@ fetchmany(size=1)
| Exception | Condition |
|--------------------------------------------------------|----------------------------------------|
-| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If execute() has not been called first |
+| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If execute() hasn't been called first |
**Example**
@@ -2221,7 +2221,7 @@ fetchone()
| Exception | Condition |
|--------------------------------------------------------|----------------------------------------|
-| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If `execute()` has not been called first |
+| [`ProgrammingError`](#chdb-dbapi-err-programmingerror) | If `execute()` hasn't been called first |
**Example**
@@ -2296,10 +2296,10 @@ nextset()
| Return Type | Description |
|--------------|---------------------------------------------------------------|
-| `None` | Always returns None as multiple result sets are not supported |
+| `None` | Always returns None as multiple result sets aren't supported |
:::note
-chDB/ClickHouse does not support multiple result sets from a single query.
+chDB/ClickHouse doesn't support multiple result sets from a single query.
This method is provided for DB-API 2.0 compliance but always returns None.
:::
@@ -2571,9 +2571,9 @@ Bases: [`DatabaseError`](#chdb-dbapi-err-databaseerror)
Exception raised when the database encounters an internal error.
This exception is raised when the database system encounters internal
-errors that are not caused by the application, such as:
+errors that aren't caused by the application, such as:
-- Invalid cursor state (cursor is not valid anymore)
+- Invalid cursor state (cursor isn't valid anymore)
- Transaction state inconsistencies (transaction is out of sync)
- Database corruption issues
- Internal data structure corruption
@@ -2602,10 +2602,10 @@ and may require database restart or repair operations.
Bases: [`DatabaseError`](#chdb-dbapi-err-databaseerror)
-Exception raised when a method or database API is not supported.
+Exception raised when a method or database API isn't supported.
This exception is raised when the application attempts to use database
-features or API methods that are not supported by the current database
+features or API methods that aren't supported by the current database
configuration or version, such as:
- Requesting `rollback()` on connections without transaction support
@@ -2647,7 +2647,7 @@ Bases: [`DatabaseError`](#chdb-dbapi-err-databaseerror)
Exception raised for errors that are related to the database’s operation.
This exception is raised for errors that occur during database operation
-and are not necessarily under the control of the programmer, including:
+and aren't necessarily under the control of the programmer, including:
- Unexpected disconnection from database
- Database server not found or unreachable
@@ -2787,7 +2787,7 @@ Convert a number or string to an integer, or return 0 if no arguments
are given. If x is a number, return x._\_int_\_(). For floating-point
numbers, this truncates towards zero.
-If x is not a number or if base is given, then x must be a string,
+If x isn't a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded
by whitespace. The base defaults to 10. Valid bases are 0 and 2-36.
@@ -3171,7 +3171,7 @@ chdb.udf.generate_udf(func_name, args, return_type, udf_body)
| `udf_body` | str | Python source code body of the UDF function |
:::note
-This function is typically called by the @chdb_udf decorator and should not
+This function is typically called by the @chdb_udf decorator and shouldn't
be called directly by users.
:::
@@ -3292,7 +3292,7 @@ Infers the most suitable data type for a list of values.
This function examines a list of values and determines the most appropriate
data type that can represent all the values in the list. It considers integer,
unsigned integer, decimal, and float types, and defaults to “string” if the
-values cannot be represented by any numeric type or if all values are None.
+values can't be represented by any numeric type or if all values are None.
**Syntax**
diff --git a/docs/cloud/_snippets/_clickpipes_faq.md b/docs/cloud/_snippets/_clickpipes_faq.md
index ba66e67e019..ac5fc4b72d2 100644
--- a/docs/cloud/_snippets/_clickpipes_faq.md
+++ b/docs/cloud/_snippets/_clickpipes_faq.md
@@ -95,7 +95,7 @@ $$
For object storage connectors (S3 and GCS),
-only the ClickPipes compute cost is incurred since the ClickPipes pod is not processing data
+only the ClickPipes compute cost is incurred since the ClickPipes pod isn't processing data
but only orchestrating the transfer which is operated by the underlying ClickHouse service:
$$
@@ -110,6 +110,6 @@ $$
The philosophy behind ClickPipes pricing is
to cover the operating costs of the platform while offering an easy and reliable way to move data to ClickHouse Cloud.
-From that angle, our market analysis revealed that we are positioned competitively.
+From that angle, our market analysis revealed that we're positioned competitively.
diff --git a/docs/cloud/features/01_cloud_tiers.md b/docs/cloud/features/01_cloud_tiers.md
index b419d36e6a8..8e68afc1202 100644
--- a/docs/cloud/features/01_cloud_tiers.md
+++ b/docs/cloud/features/01_cloud_tiers.md
@@ -162,10 +162,10 @@ This page discusses which tiers are right for your specific use case.
## Basic {#basic}
- Cost-effective option that supports single-replica deployments.
-- Ideal for departmental use cases with smaller data volumes that do not have hard reliability guarantees.
+- Ideal for departmental use cases with smaller data volumes that don't have hard reliability guarantees.
:::note
-Services in the basic tier are meant to be fixed in size and do not allow scaling, both automatic and manual.
+Services in the basic tier are meant to be fixed in size and don't allow scaling, either automatic or manual.
You can upgrade to the Scale or Enterprise tier to scale their services.
:::
diff --git a/docs/cloud/features/03_sql_console_features/01_sql-console.md b/docs/cloud/features/03_sql_console_features/01_sql-console.md
index b071ab74325..ddfc709b56f 100644
--- a/docs/cloud/features/03_sql_console_features/01_sql-console.md
+++ b/docs/cloud/features/03_sql_console_features/01_sql-console.md
@@ -114,7 +114,7 @@ The SQL console can convert your sorts and filters directly into queries with on
:::note
-Filters and sorts are not mandatory when using the 'Create Query' feature.
+Filters and sorts aren't mandatory when using the 'Create Query' feature.
:::
You can learn more about querying in the SQL console by reading the (link) query documentation.
@@ -227,7 +227,7 @@ After a query is executed, you can quickly search through the returned result se
-Note: Any field matching the inputted value will be returned. For example, the third record in the above screenshot does not match 'breakfast' in the `by` field, but the `text` field does:
+Note: Any field matching the entered value will be returned. For example, the third record in the above screenshot doesn't match 'breakfast' in the `by` field, but the `text` field does:
diff --git a/docs/cloud/features/03_sql_console_features/03_query-endpoints.md b/docs/cloud/features/03_sql_console_features/03_query-endpoints.md
index 7d5328b4f5b..294143e0197 100644
--- a/docs/cloud/features/03_sql_console_features/03_query-endpoints.md
+++ b/docs/cloud/features/03_sql_console_features/03_query-endpoints.md
@@ -24,7 +24,7 @@ You'll be able to access API endpoints via HTTP to execute your saved queries wi
## IP Access Control {#ip-access-control}
-Query API endpoints respect API key-level IP whitelisting. Similar to the SQL Console, Query API endpoints proxy requests from within ClickHouse's infrastructure, so service-level IP whitelist settings do not apply.
+Query API endpoints respect API key-level IP whitelisting. Similar to the SQL Console, Query API endpoints proxy requests from within ClickHouse's infrastructure, so service-level IP whitelist settings don't apply.
To restrict which clients can call your Query API endpoints:
diff --git a/docs/cloud/features/04_infrastructure/automatic_scaling/01_auto_scaling.md b/docs/cloud/features/04_infrastructure/automatic_scaling/01_auto_scaling.md
index f91bcba47b2..cd0c0db7c37 100644
--- a/docs/cloud/features/04_infrastructure/automatic_scaling/01_auto_scaling.md
+++ b/docs/cloud/features/04_infrastructure/automatic_scaling/01_auto_scaling.md
@@ -35,7 +35,7 @@ For Enterprise tier services scaling works as follows:
- **Horizontal scaling**: Manual horizontal scaling will be available across all standard and custom profiles on the enterprise tier.
- **Vertical scaling**:
- Standard profiles (1:4) will support vertical autoscaling.
- - Custom profiles (`highMemory` and `highCPU`) do not support vertical autoscaling or manual vertical scaling. However, these services can be scaled vertically by contacting support.
+ - Custom profiles (`highMemory` and `highCPU`) don't support vertical autoscaling or manual vertical scaling. However, these services can be scaled vertically by contacting support.
:::note
Scaling in ClickHouse Cloud happens in what we call a ["Make Before Break" (MBB)](/cloud/features/mbb) approach.
@@ -43,7 +43,7 @@ This adds one or more replicas of the new size before removing the old replicas,
By eliminating the gap between removing existing replicas and adding new ones, MBB creates a more seamless and less disruptive scaling process.
It is especially beneficial in scale-up scenarios, where high resource utilization triggers the need for additional capacity, since removing replicas prematurely would only exacerbate the resource constraints.
As part of this approach, we wait up to an hour to let any existing queries complete on the older replicas before removing them.
-This balances the need for existing queries to complete, while at the same time ensuring that older replicas do not linger around for too long.
+This balances the need for existing queries to complete, while at the same time ensuring that older replicas don't linger around for too long.
:::
### Vertical auto scaling {#vertical-auto-scaling}
@@ -65,7 +65,7 @@ The **larger** of the CPU or memory recommendation is picked, and CPU and memory
The scaling of ClickHouse Cloud Scale or Enterprise services can be adjusted by organization members with the **Admin** role. To configure vertical autoscaling, go to the **Settings** tab for your service and adjust the minimum and maximum memory, along with CPU settings as shown below.
:::note
-Single replica services cannot be scaled for all tiers.
+Single-replica services can't be scaled, regardless of tier.
:::
@@ -74,11 +74,11 @@ Set the **Maximum memory** for your replicas at a higher value than the **Minimu
You can also choose to set these values the same, essentially "pinning" the service to a specific configuration. Doing so will immediately force scaling to the desired size you picked.
-It's important to note that this will disable any auto scaling on the cluster, and your service will not be protected against increases in CPU or memory usage beyond these settings.
+It's important to note that this will disable any auto scaling on the cluster, and your service won't be protected against increases in CPU or memory usage beyond these settings.
:::note
For Enterprise tier services, standard 1:4 profiles will support vertical autoscaling.
-Custom profiles will not support vertical autoscaling or manual vertical scaling at launch.
+Custom profiles won't support vertical autoscaling or manual vertical scaling at launch.
However, these services can be scaled vertically by contacting support.
:::
@@ -88,7 +88,7 @@ However, these services can be scaled vertically by contacting support.
You can use ClickHouse Cloud [public APIs](https://clickhouse.com/docs/cloud/manage/api/swagger#/paths/~1v1~1organizations~1:organizationId~1services~1:serviceId~1scaling/patch) to scale your service by updating the scaling settings for the service or adjust the number of replicas from the cloud console.
-**Scale** and **Enterprise** tiers also support single-replica services. Services once scaled out, can be scaled back in to a minimum of a single replica. Note that single replica services have reduced availability and are not recommended for production usage.
+**Scale** and **Enterprise** tiers also support single-replica services. Services, once scaled out, can be scaled back in to a minimum of a single replica. Note that single-replica services have reduced availability and aren't recommended for production usage.
:::note
Services can scale horizontally to a maximum of 20 replicas. If you need additional replicas, please contact our support team.
@@ -121,12 +121,12 @@ Once the service has scaled, the metrics dashboard in the cloud console should s
## Automatic idling {#automatic-idling}
-In the **Settings** page, you can also choose whether or not to allow automatic idling of your service when it is inactive for a certain duration (i.e. when the service is not executing any user-submitted queries). Automatic idling reduces the cost of your service, as you are not billed for compute resources when the service is paused.
+In the **Settings** page, you can also choose whether or not to allow automatic idling of your service when it is inactive for a certain duration (i.e. when the service isn't executing any user-submitted queries). Automatic idling reduces the cost of your service, as you're not billed for compute resources when the service is paused.
### Adaptive Idling {#adaptive-idling}
ClickHouse Cloud implements adaptive idling to prevent disruptions while optimizing cost savings. The system evaluates several conditions before transitioning a service to idle. Adaptive idling overrides the idling duration setting when any of the below listed conditions are met:
-- When the number of parts exceeds the maximum idle parts threshold (default: 10,000), the service is not idled so that background maintenance can continue
-- When there are ongoing merge operations, the service is not idled until those merges complete to avoid interrupting critical data consolidation
+- When the number of parts exceeds the maximum idle parts threshold (default: 10,000), the service isn't idled so that background maintenance can continue
+- When there are ongoing merge operations, the service isn't idled until those merges complete to avoid interrupting critical data consolidation
- Additionally, the service also adapts idle timeouts based on server initialization time:
- If server initialization time is less than 15 minutes, no adaptive timeout is applied and the customer-configured default idle timeout is used
- If server initialization time is between 15 and 30 minutes, the idle timeout is set to 15 minutes
@@ -138,7 +138,7 @@ The service may enter an idle state where it suspends refreshes of [refreshable
:::
:::danger When not to use automatic idling
-Use automatic idling only if your use case can handle a delay before responding to queries, because when a service is paused, connections to the service will time out. Automatic idling is ideal for services that are used infrequently and where a delay can be tolerated. It is not recommended for services that power customer-facing features that are used frequently.
+Use automatic idling only if your use case can handle a delay before responding to queries, because when a service is paused, connections to the service will time out. Automatic idling is ideal for services that are used infrequently and where a delay can be tolerated. It isn't recommended for services that power customer-facing features that are used frequently.
:::
## Handling spikes in workload {#handling-bursty-workloads}
diff --git a/docs/cloud/features/04_infrastructure/automatic_scaling/02_make_before_break.md b/docs/cloud/features/04_infrastructure/automatic_scaling/02_make_before_break.md
index 7c20291e1e8..507779a3caa 100644
--- a/docs/cloud/features/04_infrastructure/automatic_scaling/02_make_before_break.md
+++ b/docs/cloud/features/04_infrastructure/automatic_scaling/02_make_before_break.md
@@ -16,7 +16,7 @@ In this approach, new replicas are added to the cluster before removing old repl
This is as opposed to the break-first approach, where old replicas would first be removed, before adding new ones.
The MBB approach has several benefits:
-* Since capacity is added to the cluster before removal, the **overall cluster capacity does not go down** unlike with the break-first approach. Of course, unplanned events such as node or disk failures etc. can still happen in a cloud environment.
+* Since capacity is added to the cluster before removal, the **overall cluster capacity doesn't go down**, unlike with the break-first approach. Of course, unplanned events such as node or disk failures can still happen in a cloud environment.
* This approach is especially useful in situations where the cluster is under heavy load as it **prevents existing replicas from being overloaded** as would happen with a break-first approach.
* Because replicas can be added quickly without having to wait to remove replicas first, this approach leads to a **faster, more responsive** scaling experience.
@@ -38,4 +38,4 @@ With MBB, there are some key behaviors that you need to be aware of:
ClickHouse Cloud has checks in place to restrict the number of replicas that a cluster might accumulate.
3. With MBB operations, system table data is kept for 30 days. This means every time an MBB operation happens on a cluster, 30 days worth of system table data is replicated from the old replicas to the new ones.
-If you are interested in learning more about the mechanics of MBB operations, please look at this [blog post](https://clickhouse.com/blog/make-before-break-faster-scaling-mechanics-for-clickhouse-cloud) from the ClickHouse engineering team.
+If you're interested in learning more about the mechanics of MBB operations, please look at this [blog post](https://clickhouse.com/blog/make-before-break-faster-scaling-mechanics-for-clickhouse-cloud) from the ClickHouse engineering team.
diff --git a/docs/cloud/features/04_infrastructure/replica-aware-routing.md b/docs/cloud/features/04_infrastructure/replica-aware-routing.md
index 425a4296301..acc85da412c 100644
--- a/docs/cloud/features/04_infrastructure/replica-aware-routing.md
+++ b/docs/cloud/features/04_infrastructure/replica-aware-routing.md
@@ -12,7 +12,7 @@ import PrivatePreviewBadge from '@theme/badges/PrivatePreviewBadge';
-Replica-aware routing (also known as sticky sessions, sticky routing, or session affinity) utilizes [Envoy proxy's ring hash load balancing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash). The main purpose of replica-aware routing is to increase the chance of cache reuse. It does not guarantee isolation.
+Replica-aware routing (also known as sticky sessions, sticky routing, or session affinity) utilizes [Envoy proxy's ring hash load balancing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash). The main purpose of replica-aware routing is to increase the chance of cache reuse. It doesn't guarantee isolation.
When enabling replica-aware routing for a service, we allow a wildcard subdomain on top of the service hostname. For a service with the host name `abcxyz123.us-west-2.aws.clickhouse.cloud`, you can use any hostname which matches `*.sticky.abcxyz123.us-west-2.aws.clickhouse.cloud` to visit the service:
@@ -28,11 +28,11 @@ Note the original hostname will still use `LEAST_CONNECTION` load balancing, whi
## Limitations of Replica-aware routing {#limitations-of-replica-aware-routing}
-### Replica-aware routing does not guarantee isolation {#replica-aware-routing-does-not-guarantee-isolation}
+### Replica-aware routing doesn't guarantee isolation {#replica-aware-routing-does-not-guarantee-isolation}
Any disruption to the service, e.g. server pod restarts (due to any reason like a version upgrade, crash, vertical scaling up, etc.), server scaled out / in, will cause a disruption to the routing hash ring. This will cause connections with the same hostname to land on a different server pod.
-### Replica-aware routing does not work out of the box with private link {#replica-aware-routing-does-not-work-out-of-the-box-with-private-link}
+### Replica-aware routing doesn't work out of the box with private link {#replica-aware-routing-does-not-work-out-of-the-box-with-private-link}
Customers need to manually add a DNS entry to make name resolution work for the new hostname pattern. It is possible that this can cause imbalance in the server load if customers use it incorrectly.
diff --git a/docs/cloud/features/04_infrastructure/shared-catalog.md b/docs/cloud/features/04_infrastructure/shared-catalog.md
index 19933c16a5b..b537ac5ce7d 100644
--- a/docs/cloud/features/04_infrastructure/shared-catalog.md
+++ b/docs/cloud/features/04_infrastructure/shared-catalog.md
@@ -32,14 +32,14 @@ All metadata and DDL query history in Shared Catalog is stored centrally in ZooK
## Shared database engine {#shared-database-engine}
-The **Shared database engine** works in conjunction with Shared Catalog to manage databases whose tables use **stateless table engines** such as `SharedMergeTree`. These table engines do not write persistent state to disk and are compatible with dynamic compute environments.
+The **Shared database engine** works in conjunction with Shared Catalog to manage databases whose tables use **stateless table engines** such as `SharedMergeTree`. These table engines don't write persistent state to disk and are compatible with dynamic compute environments.
Shared database engine builds on and improves the behavior of the Replicated database engine while offering additional guarantees and operational benefits.
### Key benefits {#key-benefits}
- **Atomic CREATE TABLE ... AS SELECT**
- Table creation and data insertion are executed atomically—either the entire operation completes, or the table is not created at all.
+ Table creation and data insertion are executed atomically—either the entire operation completes, or the table isn't created at all.
- **RENAME TABLE between databases**
Enables atomic movement of tables across databases:
@@ -58,7 +58,7 @@ Shared database engine builds on and improves the behavior of the Replicated dat
Unlike the Replicated database engine, which requires all replicas to be online to process a DROP query, Shared Catalog performs centralized metadata deletion. This allows operations to succeed even when some replicas are offline.
- **Automatic metadata replication**
- Shared Catalog ensures that database definitions are automatically replicated to all servers on startup. Operators do not need to manually configure or synchronize metadata on new instances.
+ Shared Catalog ensures that database definitions are automatically replicated to all servers on startup. Operators don't need to manually configure or synchronize metadata on new instances.
- **Centralized, versioned metadata state**
Shared Catalog stores a single source of truth in ZooKeeper. When a replica starts, it fetches the latest state and applies the diff to reach consistency. During query execution, the system can wait for other replicas to reach at least the required version of metadata to ensure correctness.
diff --git a/docs/cloud/features/04_infrastructure/shared-merge-tree.md b/docs/cloud/features/04_infrastructure/shared-merge-tree.md
index 06beced9d11..b49b5eece7e 100644
--- a/docs/cloud/features/04_infrastructure/shared-merge-tree.md
+++ b/docs/cloud/features/04_infrastructure/shared-merge-tree.md
@@ -101,17 +101,17 @@ ORDER BY key
Some settings behavior is significantly changed:
-- `insert_quorum` -- all inserts to SharedMergeTree are quorum inserts (written to shared storage) so this setting is not needed when using SharedMergeTree table engine.
-- `insert_quorum_parallel` -- all inserts to SharedMergeTree are quorum inserts (written to shared storage) so this setting is not needed when using SharedMergeTree table engine.
+- `insert_quorum` -- all inserts to SharedMergeTree are quorum inserts (written to shared storage) so this setting isn't needed when using the SharedMergeTree table engine.
+- `insert_quorum_parallel` -- all inserts to SharedMergeTree are quorum inserts (written to shared storage) so this setting isn't needed when using the SharedMergeTree table engine.
- `select_sequential_consistency` -- doesn't require quorum inserts, will trigger additional load to clickhouse-keeper on `SELECT` queries
## Consistency {#consistency}
SharedMergeTree provides better lightweight consistency than ReplicatedMergeTree. When inserting into SharedMergeTree, you don't need to provide settings such as `insert_quorum` or `insert_quorum_parallel`. Inserts are quorum inserts, meaning that the metadata will be stored in ClickHouse-Keeper, and the metadata is replicated to at least the quorum of ClickHouse-keepers. Each replica in your cluster will asynchronously fetch new information from ClickHouse-Keeper.
-Most of the time, you should not be using `select_sequential_consistency` or `SYSTEM SYNC REPLICA LIGHTWEIGHT`. The asynchronous replication should cover most scenarios and has very low latency. In the rare case where you absolutely need to prevent stale reads, follow these recommendations in order of preference:
+Most of the time, you shouldn't be using `select_sequential_consistency` or `SYSTEM SYNC REPLICA LIGHTWEIGHT`. The asynchronous replication should cover most scenarios and has very low latency. In the rare case where you absolutely need to prevent stale reads, follow these recommendations in order of preference:
-1. If you are executing your queries in the same session or the same node for your reads and writes, using `select_sequential_consistency` is not needed because your replica will already have the most recent metadata.
+1. If you're executing your queries in the same session or the same node for your reads and writes, using `select_sequential_consistency` isn't needed because your replica will already have the most recent metadata.
2. If you write to one replica and read from another, you can use `SYSTEM SYNC REPLICA LIGHTWEIGHT` to force the replica to fetch the metadata from ClickHouse-Keeper.
diff --git a/docs/cloud/features/04_infrastructure/warehouses.md b/docs/cloud/features/04_infrastructure/warehouses.md
index 6cb591f6bb4..7c842be1621 100644
--- a/docs/cloud/features/04_infrastructure/warehouses.md
+++ b/docs/cloud/features/04_infrastructure/warehouses.md
@@ -106,7 +106,7 @@ _Fig. 6 - Read-write and Read-only services in a warehouse_
:::note
1. Read-only services currently allow user management operations (create, drop, etc). This behavior may be changed in the future.
-2. Refreshable materialized views run **only** on read-write (RW) services in a warehouse. They are **not** executed on read-only (RO) services.
+2. Refreshable materialized views run **only** on read-write (RW) services in a warehouse. They're **not** executed on read-only (RO) services.
:::
@@ -116,22 +116,22 @@ Each service in a warehouse can be adjusted to your workload in terms of:
- Number of nodes (replicas). The primary service (the service that was created first in the warehouse) should have 2 or more nodes. Each secondary service can have 1 or more nodes.
- Size of nodes (replicas)
- If the service should scale automatically
-- If the service should be idled on inactivity (cannot be applied to the first service in the group - please see the **Limitations** section)
+- If the service should be idled on inactivity (can't be applied to the first service in the group - please see the **Limitations** section)
## Changes in behavior {#changes-in-behavior}
Once compute-compute is enabled for a service (at least one secondary service was created), the `clusterAllReplicas()` function call with the `default` cluster name will utilize only replicas from the service where it was called. That means, if there are two services connected to the same dataset, and `clusterAllReplicas(default, system, processes)` is called from service 1, only processes running on service 1 will be shown. If needed, you can still call `clusterAllReplicas('all_groups.default', system, processes)` for example to reach all replicas.
## Limitations {#limitations}
-1. **Primary service should always be up and should not be idled (limitation will be removed some time after GA).** During the private preview and some time after GA, the primary service (usually the existing service that you want to extend by adding other services) will be always up and will have the idling setting disabled. You will not be able to stop or idle the primary service if there is at least one secondary service. Once all secondary services are removed, you can stop or idle the original service again.
+1. **Primary service should always be up and shouldn't be idled (limitation will be removed some time after GA).** During the private preview and some time after GA, the primary service (usually the existing service that you want to extend by adding other services) will always be up and will have the idling setting disabled. You won't be able to stop or idle the primary service if there is at least one secondary service. Once all secondary services are removed, you can stop or idle the original service again.
-2. **Sometimes workloads cannot be isolated.** Though the goal is to give you an option to isolate database workloads from each other, there can be corner cases where one workload in one service will affect another service sharing the same data. These are quite rare situations that are mostly connected to OLTP-like workloads.
+2. **Sometimes workloads can't be isolated.** Though the goal is to give you an option to isolate database workloads from each other, there can be corner cases where one workload in one service will affect another service sharing the same data. These are quite rare situations that are mostly connected to OLTP-like workloads.
-3. **All read-write services are doing background merge operations.** When inserting data to ClickHouse, the database at first inserts the data to some staging partitions, and then performs merges in the background. These merges can consume memory and CPU resources. When two read-write services share the same storage, they both are performing background operations. That means that there can be a situation where there is an `INSERT` query in Service 1, but the merge operation is completed by Service 2. Note that read-only services do not execute background merges, thus they don't spend their resources on this operation.
+3. **All read-write services perform background merge operations.** When inserting data into ClickHouse, the database first inserts the data into staging partitions, and then performs merges in the background. These merges can consume memory and CPU resources. When two read-write services share the same storage, both perform background operations. That means there can be a situation where there is an `INSERT` query in Service 1, but the merge operation is completed by Service 2. Note that read-only services don't execute background merges, so they don't spend resources on this operation.
4. **All read-write services are performing S3Queue table engine insert operations.** When creating an S3Queue table on a RW service, all other RW services in the warehouse may read data from S3 and write data to the database.
-5. **Inserts in one read-write service can prevent another read-write service from idling if idling is enabled.** As a result, a second service performs background merge operations for the first service. These background operations can prevent the second service from going to sleep when idling. Once the background operations are finished, the service will be idled. Read-only services are not affected and will be idled without delay.
+5. **Inserts in one read-write service can prevent another read-write service from idling if idling is enabled.** As a result, a second service performs background merge operations for the first service. These background operations can prevent the second service from going to sleep when idling. Once the background operations are finished, the service will be idled. Read-only services aren't affected and will be idled without delay.
6. **CREATE/RENAME/DROP DATABASE queries could be blocked by idled/stopped services by default.** These queries can hang. To bypass this, you can run database management queries with `settings distributed_ddl_task_timeout=0` at the session or per query level. For example:
@@ -176,7 +176,7 @@ There are two ways to rename a warehouse:
### Deleting a warehouse {#deleting-a-warehouse}
-Deleting a warehouse means deleting all the compute services and the data (tables, views, users, etc.). This action cannot be undone.
+Deleting a warehouse means deleting all the compute services and the data (tables, views, users, etc.). This action can't be undone.
You can only delete a warehouse by deleting the first service created. To do this:
1. Delete all the services that were created in addition to the service that was created first;
diff --git a/docs/cloud/features/05_admin_features/api/openapi.md b/docs/cloud/features/05_admin_features/api/openapi.md
index 33bc4bc6d29..3f5832eb3cb 100644
--- a/docs/cloud/features/05_admin_features/api/openapi.md
+++ b/docs/cloud/features/05_admin_features/api/openapi.md
@@ -42,7 +42,7 @@ To use API keys with [Query API Endpoints](/cloud/get-started/query-endpoints),
-4. The next screen will display your Key ID and Key secret. Copy these values and put them somewhere safe, such as a vault. The values will not be displayed after you leave this screen.
+4. The next screen will display your Key ID and Key secret. Copy these values and put them somewhere safe, such as a vault. The values won't be displayed after you leave this screen.
@@ -55,7 +55,7 @@ $ KEY_SECRET=mykeysecret
$ curl --user $KEY_ID:$KEY_SECRET https://api.clickhouse.cloud/v1/organizations
```
-6. Returning to the **API Keys** page, you will see the key name, last four characters of the Key ID, permissions, status, expiration date, and creator. You are able to edit the key name, permissions, and expiration from this screen. Keys may also be disabled or deleted form this screen.
+6. Returning to the **API Keys** page, you will see the key name, last four characters of the Key ID, permissions, status, expiration date, and creator. You're able to edit the key name, permissions, and expiration from this screen. Keys may also be disabled or deleted from this screen.
:::note
Deleting an API key is a permanent action. Any services using the key will immediately lose access to ClickHouse Cloud.
diff --git a/docs/cloud/features/05_admin_features/api/postman.md b/docs/cloud/features/05_admin_features/api/postman.md
index 78536b3a3e8..0fb55d0db76 100644
--- a/docs/cloud/features/05_admin_features/api/postman.md
+++ b/docs/cloud/features/05_admin_features/api/postman.md
@@ -94,7 +94,7 @@ The Postman Application is available for use within a web browser or can be down
-* The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information your configuration is not correct).
+* The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information, your configuration isn't correct).
@@ -113,7 +113,7 @@ The Postman Application is available for use within a web browser or can be down
-* The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information your configuration is not correct).
+* The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information, your configuration isn't correct).
### Test "GET service details" {#test-get-service-details}
@@ -123,4 +123,4 @@ The Postman Application is available for use within a web browser or can be down
-* The returned results should deliver a list of your services and their details with "status": 200. (If you receive a "status" 400 with no service(s) information your configuration is not correct).
+* The returned results should deliver a list of your services and their details with "status": 200. (If you receive a "status" 400 with no service(s) information, your configuration isn't correct).
diff --git a/docs/cloud/features/05_admin_features/upgrades.md b/docs/cloud/features/05_admin_features/upgrades.md
index d39de76b8ee..7386e54d5c0 100644
--- a/docs/cloud/features/05_admin_features/upgrades.md
+++ b/docs/cloud/features/05_admin_features/upgrades.md
@@ -20,24 +20,24 @@ import scheduled_upgrade_window from '@site/static/images/cloud/manage/scheduled
With ClickHouse Cloud you never have to worry about patching and upgrades. We roll out upgrades that include fixes, new features and performance improvements on a periodic basis. For the full list of what is new in ClickHouse refer to our [Cloud changelog](/whats-new/cloud).
:::note
-We are introducing a new upgrade mechanism, a concept we call "make before break" (or MBB). With this new approach, we add updated replica(s) before removing the old one(s) during the upgrade operation. This results in more seamless upgrades that are less disruptive to running workloads.
+We're introducing a new upgrade mechanism, a concept we call "make before break" (or MBB). With this new approach, we add updated replica(s) before removing the old one(s) during the upgrade operation. This results in more seamless upgrades that are less disruptive to running workloads.
-As part of this change, historical system table data will be retained for up to a maximum of 30 days as part of upgrade events. In addition, any system table data older than December 19, 2024, for services on AWS or GCP and older than January 14, 2025, for services on Azure will not be retained as part of the migration to the new organization tiers.
+As part of this change, historical system table data will be retained for up to a maximum of 30 days as part of upgrade events. In addition, any system table data older than December 19, 2024, for services on AWS or GCP and older than January 14, 2025, for services on Azure won't be retained as part of the migration to the new organization tiers.
:::
## Version compatibility {#version-compatibility}
When you create a service, the [`compatibility`](/operations/settings/settings#compatibility) setting is set to the most up-to-date ClickHouse version offered on ClickHouse Cloud at the time your service is initially provisioned.
-The `compatibility` setting allows you to use default values of settings from previous versions. When your service is upgraded to a new version, the version specified for the `compatibility` setting does not change. This means that default values for settings that existed when you first created your service will not change (unless you have already overridden those default values, in which case they will persist after the upgrade).
+The `compatibility` setting allows you to use default values of settings from previous versions. When your service is upgraded to a new version, the version specified for the `compatibility` setting doesn't change. This means that default values for settings that existed when you first created your service won't change (unless you have already overridden those default values, in which case they will persist after the upgrade).
-You cannot manage the service-level default `compatibility` setting for your service. You must [contact support](https://clickhouse.com/support/program) if you would like to change the version set for your service's default `compatibility` setting. However, you can override the `compatibility` setting at the user, role, profile, query, or session level using standard ClickHouse setting mechanisms such as `SET compatibility = '22.3'` in a session or `SETTINGS compatibility = '22.3'` in a query.
+You can't manage the service-level default `compatibility` setting for your service. You must [contact support](https://clickhouse.com/support/program) if you would like to change the version set for your service's default `compatibility` setting. However, you can override the `compatibility` setting at the user, role, profile, query, or session level using standard ClickHouse setting mechanisms such as `SET compatibility = '22.3'` in a session or `SETTINGS compatibility = '22.3'` in a query.
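+
+For instance, a minimal sketch of overriding the setting at the session and query level (the version string `'22.3'` is just an illustration; use whichever version's defaults you need):
+
+```sql
+-- Session-level override: applies to all subsequent queries in this session
+SET compatibility = '22.3';
+
+-- Query-level override: applies to this query only
+SELECT count() FROM system.tables SETTINGS compatibility = '22.3';
+```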
## Maintenance mode {#maintenance-mode}
At times, it may be necessary for us to update your service, which could require us to disable certain features such as scaling or idling. In rare cases, we may need to take action on a service that is experiencing issues and bring it back to a healthy state. During such maintenance, you will see a banner on the service page that says _"Maintenance in progress"_. You may still be able to use the service for queries during this time.
-You will not be charged for the time that the service is under maintenance. _Maintenance mode_ is a rare occurrence and should not be confused with regular service upgrades.
+You won't be charged for the time that the service is under maintenance. _Maintenance mode_ is a rare occurrence and shouldn't be confused with regular service upgrades.
## Release channels (upgrade schedule) {#release-channels-upgrade-schedule}
@@ -45,7 +45,7 @@ Users are able to specify the upgrade schedule for their ClickHouse Cloud servic
The three release channels are:
- The [**fast release channel**](#fast-release-channel-early-upgrades) for early access to upgrades.
-- The [**regular release channel**](#regular-release-channel) is the default, and upgrades on this channel start two weeks after the fast release channel upgrades. If your service on the Scale and Enterprise tier does not have a release channel set, it is on the regular release channel by default.
+- The [**regular release channel**](#regular-release-channel) is the default, and upgrades on this channel start two weeks after the fast release channel upgrades. If your service on the Scale and Enterprise tier doesn't have a release channel set, it is on the regular release channel by default.
- The [**slow release channel**](#slow-release-channel-deferred-upgrades) is for deferred release. Upgrades on this channel occur two weeks after the regular release channel upgrades.
:::note
@@ -75,11 +75,11 @@ You can modify the release schedule of the service in the Cloud console as shown
-This **Fast release** channel is suitable for testing new features in non-critical environments. **It is not recommended for production workloads with strict uptime and reliability requirements.**
+This **Fast release** channel is suitable for testing new features in non-critical environments. **It isn't recommended for production workloads with strict uptime and reliability requirements.**
### Regular release channel {#regular-release-channel}
-For all Scale and Enterprise tier services that do not have a release channel or an upgrade schedule configured, upgrades will be performed as a part of the Regular channel release. This is recommended for production environments.
+For all Scale and Enterprise tier services that don't have a release channel or an upgrade schedule configured, upgrades will be performed as a part of the Regular channel release. This is recommended for production environments.
Upgrades to the regular release channel are typically performed two weeks after the **Fast release channel**.
@@ -102,7 +102,7 @@ Specifically, services will:
:::note
You can change release channels at any time. However, in certain cases, the change will only apply to future releases.
- Moving to a faster channel will immediately upgrade your service. i.e. Slow to Regular, Regular to Fast
-- Moving to a slower channel will not downgrade your service and keep you on your current version until a newer one is available in that channel. i.e. Regular to Slow, Fast to Regular or Slow
+- Moving to a slower channel won't downgrade your service; it keeps you on your current version until a newer one is available in that channel. i.e. Regular to Slow, Fast to Regular or Slow
:::
## Scheduled upgrades {#scheduled-upgrades}
diff --git a/docs/cloud/features/07_monitoring/advanced_dashboard.md b/docs/cloud/features/07_monitoring/advanced_dashboard.md
index 9cb7cdabbc8..5ef6f41325c 100644
--- a/docs/cloud/features/07_monitoring/advanced_dashboard.md
+++ b/docs/cloud/features/07_monitoring/advanced_dashboard.md
@@ -60,7 +60,7 @@ edit this query by clicking on the pen icon.
The default charts in the Advanced Dashboard are designed to provide real-time
visibility into your ClickHouse system. Below is a list with descriptions for
-each chart. They are grouped into three categories to help you navigate them.
+each chart. They're grouped into three categories to help you navigate them.
### ClickHouse specific {#clickhouse-specific}
@@ -147,7 +147,7 @@ their impact on your deployment's overall performance.
A sudden change in resource consumption without a change in query throughput can
indicate more expensive queries being executed. Depending on the type of queries
-you are running, this can be expected, but spotting them from the advanced
+you're running, this can be expected, but spotting them from the advanced
dashboard is good.
Below is an example of CPU usage peaking without significantly changing the
diff --git a/docs/cloud/features/07_monitoring/notifications.md b/docs/cloud/features/07_monitoring/notifications.md
index f691c94a962..171e05675c9 100644
--- a/docs/cloud/features/07_monitoring/notifications.md
+++ b/docs/cloud/features/07_monitoring/notifications.md
@@ -15,7 +15,7 @@ import notifications_4 from '@site/static/images/cloud/manage/notifications-4.pn
ClickHouse Cloud sends notifications about critical events related to your service or organization. There are a few concepts to keep in mind to understand how notifications are sent and configured:
1. **Notification category**: Refers to groups of notifications such as billing notifications, service related notifications etc. Within each category, there are multiple notifications for which the delivery mode can be configured.
-2. **Notification severity**: Notification severity can be `info`, `warning`, or `critical` depending on how important a notification is. This is not configurable.
+2. **Notification severity**: Notification severity can be `info`, `warning`, or `critical` depending on how important a notification is. This isn't configurable.
3. **Notification channel**: Channel refers to the mode by which the notification is received such as UI, email, Slack etc. This is configurable for most notifications.
## Receiving notifications {#receiving-notifications}
@@ -39,7 +39,7 @@ To configure delivery for a specific notification, click on the pencil icon to m
:::note
-Certain **required** notifications such as **Payment failed** are not configurable.
+Certain **required** notifications such as **Payment failed** aren't configurable.
:::
## Supported notifications {#supported-notifications}
diff --git a/docs/cloud/features/07_monitoring/prometheus.md b/docs/cloud/features/07_monitoring/prometheus.md
index b5a682fd998..be67dd94cc3 100644
--- a/docs/cloud/features/07_monitoring/prometheus.md
+++ b/docs/cloud/features/07_monitoring/prometheus.md
@@ -150,7 +150,7 @@ ClickHouse Cloud provides a special metric `ClickHouse_ServiceInfo` which is a `
|full|Indicates that there were no errors during the last metrics scrape|
|partial|Indicates that there were some errors during the last metrics scrape and only `ClickHouse_ServiceInfo` metric was returned.|
-Requests to retrieve metrics will not resume an idled service. In the case that a service is in the `idle` state, only the `ClickHouse_ServiceInfo` metric will be returned.
+Requests to retrieve metrics won't resume an idled service. If a service is in the `idle` state, only the `ClickHouse_ServiceInfo` metric will be returned.
For ClickPipes, there's a similar `ClickPipes_Info` metric `gauge` that in addition of the **Metric Labels** contains the following labels:
@@ -199,7 +199,7 @@ We provide instructions on using these options below, focusing on the details sp
- Login to your Grafana Cloud account
- Add a new connection by selecting the **Metrics Endpoint**
- Configure the Scrape URL to point to the Prometheus endpoint and use basic auth to configure your connection with the API key/secret
-- Test the connection to ensure you are able to connect
+- Test the connection to ensure you're able to connect
@@ -215,7 +215,7 @@ Once configured, you should see the metrics in the drop-down that you can select
### Grafana Cloud with Alloy {#grafana-cloud-with-alloy}
-If you are using Grafana Cloud, Alloy can be installed by navigating to the Alloy menu in Grafana and following the onscreen instructions:
+If you're using Grafana Cloud, Alloy can be installed by navigating to the Alloy menu in Grafana and following the onscreen instructions:
@@ -259,7 +259,7 @@ Note the `honor_labels` configuration parameter needs to be set to `true` for th
### Grafana self-managed with Alloy {#grafana-self-managed-with-alloy}
-Self-managed users of Grafana can find the instructions for installing the Alloy agent [here](https://grafana.com/docs/alloy/latest/get-started/install/). We assume users have configured Alloy to send Prometheus metrics to their desired destination. The `prometheus.scrape` component below causes Alloy to scrape the ClickHouse Cloud Endpoint. We assume `prometheus.remote_write` receives the scraped metrics. Adjust the `forward_to key` to the target destination if this does not exist.
+Self-managed users of Grafana can find the instructions for installing the Alloy agent [here](https://grafana.com/docs/alloy/latest/get-started/install/). We assume users have configured Alloy to send Prometheus metrics to their desired destination. The `prometheus.scrape` component below causes Alloy to scrape the ClickHouse Cloud Endpoint. We assume `prometheus.remote_write` receives the scraped metrics. Adjust the `forward_to` key to the target destination if this doesn't exist.
```yaml
prometheus.scrape "clickhouse_cloud" {
diff --git a/docs/cloud/features/08_backups.md b/docs/cloud/features/08_backups.md
index 22299a2bf01..b0b0804e8e2 100644
--- a/docs/cloud/features/08_backups.md
+++ b/docs/cloud/features/08_backups.md
@@ -48,7 +48,7 @@ ClickHouse Cloud allows you to configure the schedule for your backups for **Sca
:::note
The custom schedule will override the default backup policy in ClickHouse Cloud for your given service.
-In some rare scenarios, the backup scheduler will not respect the **Start Time** specified for backups. Specifically, this happens if there was a successful backup triggered < 24 hours from the time of the currently scheduled backup. This could happen due to a retry mechanism we have in place for backups. In such instances, the scheduler will skip over the backup for the current day, and will retry the backup the next day at the scheduled time.
+In some rare scenarios, the backup scheduler won't respect the **Start Time** specified for backups. Specifically, this happens if there was a successful backup triggered < 24 hours from the time of the currently scheduled backup. This could happen due to a retry mechanism we have in place for backups. In such instances, the scheduler will skip over the backup for the current day, and will retry the backup the next day at the scheduled time.
:::
See ["Configure backup schedules"](/cloud/manage/backups/configurable-backups) for steps to configure your backups.
@@ -70,7 +70,7 @@ Any usage where backups are being exported to a
different region in the same cloud provider will incur [data transfer](/cloud/manage/network-data-transfer)
charges.
-Currently, we do not support cross-cloud backups, nor backup / restore for services utilizing [Transparent Data Encryption (TDE)](/cloud/security/cmek#transparent-data-encryption-tde) or for regulated services.
+Currently, we don't support cross-cloud backups, nor backup / restore for services utilizing [Transparent Data Encryption (TDE)](/cloud/security/cmek#transparent-data-encryption-tde) or for regulated services.
:::
See ["Export backups to your own Cloud account"](/cloud/manage/backups/export-backups-to-own-cloud-account) for examples of how to take full and incremental backups to AWS, GCP, Azure object storage as well as how to restore from the backups.
@@ -95,7 +95,7 @@ You can use [SQL commands](/cloud/manage/backups/backup-restore-via-commands) to
:::warning
-ClickHouse Cloud will not manage the lifecycle of backups in customer buckets.
+ClickHouse Cloud won't manage the lifecycle of backups in customer buckets.
Customers are responsible for ensuring that backups in their bucket are managed appropriately for adhering to compliance standards as well as managing cost.
-If the backups are corrupted, they will not be able to be restored.
+If the backups are corrupted, they won't be able to be restored.
:::
diff --git a/docs/cloud/features/09_AI_ML/AI_chat_overview.md b/docs/cloud/features/09_AI_ML/AI_chat_overview.md
index 5b4528d66e9..68a554fbfd9 100644
--- a/docs/cloud/features/09_AI_ML/AI_chat_overview.md
+++ b/docs/cloud/features/09_AI_ML/AI_chat_overview.md
@@ -9,7 +9,7 @@ doc_type: 'reference'
# Ask AI agent in Cloud
The “Ask AI” agent is a turn-key experience that allows users to trigger complex analysis tasks on top of the data hosted in their ClickHouse Cloud service.
-Instead of writing SQL or navigating dashboards, users can describe what they are looking for in natural language.
+Instead of writing SQL or navigating dashboards, users can describe what they're looking for in natural language.
The assistant responds with generated queries, visualizations, or summaries, and can incorporate context like active tabs, saved queries, schema details, and dashboards to improve accuracy.
It’s designed to work as an embedded assistant, helping users move quickly from questions to insights, and from prompts to working dashboards or APIs.
diff --git a/docs/cloud/guides/AI_ML/AIChat/using_AI_chat.md b/docs/cloud/guides/AI_ML/AIChat/using_AI_chat.md
index 530291a83e0..e6a78d58ed9 100644
--- a/docs/cloud/guides/AI_ML/AIChat/using_AI_chat.md
+++ b/docs/cloud/guides/AI_ML/AIChat/using_AI_chat.md
@@ -41,8 +41,8 @@ import img_new_tab from '@site/static/images/use-cases/AI_ML/AIChat/7_open_in_ed
## Accept the data usage consent (first run) {#consent}
-1. On first use you are prompted with a consent dialog describing data handling and third‑party LLM sub-processors.
-2. Review and accept to proceed. If you decline, the panel will not open.
+1. On first use you're prompted with a consent dialog describing data handling and third‑party LLM sub-processors.
+2. Review and accept to proceed. If you decline, the panel won't open.
diff --git a/docs/cloud/guides/SQL_console/query-endpoints.md b/docs/cloud/guides/SQL_console/query-endpoints.md
index 32d50d29b61..2912ad75db7 100644
--- a/docs/cloud/guides/SQL_console/query-endpoints.md
+++ b/docs/cloud/guides/SQL_console/query-endpoints.md
@@ -213,7 +213,7 @@ The query was successfully executed.
|-------------|-------------|
| `400 Bad Request` | The request was malformed |
| `401 Unauthorized` | Missing authentication or insufficient permissions |
-| `404 Not Found` | The specified query endpoint was not found |
+| `404 Not Found` | The specified query endpoint wasn't found |
#### Error handling best practices {#error-handling-best-practices}
diff --git a/docs/cloud/guides/backups/01_review-and-restore-backups.md b/docs/cloud/guides/backups/01_review-and-restore-backups.md
index afe26de35ca..c13ecf01ca9 100644
--- a/docs/cloud/guides/backups/01_review-and-restore-backups.md
+++ b/docs/cloud/guides/backups/01_review-and-restore-backups.md
@@ -38,7 +38,7 @@ To understand the backup cost, you can view the backup cost per service from the
-Estimating the total cost for your backups requires you to set a schedule. We are also working on updating our [pricing calculator](https://clickhouse.com/pricing), so you can get a monthly cost estimate before setting a schedule. You will need to provide the following inputs in order to estimate the cost:
+Estimating the total cost for your backups requires you to set a schedule. We're also working on updating our [pricing calculator](https://clickhouse.com/pricing), so you can get a monthly cost estimate before setting a schedule. You'll need to provide the following inputs to estimate the cost:
- Size of the full and incremental backups
- Desired frequency
- Desired retention
@@ -78,11 +78,11 @@ To use the new service, perform these steps:
### Migrate data from the **newly restored service** back to the **original service** {#migrate-data-from-the-newly-restored-service-back-to-the-original-service}
-Suppose you cannot work with the newly restored service for some reason, for example, if you still have users or applications that connect to the existing service. You may decide to migrate the newly restored data into the original service. The migration can be accomplished by following these steps:
+Suppose you can't work with the newly restored service for some reason, for example, if you still have users or applications that connect to the existing service. You may decide to migrate the newly restored data into the original service. The migration can be accomplished by following these steps:
**Allow remote access to the newly restored service**
-The new service should be restored from a backup with the same IP Allow List as the original service. This is required as connections will not be allowed to other ClickHouse Cloud services unless you had allowed access from **Anywhere**. Modify the allow list and allow access from **Anywhere** temporarily. See the [IP Access List](/cloud/security/setting-ip-filters) docs for details.
+The new service should be restored from a backup with the same IP Allow List as the original service. This is required as connections won't be allowed to other ClickHouse Cloud services unless you have allowed access from **Anywhere**. Modify the allow list and allow access from **Anywhere** temporarily. See the [IP Access List](/cloud/security/setting-ip-filters) docs for details.
**On the newly restored ClickHouse service (the system that hosts the restored data)**
@@ -146,7 +146,7 @@ The `UNDROP` command is supported in ClickHouse Cloud through [Shared Catalog](h
To prevent users from accidentally dropping tables, you can use [`GRANT` statements](/sql-reference/statements/grant) to revoke permissions for the [`DROP TABLE` command](/sql-reference/statements/drop#drop-table) for a specific user or role.
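+
+As a sketch, assuming a user named `analyst` and a database named `mydb` (both are placeholders):
+
+```sql
+-- Remove the DROP TABLE privilege on all tables in mydb from analyst
+REVOKE DROP TABLE ON mydb.* FROM analyst;
+```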
:::note
-To prevent accidental deletion of data, please note that by default it is not possible to drop tables >`1TB` in size in ClickHouse Cloud.
+To prevent accidental deletion of data, please note that by default it isn't possible to drop tables >`1TB` in size in ClickHouse Cloud.
Should you wish to drop tables greater than this threshold you can use setting `max_table_size_to_drop` to do so:
```sql
diff --git a/docs/cloud/guides/backups/03_bring_your_own_backup/01_export-backups-to-own-cloud-account.md b/docs/cloud/guides/backups/03_bring_your_own_backup/01_export-backups-to-own-cloud-account.md
index 43e3aa207de..98d3480e295 100644
--- a/docs/cloud/guides/backups/03_bring_your_own_backup/01_export-backups-to-own-cloud-account.md
+++ b/docs/cloud/guides/backups/03_bring_your_own_backup/01_export-backups-to-own-cloud-account.md
@@ -78,7 +78,7 @@ Where `uuid` is a unique identifier, used to differentiate a set of backups.
:::note
You will need to use a different UUID for each new backup in this subdirectory, otherwise you will get a `BACKUP_ALREADY_EXISTS` error.
-For example, if you are taking daily backups, you will need to use a new UUID each day.
+For example, if you're taking daily backups, you will need to use a new UUID each day.
:::
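+
+As a hedged illustration of a full backup whose destination path embeds a fresh UUID (the bucket name, credentials, and the UUID itself are all placeholders):
+
+```sql
+-- Full backup to your own bucket; generate a new <uuid> for each backup
+BACKUP DATABASE my_db
+TO S3('https://my-bucket.s3.amazonaws.com/backups/my_db/<uuid>', '<key id>', '<key secret>');
+```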
**Incremental Backup**
diff --git a/docs/cloud/guides/backups/03_bring_your_own_backup/02_backup_restore_from_ui.md b/docs/cloud/guides/backups/03_bring_your_own_backup/02_backup_restore_from_ui.md
index 3118611f9b6..709b27bbd71 100644
--- a/docs/cloud/guides/backups/03_bring_your_own_backup/02_backup_restore_from_ui.md
+++ b/docs/cloud/guides/backups/03_bring_your_own_backup/02_backup_restore_from_ui.md
@@ -182,7 +182,7 @@ If you move the backups to another location, you will need to customize the rest
For the Restore command you can also optionally add an `ASYNC` command at the end for large restores.
This allows the restores to happen asynchronously, so that if connection is lost, the restore keeps running.
It is important to note that the ASYNC command immediately returns a status of success.
-This does not mean the restore was successful.
+This doesn't mean the restore was successful.
You will need to monitor the `system.backups` table to see if the restore has finished and if it succeeded or failed.
:::
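+
+For example, a minimal query for checking on recent backup and restore operations (the exact columns available may vary by version):
+
+```sql
+-- Inspect the status of the most recent backup/restore operations
+SELECT id, name, status, error
+FROM system.backups
+ORDER BY start_time DESC
+LIMIT 5;
+```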
@@ -226,7 +226,7 @@ Generate an HMAC Key and Secret, which is required for password-based authentica
* c. Securely store the credentials:
* I. The system will display the Access ID (your HMAC key) and the Secret (your HMAC secret). Save these values, as
- the secret will not be displayed again after you close this window.
+ the secret won't be displayed again after you close this window.
@@ -296,7 +296,7 @@ If you move the backups to another location, you will need to customize the rest
For the Restore command you can also optionally add an `ASYNC` command at the end for large restores.
This allows the restores to happen asynchronously, so that if connection is lost, the restore keeps running.
It is important to note that the ASYNC command immediately returns a status of success.
-This does not mean the restore was successful.
+This doesn't mean the restore was successful.
You will need to monitor the `system.backups` table to see if the restore has finished and if it succeeded or failed.
:::
@@ -398,7 +398,7 @@ If you move the backups to another location, you will need to customize the rest
For the Restore command you can also optionally add an `ASYNC` command at the end for large restores.
This allows the restores to happen asynchronously, so that if connection is lost, the restore keeps running.
It is important to note that the ASYNC command immediately returns a status of success.
-This does not mean the restore was successful.
+This doesn't mean the restore was successful.
You will need to monitor the `system.backups` table to see if the restore has finished and if it succeeded or failed.
:::
diff --git a/docs/cloud/guides/backups/03_bring_your_own_backup/03_backup_restore_using_commands.md b/docs/cloud/guides/backups/03_bring_your_own_backup/03_backup_restore_using_commands.md
index e53f823331d..768432e60d9 100644
--- a/docs/cloud/guides/backups/03_bring_your_own_backup/03_backup_restore_using_commands.md
+++ b/docs/cloud/guides/backups/03_bring_your_own_backup/03_backup_restore_using_commands.md
@@ -67,7 +67,7 @@ Where `uuid` is a unique identifier, used to differentiate a set of backups.
:::note
You will need to use a different uuid for each new backup in this subdirectory, otherwise you will get a `BACKUP_ALREADY_EXISTS` error.
-For example, if you are taking daily backups, you will need to use a new uuid each day.
+For example, if you're taking daily backups, you will need to use a new uuid each day.
:::
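A sketch of rotating the uuid for daily backups (the bucket, table, uuids, and credential values are hypothetical):

```sql
-- Day 1: back up under a freshly generated uuid subdirectory.
BACKUP TABLE my_db.my_table
TO S3('https://my-bucket.s3.amazonaws.com/backups/2f4c1d9e-0000-0000-0000-000000000001/', '<access-key>', '<secret-key>');

-- Day 2: reusing the same uuid would fail with BACKUP_ALREADY_EXISTS,
-- so point the path at a new uuid instead.
BACKUP TABLE my_db.my_table
TO S3('https://my-bucket.s3.amazonaws.com/backups/8a7b3c21-0000-0000-0000-000000000002/', '<access-key>', '<secret-key>');
```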
@@ -100,7 +100,7 @@ Where `uuid` is a unique identifier, used to identify the backup.
:::note
You will need to use a different uuid for each new backup in this subdirectory, otherwise you will get a `BACKUP_ALREADY_EXISTS` error.
-For example, if you are taking daily backups, you will need to use a new uuid each day.
+For example, if you're taking daily backups, you will need to use a new uuid each day.
:::
@@ -133,7 +133,7 @@ Where `uuid` is a unique identifier, used to identify the backup.
:::note
You will need to use a different uuid for each new backup in this subdirectory, otherwise you will get a `BACKUP_ALREADY_EXISTS` error.
-For example, if you are taking daily backups, you will need to use a new uuid each day.
+For example, if you're taking daily backups, you will need to use a new uuid each day.
:::
@@ -195,7 +195,7 @@ FROM S3(
What happens to the backups in my cloud object storage? Are they cleaned up by ClickHouse at some point?
-We provide you the ability to export backups to your bucket, however, we do not clean up or delete any of the backups once written. You are responsible for managing the lifecycle of the backups in your bucket, including deleting, or archiving as needed, or moving to cheaper storage to optimize overall cost.
+We provide the ability to export backups to your bucket; however, we don't clean up or delete any of the backups once written. You're responsible for managing the lifecycle of the backups in your bucket, including deleting or archiving them as needed, or moving them to cheaper storage to optimize overall cost.
diff --git a/docs/cloud/guides/best_practices/usagelimits.md b/docs/cloud/guides/best_practices/usagelimits.md
index a3fbdc57853..e3bef7d4f55 100644
--- a/docs/cloud/guides/best_practices/usagelimits.md
+++ b/docs/cloud/guides/best_practices/usagelimits.md
@@ -14,7 +14,7 @@ Cloud enforces limits across several operational dimensions.
The details of these guardrails are listed below.
:::tip
-If you've run up against one of these guardrails, it's possible that you are
+If you've run up against one of these guardrails, it's possible that you're
implementing your use case in an unoptimized way. Contact our support team and
we will gladly help you refine your use case to avoid exceeding the guardrails
or look together at how we can increase them in a controlled manner.
diff --git a/docs/cloud/guides/cloud-compatibility.md b/docs/cloud/guides/cloud-compatibility.md
index 263300aac80..d9cec3a5ee4 100644
--- a/docs/cloud/guides/cloud-compatibility.md
+++ b/docs/cloud/guides/cloud-compatibility.md
@@ -12,11 +12,11 @@ doc_type: 'guide'
This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud. While ClickHouse Cloud is based on the open-source ClickHouse distribution, there may be some differences in architecture and implementation. You may find this blog on [how we built ClickHouse Cloud](https://clickhouse.com/blog/building-clickhouse-cloud-from-scratch-in-a-year) interesting and relevant to read as background.
## ClickHouse Cloud architecture {#clickhouse-cloud-architecture}
-ClickHouse Cloud significantly simplifies operational overhead and reduces the costs of running ClickHouse at scale. There is no need to size your deployment upfront, set up replication for high availability, manually shard your data, scale up your servers when your workload increases, or scale them down when you are not using them — we handle this for you.
+ClickHouse Cloud significantly simplifies operational overhead and reduces the costs of running ClickHouse at scale. There is no need to size your deployment upfront, set up replication for high availability, manually shard your data, scale up your servers when your workload increases, or scale them down when you're not using them — we handle this for you.
These benefits come as a result of architectural choices underlying ClickHouse Cloud:
-- Compute and storage are separated and thus can be automatically scaled along separate dimensions, so you do not have to over-provision either storage or compute in static instance configurations.
-- Tiered storage on top of object store and multi-level caching provides virtually limitless scaling and good price/performance ratio, so you do not have to size your storage partition upfront and worry about high storage costs.
+- Compute and storage are separated and thus can be automatically scaled along separate dimensions, so you don't have to over-provision either storage or compute in static instance configurations.
+- Tiered storage on top of object store and multi-level caching provides virtually limitless scaling and good price/performance ratio, so you don't have to size your storage partition upfront and worry about high storage costs.
- High availability is on by default and replication is transparently managed, so you can focus on building your applications or analyzing your data.
- Automatic scaling for variable continuous workloads is on by default, so you don't have to size your service upfront, scale up your servers when your workload increases, or manually scale down your servers when you have less activity
- Seamless hibernation for intermittent workloads is on by default. We automatically pause your compute resources after a period of inactivity and transparently start it again when a new query arrives, so you don't have to pay for idle resources.
@@ -27,7 +27,7 @@ ClickHouse Cloud provides access to a curated set of capabilities in the open so
### Database and table engines {#database-and-table-engines}
-ClickHouse Cloud provides a highly-available, replicated service by default. As a result, all database and table engines are "Replicated". You do not need to specify "Replicated"–for example, `ReplicatedMergeTree` and `MergeTree` are identical when used in ClickHouse Cloud.
+ClickHouse Cloud provides a highly-available, replicated service by default. As a result, all database and table engines are "Replicated". You don't need to specify "Replicated"–for example, `ReplicatedMergeTree` and `MergeTree` are identical when used in ClickHouse Cloud.
**Supported table engines**
@@ -77,7 +77,7 @@ We support federated ClickHouse queries for cross-cluster communication in the c
- PostgreSQL
- S3
-Federated queries with some external database and table engines, such as SQLite, ODBC, JDBC, Redis, HDFS and Hive are not yet supported.
+Federated queries with some external database and table engines, such as SQLite, ODBC, JDBC, Redis, HDFS and Hive aren't yet supported.
### User defined functions {#user-defined-functions}
@@ -86,13 +86,13 @@ User-defined functions in ClickHouse Cloud are in [private preview](https://clic
#### Settings behavior {#udf-settings-behavior}
:::warning Important
-UDFs in ClickHouse Cloud **do not inherit user-level settings**. They execute with default system settings.
+UDFs in ClickHouse Cloud **don't inherit user-level settings**. They execute with default system settings.
:::
This means:
-- Session-level settings (set via `SET` statement) are not propagated to UDF execution context
-- User profile settings are not inherited by UDFs
-- Query-level settings do not apply within UDF execution
+- Session-level settings (set via `SET` statement) aren't propagated to UDF execution context
+- User profile settings aren't inherited by UDFs
+- Query-level settings don't apply within UDF execution
### Experimental features {#experimental-features}
@@ -100,7 +100,7 @@ Experimental features are disabled in ClickHouse Cloud services to ensure the st
### Named collections {#named-collections}
-[Named collections](/operations/named-collections) are not currently supported in ClickHouse Cloud.
+[Named collections](/operations/named-collections) aren't currently supported in ClickHouse Cloud.
## Operational defaults and considerations {#operational-defaults-and-considerations}
The following are default settings for ClickHouse Cloud services. In some cases, these settings are fixed to ensure the correct operation of the service, and in others, they can be adjusted.
@@ -120,11 +120,11 @@ depending on the number of replicas configured.
Increased this setting from 50GB to allow for dropping of tables/partitions up to 1TB.
### System settings {#system-settings}
-ClickHouse Cloud is tuned for variable workloads, and for that reason most system settings are not configurable at this time. We do not anticipate the need to tune system settings for most users, but if you have a question about advanced system tuning, please contact ClickHouse Cloud Support.
+ClickHouse Cloud is tuned for variable workloads, and for that reason most system settings aren't configurable at this time. We don't anticipate the need to tune system settings for most users, but if you have a question about advanced system tuning, please contact ClickHouse Cloud Support.
### Advanced security administration {#advanced-security-administration}
-As part of creating the ClickHouse service, we create a default database, and the default user that has broad permissions to this database. This initial user can create additional users and assign their permissions to this database. Beyond this, the ability to enable the following security features within the database using Kerberos, LDAP, or SSL X.509 certificate authentication are not supported at this time.
+As part of creating the ClickHouse service, we create a default database, and the default user that has broad permissions to this database. This initial user can create additional users and assign their permissions to this database. Beyond this, the ability to enable the following security features within the database using Kerberos, LDAP, or SSL X.509 certificate authentication isn't supported at this time.
## Roadmap {#roadmap}
-We are evaluating demand for many other features in ClickHouse Cloud. If you have feedback and would like to ask for a specific feature, please [submit it here](https://console.clickhouse.cloud/support).
+We're evaluating demand for many other features in ClickHouse Cloud. If you have feedback and would like to ask for a specific feature, please [submit it here](https://console.clickhouse.cloud/support).
diff --git a/docs/cloud/guides/data_sources/01_cloud-endpoints-api.md b/docs/cloud/guides/data_sources/01_cloud-endpoints-api.md
index 4df7b512dfd..15192629ff7 100644
--- a/docs/cloud/guides/data_sources/01_cloud-endpoints-api.md
+++ b/docs/cloud/guides/data_sources/01_cloud-endpoints-api.md
@@ -15,7 +15,7 @@ import gcp_authorized_network from '@site/static/images/_snippets/gcp-authorized
If you need to fetch the list of static IPs, you can use the following ClickHouse Cloud API endpoint: [`https://api.clickhouse.cloud/static-ips.json`](https://api.clickhouse.cloud/static-ips.json). This API provides the endpoints for ClickHouse Cloud services, such as ingress/egress IPs and S3 endpoints per region and cloud.
-If you are using an integration like the MySQL or PostgreSQL Engine, it is possible that you need to authorize ClickHouse Cloud to access your instances. You can use this API to retrieve the public IPs and configure them in `firewalls` or `Authorized networks` in GCP or in `Security Groups` for Azure, AWS, or in any other infrastructure egress management system you are using.
+If you're using an integration like the MySQL or PostgreSQL Engine, you may need to authorize ClickHouse Cloud to access your instances. You can use this API to retrieve the public IPs and configure them in `firewalls` or `Authorized networks` in GCP or in `Security Groups` for Azure, AWS, or in any other infrastructure egress management system you're using.
For example, to allow access from a ClickHouse Cloud service hosted on AWS in the region `ap-south-1`, you can add the `egress_ips` addresses for that region:
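One way to pull those addresses is with the `url` table function; this is a sketch that assumes the endpoint's JSON has the shape `{"aws": [{"region": ..., "egress_ips": [...]}, ...]}`:

```sql
-- Read the static-ips endpoint as a single JSON string and unpack the AWS entries.
SELECT
    JSONExtractString(region_obj, 'region') AS region,
    JSONExtract(region_obj, 'egress_ips', 'Array(String)') AS egress_ips
FROM
(
    SELECT arrayJoin(JSONExtractArrayRaw(json, 'aws')) AS region_obj
    FROM url('https://api.clickhouse.cloud/static-ips.json', 'JSONAsString', 'json String')
)
WHERE region = 'ap-south-1';
```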
diff --git a/docs/cloud/guides/data_sources/02_accessing-s3-data-securely.md b/docs/cloud/guides/data_sources/02_accessing-s3-data-securely.md
index c2feb0e09c3..a2749b2fe25 100644
--- a/docs/cloud/guides/data_sources/02_accessing-s3-data-securely.md
+++ b/docs/cloud/guides/data_sources/02_accessing-s3-data-securely.md
@@ -55,7 +55,7 @@ The IAM assume role can be setup in one of two ways:
4. Enter your bucket name in the input titled "Bucket Names". If your bucket URL is `https://ch-docs-s3-bucket.s3.eu-central-1.amazonaws.com/clickhouseS3/` then the bucket name is `ch-docs-s3-bucket`.
:::note
-Do not put the full bucket ARN but instead just the bucket name only.
+Don't put the full bucket ARN; use just the bucket name.
:::
5. Configure the CloudFormation stack. Below is additional information about these parameters.
@@ -145,7 +145,7 @@ DESCRIBE TABLE s3('https://s3.amazonaws.com/BUCKETNAME/BUCKETOBJECT.csv','CSVWit
```
Below is an example query that uses the `role_session_name` as a shared secret to query data from a bucket.
-If the `role_session_name` is not correct, this operation will fail.
+If the `role_session_name` isn't correct, this operation will fail.
```sql
DESCRIBE TABLE s3('https://s3.amazonaws.com/BUCKETNAME/BUCKETOBJECT.csv','CSVWithNames',extra_credentials(role_arn = 'arn:aws:iam::111111111111:role/ClickHouseAccessRole-001', role_session_name = 'secret-role-name'))
diff --git a/docs/cloud/guides/data_sources/03_accessing-gcs-data-securely.md b/docs/cloud/guides/data_sources/03_accessing-gcs-data-securely.md
index 99790f1f6b5..fe1a2fe50f0 100644
--- a/docs/cloud/guides/data_sources/03_accessing-gcs-data-securely.md
+++ b/docs/cloud/guides/data_sources/03_accessing-gcs-data-securely.md
@@ -142,7 +142,7 @@ Secret: nFy6DFRr4sM9OnV6BG4FtWVPR25JfqpmcdZ6w9nV
:::danger Important
Store these credentials securely.
-The secret cannot be retrieved again after this screen is closed.
+The secret can't be retrieved again after this screen is closed.
You will need to generate new keys if you lose the secret.
:::
@@ -185,7 +185,7 @@ When [setting up a GCS ClickPipe](/integrations/clickpipes/object-storage/gcs/ge
:::note
-Service account authentication is not currently supported - you must use HMAC keys
+Service account authentication isn't currently supported; you must use HMAC keys.
The GCS bucket URL must use the format: `https://storage.googleapis.com//` (not `gs://`)
:::
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/01_overview.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/01_overview.md
index affcd7da678..9f8e8251a80 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/01_overview.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/01_overview.md
@@ -24,11 +24,11 @@ BYOC is designed specifically for large-scale deployments, and requires customer
**Supported Cloud Service Providers:**
* AWS (GA)
-* GCP (Private Preview). Please join the waitlist [here](https://clickhouse.com/cloud/bring-your-own-cloud) if you are interested.
-* Azure (Roadmap). Please join the waitlist [here](https://clickhouse.com/cloud/bring-your-own-cloud) if you are interested.
+* GCP (Private Preview). Please join the waitlist [here](https://clickhouse.com/cloud/bring-your-own-cloud) if you're interested.
+* Azure (Roadmap). Please join the waitlist [here](https://clickhouse.com/cloud/bring-your-own-cloud) if you're interested.
**Supported Cloud Regions:**
-All **public regions** listed in our [supported regions](https://clickhouse.com/docs/cloud/reference/supported-regions) documentation are available for BYOC deployments. Private regions are not currently supported.
+All **public regions** listed in our [supported regions](https://clickhouse.com/docs/cloud/reference/supported-regions) documentation are available for BYOC deployments. Private regions aren't currently supported.
## Features {#features}
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/02_architecture.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/02_architecture.md
index 85602229e67..af1c004134e 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/02_architecture.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/02_architecture.md
@@ -92,7 +92,7 @@ Together, these two components enable ClickHouse Cloud to:
All customer data remains within your cloud account and is never accessed or transmitted through these management channels.
**Additional recommendations and considerations:**
-- Ensure that network CIDR ranges for your BYOC VPC does not overlap with any existing VPCs you plan to peer with.
+- Ensure that the network CIDR ranges for your BYOC VPC don't overlap with any existing VPCs you plan to peer with.
- Tag your resources clearly to simplify management and support.
- Plan for adequate subnet sizing and distribution across availability zones for high availability.
- Consult the [security playbook](https://clickhouse.com/docs/cloud/security/audit-logging/byoc-security-playbook) to understand shared responsibility and best practices when ClickHouse Cloud operates within your environment.
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/01_standard.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/01_standard.md
index bcbd0812e26..42e0694261c 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/01_standard.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/01_standard.md
@@ -46,7 +46,7 @@ The initial BYOC setup can be performed using either a [CloudFormation template(
:::note
-Storage buckets, VPC, Kubernetes cluster, and compute resources required for running ClickHouse are not included in this initial setup. They will be provisioned in the next step.
+Storage buckets, VPC, Kubernetes cluster, and compute resources required for running ClickHouse aren't included in this initial setup. They will be provisioned in the next step.
:::
#### Alternative Terraform Module for AWS {#terraform-module-aws}
@@ -62,11 +62,11 @@ module "clickhouse_onboarding" {
### Set up BYOC infrastructure {#setup-byoc-infrastructure}
-You will be prompted to set up the infrastructure, including S3 buckets, VPC, and the Kubernetes cluster, from the ClickHouse Cloud console. Certain configurations must be determined at this stage, as they cannot be changed later. Specifically:
+You will be prompted to set up the infrastructure, including S3 buckets, VPC, and the Kubernetes cluster, from the ClickHouse Cloud console. Certain configurations must be determined at this stage, as they can't be changed later. Specifically:
-- **Region**: All **public regions** listed in our [supported regions](https://clickhouse.com/docs/cloud/reference/supported-regions) documentation are available for BYOC deployments. Private regions are not currently supported.
+- **Region**: All **public regions** listed in our [supported regions](https://clickhouse.com/docs/cloud/reference/supported-regions) documentation are available for BYOC deployments. Private regions aren't currently supported.
-- **VPC CIDR range**: By default, we use `10.0.0.0/16` for the BYOC VPC CIDR range. If you plan to use VPC peering with another account, ensure the CIDR ranges do not overlap. Allocate a proper CIDR range for BYOC, with a minimum size of `/22` to accommodate necessary workloads.
+- **VPC CIDR range**: By default, we use `10.0.0.0/16` for the BYOC VPC CIDR range. If you plan to use VPC peering with another account, ensure the CIDR ranges don't overlap. Allocate a proper CIDR range for BYOC, with a minimum size of `/22` to accommodate necessary workloads.
- **Availability Zones**: If you plan to use VPC peering, aligning availability zones between the source and BYOC accounts can help reduce cross-AZ traffic costs. For example, in AWS, availability zone suffixes (`a`, `b`, `c`) may represent different physical zone IDs across accounts. See the [AWS guide](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-consistent-availability-zones-in-vpcs-across-different-aws-accounts.html) for details.
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/02_customization.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/02_customization.md
index 4773971c1a1..430b6cf536b 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/02_customization.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/02_customization.md
@@ -41,14 +41,14 @@ If your VPC doesn't already have an S3 Gateway Endpoint configured, you'll need
Your VPC must permit at least outbound internet access so that ClickHouse BYOC components can communicate with the Tailscale control plane. Tailscale is used to provide secure, zero-trust networking for private management operations. Initial registration and setup with Tailscale require public internet connectivity, which can be achieved either directly or via a NAT gateway. This connectivity is required to maintain both the privacy and security of your BYOC deployment.
**DNS Resolution**
-Ensure your VPC has working DNS resolution and does not block, interfere with, or overwrite standard DNS names. ClickHouse BYOC relies on DNS to resolve Tailscale control servers as well as ClickHouse service endpoints. If DNS is unavailable or misconfigured, BYOC services may fail to connect or operate properly.
+Ensure your VPC has working DNS resolution and doesn't block, interfere with, or overwrite standard DNS names. ClickHouse BYOC relies on DNS to resolve Tailscale control servers as well as ClickHouse service endpoints. If DNS is unavailable or misconfigured, BYOC services may fail to connect or operate properly.
### Configure your AWS account {#configure-aws-account}
To allow ClickHouse Cloud to deploy into your existing VPC, you need to grant the necessary IAM permissions within your AWS account. This is accomplished by launching a bootstrap CloudFormation stack or Terraform module, similar to the process used for standard onboarding.
1. Deploy the [CloudFormation template](https://s3.us-east-2.amazonaws.com/clickhouse-public-resources.clickhouse.cloud/cf-templates/byoc_v2.yaml) or [Terraform module](https://s3.us-east-2.amazonaws.com/clickhouse-public-resources.clickhouse.cloud/tf/byoc.tar.gz) to create the required IAM role.
-2. Set the `IncludeVPCWritePermissions` parameter to `false` to ensure ClickHouse Cloud does not receive permissions to modify your customer-managed VPC.
+2. Set the `IncludeVPCWritePermissions` parameter to `false` to ensure ClickHouse Cloud doesn't receive permissions to modify your customer-managed VPC.
3. This will create the `ClickHouseManagementRole` in your AWS account, granting ClickHouse Cloud only the minimum permissions needed to provision and manage your BYOC deployment.
:::note
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/05_configuration.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/05_configuration.md
index 3a8c89a3679..fcd103afa91 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/05_configuration.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/05_configuration.md
@@ -39,7 +39,7 @@ To set up the security group for your private load balancer:
**Contact ClickHouse Support** to request inbound security group rule changes that allow traffic from your specific source networks:
- **VPC Peering**: Request rules to permit traffic from your peered VPCs’ CIDR ranges.
-- **PrivateLink**: No security group changes required, as traffic is not governed by the load balancer's security group.
+- **PrivateLink**: No security group changes required, as traffic isn't governed by the load balancer's security group.
- **Other network setups**: Specify your scenario so support can assist accordingly.
:::note
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/06_observability.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/06_observability.md
index 22943d18ca9..5f2c2a4e76e 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/06_observability.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/06_observability.md
@@ -59,7 +59,7 @@ https://prometheus-internal....clickhouse-byoc.com
```
:::note
-The Prometheus stack URL is only accessible via private network connections and does not require authentication. Access is restricted to networks that can reach your BYOC VPC through VPC peering or other private connectivity options.
+The Prometheus stack URL is only accessible via private network connections and doesn't require authentication. Access is restricted to networks that can reach your BYOC VPC through VPC peering or other private connectivity options.
:::
### Integrating with Your Monitoring Tools {#prometheus-stack-integration}
@@ -97,7 +97,7 @@ scrape_configs:
## ClickHouse service Prometheus Integration {#direct-prometheus-integration}
-ClickHouse services expose a Prometheus-compatible metrics endpoint that you can scrape directly using your own Prometheus instance. This approach provides ClickHouse-specific metrics but does not include Kubernetes or supporting service metrics.
+ClickHouse services expose a Prometheus-compatible metrics endpoint that you can scrape directly using your own Prometheus instance. This approach provides ClickHouse-specific metrics but doesn't include Kubernetes or supporting service metrics.
### Accessing the Metrics Endpoint {#metrics-endpoint}
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/07_operations.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/07_operations.md
index 52f32be25a6..169704f7a0c 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/07_operations.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/07_operations.md
@@ -32,7 +32,7 @@ The Kubernetes cluster (EKS for AWS, GKE for GCP) that hosts your ClickHouse ser
### Cluster Upgrade Types {#cluster-upgrade-types}
-**Control Plane Upgrades**: The Kubernetes control plane components (API server, etcd, controller manager) are upgraded by ClickHouse Cloud. These upgrades are typically transparent to your workloads and do not require pod restarts.
+**Control Plane Upgrades**: The Kubernetes control plane components (API server, etcd, controller manager) are upgraded by ClickHouse Cloud. These upgrades are typically transparent to your workloads and don't require pod restarts.
**Node Group Upgrades**: Worker node upgrades require node replacement, which may impact running pods. ClickHouse Cloud coordinates these upgrades using a make-before-break approach to minimize disruption:
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/01_faq.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/01_faq.md
index ce18814f2f8..c3a46520f15 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/01_faq.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/01_faq.md
@@ -69,7 +69,7 @@ Contact support to schedule maintenance windows. Please expect a minimum of a we
How does storage communication work between BYOC VPC and S3?
-Traffic between your Customer BYOC VPC and S3 uses HTTPS (port 443) via the AWS S3 API for table data, backups, and logs. When using S3 VPC endpoints, this traffic remains within the AWS network and does not traverse the public internet.
+Traffic between your Customer BYOC VPC and S3 uses HTTPS (port 443) via the AWS S3 API for table data, backups, and logs. When using S3 VPC endpoints, this traffic remains within the AWS network and doesn't traverse the public internet.
@@ -88,6 +88,6 @@ Internal ClickHouse cluster communication within the Customer BYOC VPC uses:
Does ClickHouse offer an uptime SLA for BYOC?
-No, since the data plane is hosted in the customer's cloud environment, service availability depends on resources not in ClickHouse's control. Therefore, ClickHouse does not offer a formal uptime SLA for BYOC deployments. If you have additional questions, please contact support@clickhouse.com.
+No, since the data plane is hosted in the customer's cloud environment, service availability depends on resources not in ClickHouse's control. Therefore, ClickHouse doesn't offer a formal uptime SLA for BYOC deployments. If you have additional questions, please contact support@clickhouse.com.
diff --git a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/03_network_security.md b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/03_network_security.md
index 0bb59eba0d4..e0db46fd623 100644
--- a/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/03_network_security.md
+++ b/docs/cloud/guides/infrastructure/01_deployment_options/byoc/08_reference/03_network_security.md
@@ -150,7 +150,7 @@ By default, ingress is publicly accessible with IP allow list filtering. Custome
*Inbound, Private*
-ClickHouse Cloud engineers require troubleshooting access via Tailscale. They are provisioned with just-in-time certificate-based authentication for BYOC deployments.
+ClickHouse Cloud engineers require troubleshooting access via Tailscale. They're provisioned with just-in-time certificate-based authentication for BYOC deployments.
### Billing scraper {#billing-scraper}
diff --git a/docs/cloud/guides/production-readiness.md b/docs/cloud/guides/production-readiness.md
index 67a1aa50f25..49a8751a9f9 100644
--- a/docs/cloud/guides/production-readiness.md
+++ b/docs/cloud/guides/production-readiness.md
@@ -53,7 +53,7 @@ Establish separate environments to safely test changes before impacting producti
## Private networking {#private-networking}
-[Private networking](/cloud/security/connectivity/private-networking) in ClickHouse Cloud allows you to connect your ClickHouse services directly to your cloud virtual network, ensuring that data does not traverse the public internet. This is essential for organizations with strict security or compliance requirements, or for those running applications in private subnets.
+[Private networking](/cloud/security/connectivity/private-networking) in ClickHouse Cloud allows you to connect your ClickHouse services directly to your cloud virtual network, ensuring that data doesn't traverse the public internet. This is essential for organizations with strict security or compliance requirements, or for those running applications in private subnets.
ClickHouse Cloud supports private networking through the following mechanisms:
@@ -74,7 +74,7 @@ Moving from console-based user management to enterprise authentication integrati
[Social SSO](/cloud/security/manage-my-account): ClickHouse Cloud also supports social authentication providers (Google, Microsoft, GitHub) as an equally secure alternative to SAML SSO. Social SSO provides faster setup for organizations without existing SAML infrastructure while maintaining enterprise security standards.
:::note Important limitation
-Users authenticated through SAML or social SSO are assigned the "Member" role by default and must be manually granted additional roles by an admin after their first login. Group-to-role mapping and automatic role assignment are not currently supported.
+Users authenticated through SAML or social SSO are assigned the "Member" role by default and must be manually granted additional roles by an admin after their first login. Group-to-role mapping and automatic role assignment aren't currently supported.
:::
### Access control design {#access-control-design}
@@ -89,7 +89,7 @@ Configure quotas, limits, and settings profiles to manage resource usage for dif
### User lifecycle management limitations {#user-lifecycle-management}
-ClickHouse Cloud does not currently support SCIM or automated provisioning/deprovisioning via identity providers. Users must be manually removed from the ClickHouse Cloud console after being removed from your IdP. Plan for manual user management processes until these features become available.
+ClickHouse Cloud doesn't currently support SCIM or automated provisioning/deprovisioning via identity providers. Users must be manually removed from the ClickHouse Cloud console after being removed from your IdP. Plan for manual user management processes until these features become available.
Learn more about [Cloud Access Management](/cloud/security/cloud_access_management) and [SAML SSO setup](/cloud/security/saml-setup).
diff --git a/docs/cloud/guides/security/01_cloud_access_management/01_manage-my-account.md b/docs/cloud/guides/security/01_cloud_access_management/01_manage-my-account.md
index 78cf6597582..68510703896 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/01_manage-my-account.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/01_manage-my-account.md
@@ -13,7 +13,7 @@ import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge
You may use multiple methods to accept an invitation to join an organization. If this is your first invitation, select the appropriate authentication method for your organization below.
-If this is not your first organization, either sign in with your existing organization then accept the invitation from the lower left hand side of the page OR accept the invitation from your email and sign in with your existing account.
+If this isn't your first organization, either sign in with your existing organization and accept the invitation from the lower left-hand side of the page, or accept the invitation from your email and sign in with your existing account.
:::note SAML Users
Organizations using SAML have a unique login per ClickHouse organization. Use the direct link provided by your administrator to log in.
@@ -54,7 +54,7 @@ Users with email + password or social authentication can further secure their ac
### Obtain a new recovery code {#obtain-recovery-code}
-If you previously enrolled in MFA and either did not create or misplaced your recovery code, follow these steps to get a new recovery code:
+If you previously enrolled in MFA and either didn't create a recovery code or misplaced it, follow these steps to get a new one:
1. Go to https://console.clickhouse.cloud
2. Sign in with your credentials and MFA
3. Go to your profile in the upper left corner
@@ -107,6 +107,6 @@ If you lost your MFA device or deleted your token, follow these steps to recover
If you lost your MFA device AND recovery code or you lost your MFA device and never obtained a recovery code, follow these steps to request a reset:
-**Submit a ticket**: If you are in an organization that has other administrative users, even if you are attempting to access a single user organization, ask a member of your organization assigned the Admin role to log into the organization and submit a support ticket to reset your MFA on your behalf. Once we verify the request is authenticated, we will reset your MFA and notify the Admin. Sign in as usual without MFA and go to your profile settings to enroll a new factor if you wish.
+**Submit a ticket**: If you're in an organization that has other administrative users, even if you're attempting to access a single user organization, ask a member of your organization assigned the Admin role to log into the organization and submit a support ticket to reset your MFA on your behalf. Once we verify the request is authenticated, we will reset your MFA and notify the Admin. Sign in as usual without MFA and go to your profile settings to enroll a new factor if you wish.
-**Reset via email**: If you are the only user in the organization, submit a support case via email (support@clickhouse.com) using the email address associated with your account. Once we verify the request is coming from the correct email, we will reset your MFA AND password. Access your email to access the password reset link. Set up a new password then go to your profile settings to enroll a new factor if you wish.
+**Reset via email**: If you're the only user in the organization, submit a support case via email (support@clickhouse.com) using the email address associated with your account. Once we verify the request is coming from the correct email, we will reset your MFA AND password. Check your email for the password reset link, set up a new password, then go to your profile settings to enroll a new factor if you wish.
diff --git a/docs/cloud/guides/security/01_cloud_access_management/02_manage-cloud-users.md b/docs/cloud/guides/security/01_cloud_access_management/02_manage-cloud-users.md
index a07534c8eba..21a6c289db0 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/02_manage-cloud-users.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/02_manage-cloud-users.md
@@ -41,7 +41,7 @@ Users will receive an email from which they can join the organization. For more
If your organization is configured for [SAML SSO](/cloud/security/saml-setup) follow these steps to add users to your organization.
-1. Add users to your SAML application in your identity provider, the users will not appear in ClickHouse until they have logged in once
+1. Add users to your SAML application in your identity provider; the users won't appear in ClickHouse until they've logged in once
2. When the user logs in to ClickHouse Cloud they will automatically be assigned the `Member` role which may only log in and has no other access
3. Follow the instructions in the `Manage user role assignments` below to grant permissions
@@ -99,7 +99,7 @@ Save your changes with the `Save changes` button at the bottom of the tab:
## Remove a user {#remove-user}
:::note Remove SAML users
-SAML users that have been unassigned from the ClickHouse application in your identity provider are not able to log in to ClickHouse Cloud. The account is not removed from the console and will need to be manually removed.
+SAML users that have been unassigned from the ClickHouse application in your identity provider aren't able to log in to ClickHouse Cloud. The account isn't removed from the console and will need to be manually removed.
:::
Follow the steps below to remove a user.
diff --git a/docs/cloud/guides/security/01_cloud_access_management/04_manage-database-users.md b/docs/cloud/guides/security/01_cloud_access_management/04_manage-database-users.md
index 01e18d7349e..2711e8676ec 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/04_manage-database-users.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/04_manage-database-users.md
@@ -57,7 +57,7 @@ The user will be assigned the role associated with their email address whenever
Use the SHA256_hash method when [creating user accounts](/sql-reference/statements/create/user.md) to secure passwords. ClickHouse database passwords must contain a minimum of 12 characters and meet complexity requirements: upper case characters, lower case characters, numbers and/or special characters.
:::tip Generate passwords securely
-Since users with less than administrative privileges cannot set their own password, ask the user to hash their password using a generator
+Since users without administrative privileges can't set their own password, ask the user to hash their password using a generator
such as [this one](https://tools.keycdn.com/sha256-online-generator) before providing it to the admin to setup the account.
:::
@@ -84,7 +84,7 @@ Configure the following within the services and databases using the SQL [GRANT](
| Default | Full administrative access to services |
| Custom | Configure using the SQL [`GRANT`](/sql-reference/statements/grant) statement |
-- Database roles are additive. This means if a user is a member of two roles, the user has the most access granted to the two roles. They do not lose access by adding roles.
+- Database roles are additive. This means if a user is a member of two roles, the user has the most access granted to the two roles. They don't lose access by adding roles.
- Database roles can be granted to other roles, resulting in a hierarchical structure. Roles inherit all permissions of the roles for which it is a member.
- Database roles are unique per service and may be applied across multiple databases within the same service.
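The additive behavior can be pictured as a set union over grants. The following is an illustrative Python sketch (the role and grant names are made up, and this isn't how ClickHouse stores grants internally):

```python
# Illustrative only: model each role's grants as a set of strings.
role_grants = {
    "read_only": {"SELECT ON analytics.*"},
    "ingest": {"INSERT ON analytics.events"},
}

def effective_grants(roles: list[str]) -> set[str]:
    # A user holding several roles gets the union of their grants;
    # adding a role never removes access.
    grants: set[str] = set()
    for role in roles:
        grants |= role_grants[role]
    return grants
```

A user granted both roles can both read and insert; granting `ingest` on top of `read_only` only ever widens access.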
@@ -93,7 +93,7 @@ The illustration below shows the different ways a user could be granted permissi
### Initial settings {#initial-settings}
-Databases have an account named `default` that is added automatically and granted the default_role upon service creation. The user that creates the service is presented with the automatically generated, random password that is assigned to the `default` account when the service is created. The password is not shown after initial setup, but may be changed by any user with Service Admin permissions in the console at a later time. This account or an account with Service Admin privileges within the console may set up additional database users and roles at any time.
+Databases have an account named `default` that is added automatically and granted the default_role upon service creation. The user that creates the service is presented with the automatically generated, random password that is assigned to the `default` account when the service is created. The password isn't shown after initial setup, but may be changed by any user with Service Admin permissions in the console at a later time. This account or an account with Service Admin privileges within the console may set up additional database users and roles at any time.
:::note
To change the password assigned to the `default` account in the console, go to the Services menu on the left, access the service, go to the Settings tab and click the Reset password button.
@@ -106,7 +106,7 @@ We recommend creating a new user account associated with a person and granting t
GRANT default_role to userID;
```
-You can use a SHA256 hash generator or code function such as `hashlib` in Python to convert a 12+ character password with appropriate complexity to a SHA256 string to provide to the system administrator as the password. This ensures the administrator does not see or handle clear text passwords.
+You can use a SHA256 hash generator or code function such as `hashlib` in Python to convert a 12+ character password with appropriate complexity to a SHA256 string to provide to the system administrator as the password. This ensures the administrator doesn't see or handle clear text passwords.
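That workflow can be done with a few lines of Python's standard `hashlib`. The helper names and the complexity check below are illustrative, not a ClickHouse API:

```python
import hashlib
import re

def meets_complexity(password: str) -> bool:
    # 12+ characters with upper case, lower case, and numbers
    # and/or special characters, per the requirements above.
    return (
        len(password) >= 12
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]|[^A-Za-z0-9]", password) is not None
    )

def sha256_hex(password: str) -> str:
    # Hex digest to hand to the admin for
    # CREATE USER ... IDENTIFIED WITH sha256_hash BY '<digest>'
    return hashlib.sha256(password.encode("utf-8")).hexdigest()
```

The user runs this locally and shares only the digest, so the administrator never sees or handles the clear-text password.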
### Database access listings with SQL console users {#database-access-listings-with-sql-console-users}
The following process can be used to generate a complete access listing across the SQL console and databases in your organization.
diff --git a/docs/cloud/guides/security/01_cloud_access_management/04_saml-sso-setup.md b/docs/cloud/guides/security/01_cloud_access_management/04_saml-sso-setup.md
index 7f27753d500..bafdfa5c2a8 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/04_saml-sso-setup.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/04_saml-sso-setup.md
@@ -24,7 +24,7 @@ import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge
ClickHouse Cloud supports single-sign on (SSO) via security assertion markup language (SAML). This enables you to sign in securely to your ClickHouse Cloud organization by authenticating with your identity provider (IdP).
-We currently support service provider-initiated SSO SSO, multiple organizations using separate connections, and just-in-time provisioning. We do not yet support a system for cross-domain identity management (SCIM) or attribute mapping.
+We currently support service provider-initiated SSO, multiple organizations using separate connections, and just-in-time provisioning. We don't yet support System for Cross-domain Identity Management (SCIM) or attribute mapping.
Customers enabling SAML integrations can also designate the default role that will be assigned to new users and adjust session timeout settings.
@@ -58,7 +58,7 @@ Create an application within your identity provider and copy the values on the `
- [Configure Duo SAML](#configure-duo-saml)
:::tip
-ClickHouse does not support identity provider initiated sign-in. To make it easy for your users to access ClickHouse Cloud, set up a bookmark for your users using this sign-in URL format: `https://console.clickhouse.cloud/?connection={orgId}` where the `{orgID}` is your organization ID on the Organization details page.
+ClickHouse doesn't support identity provider-initiated sign-in. To make it easy for your users to access ClickHouse Cloud, set up a bookmark for your users using this sign-in URL format: `https://console.clickhouse.cloud/?connection={orgId}`, where `{orgId}` is your organization ID from the Organization details page.
:::
@@ -106,7 +106,7 @@ Users configured with a different authentication method will be retained until a
To assign your first admin user via SAML:
1. Log out of [ClickHouse Cloud](https://console.clickhouse.cloud).
2. In your identity provider, assign the admin user to the ClickHouse application(s).
-3. Ask the user to log in via https://console.clickhouse.cloud/?connection={orgId} (shortcut URL). This may be via a bookmark you created in the prior steps. The user will not appear in ClickHouse Cloud until their first login.
+3. Ask the user to log in via https://console.clickhouse.cloud/?connection={orgId} (shortcut URL). This may be via a bookmark you created in the prior steps. The user won't appear in ClickHouse Cloud until their first login.
4. If the default SAML role is anything other than Admin, the user may need to log out and log back in with their original authentication method to update the new SAML user's role.
- For email + password accounts, please use `https://console.clickhouse.cloud/?with=email`.
- For social logins, please click the appropriate button (**Continue with Google** or **Continue with Microsoft**)
@@ -178,7 +178,7 @@ You will configure two App Integrations in Okta for each ClickHouse organization
3. Select SAML 2.0 and click Next.
4. Enter a name for your application and check the box next to **Do not display application icon to users**, then click **Next**.
5. Use the following values to populate the SAML settings screen.
@@ -352,14 +352,14 @@ Security is our top priority when it comes to authentication. For this reason, w
- **We only process service provider-initiated authentication flows.** Users must navigate to `https://console.clickhouse.cloud` and enter an email address to be redirected to your identity provider. Instructions to add a bookmark application or shortcut are provided for your convenience so your users don't need to remember the URL.
-- **We do not automatically link SSO and non-SSO accounts.** You may see multiple accounts for your users in your ClickHouse user list even if they are using the same email address.
+- **We don't automatically link SSO and non-SSO accounts.** You may see multiple accounts for your users in your ClickHouse user list even if they're using the same email address.
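As a small convenience for distributing those bookmarks, the per-organization sign-in link can be generated like this (an illustrative helper, not an official client):

```python
from urllib.parse import urlencode

def sso_login_url(org_id: str) -> str:
    # Service provider-initiated sign-in link: users land on the
    # console and are redirected to your identity provider.
    return "https://console.clickhouse.cloud/?" + urlencode({"connection": org_id})
```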
## Troubleshooting Common Issues {#troubleshooting-common-issues}
| Error | Cause | Solution |
|:------|:------|:---------|
| There could be a misconfiguration in the system or a service outage | Identity provider initiated login | To resolve this error try using the direct link `https://console.clickhouse.cloud/?connection={organizationid}`. Follow the instructions for your identity provider above to make this the default login method for your users |
-| You are directed to your identity provider, then back to the login page | The identity provider does not have the email attribute mapping | Follow the instructions for your identity provider above to configure the user email attribute and log in again |
-| User is not assigned to this application | The user has not been assigned to the ClickHouse application in the identity provider | Assign the user to the application in the identity provider and log in again |
-| You have multiple ClickHouse organizations integrated with SAML SSO and you are always logged into the same organization, regardless of which link or tile you use | You are still logged in to the first organization | Log out, then log in to the other organization |
-| The URL briefly shows `access denied` | Your email domain does not match the domain we have configured | Reach out to support for assistance resolving this error |
+| You're directed to your identity provider, then back to the login page | The identity provider doesn't have the email attribute mapping | Follow the instructions for your identity provider above to configure the user email attribute and log in again |
+| User is not assigned to this application | The user hasn't been assigned to the ClickHouse application in the identity provider | Assign the user to the application in the identity provider and log in again |
+| You have multiple ClickHouse organizations integrated with SAML SSO and you're always logged into the same organization, regardless of which link or tile you use | You're still logged in to the first organization | Log out, then log in to the other organization |
+| The URL briefly shows `access denied` | Your email domain doesn't match the domain we have configured | Reach out to support for assistance resolving this error |
diff --git a/docs/cloud/guides/security/01_cloud_access_management/05_saml-sso-removal.md b/docs/cloud/guides/security/01_cloud_access_management/05_saml-sso-removal.md
index 18243609746..fb07601fd8e 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/05_saml-sso-removal.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/05_saml-sso-removal.md
@@ -12,7 +12,7 @@ keywords: ['ClickHouse Cloud', 'SAML', 'SSO', 'single sign-on', 'IdP']
Customers may need to remove a SAML integration from an organization for reasons such as changing an identity provider. SAML users are separate identities from other user types. Follow the instructions below to switch to another authentication method.
:::warning
-This action cannot be undone. Removing a SAML integration will invalidate SAML users such that they cannot be recovered. Follow the instructions below carefully to ensure you retain access to the organization.
+This action can't be undone. Removing a SAML integration will invalidate SAML users such that they can't be recovered. Follow the instructions below carefully to ensure you retain access to the organization.
:::
## Before you begin {#before-you-begin}
@@ -46,7 +46,7 @@ Click the organization name on the bottom left, then select `Users and Roles`. F
Users should be fully logged out from any SAML connections before accepting the invitation. When accepting the invitation with Google or Microsoft social login, users should click the `Continue with Google` or `Continue with Microsoft` buttons. Users using email and password should go to https://console.clickhouse.cloud/?with=email to log in and accept the invitation.
:::note
-The best route to ensure users are not automatically redirected based on SAML configurations is to copy the link to accept the invitation and paste into a separate browser or private browsing/incognito session to accept the invitation.
+The best way to ensure users aren't automatically redirected based on SAML configurations is to copy the invitation link and paste it into a separate browser or a private browsing/incognito session.
:::
### Save queries and dashboards {#save-queries-and-dashboards}
diff --git a/docs/cloud/guides/security/01_cloud_access_management/06_common-access-management-queries.md b/docs/cloud/guides/security/01_cloud_access_management/06_common-access-management-queries.md
index 26ff32401e6..08de7966af2 100644
--- a/docs/cloud/guides/security/01_cloud_access_management/06_common-access-management-queries.md
+++ b/docs/cloud/guides/security/01_cloud_access_management/06_common-access-management-queries.md
@@ -12,7 +12,7 @@ import CommonUserRolesContent from '@site/docs/_snippets/_users-and-roles-common
# Common access management queries
:::tip Self-managed
-If you are working with self-managed ClickHouse please see [SQL users and roles](/guides/sre/user-management/index.md).
+If you're working with self-managed ClickHouse, please see [SQL users and roles](/guides/sre/user-management/index.md).
:::
This article shows the basics of defining SQL users and roles and applying those privileges and permissions to databases, tables, rows, and columns.
@@ -33,7 +33,7 @@ GRANT default_role TO clickhouse_admin;
```
:::note
-When using the SQL Console, your SQL statements will not be run as the `default` user. Instead, statements will be run as a user named `sql-console:${cloud_login_email}`, where `cloud_login_email` is the email of the user currently running the query.
+When using the SQL Console, your SQL statements won't be run as the `default` user. Instead, statements will be run as a user named `sql-console:${cloud_login_email}`, where `cloud_login_email` is the email of the user currently running the query.
These automatically generated SQL Console users have the `default` role.
:::
diff --git a/docs/cloud/guides/security/02_connectivity/01_setting-ip-filters.md b/docs/cloud/guides/security/02_connectivity/01_setting-ip-filters.md
index febb905b9fc..41290867d40 100644
--- a/docs/cloud/guides/security/02_connectivity/01_setting-ip-filters.md
+++ b/docs/cloud/guides/security/02_connectivity/01_setting-ip-filters.md
@@ -81,7 +81,7 @@ This screenshot shows an access list which allows traffic from a range of IP add
4. Switch to allow access from **Anywhere**
- This is not recommended, but it is allowed. We recommend that you expose an application built on top of ClickHouse to the public and restrict access to the back-end ClickHouse Cloud service.
+ This isn't recommended, but it is allowed. Instead, we recommend exposing an application built on top of ClickHouse to the public and restricting access to the back-end ClickHouse Cloud service.
To apply the changes you made, you must click **Save**.
diff --git a/docs/cloud/guides/security/02_connectivity/private_networking/02_aws-privatelink.md b/docs/cloud/guides/security/02_connectivity/private_networking/02_aws-privatelink.md
index 0745308e5d8..28bf2756c46 100644
--- a/docs/cloud/guides/security/02_connectivity/private_networking/02_aws-privatelink.md
+++ b/docs/cloud/guides/security/02_connectivity/private_networking/02_aws-privatelink.md
@@ -69,7 +69,7 @@ Pricing considerations: AWS will charge users for cross region data transfer, se
Find Terraform examples [here](https://github.com/ClickHouse/terraform-provider-clickhouse/tree/main/examples/).
## Important considerations {#considerations}
-ClickHouse attempts to group your services to reuse the same published [service endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-overview) within the AWS region. However, this grouping is not guaranteed, especially if you spread your services across multiple ClickHouse organizations.
+ClickHouse attempts to group your services to reuse the same published [service endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-overview) within the AWS region. However, this grouping isn't guaranteed, especially if you spread your services across multiple ClickHouse organizations.
If you already have PrivateLink configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: Add ClickHouse "Endpoint ID" to ClickHouse service allow list.
## Prerequisites for this process {#prerequisites}
@@ -138,7 +138,7 @@ Make a note of the `endpointServiceId` and `privateDnsHostname` [move onto next
:::important
This section covers ClickHouse-specific details for configuring ClickHouse via AWS PrivateLink. AWS-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the AWS cloud provider. Please consider AWS configuration based on your specific use case.
-Please note that ClickHouse is not responsible for configuring the required AWS VPC endpoints, security group rules or DNS records.
+Please note that ClickHouse isn't responsible for configuring the required AWS VPC endpoints, security group rules or DNS records.
If you previously enabled "private DNS names" while setting up PrivateLink and are experiencing difficulties configuring new services via PrivateLink, please contact ClickHouse support. For any other issues related to AWS configuration tasks, contact AWS Support directly.
:::
@@ -153,7 +153,7 @@ Select **Endpoint services that use NLBs and GWLBs** and use `Service name`
If you want to establish a cross-regional connection via PrivateLink, enable the "Cross region endpoint" checkbox and specify the service region. The service region is where the ClickHouse instance is running.
If you get a "Service name could not be verified." error, please contact Customer Support to request adding new regions to the supported regions list.
Next, select your VPC and subnets:
@@ -353,7 +353,7 @@ Please refer [here](#considerations)
### Connection reset by peer {#connection-reset-by-peer}
-- Most likely Endpoint ID was not added to service allow list, please visit [step](#add-endpoint-id-to-services-allow-list)
+- Most likely, the Endpoint ID wasn't added to the service allow list. Revisit the [step](#add-endpoint-id-to-services-allow-list).
### Checking endpoint filters {#checking-endpoint-filters}
@@ -377,7 +377,7 @@ jq .result.privateEndpointIds
### Connecting to a remote database {#connecting-to-a-remote-database}
-Let's say you are trying to use [MySQL](/sql-reference/table-functions/mysql) or [PostgreSQL](/sql-reference/table-functions/postgresql) table functions in ClickHouse Cloud and connect to your database hosted in an Amazon Web Services (AWS) VPC. AWS PrivateLink cannot be used to enable this connection securely. PrivateLink is a one-way, unidirectional connection. It allows your internal network or Amazon VPC to connect securely to ClickHouse Cloud, but it does not allow ClickHouse Cloud to connect to your internal network.
+Let's say you're trying to use [MySQL](/sql-reference/table-functions/mysql) or [PostgreSQL](/sql-reference/table-functions/postgresql) table functions in ClickHouse Cloud and connect to your database hosted in an Amazon Web Services (AWS) VPC. AWS PrivateLink can't be used to enable this connection securely. PrivateLink is a one-way, unidirectional connection. It allows your internal network or Amazon VPC to connect securely to ClickHouse Cloud, but it doesn't allow ClickHouse Cloud to connect to your internal network.
According to the [AWS PrivateLink documentation](https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/aws-privatelink.html):
diff --git a/docs/cloud/guides/security/02_connectivity/private_networking/03_gcp-private-service-connect.md b/docs/cloud/guides/security/02_connectivity/private_networking/03_gcp-private-service-connect.md
index 80fe1c73e4a..e7f5a52621e 100644
--- a/docs/cloud/guides/security/02_connectivity/private_networking/03_gcp-private-service-connect.md
+++ b/docs/cloud/guides/security/02_connectivity/private_networking/03_gcp-private-service-connect.md
@@ -32,16 +32,16 @@ Service producers publish their applications to consumers by creating Private Se
:::important
-By default, a ClickHouse service is not available over a Private Service connection even if the PSC connection is approved and established; you need explicitly add the PSC ID to the allow list on an instance level by completing [step](#add-endpoint-id-to-services-allow-list) below.
+By default, a ClickHouse service isn't available over a Private Service Connect connection even if the PSC connection is approved and established; you need to explicitly add the PSC ID to the allow list at the instance level by completing the [step](#add-endpoint-id-to-services-allow-list) below.
:::
**Important considerations for using Private Service Connect Global Access**:
1. Regions utilizing Global Access must belong to the same VPC.
1. Global Access must be explicitly enabled at the PSC level (refer to the screenshot below).
-1. Ensure that your firewall settings do not block access to PSC from other regions.
+1. Ensure that your firewall settings don't block access to PSC from other regions.
1. Be aware that you may incur GCP inter-region data transfer charges.
-Cross-region connectivity is not supported. The producer and consumer regions must be the same. However, you can connect from other regions within your VPC by enabling [Global Access](https://cloud.google.com/vpc/docs/about-accessing-vpc-hosted-services-endpoints#global-access) at the Private Service Connect (PSC) level.
+Cross-region connectivity isn't supported. The producer and consumer regions must be the same. However, you can connect from other regions within your VPC by enabling [Global Access](https://cloud.google.com/vpc/docs/about-accessing-vpc-hosted-services-endpoints#global-access) at the Private Service Connect (PSC) level.
**Please complete the following to enable GCP PSC**:
1. Obtain GCP service attachment for Private Service Connect.
@@ -50,7 +50,7 @@ Cross-region connectivity is not supported. The producer and consumer regions mu
1. Add "Endpoint ID" to ClickHouse service allow list.
## Attention {#attention}
-ClickHouse attempts to group your services to reuse the same published [PSC endpoint](https://cloud.google.com/vpc/docs/private-service-connect) within the GCP region. However, this grouping is not guaranteed, especially if you spread your services across multiple ClickHouse organizations.
+ClickHouse attempts to group your services to reuse the same published [PSC endpoint](https://cloud.google.com/vpc/docs/private-service-connect) within the GCP region. However, this grouping isn't guaranteed, especially if you spread your services across multiple ClickHouse organizations.
If you already have PSC configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: [Add "Endpoint ID" to ClickHouse service allow list](#add-endpoint-id-to-services-allow-list).
Find Terraform examples [here](https://github.com/ClickHouse/terraform-provider-clickhouse/tree/main/examples/).
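That final allow-list step goes through the ClickHouse Cloud API. Below is a hedged sketch of assembling the request; the body shape and the `PATCH /v1/organizations/{orgId}/services/{serviceId}` path are assumptions drawn from these docs, so verify them against the current API reference before use:

```python
import base64
import json

API_BASE = "https://api.clickhouse.cloud"  # ClickHouse Cloud API host

def basic_auth_header(key_id: str, key_secret: str) -> str:
    # Cloud API keys authenticate with HTTP Basic auth.
    token = base64.b64encode(f"{key_id}:{key_secret}".encode()).decode()
    return f"Basic {token}"

def allow_list_patch_body(endpoint_id: str) -> str:
    # Assumed request body for adding one Endpoint ID to a
    # service's private endpoint allow list.
    return json.dumps({"privateEndpointIds": {"add": [endpoint_id]}})
```

Send the body with your HTTP client of choice; removing an Endpoint ID would presumably use a `"remove"` list in the same shape.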
@@ -65,7 +65,7 @@ Code examples are provided below to show how to set up Private Service Connect w
- GCP VPC in customer GCP project: `default`
:::
-You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud console or the ClickHouse API. If you are going to use the ClickHouse API, please set the following environment variables before proceeding:
+You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud console or the ClickHouse API. If you're going to use the ClickHouse API, please set the following environment variables before proceeding:
```shell
REGION=
@@ -122,7 +122,7 @@ Make a note of the `endpointServiceId` and `privateDnsHostname`. You'll use them
:::important
This section covers ClickHouse-specific details for configuring ClickHouse via GCP PSC (Private Service Connect). GCP-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the GCP cloud provider. Please adapt the GCP configuration to your specific use case.
-Please note that ClickHouse is not responsible for configuring the required GCP PSC endpoints, DNS records.
+Please note that ClickHouse isn't responsible for configuring the required GCP PSC endpoints and DNS records.
For any issues related to GCP configuration tasks, contact GCP Support directly.
:::
@@ -155,7 +155,7 @@ The **Status** column will change from **Pending** to **Accepted** once the conn
-Copy ***PSC Connection ID***, we are going to use it as ***Endpoint ID*** in the next steps.
+Copy the ***PSC Connection ID***; we're going to use it as the ***Endpoint ID*** in the next steps.
#### Option 2: Using Terraform {#option-2-using-terraform}
@@ -375,7 +375,7 @@ Address: 10.128.0.2
### Connection reset by peer {#connection-reset-by-peer}
-- Most likely, the Endpoint ID was not added to the service allow-list. Revisit the [_Add endpoint ID to services allow-list_ step](#add-endpoint-id-to-services-allow-list).
+- Most likely, the Endpoint ID wasn't added to the service allow-list. Revisit the [_Add endpoint ID to services allow-list_ step](#add-endpoint-id-to-services-allow-list).
### Test connectivity {#test-connectivity}
@@ -423,7 +423,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X GET -H "Content-Type: appl
### Connecting to a remote database {#connecting-to-a-remote-database}
-Let's say you are trying to use the [MySQL](/sql-reference/table-functions/mysql) or [PostgreSQL](/sql-reference/table-functions/postgresql) table functions in ClickHouse Cloud and connect to your database hosted in GCP. GCP PSC cannot be used to enable this connection securely. PSC is a one-way, unidirectional connection. It allows your internal network or GCP VPC to connect securely to ClickHouse Cloud, but it does not allow ClickHouse Cloud to connect to your internal network.
+Let's say you're trying to use the [MySQL](/sql-reference/table-functions/mysql) or [PostgreSQL](/sql-reference/table-functions/postgresql) table functions in ClickHouse Cloud and connect to your database hosted in GCP. GCP PSC can't be used to enable this connection securely. PSC is a one-way connection. It allows your internal network or GCP VPC to connect securely to ClickHouse Cloud, but it doesn't allow ClickHouse Cloud to connect to your internal network.
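For illustration, such an outbound query from ClickHouse Cloud might look like the following sketch (host, database, table, and credentials are placeholder values) - this is the direction of traffic that PSC can't secure:

```sql
SELECT count(*)
FROM postgresql('<your-host>:5432', '<database>', '<table>', '<user>', '<password>');
```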
According to the [GCP Private Service Connect documentation](https://cloud.google.com/vpc/docs/private-service-connect):
diff --git a/docs/cloud/guides/security/02_connectivity/private_networking/04_azure-privatelink.md b/docs/cloud/guides/security/02_connectivity/private_networking/04_azure-privatelink.md
index 8ac50cb28f6..65b91fb0ce2 100644
--- a/docs/cloud/guides/security/02_connectivity/private_networking/04_azure-privatelink.md
+++ b/docs/cloud/guides/security/02_connectivity/private_networking/04_azure-privatelink.md
@@ -54,7 +54,7 @@ ClickHouse Cloud Azure PrivateLink has switched from using resourceGUID to Resou
:::
## Attention {#attention}
-ClickHouse attempts to group your services to reuse the same published [Private Link service](https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview) within the Azure region. However, this grouping is not guaranteed, especially if you spread your services across multiple ClickHouse organizations.
+ClickHouse attempts to group your services to reuse the same published [Private Link service](https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview) within the Azure region. However, this grouping isn't guaranteed, especially if you spread your services across multiple ClickHouse organizations.
If you already have Private Link configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: [Add the Private Endpoint Resource ID to your service(s) allow list](#add-private-endpoint-id-to-services-allow-list).
Find Terraform examples at the ClickHouse [Terraform Provider repository](https://github.com/ClickHouse/terraform-provider-clickhouse/tree/main/examples/).
@@ -109,7 +109,7 @@ Make a note of the `endpointServiceId`. You'll use it in the next step.
:::important
This section covers ClickHouse-specific details for configuring ClickHouse via Azure Private Link. Azure-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the Azure cloud provider. Please adapt the Azure configuration to your specific use case.
-Please note that ClickHouse is not responsible for configuring the required Azure private endpoints and DNS records.
+Please note that ClickHouse isn't responsible for configuring the required Azure private endpoints and DNS records.
For any issues related to Azure configuration tasks, contact Azure Support directly.
:::
@@ -371,7 +371,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X PATCH -H "Content-Type: ap
## Add the Private Endpoint Resource ID to your service(s) allow list {#add-private-endpoint-id-to-services-allow-list}
-By default, a ClickHouse Cloud service is not available over a Private Link connection even if the Private Link connection is approved and established. You need to explicitly add the Private Endpoint Resource ID for each service that should be available using Private Link.
+By default, a ClickHouse Cloud service isn't available over a Private Link connection even if the Private Link connection is approved and established. You need to explicitly add the Private Endpoint Resource ID for each service that should be available using Private Link.
### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-2}
@@ -495,11 +495,11 @@ Address: 10.0.0.4
### Connection reset by peer {#connection-reset-by-peer}
-Most likely, the Private Endpoint Resource ID was not added to the service allow-list. Revisit the [_Add Private Endpoint Resource ID to your services allow-list_ step](#add-private-endpoint-id-to-services-allow-list).
+Most likely, the Private Endpoint Resource ID wasn't added to the service allow-list. Revisit the [_Add Private Endpoint Resource ID to your services allow-list_ step](#add-private-endpoint-id-to-services-allow-list).
### Private Endpoint is in pending state {#private-endpoint-is-in-pending-state}
-Most likely, the Private Endpoint Resource ID was not added to the service allow-list. Revisit the [_Add Private Endpoint Resource ID to your services allow-list_ step](#add-private-endpoint-id-to-services-allow-list).
+Most likely, the Private Endpoint Resource ID wasn't added to the service allow-list. Revisit the [_Add Private Endpoint Resource ID to your services allow-list_ step](#add-private-endpoint-id-to-services-allow-list).
### Test connectivity {#test-connectivity}
diff --git a/docs/cloud/guides/security/03_data-masking.md b/docs/cloud/guides/security/03_data-masking.md
index f4e418cee33..4d6ef94b7d9 100644
--- a/docs/cloud/guides/security/03_data-masking.md
+++ b/docs/cloud/guides/security/03_data-masking.md
@@ -77,7 +77,7 @@ In the query above `\3` is used to substitute the third capture group into the r
## Create masked `VIEW`s {#masked-views}
-A [`VIEW`](/sql-reference/statements/create/view) can be used in conjunction with the aforementioned string functions to apply transformations to columns containing sensitive data, before they are presented to the user.
+A [`VIEW`](/sql-reference/statements/create/view) can be used in conjunction with the aforementioned string functions to apply transformations to columns containing sensitive data, before they're presented to the user.
In this way, the original data remains unchanged, and users querying the view see only the masked data.
To demonstrate, let's imagine that we have a table which stores records of customer orders.
@@ -156,7 +156,7 @@ Next grant `SELECT` privileges on the view to the role:
GRANT SELECT ON masked_orders TO masked_orders_viewer;
```
-Because ClickHouse roles are additive, you must ensure that users who should only see the masked view do not have any `SELECT` privilege on the base table via any role.
+Because ClickHouse roles are additive, you must ensure that users who should only see the masked view don't have any `SELECT` privilege on the base table via any role.
As such, you should explicitly revoke base-table access to be safe:
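For instance, assuming the base table is `orders` and the role is `masked_orders_viewer` as in this guide, the revoke might look like:

```sql
REVOKE SELECT ON orders FROM masked_orders_viewer;
```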
@@ -264,7 +264,7 @@ GRANT masked_orders_viewer TO your_user;
In the case where you want to store only the masked data in the `orders` table,
you can mark the sensitive unmasked columns as [`EPHEMERAL`](/sql-reference/statements/create/table#ephemeral),
-which will ensure that columns of this type are not stored in the table.
+which will ensure that columns of this type aren't stored in the table.
```sql
DROP TABLE IF EXISTS orders;
@@ -320,10 +320,10 @@ ORDER BY user_id ASC
For users of ClickHouse OSS wishing to mask log data specifically, you can make use of [query masking rules](/operations/server-configuration-parameters/settings#query_masking_rules) (log masking) to mask data.
To do so, you can define regular expression-based masking rules in the server configuration.
-These rules are applied to queries and all log messages before they are stored in server logs or system tables (such as `system.query_log`, `system.text_log`, and `system.processes`).
+These rules are applied to queries and all log messages before they're stored in server logs or system tables (such as `system.query_log`, `system.text_log`, and `system.processes`).
This helps prevent sensitive data from leaking into **logs** only.
-Note that it does not mask data in query results.
+Note that it doesn't mask data in query results.
For example, to mask a social security number, you could add the following rule to your [server configuration](/operations/configuration-files):
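A minimal sketch of such a rule follows; the rule name, regular expression, and replacement string are illustrative and should be adapted to your data:

```xml
<clickhouse>
    <query_masking_rules>
        <rule>
            <!-- Matches US social security numbers such as 123-45-6789 -->
            <name>hide SSN</name>
            <regexp>\b\d{3}-\d{2}-\d{4}\b</regexp>
            <replace>***MASKED***</replace>
        </rule>
    </query_masking_rules>
</clickhouse>
```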
diff --git a/docs/cloud/guides/security/04_cmek.md b/docs/cloud/guides/security/04_cmek.md
index 50002d7b164..ce6fb0dc270 100644
--- a/docs/cloud/guides/security/04_cmek.md
+++ b/docs/cloud/guides/security/04_cmek.md
@@ -30,7 +30,7 @@ Enhanced encryption is currently available in AWS and GCP services. Azure is com
### Transparent Data Encryption (TDE) {#transparent-data-encryption-tde}
-TDE must be enabled on service creation. Existing services cannot be encrypted after creation. Once TDE is enabled, it cannot be disabled. All data in the service will remain encrypted. If you want to disable TDE after it has been enabled, you must create a new service and migrate your data there.
+TDE must be enabled on service creation. Existing services can't be encrypted after creation. Once TDE is enabled, it can't be disabled. All data in the service will remain encrypted. If you want to disable TDE after it has been enabled, you must create a new service and migrate your data there.
1. Select `Create new service`
2. Name the service
diff --git a/docs/cloud/guides/security/05_audit_logging/02_database-audit-log.md b/docs/cloud/guides/security/05_audit_logging/02_database-audit-log.md
index c53fa881b64..fba8e753fe4 100644
--- a/docs/cloud/guides/security/05_audit_logging/02_database-audit-log.md
+++ b/docs/cloud/guides/security/05_audit_logging/02_database-audit-log.md
@@ -49,7 +49,7 @@ WHERE user='compromised_account'
## Retaining log data within services {#reatining-log-data-within-services}
-Customers needing longer retention or log durability can use materialized views to achieve these objectives. For more information on materialized views, what they are, benefits and how to implement review our [materialized views](/materialized-views) videos and documentation.
+Customers needing longer retention or log durability can use materialized views to achieve these objectives. For more information on materialized views, what they are, their benefits, and how to implement them, review our [materialized views](/materialized-views) videos and documentation.
## Exporting logs {#exporting-logs}
diff --git a/docs/cloud/guides/security/05_cmek_migration.md b/docs/cloud/guides/security/05_cmek_migration.md
index 629a59972a1..ccfaa07e0a3 100644
--- a/docs/cloud/guides/security/05_cmek_migration.md
+++ b/docs/cloud/guides/security/05_cmek_migration.md
@@ -7,7 +7,7 @@ doc_type: 'guide'
keywords: ['ClickHouse Cloud', 'encryption', 'CMEK']
---
-We are improving the security of customer managed encryption keys (CMEK) services. All services are now configured with a unique AWS role per service to authorize using customer keys to encrypt and decrypt services. This new role is only shown in the service configuration screen.
+We're improving the security of customer managed encryption keys (CMEK) services. All services are now configured with a unique AWS role per service to authorize using customer keys to encrypt and decrypt services. This new role is only shown in the service configuration screen.
OpenAPI and Terraform are both supported for this new process. For more information, check out our docs ([Enhanced Encryption](/docs/cloud/security/cmek), [Cloud API](/docs/cloud/manage/api/api-overview), [Official Terraform Provider](https://registry.terraform.io/providers/ClickHouse/clickhouse/latest/docs)).
diff --git a/docs/cloud/managed-postgres/connection.md b/docs/cloud/managed-postgres/connection.md
index e596fe22f9d..e49169e7fb8 100644
--- a/docs/cloud/managed-postgres/connection.md
+++ b/docs/cloud/managed-postgres/connection.md
@@ -61,7 +61,7 @@ To use connection pooling, click the **via PgBouncer** toggle at the top of the
:::tip When to use PgBouncer
Use PgBouncer when your application opens many short-lived connections. For long-running connections or applications that use PostgreSQL features incompatible with connection pooling (like prepared statements across transactions), connect directly.
-Moving data to ClickHouse using ClickPipes is not supported via PgBouncer.
+Moving data to ClickHouse using ClickPipes isn't supported via PgBouncer.
:::
## TLS configuration {#tls}
@@ -82,7 +82,7 @@ For production workloads, we recommend connecting with verified TLS to ensure yo
-The CA certificate is unique to your Managed Postgres instance and will not work with other instances.
+The CA certificate is unique to your Managed Postgres instance and won't work with other instances.
To connect with a verified TLS connection, add `sslmode=verify-full` and the path to your downloaded certificate:
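As a sketch, a verified-TLS connection string might look like the following; the hostname, user, and certificate path are placeholders for your own instance's values:

```shell
psql "host=<your-instance-hostname> port=5432 dbname=postgres user=<user> sslmode=verify-full sslrootcert=/path/to/ca-certificate.pem"
```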
diff --git a/docs/cloud/managed-postgres/faq.md b/docs/cloud/managed-postgres/faq.md
index 6ab4a6def9a..db28d44b45b 100644
--- a/docs/cloud/managed-postgres/faq.md
+++ b/docs/cloud/managed-postgres/faq.md
@@ -33,7 +33,7 @@ For complete details on backup frequency, retention, and how to perform point-in
### Is Terraform support available for Managed Postgres? {#terraform-support}
-Terraform support for Managed Postgres is not currently available. We recommend using the ClickHouse Cloud console to create and manage your instances.
+Terraform support for Managed Postgres isn't currently available. We recommend using the ClickHouse Cloud console to create and manage your instances.
## Extensions and configuration {#extensions-and-configuration}
diff --git a/docs/cloud/managed-postgres/high-availability.md b/docs/cloud/managed-postgres/high-availability.md
index ff386ca6f87..b82af9afbbf 100644
--- a/docs/cloud/managed-postgres/high-availability.md
+++ b/docs/cloud/managed-postgres/high-availability.md
@@ -39,21 +39,21 @@ With this option, only a primary node is provisioned in your selected size. No s
Standbys and read replicas serve different purposes in Managed Postgres and are configured separately.
-**Standbys** are dedicated exclusively to high availability and automatic failover. They replicate data from the primary using streaming replication and are always ready to be promoted if the primary fails. Standbys are not exposed for read queries.
+**Standbys** are dedicated exclusively to high availability and automatic failover. They replicate data from the primary using streaming replication and are always ready to be promoted if the primary fails. Standbys aren't exposed for read queries.
**Read replicas** are designed for read scaling. They pull WAL (Write-Ahead Log) data from object storage and run in a separate network environment with their own connection endpoint. Read replicas allow you to offload read traffic from your primary without impacting HA guarantees.
### Why standbys don't serve read queries {#why-standbys-dont-serve-read-queries}
-While some database providers expose hot standbys for read-only queries, Managed Postgres intentionally does not. Allowing read queries on standbys can compromise their primary purpose: being ready to take over instantly when the primary fails.
+While some database providers expose hot standbys for read-only queries, Managed Postgres intentionally doesn't. Allowing read queries on standbys can compromise their primary purpose: being ready to take over instantly when the primary fails.
There are two main concerns:
1. **WAL replay competition**: Under write-heavy workloads, read queries on a standby compete with WAL replay for system resources. This competition can cause high replication lag, meaning the standby falls behind the primary. If a failover occurs while the standby is lagging, it won't have the most recent data and may not be ready to take over cleanly.
-2. **VACUUM interference**: Long-running read queries on a standby can prevent `VACUUM` (and `AUTOVACUUM`) from cleaning up dead tuples on the primary. PostgreSQL cannot remove rows that an active query on any replica might still need to access. This can lead to table bloat and degraded performance over time.
+2. **VACUUM interference**: Long-running read queries on a standby can prevent `VACUUM` (and `AUTOVACUUM`) from cleaning up dead tuples on the primary. PostgreSQL can't remove rows that an active query on any replica might still need to access. This can lead to table bloat and degraded performance over time.
-By keeping standbys dedicated to failover, Managed Postgres ensures they are always synchronized and ready to take over with minimal data loss and downtime. For read scaling, use [read replicas](/cloud/managed-postgres/read-replicas) instead.
+By keeping standbys dedicated to failover, Managed Postgres ensures they're always synchronized and ready to take over with minimal data loss and downtime. For read scaling, use [read replicas](/cloud/managed-postgres/read-replicas) instead.
## Handling failures {#handling-failures}
diff --git a/docs/cloud/managed-postgres/migrations/logical-replication.md b/docs/cloud/managed-postgres/migrations/logical-replication.md
index b30ecd90d15..e8aaed633d2 100644
--- a/docs/cloud/managed-postgres/migrations/logical-replication.md
+++ b/docs/cloud/managed-postgres/migrations/logical-replication.md
@@ -64,13 +64,13 @@ Here:
- Replace ``, ``, ``, ``, and `` with your source database credentials.
- `-s` specifies that we want a schema-only dump.
- `--format directory` specifies that we want the dump in a directory format, which is suitable for `pg_restore`.
-- `-f rds-dump` specifies the output directory for the dump files. Note that this directory will be created automatically and should not exist beforehand.
+- `-f rds-dump` specifies the output directory for the dump files. Note that this directory will be created automatically and shouldn't exist beforehand.
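Assembled from the flags above, the full schema-only dump command might look like this sketch (connection details are placeholders for your source database credentials):

```shell
pg_dump \
  -h <host> -p <port> -U <user> -d <dbname> \
  -s \
  --format directory \
  -f rds-dump
```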
In our case, we have two tables - `events` and `users`. `events` has a million rows, and `users` has a thousand rows.
### Create a Managed Postgres instance {#migration-pgdump-pg-restore-create-pg}
-First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide [here](../quickstart#create-postgres-database). Here's what we are going to spin up for this guide:
+First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide [here](../quickstart#create-postgres-database). Here's what we're going to spin up for this guide:
## Restore the schema to ClickHouse Managed Postgres {#migration-logical-replication-restore-schema}
@@ -123,5 +123,5 @@ New rows inserted into the source database will now be replicated to the target
- Depending on your use case, you might want to set up monitoring and alerting for the replication process.
## Next steps {#migration-pgdump-pg-restore-next-steps}
-Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10 minute quickstart to get you going:
+Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using logical replication. You're now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10-minute quickstart to get you going:
- [Managed Postgres Quickstart Guide](../quickstart)
diff --git a/docs/cloud/managed-postgres/migrations/peerdb.md b/docs/cloud/managed-postgres/migrations/peerdb.md
index 3ad488eb87f..b628ae88226 100644
--- a/docs/cloud/managed-postgres/migrations/peerdb.md
+++ b/docs/cloud/managed-postgres/migrations/peerdb.md
@@ -29,8 +29,8 @@ This guide provides step-by-step instructions on how to migrate your PostgreSQL
## Considerations before migration {#migration-peerdb-considerations-before}
Before starting your migration, keep the following in mind:
-- **Database objects**: PeerDB will create tables automatically in the target database based on the source schema. However, certain database objects like indexes, constraints, and triggers will not be migrated automatically. You'll need to recreate these objects manually in the target database after the migration.
-- **DDL changes**: If you enable continuous replication, PeerDB will keep the target database in sync with the source for DML operations (INSERT, UPDATE, DELETE) and will propagate ADD COLUMN operations. However, other DDL changes (like DROP COLUMN, ALTER COLUMN) are not propagated automatically. More on schema changes support [here](/integrations/clickpipes/postgres/schema-changes)
+- **Database objects**: PeerDB will create tables automatically in the target database based on the source schema. However, certain database objects like indexes, constraints, and triggers won't be migrated automatically. You'll need to recreate these objects manually in the target database after the migration.
+- **DDL changes**: If you enable continuous replication, PeerDB will keep the target database in sync with the source for DML operations (INSERT, UPDATE, DELETE) and will propagate ADD COLUMN operations. However, other DDL changes (like DROP COLUMN, ALTER COLUMN) aren't propagated automatically. More on schema changes support [here](/integrations/clickpipes/postgres/schema-changes)
- **Network connectivity**: Ensure that both the source and target databases are reachable from the machine where PeerDB is running. You may need to configure firewall rules or security group settings to allow connectivity.
## Create peers {#migration-peerdb-create-peers}
@@ -74,12 +74,12 @@ If you click on the source peer, you can see a list of running commands which Pe
## Post-migration tasks {#migration-peerdb-considerations}
After the migration is complete:
-- **Recreate database objects**: Remember to manually recreate indexes, constraints, and triggers in the target database, as these are not migrated automatically.
+- **Recreate database objects**: Remember to manually recreate indexes, constraints, and triggers in the target database, as these aren't migrated automatically.
- **Test your application**: Make sure to test your application against the ClickHouse Managed Postgres instance to ensure everything is working as expected.
-- **Clean up resources**: Once you are satisfied with the migration and have switched your application to use ClickHouse Managed Postgres, you can delete the mirror and peers in PeerDB to clean up resources.
+- **Clean up resources**: Once you're satisfied with the migration and have switched your application to use ClickHouse Managed Postgres, you can delete the mirror and peers in PeerDB to clean up resources.
:::info Replication slots
-If you enabled continuous replication, PeerDB will create a **replication slot** on the source PostgreSQL database. Make sure to drop the replication slot manually from the source database after you are done with the migration to avoid unnecessary resource usage.
+If you enabled continuous replication, PeerDB will create a **replication slot** on the source PostgreSQL database. Make sure to drop the replication slot manually from the source database after you're done with the migration to avoid unnecessary resource usage.
:::
## References {#migration-peerdb-references}
@@ -88,5 +88,5 @@ If you enabled continuous replication, PeerDB will create a **replication slot**
- [Postgres ClickPipe FAQ (holds true for PeerDB as well)](../../../integrations/data-ingestion/clickpipes/postgres/faq.md)
## Next steps {#migration-pgdump-pg-restore-next-steps}
-Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10 minute quickstart to get you going:
+Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using PeerDB. You're now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10-minute quickstart to get you going:
- [Managed Postgres Quickstart Guide](../quickstart)
diff --git a/docs/cloud/managed-postgres/migrations/pg_dump-pg_restore.md b/docs/cloud/managed-postgres/migrations/pg_dump-pg_restore.md
index f603a56ef1f..6845614c635 100644
--- a/docs/cloud/managed-postgres/migrations/pg_dump-pg_restore.md
+++ b/docs/cloud/managed-postgres/migrations/pg_dump-pg_restore.md
@@ -45,7 +45,7 @@ pg_dump \
Here:
- Replace ``, ``, ``, ``, and `` with your source database credentials. Most Postgres providers give you a connection string that you can use directly.
- `--format directory` specifies that we want the dump in a directory format, which is suitable for `pg_restore`.
-- `-f rds-dump` specifies the output directory for the dump files. Note that this directory will be created automatically and should not exist beforehand.
+- `-f rds-dump` specifies the output directory for the dump files. Note that this directory will be created automatically and shouldn't exist beforehand.
- You can also parallelize the dump process by adding the `--jobs` flag followed by the number of parallel jobs you want to run. For more details, refer to the [pg_dump documentation](https://www.postgresql.org/docs/current/app-pgdump.html).
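Combining the flags above, a parallel dump might look like this sketch (connection details are placeholders, and `--jobs 4` is an illustrative degree of parallelism):

```shell
pg_dump \
  -h <host> -p <port> -U <user> -d <dbname> \
  --format directory \
  --jobs 4 \
  -f rds-dump
```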
:::tip
@@ -59,7 +59,7 @@ Here's what running this command looks like:
Now that we have the dump file, we can restore it to our ClickHouse Managed Postgres instance using `pg_restore`.
### Create a Managed Postgres instance {#migration-pgdump-pg-restore-create-pg}
-First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide [here](../quickstart#create-postgres-database). Here's what we are going to spin up for this guide:
+First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide [here](../quickstart#create-postgres-database). Here's what we're going to spin up for this guide:
### Restore the dump {#migration-pgdump-pg-restore-restore-dump}
@@ -90,9 +90,9 @@ We see that we have all our tables, indexes, views, and sequences intact, along
Using a pg_dump version older than the source server may lead to missing features or restore issues. Ideally, use the same or newer major version of pg_dump than the source database.
- Large databases may take a significant amount of time to dump and restore.
Plan accordingly to minimize downtime, and consider using parallel dumps/restores (--jobs) where supported.
-- Note that pg_dump / pg_restore do not replicate all database-related objects or runtime state.
+- Note that pg_dump / pg_restore don't replicate all database-related objects or runtime state.
These include roles and role memberships, replication slots, server-level configuration (e.g. postgresql.conf, pg_hba.conf), tablespaces, and runtime statistics.
## Next steps {#migration-pgdump-pg-restore-next-steps}
-Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10 minute quickstart to get you going:
+Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You're now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10-minute quickstart to get you going:
- [Managed Postgres Quickstart Guide](../quickstart)
diff --git a/docs/cloud/managed-postgres/security.md b/docs/cloud/managed-postgres/security.md
index ba7516584b9..1e7f88b03bb 100644
--- a/docs/cloud/managed-postgres/security.md
+++ b/docs/cloud/managed-postgres/security.md
@@ -61,7 +61,7 @@ Backups and Write-Ahead Log (WAL) archives stored in object storage are also enc
All backup data is stored in dedicated, isolated storage buckets with credentials scoped to each individual instance, ensuring that backup data remains secure and accessible only to authorized systems.
:::info
-Encryption at rest is enabled by default for all Managed Postgres instances and cannot be disabled. No additional configuration is required.
+Encryption at rest is enabled by default for all Managed Postgres instances and can't be disabled. No additional configuration is required.
:::
### Encryption in transit {#encryption-in-transit}
diff --git a/docs/cloud/onboard/01_discover/02_use_cases/03_data_warehousing.md b/docs/cloud/onboard/01_discover/02_use_cases/03_data_warehousing.md
index b2460585065..c90f8aa664d 100644
--- a/docs/cloud/onboard/01_discover/02_use_cases/03_data_warehousing.md
+++ b/docs/cloud/onboard/01_discover/02_use_cases/03_data_warehousing.md
@@ -15,7 +15,7 @@ import datalakehouse_01 from '@site/static/images/cloud/onboard/discover/use_cas
The data lakehouse is a convergent architecture that applies database principles
to data lake infrastructure while maintaining the flexibility and scale of cloud storage systems.
-The lakehouse is not just taking a database apart but building database-like
+The lakehouse isn't just taking a database apart but building database-like
capabilities onto a fundamentally different foundation (cloud object storage)
that focuses on supporting traditional analytics and modern AI/ML workloads in
a unified platform.
diff --git a/docs/cloud/onboard/01_discover/02_use_cases/04_machine_learning_and_genAI/01_machine_learning.md b/docs/cloud/onboard/01_discover/02_use_cases/04_machine_learning_and_genAI/01_machine_learning.md
index 5a0352debb8..c3f6a59291d 100644
--- a/docs/cloud/onboard/01_discover/02_use_cases/04_machine_learning_and_genAI/01_machine_learning.md
+++ b/docs/cloud/onboard/01_discover/02_use_cases/04_machine_learning_and_genAI/01_machine_learning.md
@@ -99,7 +99,7 @@ You can easily combine ClickHouse with data lakes, with built-in functions to qu
**Transformation engine** - SQL provides a natural means of declaring data transformations.
When extended with ClickHouse’s analytical and statistical functions, these transformations become succinct and optimized.
As well as applying to ClickHouse tables, in cases where ClickHouse is used as a data store, table functions allow SQL queries to be written against data stored in formats such as Parquet, on disk or in object storage, or even in other data stores such as Postgres and MySQL.
-A completely parallelization query execution engine, combined with a column-oriented storage format, allows ClickHouse to perform aggregations over PBs of data in seconds - unlike transformations on in memory data frames, users are not memory-bound.
+A fully parallelized query execution engine, combined with a column-oriented storage format, allows ClickHouse to perform aggregations over PBs of data in seconds - unlike transformations on in-memory data frames, users aren't memory-bound.
Furthermore, materialized views allow data to be transformed at insert time, thus shifting compute from query time to data load time.
These views can exploit the same range of analytical and statistical functions ideal for data analysis and summarization.
Should any of ClickHouse’s existing analytical functions be insufficient or custom libraries need to be integrated, you can also utilize User Defined Functions (UDFs).
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/01_overview.md b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/01_overview.md
index 9209765a154..e40f31edcaf 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/01_overview.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/01_overview.md
@@ -35,7 +35,7 @@ Real-time Change Data Capture (CDC) can be implemented in ClickHouse using [Clic
### Manual bulk load + periodic updates {#manual-bulk-load-periodic-updates}
-In some cases, a more straightforward approach like manual bulk loading followed by periodic updates may be sufficient. This strategy is ideal for one-time migrations or situations where real-time replication is not required. It involves loading data from PostgreSQL to ClickHouse in bulk, either through direct SQL `INSERT` commands or by exporting and importing CSV files. After the initial migration, you can periodically update the data in ClickHouse by syncing changes from PostgreSQL at regular intervals.
+In some cases, a more straightforward approach like manual bulk loading followed by periodic updates may be sufficient. This strategy is ideal for one-time migrations or situations where real-time replication isn't required. It involves loading data from PostgreSQL to ClickHouse in bulk, either through direct SQL `INSERT` commands or by exporting and importing CSV files. After the initial migration, you can periodically update the data in ClickHouse by syncing changes from PostgreSQL at regular intervals.
The bulk load process is simple and flexible but comes with the downside of no real-time updates. Once the initial data is in ClickHouse, updates won't be reflected immediately, so you must schedule periodic updates to sync the changes from PostgreSQL. This approach works well for less time-sensitive use cases, but it introduces a delay between when data changes in PostgreSQL and when those changes appear in ClickHouse.
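A minimal sketch of this strategy, assuming a hypothetical `orders` table with an `updated_at` column (host, credentials, and all names are placeholders), using the [postgresql table function](/sql-reference/table-functions/postgresql):

```sql
-- One-time bulk load from PostgreSQL into ClickHouse
INSERT INTO orders
SELECT *
FROM postgresql('postgres-host:5432', 'mydb', 'orders', 'pg_user', 'pg_password');

-- Periodic update: copy only rows changed since the last sync
INSERT INTO orders
SELECT *
FROM postgresql('postgres-host:5432', 'mydb', 'orders', 'pg_user', 'pg_password')
WHERE updated_at > (SELECT max(updated_at) FROM orders);
```

This incremental approach assumes the source table tracks modification times; deleted rows aren't captured.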
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/appendix.md b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/appendix.md
index 256f822f77a..0bd453f8e26 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/appendix.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/appendix.md
@@ -15,7 +15,7 @@ Users coming from OLTP systems who are used to ACID transactions should be aware
### Shards vs replicas {#shards-vs-replicas}
-Sharding and replication are two strategies used for scaling beyond one Postgres instance when storage and/or compute become a bottleneck to performance. Sharding in Postgres involves splitting a large database into smaller, more manageable pieces across multiple nodes. However, Postgres does not support sharding natively. Instead, sharding can be achieved using extensions such as [Citus](https://www.citusdata.com/), in which Postgres becomes a distributed database capable of scaling horizontally. This approach allows Postgres to handle higher transaction rates and larger datasets by spreading the load across several machines. Shards can be row or schema-based in order to provide flexibility for workload types, such as transactional or analytical. Sharding can introduce significant complexity in terms of data management and query execution as it requires coordination across multiple machines and consistency guarantees.
+Sharding and replication are two strategies used for scaling beyond one Postgres instance when storage and/or compute become a bottleneck to performance. Sharding in Postgres involves splitting a large database into smaller, more manageable pieces across multiple nodes. However, Postgres doesn't support sharding natively. Instead, sharding can be achieved using extensions such as [Citus](https://www.citusdata.com/), in which Postgres becomes a distributed database capable of scaling horizontally. This approach allows Postgres to handle higher transaction rates and larger datasets by spreading the load across several machines. Shards can be row or schema-based in order to provide flexibility for workload types, such as transactional or analytical. Sharding can introduce significant complexity in terms of data management and query execution as it requires coordination across multiple machines and consistency guarantees.
Unlike shards, replicas are additional Postgres instances that contain all or some of the data from the primary node. Replicas are used for various reasons, including enhanced read performance and HA (High Availability) scenarios. Physical replication is a native feature of Postgres that involves copying the entire database or significant portions to another server, including all databases, tables, and indexes. This involves streaming WAL segments from the primary node to replicas over TCP/IP. In contrast, logical replication is a higher level of abstraction that streams changes based on `INSERT`, `UPDATE`, and `DELETE` operations. Although the same outcomes may be achieved with physical replication, logical replication enables greater flexibility for targeting specific tables and operations, as well as for data transformations and supporting different Postgres versions.
@@ -31,7 +31,7 @@ In summary, a replica is a copy of data that provides redundancy and reliability
## Eventual consistency {#eventual-consistency}
-ClickHouse uses ClickHouse Keeper (C++ ZooKeeper implementation, ZooKeeper can also be used) for managing its internal replication mechanism, focusing primarily on metadata storage and ensuring eventual consistency. Keeper is used to assign unique sequential numbers for each insert within a distributed environment. This is crucial for maintaining order and consistency across operations. This framework also handles background operations such as merges and mutations, ensuring that the work for these is distributed while guaranteeing they are executed in the same order across all replicas. In addition to metadata, Keeper functions as a comprehensive control center for replication, including tracking checksums for stored data parts, and acts as a distributed notification system among replicas.
+ClickHouse uses ClickHouse Keeper (C++ ZooKeeper implementation, ZooKeeper can also be used) for managing its internal replication mechanism, focusing primarily on metadata storage and ensuring eventual consistency. Keeper is used to assign unique sequential numbers for each insert within a distributed environment. This is crucial for maintaining order and consistency across operations. This framework also handles background operations such as merges and mutations, ensuring that the work for these is distributed while guaranteeing they're executed in the same order across all replicas. In addition to metadata, Keeper functions as a comprehensive control center for replication, including tracking checksums for stored data parts, and acts as a distributed notification system among replicas.
The replication process in ClickHouse (1) starts when data is inserted into any replica. This data, in its raw insert form, is (2) written to disk along with its checksums. Once written, the replica (3) attempts to register this new data part in Keeper by allocating a unique block number and logging the new part's details. Other replicas, upon (4) detecting new entries in the replication log, (5) download the corresponding data part via an internal HTTP protocol, verifying it against the checksums listed in ZooKeeper. This method ensures that all replicas eventually hold consistent and up-to-date data despite varying processing speeds or potential delays. Moreover, the system is capable of handling multiple operations concurrently, optimizing data management processes, and allowing for system scalability and robustness against hardware discrepancies.
@@ -57,13 +57,13 @@ Several options exist for increasing the consistency of reads should this be req
To overcome some of the limitations of eventual consistency, you can ensure clients are routed to the same replicas. This is useful in cases where multiple users are querying ClickHouse and results should be deterministic across requests. While results may differ as new data is inserted, querying the same replicas ensures a consistent view.
-This can be achieved through several approaches depending on your architecture and whether you are using ClickHouse OSS or ClickHouse Cloud.
+This can be achieved through several approaches depending on your architecture and whether you're using ClickHouse OSS or ClickHouse Cloud.
## ClickHouse Cloud {#clickhouse-cloud}
ClickHouse Cloud uses a single copy of data backed by S3 with multiple compute replicas. The data is available to each replica node, which has a local SSD cache. To ensure consistent results, users therefore only need to ensure consistent routing to the same node.
-Communication to the nodes of a ClickHouse Cloud service occurs through a proxy. HTTP and Native protocol connections will be routed to the same node for the period on which they are held open. In the case of HTTP 1.1 connections from most clients, this depends on the Keep-Alive window. This can be configured on most clients e.g. Node Js. This also requires a server side configuration, which will be higher than the client and is set to 10s in ClickHouse Cloud.
+Communication to the nodes of a ClickHouse Cloud service occurs through a proxy. HTTP and Native protocol connections will be routed to the same node for the period during which they're held open. In the case of HTTP 1.1 connections from most clients, this depends on the Keep-Alive window. This can be configured on most clients, e.g. Node.js. A matching server-side Keep-Alive timeout is also required; it must be higher than the client's and is set to 10s in ClickHouse Cloud.
To ensure consistent routing across connections e.g. if using a connection pool or if connections expire, you can either ensure the same connection is used (easier for native) or request the exposure of sticky endpoints. This provides a set of endpoints for each node in the cluster, thus allowing clients to ensure queries are deterministically routed.
@@ -71,7 +71,7 @@ To ensure consistent routing across connections e.g. if using a connection pool
## ClickHouse OSS {#clickhouse-oss}
-To achieve this behavior in OSS depends on your shard and replica topology and if you are using a [Distributed table](/engines/table-engines/special/distributed) for querying.
+Achieving this behavior in OSS depends on your shard and replica topology and whether you're using a [Distributed table](/engines/table-engines/special/distributed) for querying.
When you have only one shard with replicas (common, since ClickHouse scales vertically), users select the node at the client layer and query a replica directly, ensuring it's deterministically selected.
@@ -79,7 +79,7 @@ While topologies with multiple shards and replicas are possible without a distri
In this case, you should ensure consistent node routing is performed based on a property, e.g. `session_id` or `user_id`. The settings [`prefer_localhost_replica=0`](/operations/settings/settings#prefer_localhost_replica) and [`load_balancing=in_order`](/operations/settings/settings#load_balancing) should be [set in the query](/operations/settings/query-level). This ensures any local replicas of shards are preferred, with remaining replicas preferred in the order listed in the configuration, provided they have the same number of errors; if errors are higher, failover occurs with random selection. [`load_balancing=nearest_hostname`](/operations/settings/settings#load_balancing) can also be used as an alternative for this deterministic shard selection.
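For example, a query-level override using these settings might look like this (the table name is illustrative):

```sql
SELECT count()
FROM distributed_events
SETTINGS prefer_localhost_replica = 0, load_balancing = 'in_order';
```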
-> When creating a Distributed table, you will specify a cluster. This cluster definition, specified in config.xml, will list the shards (and their replicas) - thus allowing users to control the order in which they are used from each node. Using this, you can ensure selection is deterministic.
+> When creating a Distributed table, you will specify a cluster. This cluster definition, specified in config.xml, will list the shards (and their replicas) - thus allowing users to control the order in which they're used from each node. Using this, you can ensure selection is deterministic.
## Sequential consistency {#sequential-consistency}
@@ -90,7 +90,7 @@ Sequential consistency in databases is where the operations on a database appear
From a user's perspective this typically manifests itself as the need to write data into ClickHouse and when reading data, to guarantee that the latest inserted rows are returned.
This can be achieved in several ways (in order of preference):
-1. **Read/Write to the same node** - If you are using native protocol, or a [session to do your write/read via HTTP](/interfaces/http#default-database), you should then be connected to the same replica: in this scenario you're reading directly from the node where you're writing, then your read will always be consistent.
+1. **Read/Write to the same node** - If you're using the native protocol, or a [session to do your write/read via HTTP](/interfaces/http#default-database), you'll be connected to the same replica: since you're reading directly from the node you're writing to, your reads will always be consistent.
1. **Sync replicas manually** - If you write to one replica and read from another, you can issue `SYSTEM SYNC REPLICA LIGHTWEIGHT` prior to reading.
1. **Enable sequential consistency** - via the query setting [`select_sequential_consistency = 1`](/operations/settings/settings#select_sequential_consistency). In OSS, the setting `insert_quorum = 'auto'` must also be specified.
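As a sketch, options 2 and 3 might look like the following (the `db.events` table is hypothetical):

```sql
-- Option 2: sync the replica before reading from it
SYSTEM SYNC REPLICA db.events LIGHTWEIGHT;

-- Option 3: enable sequential consistency for the read
SELECT count()
FROM db.events
SETTINGS select_sequential_consistency = 1;

-- In OSS, writes must additionally use a quorum for option 3
INSERT INTO db.events SETTINGS insert_quorum = 'auto' VALUES (1);
```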
@@ -109,7 +109,7 @@ These properties are common for OLTP databases that act as a source of truth.
While powerful, this comes with inherent limitations and makes PB scale challenging. ClickHouse compromises on these properties in order to provide fast analytical queries at scale while sustaining high write throughput.
-ClickHouse provides ACID properties under [limited configurations](/guides/developer/transactional) - most simply when using a non-replicated instance of the MergeTree table engine with one partition. You should not expect these properties outside of these cases and ensure these are not a requirement.
+ClickHouse provides ACID properties under [limited configurations](/guides/developer/transactional) - most simply when using a non-replicated instance of the MergeTree table engine with one partition. You shouldn't expect these properties outside of these cases and ensure these aren't a requirement.
## Compression {#compression}
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/migration_guide/03_migration_guide_part3.md b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/migration_guide/03_migration_guide_part3.md
index 900d48a5a4f..b0551ec4f3c 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/migration_guide/03_migration_guide_part3.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/02_postgres/migration_guide/03_migration_guide_part3.md
@@ -20,14 +20,14 @@ We recommend users migrating from Postgres read [the guide for modeling data in
## Primary (Ordering) Keys in ClickHouse {#primary-ordering-keys-in-clickhouse}
-Users coming from OLTP databases often look for the equivalent concept in ClickHouse. On noticing that ClickHouse supports a `PRIMARY KEY` syntax, users might be tempted to define their table schema using the same keys as their source OLTP database. This is not appropriate.
+Users coming from OLTP databases often look for the equivalent concept in ClickHouse. On noticing that ClickHouse supports a `PRIMARY KEY` syntax, users might be tempted to define their table schema using the same keys as their source OLTP database. This isn't appropriate.
### How are ClickHouse Primary keys different? {#how-are-clickhouse-primary-keys-different}
-To understand why using your OLTP primary key in ClickHouse is not appropriate, you should understand the basics of ClickHouse indexing. We use Postgres as an example comparison, but these general concepts apply to other OLTP databases.
+To understand why using your OLTP primary key in ClickHouse isn't appropriate, you should understand the basics of ClickHouse indexing. We use Postgres as an example comparison, but these general concepts apply to other OLTP databases.
- Postgres primary keys are, by definition, unique per row. The use of [B-tree structures](/guides/best-practices/sparse-primary-indexes#an-index-design-for-massive-data-scales) allows the efficient lookup of single rows by this key. While ClickHouse can be optimized for the lookup of a single row value, analytics workloads will typically require the reading of a few columns but for many rows. Filters will more often need to identify **a subset of rows** on which an aggregation will be performed.
-- Memory and disk efficiency are paramount to the scale at which ClickHouse is often used. Data is written to ClickHouse tables in chunks known as parts, with rules applied for merging the parts in the background. In ClickHouse, each part has its own primary index. When parts are merged, the merged part's primary indexes are also merged. Unlike Postgres, these indexes are not built for each row. Instead, the primary index for a part has one index entry per group of rows - this technique is called **sparse indexing**.
+- Memory and disk efficiency are paramount to the scale at which ClickHouse is often used. Data is written to ClickHouse tables in chunks known as parts, with rules applied for merging the parts in the background. In ClickHouse, each part has its own primary index. When parts are merged, the merged part's primary indexes are also merged. Unlike Postgres, these indexes aren't built for each row. Instead, the primary index for a part has one index entry per group of rows - this technique is called **sparse indexing**.
- **Sparse indexing** is possible because ClickHouse stores the rows for a part on disk ordered by a specified key. Instead of directly locating single rows (like a B-Tree-based index), the sparse primary index allows it to quickly (via a binary search over index entries) identify groups of rows that could possibly match the query. The located groups of potentially matching rows are then, in parallel, streamed into the ClickHouse engine in order to find the matches. This index design allows for the primary index to be small (it completely fits into the main memory) whilst still significantly speeding up query execution times, especially for range queries that are typical in data analytics use cases.
For more details, we recommend this [in-depth guide](/guides/best-practices/sparse-primary-indexes).
@@ -38,7 +38,7 @@ For more details, we recommend this [in-depth guide](/guides/best-practices/spar
The selected key in ClickHouse will determine not only the index but also the order in which data is written on disk. Because of this, it can dramatically impact compression levels, which can, in turn, affect query performance. An ordering key that causes the values of most columns to be written in a contiguous order will allow the selected compression algorithm (and codecs) to compress the data more effectively.
-> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they are included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
+> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they're included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
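For example, ordering a hypothetical `posts` table from lower- to higher-cardinality columns groups similar values contiguously on disk, helping both the sparse index and compression:

```sql
CREATE TABLE posts
(
    `PostTypeId` UInt8,
    `CreationDate` DateTime,
    `UserId` Int32,
    `Body` String
)
ENGINE = MergeTree
ORDER BY (PostTypeId, toDate(CreationDate), UserId);
```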
### Choosing an ordering key {#choosing-an-ordering-key}
@@ -114,15 +114,15 @@ Ok.
0 rows in set. Elapsed: 0.103 sec.
```
-- **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced to non-existence if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the day, e.g., partitioning by day, with most queries in the last day.
+- **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is typically only useful if the partitioning key isn't in the primary key and you're filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced, or non-existent, if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns target a specific, predictable subset of the data, e.g., partitioning by day, with most queries hitting the last day.
### Recommendations for partitions {#recommendations-for-partitions}
You should consider partitioning a data management technique. When operating with time series data, it's ideal for expiring data from the cluster, e.g. the oldest partition can [simply be dropped](/sql-reference/statements/alter/partition#drop-partitionpart).
-**Important:** Ensure your partitioning key expression does not result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, do not partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the ORDER BY expression.
+**Important:** Ensure your partitioning key expression doesn't result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, don't partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the ORDER BY expression.
-> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a pre-configured limit, then ClickHouse will throw an exception on insert - as a "too many parts" error. This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts.
+> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a pre-configured limit, then ClickHouse will throw an exception on insert - a "too many parts" error. This shouldn't happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly, e.g. many small inserts.
> Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
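A sketch of partitioning used as a data management technique (table and partition values are illustrative):

```sql
-- Partition by month - a data management choice, not a query optimization
CREATE TABLE events
(
    `Timestamp` DateTime,
    `Message` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(Timestamp)
ORDER BY Timestamp;

-- Expire old data by dropping the oldest partition
ALTER TABLE events DROP PARTITION 202401;
```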
@@ -149,7 +149,7 @@ WHERE UserId = 8592047
Peak memory usage: 201.93 MiB.
```
-This query requires all 90m rows to be scanned (admittedly quickly) as the `UserId` is not the ordering key.
+This query requires all 90m rows to be scanned (admittedly quickly) as the `UserId` isn't the ordering key.
Previously, we solved this using a materialized view acting as a lookup for the `PostId`. The same problem can be solved
with a [projection](/data-modeling/projections). The command below adds a
projection for the `ORDER BY user_id`.
@@ -243,7 +243,7 @@ WHERE UserId = 8592047
### When to use projections {#when-to-use-projections}
-Projections are an appealing feature for new users as they are automatically
+Projections are an appealing feature for new users as they're automatically
maintained as data is inserted. Furthermore, queries can just be sent to a single
table where the projections are exploited where possible to speed up the response
time.
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/01_overview.md b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/01_overview.md
index 884d086011c..79e96d5b447 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/01_overview.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/01_overview.md
@@ -43,7 +43,7 @@ ClickHouse Cloud currently has no concept equivalent to BigQuery folders.
Like BigQuery slot reservations, you can [configure vertical and horizontal autoscaling](/manage/scaling#configuring-vertical-auto-scaling) in ClickHouse Cloud. For vertical autoscaling, you can set the minimum and maximum size for the memory and CPU cores of the compute nodes for a service. The service will then scale as needed within those bounds. These settings are also available during the initial service creation flow. Each compute node in the service has the same size. You can change the number of compute nodes within a service with [horizontal scaling](/manage/scaling#manual-horizontal-scaling).
-Furthermore, similar to BigQuery quotas, ClickHouse Cloud offers concurrency control, memory usage limits, and I/O scheduling, enabling you to isolate queries into workload classes. By setting limits on shared resources (CPU cores, DRAM, disk and network I/O) for specific workload classes, it ensures these queries do not affect other critical business queries. Concurrency control prevents thread oversubscription in scenarios with a high number of concurrent queries.
+Furthermore, similar to BigQuery quotas, ClickHouse Cloud offers concurrency control, memory usage limits, and I/O scheduling, enabling you to isolate queries into workload classes. By setting limits on shared resources (CPU cores, DRAM, disk and network I/O) for specific workload classes, it ensures these queries don't affect other critical business queries. Concurrency control prevents thread oversubscription in scenarios with a high number of concurrent queries.
ClickHouse tracks byte sizes of memory allocations at the server, user, and query level, allowing flexible memory usage limits. Memory overcommit enables queries to use additional free memory beyond the guaranteed memory, while assuring memory limits for other queries. Additionally, memory usage for aggregation, sort, and join clauses can be limited, allowing fallback to external algorithms when the memory limit is exceeded.
@@ -82,7 +82,7 @@ When presented with multiple options for ClickHouse types, consider the actual r
### Primary and Foreign keys and Primary index {#primary-and-foreign-keys-and-primary-index}
-In BigQuery, a table can have [primary key and foreign key constraints](https://cloud.google.com/bigquery/docs/information-schema-table-constraints). Typically, primary and foreign keys are used in relational databases to ensure data integrity. A primary key value is normally unique for each row and is not `NULL`. Each foreign key value in a row must be present in the primary key column of the primary key table or be `NULL`. In BigQuery, these constraints are not enforced, but the query optimizer may use this information to optimize queries better.
+In BigQuery, a table can have [primary key and foreign key constraints](https://cloud.google.com/bigquery/docs/information-schema-table-constraints). Typically, primary and foreign keys are used in relational databases to ensure data integrity. A primary key value is normally unique for each row and isn't `NULL`. Each foreign key value in a row must be present in the primary key column of the primary key table or be `NULL`. In BigQuery, these constraints aren't enforced, but the query optimizer may use this information to optimize queries better.
In ClickHouse, a table can also have a primary key. Like BigQuery, ClickHouse doesn't enforce uniqueness for a table's primary key column values. Unlike BigQuery, a table's data is stored on disk [ordered](/guides/best-practices/sparse-primary-indexes#optimal-compression-ratio-of-data-files) by the primary key column(s). The query optimizer utilizes this sort order to prevent resorting, to minimize memory usage for joins, and to enable short-circuiting for limit clauses. Unlike BigQuery, ClickHouse automatically creates [a (sparse) primary index](/guides/best-practices/sparse-primary-indexes#an-index-design-for-massive-data-scales) based on the primary key column values. This index is used to speed up all queries that contain filters on the primary key columns. ClickHouse currently doesn't support foreign key constraints.
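To illustrate, a ClickHouse primary key is part of the table definition and defines the on-disk sort order rather than a uniqueness constraint (the table here is hypothetical):

```sql
CREATE TABLE user_events
(
    `user_id` UInt64,
    `event_time` DateTime,
    `event_type` LowCardinality(String)
)
ENGINE = MergeTree
-- ORDER BY doubles as the PRIMARY KEY when none is given;
-- duplicate (user_id, event_time) pairs are accepted
ORDER BY (user_id, event_time);
```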
@@ -97,7 +97,7 @@ In addition to the primary index created from the values of a table's primary ke
- Similar to a Bloom Filter Index but used for tokenized strings and suitable for full-text search queries.
- [**Min-Max Index**](/engines/table-engines/mergetree-family/mergetree#minmax):
- Maintains the minimum and maximum values of a column for each data part.
- - Helps to skip reading data parts that do not fall within the specified range.
+ - Helps to skip reading data parts that don't fall within the specified range.
## Search indexes {#search-indexes}
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/02_migrating-to-clickhouse-cloud.md b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/02_migrating-to-clickhouse-cloud.md
index 51dfb0822e1..e1022f2759e 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/02_migrating-to-clickhouse-cloud.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/02_migrating-to-clickhouse-cloud.md
@@ -50,7 +50,7 @@ BigQuery supports exporting data to Google's object store (GCS). For our example
1. Export the 7 tables to GCS. Commands for that are available [here](https://pastila.nl/?014e1ae9/cb9b07d89e9bb2c56954102fd0c37abd#0Pzj52uPYeu1jG35nmMqRQ==).
-2. Import the data into ClickHouse Cloud. For that we can use the [gcs table function](/sql-reference/table-functions/gcs). The DDL and import queries are available [here](https://pastila.nl/?00531abf/f055a61cc96b1ba1383d618721059976#Wf4Tn43D3VCU5Hx7tbf1Qw==). Note that because a ClickHouse Cloud instance consists of multiple compute nodes, instead of the `gcs` table function, we are using the [s3Cluster table function](/sql-reference/table-functions/s3Cluster) instead. This function also works with gcs buckets and [utilizes all nodes of a ClickHouse Cloud service](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part1#parallel-servers) to load the data in parallel.
+2. Import the data into ClickHouse Cloud. For that we can use the [gcs table function](/sql-reference/table-functions/gcs). The DDL and import queries are available [here](https://pastila.nl/?00531abf/f055a61cc96b1ba1383d618721059976#Wf4Tn43D3VCU5Hx7tbf1Qw==). Note that because a ClickHouse Cloud instance consists of multiple compute nodes, instead of the `gcs` table function, we're using the [s3Cluster table function](/sql-reference/table-functions/s3Cluster). This function also works with gcs buckets and [utilizes all nodes of a ClickHouse Cloud service](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part1#parallel-servers) to load the data in parallel.
@@ -151,14 +151,14 @@ As described [here](/migrations/bigquery), like in BigQuery, ClickHouse doesn't
Similar to clustering in BigQuery, a ClickHouse table's data is stored on disk ordered by the primary key column(s). This sort order is utilized by the query optimizer to prevent resorting, minimize memory usage for joins, and enable short-circuiting for limit clauses.
In contrast to BigQuery, ClickHouse automatically creates [a (sparse) primary index](/guides/best-practices/sparse-primary-indexes) based on the primary key column values. This index is used to speed up all queries that contain filters on the primary key columns. Specifically:
-- Memory and disk efficiency are paramount to the scale at which ClickHouse is often used. Data is written to ClickHouse tables in chunks known as parts, with rules applied for merging the parts in the background. In ClickHouse, each part has its own primary index. When parts are merged, then the merged part's primary indexes are also merged. Not that these indexes are not built for each row. Instead, the primary index for a part has one index entry per group of rows - this technique is called sparse indexing.
+- Memory and disk efficiency are paramount to the scale at which ClickHouse is often used. Data is written to ClickHouse tables in chunks known as parts, with rules applied for merging the parts in the background. In ClickHouse, each part has its own primary index. When parts are merged, the merged part's primary indexes are also merged. Note that these indexes aren't built for each row. Instead, the primary index for a part has one index entry per group of rows - this technique is called sparse indexing.
- Sparse indexing is possible because ClickHouse stores the rows for a part on disk ordered by a specified key. Instead of directly locating single rows (like a B-Tree-based index), the sparse primary index allows it to quickly (via a binary search over index entries) identify groups of rows that could possibly match the query. The located groups of potentially matching rows are then, in parallel, streamed into the ClickHouse engine in order to find the matches. This index design allows for the primary index to be small (it completely fits into the main memory) while still significantly speeding up query execution times, especially for range queries that are typical in data analytics use cases. For more details, we recommend [this in-depth guide](/guides/best-practices/sparse-primary-indexes).
The selected primary key in ClickHouse will determine not only the index but also the order in which data is written on disk. Because of this, it can dramatically impact compression levels, which can, in turn, affect query performance. An ordering key that causes the values of most columns to be written in a contiguous order will allow the selected compression algorithm (and codecs) to compress the data more effectively.
-> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they are included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
+> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they're included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
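To illustrate the note above with a hypothetical table, multiple ordering keys sort with the same semantics as an `ORDER BY` clause in a `SELECT` query:

```sql
-- Rows (and hence the values of all columns) are stored sorted by
-- CreationDate first, then by UserId within equal CreationDate values -
-- the same semantics as "ORDER BY CreationDate, UserId" in a SELECT query.
CREATE TABLE events
(
    CreationDate DateTime,
    UserId UInt32,
    Payload String
)
ENGINE = MergeTree
ORDER BY (CreationDate, UserId);
```

Because `Payload` values for nearby rows share the same `CreationDate` neighborhood, contiguous similar values compress more effectively.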
### Choosing an ordering key {#choosing-an-ordering-key}
@@ -236,15 +236,15 @@ Ok.
0 rows in set. Elapsed: 0.103 sec.
```
-- **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced to non-existence if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize `GROUP BY` queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the day, e.g., partitioning by day, with most queries in the last day.
+- **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is typically only useful if the partitioning key isn't in the primary key and you're filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced, or even non-existent, if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize `GROUP BY` queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, you should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns target a specific predictable subset of the data, e.g., partitioning by day, with most queries in the last day.
#### Recommendations {#recommendations}
You should consider partitioning as a data management technique. It is ideal when operating with time series data where data needs to be expired from the cluster, e.g. the oldest partition can [simply be dropped](/sql-reference/statements/alter/partition#drop-partitionpart).
-Important: Ensure your partitioning key expression does not result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, do not partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the `ORDER BY` expression.
+Important: Ensure your partitioning key expression doesn't result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, don't partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the `ORDER BY` expression.
-> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts_to_throw_insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
+> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts_to_throw_insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This shouldn't happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
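A minimal sketch of the data-management pattern recommended above (table and column names are hypothetical):

```sql
-- Partition by day so that expired data can be removed by
-- dropping whole partitions instead of running mutations.
CREATE TABLE logs
(
    EventTime DateTime,
    Message String
)
ENGINE = MergeTree
ORDER BY EventTime
PARTITION BY toYYYYMMDD(EventTime);

-- Expire a day of data by dropping its partition:
ALTER TABLE logs DROP PARTITION 20240101;
```

Note the partitioning key here is low cardinality: one partition per day stays well under the ~100-partition guidance.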
## Materialized views vs projections {#materialized-views-vs-projections}
@@ -272,7 +272,7 @@ Peak memory usage: 201.93 MiB.
```
This query requires all 90m rows to be scanned (albeit quickly) as the `UserId`
-is not the ordering key. Previously, we solved this using a materialized view
+isn't the ordering key. Previously, we solved this using a materialized view
acting as a lookup for the `PostId`. The same problem can be solved with a projection.
The command below adds a projection with `ORDER BY user_id`.
@@ -373,7 +373,7 @@ WHERE UserId = 8592047
### When to use projections {#when-to-use-projections}
-Projections are an appealing feature for new users as they are automatically
+Projections are an appealing feature for new users as they're automatically
maintained as data is inserted. Furthermore, queries can just be sent to a single
table where the projections are exploited where possible to speed up the response
time.
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/03_loading-data.md b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/03_loading-data.md
index 7c3b470aebc..eeba4f1b507 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/03_loading-data.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/03_bigquery/03_loading-data.md
@@ -109,7 +109,7 @@ FROM s3Cluster(
The `ACCESS_ID` and `SECRET` used in the above query is your [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) associated with your GCS bucket.
:::note Use `ifNull` when exporting nullable columns
-In the above query, we use the [`ifNull` function](/sql-reference/functions/functions-for-nulls#ifNull) with the `some_text` column to insert data into our ClickHouse table with a default value. You can also make your columns in ClickHouse [`Nullable`](/sql-reference/data-types/nullable), but this is not recommended as it may affect negatively performance.
+In the above query, we use the [`ifNull` function](/sql-reference/functions/functions-for-nulls#ifNull) with the `some_text` column to insert data into our ClickHouse table with a default value. You can also make your columns in ClickHouse [`Nullable`](/sql-reference/data-types/nullable), but this isn't recommended as it may negatively affect performance.
Alternatively, you can `SET input_format_null_as_default=1` and any missing or NULL values will be replaced by default values for their respective columns, if those defaults are specified.
:::
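As a minimal sketch of the `ifNull` approach described in the note above (the table name, column name, and bucket URL are hypothetical):

```sql
-- Replace NULLs from the exported data with a default value at insert time,
-- so the target column can stay non-Nullable.
INSERT INTO target_table
SELECT ifNull(some_text, 'default_value')
FROM s3Cluster(
    'default',
    'https://storage.googleapis.com/mybucket/export/*.parquet',
    'ACCESS_ID',
    'SECRET'
);

-- Alternatively, let missing/NULL values fall back to the columns' defaults:
SET input_format_null_as_default = 1;
```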
@@ -130,4 +130,4 @@ To export more BigQuery tables, simply redo the steps above for each additional
In addition to this guide, we also recommend reading our blog post that shows [how to use ClickHouse to speed up BigQuery and how to handle incremental imports](https://clickhouse.com/blog/clickhouse-bigquery-migrating-data-for-realtime-queries).
-If you are having issues transferring data from BigQuery to ClickHouse, please feel free to contact us at support@clickhouse.com.
+If you're having issues transferring data from BigQuery to ClickHouse, please feel free to contact us at support@clickhouse.com.
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/04_snowflake/03_sql_translation_reference.md b/docs/cloud/onboard/02_migrate/01_migration_guides/04_snowflake/03_sql_translation_reference.md
index bc7eb619cee..5dc8f4c8d97 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/04_snowflake/03_sql_translation_reference.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/04_snowflake/03_sql_translation_reference.md
@@ -20,7 +20,7 @@ Snowflake offers the type Number for numerics. This requires the user to specify
precision (total number of digits) and scale (digits to the right of the decimal place)
up to a total of 38. Integer declarations are synonymous with Number, and simply
define a fixed precision and scale where the range is the same. This convenience
-is possible as modifying the precision (scale is 0 for integers) does not impact the
+is possible as modifying the precision (scale is 0 for integers) doesn't impact the
size of data on disk in Snowflake - the minimal required bytes are used for a
numeric range at write time at a micro partition level. The scale does, however,
impact storage space and is offset with compression. A `Float64` type offers a
@@ -67,9 +67,9 @@ via the [`Nested`](/sql-reference/data-types/nested-data-structures/nested) type
allowing users to explicitly map nested structures. This allows codecs and type
optimizations to be applied throughout the hierarchy, unlike Snowflake, which
requires the user to use the `OBJECT`, `VARIANT`, and `ARRAY` types for the outer
-object and does not allow [explicit internal typing](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#characteristics-of-an-object).
+object and doesn't allow [explicit internal typing](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#characteristics-of-an-object).
This internal typing also simplifies queries on nested numerics in ClickHouse,
-which do not need to be cast and can be used in index definitions.
+which don't need to be cast and can be used in index definitions.
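As a sketch of the explicit internal typing described above (names are hypothetical), each field of a `Nested` structure carries its own type:

```sql
-- Unlike Snowflake's OBJECT/VARIANT/ARRAY, every nested field is explicitly
-- typed, so nested numerics need no casting and can appear in index definitions.
CREATE TABLE orders
(
    OrderId UInt64,
    Items Nested
    (
        Sku String,
        Quantity UInt16,
        Price Decimal(10, 2)
    )
)
ENGINE = MergeTree
ORDER BY OrderId;
```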
In ClickHouse, codecs and optimized types can also be applied to substructures.
This provides an added benefit that compression with nested structures remains
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/06_redshift/03_sql_translation_reference.md b/docs/cloud/onboard/02_migrate/01_migration_guides/06_redshift/03_sql_translation_reference.md
index e290b3a1b91..8e2a1a7f6d1 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/06_redshift/03_sql_translation_reference.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/06_redshift/03_sql_translation_reference.md
@@ -72,7 +72,7 @@ CREATE TABLE some_table(...) ENGINE = MergeTree ORDER BY (column1, column2)
```
In most cases, you can use the same sorting key columns and order in ClickHouse
-as Redshift, assuming you are using the default `COMPOUND` type. When data is
+as Redshift, assuming you're using the default `COMPOUND` type. When data is
added to Redshift, you should run the `VACUUM` and `ANALYZE` commands to re-sort
newly added data and update the statistics for the query planner - otherwise, the
unsorted space grows. No such process is required for ClickHouse.
@@ -88,7 +88,7 @@ same end-result with a slightly different setup.
You should be aware that the “primary key” concept represents different things
in ClickHouse and Redshift. In Redshift, the primary key resembles the traditional
-RDMS concept intended to enforce constraints. However, they are not strictly
+RDBMS concept intended to enforce constraints. However, they're not strictly
enforced in Redshift and instead act as hints for the query planner and data
distribution among nodes. In ClickHouse, the primary key denotes columns used
to construct the sparse primary index, used to ensure the data is ordered on
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/01_clickhouse-to-cloud_with_remotesecure.md b/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/01_clickhouse-to-cloud_with_remotesecure.md
index c3baeb2f674..27f35310575 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/01_clickhouse-to-cloud_with_remotesecure.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/01_clickhouse-to-cloud_with_remotesecure.md
@@ -97,7 +97,7 @@ remoteSecure('source-hostname', db, table, 'exporter', 'password-here')
```
:::note
-If the source system is not available from outside networks then you can push the data rather than pulling it, as the `remoteSecure` function works for both selects and inserts. See the next option.
+If the source system isn't available from outside networks then you can push the data rather than pulling it, as the `remoteSecure` function works for both selects and inserts. See the next option.
:::
- Use the `remoteSecure` function to push the data to the ClickHouse Cloud service
@@ -159,7 +159,7 @@ There are a few steps in the migration:
#### Duplicate the table structure on the destination service {#duplicate-the-table-structure-on-the-destination-service}
-On the destination create the database if it is not there already:
+On the destination, create the database if it isn't there already:
- Create the destination database:
```sql
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/02_oss_to_cloud_backups.md b/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/02_oss_to_cloud_backups.md
index 95224b59a6f..41fa4fdc583 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/02_oss_to_cloud_backups.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/07_OSS_to_Cloud/02_oss_to_cloud_backups.md
@@ -80,7 +80,7 @@ docker exec -it clickhouse-01 clickhouse-client
ClickHouse Cloud works with [`SharedMergeTree`](/cloud/reference/shared-merge-tree).
When restoring a backup, ClickHouse automatically converts tables with `ReplicatedMergeTree` to `SharedMergeTree` tables.
-It's likely your tables are already using the `ReplciatedMergeTree` engine if you are running a cluster.
+It's likely your tables are already using the `ReplicatedMergeTree` engine if you're running a cluster.
If not, you will need to convert any `MergeTree` tables to `ReplicatedMergeTree` before backing them up.
To demonstrate how to convert `MergeTree` tables to `ReplicatedMergeTree`, we will begin with a `MergeTree` table and convert it to `ReplicatedMergeTree` afterwards.
@@ -259,7 +259,7 @@ TO S3(
Replace `BUCKET_URL`, `KEY_ID` and `SECRET_KEY` with your own AWS credentials.
The guide ["How to create an S3 bucket and IAM role"](/integrations/s3/creating-iam-user-and-s3-bucket)
-shows you how to obtain these if you do not yet have them.
+shows you how to obtain these if you don't yet have them.
If everything is correctly configured you will see a response similar to the one below
containing a unique id assigned to the backup and the status of the backup.
diff --git a/docs/cloud/onboard/02_migrate/01_migration_guides/08_other_methods/03_object-storage-to-clickhouse.md b/docs/cloud/onboard/02_migrate/01_migration_guides/08_other_methods/03_object-storage-to-clickhouse.md
index 3cb2be34f85..ef36f75a56e 100644
--- a/docs/cloud/onboard/02_migrate/01_migration_guides/08_other_methods/03_object-storage-to-clickhouse.md
+++ b/docs/cloud/onboard/02_migrate/01_migration_guides/08_other_methods/03_object-storage-to-clickhouse.md
@@ -21,7 +21,7 @@ table functions for migrating data stored in Cloud Object Storage into a ClickHo
- [gcs](/sql-reference/table-functions/gcs)
- [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage)
-If your current database system is not able to directly offload data into a Cloud Object Storage, you could use a [third-party ETL/ELT tool](/cloud/migration/etl-tool-to-clickhouse) or [clickhouse-local](/cloud/migration/clickhouse-local) for moving data
+If your current database system isn't able to directly offload data into a Cloud Object Storage, you could use a [third-party ETL/ELT tool](/cloud/migration/etl-tool-to-clickhouse) or [clickhouse-local](/cloud/migration/clickhouse-local) for moving data
from your current database system to Cloud Object Storage, in order to migrate that data in a second step into a ClickHouse Cloud table.
Although this is a two-step process (offload data into Cloud Object Storage, then load into ClickHouse), the advantage is that this
diff --git a/docs/cloud/reference/03_billing/01_billing_overview.md b/docs/cloud/reference/03_billing/01_billing_overview.md
index 4f3095fd0d2..e0f97022ec7 100644
--- a/docs/cloud/reference/03_billing/01_billing_overview.md
+++ b/docs/cloud/reference/03_billing/01_billing_overview.md
@@ -20,7 +20,7 @@ To understand what can affect your bill, and ways that you can manage your spend
### Basic: from $66.52 per month {#basic-from-6652-per-month}
-Best for: Departmental use cases with smaller data volumes that do not have hard reliability guarantees.
+Best for: Departmental use cases with smaller data volumes that don't have hard reliability guarantees.
**Basic tier service**
- 1 replica x 8 GiB RAM, 2 vCPU
@@ -182,7 +182,7 @@ Best for: large scale, mission critical deployments that have stringent security
A ClickHouse Credit is a unit of credit toward Customer's usage of ClickHouse Cloud equal to one (1) US dollar, to be applied based on ClickHouse's then-current published price list.
:::note
-If you are being billed through Stripe then you will see that 1 CHC is equal to \$0.01 USD on your Stripe invoice. This is to allow accurate billing on Stripe due to their limitation on not being able to bill fractional quantities of our standard SKU of 1 CHC = \$1 USD.
+If you're being billed through Stripe then you will see that 1 CHC is equal to \$0.01 USD on your Stripe invoice. This allows accurate billing on Stripe, which can't bill fractional quantities of our standard SKU of 1 CHC = \$1 USD.
:::
### Where can I find legacy pricing? {#find-legacy-pricing}
@@ -261,7 +261,7 @@ Yes. Usage is consumed with the following payment methods in this order:
### What controls does ClickHouse Cloud offer to manage costs for Basic services? {#what-controls-does-clickhouse-cloud-offer-to-manage-costs-for-basic-services}
-- The [Advanced scaling control](/manage/scaling) lets you control the behavior of pausing/idling during inactivity. Adjusting memory allocation is not supported for Basic services.
+- The [Advanced scaling control](/manage/scaling) lets you control the behavior of pausing/idling during inactivity. Adjusting memory allocation isn't supported for Basic services.
- Note that the default setting pauses the service after a period of inactivity.
### If I have multiple services, do I get an invoice per service or a consolidated invoice? {#if-i-have-multiple-services-do-i-get-an-invoice-per-service-or-a-consolidated-invoice}
@@ -296,7 +296,7 @@ an invoice is generated between 3-Jan and 5-Jan-2025
ClickHouse Cloud usage statements follow a different billing cycle where usage is metered
and reported over 30 days starting from the day of sign up.
-The usage and invoice dates will differ if these dates are not the same. Since usage statements track usage by day for a given service, you can rely on statements to see the breakdown of costs.
+The usage and invoice dates will differ if these dates aren't the same. Since usage statements track usage by day for a given service, you can rely on statements to see the breakdown of costs.
### Are there any restrictions around the usage of prepaid credits? {#are-there-any-restrictions-around-the-usage-of-prepaid-credits}
@@ -325,15 +325,15 @@ By leveraging shared storage in this deployment, users benefit from cost savings
Compute-compute separation can save you a significant amount of ClickHouse Credits in some cases.
A good example is the following setup:
-1. You have ETL jobs that are running 24/7 and ingesting data into the service. These ETL jobs do not require a lot of memory so they can run on a small instance with, for example, 32 GiB of RAM.
+1. You have ETL jobs that are running 24/7 and ingesting data into the service. These ETL jobs don't require a lot of memory so they can run on a small instance with, for example, 32 GiB of RAM.
-2. A data scientist on the same team that has ad hoc reporting requirements, says they need to run a query that requires a significant amount of memory - 236 GiB, however does not need high availability and can wait and rerun queries if the first run fails.
+2. A data scientist on the same team with ad hoc reporting requirements says they need to run a query that requires a significant amount of memory - 236 GiB - however, they don't need high availability and can wait and rerun queries if the first run fails.
In this example you, as an administrator for the database, can do the following:
1. Create a small service with two replicas 16 GiB each - this will satisfy the ETL jobs and provide high availability.
-2. For the data scientist, you can create a second service in the same warehouse with only one replica with 236 GiB. You can enable idling for this service so you will not be paying for this service when the data scientist is not using it.
+2. For the data scientist, you can create a second service in the same warehouse with only one replica with 236 GiB. You can enable idling for this service so you won't be paying for this service when the data scientist isn't using it.
Cost estimation (per month) for this example on the **Scale Tier**:
- Parent service active 24 hours a day: 2 replicas x 16 GiB, 4 vCPU per replica
diff --git a/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-committed.md b/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-committed.md
index de3546cb79f..91a9627adf2 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-committed.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-committed.md
@@ -54,7 +54,7 @@ This should take you to your AWS Marketplace page with the private offer details
Complete the steps to subscribe on the AWS portal and click on **"Set up your account"**.
It is critical to redirect to ClickHouse Cloud at this point and either register for a new account, or sign in with an existing account.
-Without completing this step, we will not be able to link your AWS Marketplace contract to ClickHouse Cloud.
+Without completing this step, we won't be able to link your AWS Marketplace contract to ClickHouse Cloud.
@@ -67,18 +67,18 @@ This step is necessary so that we can bind your ClickHouse Cloud organization to
### Register if new {#register}
-If you are a new ClickHouse Cloud user, click "Register" at the bottom of the page.
+If you're a new ClickHouse Cloud user, click "Register" at the bottom of the page.
You will be prompted to create a new user and verify the email.
After verifying your email, you can leave the ClickHouse Cloud login page and login using the new username at [https://console.clickhouse.cloud](https://console.clickhouse.cloud).
-Note that if you are a new user, you will also need to provide some basic information about your business.
+Note that if you're a new user, you will also need to provide some basic information about your business.
See the screenshots below.
-If you are an existing ClickHouse Cloud user, simply log in using your credentials.
+If you're an existing ClickHouse Cloud user, simply log in using your credentials.
### Create or select organization to bill {#create-select-org-to-bill}
@@ -91,6 +91,6 @@ You can confirm from the organization's billing page in the ClickHouse UI that b
-If you run into any issues, please do not hesitate to contact our [support team](https://clickhouse.com/support/program).
+If you run into any issues, please don't hesitate to contact our [support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-payg.md b/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-payg.md
index f66b920bcac..221df50f721 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-payg.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/aws-marketplace-payg.md
@@ -58,20 +58,20 @@ On the next screen, click subscribe.
### Set up your account {#set-up-your-account}
-Note that at this point, the setup is not complete and your ClickHouse Cloud organization is not being billed through the marketplace yet. You will now need to click on Set up your account on your marketplace subscription to redirect to ClickHouse Cloud to finish setup.
+Note that at this point, the setup isn't complete and your ClickHouse Cloud organization isn't being billed through the marketplace yet. You will now need to click on Set up your account on your marketplace subscription to redirect to ClickHouse Cloud to finish setup.
Once you redirect to ClickHouse Cloud, you can either login with an existing account, or register with a new account. This step is very important so we can bind your ClickHouse Cloud organization to your AWS Marketplace billing.
:::note[New Clickhouse Cloud Users]
-If you are a new ClickHouse Cloud user, follow the steps below.
+If you're a new ClickHouse Cloud user, follow the steps below.
:::
Steps for new users
-If you are a new ClickHouse Cloud user, click Register at the bottom of the page. You will be prompted to create a new user and verify the email. After verifying your email, you can leave the ClickHouse Cloud login page and login using the new username at the https://console.clickhouse.cloud.
+If you're a new ClickHouse Cloud user, click **Register** at the bottom of the page. You will be prompted to create a new user and verify your email. After verifying your email, you can leave the ClickHouse Cloud login page and log in using the new username at https://console.clickhouse.cloud.
@@ -85,7 +85,7 @@ You will also need to provide some basic information about your business. See th
-If you are an existing ClickHouse Cloud user, simply log in using your credentials.
+If you're an existing ClickHouse Cloud user, simply log in using your credentials.
### Add the Marketplace Subscription to an Organization {#add-marketplace-subscription}
@@ -103,4 +103,4 @@ You can confirm from the organization's billing page in the ClickHouse UI that b
## Support {#support}
-If you run into any issues, please do not hesitate to contact [our support team](https://clickhouse.com/support/program).
+If you run into any issues, please don't hesitate to contact [our support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-committed.md b/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-committed.md
index 05437395f43..b58b5004bbd 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-committed.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-committed.md
@@ -72,7 +72,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
- Subscription and resource group
- Provide a name for the SaaS subscription
- Choose the billing plan that you have a private offer for. Only the term that the private offer was created (for example, 1 year) will have an amount against it. Other billing term options will be for $0 amounts.
-- Choose whether you want recurring billing or not. If recurring billing is not selected, the contract will end at the end of the billing period and the resources will be set to decommissioned.
+- Choose whether you want recurring billing or not. If recurring billing isn't selected, the contract will end at the end of the billing period and the resources will be decommissioned.
- Click on **Review + subscribe**.
@@ -97,7 +97,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-8. Once ready, you can click on **Configure account now**. Note that is a critical step that binds the Azure subscription to a ClickHouse Cloud organization for your account. Without this step, your Marketplace subscription is not complete.
+8. Once ready, you can click on **Configure account now**. Note that this is a critical step that binds the Azure subscription to a ClickHouse Cloud organization for your account. Without this step, your Marketplace subscription isn't complete.
@@ -105,7 +105,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-9. You will be redirected to the ClickHouse Cloud sign up or sign in page. You can either sign up using a new account or sign in using an existing account. Once you are signed in, a new organization will be created that is ready to be used and billed via the Azure Marketplace.
+9. You will be redirected to the ClickHouse Cloud sign up or sign in page. You can either sign up using a new account or sign in using an existing account. Once you're signed in, a new organization will be created that is ready to be used and billed via the Azure Marketplace.
10. You will need to answer a few questions - address and company details - before you can proceed.
@@ -119,7 +119,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-11. Once you hit **Complete sign up**, you will be taken to your organization within ClickHouse Cloud where you can view the billing screen to ensure you are being billed via the Azure Marketplace and can create services.
+11. Once you hit **Complete sign up**, you will be taken to your organization within ClickHouse Cloud where you can view the billing screen to ensure you're being billed via the Azure Marketplace and can create services.
@@ -135,4 +135,4 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-If you run into any issues, please do not hesitate to contact [our support team](https://clickhouse.com/support/program).
+If you run into any issues, please don't hesitate to contact [our support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-payg.md b/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-payg.md
index ecf4ae12a09..e063ed83c72 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-payg.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/azure-marketplace-payg.md
@@ -29,7 +29,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
- An Azure project that is enabled with purchasing rights by your billing administrator.
- To subscribe to ClickHouse Cloud on the Azure Marketplace, you must be logged in with an account that has purchasing rights and choose the appropriate project.
-1. Go to [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps) and search for ClickHouse Cloud. Make sure you are logged in so you can purchase an offering on the marketplace.
+1. Go to [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps) and search for ClickHouse Cloud. Make sure you're logged in so you can purchase an offering on the marketplace.
@@ -61,7 +61,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-5. On the next screen, choose the subscription, resource group, and resource group location. The resource group location does not have to be the same location as where you intend to launch your services on ClickHouse Cloud.
+5. On the next screen, choose the subscription, resource group, and resource group location. The resource group location doesn't have to be the same location as where you intend to launch your services on ClickHouse Cloud.
@@ -87,7 +87,7 @@ Get started with ClickHouse Cloud on the [Azure Marketplace](https://azuremarket
-9. Note that at this point, you will have subscribed to the Azure subscription of ClickHouse Cloud, but you have not yet set up your account on ClickHouse Cloud. The next steps are necessary and critical for ClickHouse Cloud to be able to bind to your Azure subscription so your billing happens correctly through the Azure marketplace.
+9. Note that at this point, you will have subscribed to the Azure subscription of ClickHouse Cloud, but you haven't yet set up your account on ClickHouse Cloud. The next steps are necessary and critical for ClickHouse Cloud to be able to bind to your Azure subscription so your billing happens correctly through the Azure marketplace.
@@ -117,7 +117,7 @@ You will receive an email like the one below with details on configuring your ac
12. You will be redirected to the ClickHouse Cloud sign up or sign in page. Once you redirect to ClickHouse Cloud, you can either login with an existing account, or register with a new account. This step is very important so we can bind your ClickHouse Cloud organization to the Azure Marketplace billing.
-13. Note that if you are a new user, you will also need to provide some basic information about your business. See the screenshots below.
+13. Note that if you're a new user, you will also need to provide some basic information about your business. See the screenshots below.
@@ -129,7 +129,7 @@ You will receive an email like the one below with details on configuring your ac
-Once you hit **Complete sign up**, you will be taken to your organization within ClickHouse Cloud where you can view the billing screen to ensure you are being billed via the Azure Marketplace and can create services.
+Once you hit **Complete sign up**, you will be taken to your organization within ClickHouse Cloud where you can view the billing screen to ensure you're being billed via the Azure Marketplace and can create services.
@@ -145,4 +145,4 @@ Once you hit **Complete sign up**, you will be taken to your organization within
-14. If you run into any issues, please do not hesitate to contact [our support team](https://clickhouse.com/support/program).
+14. If you run into any issues, please don't hesitate to contact [our support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-committed.md b/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-committed.md
index ced70c443fe..00e7b01b2af 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-committed.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-committed.md
@@ -74,7 +74,7 @@ Get started with ClickHouse Cloud on the [GCP Marketplace](https://console.cloud
-It is critical to redirect to ClickHouse Cloud at this point and sign up or sign in. Without completing this step, we will not be able to link your GCP Marketplace subscription to ClickHouse Cloud.
+It is critical to redirect to ClickHouse Cloud at this point and sign up or sign in. Without completing this step, we won't be able to link your GCP Marketplace subscription to ClickHouse Cloud.
@@ -90,7 +90,7 @@ It is critical to redirect to ClickHouse Cloud at this point and sign up or sign
-If you are a new ClickHouse Cloud user, click **Register** at the bottom of the page. You will be prompted to create a new user and verify the email. After verifying your email, you can leave the ClickHouse Cloud login page and login using the new username at the [https://console.clickhouse.cloud](https://console.clickhouse.cloud).
+If you're a new ClickHouse Cloud user, click **Register** at the bottom of the page. You will be prompted to create a new user and verify your email. After verifying your email, you can leave the ClickHouse Cloud login page and log in using the new username at [https://console.clickhouse.cloud](https://console.clickhouse.cloud).
@@ -98,7 +98,7 @@ If you are a new ClickHouse Cloud user, click **Register** at the bottom of the
-Note that if you are a new user, you will also need to provide some basic information about your business. See the screenshots below.
+Note that if you're a new user, you will also need to provide some basic information about your business. See the screenshots below.
@@ -110,7 +110,7 @@ Note that if you are a new user, you will also need to provide some basic inform
-If you are an existing ClickHouse Cloud user, simply log in using your credentials.
+If you're an existing ClickHouse Cloud user, simply log in using your credentials.
7. After successfully logging in, a new ClickHouse Cloud organization will be created. This organization will be connected to your GCP billing account and all usage will be billed via your GCP account.
@@ -137,4 +137,4 @@ If you are an existing ClickHouse Cloud user, simply log in using your credentia
-If you run into any issues, please do not hesitate to contact [our support team](https://clickhouse.com/support/program).
+If you run into any issues, please don't hesitate to contact [our support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-payg.md b/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-payg.md
index b40dbc74701..d7b73dfd504 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-payg.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/gcp-marketplace-payg.md
@@ -57,7 +57,7 @@ Get started with ClickHouse Cloud on the [GCP Marketplace](https://console.cloud
-5. Note that at this point, the setup is not complete yet. You will need to redirect to ClickHouse Cloud by clicking on **Set up your account** and signing up on ClickHouse Cloud.
+5. Note that at this point, the setup isn't complete yet. You will need to redirect to ClickHouse Cloud by clicking on **Set up your account** and signing up on ClickHouse Cloud.
6. Once you redirect to ClickHouse Cloud, you can either login with an existing account, or register with a new account. This step is very important so we can bind your ClickHouse Cloud organization to the GCP Marketplace billing.
@@ -67,7 +67,7 @@ Get started with ClickHouse Cloud on the [GCP Marketplace](https://console.cloud
-If you are a new ClickHouse Cloud user, click **Register** at the bottom of the page. You will be prompted to create a new user and verify the email. After verifying your email, you can leave the ClickHouse Cloud login page and login using the new username at the [https://console.clickhouse.cloud](https://console.clickhouse.cloud).
+If you're a new ClickHouse Cloud user, click **Register** at the bottom of the page. You will be prompted to create a new user and verify your email. After verifying your email, you can leave the ClickHouse Cloud login page and log in using the new username at [https://console.clickhouse.cloud](https://console.clickhouse.cloud).
@@ -75,7 +75,7 @@ If you are a new ClickHouse Cloud user, click **Register** at the bottom of the
-Note that if you are a new user, you will also need to provide some basic information about your business. See the screenshots below.
+Note that if you're a new user, you will also need to provide some basic information about your business. See the screenshots below.
@@ -87,7 +87,7 @@ Note that if you are a new user, you will also need to provide some basic inform
-If you are an existing ClickHouse Cloud user, simply log in using your credentials.
+If you're an existing ClickHouse Cloud user, simply log in using your credentials.
7. After successfully logging in, a new ClickHouse Cloud organization will be created. This organization will be connected to your GCP billing account and all usage will be billed via your GCP account.
@@ -114,4 +114,4 @@ If you are an existing ClickHouse Cloud user, simply log in using your credentia
-If you run into any issues, please do not hesitate to contact [our support team](https://clickhouse.com/support/program).
+If you run into any issues, please don't hesitate to contact [our support team](https://clickhouse.com/support/program).
diff --git a/docs/cloud/reference/03_billing/02_marketplace/migrate-marketplace-payg-committed.md b/docs/cloud/reference/03_billing/02_marketplace/migrate-marketplace-payg-committed.md
index 8290447aed3..7e862851552 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/migrate-marketplace-payg-committed.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/migrate-marketplace-payg-committed.md
@@ -12,7 +12,7 @@ If your ClickHouse organization is currently billed through an active cloud mark
## Important Notes {#important-notes}
-Please note that canceling your marketplace PAYG subscription does not delete your ClickHouse Cloud account - only the billing relationship via the marketplace. Once canceled, our system will stop billing for ClickHouse Cloud services through the marketplace. (Note: this process is not immediate and may take a few minutes to complete).
+Please note that canceling your marketplace PAYG subscription doesn't delete your ClickHouse Cloud account - only the billing relationship via the marketplace. Once canceled, our system will stop billing for ClickHouse Cloud services through the marketplace. (Note: this process isn't immediate and may take a few minutes to complete).
After your marketplace subscription is canceled, if your ClickHouse organization has a credit card on file, we will charge that card at the end of your billing cycle - unless a new marketplace subscription is attached beforehand.
@@ -40,7 +40,7 @@ If you want to use a different AWS Account ID for migrating your ClickHouse orga
- Under "Agreement" click on the "Actions" dropdown or button next to the ClickHouse Cloud listing
- Select "Cancel subscription"
-> **Note:** For help cancelling your subscription (e.g. if the cancel subscription button is not available) please contact [AWS support](https://support.console.aws.amazon.com/support/home#/).
+> **Note:** For help cancelling your subscription (e.g. if the cancel subscription button isn't available), please contact [AWS support](https://support.console.aws.amazon.com/support/home#/).
Next follow these [steps](/cloud/billing/marketplace/aws-marketplace-committed-contract) to configure your ClickHouse organization to the new AWS committed spend contract you accepted.
@@ -49,7 +49,7 @@ Next follow these [steps](/cloud/billing/marketplace/aws-marketplace-committed-c
### Steps to Cancel GCP PAYG Order {#cancel-gcp-payg}
1. **Go to your [Google Cloud Marketplace Console](https://console.cloud.google.com/marketplace):**
- - Make sure you are logged in to the correct GCP account and have selected the appropriate project
+ - Make sure you're logged in to the correct GCP account and have selected the appropriate project
2. **Locate your ClickHouse order:**
- In the left menu, click "Your Orders"
- Find the correct ClickHouse order in the list of active orders
@@ -78,5 +78,5 @@ Next, follow these [steps](/cloud/billing/marketplace/azure-marketplace-committe
## Requirements for Linking to Committed Spend Contract {#linking-requirements}
> **Note:** In order to link your organization to a marketplace committed spend contract:
-> - The user following the steps must be an admin user of the ClickHouse organization you are attaching the subscription to
+> - The user following the steps must be an admin user of the ClickHouse organization you're attaching the subscription to
> - All unpaid invoices on the organization must be paid (please reach out to ClickHouse [support](https://clickhouse.com/support/program) for any questions)
diff --git a/docs/cloud/reference/03_billing/02_marketplace/overview.md b/docs/cloud/reference/03_billing/02_marketplace/overview.md
index d6e4b837cad..8d59198b5a9 100644
--- a/docs/cloud/reference/03_billing/02_marketplace/overview.md
+++ b/docs/cloud/reference/03_billing/02_marketplace/overview.md
@@ -32,7 +32,7 @@ Signing up for ClickHouse Cloud from the cloud provider marketplace is a two ste
1. You first "subscribe" to ClickHouse Cloud on the cloud providers' marketplace portal. After you have finished subscribing, you click on "Pay Now" or "Manage on Provider" (depending on the marketplace). This redirects you to ClickHouse Cloud.
2. On Clickhouse Cloud you either register for a new account, or sign in with an existing account. Either way, a new ClickHouse Cloud organization will be created for you which is tied to your marketplace billing.
-NOTE: Your existing services and organizations from any prior ClickHouse Cloud signups will remain and they will not be connected to the marketplace billing. ClickHouse Cloud allows you to use the same account to manage multiple organization, each with different billing.
+NOTE: Your existing services and organizations from any prior ClickHouse Cloud signups will remain and they won't be connected to the marketplace billing. ClickHouse Cloud allows you to use the same account to manage multiple organizations, each with different billing.
You can switch between organizations from the bottom left menu of the ClickHouse Cloud console.
@@ -50,7 +50,7 @@ Your existing services and organizations from any prior ClickHouse Cloud signups
### I subscribed to ClickHouse Cloud as a marketplace user. How can I unsubscribe? {#i-subscribed-to-clickhouse-cloud-as-a-marketplace-user-how-can-i-unsubscribe}
-Note that you can simply stop using ClickHouse Cloud and delete all existing ClickHouse Cloud services. Even though the subscription will still be active, you will not be paying anything as ClickHouse Cloud doesn't have any recurring fees.
+Note that you can simply stop using ClickHouse Cloud and delete all existing ClickHouse Cloud services. Even though the subscription will still be active, you won't be paying anything as ClickHouse Cloud doesn't have any recurring fees.
If you want to unsubscribe, please navigate to the Cloud Provider console and cancel the subscription renewal there. Once the subscription ends, all existing services will be stopped and you will be prompted to add a credit card. If no card was added, after two weeks all existing services will be deleted.
@@ -74,7 +74,7 @@ Marketplace billing follows the calendar month cycle. For example, for usage bet
ClickHouse Cloud usage statements follow a different billing cycle where usage is metered and reported over 30 days starting from the day of sign up.
-The usage and invoice dates will differ if these dates are not the same. Since usage statements track usage by day for a given service, you can rely on statements to see the breakdown of costs.
+The usage and invoice dates will differ if these dates aren't the same. Since usage statements track usage by day for a given service, you can rely on statements to see the breakdown of costs.
### Where can I find general billing information? {#where-can-i-find-general-billing-information}
@@ -86,7 +86,7 @@ There is no difference in pricing between marketplace billing and signing up dir
### Can I set up multiple ClickHouse Organizations to bill to a single cloud marketplace billing account (AWS, GCP, or Azure)? {#multiple-organizations-to-bill-to-single-cloud-marketplace-account}
-Yes. Multiple ClickHouse organizations can be configured to bill usage in arrears to the same cloud marketplace billing account (AWS, GCP, or Azure). However, prepaid credits are not shared across organizations by default. If you need to share credits between organizations, please contact [ClickHouse Cloud Support](https://clickhouse.com/support/program).
+Yes. Multiple ClickHouse organizations can be configured to bill usage in arrears to the same cloud marketplace billing account (AWS, GCP, or Azure). However, prepaid credits aren't shared across organizations by default. If you need to share credits between organizations, please contact [ClickHouse Cloud Support](https://clickhouse.com/support/program).
### If my ClickHouse Organization is billed through a cloud marketplace committed spend agreement will I automatically move to PAYG billing when I run out of credits? {#automatically-move-to-PAYG-when-running-out-of-credit}
diff --git a/docs/cloud/reference/03_billing/03_clickpipes/clickpipes_for_cdc.md b/docs/cloud/reference/03_billing/03_clickpipes/clickpipes_for_cdc.md
index 647d0c06c08..561d23f3084 100644
--- a/docs/cloud/reference/03_billing/03_clickpipes/clickpipes_for_cdc.md
+++ b/docs/cloud/reference/03_billing/03_clickpipes/clickpipes_for_cdc.md
@@ -103,7 +103,7 @@ $$\$200 \text{ (ingest)} + \$146 \text{ (compute)} = \$346$$
Is the ingested data measured in pricing based on compressed or uncompressed size?
The ingested data is measured as _uncompressed data_ coming from Postgres—both
-during the initial load and CDC (via the replication slot). Postgres does not
+during the initial load and CDC (via the replication slot). Postgres doesn't
compress data during transit by default, and ClickPipe processes the raw,
uncompressed bytes.
@@ -143,7 +143,7 @@ in conjunction with the ClickPipes pricing.
Can I scale the compute allocated for Postgres CDC in my service?
-By default, compute scaling is not user-configurable. The provisioned resources
+By default, compute scaling isn't user-configurable. The provisioned resources
are optimized to handle most customer workloads optimally. If your use case
requires more or less compute, please open a support ticket so we can evaluate
your request.
diff --git a/docs/cloud/reference/03_billing/05_payment-thresholds.md b/docs/cloud/reference/03_billing/05_payment-thresholds.md
index 6a664530eed..394cd894b56 100644
--- a/docs/cloud/reference/03_billing/05_payment-thresholds.md
+++ b/docs/cloud/reference/03_billing/05_payment-thresholds.md
@@ -12,7 +12,7 @@ doc_type: 'guide'
When your amount due in a billing period for ClickHouse Cloud reaches $10,000 USD or the equivalent value, your payment method will be automatically charged. A failed charge will result in the suspension or termination of your services after a grace period.
:::note
-This payment threshold does not apply to customers who have a committed spend contract or other negotiated contractual agreement with ClickHouse.
+This payment threshold doesn't apply to customers who have a committed spend contract or other negotiated contractual agreement with ClickHouse.
:::
If your organization reaches 90% of the payment threshold and is on-track to exceed the payment threshold mid-period, the billing email associated with the organization will receive an email notification. You will also receive an email notification as well as an invoice when you exceed the payment threshold.
diff --git a/docs/cloud/reference/05_supported-regions.md b/docs/cloud/reference/05_supported-regions.md
index 3ac84612dd8..5ac30a056fc 100644
--- a/docs/cloud/reference/05_supported-regions.md
+++ b/docs/cloud/reference/05_supported-regions.md
@@ -73,9 +73,9 @@ Need to deploy to a region not currently listed? [Submit a request](https://clic
We offer Private regions for our Enterprise tier services. Please [Contact us](https://clickhouse.com/company/contact) for private region requests.
Key considerations for private regions:
-- Services will not auto-scale; however, manual vertical and horizontal scaling is supported.
-- Services cannot be idled.
-- Status page is not available for private regions.
+- Services won't auto-scale; however, manual vertical and horizontal scaling is supported.
+- Services can't be idled.
+- Status page isn't available for private regions.
Additional requirements may apply for HIPAA compliance (including signing a BAA). Note that HIPAA is currently available only for Enterprise tier services
diff --git a/docs/cloud/reference/10_personal-data-access.md b/docs/cloud/reference/10_personal-data-access.md
index c1cf4ef2357..c68d6eba34c 100644
--- a/docs/cloud/reference/10_personal-data-access.md
+++ b/docs/cloud/reference/10_personal-data-access.md
@@ -16,7 +16,7 @@ As a registered user, ClickHouse allows you to view and manage your personal acc
**What is a Data Subject Access Request (DSAR)**
-Depending on where you are located, applicable law may also provide you additional rights as to personal data that ClickHouse holds about you (Data Subject Rights), as described in the ClickHouse Privacy Policy. The process for exercising Data Subject Rights is known as a Data Subject Access Request (DSAR).
+Depending on where you're located, applicable law may also provide you additional rights as to personal data that ClickHouse holds about you (Data Subject Rights), as described in the ClickHouse Privacy Policy. The process for exercising Data Subject Rights is known as a Data Subject Access Request (DSAR).
**Scope of Personal Data**
@@ -42,7 +42,7 @@ Note: URLs with `OrgID` need to be updated to reflect the `OrgID` for your speci
### Current customers {#current-customers}
-If you have an account with us and the self-service option has not resolved your personal data issue, you can submit a Data Subject Access Request under the Privacy Policy. To do so, log into your ClickHouse account and open a [support case](https://console.clickhouse.cloud/support). This helps us verify your identity and streamline the process to address your request.
+If you have an account with us and the self-service option hasn't resolved your personal data issue, you can submit a Data Subject Access Request under the Privacy Policy. To do so, log into your ClickHouse account and open a [support case](https://console.clickhouse.cloud/support). This helps us verify your identity and streamline the process to address your request.
Please be sure to include the following details in your support case:
@@ -55,7 +55,7 @@ Please be sure to include the following details in your support case:
### Individuals without an account {#individuals-without-an-account}
-If you do not have an account with us and the self-service option above has not resolved your personal-data issue, and you wish to make a Data Subject Access Request pursuant to the Privacy Policy, you may submit these requests by email to [privacy@clickhouse.com](mailto:privacy@clickhouse.com).
+If you don't have an account with us and the self-service option above hasn't resolved your personal-data issue, and you wish to make a Data Subject Access Request pursuant to the Privacy Policy, you may submit these requests by email to [privacy@clickhouse.com](mailto:privacy@clickhouse.com).
## Identity verification {#identity-verification}
diff --git a/docs/cloud/reference/11_account-close.md b/docs/cloud/reference/11_account-close.md
index ca9d1b402c7..ab72cfa42a6 100644
--- a/docs/cloud/reference/11_account-close.md
+++ b/docs/cloud/reference/11_account-close.md
@@ -9,7 +9,7 @@ doc_type: 'guide'
## Account closure and deletion {#account-close--deletion}
-Our goal is to help you be successful in your project. If you have questions that are not answered on this site or need help evaluating a
+Our goal is to help you be successful in your project. If you have questions that aren't answered on this site or need help evaluating a
unique use case, please contact us at [support@clickhouse.com](mailto:support@clickhouse.com).
We know there are circumstances that sometimes necessitate account closure. This guide will help you through the process.
@@ -21,7 +21,7 @@ You will also continue receiving product updates so that you know if a feature y
closed accounts may be reopened at any time to start new services.
Customers requesting personal data deletion should be aware this is an irreversible process. The account and related information will no longer
-be available. You will not receive product updates and may not reopen the account. This will not affect any newsletter subscriptions.
+be available. You won't receive product updates and may not reopen the account. This won't affect any newsletter subscriptions.
Newsletter subscribers can unsubscribe at any time by using the unsubscribe link at the bottom of the newsletter email without closing their account or
deleting their information.
@@ -32,11 +32,11 @@ Before requesting account closure, please take the following steps to prepare th
1. Export any data from your service that you need to keep.
2. Stop and delete your services. This will keep additional charges from accruing on your account.
3. Remove all users except the admin that will request closure. This will help you ensure no new services are created while the process completes.
-4. Review the 'Usage' and 'Billing' tabs in the control panel to verify all charges have been paid. We are not able to close accounts with unpaid balances.
+4. Review the 'Usage' and 'Billing' tabs in the control panel to verify all charges have been paid. We're not able to close accounts with unpaid balances.
## Request an account closure {#request-account-closure}
-We are required to authenticate requests for both closure and deletion. To ensure your request can be processed quickly, please follow the steps outlined
+We're required to authenticate requests for both closure and deletion. To ensure your request can be processed quickly, please follow the steps outlined
below.
1. Sign into your clickhouse.cloud account.
2. Complete any remaining steps in the _Preparing for Closure_ section above.
@@ -54,7 +54,7 @@ Description: We would appreciate it if you would share a brief note about why yo
6. We will close your account and send a confirmation email to let you know when it is complete.
## Request deletion of your personal data {#request-personal-data-deletion}
-Please note, only account administrators may request personal data deletion from ClickHouse. If you are not an account administrator, please contact
+Please note, only account administrators may request personal data deletion from ClickHouse. If you're not an account administrator, please contact
your ClickHouse account administrator to request to be removed from the account.
To request data deletion, follow the steps in 'Request Account Closure' above. When entering the case information, change the subject to
diff --git a/docs/cloud/reference/data-resiliency.md b/docs/cloud/reference/data-resiliency.md
index c466c1f9e33..4579d493a98 100644
--- a/docs/cloud/reference/data-resiliency.md
+++ b/docs/cloud/reference/data-resiliency.md
@@ -13,7 +13,7 @@ import restore_backup from '@site/static/images/cloud/guides/restore_backup.png'
# Data resiliency {#clickhouse-cloud-data-resiliency}
This page covers the disaster recovery recommendations for ClickHouse Cloud, and guidance for customers to recover from an outage.
-ClickHouse Cloud does not currently support automatic failover, or automatic syncing across multiple geographical regions.
+ClickHouse Cloud doesn't currently support automatic failover or automatic syncing across multiple geographical regions.
:::tip
Customers should perform periodic backup restore testing to understand the specific RTO for their service size and configuration.
@@ -27,7 +27,7 @@ It is helpful to cover some definitions first.
**RTO (Recovery Time Objective)**: The maximum allowable downtime before normal operations must resume following an outage. Example: An RTO of 30 mins means that in the event of a failure, the team is able to restore data and applications and get normal operations going within 30 mins.
-**Database Backups and Snapshots**: Backups provide durable long-term storage with a separate copy of the data. Snapshots do not create an additional copy of the data, are usually faster, and provide better RPOs.
+**Database Backups and Snapshots**: Backups provide durable long-term storage with a separate copy of the data. Snapshots don't create an additional copy of the data, are usually faster, and provide better RPOs.
## Database backups {#database-backups}
@@ -47,7 +47,7 @@ Cross-cloud backup export support is coming soon.
Applicable data transfer charges will apply for cross-region, and cross-cloud backups.
:::note
-This feature is not currently available in PCI/ HIPAA services
+This feature isn't currently available in PCI/HIPAA services.
:::
3. **Configurable backups**
@@ -107,7 +107,7 @@ To restore from an existing backup
### Primary region downtime {#primary-region-downtime}
Customers in the Enterprise Tier can [export backups](/cloud/manage/backups/export-backups-to-own-cloud-account) to their own cloud provider bucket.
-If you are concerned about regional failures, we recommend exporting backups to a different region.
+If you're concerned about regional failures, we recommend exporting backups to a different region.
Keep in mind that cross-region data transfer charges will apply.
If the primary region goes down, the backup in another region can be restored to a new service in a different region.
diff --git a/docs/concepts/glossary.md b/docs/concepts/glossary.md
index 3db8b497d56..7b6967169cb 100644
--- a/docs/concepts/glossary.md
+++ b/docs/concepts/glossary.md
@@ -33,7 +33,7 @@ A dictionary is a mapping of key-value pairs that is useful for various types of
## Distributed table {#distributed-table}
-A distributed table in ClickHouse is a special type of table that does not store data itself but provides a unified view for distributed query processing across multiple servers in a cluster.
+A distributed table in ClickHouse is a special type of table that doesn't store data itself but provides a unified view for distributed query processing across multiple servers in a cluster.
## Granule {#granule}
@@ -77,7 +77,7 @@ A partitioning key in ClickHouse is a SQL expression defined in the PARTITION BY
## Primary key {#primary-key}
-In ClickHouse, a primary key determines the order in which data is stored on disk and is used to build a sparse index that speeds up query filtering. Unlike traditional databases, the primary key in ClickHouse does not enforce uniqueness—multiple rows can have the same primary key value.
+In ClickHouse, a primary key determines the order in which data is stored on disk and is used to build a sparse index that speeds up query filtering. Unlike traditional databases, the primary key in ClickHouse doesn't enforce uniqueness—multiple rows can have the same primary key value.
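The sparse-index lookup can be sketched in a few lines. This isn't ClickHouse's implementation, just the idea: marks store the primary-key value at the start of each granule, and a binary search narrows the scan to the candidate granules.

```python
from bisect import bisect_right

def granules_to_scan(marks, lo, hi):
    # marks[i] is the primary-key value at the start of granule i (sorted).
    # Return the half-open range of granule indices that may contain
    # keys in [lo, hi]; everything outside it can be skipped.
    first = max(bisect_right(marks, lo) - 1, 0)
    last = bisect_right(marks, hi)
    return first, last

# Marks for granules starting at keys 0, 100, 200, 300:
# a query for keys 150..250 only needs granules 1 and 2.
first, last = granules_to_scan([0, 100, 200, 300], 150, 250)
```

Because the index is sparse (one mark per granule, not per row), it stays small enough to keep in memory even for very large tables.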
## Projection {#projection}
@@ -93,7 +93,7 @@ A copy of the data stored in a ClickHouse database. You can have any number of r
## Shard {#shard}
-A subset of data. ClickHouse always has at least one shard for your data. If you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server.
+A subset of data. ClickHouse always has at least one shard for your data. If you don't split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server.
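Routing rows to shards by key can be sketched like this — illustrative only, since ClickHouse uses a configurable sharding expression, and `md5` here is just a stand-in deterministic hash:

```python
import hashlib

def route_to_shard(key: str, num_shards: int) -> int:
    # Hash the sharding key and take it modulo the shard count, so the
    # same key always lands on the same shard and load spreads evenly.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

With `num_shards = 1` every row lands on shard 0, which matches the "at least one shard" default described above.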
## Skipping index {#skipping-index}
@@ -101,7 +101,7 @@ Skipping indices are used to store small amounts of metadata at the level of mul
## Sorting key {#sorting-key}
-In ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key.
+In ClickHouse, a sorting key defines the physical order of rows on disk. If you don't specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key.
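The prefix rule is easy to check mechanically; a tiny illustrative helper (hypothetical, not part of ClickHouse):

```python
def primary_key_is_valid(primary_key, sorting_key):
    # The primary key must be a prefix of the sorting key:
    # the first len(primary_key) columns of the sorting key must match.
    return list(sorting_key[:len(primary_key)]) == list(primary_key)
```

For example, `PRIMARY KEY (user_id)` with `ORDER BY (user_id, timestamp)` is valid, while `PRIMARY KEY (timestamp)` with the same sorting key isn't.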
## Sparse index {#sparse-index}
diff --git a/docs/concepts/olap.md b/docs/concepts/olap.md
index 9c7c72a77e6..c73f650829f 100644
--- a/docs/concepts/olap.md
+++ b/docs/concepts/olap.md
@@ -30,9 +30,9 @@ ClickHouse is an OLAP database management system that is pretty often used as a
All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but by doing it less frequently. The latter usually handles a continuous stream of transactions, constantly modifying the current state of data.
-In practice OLAP and OLTP are not viewed as binary categories, but more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems that are integrated. This might not be such a big deal, but having more systems increases maintenance costs, and as such the trend in recent years is towards HTAP (**Hybrid Transactional/Analytical Processing**) when both kinds of workload are handled equally well by a single database management system.
+In practice OLAP and OLTP aren't viewed as binary categories, but more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems that are integrated. This might not be such a big deal, but having more systems increases maintenance costs, and as such the trend in recent years is towards HTAP (**Hybrid Transactional/Analytical Processing**) when both kinds of workload are handled equally well by a single database management system.
-Even if a DBMS started out as a pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception. Initially, it has been designed as a [fast-as-possible OLAP system](/concepts/why-clickhouse-is-so-fast) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data have been added.
+Even if a DBMS started out as a pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception. Initially, it has been designed as a [fast-as-possible OLAP system](/concepts/why-clickhouse-is-so-fast) and it still doesn't have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data have been added.
The fundamental trade-off between OLAP and OLTP systems remains:
diff --git a/docs/concepts/why-clickhouse-is-so-fast.mdx b/docs/concepts/why-clickhouse-is-so-fast.mdx
index e0197b61b14..5f4b6635c34 100644
--- a/docs/concepts/why-clickhouse-is-so-fast.mdx
+++ b/docs/concepts/why-clickhouse-is-so-fast.mdx
@@ -24,7 +24,7 @@ In ClickHouse, each table consists of multiple "table parts". A [part](/parts) i
To avoid too many parts accumulating, ClickHouse runs a [merge](/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.
-This approach has several advantages: All data processing can be [offloaded to background part merges](/concepts/why-clickhouse-is-so-fast#storage-layer-merge-time-computation), keeping data writes lightweight and highly efficient. Individual inserts are "local" in the sense that they do not need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.
+This approach has several advantages: All data processing can be [offloaded to background part merges](/concepts/why-clickhouse-is-so-fast#storage-layer-merge-time-computation), keeping data writes lightweight and highly efficient. Individual inserts are "local" in the sense that they don't need to update global, i.e. per-table data structures. As a result, multiple simultaneous inserts need no mutual synchronization or synchronization with existing table data, and thus inserts can be performed almost at the speed of disk I/O.
🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper.
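A background merge is essentially a k-way merge of sorted parts; a toy sketch of the idea (not ClickHouse code — here each "part" is just a sorted Python list):

```python
import heapq

def merge_parts(parts):
    # Each part is individually sorted, so combining several small parts
    # into one bigger sorted part is a k-way merge -- no global re-sort
    # is needed, which is what keeps merges cheap.
    return list(heapq.merge(*parts))

merged = merge_parts([[1, 4, 7], [2, 5], [3, 6, 8]])
# merged is [1, 2, 3, 4, 5, 6, 7, 8]
```

This also shows why inserts stay "local": each insert writes its own sorted part, and the global order is only established lazily by merges.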
@@ -52,7 +52,7 @@ The point of these transformations is to shift work (computation) from the time
On the one hand, user queries may become significantly faster, sometimes by 1000x or more, if they can leverage "transformed" data, e.g. pre-aggregated data.
-On the other hand, the majority of the runtime of merges is consumed by loading the input parts and saving the output part. The additional effort to transform the data during merge does usually not impact the runtime of merges too much. All of this magic is completely transparent and does not affect the result of queries (besides their performance).
+On the other hand, the majority of the runtime of merges is consumed by loading the input parts and saving the output part. The additional effort to transform the data during merge usually doesn't impact the runtime of merges much. All of this magic is completely transparent and doesn't affect the result of queries (besides their performance).
🤿 Deep dive into this in the [Merge-time Data Transformation](/docs/academic_overview#3-3-merge-time-data-transformation) section of the web version of our VLDB 2024 paper.
@@ -116,7 +116,7 @@ What sets ClickHouse [apart](https://www.youtube.com/watch?v=CAS2otEoerM) is its
* The fill factor: When and how to resize? How to move values during resize?
* Deletions: Should the hash table allow evicting entries?
-A standard hash table provided by a third-party library would functionally work, but it would not be fast. Great performance requires meticulous benchmarking and experimentation.
+A standard hash table provided by a third-party library would functionally work, but it wouldn't be fast. Great performance requires meticulous benchmarking and experimentation.
The [hash table implementation in ClickHouse](https://clickhouse.com/blog/hash-tables-in-clickhouse-and-zero-cost-abstractions) chooses one of **30+ precompiled hash table variants based** on the specifics of the query and the data.
@@ -127,7 +127,7 @@ The [hash table implementation in ClickHouse](https://clickhouse.com/blog/hash-t
* Is the sort required to be stable?
* Should all data be sorted or will a partial sort suffice?
-Algorithms that rely on data characteristics often perform better than their generic counterparts. If the data characteristics are not known in advance, the system can try various implementations and choose the one that works best at runtime. For an example, see the [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).
+Algorithms that rely on data characteristics often perform better than their generic counterparts. If the data characteristics aren't known in advance, the system can try various implementations and choose the one that works best at runtime. For an example, see the [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).
🤿 Deep dive into this in the [Holistic Performance Optimization](/academic_overview#4-4-holistic-performance-optimization) section of the web version of our VLDB 2024 paper.
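The "try several implementations and pick the winner at runtime" idea can be sketched as follows — a simplified illustration, not ClickHouse's actual mechanism:

```python
import random
import time

def pick_fastest(implementations, sample):
    # Time each candidate on a small sample of the real data and keep
    # the fastest one; the winner is then used for the full workload.
    best_fn, best_time = None, float("inf")
    for fn in implementations:
        data = list(sample)  # fresh copy so in-place candidates don't cheat
        start = time.perf_counter()
        fn(data)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn

sample = [random.random() for _ in range(10_000)]
chosen = pick_fastest([sorted, lambda xs: xs.sort()], sample)
```

The point is that the choice depends on the data actually seen, not on assumptions baked in at compile time.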
diff --git a/docs/data-compression/compression-in-clickhouse.md b/docs/data-compression/compression-in-clickhouse.md
index 233f4e4014e..a396dc7fc6f 100644
--- a/docs/data-compression/compression-in-clickhouse.md
+++ b/docs/data-compression/compression-in-clickhouse.md
@@ -67,12 +67,12 @@ GROUP BY name
A note on compact versus wide parts
-If you are seeing `compressed_size` or `uncompressed_size` values equal to `0`, this could be because the type of the
+If you're seeing `compressed_size` or `uncompressed_size` values equal to `0`, this could be because the type of the
parts are `compact` and not `wide` (see description for `part_type` in [`system.parts`](/operations/system-tables/parts)).
The part format is controlled by settings [`min_bytes_for_wide_part`](/operations/settings/merge-tree-settings#min_bytes_for_wide_part)
and [`min_rows_for_wide_part`](/operations/settings/merge-tree-settings#min_rows_for_wide_part) meaning that if the inserted
-data results in a part which does not exceed the values of the aforementioned settings, the part will be compact rather
-than wide and you will not see the values for `compressed_size` or `uncompressed_size`.
+data results in a part which doesn't exceed the values of the aforementioned settings, the part will be compact rather
+than wide and you won't see the values for `compressed_size` or `uncompressed_size`.
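The threshold logic can be sketched as follows — a simplified illustration of our reading of these settings, not ClickHouse internals:

```python
def part_format(part_bytes, part_rows,
                min_bytes_for_wide_part, min_rows_for_wide_part):
    # A part is stored "wide" (one file pair per column) once it reaches
    # either threshold; smaller parts stay "compact" (columns share a file).
    if part_bytes >= min_bytes_for_wide_part or part_rows >= min_rows_for_wide_part:
        return "wide"
    return "compact"
```

So a small insert that stays under both thresholds produces a compact part, and its per-column `compressed_size`/`uncompressed_size` won't be reported.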
To demonstrate:
diff --git a/docs/data-modeling/backfilling.md b/docs/data-modeling/backfilling.md
index b73c45946c0..670ff5a7064 100644
--- a/docs/data-modeling/backfilling.md
+++ b/docs/data-modeling/backfilling.md
@@ -70,7 +70,7 @@ The full PyPI dataset, consisting of over 1 trillion rows, is available in our p
## Backfilling scenarios {#backfilling-scenarios}
-Backfilling is typically needed when a stream of data is being consumed from a point in time. This data is being inserted into ClickHouse tables with [incremental materialized views](/materialized-view/incremental-materialized-view), triggering on blocks as they are inserted. These views may be transforming the data prior to insert or computing aggregates and sending results to target tables for later use in downstream applications.
+Backfilling is typically needed when a stream of data is being consumed from a point in time. This data is being inserted into ClickHouse tables with [incremental materialized views](/materialized-view/incremental-materialized-view), triggering on blocks as they're inserted. These views may be transforming the data prior to insert or computing aggregates and sending results to target tables for later use in downstream applications.
We will attempt to cover the following scenarios:
@@ -146,7 +146,7 @@ Peak memory usage: 682.38 KiB.
Suppose we wish to load another subset `{101..200}`. While we could insert directly into `pypi`, we can do this backfill in isolation by creating duplicate tables.
-Should the backfill fail, we have not impacted our main tables and can simply [truncate](/managing-data/truncate) our duplicate tables and repeat.
+Should the backfill fail, we haven't impacted our main tables and can simply [truncate](/managing-data/truncate) our duplicate tables and repeat.
To create new copies of these views, we can use the `CREATE TABLE AS` clause with the suffix `_v2`:
@@ -259,7 +259,7 @@ ClickPipes uses this approach when loading data from object storage, automatical
## Scenario 1: Backfilling data with existing data ingestion {#scenario-1-backfilling-data-with-existing-data-ingestion}
-In this scenario, we assume that the data to backfill is not in an isolated bucket and thus filtering is required. Data is already inserting and a timestamp or monotonically increasing column can be identified from which historical data needs to be backfilled.
+In this scenario, we assume that the data to backfill isn't in an isolated bucket and thus filtering is required. Data is already being inserted, and a timestamp or monotonically increasing column can be identified from which historical data needs to be backfilled.
This process follows the following steps:
@@ -313,18 +313,18 @@ ALTER TABLE pypi_v2 MOVE PARTITION () TO pypi
ALTER TABLE pypi_downloads_v2 MOVE PARTITION () TO pypi_downloads
```
-If the historical data is an isolated bucket, the above time filter is not required. If a time or monotonic column is unavailable, isolate your historical data.
+If the historical data is in an isolated bucket, the above time filter isn't required. If a time or monotonic column is unavailable, isolate your historical data.
:::note Just use ClickPipes in ClickHouse Cloud
-If you are using ClickHouse Cloud you should use ClickPipes for restoring historical backups if the data can be isolated in its own bucket (and a filter is not required). In addition to reducing the load time by parallelizing the load with multiple workers, ClickPipes automates the above process and creates duplicate tables for both the main table and materialized views.
+If you're using ClickHouse Cloud, you should use ClickPipes for restoring historical backups if the data can be isolated in its own bucket (and a filter isn't required). In addition to reducing the load time by parallelizing the load with multiple workers, ClickPipes automates the above process and creates duplicate tables for both the main table and materialized views.
:::
## Scenario 2: Adding materialized views to existing tables {#scenario-2-adding-materialized-views-to-existing-tables}
-It is not uncommon for new materialized views to need to be added to a setup for which significant data has been populated and data is being inserted. A timestamp or monotonically increasing column, which can be used to identify a point in the stream, is useful here and avoids pauses in data ingestion. In the examples below, we assume both cases, preferring approaches that avoid pauses in ingestion.
+It isn't uncommon to need to add new materialized views to a setup for which significant data has already been populated and into which data is being inserted. A timestamp or monotonically increasing column, which can be used to identify a point in the stream, is useful here and avoids pauses in data ingestion. In the examples below, we assume both cases, preferring approaches that avoid pauses in ingestion.
:::note Avoid POPULATE
-We do not recommend using the [`POPULATE`](/sql-reference/statements/create/view#materialized-view) command for backfilling materialized views for anything other than small datasets where ingest is paused. This operator can miss rows inserted into its source table, with the materialized view created after the populate hash is finished. Furthermore, this populate runs against all data and is vulnerable to interruptions or memory limits on large datasets.
+We don't recommend using the [`POPULATE`](/sql-reference/statements/create/view#materialized-view) command for backfilling materialized views for anything other than small datasets where ingest is paused. This operator can miss rows inserted into its source table while the view is being built, since the materialized view is only created after the populate has finished. Furthermore, this populate runs against all data and is vulnerable to interruptions or memory limits on large datasets.
:::
### Timestamp or Monotonically increasing column available {#timestamp-or-monotonically-increasing-column-available}
@@ -399,14 +399,14 @@ In the above example our target table is a [SummingMergeTree](/engines/table-eng
In our case, this is a relatively lightweight aggregation that completes in under 3s and uses less than 600MiB of memory. For more complex or longer-running aggregations, you can make this process more resilient by using the earlier duplicate table approach i.e. create a shadow target table, e.g., `pypi_downloads_per_day_v2`, insert into this, and attach its resulting partitions to `pypi_downloads_per_day`.
-Often materialized view's query can be more complex (not uncommon as otherwise users wouldn't use a view!) and consume resources. In rarer cases, the resources for the query are beyond that of the server. This highlights one of the advantages of ClickHouse materialized views - they are incremental and don't process the entire dataset in one go!
+Often a materialized view's query can be more complex (not uncommon, as otherwise users wouldn't use a view!) and consume resources. In rarer cases, the resources for the query exceed those of the server. This highlights one of the advantages of ClickHouse materialized views - they're incremental and don't process the entire dataset in one go!
In this case, users have several options:
1. Modify your query to backfill ranges e.g. `WHERE timestamp BETWEEN 2024-12-17 08:00:00 AND 2024-12-17 09:00:00`, `WHERE timestamp BETWEEN 2024-12-17 07:00:00 AND 2024-12-17 08:00:00` etc.
2. Use a [Null table engine](/engines/table-engines/special/null) to fill the materialized view. This replicates the typical incremental population of a materialized view, executing its query over blocks of data (of configurable size).
-(1) represents the simplest approach is often sufficient. We do not include examples for brevity.
+(1) represents the simplest approach and is often sufficient. We don't include examples for brevity.
We explore (2) further below.
@@ -471,7 +471,7 @@ Several factors will determine the performance and resources used in the above s
**Tip for trivial INSERT SELECT queries**: For simple `INSERT INTO t1 SELECT * FROM t2` queries without complex transformations, consider enabling `optimize_trivial_insert_select=1`. This setting (disabled by default since version 24.7) automatically adjusts the SELECT parallelism to match `max_insert_threads`, reducing resource usage and the number of parts created. This is particularly useful for bulk data migrations between tables.
:::
-For improving performance, you can follow the guidelines outlined in the [Tuning Threads and Block Size for Inserts](/integrations/s3/performance#tuning-threads-and-block-size-for-inserts) section of the [Optimizing for S3 Insert and Read Performance guide](/integrations/s3/performance). It should not be necessary to also modify `min_insert_block_size_bytes_for_materialized_views` and `min_insert_block_size_rows_for_materialized_views` to improve performance in most cases. If these are modified, use the same best practices as discussed for `min_insert_block_size_rows` and `min_insert_block_size_bytes`.
+For improving performance, you can follow the guidelines outlined in the [Tuning Threads and Block Size for Inserts](/integrations/s3/performance#tuning-threads-and-block-size-for-inserts) section of the [Optimizing for S3 Insert and Read Performance guide](/integrations/s3/performance). In most cases it shouldn't be necessary to also modify `min_insert_block_size_bytes_for_materialized_views` and `min_insert_block_size_rows_for_materialized_views` to improve performance. If these are modified, use the same best practices as discussed for `min_insert_block_size_rows` and `min_insert_block_size_bytes`.
To minimize memory, you may wish to experiment with these settings. This will invariably lower performance. Using the earlier query, we show examples below.
diff --git a/docs/data-modeling/denormalization.md b/docs/data-modeling/denormalization.md
index 3e82168f0e5..df9955f46b3 100644
--- a/docs/data-modeling/denormalization.md
+++ b/docs/data-modeling/denormalization.md
@@ -32,7 +32,7 @@ In general, we would recommend denormalizing in the following cases:
- Denormalize tables which change infrequently or for which a delay before data is available for analytical queries can be tolerated i.e. the data can be completely reloaded in a batch.
- Avoid denormalizing many-to-many relationships. This can result in the need to update many rows if a single source row changes.
-- Avoid denormalizing high cardinality relationships. If each row in a table has thousands of related entries in another table, these will need to be represented as an `Array` - either of a primitive type or tuples. Generally, arrays with more than 1000 tuples would not be recommended.
+- Avoid denormalizing high cardinality relationships. If each row in a table has thousands of related entries in another table, these will need to be represented as an `Array` - either of a primitive type or tuples. Generally, arrays with more than 1000 tuples wouldn't be recommended.
- Rather than denormalizing all columns as nested objects, consider denormalizing just a statistic using materialized views (see below).
All information doesn't need to be denormalized - just the key information that needs to be frequently accessed.
@@ -127,7 +127,7 @@ LIMIT 5
└──────────┴──────────────────────────────────────────────┴───────┘
```
-The main observation here is that aggregated vote statistics for each post would be sufficient for most analysis - we do not need to denormalize all of the vote information. For example, the current `Score` column represents such a statistic i.e. total up votes minus down votes. Ideally, we would just be able to retrieve these statistics at query time with a simple lookup (see [dictionaries](/dictionary)).
+The main observation here is that aggregated vote statistics for each post would be sufficient for most analysis - we don't need to denormalize all of the vote information. For example, the current `Score` column represents such a statistic i.e. total up votes minus down votes. Ideally, we would just be able to retrieve these statistics at query time with a simple lookup (see [dictionaries](/dictionary)).
### Users and Badges {#users-and-badges}
@@ -237,7 +237,7 @@ ORDER BY c DESC LIMIT 5
└──────────┴─────┘
```
-Likewise, these links are not events which occur overly frequently:
+Likewise, these links aren't events which occur overly frequently:
```sql
SELECT
diff --git a/docs/data-modeling/projections/1_projections.md b/docs/data-modeling/projections/1_projections.md
index 353c3d5cd4c..4383a71ad9d 100644
--- a/docs/data-modeling/projections/1_projections.md
+++ b/docs/data-modeling/projections/1_projections.md
@@ -73,7 +73,7 @@ others indirectly via `_part_offset`.
## When to use Projections? {#when-to-use-projections}
-Projections are an appealing feature for new users as they are automatically
+Projections are an appealing feature for new users as they're automatically
maintained as data is inserted. Furthermore, queries can just be sent to a
single table where the projections are exploited where possible to speed up
the response time.
@@ -88,9 +88,9 @@ you should be aware of and thus should be deployed sparingly.
- Projections don't allow using different TTL for the source table and the
(hidden) target table, materialized views allow different TTLs.
-- Lightweight updates and deletes are not supported for tables with projections.
+- Lightweight updates and deletes aren't supported for tables with projections.
- Materialized Views can be chained: the target table of one materialized view
- can be the source table of another materialized view, and so on. This is not
+ can be the source table of another materialized view, and so on. This isn't
possible with projections.
- Projection definitions don't support joins, but Materialized Views do. However, queries on tables with projections can use joins freely.
- Projection definitions don't support filters (`WHERE` clause), but Materialized Views do. However, queries on tables with projections can filter freely.
@@ -113,7 +113,7 @@ We recommend using projections when:
In this example, we'll show you how to add a projection to a table.
We'll also look at how the projection can be used to speed up queries which filter
-on columns which are not in the primary key of a table.
+on columns which aren't in the primary key of a table.
For this example, we'll be using the New York Taxi Data
dataset available at [sql.clickhouse.com](https://sql.clickhouse.com/) which is ordered
@@ -131,7 +131,7 @@ FROM nyc_taxi.trips WHERE tip_amount > 200 AND trip_duration_min > 0
ORDER BY tip_amount, trip_id ASC
```
-Notice that because we are filtering on `tip_amount` which is not in the `ORDER BY`, ClickHouse
+Notice that because we're filtering on `tip_amount` which isn't in the `ORDER BY`, ClickHouse
had to do a full table scan. Let's speed this query up.
So as to preserve the original table and results, we'll create a new table and copy the data using an `INSERT INTO SELECT`:
@@ -590,7 +590,7 @@ INSERT INTO page_views VALUES (
:::note
Note: The table uses custom settings for illustration, such as one-row granules
-and disabled part merges, which are not recommended for production use.
+and disabled part merges, which aren't recommended for production use.
:::
This setup produces:
diff --git a/docs/data-modeling/projections/2_materialized-views-versus-projections.md b/docs/data-modeling/projections/2_materialized-views-versus-projections.md
index c5e53980285..292a22e2758 100644
--- a/docs/data-modeling/projections/2_materialized-views-versus-projections.md
+++ b/docs/data-modeling/projections/2_materialized-views-versus-projections.md
@@ -20,15 +20,15 @@ The table below summarizes the key differences between materialized views and pr
|----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data storage and location | Store their results in a **separate, explicit target table**, acting as insert triggers, on insert to a source table. | Projections create optimized data layouts that are physically **stored alongside the main table data** and are invisible to the user. |
| Update mechanism | Operate **synchronously** on `INSERT` to the source table (for incremental materialized views). Note: they can also be **scheduled** using refreshable materialized views. | **Asynchronous** updates in the background upon `INSERT` to the main table. |
-| Query interaction | Working with Materialized Views requires querying the **target table directly**, meaning that you need to be aware of the existence of materialized views when writing queries. | Projections are **automatically selected** by ClickHouse's query optimizer, and are transparent in the sense that the user does not have to modify their queries to the table with the projection in order to utilise it. From version 25.6 it is also possible to filter by more than one projection. |
-| Handling `UPDATE` / `DELETE` | **Do not automatically react** to `UPDATE` or `DELETE` operations on the source table as materialized views have no knowledge of the source table, acting only as insert triggers _to_ a source table. This can lead to potential data staleness between source and target tables and requires workarounds or periodic full refresh. (via refreshable materialized view). | By default, are **incompatible with `DELETED` rows** (especially lightweight deletes). `lightweight_mutation_projection_mode` (v24.7+) can enable compatibility. |
-| `JOIN` support | Yes. Refreshable materialized views can be used for complex denormalization. Incremental materialized views only trigger on left-most table inserts. | No. `JOIN` operations are not supported within projection definitions for filtering the materialized data. However, queries that join tables with projections work normally—projections optimize individual table access. |
-| `WHERE` clause in definition | Yes. `WHERE` clauses can be included to filter data before materialization. | No. `WHERE` clauses are not supported within projection definitions for filtering the materialized data. |
-| Chaining capabilities | Yes, the target table of one materialized view can be the source for another materialized view, enabling multi-stage pipelines. | No. Projections cannot be chained. |
+| Query interaction | Working with Materialized Views requires querying the **target table directly**, meaning that you need to be aware of the existence of materialized views when writing queries. | Projections are **automatically selected** by ClickHouse's query optimizer, and are transparent in the sense that the user doesn't have to modify their queries to the table with the projection in order to utilise it. From version 25.6 it's also possible to filter by more than one projection. |
+| Handling `UPDATE` / `DELETE` | **Don't automatically react** to `UPDATE` or `DELETE` operations on the source table as materialized views have no knowledge of the source table, acting only as insert triggers _on_ a source table. This can lead to potential data staleness between source and target tables and requires workarounds or a periodic full refresh (via refreshable materialized views). | By default, are **incompatible with `DELETED` rows** (especially lightweight deletes). `lightweight_mutation_projection_mode` (v24.7+) can enable compatibility. |
+| `JOIN` support | Yes. Refreshable materialized views can be used for complex denormalization. Incremental materialized views only trigger on left-most table inserts. | No. `JOIN` operations aren't supported within projection definitions for filtering the materialized data. However, queries that join tables with projections work normally—projections optimize individual table access. |
+| `WHERE` clause in definition | Yes. `WHERE` clauses can be included to filter data before materialization. | No. `WHERE` clauses aren't supported within projection definitions for filtering the materialized data. |
+| Chaining capabilities | Yes, the target table of one materialized view can be the source for another materialized view, enabling multi-stage pipelines. | No. Projections can't be chained. |
| Applicable table engines | Can be used with various source table engines, but target tables are usually of the `MergeTree` family. | **Only available** for `MergeTree` family table engines. |
| Failure handling | Failure during data insertion means that data is lost in the target table, leading to potential inconsistency. | Failures are handled **silently** in the background. Queries can seamlessly mix materialized and unmaterialized parts. |
| Operational overhead | Requires explicit target table creation and often manual backfilling. Managing consistency with `UPDATE`/`DELETE` increases complexity. | Projections are automatically maintained and kept in sync, and generally have a lower operational burden. |
-| `FINAL` query compatibility | Generally compatible, but often require `GROUP BY` on the target table. | **Do not work** with `FINAL` queries. |
+| `FINAL` query compatibility | Generally compatible, but often require `GROUP BY` on the target table. | **Don't work** with `FINAL` queries. |
| Lazy materialization | Yes. | Monitor for projection compatibility issues when using materialization features. You may need to set `query_plan_optimize_lazy_materialization = false` |
| Parallel replicas | Yes. | No. |
| [`optimize_read_in_order`](/operations/settings/settings#optimize_read_in_order) | Yes. | Yes. |
@@ -56,18 +56,18 @@ You should consider avoiding use of materialized views when:
You should consider using projections when:
-- **Optimizing queries for a single table**: Your primary goal is to speed up queries on a single base table by providing alternative sorting orders, optimizing filters on columns which are not part of the primary-key, or pre-computing aggregations for a single table.
+- **Optimizing queries for a single table**: Your primary goal is to speed up queries on a single base table by providing alternative sorting orders, optimizing filters on columns which aren't part of the primary-key, or pre-computing aggregations for a single table.
- You want **query transparency**: you want queries to target the original table without modification, relying on ClickHouse to pick the best data layout for a given query.
### When to avoid projections {#avoid-projections}
You should consider avoiding the use of projections when:
-- **Complex data transformation or multi-stage ETL are required**: Projection definitions do not support `JOIN` operations, cannot be chained to build multi-step pipelines, and cannot handle some SQL features like window functions or complex `CASE` statements. While queries on tables with projections can join freely, the projections themselves are not suited for complex data transformation.
-- **Explicit filtering of materialized data is needed**: Projections do not support `WHERE` clauses in their definition to filter the data that gets materialized into the projection itself.
+- **Complex data transformation or multi-stage ETL are required**: Projection definitions don't support `JOIN` operations, can't be chained to build multi-step pipelines, and can't handle some SQL features like window functions or complex `CASE` statements. While queries on tables with projections can join freely, the projections themselves aren't suited for complex data transformation.
+- **Explicit filtering of materialized data is needed**: Projections don't support `WHERE` clauses in their definition to filter the data that gets materialized into the projection itself.
- **Non-MergeTree table engines are used**: Projections are exclusively available for tables using the `MergeTree` family of engines.
-- `FINAL` queries are essential: Projections do not work with `FINAL` queries, which are sometimes used for deduplication.
-- You need [parallel replicas](/deployment-guides/parallel-replicas) as they are not supported with projections.
+- `FINAL` queries are essential: Projections don't work with `FINAL` queries, which are sometimes used for deduplication.
+- You need [parallel replicas](/deployment-guides/parallel-replicas) as they're not supported with projections.
## Summary {#summary}
@@ -81,11 +81,11 @@ access patterns.
As a general rule of thumb, you should consider using materialized views when
you need to aggregate data from one or more source tables into a target table or
perform complex transformations at scale. Materialized views are excellent for shifting
-the work of expensive aggregations from query time to insert time. They are a
+the work of expensive aggregations from query time to insert time. They're a
great choice for daily or monthly rollups, real-time dashboards or data summaries.
On the other hand, you should use projections when you need to optimize queries
which filter on different columns than those which are used in the table's primary
-key which determines the physical ordering of the data on disk. They are particularly
+key which determines the physical ordering of the data on disk. They're particularly
useful when it's no longer possible to change the primary key of a table, or when
your access patterns are more diverse than what the primary key can accommodate.
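To make the contrast concrete, here's a minimal sketch of the two approaches (the `events` table, its columns, and the `daily_events` target table are hypothetical, not from the guide):

```sql
-- Projection: stored alongside the main table and picked automatically
-- by the query optimizer; queries keep targeting `events` directly.
ALTER TABLE events ADD PROJECTION events_by_user
(
    SELECT * ORDER BY user_id
);
ALTER TABLE events MATERIALIZE PROJECTION events_by_user;

-- Materialized view: an explicit target table that must be queried directly.
CREATE MATERIALIZED VIEW daily_events_mv TO daily_events AS
SELECT toDate(ts) AS day, count() AS c
FROM events
GROUP BY day;
```

Note that `MATERIALIZE PROJECTION` backfills existing parts; new inserts maintain the projection automatically, whereas the materialized view only fires on inserts and needs a manual backfill for pre-existing data.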
diff --git a/docs/data-modeling/schema-design.md b/docs/data-modeling/schema-design.md
index 51bab619aec..58a6180e820 100644
--- a/docs/data-modeling/schema-design.md
+++ b/docs/data-modeling/schema-design.md
@@ -16,7 +16,7 @@ Understanding effective schema design is key to optimizing ClickHouse performanc
For the examples in this guide, we use a subset of the Stack Overflow dataset. This contains every post, vote, user, comment and badge that has occurred on Stack Overflow from 2008 to Apr 2024. This data is available in Parquet using the schemas below under the S3 bucket `s3://datasets-documentation/stackoverflow/parquet/`:
-> The primary keys and relationships indicated are not enforced through constraints (Parquet is file not table format) and purely indicate how the data is related and the unique keys it possesses.
+> The primary keys and relationships indicated aren't enforced through constraints (Parquet is a file format, not a table format) and purely indicate how the data is related and the unique keys it possesses.
@@ -144,7 +144,7 @@ Use the minimal precision for numeric types - ClickHouse has a number of numeric
- **Minimal precision for date types** - ClickHouse supports a number of date and datetime types. Date and Date32 can be used for storing pure dates, with the latter supporting a larger date range at the expense of more bits. DateTime and DateTime64 provide support for date times. DateTime is limited to second granularity and uses 32 bits. DateTime64, as the name suggests, uses 64 bits but provides support up to nanosecond granularity. As ever, choose the more coarse version acceptable for queries, minimizing the number of bits needed.
- **Use LowCardinality** - Numbers, strings, Date or DateTime columns with a low number of unique values can potentially be encoded using the LowCardinality type. This dictionary encodes values, reducing the size on disk. Consider this for columns with less than 10k unique values.
- **FixedString for special cases** - Strings which have a fixed length can be encoded with the FixedString type e.g. language and currency codes. This is efficient when data has the length of precisely N bytes. In all other cases, it is likely to reduce efficiency and LowCardinality is preferred.
-- **Enums for data validation** - The Enum type can be used to efficiently encode enumerated types. Enums can either be 8 or 16 bits, depending on the number of unique values they are required to store. Consider using this if you need either the associated validation at insert time (undeclared values will be rejected) or wish to perform queries which exploit a natural ordering in the Enum values e.g. imagine a feedback column containing user responses `Enum(':(' = 1, ':|' = 2, ':)' = 3)`.
+- **Enums for data validation** - The Enum type can be used to efficiently encode enumerated types. Enums can either be 8 or 16 bits, depending on the number of unique values they're required to store. Consider using this if you need either the associated validation at insert time (undeclared values will be rejected) or wish to perform queries which exploit a natural ordering in the Enum values e.g. imagine a feedback column containing user responses `Enum(':(' = 1, ':|' = 2, ':)' = 3)`.
> Tip: To find the range of all columns, and the number of distinct values, you can use the simple query `SELECT * APPLY min, * APPLY max, * APPLY uniq FROM table FORMAT Vertical`. We recommend performing this over a smaller subset of the data as this can be expensive. This query requires numerics to be at least defined as such for an accurate result i.e. not a String.
@@ -154,7 +154,7 @@ By applying these simple rules to our posts table, we can identify an optimal ty
|------------------------|------------|------------------------------------------------------------------------|----------------|--------|----------------------------------------------------------------------------------------------|------------------------------------------|
| `PostTypeId` | Yes | 1, 8 | 8 | No | | `Enum('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8)` |
| `AcceptedAnswerId` | Yes | 0, 78285170 | 12282094 | Yes | Differentiate Null with 0 value | UInt32 |
-| `CreationDate` | No | 2008-07-31 21:42:52.667000000, 2024-03-31 23:59:17.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
+| `CreationDate` | No | 2008-07-31 21:42:52.667000000, 2024-03-31 23:59:17.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
| `Score` | Yes | -217, 34970 | 3236 | No | | Int32 |
| `ViewCount` | Yes | 2, 13962748 | 170867 | No | | UInt32 |
| `Body` | No | - | - | No | | String |
@@ -162,8 +162,8 @@ By applying these simple rules to our posts table, we can identify an optimal ty
| `OwnerDisplayName` | No | - | 181251 | Yes | Consider Null to be empty string | String |
| `LastEditorUserId` | Yes | -1, 9999993 | 1104694 | Yes | 0 is an unused value and can be used for Nulls | Int32 |
| `LastEditorDisplayName` | No | - | 70952 | Yes | Consider Null to be an empty string. Tested LowCardinality and no benefit | String |
-| `LastEditDate` | No | 2008-08-01 13:24:35.051000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
-| `LastActivityDate` | No | 2008-08-01 12:19:17.417000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity is not required, use DateTime | DateTime |
+| `LastEditDate` | No | 2008-08-01 13:24:35.051000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
+| `LastActivityDate` | No | 2008-08-01 12:19:17.417000000, 2024-04-06 21:01:22.697000000 | - | No | Millisecond granularity isn't required, use DateTime | DateTime |
| `Title` | No | - | - | No | Consider Null to be an empty string | String |
| `Tags` | No | - | - | No | Consider Null to be an empty string | String |
| `AnswerCount` | Yes | 0, 518 | 216 | No | Consider Null and 0 to be the same | UInt16 |
@@ -171,8 +171,8 @@ By applying these simple rules to our posts table, we can identify an optimal ty
| `FavoriteCount` | Yes | 0, 225 | 6 | Yes | Consider Null and 0 to be the same | UInt8 |
| `ContentLicense` | No | - | 3 | No | LowCardinality outperforms FixedString | LowCardinality(String) |
| `ParentId` | No | - | 20696028 | Yes | Consider Null to be an empty string | String |
-| `CommunityOwnedDate` | No | 2008-08-12 04:59:35.017000000, 2024-04-01 05:36:41.380000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity is not required, use DateTime | DateTime |
-| `ClosedDate` | No | 2008-09-04 20:56:44, 2024-04-06 18:49:25.393000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity is not required, use DateTime | DateTime |
+| `CommunityOwnedDate` | No | 2008-08-12 04:59:35.017000000, 2024-04-01 05:36:41.380000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity isn't required, use DateTime | DateTime |
+| `ClosedDate` | No | 2008-09-04 20:56:44, 2024-04-06 18:49:25.393000000 | - | Yes | Consider default 1970-01-01 for Nulls. Millisecond granularity isn't required, use DateTime | DateTime |
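The type choices in the table above translate into DDL along these lines (an abbreviated sketch of a few columns only; the ordering key shown is an assumption, not the guide's final schema):

```sql
CREATE TABLE posts
(
    `PostTypeId` Enum('Question' = 1, 'Answer' = 2, 'Wiki' = 3, 'TagWikiExcerpt' = 4, 'TagWiki' = 5, 'ModeratorNomination' = 6, 'WikiPlaceholder' = 7, 'PrivilegeWiki' = 8),
    `AcceptedAnswerId` UInt32,
    `CreationDate` DateTime,          -- second granularity is sufficient
    `Score` Int32,
    `ViewCount` UInt32,
    `ContentLicense` LowCardinality(String),
    `ClosedDate` DateTime DEFAULT 0   -- 1970-01-01 stands in for Null
    -- remaining columns follow the same mapping
)
ENGINE = MergeTree
ORDER BY (PostTypeId, CreationDate)
```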
@@ -229,7 +229,7 @@ At the scale at which ClickHouse is often used, memory and disk efficiency are p
The selected key in ClickHouse will determine not only the index, but also order in which data is written on disk. Because of this, it can dramatically impact compression levels which can in turn affect query performance. An ordering key which causes the values of most columns to be written in contiguous order will allow the selected compression algorithm (and codecs) to compress the data more effectively.
-> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they are included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
+> All columns in a table will be sorted based on the value of the specified ordering key, regardless of whether they're included in the key itself. For instance, if `CreationDate` is used as the key, the order of values in all other columns will correspond to the order of values in the `CreationDate` column. Multiple ordering keys can be specified - this will order with the same semantics as an `ORDER BY` clause in a `SELECT` query.
Some simple rules can be applied to help choose an ordering key. The following can sometimes be in conflict, so consider these in order. You can identify a number of keys from this process, with 4-5 typically sufficient:
@@ -335,9 +335,9 @@ In the other guides listed below, we will explore a number of techniques to rest
> Through this section, we use optimized variants of our other tables. While we provide the schemas for these, for the sake of brevity we omit the decisions made. These are based on the rules described earlier and we leave inferring the decisions to the reader.
-The following approaches all aim to minimize the need to use JOINs to optimize reads and improve query performance. While JOINs are fully supported in ClickHouse, we recommend they are used sparingly (2 to 3 tables in a JOIN query is fine) to achieve optimal performance.
+The following approaches all aim to minimize the need to use JOINs to optimize reads and improve query performance. While JOINs are fully supported in ClickHouse, we recommend using them sparingly (2 to 3 tables in a JOIN query is fine) to achieve optimal performance.
-> ClickHouse has no notion of foreign keys. This does not prohibit joins but means referential integrity is left to the user to manage at an application level. In OLAP systems like ClickHouse, data integrity is often managed at the application level or during the data ingestion process rather than being enforced by the database itself where it incurs a significant overhead. This approach allows for more flexibility and faster data insertion. This aligns with ClickHouse's focus on speed and scalability of read and insert queries with very large datasets.
+> ClickHouse has no notion of foreign keys. This doesn't prohibit joins but means referential integrity is left to the user to manage at an application level. In OLAP systems like ClickHouse, data integrity is often managed at the application level or during the data ingestion process rather than being enforced by the database itself where it incurs a significant overhead. This approach allows for more flexibility and faster data insertion. This aligns with ClickHouse's focus on speed and scalability of read and insert queries with very large datasets.
In order to minimize the use of JOINs at query time, users have several tools/approaches:
diff --git a/docs/deployment-guides/parallel-replicas.mdx b/docs/deployment-guides/parallel-replicas.mdx
index fb1a1a8165b..7415de50967 100644
--- a/docs/deployment-guides/parallel-replicas.mdx
+++ b/docs/deployment-guides/parallel-replicas.mdx
@@ -93,7 +93,7 @@ execute a query on a subset of the data. How does it work when there is no shard
To parallelize query execution through multiple servers, we first need to be
able to assign one of our servers as a coordinator. The coordinator is the one
-that creates the list of tasks that need to be executed, ensures they are all
+that creates the list of tasks that need to be executed, ensures they're all
executed and aggregated, and that the result is returned to the client. Like
in most distributed systems, this will be the role of the node that receives the
initial query. We also need to define the unit of work. In a sharded architecture,
@@ -122,7 +122,7 @@ With parallel replicas:
Each set of granules gets processed by the corresponding replicas and a
- mergeable state is sent to the coordinator when they are finished.
+ mergeable state is sent to the coordinator when they're finished.
Finally, the coordinator merges all the results from the replicas and
@@ -176,9 +176,9 @@ announcement. Let's visualize how it works using the figure below:
The coordinating node then uses the announcements to define a set of
granules that can be assigned to the different replicas. Here for example,
we can see that no granules from part 3 have been assigned to replica 2
- because this replica did not provide this part in its announcement.
+ because this replica didn't provide this part in its announcement.
Also note that no tasks were assigned to replica 3 because the
- replica did not provide an announcement.
+ replica didn't provide an announcement.
After each replica has processed the query on their subset of granules
@@ -190,12 +190,12 @@ announcement. Let's visualize how it works using the figure below:
### Dynamic coordination \{#dynamic-coordination\}
To address the issue of tail latency, we added dynamic coordination. This means
-that all the granules are not sent to a replica in one request, but each replica
+that not all the granules are sent to a replica in one request; instead, each replica
will be able to request a new task (a set of granules to be processed) from the
coordinator. The coordinator will give the replica the set of granules based on
the announcement received.
-Let's assume that we are at the stage in the process where all replicas have sent
+Let's assume that we're at the stage in the process where all replicas have sent
an announcement with all parts.
The figure below visualizes how dynamic coordination works:
@@ -204,7 +204,7 @@ The figure below visualizes how dynamic coordination works:
- Replicas let the coordinator node know that they are able to process
+ Replicas let the coordinator node know that they're able to process
tasks, they can also specify how much work they can process.
@@ -300,7 +300,7 @@ This feature has known limitations, of which the major ones are documented in
this section.
:::note
-If you find an issue which is not one of the limitations given below, and
+If you find an issue which isn't one of the limitations given below, and
suspect parallel replicas to be the cause, please open an issue on GitHub using
the label `comp-parallel-replicas`.
:::
@@ -308,9 +308,9 @@ the label `comp-parallel-replicas`.
| Limitation | Description |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Complex queries | Currently parallel replicas work fairly well for simple queries. Complexity layers like CTEs, subqueries, JOINs, non-flat queries, etc. can have a negative impact on query performance. |
-| Small queries | If you are executing a query that does not process a lot of rows, executing it on multiple replicas might not yield a better performance time given that the network time for the coordination between replicas can lead to additional cycles in the query execution. You can limit these issues by using the setting: [`parallel_replicas_min_number_of_rows_per_replica`](/docs/operations/settings/settings#parallel_replicas_min_number_of_rows_per_replica). |
+| Small queries | If you're executing a query that doesn't process a lot of rows, executing it on multiple replicas might not yield better performance, given that the network time for the coordination between replicas can lead to additional cycles in the query execution. You can limit these issues by using the setting: [`parallel_replicas_min_number_of_rows_per_replica`](/docs/operations/settings/settings#parallel_replicas_min_number_of_rows_per_replica). |
| Parallel replicas are disabled with FINAL | |
-| Projections are not used together with Parallel replicas | |
+| Projections aren't used together with Parallel replicas | |
| High Cardinality data and complex aggregation | High cardinality aggregation that needs to send much data can significantly slow down your queries. |
| Compatibility with the new analyzer | The new analyzer might significantly slow down or speed up query execution in specific scenarios. |
@@ -319,7 +319,7 @@ the label `comp-parallel-replicas`.
| Setting | Description |
|----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `enable_parallel_replicas` | `0`: disabled. `1`: enabled. `2`: forces the usage of parallel replicas; an exception is thrown if they aren't used. |
-| `cluster_for_parallel_replicas` | The cluster name to use for parallel replication; if you are using ClickHouse Cloud, use `default`. |
+| `cluster_for_parallel_replicas` | The cluster name to use for parallel replication; if you're using ClickHouse Cloud, use `default`. |
| `max_parallel_replicas` | Maximum number of replicas to use for the query execution on multiple replicas, if a number lower than the number of replicas in the cluster is specified, nodes will be selected randomly. This value can also be overcommitted to account for horizontal scaling. |
| `parallel_replicas_min_number_of_rows_per_replica` | Helps limit the number of replicas used based on the number of rows that need to be processed. The number of replicas used is defined by: `estimated rows to read` / `min_number_of_rows_per_replica`. |
| `enable_analyzer` | Query execution with parallel replicas is supported only with the analyzer enabled. |
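Putting these settings together, enabling parallel replicas for a session might look like the following sketch (this assumes a ClickHouse Cloud service, hence the `default` cluster; the replica count and row threshold are illustrative values):

```sql
SET enable_parallel_replicas = 1,
    cluster_for_parallel_replicas = 'default',
    max_parallel_replicas = 3,
    parallel_replicas_min_number_of_rows_per_replica = 100000;
```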
@@ -331,7 +331,7 @@ You can check what settings are being used for each query in the
also look at the [`system.events`](/docs/operations/system-tables/events)
table to see all the events that have occurred on the server, and you can use the
[`clusterAllReplicas`](/docs/sql-reference/table-functions/cluster) table function to see the tables on all the replicas
-(if you are a cloud user, use `default`).
+(if you're a cloud user, use `default`).
```sql title="Query"
SELECT
diff --git a/docs/deployment-guides/replication-sharding-examples/01_1_shard_2_replicas.md b/docs/deployment-guides/replication-sharding-examples/01_1_shard_2_replicas.md
index 9007a1a2878..75385b788ee 100644
--- a/docs/deployment-guides/replication-sharding-examples/01_1_shard_2_replicas.md
+++ b/docs/deployment-guides/replication-sharding-examples/01_1_shard_2_replicas.md
@@ -37,7 +37,7 @@ The architecture of the cluster you will be setting up is shown below:
## Prerequisites {#pre-requisites}
- You've set up a [local ClickHouse server](/install) before
-- You are familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
+- You're familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
- You have Docker installed on your machine
@@ -282,7 +282,7 @@ are allowed, but can also be turned off with setting `allow_distributed_ddl_quer
#### Keeper configuration {#keeper-config-explanation}
The `<zookeeper>` section tells ClickHouse where ClickHouse Keeper (or ZooKeeper) is running.
-As we are using a ClickHouse Keeper cluster, each `` of the cluster needs to be specified,
+As we're using a ClickHouse Keeper cluster, each `<node>` of the cluster needs to be specified,
along with its hostname and port number using the `<host>` and `<port>` tags respectively.
Set up of ClickHouse Keeper is explained in the next step of the tutorial.
diff --git a/docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md b/docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md
index 03fae707423..a4fe7684844 100644
--- a/docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md
+++ b/docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md
@@ -36,7 +36,7 @@ The architecture of the cluster you will be setting up is shown below:
## Prerequisites {#pre-requisites}
- You've set up a [local ClickHouse server](/install) before
-- You are familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
+- You're familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
- You have Docker installed on your machine
@@ -301,7 +301,7 @@ are allowed, but can also be turned off with setting `allow_distributed_ddl_quer
#### Keeper configuration {#keeper-config-explanation}
The `<zookeeper>` section tells ClickHouse where ClickHouse Keeper (or ZooKeeper) is running.
-As we are using a ClickHouse Keeper cluster, each `` of the cluster needs to be specified,
+As we're using a ClickHouse Keeper cluster, each `<node>` of the cluster needs to be specified,
along with its hostname and port number using the `<host>` and `<port>` tags respectively.
Set up of ClickHouse Keeper is explained in the next step of the tutorial.
@@ -813,7 +813,7 @@ Code: 198. DB::NetException: Not found address of host: clickhouse-01: (clickhou
: While executing Remote. (ALL_CONNECTION_TRIES_FAILED)
```
-Unfortunately, our cluster is not fault-tolerant. If one of the hosts fails, the
+Unfortunately, our cluster isn't fault-tolerant. If one of the hosts fails, the
cluster is considered unhealthy and the query fails, unlike with the replicated
table we saw in the [previous example](/architecture/replication), for which
we were able to insert data even when one of the hosts failed.
diff --git a/docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md b/docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md
index 8d268eda275..4ed8d24e528 100644
--- a/docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md
+++ b/docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md
@@ -33,7 +33,7 @@ The architecture of the cluster you will be setting up is shown below:
## Prerequisites {#prerequisites}
- You've set up a [local ClickHouse server](/install) before
-- You are familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
+- You're familiar with basic configuration concepts of ClickHouse such as [configuration files](/operations/configuration-files)
- You have Docker installed on your machine
@@ -333,7 +333,7 @@ across the cluster using the `ON CLUSTER` clause.
#### Keeper configuration {#keeper-config-explanation}
The `<zookeeper>` section tells ClickHouse where ClickHouse Keeper (or ZooKeeper) is running.
-As we are using a ClickHouse Keeper cluster, each `` of the cluster needs to be specified,
+As we're using a ClickHouse Keeper cluster, each `<node>` of the cluster needs to be specified,
along with its hostname and port number using the `<host>` and `<port>` tags respectively.
Set up of ClickHouse Keeper is explained in the next step of the tutorial.
@@ -661,7 +661,7 @@ SHOW TABLES IN uk;
## Insert data into a distributed table {#inserting-data-using-distributed}
-To insert data into the table, `ON CLUSTER` cannot be used as it does
+To insert data into the table, `ON CLUSTER` can't be used as it does
not apply to DML (Data Manipulation Language) queries such as `INSERT`, `UPDATE`,
and `DELETE`. To insert data, you need to use the
[`Distributed`](/engines/table-engines/special/distributed) table engine.
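As a hedged sketch (the cluster, database, and table names below are assumptions, not taken from this guide), inserting through a `Distributed` table looks roughly like:

```sql
-- Sketch only: 'my_cluster', 'db', and the table names are illustrative.
-- The Distributed table routes inserts to the shard-local tables.
CREATE TABLE db.events_distributed AS db.events_local
ENGINE = Distributed('my_cluster', 'db', 'events_local', rand());

-- Inserts go through the Distributed table, not via ON CLUSTER:
INSERT INTO db.events_distributed (id, message) VALUES (1, 'hello');
```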
diff --git a/docs/deployment-guides/replication-sharding-examples/_snippets/_server_parameter_table.mdx b/docs/deployment-guides/replication-sharding-examples/_snippets/_server_parameter_table.mdx
index 4586778b803..3bcc25b09d5 100644
--- a/docs/deployment-guides/replication-sharding-examples/_snippets/_server_parameter_table.mdx
+++ b/docs/deployment-guides/replication-sharding-examples/_snippets/_server_parameter_table.mdx
@@ -2,5 +2,5 @@ For each server, the following parameters are specified:
| Parameter | Description | Default Value |
|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
-| `host` | The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server does not start. If you change the DNS record, you need to restart the server. | - |
+| `host` | The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server doesn't start. If you change the DNS record, you need to restart the server. | - |
| `port` | The TCP port for messenger activity (`tcp_port` in the config, usually set to 9000). Not to be confused with `http_port`. | - |
diff --git a/docs/deployment-guides/terminology.md b/docs/deployment-guides/terminology.md
index 0cfe65185af..b11d1e9ed31 100644
--- a/docs/deployment-guides/terminology.md
+++ b/docs/deployment-guides/terminology.md
@@ -16,7 +16,7 @@ we recommend that you try them and then adjust them to suit your needs. You may
an example here that fits your requirements exactly.
We offer 'recipes' of a number of different topologies in the [example repo](https://github.com/ClickHouse/examples/tree/main/docker-compose-recipes/recipes)
-and recommend taking a look at them if the examples in this section do not fit your
+and recommend taking a look at them if the examples in this section don't fit your
needs exactly.
diff --git a/docs/dictionary/index.md b/docs/dictionary/index.md
index 823a979fcda..36b0a108e32 100644
--- a/docs/dictionary/index.md
+++ b/docs/dictionary/index.md
@@ -245,7 +245,7 @@ Peak memory usage: 248.84 MiB.
## Index time enrichment {#index-time-enrichment}
-In the above example, we used a dictionary at query time to remove a join. Dictionaries can also be used to enrich rows at insert time. This is typically appropriate if the enrichment value does not change and exists in an external source which can be used to populate the dictionary. In this case, enriching the row at insert time avoids the query time lookup to the dictionary.
+In the above example, we used a dictionary at query time to remove a join. Dictionaries can also be used to enrich rows at insert time. This is typically appropriate if the enrichment value doesn't change and exists in an external source which can be used to populate the dictionary. In this case, enriching the row at insert time avoids the query time lookup to the dictionary.
Let's suppose that the `Location` of a user in Stack Overflow never changes (in reality it can) - specifically the `Location` column of the `users` table. Suppose we want to run an analytics query on the posts table by location. The posts table contains a `UserId`.
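One way to sketch insert-time enrichment (the dictionary and column names below are illustrative assumptions) is a `MATERIALIZED` column that calls `dictGet` when rows are inserted:

```sql
-- Sketch: assumes a dictionary 'users_dict' mapping a user id to a location.
CREATE TABLE posts_enriched
(
    `Id` UInt32,
    `UserId` UInt32,
    -- Looked up once at insert time, so queries avoid the dictionary lookup:
    `Location` String MATERIALIZED dictGet('users_dict', 'Location', UserId)
)
ENGINE = MergeTree
ORDER BY Id;
```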
diff --git a/docs/faq/general/dependencies.md b/docs/faq/general/dependencies.md
index acbfafd5e7a..bbb67e497eb 100644
--- a/docs/faq/general/dependencies.md
+++ b/docs/faq/general/dependencies.md
@@ -10,7 +10,7 @@ keywords: ['dependencies', '3rd-party']
# What are the 3rd-party dependencies for running ClickHouse?
-ClickHouse does not have any runtime dependencies. It is distributed as a single binary application which is fully self-contained. This application provides all the functionality of the cluster, serves queries, acts as a worker node in the cluster, as a coordination system providing the RAFT consensus algorithm, as a client or a local query engine.
+ClickHouse doesn't have any runtime dependencies. It is distributed as a single binary application which is fully self-contained. This application provides all the functionality of the cluster: it serves queries, acts as a worker node in the cluster, as a coordination system providing the Raft consensus algorithm, and as a client or a local query engine.
This unique architecture choice differentiates it from other systems, which often have dedicated frontend, backend, or aggregation nodes, and makes deployment, cluster management, and monitoring easier.
diff --git a/docs/faq/general/distributed-join.md b/docs/faq/general/distributed-join.md
index ac3065546b0..c0b180c4ac1 100644
--- a/docs/faq/general/distributed-join.md
+++ b/docs/faq/general/distributed-join.md
@@ -14,6 +14,6 @@ ClickHouse supports distributed JOIN on a cluster.
When the data is co-located on the cluster (e.g., the JOIN is performed by the user identifier, which is also a sharding key), ClickHouse provides a way to perform the JOIN without data movement on the network.
-When the data is not co-located, ClickHouse allows a broadcast JOIN, when parts of the joined data are distributed across the nodes of the cluster.
+When the data isn't co-located, ClickHouse allows a broadcast JOIN, in which parts of the joined data are distributed across the nodes of the cluster.
-As of 2025, ClickHouse does not perform the shuffle-join algorithm, which means redistribution of the both sides of the join over network across the cluster according to the join keys.
+As of 2025, ClickHouse doesn't perform a shuffle-join algorithm, which would mean redistributing both sides of the join over the network across the cluster according to the join keys.
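For illustration (the table names below are assumptions, not from this FAQ), a broadcast-style join can be requested explicitly with `GLOBAL JOIN`, which evaluates the right-hand side once and sends it to every node holding a shard of the left-hand side:

```sql
-- Sketch: 'events_dist' and 'users_dist' are illustrative Distributed tables.
-- GLOBAL JOIN broadcasts the right-hand table to all participating nodes.
SELECT e.user_id, u.country, count() AS hits
FROM events_dist AS e
GLOBAL JOIN users_dist AS u ON e.user_id = u.user_id
GROUP BY e.user_id, u.country;
```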
diff --git a/docs/faq/general/ne-tormozit.md b/docs/faq/general/ne-tormozit.md
index dc1336c1b45..40b11755ef5 100644
--- a/docs/faq/general/ne-tormozit.md
+++ b/docs/faq/general/ne-tormozit.md
@@ -20,8 +20,8 @@ We decided to keep the slogan even on t-shirts produced for international events
So, what does it mean? Here are some ways to translate *"не тормозит"*:
-- If you translate it literally, it sounds something like *"ClickHouse does not press the brake pedal"*.
-- Shorter, but less precise translations might be *"ClickHouse is not slow"*, *"ClickHouse does not lag"* or just *"ClickHouse is fast"*.
+- If you translate it literally, it sounds something like *"ClickHouse doesn't press the brake pedal"*.
+- Shorter, but less precise translations might be *"ClickHouse isn't slow"*, *"ClickHouse doesn't lag"* or just *"ClickHouse is fast"*.
If you haven't seen one of those t-shirts in person, you can check them out online in many ClickHouse-related videos. For example, this one:
@@ -29,4 +29,4 @@ If you haven't seen one of those t-shirts in person, you can check them out onli
-_P.S. These t-shirts are not for sale_, they were given away for free at some [ClickHouse Meetups](https://www.meetup.com/pro/clickhouse/), usually as a gift for best questions or other forms of active participation. Now, these t-shirts are no longer produced, and they have become highly valued collector's items.
+_P.S. These t-shirts aren't for sale_, they were given away for free at some [ClickHouse Meetups](https://www.meetup.com/pro/clickhouse/), usually as a gift for best questions or other forms of active participation. Now, these t-shirts are no longer produced, and they have become highly valued collector's items.
diff --git a/docs/faq/general/olap.md b/docs/faq/general/olap.md
index bbb939b7182..908b11d3a86 100644
--- a/docs/faq/general/olap.md
+++ b/docs/faq/general/olap.md
@@ -33,9 +33,9 @@ ClickHouse is an OLAP database management system that is pretty often used as a
All database management systems can be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but doing so infrequently, while the latter usually handles a continuous stream of transactions, constantly modifying the current state of data.
-In practice OLAP and OLTP are not categories, it's more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems integrated, which might be not so big deal but having more systems make it more expensive to maintain. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**) when both kinds of the workload are handled equally well by a single database management system.
+In practice, OLAP and OLTP aren't strict categories but more of a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems integrated with each other, which might not be such a big deal, but having more systems makes maintenance more expensive. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**), where both kinds of workload are handled equally well by a single database management system.
-Even if a DBMS started as a pure OLAP or pure OLTP, they are forced to move towards that HTAP direction to keep up with their competition. And ClickHouse is no exception, initially, it has been designed as [fast-as-possible OLAP system](../../concepts/why-clickhouse-is-so-fast.mdx) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
+Even if a DBMS started as a pure OLAP or pure OLTP system, it's forced to move towards the HTAP direction to keep up with the competition. And ClickHouse is no exception: initially, it was designed as a [fast-as-possible OLAP system](../../concepts/why-clickhouse-is-so-fast.mdx), and it still doesn't have full-fledged transaction support, but some features like consistent reads/writes and mutations for updating/deleting data had to be added.
The fundamental trade-off between OLAP and OLTP systems remains:
diff --git a/docs/faq/general/who-is-using-clickhouse.md b/docs/faq/general/who-is-using-clickhouse.md
index 142693a6e94..6a9905de691 100644
--- a/docs/faq/general/who-is-using-clickhouse.md
+++ b/docs/faq/general/who-is-using-clickhouse.md
@@ -10,9 +10,9 @@ doc_type: 'reference'
# Who is using ClickHouse? {#who-is-using-clickhouse}
-Being an open-source product makes this question not so straightforward to answer. You do not have to tell anyone if you want to start using ClickHouse, you just go grab source code or pre-compiled packages. There's no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
+Being an open-source product makes this question not so straightforward to answer. You don't have to tell anyone if you want to start using ClickHouse; you just grab the source code or pre-compiled packages. There's no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
-Also, the technology stack is often in a grey zone of what's covered by an NDA. Some companies consider technologies they use as a competitive advantage even if they are open-source and do not allow employees to share any details publicly. Some see some PR risks and allow employees to share implementation details only with their PR department approval.
+Also, the technology stack is often in a grey zone of what's covered by an NDA. Some companies consider the technologies they use a competitive advantage even if they're open-source and don't allow employees to share any details publicly. Others see PR risks and allow employees to share implementation details only with their PR department's approval.
So how can you tell who is using ClickHouse?
diff --git a/docs/faq/index.md b/docs/faq/index.md
index ae14438cc44..3aad7f8c1e1 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -11,7 +11,7 @@ keywords: ['FAQ', 'questions', 'answers']
| Page | Description |
|---------------------------------------------------------------|----------------------------------------------------------------------------------------|
| [General Questions about ClickHouse](general/index.md) | General questions we get about ClickHouse. |
-| [Why not use something like MapReduce?](general/mapreduce.md) | Explainer on why MapReduce implementations are not appropriate for the OLAP scenario. |
+| [Why not use something like MapReduce?](general/mapreduce.md) | Explainer on why MapReduce implementations aren't appropriate for the OLAP scenario. |
| [What does "не тормозит" mean](general/ne-tormozit.md) | Explainer on what "не тормозит" means, which you may have seen on ClickHouse t-shirts. |
| [What is OLAP](general/olap.md) | Explainer on what Online Analytical Processing is. |
| [Who is using ClickHouse](general/who-is-using-clickhouse.md) | Learn about who is using ClickHouse. |
diff --git a/docs/faq/operations/delete-old-data.md b/docs/faq/operations/delete-old-data.md
index b597ee9f52a..1db4a485b5b 100644
--- a/docs/faq/operations/delete-old-data.md
+++ b/docs/faq/operations/delete-old-data.md
@@ -16,7 +16,7 @@ The short answer is “yes”. ClickHouse has multiple mechanisms that allow fre
ClickHouse allows you to automatically drop values when some condition is met. This condition is configured as an expression based on any columns, usually just a static offset for any timestamp column.
-The key advantage of this approach is that it does not need any external system to trigger, once TTL is configured, data removal happens automatically in background.
+The key advantage of this approach is that it doesn't need any external system to trigger it: once TTL is configured, data removal happens automatically in the background.
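As a minimal sketch (the table and column names are illustrative), a TTL clause that drops rows 30 days after their timestamp looks like:

```sql
-- Sketch: rows older than 30 days are removed automatically in the background.
CREATE TABLE events
(
    `event_time` DateTime,
    `payload` String
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY;
```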
:::note
TTL can also be used to move data not only to [/dev/null](https://en.wikipedia.org/wiki/Null_device), but also between different storage systems, like from SSD to HDD.
diff --git a/docs/faq/operations/production.md b/docs/faq/operations/production.md
index 60f86cecc20..268066d51eb 100644
--- a/docs/faq/operations/production.md
+++ b/docs/faq/operations/production.md
@@ -12,7 +12,7 @@ keywords: ['production', 'deployment', 'versions', 'best practices', 'upgrade st
First of all, let's discuss why people ask this question in the first place. There are two key reasons:
-1. ClickHouse is developed with pretty high velocity, and usually there are 10+ stable releases per year. That makes a wide range of releases to choose from, which is not so trivial of a choice.
+1. ClickHouse is developed with pretty high velocity, and usually there are 10+ stable releases per year. That makes for a wide range of releases to choose from, which isn't so trivial a choice.
2. Some users want to avoid spending time figuring out which version works best for their use case and just follow someone else's advice.
The second reason is more fundamental, so we'll start with that one and then get back to navigating through various ClickHouse releases.
@@ -29,19 +29,19 @@ Here are some key points to get reasonable fidelity in a pre-production environm
- Don't make it read-only with some frozen data.
- Don't make it write-only with just copying data without building some typical reports.
- Don't wipe it clean instead of applying schema migrations.
-- Use a sample of real production data and queries. Try to choose a sample that's still representative and makes `SELECT` queries return reasonable results. Use obfuscation if your data is sensitive and internal policies do not allow it to leave the production environment.
+- Use a sample of real production data and queries. Try to choose a sample that's still representative and makes `SELECT` queries return reasonable results. Use obfuscation if your data is sensitive and internal policies don't allow it to leave the production environment.
- Make sure that pre-production is covered by your monitoring and alerting software the same way as your production environment is.
- If your production spans across multiple datacenters or regions, make your pre-production do the same.
-- If your production uses complex features like replication, distributed tables and cascading materialized views, make sure they are configured similarly in pre-production.
+- If your production uses complex features like replication, distributed tables and cascading materialized views, make sure they're configured similarly in pre-production.
- There's a trade-off between using roughly the same number of servers or VMs in pre-production as in production but of smaller size, or far fewer of them but of the same size. The first option might catch extra network-related issues, while the latter is easier to manage.
The second area to invest in is **automated testing infrastructure**. Don't assume that if some kind of query has executed successfully once, it'll continue to do so forever. It's OK to have some unit tests where ClickHouse is mocked, but make sure your product has a reasonable set of automated tests that are run against real ClickHouse and check that all important use cases are still working as expected.
-An extra step forward could be contributing those automated tests to [ClickHouse's open-source test infrastructure](https://github.com/ClickHouse/ClickHouse/tree/master/tests) that are continuously used in its day-to-day development. It definitely will take some additional time and effort to learn [how to run it](../../development/tests.md) and then how to adapt your tests to this framework, but it'll pay off by ensuring that ClickHouse releases are already tested against them when they are announced stable, instead of repeatedly losing time on reporting the issue after the fact and then waiting for a bugfix to be implemented, backported and released. Some companies even have such test contributions to infrastructure by its use as an internal policy, (called [Beyonce's Rule](https://www.oreilly.com/library/view/software-engineering-at/9781492082781/ch01.html#policies_that_scale_well) at Google).
+An extra step forward could be contributing those automated tests to [ClickHouse's open-source test infrastructure](https://github.com/ClickHouse/ClickHouse/tree/master/tests), which is continuously used in its day-to-day development. It'll definitely take some additional time and effort to learn [how to run it](../../development/tests.md) and then how to adapt your tests to this framework, but it'll pay off by ensuring that ClickHouse releases are already tested against them when they're announced stable, instead of repeatedly losing time reporting an issue after the fact and then waiting for a bugfix to be implemented, backported and released. Some companies even have an internal policy requiring such test contributions to the infrastructure they use (called [Beyonce's Rule](https://www.oreilly.com/library/view/software-engineering-at/9781492082781/ch01.html#policies_that_scale_well) at Google).
When you have your pre-production environment and testing infrastructure in place, choosing the best version is straightforward:
-1. Routinely run your automated tests against new ClickHouse releases. You can do it even for ClickHouse releases that are marked as `testing`, but going forward to the next steps with them is not recommended.
+1. Routinely run your automated tests against new ClickHouse releases. You can do this even for ClickHouse releases marked as `testing`, but proceeding to the next steps with them isn't recommended.
2. Deploy the ClickHouse release that passed the tests to pre-production and check that all processes are running as expected.
3. Report any issues you discovered to [ClickHouse GitHub Issues](https://github.com/ClickHouse/ClickHouse/issues).
4. If there were no major issues, it should be safe to start deploying ClickHouse release to your production environment. Investing in gradual release automation that implements an approach similar to [canary releases](https://martinfowler.com/bliki/CanaryRelease.html) or [green-blue deployments](https://martinfowler.com/bliki/BlueGreenDeployment.html) might further reduce the risk of issues in production.
@@ -57,10 +57,10 @@ If you look into the contents of the ClickHouse package repository, you'll see t
Here is some guidance on how to choose between them:
-- `stable` is the kind of package we recommend by default. They are released roughly monthly (and thus provide new features with reasonable delay) and three latest stable releases are supported in terms of diagnostics and backporting of bug fixes.
+- `stable` is the kind of package we recommend by default. They're released roughly monthly (and thus provide new features with reasonable delay), and the three latest stable releases are supported in terms of diagnostics and backporting of bug fixes.
- `lts` are released twice a year and are supported for a year after their initial release. You might prefer them over `stable` in the following cases:
- - Your company has some internal policies that do not allow for frequent upgrades or using non-LTS software.
- - You are using ClickHouse in some secondary products that either do not require any complex ClickHouse features or do not have enough resources to keep it updated.
+ - Your company has some internal policies that don't allow for frequent upgrades or using non-LTS software.
+ - You're using ClickHouse in some secondary products that either don't require any complex ClickHouse features or don't have enough resources to keep it updated.
Many teams who initially think that `lts` is the way to go often switch to `stable` anyway because of some recent feature that's important for their product.
diff --git a/docs/getting-started/example-datasets/amazon-reviews.md b/docs/getting-started/example-datasets/amazon-reviews.md
index 822dca0f21a..a1f97380dc0 100644
--- a/docs/getting-started/example-datasets/amazon-reviews.md
+++ b/docs/getting-started/example-datasets/amazon-reviews.md
@@ -123,7 +123,7 @@ FROM s3Cluster('default',
```
:::tip
-In ClickHouse Cloud, the name of the cluster is `default`. Change `default` to the name of your cluster...or use the `s3` table function (instead of `s3Cluster`) if you do not have a cluster.
+In ClickHouse Cloud, the name of the cluster is `default`. Change `default` to the name of your cluster...or use the `s3` table function (instead of `s3Cluster`) if you don't have a cluster.
:::
5. That query doesn't take long - averaging about 300,000 rows per second. Within 5 minutes or so you should see all the rows inserted:
diff --git a/docs/getting-started/example-datasets/cell-towers.md b/docs/getting-started/example-datasets/cell-towers.md
index 5a74602167a..8f77ca347a8 100644
--- a/docs/getting-started/example-datasets/cell-towers.md
+++ b/docs/getting-started/example-datasets/cell-towers.md
@@ -283,7 +283,7 @@ The schema for this table was designed for compact storage on disk and query spe
- `mcc` or Mobile country code, is stored as a `UInt16` as we know the range is 1 - 999.
- `lon` and `lat` are `Float64`.
-None of the other fields are used in the queries or visualizations in this guide, but they are described in the forum linked above if you are interested.
+None of the other fields are used in the queries or visualizations in this guide, but they're described in the forum linked above if you're interested.
## Build visualizations with Apache Superset {#build-visualizations-with-apache-superset}
@@ -310,7 +310,7 @@ To build a Superset dashboard using the OpenCelliD dataset you should:
:::note
- If **ClickHouse Connect** is not one of your options, then you will need to install it. The command is `pip install clickhouse-connect`, and more info is [available here](https://pypi.org/project/clickhouse-connect/).
+ If **ClickHouse Connect** isn't one of your options, then you will need to install it. The command is `pip install clickhouse-connect`, and more info is [available here](https://pypi.org/project/clickhouse-connect/).
:::
#### Add your connection details {#add-your-connection-details}
@@ -357,7 +357,7 @@ Click on **UPDATE CHART** to render the visualization.
### Add the charts to a **dashboard** {#add-the-charts-to-a-dashboard}
-This screenshot shows cell tower locations with LTE, UMTS, and GSM radios. The charts are all created in the same way, and they are added to a dashboard.
+This screenshot shows cell tower locations with LTE, UMTS, and GSM radios. The charts are all created in the same way, and they're added to a dashboard.
@@ -366,5 +366,5 @@ The data is also available for interactive queries in the [Playground](https://s
This [example](https://sql.clickhouse.com?query_id=UV8M4MAGS2PWAUOAYAAARM) will populate the username and even the query for you.
-Although you cannot create tables in the Playground, you can run all of the queries and even use Superset (adjust the host name and port number).
+Although you can't create tables in the Playground, you can run all of the queries and even use Superset (adjust the host name and port number).
:::
diff --git a/docs/getting-started/example-datasets/covid19.md b/docs/getting-started/example-datasets/covid19.md
index 5ebc04c7f31..2f24e15f94b 100644
--- a/docs/getting-started/example-datasets/covid19.md
+++ b/docs/getting-started/example-datasets/covid19.md
@@ -135,7 +135,7 @@ FROM covid19;
└────────────────────────────────────────────┘
```
-7. You will notice the data has a lot of 0's for dates - either weekends or days when numbers were not reported each day. We can use a window function to smooth out the daily averages of new cases:
+7. You'll notice the data has a lot of 0's for dates - either weekends or days when numbers weren't reported. We can use a window function to smooth out the daily averages of new cases:
```sql
SELECT
diff --git a/docs/getting-started/example-datasets/environmental-sensors.md b/docs/getting-started/example-datasets/environmental-sensors.md
index 0e6fd494d47..80399c3937d 100644
--- a/docs/getting-started/example-datasets/environmental-sensors.md
+++ b/docs/getting-started/example-datasets/environmental-sensors.md
@@ -76,7 +76,7 @@ ENGINE = MergeTree
ORDER BY (timestamp, sensor_id);
```
-3. ClickHouse Cloud services have a cluster named `default`. We will use the `s3Cluster` table function, which reads S3 files in parallel from the nodes in your cluster. (If you do not have a cluster, just use the `s3` function and remove the cluster name.)
+3. ClickHouse Cloud services have a cluster named `default`. We will use the `s3Cluster` table function, which reads S3 files in parallel from the nodes in your cluster. (If you don't have a cluster, just use the `s3` function and remove the cluster name.)
This query will take a while - it's about 1.67T of data uncompressed:
diff --git a/docs/getting-started/example-datasets/foursquare-os-places.md b/docs/getting-started/example-datasets/foursquare-os-places.md
index c952cfd04b8..5a2b9c267a6 100644
--- a/docs/getting-started/example-datasets/foursquare-os-places.md
+++ b/docs/getting-started/example-datasets/foursquare-os-places.md
@@ -232,7 +232,7 @@ This column converts a latitude value into a Y coordinate in the Mercator projec
- multiplying by `0xFFFFFFFF` scales to the full 32-bit integer range
Specifying `MATERIALIZED` makes sure that ClickHouse calculates the values for these
-columns when we `INSERT` the data, without having to specify these columns (which are not
+columns when we `INSERT` the data, without having to specify these columns (which aren't
part of the original data schema) in the `INSERT` statement.
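As a sketch of the idea (not the guide's exact schema; the formula below is one common Web-Mercator formulation, and the names are illustrative):

```sql
-- Sketch: longitude maps linearly to X; latitude goes through the Mercator
-- projection for Y; both are scaled to the full 32-bit range by 0xFFFFFFFF.
CREATE TABLE places_sketch
(
    `latitude` Float64,
    `longitude` Float64,
    `mercator_x` UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
    `mercator_y` UInt32 MATERIALIZED 0xFFFFFFFF * (0.5 - log(tan(pi() / 4 + radians(latitude) / 2)) / (2 * pi()))
)
ENGINE = MergeTree
ORDER BY mortonEncode(mercator_x, mercator_y);
```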
The table is ordered by `mortonEncode(mercator_x, mercator_y)` which produces a
diff --git a/docs/getting-started/example-datasets/github.md b/docs/getting-started/example-datasets/github.md
index 4d64fbf8070..c9e6e0330ba 100644
--- a/docs/getting-started/example-datasets/github.md
+++ b/docs/getting-started/example-datasets/github.md
@@ -337,7 +337,7 @@ LIMIT 10
10 rows in set. Elapsed: 0.085 sec. Processed 532.10 thousand rows, 8.68 MB (6.30 million rows/s., 102.64 MB/s.)
```
-Note that this allows for files to be renamed and then re-renamed to their original values. First we aggregate `old_path` for a list of deleted files as a result of renaming. We union this with the last operation for every `path`. Finally, we filter this list to those where the final event is not a `Delete`.
+Note that this allows for files to be renamed and then re-renamed to their original values. First we aggregate `old_path` for a list of deleted files as a result of renaming. We union this with the last operation for every `path`. Finally, we filter this list to those where the final event isn't a `Delete`.
[play](https://sql.clickhouse.com?query_id=1OXCKMOH2JVMSHD3NS2WW6)
@@ -387,7 +387,7 @@ git ls-files | grep -v -E 'generated\.cpp|^(contrib|docs?|website|libs/(libcityh
The difference here is caused by a few factors:
-- A rename can occur alongside other modifications to the file. These are listed as separate events in file_changes but with the same time. The `argMax` function has no way of distinguishing these - it picks the first value. The natural ordering of the inserts (the only means of knowing the correct order) is not maintained across the union so modified events can be selected. For example, below the `src/Functions/geometryFromColumn.h` file has several modifications before being renamed to `src/Functions/geometryConverters.h`. Our current solution may pick a Modify event as the latest change causing `src/Functions/geometryFromColumn.h` to be retained.
+- A rename can occur alongside other modifications to the file. These are listed as separate events in `file_changes` but with the same time. The `argMax` function has no way of distinguishing these - it picks the first value. The natural ordering of the inserts (the only means of knowing the correct order) isn't maintained across the union, so modified events can be selected. For example, below the `src/Functions/geometryFromColumn.h` file has several modifications before being renamed to `src/Functions/geometryConverters.h`. Our current solution may pick a Modify event as the latest change, causing `src/Functions/geometryFromColumn.h` to be retained.
[play](https://sql.clickhouse.com?query_id=SCXWMR9GBMJ9UNZYQXQBFA)
@@ -2358,7 +2358,7 @@ WHERE (path = 'src/Storages/StorageReplicatedMergeTree.cpp') AND (change_type =
This makes viewing the full history of a file challenging since we don't have a single value connecting all line or file changes.
-To address this, we can use User Defined Functions (UDFs). These cannot, currently, be recursive, so to identify the history of a file we must define a series of UDFs which call each other explicitly.
+To address this, we can use User Defined Functions (UDFs). These can't, currently, be recursive, so to identify the history of a file we must define a series of UDFs which call each other explicitly.
This means we can only track renames to a maximum depth - the below example is 5 deep. It is unlikely a file will be renamed more times than this, so for now, this is sufficient.
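The chained-UDF pattern can be sketched as follows. This is a hedged, minimal sketch with hypothetical function names, not the functions defined later in this guide; the real functions additionally query `file_changes` at each level to map a current path back to its previous name.

```sql
-- Minimal sketch: ClickHouse UDFs can't call themselves, so a fixed-depth
-- chain of functions stands in for recursion. Each level delegates to the
-- next one down, and the innermost level simply stops.
CREATE FUNCTION resolve_depth_3 AS (path) -> path;                   -- base case: give up, return the path as-is
CREATE FUNCTION resolve_depth_2 AS (path) -> resolve_depth_3(path);  -- one more rename level would be resolved here
CREATE FUNCTION resolve_depth_1 AS (path) -> resolve_depth_2(path);  -- entry point
```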
diff --git a/docs/getting-started/example-datasets/hacker-news.md b/docs/getting-started/example-datasets/hacker-news.md
index 69cc9679260..3ae30c5cf4e 100644
--- a/docs/getting-started/example-datasets/hacker-news.md
+++ b/docs/getting-started/example-datasets/hacker-news.md
@@ -444,7 +444,7 @@ LIMIT 5
## Parquet {#parquet}
One of the strengths of ClickHouse is its ability to handle any number of [formats](/interfaces/formats).
-CSV represents a rather ideal use case, and is not the most efficient for data exchange.
+CSV represents a rather ideal use case, and isn't the most efficient for data exchange.
Next, you'll load the data from a Parquet file which is an efficient column-oriented format.
diff --git a/docs/getting-started/example-datasets/menus.md b/docs/getting-started/example-datasets/menus.md
index 4a9b7b61e46..0bac7258b50 100644
--- a/docs/getting-started/example-datasets/menus.md
+++ b/docs/getting-started/example-datasets/menus.md
@@ -124,9 +124,9 @@ clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_defa
We use [CSVWithNames](/interfaces/formats/CSVWithNames) format as the data is represented by CSV with header.
-We disable `format_csv_allow_single_quotes` as only double quotes are used for data fields and single quotes can be inside the values and should not confuse the CSV parser.
+We disable `format_csv_allow_single_quotes` as only double quotes are used for data fields and single quotes can be inside the values and shouldn't confuse the CSV parser.
-We disable [input_format_null_as_default](/operations/settings/formats#input_format_null_as_default) as our data does not have [NULL](/operations/settings/formats#input_format_null_as_default). Otherwise ClickHouse will try to parse `\N` sequences and can be confused with `\` in data.
+We disable [input_format_null_as_default](/operations/settings/formats#input_format_null_as_default) as our data doesn't have [NULL](/operations/settings/formats#input_format_null_as_default). Otherwise ClickHouse will try to parse `\N` sequences and can be confused with `\` in data.
The setting [date_time_input_format best_effort](/operations/settings/formats#date_time_input_format) allows to parse [DateTime](../../sql-reference/data-types/datetime.md) fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed.
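The same relaxed parsing is exposed directly through the `parseDateTimeBestEffort` function, which makes it easy to check what a given value would parse to (the input value here is illustrative):

```sql
-- best-effort parsing accepts ISO-8601 without seconds
SELECT parseDateTimeBestEffort('2000-01-01 01:02');
-- 2000-01-01 01:02:00
```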
diff --git a/docs/getting-started/example-datasets/noaa.md b/docs/getting-started/example-datasets/noaa.md
index 0f86380d492..b3cfae956c2 100644
--- a/docs/getting-started/example-datasets/noaa.md
+++ b/docs/getting-started/example-datasets/noaa.md
@@ -30,7 +30,7 @@ The sections below give a brief overview of the steps that were involved in brin
### Pre-prepared data {#pre-prepared-data}
-More specifically, rows have been removed that did not fail any quality assurance checks by Noaa. The data has also been restructured from a measurement per line to a row per station id and date, i.e.
+More specifically, rows that failed any quality assurance checks by NOAA have been removed. The data has also been restructured from a measurement per line to a row per station id and date, i.e.
```csv
"station_id","date","tempAvg","tempMax","tempMin","precipitation","snowfall","snowDepth","percentDailySun","averageWindSpeed","maxWindSpeed","weatherType"
@@ -99,7 +99,7 @@ Summarizing the format documentation and the columns in order:
- WT** = Weather Type where ** defines the weather type. Full list of weather types here.
- DATA VALUE = 5 character data value for ELEMENT i.e. the value of the measurement.
- M-FLAG = 1 character Measurement Flag. This has 10 possible values. Some of these values indicate questionable data accuracy. We accept data where this is set to "P" - identified as missing presumed zero, as this is only relevant to the PRCP, SNOW and SNWD measurements.
-- Q-FLAG is the measurement quality flag with 14 possible values. We are only interested in data with an empty value i.e. it did not fail any quality assurance checks.
+- Q-FLAG is the measurement quality flag with 14 possible values. We're only interested in data with an empty value, i.e. data that didn't fail any quality assurance checks.
- S-FLAG is the source flag for the observation. Not useful for our analysis and ignored.
- OBS-TIME = 4-character time of observation in hour-minute format (i.e. 0700 =7:00 am). Typically not present in older data. We ignore this for our purposes.
diff --git a/docs/getting-started/example-datasets/nypd_complaint_data.md b/docs/getting-started/example-datasets/nypd_complaint_data.md
index 235c8177d55..44f69fa13d8 100644
--- a/docs/getting-started/example-datasets/nypd_complaint_data.md
+++ b/docs/getting-started/example-datasets/nypd_complaint_data.md
@@ -56,10 +56,10 @@ CMPLNT_FR_TM Nullable(String)
```
:::tip
-Most of the time the above command will let you know which fields in the input data are numeric, and which are strings, and which are tuples. This is not always the case. Because ClickHouse is routineley used with datasets containing billions of records there is a default number (100) of rows examined to [infer the schema](/integrations/data-formats/json/inference) in order to avoid parsing billions of rows to infer the schema. The response below may not match what you see, as the dataset is updated several times each year. Looking at the Data Dictionary you can see that CMPLNT_NUM is specified as text, and not numeric. By overriding the default of 100 rows for inference with the setting `SETTINGS input_format_max_rows_to_read_for_schema_inference=2000`
+Most of the time the above command will let you know which fields in the input data are numeric, which are strings, and which are tuples. This isn't always the case. Because ClickHouse is routinely used with datasets containing billions of records, only a default number (100) of rows is examined to [infer the schema](/integrations/data-formats/json/inference), to avoid parsing billions of rows. The response below may not match what you see, as the dataset is updated several times each year. Looking at the Data Dictionary you can see that CMPLNT_NUM is specified as text, and not numeric. By overriding the default of 100 rows for inference with the setting `SETTINGS input_format_max_rows_to_read_for_schema_inference=2000`
you can get a better idea of the content.
-Note: as of version 22.5 the default is now 25,000 rows for inferring the schema, so only change the setting if you are on an older version or if you need more than 25,000 rows to be sampled.
+Note: as of version 22.5 the default is now 25,000 rows for inferring the schema, so only change the setting if you're on an older version or if you need more than 25,000 rows to be sampled.
:::
Run this command at your command prompt. You will be using `clickhouse-local` to query the data in the TSV file you downloaded.
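For instance, a sketch of applying that setting with `clickhouse-local` (the filename is a placeholder for the TSV you downloaded):

```sql
DESCRIBE file('complaints.tsv', 'TabSeparatedWithNames')
SETTINGS input_format_max_rows_to_read_for_schema_inference = 2000;
```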
@@ -109,7 +109,7 @@ Lat_Lon Tuple(Nullable(Float64), Nullable(Float64))
New Georeferenced Column Nullable(String)
```
-At this point you should check that the columns in the TSV file match the names and types specified in the **Columns in this Dataset** section of the [dataset web page](https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243). The data types are not very specific, all numeric fields are set to `Nullable(Float64)`, and all other fields are `Nullable(String)`. When you create a ClickHouse table to store the data you can specify more appropriate and performant types.
+At this point you should check that the columns in the TSV file match the names and types specified in the **Columns in this Dataset** section of the [dataset web page](https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243). The data types aren't very specific: all numeric fields are set to `Nullable(Float64)`, and all other fields are `Nullable(String)`. When you create a ClickHouse table to store the data you can specify more appropriate and performant types.
### Determine the proper schema {#determine-the-proper-schema}
@@ -150,9 +150,9 @@ Result:
The query response shows that the `JURISDICTION_CODE` fits well in a `UInt8`.
-Similarly, look at some of the `String` fields and see if they are well suited to being `DateTime` or [`LowCardinality(String)`](../../sql-reference/data-types/lowcardinality.md) fields.
+Similarly, look at some of the `String` fields and see if they're well suited to being `DateTime` or [`LowCardinality(String)`](../../sql-reference/data-types/lowcardinality.md) fields.
-For example, the field `PARKS_NM` is described as "Name of NYC park, playground or greenspace of occurrence, if applicable (state parks are not included)". The names of parks in New York City may be a good candidate for a `LowCardinality(String)`:
+For example, the field `PARKS_NM` is described as "Name of NYC park, playground or greenspace of occurrence, if applicable (state parks aren't included)". The names of parks in New York City may be a good candidate for a `LowCardinality(String)`:
```sh
clickhouse-local --input_format_max_rows_to_read_for_schema_inference=2000 \
@@ -323,7 +323,7 @@ LIMIT 25
FORMAT PrettyCompact"
```
-Lines 2 and 3 above contain the concatenation from the previous step, and lines 4 and 5 above parse the strings into `DateTime64`. As the complaint end time is not guaranteed to exist `parseDateTime64BestEffortOrNull` is used.
+Lines 2 and 3 above contain the concatenation from the previous step, and lines 4 and 5 above parse the strings into `DateTime64`. As the complaint end time isn't guaranteed to exist, `parseDateTime64BestEffortOrNull` is used.
Result:
```response
@@ -356,7 +356,7 @@ Result:
└─────────────────────────┴─────────────────────────┘
```
:::note
-The dates shown as `1925` above are from errors in the data. There are several records in the original data with dates in the years `1019` - `1022` that should be `2019` - `2022`. They are being stored as Jan 1st 1925 as that is the earliest date with a 64 bit DateTime.
+The dates shown as `1925` above are from errors in the data. There are several records in the original data with dates in the years `1019` - `1022` that should be `2019` - `2022`. They're being stored as Jan 1st 1925 as that is the earliest date supported by a 64-bit DateTime.
:::
## Create a table {#create-a-table}
diff --git a/docs/getting-started/example-datasets/stackoverflow.md b/docs/getting-started/example-datasets/stackoverflow.md
index 1b92b41e8c1..477af7df7b4 100644
--- a/docs/getting-started/example-datasets/stackoverflow.md
+++ b/docs/getting-started/example-datasets/stackoverflow.md
@@ -224,7 +224,7 @@ These files are up to 35GB and can take around 30 mins to download depending on
### Convert to JSON {#convert-to-json}
-At the time of writing, ClickHouse does not have native support for XML as an input format. To load the data into ClickHouse we first convert to NDJSON.
+At the time of writing, ClickHouse doesn't have native support for XML as an input format. To load the data into ClickHouse we first convert to NDJSON.
To convert XML to JSON we recommend the [`xq`](https://github.com/kislyuk/yq) linux tool, a simple `jq` wrapper for XML documents.
@@ -254,7 +254,7 @@ cd posts
tail +3 ../Posts.xml | head -n -1 | split -l 10000 --filter='{ printf "\n"; cat - ; printf "\n"; } > $FILE' -
```
-After running the above you will have a set of files, each with 10000 lines. This ensures the memory overhead of the next command is not excessive (xml to JSON conversion is done in memory).
+After running the above you will have a set of files, each with 10,000 lines. This ensures the memory overhead of the next command isn't excessive (XML to JSON conversion is done in memory).
```bash
find . -maxdepth 1 -type f -exec xq -c '.rows.row[]' {} \; | sed -e 's:"@:":g' > posts_v2.json
diff --git a/docs/getting-started/example-datasets/tpcds.md b/docs/getting-started/example-datasets/tpcds.md
index 782f4548f37..572c7f477f3 100644
--- a/docs/getting-started/example-datasets/tpcds.md
+++ b/docs/getting-started/example-datasets/tpcds.md
@@ -974,7 +974,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
```sql
WITH
@@ -2351,7 +2351,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
```sql
SELECT
@@ -3322,7 +3322,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94858. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94858. This alternative formulation with a minor fix works:
```sql
WITH
@@ -3946,7 +3946,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94858. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94858. This alternative formulation with a minor fix works:
```sql
WITH
@@ -4090,7 +4090,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94976. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94976. This alternative formulation with a minor fix works:
```sql
WITH
@@ -5208,7 +5208,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94671. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/94671. This alternative formulation with a minor fix works:
```sql
WITH
@@ -5710,7 +5710,7 @@ LIMIT 100;
```
::::note
-The query does not work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
+The query doesn't work out-of-the-box due to https://github.com/ClickHouse/ClickHouse/issues/95299. This alternative formulation with a minor fix works:
```sql
WITH
diff --git a/docs/getting-started/example-datasets/tpch.md b/docs/getting-started/example-datasets/tpch.md
index 6b13a9121b9..5ec41002c75 100644
--- a/docs/getting-started/example-datasets/tpch.md
+++ b/docs/getting-started/example-datasets/tpch.md
@@ -54,8 +54,8 @@ Now create tables in ClickHouse.
We stick as closely as possible to the rules of the TPC-H specification:
- Primary keys are created only for the columns mentioned in section 1.4.2.2 of the specification.
- Substitution parameters were replaced by the values for query validation in sections 2.1.x.4 of the specification.
-- As per section 1.4.2.1, the table definitions do not use the optional `NOT NULL` constraints, even if `dbgen` generates them by default.
- The performance of `SELECT` queries in ClickHouse is not affected by the presence or absence of `NOT NULL` constraints.
+- As per section 1.4.2.1, the table definitions don't use the optional `NOT NULL` constraints, even if `dbgen` generates them by default.
+ The performance of `SELECT` queries in ClickHouse isn't affected by the presence or absence of `NOT NULL` constraints.
- As per section 1.3.1, we use ClickHouse's native datatypes (e.g. `Int32`, `String`) to implement the abstract datatypes mentioned in the
specification (e.g. `Identifier`, `Variable text, size N`). The only effect of this is better readability, the SQL-92 datatypes generated
by `dbgen` (e.g. `INTEGER`, `VARCHAR(40)`) would also work in ClickHouse.
@@ -390,7 +390,7 @@ WHERE
```
::::note
-As of February 2025, the query does not work out-of-the box due to a bug with Decimal addition. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/70136
+As of February 2025, the query doesn't work out-of-the-box due to a bug with Decimal addition. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/70136
This alternative formulation works and was verified to return the reference results.
diff --git a/docs/getting-started/example-datasets/youtube-dislikes.md b/docs/getting-started/example-datasets/youtube-dislikes.md
index 93fcf8571a0..5a17b05152b 100644
--- a/docs/getting-started/example-datasets/youtube-dislikes.md
+++ b/docs/getting-started/example-datasets/youtube-dislikes.md
@@ -103,7 +103,7 @@ ORDER BY (uploader, upload_date)
The following command streams the records from the S3 files into the `youtube` table.
:::important
-This inserts a lot of data - 4.65 billion rows. If you do not want the entire dataset, simply add a `LIMIT` clause with the desired number of rows.
+This inserts a lot of data - 4.65 billion rows. If you don't want the entire dataset, simply add a `LIMIT` clause with the desired number of rows.
:::
```sql
@@ -139,7 +139,7 @@ FROM s3(
Some comments about our `INSERT` command:
-- The `parseDateTimeBestEffortUSOrZero` function is handy when the incoming date fields may not be in the proper format. If `fetch_date` does not get parsed properly, it will be set to `0`
+- The `parseDateTimeBestEffortUSOrZero` function is handy when the incoming date fields may not be in the proper format. If `fetch_date` doesn't get parsed properly, it will be set to `0`
- The `upload_date` column contains valid dates, but it also contains strings like "4 hours ago" - which is certainly not a valid date. We decided to store the original value in `upload_date_str` and attempt to parse it with `toDate(parseDateTimeBestEffortUSOrZero(upload_date::String))`. If the parsing fails we just get `0`
- We used `ifNull` to avoid getting `NULL` values in our table. If an incoming value is `NULL`, the `ifNull` function is setting the value to an empty string
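These behaviors can be checked in isolation; the input strings below are illustrative, not values from the dataset:

```sql
SELECT
    parseDateTimeBestEffortUSOrZero('4 hours ago')        AS unparseable, -- falls back to 0, i.e. 1970-01-01 00:00:00
    toDate(parseDateTimeBestEffortUSOrZero('08/24/2021')) AS parsed,      -- US-style date parses to 2021-08-24
    ifNull(CAST(NULL, 'Nullable(String)'), '')            AS no_null;     -- NULL becomes an empty string
```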
diff --git a/docs/getting-started/install/_snippets/_deb_install.md b/docs/getting-started/install/_snippets/_deb_install.md
index 31e1fa6532d..60bcc70d6e4 100644
--- a/docs/getting-started/install/_snippets/_deb_install.md
+++ b/docs/getting-started/install/_snippets/_deb_install.md
@@ -91,7 +91,7 @@ clickhouse-client --password
:::tip
In production environments we strongly recommend running ClickHouse Keeper on dedicated nodes.
In test environments, if you decide to run ClickHouse Server and ClickHouse Keeper on the same server,
-then you do not need to install ClickHouse Keeper as it is included with ClickHouse server.
+then you don't need to install ClickHouse Keeper as it is included with ClickHouse server.
:::
To install `clickhouse-keeper` on standalone ClickHouse Keeper servers, run:
@@ -120,7 +120,7 @@ The various deb packages available are detailed below:
| `clickhouse-server` | Creates a symbolic link for `clickhouse-server` and installs the default server configuration. |
| `clickhouse-client` | Creates a symbolic link for `clickhouse-client` and other client-related tools. and installs client configuration files. |
| `clickhouse-common-static-dbg` | Installs ClickHouse compiled binary files with debug info. |
-| `clickhouse-keeper` | Used to install ClickHouse Keeper on dedicated ClickHouse Keeper nodes. If you are running ClickHouse Keeper on the same server as ClickHouse server, then you do not need to install this package. Installs ClickHouse Keeper and the default ClickHouse Keeper configuration files. |
+| `clickhouse-keeper` | Used to install ClickHouse Keeper on dedicated ClickHouse Keeper nodes. If you're running ClickHouse Keeper on the same server as ClickHouse server, then you don't need to install this package. Installs ClickHouse Keeper and the default ClickHouse Keeper configuration files. |
:::info
diff --git a/docs/getting-started/install/_snippets/_docker.md b/docs/getting-started/install/_snippets/_docker.md
index 16ea41ffb72..8456dcb5926 100644
--- a/docs/getting-started/install/_snippets/_docker.md
+++ b/docs/getting-started/install/_snippets/_docker.md
@@ -70,7 +70,7 @@ docker rm some-clickhouse-server
### Networking {#networking}
:::note
-the predefined user `default` does not have the network access unless the password is set,
+The predefined user `default` doesn't have network access unless a password is set,
see "How to create default database and user on starting" and "Managing `default` user" below
:::
@@ -118,7 +118,7 @@ You may also want to mount:
ClickHouse has some advanced functionality, which requires enabling several [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html)
-They are optional and can be enabled using the following [docker command-line arguments](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities):
+They're optional and can be enabled using the following [docker command-line arguments](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities):
```bash
docker run -d \
diff --git a/docs/getting-started/install/_snippets/_linux_tar_install.md b/docs/getting-started/install/_snippets/_linux_tar_install.md
index 1f0a07bdc52..fce03d56773 100644
--- a/docs/getting-started/install/_snippets/_linux_tar_install.md
+++ b/docs/getting-started/install/_snippets/_linux_tar_install.md
@@ -1,6 +1,6 @@
# Install ClickHouse using tgz archives
-> It is recommended to use official pre-compiled `tgz` archives for all Linux distributions, where installation of `deb` or `rpm` packages is not possible.
+> Official pre-compiled `tgz` archives are recommended for all Linux distributions where installation of `deb` or `rpm` packages isn't possible.
diff --git a/docs/getting-started/install/_snippets/_macos.md b/docs/getting-started/install/_snippets/_macos.md
index 4ce3ab22e96..5ef94cbeaea 100644
--- a/docs/getting-started/install/_snippets/_macos.md
+++ b/docs/getting-started/install/_snippets/_macos.md
@@ -24,7 +24,7 @@ brew install --cask clickhouse
## Fix the developer verification error in macOS {#fix-developer-verification-error-macos}
If you install ClickHouse using `brew`, you may encounter an error from MacOS.
-By default, MacOS will not run applications or tools created by a developer who cannot be verified.
+By default, macOS won't run applications or tools created by a developer who can't be verified.
When attempting to run any `clickhouse` command, you may see this error:
@@ -41,7 +41,7 @@ The easiest way to remove the `clickhouse` executable from the quarantine bin is
-1. Scroll to the bottom of the window to find a message saying _"clickhouse-macos-aarch64" was blocked from use because it is not from an identified developer".
+1. Scroll to the bottom of the window to find a message saying _"clickhouse-macos-aarch64" was blocked from use because it isn't from an identified developer._
1. Click **Allow Anyway**.
diff --git a/docs/getting-started/install/_snippets/_quick_install.md b/docs/getting-started/install/_snippets/_quick_install.md
index d9dcb1da5ab..19ebcf5b8da 100644
--- a/docs/getting-started/install/_snippets/_quick_install.md
+++ b/docs/getting-started/install/_snippets/_quick_install.md
@@ -15,7 +15,7 @@ curl https://clickhouse.com/ | sh
```
:::note
-For Mac users: If you are getting errors that the developer of the binary cannot be verified, please see [here](/knowledgebase/fix-developer-verification-error-in-macos).
+For Mac users: If you're getting errors that the developer of the binary can't be verified, please see [here](/knowledgebase/fix-developer-verification-error-in-macos).
:::
## Start clickhouse-local {#start-clickhouse-local}
@@ -68,7 +68,7 @@ file. All available configuration settings are documented [here](/operations/ser
[example configuration file
template](https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/config.xml).
-You are now ready to start sending SQL commands to ClickHouse!
+You're now ready to start sending SQL commands to ClickHouse!
:::tip
The [Quick Start](/get-started/quick-start) walks you through the steps for creating tables and inserting data.
diff --git a/docs/getting-started/install/_snippets/_rpm_install.md b/docs/getting-started/install/_snippets/_rpm_install.md
index 94ae1de6d88..99807dbdbb7 100644
--- a/docs/getting-started/install/_snippets/_rpm_install.md
+++ b/docs/getting-started/install/_snippets/_rpm_install.md
@@ -22,7 +22,7 @@ sudo zypper --gpg-auto-import-keys refresh clickhouse-stable
```
In the steps below, `yum install` can be replaced by `zypper install`, depending
-on which package manager you are using.
+on which package manager you're using.
## Install ClickHouse server and client {#install-clickhouse-server-and-client-1}
@@ -68,7 +68,7 @@ clickhouse-client --password
:::tip
In production environments we strongly recommend running ClickHouse Keeper on dedicated nodes.
In test environments, if you decide to run ClickHouse Server and ClickHouse Keeper on the same server,
-then you do not need to install ClickHouse Keeper as it is included with ClickHouse server.
+then you don't need to install ClickHouse Keeper as it is included with ClickHouse server.
:::
To install `clickhouse-keeper` on standalone ClickHouse Keeper servers, run:
diff --git a/docs/getting-started/install/_snippets/_windows_install.md b/docs/getting-started/install/_snippets/_windows_install.md
index aad7d44ab24..a46fc91db4a 100644
--- a/docs/getting-started/install/_snippets/_windows_install.md
+++ b/docs/getting-started/install/_snippets/_windows_install.md
@@ -88,6 +88,6 @@ file. All available configuration settings are documented [here](/operations/ser
[example configuration file
template](https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/config.xml).
-You are now ready to start sending SQL commands to ClickHouse!
+You're now ready to start sending SQL commands to ClickHouse!
diff --git a/docs/getting-started/install/advanced.md b/docs/getting-started/install/advanced.md
index a4d711e06b9..011d90fc396 100644
--- a/docs/getting-started/install/advanced.md
+++ b/docs/getting-started/install/advanced.md
@@ -19,7 +19,7 @@ Client: /programs/clickhouse-client
Server: /programs/clickhouse-server
```
-You'll need to create data and metadata folders manually and `chown` them for the desired user. Their paths can be changed in server config (src/programs/server/config.xml), by default they are:
+You'll need to create data and metadata folders manually and `chown` them for the desired user. Their paths can be changed in server config (src/programs/server/config.xml), by default they're:
```bash
/var/lib/clickhouse/data/default/
diff --git a/docs/getting-started/playground.md b/docs/getting-started/playground.md
index 388022e9f11..3945e10f3b7 100644
--- a/docs/getting-started/playground.md
+++ b/docs/getting-started/playground.md
@@ -28,8 +28,8 @@ You can make queries to Playground using any HTTP client, for example [curl](htt
The queries are executed as a read-only user. It implies some limitations:
-- DDL queries are not allowed
-- INSERT queries are not allowed
+- DDL queries aren't allowed
+- INSERT queries aren't allowed
The service also have quotas on its usage.
diff --git a/docs/getting-started/quick-start/cloud.mdx b/docs/getting-started/quick-start/cloud.mdx
index be934600ee2..04a52d24355 100644
--- a/docs/getting-started/quick-start/cloud.mdx
+++ b/docs/getting-started/quick-start/cloud.mdx
@@ -42,7 +42,7 @@ To create a free ClickHouse service in [ClickHouse Cloud](https://console.clickh
-Once you are logged in, ClickHouse Cloud starts the onboarding wizard which walks you through creating a new ClickHouse service. Select your desired region for deploying the service, and give your new service a name:
+Once you're logged in, ClickHouse Cloud starts the onboarding wizard which walks you through creating a new ClickHouse service. Select your desired region for deploying the service, and give your new service a name:
@@ -78,7 +78,7 @@ You should see 4 databases in the list, plus any that you may have added.
-That's it - you are ready to start using your new ClickHouse service!
+That's it - you're ready to start using your new ClickHouse service!
### Connect with your app \{#connect-with-your-app\}
@@ -216,7 +216,7 @@ You can also connect to your ClickHouse Cloud service using a command-line tool
--user default \
--password
```
-If you get the smiley face prompt, you are ready to run queries!
+If you get the smiley face prompt, you're ready to run queries!
```response
:)
```
@@ -315,5 +315,5 @@ Suppose we have the following text in a CSV file named `data.csv`:
- We have a list of [example datasets](/getting-started/index.md) with instructions on how to insert them
- Check out our 25-minute video on [Getting Started with ClickHouse](https://clickhouse.com/company/events/getting-started-with-clickhouse/)
- If your data is coming from an external source, view our [collection of integration guides](/integrations/index.mdx) for connecting to message queues, databases, pipelines and more
-- If you are using a UI/BI visualization tool, view the [user guides for connecting a UI to ClickHouse](/integrations/data-visualization)
+- If you're using a UI/BI visualization tool, view the [user guides for connecting a UI to ClickHouse](/integrations/data-visualization)
- The user guide on [primary keys](/guides/best-practices/sparse-primary-indexes.md) is everything you need to know about primary keys and how to define them
diff --git a/docs/getting-started/quick-start/oss.mdx b/docs/getting-started/quick-start/oss.mdx
index fa8b0537ee0..be78fb7a177 100644
--- a/docs/getting-started/quick-start/oss.mdx
+++ b/docs/getting-started/quick-start/oss.mdx
@@ -35,8 +35,8 @@ We recommend running the command below from a new and empty subdirectory as
some configuration files will be created in the directory the binary is located
in the first time ClickHouse server is run.
-The script below is not the recommended way to install ClickHouse for production.
-If you are looking to install a production instance of ClickHouse, please see the [install page](/install).
+The script below isn't the recommended way to install ClickHouse for production.
+If you're looking to install a production instance of ClickHouse, please see the [install page](/install).
:::
```bash
@@ -56,7 +56,7 @@ sudo ./clickhouse install
At this stage, you can ignore the prompt to run the `install` command.
:::note
-For Mac users: If you are getting errors that the developer of the binary cannot
+For Mac users: If you're getting errors that the developer of the binary can't
be verified, please see ["Fix the Developer Verification Error in MacOS"](https://clickhouse.com/docs/knowledgebase/fix-developer-verification-error-in-macos).
:::
@@ -167,7 +167,7 @@ technologies that integrate with ClickHouse.
1. used as the source of a `SELECT` query (allowing you to run ad-hoc queries and
leave your data in S3), or...
+ 2. inserted into a `MergeTree` table (when you're ready to
+ 2. insert the resulting table into a `MergeTree` table (when you're ready to
move your data into ClickHouse)
An ad-hoc query looks like:
@@ -373,7 +373,7 @@ technologies that integrate with ClickHouse.
- Continue your learning by taking our free on-demand training courses at the [ClickHouse Academy](https://learn.clickhouse.com/visitor_class_catalog).
- We have a list of [example datasets](/getting-started/example-datasets/) with instructions on how to insert them.
- If your data is coming from an external source, view our [collection of integration guides](/integrations/) for connecting to message queues, databases, pipelines and more.
-- If you are using a UI/BI visualization tool, view the [user guides for connecting a UI to ClickHouse](/integrations/data-visualization/).
+- If you're using a UI/BI visualization tool, view the [user guides for connecting a UI to ClickHouse](/integrations/data-visualization/).
- The user guide on [primary keys](/guides/best-practices/sparse-primary-indexes.md) is everything you need to know about primary keys and how to define them.
diff --git a/docs/guides/best-practices/_snippets/_performance_optimizations_table_of_contents.md b/docs/guides/best-practices/_snippets/_performance_optimizations_table_of_contents.md
index 0eb29693249..f0d83269b33 100644
--- a/docs/guides/best-practices/_snippets/_performance_optimizations_table_of_contents.md
+++ b/docs/guides/best-practices/_snippets/_performance_optimizations_table_of_contents.md
@@ -10,7 +10,7 @@
| [Asynchronous inserts](/optimize/asynchronous-inserts) | Improve insert performance by leveraging server-side batching to reduce client-side complexity and increase throughput for high-frequency insertions. |
| [Avoid mutations](/optimize/avoid-mutations) | Design append-only workflows that eliminate costly `UPDATE` and `DELETE` operations while maintaining data accuracy and performance. |
| [Avoid nullable columns](/optimize/avoid-nullable-columns) | Reduce storage overhead and improve query performance by using default values instead of nullable columns where possible. |
-| [Avoid `OPTIMIZE FINAL`](/optimize/avoidoptimizefinal) | Understand when you should and should not use `OPTIMIZE TABLE FINAL` |
+| [Avoid `OPTIMIZE FINAL`](/optimize/avoidoptimizefinal) | Understand when you should and shouldn't use `OPTIMIZE TABLE FINAL`. |
| [Analyzer](/operations/analyzer) | Leverage ClickHouse's new query analyzer to identify performance bottlenecks and optimize query execution plans for better efficiency. |
| [Query profiling](/operations/optimizing-performance/sampling-query-profiler) | Use the sampling query profiler to analyze query execution patterns, identify performance hot spots, and optimize resource usage. |
| [Query cache](/operations/query-cache) | Accelerate frequently executed `SELECT` queries by enabling and configuring ClickHouse's built-in query result caching. |
diff --git a/docs/guides/best-practices/query-optimization.md b/docs/guides/best-practices/query-optimization.md
index 40298786258..70ca5f31232 100644
--- a/docs/guides/best-practices/query-optimization.md
+++ b/docs/guides/best-practices/query-optimization.md
@@ -30,7 +30,7 @@ In this section, we will look at those tools and how to use them.
To understand query performance, let's look at what happens in ClickHouse when a query is executed.
-The following part is deliberately simplified and takes some shortcuts; the idea here is not to drown you with details but to get you up to speed with the basic concepts. For more information you can read about [query analyzer](/operations/analyzer).
+The following part is deliberately simplified and takes some shortcuts; the idea here isn't to drown you with details but to get you up to speed with the basic concepts. For more information, you can read about the [query analyzer](/operations/analyzer).
From a very high-level standpoint, when ClickHouse executes a query, the following happens:
@@ -60,7 +60,7 @@ We'll use a real example to illustrate how we approach query performances.
Let's use the NYC Taxi dataset, which contains taxi ride data in NYC. First, we start by ingesting the NYC taxi dataset with no optimization.
-Below is the command to create the table and insert data from an S3 bucket. Note that we infer the schema from the data voluntarily, which is not optimized.
+Below is the command to create the table and insert data from an S3 bucket. Note that we deliberately infer the schema from the data, which isn't optimized.
```sql
-- Create table with inferred schema
@@ -325,7 +325,7 @@ The table contains 329.04 million rows, therefore each query is doing a full sca
### Explain statement {#explain-statement}
-Now that we have some long-running queries, let's understand how they are executed. For this, ClickHouse supports the [EXPLAIN statement command](/sql-reference/statements/explain). It is a very useful tool that provides a very detailed view of all the query execution stages without actually running the query. While it can be overwhelming to look at for a non-ClickHouse expert, it remains an essential tool for gaining insight into how your query is executed.
+Now that we have some long-running queries, let's understand how they're executed. For this, ClickHouse supports the [EXPLAIN statement command](/sql-reference/statements/explain). It's a very useful tool that provides a detailed view of all the query execution stages without actually running the query. While it can be overwhelming to look at for a non-ClickHouse expert, it remains an essential tool for gaining insight into how your query is executed.
The documentation provides a detailed [guide](/guides/developer/understanding-query-execution-with-the-analyzer) on what the EXPLAIN statement is and how to use it to analyze your query execution. Rather than repeating what is in this guide, let's focus on a few commands that will help us find bottlenecks in query execution performance.
@@ -465,7 +465,7 @@ pickup_location_id_nulls: 0
dropoff_location_id_nulls: 0
```
-We have only two columns with null values: `mta_tax` and `payment_type`. The rest of the fields should not be using a `Nullable` column.
+We have only two columns with null values: `mta_tax` and `payment_type`. The rest of the fields shouldn't be using a `Nullable` column.
### Low cardinality {#low-cardinality}
@@ -588,7 +588,7 @@ The new table is considerably smaller than the previous one. We see a reduction
Primary keys in ClickHouse work differently than in most traditional database systems. In those systems, primary keys enforce uniqueness and data integrity. Any attempt to insert duplicate primary key values is rejected, and a B-tree or hash-based index is usually created for fast lookup.
-In ClickHouse, the primary key's [objective](/guides/best-practices/sparse-primary-indexes#a-table-with-a-primary-key) is different; it does not enforce uniqueness or help with data integrity. Instead, it is designed to optimize query performance. The primary key defines the order in which the data is stored on disk and is implemented as a sparse index that stores pointers to the first row of each granule.
+In ClickHouse, the primary key's [objective](/guides/best-practices/sparse-primary-indexes#a-table-with-a-primary-key) is different; it doesn't enforce uniqueness or help with data integrity. Instead, it is designed to optimize query performance. The primary key defines the order in which the data is stored on disk and is implemented as a sparse index that stores pointers to the first row of each granule.
> Granules in ClickHouse are the smallest units of data read during query execution. They contain up to a fixed number of rows, determined by index_granularity, with a default value of 8192 rows. Granules are stored contiguously and sorted by the primary key.
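One way to see how many granules a query touches is the `EXPLAIN` statement with index details enabled. This is a sketch only: it assumes the NYC taxi data loaded above lives in a table named `trips` with a datetime column in its primary key; adjust the names to your schema.

```sql
-- Sketch: show which parts/granules the primary index selects for this query.
-- `trips` and `pickup_datetime` are assumed names from the example above.
EXPLAIN indexes = 1
SELECT count()
FROM trips
WHERE pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-02-01';
```

The `Indexes` section of the output reports how many granules were selected out of the total, which is a quick proxy for how well the primary key serves the query.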
diff --git a/docs/guides/best-practices/skipping-indexes.md b/docs/guides/best-practices/skipping-indexes.md
index 42c6f093e16..b4aef967f8a 100644
--- a/docs/guides/best-practices/skipping-indexes.md
+++ b/docs/guides/best-practices/skipping-indexes.md
@@ -20,7 +20,7 @@ Many factors affect ClickHouse query performance. The critical element in most s
Nevertheless, no matter how carefully tuned the primary key, there will inevitably be query use cases that can not efficiently use it. Users commonly rely on ClickHouse for time series type data, but they often wish to analyze that same data according to other business dimensions, such as customer id, website URL, or product number. In that case, query performance can be considerably worse because a full scan of each column value may be required to apply the WHERE clause condition. While ClickHouse is still relatively fast in those circumstances, evaluating millions or billions of individual values will cause "non-indexed" queries to execute much more slowly than those based on the primary key.
-In a traditional relational database, one approach to this problem is to attach one or more "secondary" indexes to a table. This is a b-tree structure that permits the database to find all matching rows on disk in O(log(n)) time instead of O(n) time (a table scan), where n is the number of rows. However, this type of secondary index will not work for ClickHouse (or other column-oriented databases) because there are no individual rows on the disk to add to the index.
+In a traditional relational database, one approach to this problem is to attach one or more "secondary" indexes to a table. This is a b-tree structure that permits the database to find all matching rows on disk in O(log(n)) time instead of O(n) time (a table scan), where n is the number of rows. However, this type of secondary index won't work for ClickHouse (or other column-oriented databases) because there are no individual rows on the disk to add to the index.
Instead, ClickHouse provides a different type of index, which in specific circumstances can significantly improve query speed. These structures are labeled "Skip" indexes because they enable ClickHouse to skip reading significant chunks of data that are guaranteed to have no matching values.
@@ -38,7 +38,7 @@ When a user creates a data skipping index, there will be two additional files in
- `skp_idx_{index_name}.idx`, which contains the ordered expression values
- `skp_idx_{index_name}.mrk2`, which contains the corresponding offsets into the associated data column files.
-If some portion of the WHERE clause filtering condition matches the skip index expression when executing a query and reading the relevant column files, ClickHouse will use the index file data to determine whether each relevant block of data must be processed or can be bypassed (assuming that the block has not already been excluded by applying the primary key). To use a very simplified example, consider the following table loaded with predictable data.
+If some portion of the WHERE clause filtering condition matches the skip index expression when executing a query and reading the relevant column files, ClickHouse will use the index file data to determine whether each relevant block of data must be processed or can be bypassed (assuming that the block hasn't already been excluded by applying the primary key). To use a very simplified example, consider the following table loaded with predictable data.
```sql
CREATE TABLE skip_table
@@ -52,7 +52,7 @@ SETTINGS index_granularity=8192;
INSERT INTO skip_table SELECT number, intDiv(number,4096) FROM numbers(100000000);
```
-When executing a simple query that does not use the primary key, all 100 million entries in the `my_value`
+When executing a simple query that doesn't use the primary key, all 100 million entries in the `my_value`
column are scanned:
```sql
@@ -134,11 +134,11 @@ This type of index only works correctly with a scalar or tuple expression -- the
This lightweight index type accepts a single parameter of the max_size of the value set per block (0 permits
an unlimited number of discrete values). This set contains all values in the block (or is empty if the number of values exceeds the max_size). This index type works well with columns with low cardinality within each set of granules (essentially, "clumped together") but higher cardinality overall.
-The cost, performance, and effectiveness of this index is dependent on the cardinality within blocks. If each block contains a large number of unique values, either evaluating the query condition against a large index set will be very expensive, or the index will not be applied because the index is empty due to exceeding max_size.
+The cost, performance, and effectiveness of this index is dependent on the cardinality within blocks. If each block contains a large number of unique values, either evaluating the query condition against a large index set will be very expensive, or the index won't be applied because the index is empty due to exceeding max_size.
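As one possible shape, reusing the `skip_table` from above (the index name `vix` is illustrative), a set index could be added and built for already-inserted data like this:

```sql
-- Set index capped at 100 distinct values per indexed block
-- (GRANULARITY 2 means the index block spans 2 granules).
ALTER TABLE skip_table ADD INDEX vix my_value TYPE set(100) GRANULARITY 2;
-- Build the index for data that was inserted before the index existed:
ALTER TABLE skip_table MATERIALIZE INDEX vix;
```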
### Bloom filter types {#bloom-filter-types}
-A *Bloom filter* is a data structure that allows space-efficient testing of set membership at the cost of a slight chance of false positives. A false positive is not a significant concern in the case of skip indexes because the only disadvantage is reading a few unnecessary blocks. However, the potential for false positives does mean that the indexed expression should be expected to be true, otherwise valid data may be skipped.
+A *Bloom filter* is a data structure that allows space-efficient testing of set membership at the cost of a slight chance of false positives. A false positive isn't a significant concern in the case of skip indexes because the only disadvantage is reading a few unnecessary blocks. However, the potential for false positives does mean that the indexed expression should be expected to be true, otherwise valid data may be skipped.
Because Bloom filters can more efficiently handle testing for a large number of discrete values, they can be appropriate for conditional expressions that produce more values to test. In particular, a Bloom filter index can be applied to arrays, where every value of the array is tested, and to maps, by converting either the keys or values to an array using the mapKeys or mapValues function.
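A hedged sketch of the map case, with a hypothetical table (none of these names come from this guide): bloom filter indexes can be declared over `mapKeys` or `mapValues` so that membership tests on keys or values can skip blocks.

```sql
-- Illustrative only: bloom_filter indexes over a Map column's keys and values.
CREATE TABLE bf_example
(
    `id` UInt64,
    `attrs` Map(String, String),
    INDEX idx_attr_keys mapKeys(attrs) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_vals mapValues(attrs) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- A query like this can then skip blocks where the key never occurs:
SELECT count() FROM bf_example WHERE has(mapKeys(attrs), 'env');
```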
@@ -162,7 +162,7 @@ The core purpose of data-skipping indexes is to limit the amount of data analyze
* the query is processed and the expression is applied to the stored index values to determine whether to exclude the block.
Each type of skip index works on a subset of available ClickHouse functions appropriate to the index implementation listed
-[here](/engines/table-engines/mergetree-family/mergetree/#functions-support). In general, set indexes and Bloom filter based indexes (another type of set index) are both unordered and therefore do not work with ranges. In contrast, minmax indexes work particularly well with ranges since determining whether ranges intersect is very fast. The efficacy of partial match functions LIKE, startsWith, endsWith, and hasToken depend on the index type used, the index expression, and the particular shape of the data.
+[here](/engines/table-engines/mergetree-family/mergetree/#functions-support). In general, set indexes and Bloom filter based indexes (another type of set index) are both unordered and therefore don't work with ranges. In contrast, minmax indexes work particularly well with ranges since determining whether ranges intersect is very fast. The efficacy of the partial match functions LIKE, startsWith, endsWith, and hasToken depends on the index type used, the index expression, and the particular shape of the data.
## Skip index settings {#skip-index-settings}
@@ -173,12 +173,12 @@ likely to include most granules, applying the data skipping index incurs an unne
0 for queries that are unlikely to benefit from any skip indexes.
* **force_data_skipping_indices** (comma separated list of index names). This setting can be used to prevent some kinds of inefficient
queries. In circumstances where querying a table is too expensive unless a skip index is used, using this setting with one or more index
-names will return an exception for any query that does not use the listed index. This would prevent poorly written queries from
+names will return an exception for any query that doesn't use the listed index. This would prevent poorly written queries from
consuming server resources.
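For example (illustrative only, assuming a skip index named `vix` exists on the table), a query can be forced to fail fast unless the index is actually used:

```sql
-- Throws an exception if the data skipping index `vix` isn't used
-- during execution, rather than silently doing an expensive scan.
SELECT count()
FROM skip_table
WHERE my_value IN (125, 700)
SETTINGS force_data_skipping_indices = 'vix';
```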
## Skip index best practices {#skip-best-practices}
-Skip indexes are not intuitive, especially for those accustomed to secondary row-based indexes from the RDMS realm or inverted indexes from document stores. To get any benefit, applying a ClickHouse data skipping index must avoid enough granule reads to offset the cost of calculating the index. Critically, if a value occurs even once in an indexed block, it means the entire block must be read into memory and evaluated, and the index cost has been needlessly incurred.
+Skip indexes aren't intuitive, especially for those accustomed to secondary row-based indexes from the RDBMS realm or inverted indexes from document stores. To get any benefit, applying a ClickHouse data skipping index must avoid enough granule reads to offset the cost of calculating the index. Critically, if a value occurs even once in an indexed block, it means the entire block must be read into memory and evaluated, and the index cost has been needlessly incurred.
Consider the following data distribution:
@@ -215,7 +215,7 @@ important for searches. A set skip index on the error_code column would allow b
errors and therefore significantly improve error focused queries.
Finally, the key best practice is to test, test, test. Again, unlike b-tree secondary indexes or inverted indexes for searching documents,
-data skipping index behavior is not easily predictable. Adding them to a table incurs a meaningful cost both on data ingest and on queries
+data skipping index behavior isn't easily predictable. Adding them to a table incurs a meaningful cost both on data ingest and on queries
that for any number of reasons don't benefit from the index. They should always be tested on real world type of data, and testing should
include variations of the type, granularity size and other parameters. Testing will often reveal patterns and pitfalls that aren't obvious from
thought experiments alone.
diff --git a/docs/guides/best-practices/sparse-primary-indexes.md b/docs/guides/best-practices/sparse-primary-indexes.md
index 333ff2c850e..73671cbe1fb 100644
--- a/docs/guides/best-practices/sparse-primary-indexes.md
+++ b/docs/guides/best-practices/sparse-primary-indexes.md
@@ -1,7 +1,7 @@
---
sidebar_label: 'Primary indexes'
sidebar_position: 1
-description: 'In this guide we are going to do a deep dive into ClickHouse indexing.'
+description: 'In this guide we''re going to do a deep dive into ClickHouse indexing.'
title: 'A Practical Introduction to Primary Indexes in ClickHouse'
slug: /guides/best-practices/sparse-primary-indexes
show_related_blogs: true
@@ -39,7 +39,7 @@ import Image from '@theme/IdealImage';
## Introduction {#introduction}
-In this guide we are going to do a deep dive into ClickHouse indexing. We will illustrate and discuss in detail:
+In this guide we're going to do a deep dive into ClickHouse indexing. We will illustrate and discuss in detail:
- [how indexing in ClickHouse is different from traditional relational database management systems](#an-index-design-for-massive-data-scales)
- [how ClickHouse is building and using a table's sparse primary index](#a-table-with-a-primary-key)
- [what some of the best practices are for indexing in ClickHouse](#using-multiple-primary-indexes)
@@ -113,7 +113,7 @@ OPTIMIZE TABLE hits_NoPrimaryKey FINAL;
```
:::note
-In general it is not required nor recommended to immediately optimize a table
+In general it's neither required nor recommended to immediately optimize a table
after loading data into it. Why this is necessary for this example will become apparent.
:::
@@ -326,7 +326,7 @@ ClickHouse is a
-When the UserID has high cardinality then it is unlikely that the same UserID value is spread over multiple table rows and granules. This means the URL values for the index marks are not monotonically increasing:
+When the UserID has high cardinality, it's unlikely that the same UserID value is spread over multiple table rows and granules. This means the URL values for the index marks aren't monotonically increasing:
As we can see in the diagram above, all shown marks whose URL values are smaller than W3 are getting selected for streaming its associated granule's rows into the ClickHouse engine.
-This is because whilst all index marks in the diagram fall into scenario 1 described above, they do not satisfy the mentioned exclusion-precondition that *the directly succeeding index mark has the same UserID value as the current mark* and thus can't be excluded.
+This is because whilst all index marks in the diagram fall into scenario 1 described above, they don't satisfy the mentioned exclusion-precondition that *the directly succeeding index mark has the same UserID value as the current mark* and thus can't be excluded.
For example, consider index mark 0 for which the **URL value is smaller than W3 and for which the URL value of the directly succeeding index mark is also smaller than W3**. This can *not* be excluded because the directly succeeding index mark 1 does *not* have the same UserID value as the current mark 0.
@@ -797,10 +797,10 @@ This ultimately prevents ClickHouse from making assumptions about the maximum UR
The same scenario is true for mark 1, 2, and 3.
:::note Conclusion
-The generic exclusion search algorithm that ClickHouse is using instead of the binary search algorithm when a query is filtering on a column that is part of a compound key, but is not the first key column is most effective when the predecessor key column has low(er) cardinality.
+The generic exclusion search algorithm that ClickHouse uses instead of the binary search algorithm when a query is filtering on a column that is part of a compound key, but isn't the first key column, is most effective when the predecessor key column has low(er) cardinality.
:::
-In our sample data set both key columns (UserID, URL) have similar high cardinality, and, as explained, the generic exclusion search algorithm is not very effective when the predecessor key column of the URL column has a high(er) or similar cardinality.
+In our sample data set both key columns (UserID, URL) have similar high cardinality, and, as explained, the generic exclusion search algorithm isn't very effective when the predecessor key column of the URL column has a high(er) or similar cardinality.
### Note about data skipping index {#note-about-data-skipping-index}
@@ -867,7 +867,7 @@ In the following we discuss this three options for creating and using multiple p
### Option 1: Secondary Tables {#option-1-secondary-tables}
-We are creating a new additional table where we switch the order of the key columns (compared to our original table) in the primary key:
+We're creating a new additional table where we switch the order of the key columns (compared to our original table) in the primary key:
```sql
CREATE TABLE hits_URL_UserID
@@ -947,7 +947,7 @@ Processed 319.49 thousand rows,
Now, instead of [almost doing a full table scan](/guides/best-practices/sparse-primary-indexes#efficient-filtering-on-secondary-key-columns), ClickHouse executed that query much more effectively.
-With the primary index from the [original table](#a-table-with-a-primary-key) where UserID was the first, and URL the second key column, ClickHouse used a [generic exclusion search](/guides/best-practices/sparse-primary-indexes#generic-exclusion-search-algorithm) over the index marks for executing that query and that was not very effective because of the similarly high cardinality of UserID and URL.
+With the primary index from the [original table](#a-table-with-a-primary-key) where UserID was the first, and URL the second key column, ClickHouse used a [generic exclusion search](/guides/best-practices/sparse-primary-indexes#generic-exclusion-search-algorithm) over the index marks for executing that query and that wasn't very effective because of the similarly high cardinality of UserID and URL.
With URL as the first column in the primary index, ClickHouse is now running binary search over the index marks.
The corresponding trace log in the ClickHouse server log file confirms that:
@@ -968,7 +968,7 @@ ClickHouse selected only 39 index marks, instead of 1076 when generic exclusion
Note that the additional table is optimized for speeding up the execution of our example query filtering on URLs.
-Similar to the [bad performance](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient) of that query with our [original table](#a-table-with-a-primary-key), our [example query filtering on `UserIDs`](#the-primary-index-is-used-for-selecting-granules) will not run very effectively with the new additional table, because UserID is now the second key column in the primary index of that table and therefore ClickHouse will use generic exclusion search for granule selection, which is [not very effective for similarly high cardinality](/guides/best-practices/sparse-primary-indexes#generic-exclusion-search-algorithm) of UserID and URL.
+Similar to the [bad performance](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient) of that query with our [original table](#a-table-with-a-primary-key), our [example query filtering on `UserIDs`](#the-primary-index-is-used-for-selecting-granules) won't run very effectively with the new additional table, because UserID is now the second key column in the primary index of that table and therefore ClickHouse will use generic exclusion search for granule selection, which is [not very effective for similarly high cardinality](/guides/best-practices/sparse-primary-indexes#generic-exclusion-search-algorithm) of UserID and URL.
Open the details box for specifics.
@@ -1130,11 +1130,11 @@ ALTER TABLE hits_UserID_URL
:::note
- the projection is creating a **hidden table** whose row order and primary index is based on the given `ORDER BY` clause of the projection
-- the hidden table is not listed by the `SHOW TABLES` query
+- the hidden table isn't listed by the `SHOW TABLES` query
- we use the `MATERIALIZE` keyword in order to immediately populate the hidden table with all 8.87 million rows from the source table [hits_UserID_URL](#a-table-with-a-primary-key)
- if new rows are inserted into the source table hits_UserID_URL, then those rows are automatically also inserted into the hidden table
- a query is always (syntactically) targeting the source table hits_UserID_URL, but if the row order and primary index of the hidden table allows a more effective query execution, then that hidden table will be used instead
-- please note that projections do not make queries that use ORDER BY more efficient, even if the ORDER BY matches the projection's ORDER BY statement (see https://github.com/ClickHouse/ClickHouse/issues/47333)
+- please note that projections don't make queries that use ORDER BY more efficient, even if the ORDER BY matches the projection's ORDER BY statement (see https://github.com/ClickHouse/ClickHouse/issues/47333)
- Effectively the implicitly created hidden table has the same row order and primary index as the [secondary table that we created explicitly](/guides/best-practices/sparse-primary-indexes#option-1-secondary-tables):
@@ -1199,7 +1199,7 @@ The corresponding trace log in the ClickHouse server log file confirms that Clic
### Summary {#summary}
-The primary index of our [table with compound primary key (UserID, URL)](#a-table-with-a-primary-key) was very useful for speeding up a [query filtering on UserID](#the-primary-index-is-used-for-selecting-granules). But that index is not providing significant help with speeding up a [query filtering on URL](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient), despite the URL column being part of the compound primary key.
+The primary index of our [table with compound primary key (UserID, URL)](#a-table-with-a-primary-key) was very useful for speeding up a [query filtering on UserID](#the-primary-index-is-used-for-selecting-granules). But that index isn't providing significant help with speeding up a [query filtering on URL](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient), despite the URL column being part of the compound primary key.
And vice versa:
The primary index of our [table with compound primary key (URL, UserID)](/guides/best-practices/sparse-primary-indexes#option-1-secondary-tables) was speeding up a [query filtering on URL](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient), but didn't provide much support for a [query filtering on UserID](#the-primary-index-is-used-for-selecting-granules).
@@ -1225,9 +1225,9 @@ where each row contains three columns that indicate whether or not the access by
We will use a compound primary key containing all three aforementioned columns that could be used to speed up typical web analytics queries that calculate
- how much (percentage of) traffic to a specific URL is from bots or
-- how confident we are that a specific user is (not) a bot (what percentage of traffic from that user is (not) assumed to be bot traffic)
+- how confident we are that a specific user is (not) a bot (what percentage of traffic from that user is (not) assumed to be bot traffic)
-We use this query for calculating the cardinalities of the three columns that we want to use as key columns in a compound primary key (note that we are using the [URL table function](/sql-reference/table-functions/url.md) for querying TSV data ad hoc without having to create a local table). Run this query in `clickhouse client`:
+We use this query for calculating the cardinalities of the three columns that we want to use as key columns in a compound primary key (note that we're using the [URL table function](/sql-reference/table-functions/url.md) for querying TSV data ad hoc without having to create a local table). Run this query in `clickhouse client`:
```sql
SELECT
formatReadableQuantity(uniq(URL)) AS cardinality_URL,
@@ -1254,7 +1254,7 @@ The response is:
We can see that there is a big difference between the cardinalities, especially between the `URL` and `IsRobot` columns, and therefore the order of these columns in a compound primary key is significant for both the efficient speed up of queries filtering on that columns and for achieving optimal compression ratios for the table's column data files.
-In order to demonstrate that we are creating two table versions for our bot traffic analysis data:
+In order to demonstrate that, we're creating two table versions for our bot traffic analysis data:
- a table `hits_URL_UserID_IsRobot` with the compound primary key `(URL, UserID, IsRobot)` where we order the key columns by cardinality in descending order
- a table `hits_IsRobot_UserID_URL` with the compound primary key `(IsRobot, UserID, URL)` where we order the key columns by cardinality in ascending order
@@ -1316,7 +1316,7 @@ The response is:
When a query is filtering on at least one column that is part of a compound key, and is the first key column, [then ClickHouse is running the binary search algorithm over the key column's index marks](#the-primary-index-is-used-for-selecting-granules).
-When a query is filtering (only) on a column that is part of a compound key, but is not the first key column, [then ClickHouse is using the generic exclusion search algorithm over the key column's index marks](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient).
+When a query is filtering (only) on a column that is part of a compound key, but isn't the first key column, [then ClickHouse is using the generic exclusion search algorithm over the key column's index marks](/guides/best-practices/sparse-primary-indexes#secondary-key-columns-can-not-be-inefficient).
For the second case the ordering of the key columns in the compound primary key is significant for the effectiveness of the [generic exclusion search algorithm](https://github.com/ClickHouse/ClickHouse/blob/22.3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L1444).
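One way (a sketch, not the only one) to observe which algorithm ClickHouse picked is to run the query with trace-level server logs forwarded to the client and look for "binary search" versus "generic exclusion search" messages. The table name matches the example above; the UserID value is made up for illustration.

```sql
-- Forward trace-level server logs to the client session:
SET send_logs_level = 'trace';

-- The trace log then indicates how index marks were selected:
SELECT count()
FROM hits_IsRobot_UserID_URL
WHERE UserID = 112304;
```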
diff --git a/docs/guides/developer/cascading-materialized-views.md b/docs/guides/developer/cascading-materialized-views.md
index 6333a8d79ad..228256f9440 100644
--- a/docs/guides/developer/cascading-materialized-views.md
+++ b/docs/guides/developer/cascading-materialized-views.md
@@ -122,11 +122,11 @@ GROUP BY
```
:::note
-A common misinterpretation when working with Materialized views is that data is read from the table, This is not how `Materialized views` work; the data forwarded is the inserted block, not the final result in your table.
+A common misinterpretation when working with Materialized views is that data is read from the table. This isn't how `Materialized views` work; the data forwarded is the inserted block, not the final result in your table.
-Let's imagine in this example that the engine used in `monthly_aggregated_data` is a CollapsingMergeTree, the data forwarded to our second Materialized view `year_aggregated_data_mv` will not be the final result of the collapsed table, it will forward the block of data with the fields defined as in the `SELECT ... GROUP BY`.
+Let's imagine in this example that the engine used in `monthly_aggregated_data` is a CollapsingMergeTree. The data forwarded to our second Materialized view `year_aggregated_data_mv` won't be the final result of the collapsed table; it will forward the block of data with the fields defined in the `SELECT ... GROUP BY`.
-If you are using CollapsingMergeTree, ReplacingMergeTree, or even SummingMergeTree and you plan to create a cascade Materialized view you need to understand the limitations described here.
+If you're using CollapsingMergeTree, ReplacingMergeTree, or even SummingMergeTree and you plan to create a cascade Materialized view you need to understand the limitations described here.
:::
## Sample data {#sample-data}
@@ -153,11 +153,11 @@ Ok.
0 rows in set. Elapsed: 0.002 sec.
```
-We have used a small dataset to be sure we can follow and compare the result with what we are expecting, once your flow is correct with a small data set, you could just move to a large amount of data.
+We've used a small dataset to be sure we can follow and compare the result with what we're expecting. Once your flow is correct with a small dataset, you can move on to a large amount of data.
## Results {#results}
-If you try to query the target table by selecting the `sumCountViews` field, you will see the binary representation (in some terminals), as the value is not stored as a number but as an AggregateFunction type.
+If you try to query the target table by selecting the `sumCountViews` field, you will see the binary representation (in some terminals), as the value isn't stored as a number but as an AggregateFunction type.
To get the final result of the aggregation you should use the `-Merge` suffix.
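As a sketch of what this looks like, assuming the target table and `sumCountViews` column from the example above (the grouping columns here are illustrative):

```sql
-- sumCountViews is stored as an AggregateFunction state, not a number;
-- applying the -Merge combinator (sumMerge) finalizes the aggregation.
SELECT
    sumMerge(sumCountViews) AS countViews
FROM monthly_aggregated_data
GROUP BY ALL;
```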
You can see the special characters stored in AggregateFunction with this query:
diff --git a/docs/guides/developer/deduplicating-inserts-on-retries.md b/docs/guides/developer/deduplicating-inserts-on-retries.md
index 034c5582427..3bbfd7adadc 100644
--- a/docs/guides/developer/deduplicating-inserts-on-retries.md
+++ b/docs/guides/developer/deduplicating-inserts-on-retries.md
@@ -6,9 +6,9 @@ keywords: ['deduplication', 'deduplicate', 'insert retries', 'inserts']
doc_type: 'guide'
---
-Insert operations can sometimes fail due to errors such as timeouts. When inserts fail, data may or may not have been successfully inserted. This guide covers how to enable deduplication on insert retries such that the same data does not get inserted more than once.
+Insert operations can sometimes fail due to errors such as timeouts. When inserts fail, data may or may not have been successfully inserted. This guide covers how to enable deduplication on insert retries such that the same data doesn't get inserted more than once.
-When an insert is retried, ClickHouse tries to determine whether the data has already been successfully inserted. If the inserted data is marked as a duplicate, ClickHouse does not insert it into the destination table. However, the user will still receive a successful operation status as if the data had been inserted normally.
+When an insert is retried, ClickHouse tries to determine whether the data has already been successfully inserted. If the inserted data is marked as a duplicate, ClickHouse doesn't insert it into the destination table. However, the user will still receive a successful operation status as if the data had been inserted normally.
## Limitations {#limitations}
@@ -32,19 +32,19 @@ The settings above determine the parameters of the deduplication log for a table
### Query-level insert deduplication {#query-level-insert-deduplication}
-The setting `insert_deduplicate=1` enables deduplication at the query level. Note that if you insert data with `insert_deduplicate=0`, that data cannot be deduplicated even if you retry an insert with `insert_deduplicate=1`. This is because the `block_id`s are not written for blocks during inserts with `insert_deduplicate=0`.
+The setting `insert_deduplicate=1` enables deduplication at the query level. Note that if you insert data with `insert_deduplicate=0`, that data can't be deduplicated even if you retry an insert with `insert_deduplicate=1`. This is because the `block_id`s aren't written for blocks during inserts with `insert_deduplicate=0`.
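A minimal sketch of query-level deduplication on a retry (the table `dst` is a stand-in):

```sql
-- Enable insert deduplication for the session
SET insert_deduplicate = 1;

-- First insert writes the block and records its block_id in the
-- deduplication log
INSERT INTO dst VALUES (1, 'A');

-- An identical retried insert produces the same block_id, is treated
-- as a duplicate, and is skipped - while still returning Ok
INSERT INTO dst VALUES (1, 'A');
```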
## How insert deduplication works {#how-insert-deduplication-works}
When data is inserted into ClickHouse, it splits data into blocks based on the number of rows and bytes.
-For tables using `*MergeTree` engines, each block is assigned a unique `block_id`, which is a hash of the data in that block. This `block_id` is used as a unique key for the insert operation. If the same `block_id` is found in the deduplication log, the block is considered a duplicate and is not inserted into the table.
+For tables using `*MergeTree` engines, each block is assigned a unique `block_id`, which is a hash of the data in that block. This `block_id` is used as a unique key for the insert operation. If the same `block_id` is found in the deduplication log, the block is considered a duplicate and isn't inserted into the table.
This approach works well for cases where inserts contain different data. However, if the same data is inserted multiple times intentionally, you need to use the `insert_deduplication_token` setting to control the deduplication process. This setting allows you to specify a unique token for each insert, which ClickHouse uses to determine whether the data is a duplicate.
For `INSERT ... VALUES` queries, splitting the inserted data into blocks is deterministic and is determined by settings. Therefore, you should retry insertions with the same settings values as the initial operation.
-For `INSERT ... SELECT` queries, it's important that the `SELECT` part of the query returns the same data in the same order for each operation. Note, this is hard to achieve in practice. To ensure stable data order on retries, define a `ORDER BY ALL` section in the `SELECT` part of the query. Right now you have to use exactly `ORDER BY ALL` in the query. Support for `ORDER BY` is not implemented yet and the `SELECT` part of the query would not be considered as stable. Keep in mind that it is possible that the selected table could be updated between retries - the result data could have changed and deduplication will not occur. Additionally, in situations where you are inserting large amounts of data, it is possible that the number of blocks after inserts can overflow the deduplication log window, and ClickHouse won't know to deduplicate the blocks.
+For `INSERT ... SELECT` queries, it's important that the `SELECT` part of the query returns the same data in the same order for each operation. Note that this is hard to achieve in practice. To ensure stable data order on retries, define an `ORDER BY ALL` clause in the `SELECT` part of the query. Right now you have to use exactly `ORDER BY ALL`: support for plain `ORDER BY` isn't implemented yet, and with it the `SELECT` part of the query wouldn't be considered stable. Keep in mind that the selected table could be updated between retries - the result data could have changed and deduplication won't occur. Additionally, when you're inserting large amounts of data, it's possible that the number of blocks after inserts can overflow the deduplication log window, and ClickHouse won't know to deduplicate the blocks.
Right now, the behavior for `INSERT ... SELECT` is controlled by the `insert_select_deduplicate` setting. This setting determines whether deduplication is applied to data inserted using `INSERT ... SELECT` queries. See the linked documentation for details and usage examples.
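The retry pattern described above can be sketched as follows (`src` and `dst` are hypothetical tables):

```sql
-- ORDER BY ALL makes the selected row order stable, so a retried
-- INSERT ... SELECT splits into the same blocks with the same block_ids
-- and the duplicates are recognized
INSERT INTO dst
SELECT *
FROM src
ORDER BY ALL;
```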
## Insert deduplication with materialized views {#insert-deduplication-with-materialized-views}
@@ -66,7 +66,7 @@ When inserting blocks into tables under materialized views, ClickHouse calculate
### Identical blocks after materialized view transformations {#identical-blocks-after-materialized-view-transformations}
-Identical blocks, which have been generated during transformation inside a materialized view, are not deduplicated because they are based on different inserted data.
+Identical blocks, which have been generated during transformation inside a materialized view, aren't deduplicated because they're based on different inserted data.
Here is an example:
@@ -100,7 +100,7 @@ SET min_insert_block_size_rows=0;
SET min_insert_block_size_bytes=0;
```
-The settings above allow us to select from a table with a series of blocks containing only one row. These small blocks are not squashed and remain the same until they are inserted into a table.
+The settings above allow us to select from a table with a series of blocks containing only one row. These small blocks aren't squashed and remain the same until they're inserted into a table.
```sql
SET deduplicate_blocks_in_dependent_materialized_views=1;
@@ -141,7 +141,7 @@ ORDER BY all;
└─────┴───────┴───────────┘
```
-Here we see that 2 parts have been inserted into the `mv_dst` table. That parts contain the same data, however they are not deduplicated.
+Here we see that 2 parts have been inserted into the `mv_dst` table. Those parts contain the same data; however, they aren't deduplicated.
```sql
INSERT INTO dst SELECT
@@ -211,7 +211,7 @@ ORDER BY all;
└────────────┴─────┴───────┴───────────┘
```
-With the settings above, two blocks result from select– as a result, there should be two blocks for insertion into table `dst`. However, we see that only one block has been inserted into table `dst`. This occurred because the second block has been deduplicated. It has the same data and the key for deduplication `block_id` which is calculated as a hash from the inserted data. This behaviour is not what was expected. Such cases are a rare occurrence, but theoretically is possible. In order to handle such cases correctly, the user has to provide a `insert_deduplication_token`. Let's fix this with the following examples:
+With the settings above, two blocks result from the `SELECT`; as a result, there should be two blocks for insertion into table `dst`. However, we see that only one block has been inserted into table `dst`. This occurred because the second block has been deduplicated: it has the same data, and the key for deduplication, `block_id`, is calculated as a hash of the inserted data. This behaviour isn't what was expected. Such cases are rare, but theoretically possible. To handle such cases correctly, the user has to provide an `insert_deduplication_token`. Let's fix this with the following examples:
### Identical blocks in insertion with `insert_deduplication_token` {#identical-blocks-in-insertion-with-insert_deduplication_token}
@@ -300,7 +300,7 @@ ORDER BY all;
└────────────┴─────┴───────┴───────────┘
```
-That insertion is also deduplicated even though it contains different inserted data. Note that `insert_deduplication_token` has higher priority: ClickHouse does not use the hash sum of data when `insert_deduplication_token` is provided.
+That insertion is also deduplicated even though it contains different inserted data. Note that `insert_deduplication_token` has higher priority: ClickHouse doesn't use the hash sum of data when `insert_deduplication_token` is provided.
### Different insert operations generate the same data after transformation in the underlying table of the materialized view {#different-insert-operations-generate-the-same-data-after-transformation-in-the-underlying-table-of-the-materialized-view}
@@ -384,7 +384,7 @@ ORDER by all;
└───────────────┴─────┴───────┴───────────┘
```
-We insert different data each time. However, the same data is inserted into the `mv_dst` table. Data is not deduplicated because the source data was different.
+We insert different data each time. However, the same data is inserted into the `mv_dst` table. Data isn't deduplicated because the source data was different.
### Different materialized view inserts into one underlying table with equivalent data {#different-materialized-view-inserts-into-one-underlying-table-with-equivalent-data}
diff --git a/docs/guides/developer/deduplication.md b/docs/guides/developer/deduplication.md
index 6424f38aa8d..8b9366245f5 100644
--- a/docs/guides/developer/deduplication.md
+++ b/docs/guides/developer/deduplication.md
@@ -15,7 +15,7 @@ import Image from '@theme/IdealImage';
**Deduplication** refers to the process of ***removing duplicate rows of a dataset***. In an OLTP database, this is done easily because each row has a unique primary key - but at the cost of slower inserts. Every inserted row needs to first be searched for and, if found, needs to be replaced.
-ClickHouse is built for speed when it comes to data insertion. The storage files are immutable and ClickHouse does not check for an existing primary key before inserting a row-so deduplication involves a bit more effort. This also means that deduplication is not immediate-it is **eventual**, which has a few side effects:
+ClickHouse is built for speed when it comes to data insertion. The storage files are immutable and ClickHouse doesn't check for an existing primary key before inserting a row - so deduplication involves a bit more effort. This also means that deduplication isn't immediate - it is **eventual**, which has a few side effects:
- At any moment in time your table can still have duplicates (rows with the same sorting key)
- The actual removal of duplicate rows occurs during the merging of parts
@@ -35,7 +35,7 @@ Deduplication is implemented in ClickHouse using the following table engines:
1. `ReplacingMergeTree` table engine: with this table engine, duplicate rows with the same sorting key are removed during merges. `ReplacingMergeTree` is a good option for emulating upsert behavior (where you want queries to return the last row inserted).
-2. Collapsing rows: the `CollapsingMergeTree` and `VersionedCollapsingMergeTree` table engines use a logic where an existing row is "canceled" and a new row is inserted. They are more complex to implement than `ReplacingMergeTree`, but your queries and aggregations can be simpler to write without worrying about whether or not data has been merged yet. These two table engines are useful when you need to update data frequently.
+2. Collapsing rows: the `CollapsingMergeTree` and `VersionedCollapsingMergeTree` table engines use a logic where an existing row is "canceled" and a new row is inserted. They're more complex to implement than `ReplacingMergeTree`, but your queries and aggregations can be simpler to write without worrying about whether or not data has been merged yet. These two table engines are useful when you need to update data frequently.
We walk through both of these techniques below. For more details, check out our free on-demand [Deleting and Updating Data training module](https://learn.clickhouse.com/visitor_catalog_class/show/1328954/?utm_source=clickhouse&utm_medium=docs).
@@ -88,7 +88,7 @@ FROM hackernews_rmt
└────┴─────────┴─────────────────┴───────┘
```
-The separate boxes above in the output demonstrate the two parts behind-the-scenes - this data has not been merged yet, so the duplicate rows have not been removed yet. Let's use the `FINAL` keyword in the `SELECT` query, which results in a logical merging of the query result:
+The separate boxes above in the output demonstrate the two parts behind the scenes - this data hasn't been merged yet, so the duplicate rows haven't been removed yet. Let's use the `FINAL` keyword in the `SELECT` query, which results in a logical merging of the query result:
```sql
SELECT *
@@ -106,7 +106,7 @@ FINAL
The result only has 2 rows, and the last row inserted is the row that gets returned.
:::note
-Using `FINAL` works okay if you have a small amount of data. If you are dealing with a large amount of data,
+Using `FINAL` works okay if you have a small amount of data. If you're dealing with a large amount of data,
using `FINAL` is probably not the best option. Let's discuss a better option for
finding the latest value of a column.
:::
@@ -191,7 +191,7 @@ What is the sign column of a `CollapsingMergeTree` table? It represents the _sta
- If two rows have the same primary key (or sort order if that is different than the primary key), but different values of the sign column, then the last row inserted with a +1 becomes the state row and the other rows cancel each other
- Rows that cancel each other out are deleted during merges
-- Rows that do not have a matching pair are kept
+- Rows that don't have a matching pair are kept
Let's add a row to the `hackernews_views` table. Since it is the only row for this primary key, we set its state to 1:
@@ -239,10 +239,10 @@ FINAL
└─────┴─────────┴───────┴──────┘
```
-But of course, using `FINAL` is not recommended for large tables.
+But of course, using `FINAL` isn't recommended for large tables.
:::note
-The value passed in for the `views` column in our example is not really needed, nor does it have to match the current value of `views` of the old row. In fact, you can cancel a row with just the primary key and a -1:
+The value passed in for the `views` column in our example isn't really needed, nor does it have to match the current value of `views` of the old row. In fact, you can cancel a row with just the primary key and a -1:
```sql
INSERT INTO hackernews_views(id, author, sign) VALUES
@@ -252,7 +252,7 @@ INSERT INTO hackernews_views(id, author, sign) VALUES
## Real-time updates from multiple threads {#real-time-updates-from-multiple-threads}
-With a `CollapsingMergeTree` table, rows cancel each other using a sign column, and the state of a row is determined by the last row inserted. But this can be problematic if you are inserting rows from different threads where rows can be inserted out of order. Using the "last" row does not work in this situation.
+With a `CollapsingMergeTree` table, rows cancel each other using a sign column, and the state of a row is determined by the last row inserted. But this can be problematic if you're inserting rows from different threads where rows can be inserted out of order. Using the "last" row doesn't work in this situation.
This is where `VersionedCollapsingMergeTree` comes in handy - it collapses rows just like `CollapsingMergeTree`, but instead of keeping the last row inserted, it keeps the row with the highest value of a version column that you specify.
@@ -273,8 +273,8 @@ PRIMARY KEY (id, author)
Notice the table uses `VersionedCollapsingMergeTree` as the engine and passes in the **sign column** and a **version column**. Here is how the table works:
- It deletes each pair of rows that have the same primary key and version and different sign
-- The order that rows were inserted does not matter
-- Note that if the version column is not a part of the primary key, ClickHouse adds it to the primary key implicitly as the last field
+- The order that rows were inserted doesn't matter
+- Note that if the version column isn't a part of the primary key, ClickHouse adds it to the primary key implicitly as the last field
You use the same type of logic when writing queries - group by the primary key and use clever logic to avoid rows that have been canceled but not deleted yet. Let's add some rows to the `hackernews_views_vcmt` table:
@@ -342,6 +342,6 @@ A `VersionedCollapsingMergeTree` table is quite handy when you want to implement
## Why aren't my rows being deduplicated? {#why-arent-my-rows-being-deduplicated}
-One reason inserted rows may not be deduplicated is if you are using a non-idempotent function or expression in your `INSERT` statement. For example, if you are inserting rows with the column `createdAt DateTime64(3) DEFAULT now()`, your rows are guaranteed to be unique because each row will have a unique default value for the `createdAt` column. The MergeTree / ReplicatedMergeTree table engine will not know to deduplicate the rows as each inserted row will generate a unique checksum.
+One reason inserted rows may not be deduplicated is if you're using a non-idempotent function or expression in your `INSERT` statement. For example, if you're inserting rows with the column `createdAt DateTime64(3) DEFAULT now()`, your rows are guaranteed to be unique because each row will have a unique default value for the `createdAt` column. The MergeTree / ReplicatedMergeTree table engine won't know to deduplicate the rows as each inserted row will generate a unique checksum.
-In this case, you can specify your own `insert_deduplication_token` for each batch of rows to ensure that multiple inserts of the same batch will not result in the same rows being re-inserted. Please see the [documentation on `insert_deduplication_token`](/operations/settings/settings#insert_deduplication_token) for more details about how to use this setting.
+In this case, you can specify your own `insert_deduplication_token` for each batch of rows to ensure that multiple inserts of the same batch won't result in the same rows being re-inserted. Please see the [documentation on `insert_deduplication_token`](/operations/settings/settings#insert_deduplication_token) for more details about how to use this setting.
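A sketch of the fix for the `now()` example above (the `events` table and its columns are hypothetical):

```sql
-- createdAt DEFAULT now() makes each retry hash differently, so
-- hash-based deduplication can't detect the retry. Supplying the same
-- token for the retried batch restores deduplication.
INSERT INTO events (id, message)
SETTINGS insert_deduplication_token = 'batch-001'
VALUES (1, 'hello');

-- Retry with the same token: skipped as a duplicate even though the
-- generated createdAt value differs
INSERT INTO events (id, message)
SETTINGS insert_deduplication_token = 'batch-001'
VALUES (1, 'hello');
```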
diff --git a/docs/guides/developer/dynamic-column-selection.md b/docs/guides/developer/dynamic-column-selection.md
index 1155c4c279a..7927cc0b328 100644
--- a/docs/guides/developer/dynamic-column-selection.md
+++ b/docs/guides/developer/dynamic-column-selection.md
@@ -151,7 +151,7 @@ FROM nyc_taxi.trips;
## Replacing columns {#replacing-columns}
-So far so good. But let’s say we want to adjust one of the values, while leaving the other ones as they are. For example, maybe we want to double the total amount and divide the MTA tax by 1.1. We can do that by using the [`REPLACE`](/sql-reference/statements/select) modifier, which will replace a column while leaving the other ones as they are.
+So far so good. But let’s say we want to adjust one of the values, while leaving the other ones as they are. For example, maybe we want to double the total amount and divide the MTA tax by 1.1. We can do that by using the [`REPLACE`](/sql-reference/statements/select) modifier, which will replace a column while leaving the other ones as they are.
```sql
FROM nyc_taxi.trips
diff --git a/docs/guides/developer/index.md b/docs/guides/developer/index.md
index b8f04b79264..276aceb0aa8 100644
--- a/docs/guides/developer/index.md
+++ b/docs/guides/developer/index.md
@@ -20,6 +20,6 @@ This section contains the following advanced guides:
| [Deduplication strategies](../developer/deduplication) | A guide which dives into data deduplication, a technique for removing duplicate rows from your database. Explains differences from primary key-based deduplication in OLTP systems, ClickHouse's approach to deduplication and how to handle duplicate data scenarios within your ClickHouse queries. |
| [Filling gaps in time-series data](../developer/time-series-filling-gaps) | A guide which provides insights into ClickHouse's capabilities for handling time-series data, including techniques for filling gaps in data to create a more complete and continuous representation of time-series information. |
| [Manage Data with TTL (Time-to-live)](../developer/ttl) | A guide discussing how to use the `WITH FILL` clause to fill gaps in time-series data. It covers how to fill gaps with 0 values, how to specify a starting point for filling gaps, how to fill gaps up to a specific end point, and how to interpolate values for cumulative calculations. |
-| [Stored procedures & query parameters](../developer/stored-procedures-and-prepared-statements) | A guide explaining that ClickHouse does not support traditional stored procedures, and provides recommended alternatives including User-Defined Functions (UDFs), parameterized views, materialized views, and external orchestration. Also covers query parameters for safe parameterized queries (similar to prepared statements). |
+| [Stored procedures & query parameters](../developer/stored-procedures-and-prepared-statements) | A guide explaining that ClickHouse doesn't support traditional stored procedures, and provides recommended alternatives including User-Defined Functions (UDFs), parameterized views, materialized views, and external orchestration. Also covers query parameters for safe parameterized queries (similar to prepared statements). |
| [Understanding query execution with the Analyzer](../developer/understanding-query-execution-with-the-analyzer) | A guide which demystifies ClickHouse query execution by introducing the analyzer tool. It explains how the analyzer breaks down a query into a series of steps, allowing you to visualize and troubleshoot the entire execution process for optimal performance. |
| [Using JOINs in ClickHouse](../joining-tables) | A guide that simplifies joining tables in ClickHouse. It covers different join types (`INNER`, `LEFT`, `RIGHT`, etc.), explores best practices for efficient joins (like placing smaller tables on the right), and provides insights on ClickHouse's internal join algorithms to help you optimize your queries for complex data relationships. |
diff --git a/docs/guides/developer/on-fly-mutations.md b/docs/guides/developer/on-fly-mutations.md
index a28689c01b0..bc818b296f2 100644
--- a/docs/guides/developer/on-fly-mutations.md
+++ b/docs/guides/developer/on-fly-mutations.md
@@ -9,7 +9,7 @@ doc_type: 'guide'
## On-the-fly mutations {#on-the-fly-mutations}
-When on-the-fly mutations are enabled, updated rows are marked as updated immediately and subsequent `SELECT` queries will automatically return with the changed values. When on-the-fly mutations are not enabled, you may have to wait for your mutations to be applied via a background process to see the changed values.
+When on-the-fly mutations are enabled, updated rows are marked as updated immediately and subsequent `SELECT` queries will automatically return with the changed values. When on-the-fly mutations aren't enabled, you may have to wait for your mutations to be applied via a background process to see the changed values.
On-the-fly mutations can be enabled for `MergeTree`-family tables by enabling the query-level setting `apply_mutations_on_fly`.
@@ -48,7 +48,7 @@ SET apply_mutations_on_fly = 0;
SELECT id, v FROM test_on_fly_mutations ORDER BY id;
```
-Note that the values of the rows have not yet been updated when we query the new table:
+Note that the values of the rows haven't yet been updated when we query the new table:
```response
┌─id─┬─v─┐
@@ -77,7 +77,7 @@ The `SELECT` query now returns the correct result immediately, without having to
## Performance impact {#performance-impact}
-When on-the-fly mutations are enabled, mutations are not materialized immediately but will only be applied during `SELECT` queries. However, please note that mutations are still being materialized asynchronously in the background, which is a heavy process.
+When on-the-fly mutations are enabled, mutations aren't materialized immediately but will only be applied during `SELECT` queries. However, please note that mutations are still being materialized asynchronously in the background, which is a heavy process.
If the number of submitted mutations constantly exceeds the number of mutations that are processed in the background over some time interval, the queue of unmaterialized mutations that have to be applied will continue to grow. This will result in the eventual degradation of `SELECT` query performance.
diff --git a/docs/guides/developer/replacing-merge-tree.md b/docs/guides/developer/replacing-merge-tree.md
index b248661b30d..b0a38a30398 100644
--- a/docs/guides/developer/replacing-merge-tree.md
+++ b/docs/guides/developer/replacing-merge-tree.md
@@ -15,10 +15,10 @@ In order to process a stream of update and delete rows while avoiding the above
## Automatic upserts of inserted rows {#automatic-upserts-of-inserted-rows}
The [ReplacingMergeTree table engine](/engines/table-engines/mergetree-family/replacingmergetree) allows update operations to be applied to rows, without needing to use inefficient `ALTER` or `DELETE` statements, by offering the ability for you to insert multiple copies of the same row and denote one as the latest version. A background process, in turn, asynchronously removes older versions of the same row, efficiently imitating an update operation through the use of immutable inserts.
-This relies on the ability of the table engine to identify duplicate rows. This is achieved using the `ORDER BY` clause to determine uniqueness, i.e., if two rows have the same values for the columns specified in the `ORDER BY`, they are considered duplicates. A `version` column, specified when defining the table, allows the latest version of a row to be retained when two rows are identified as duplicates i.e. the row with the highest version value is kept.
+This relies on the ability of the table engine to identify duplicate rows. This is achieved using the `ORDER BY` clause to determine uniqueness, i.e., if two rows have the same values for the columns specified in the `ORDER BY`, they're considered duplicates. A `version` column, specified when defining the table, allows the latest version of a row to be retained when two rows are identified as duplicates, i.e., the row with the highest version value is kept.
We illustrate this process in the example below. Here, the rows are uniquely identified by the A column (the `ORDER BY` for the table). We assume these rows have been inserted as two batches, resulting in the formation of two data parts on disk. Later, during an asynchronous background process, these parts are merged together.
-ReplacingMergeTree additionally allows a deleted column to be specified. This can contain either 0 or 1, where a value of 1 indicates that the row (and its duplicates) has been deleted and zero is used otherwise. **Note: Deleted rows will not be removed at merge time.**
+ReplacingMergeTree additionally allows a deleted column to be specified. This can contain either 0 or 1, where a value of 1 indicates that the row (and its duplicates) has been deleted and zero is used otherwise. **Note: Deleted rows won't be removed at merge time.**
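As a minimal sketch of a table combining both a version column and a deleted column (the table name and non-key columns are illustrative):

```sql
-- Rows sharing the ORDER BY key collapse at merge time to the row with
-- the highest `version`; is_deleted = 1 marks that row as deleted
-- (the marked row itself still isn't removed at merge time).
CREATE TABLE postgres_rows
(
    id UInt64,
    payload String,
    version UInt64,
    is_deleted UInt8
)
ENGINE = ReplacingMergeTree(version, is_deleted)
ORDER BY id;
```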
During this process, the following occurs during part merging:
@@ -55,7 +55,7 @@ We recommend pausing inserts once (1) is guaranteed and until this command and t
Above, we highlighted an important additional constraint that must also be satisfied in the case of the ReplacingMergeTree: the values of columns of the `ORDER BY` uniquely identify a row across changes. If migrating from a transactional database like Postgres, the original Postgres primary key should thus be included in the ClickHouse `ORDER BY` clause.
-Users of ClickHouse will be familiar with choosing the columns in their tables `ORDER BY` clause to [optimize for query performance](/data-modeling/schema-design#choosing-an-ordering-key). Generally, these columns should be selected based on your [frequent queries and listed in order of increasing cardinality](/guides/best-practices/sparse-primary-indexes#an-index-design-for-massive-data-scales). Importantly, the ReplacingMergeTree imposes an additional constraint - these columns must be immutable, i.e., if replicating from Postgres, only add columns to this clause if they do not change in the underlying Postgres data. While other columns can change, these are required to be consistent for unique row identification.
+Users of ClickHouse will be familiar with choosing the columns in their tables `ORDER BY` clause to [optimize for query performance](/data-modeling/schema-design#choosing-an-ordering-key). Generally, these columns should be selected based on your [frequent queries and listed in order of increasing cardinality](/guides/best-practices/sparse-primary-indexes#an-index-design-for-massive-data-scales). Importantly, the ReplacingMergeTree imposes an additional constraint - these columns must be immutable, i.e., if replicating from Postgres, only add columns to this clause if they don't change in the underlying Postgres data. While other columns can change, these are required to be consistent for unique row identification.
For analytical workloads, the Postgres primary key is generally of little use as you will rarely perform point row lookups. Given we recommend that columns be ordered by increasing cardinality, as well as the fact that matches on [columns listed earlier in the ORDER BY will usually be faster](/guides/best-practices/sparse-primary-indexes#ordering-key-columns-efficiently), the Postgres primary key should be appended to the end of the `ORDER BY` (unless it has analytical value). If multiple columns form a primary key in Postgres, they should all be appended to the `ORDER BY`, respecting cardinality and the likelihood they're queried. You may also wish to generate a unique primary key using a concatenation of values via a `MATERIALIZED` column.
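As a sketch of this layout (table and column names here are illustrative, not taken from the dataset):

```sql
-- Hypothetical table replicated from Postgres. Analytical columns come first,
-- ordered by increasing cardinality; the immutable Postgres primary key (id)
-- is appended last for unique row identification.
CREATE TABLE events
(
    event_type LowCardinality(String),
    created_at DateTime,
    id UInt64
    -- For a composite Postgres key, a single unique key could instead be
    -- materialized, e.g.:
    -- pk String MATERIALIZED concat(toString(tenant_id), '-', toString(id))
)
ENGINE = ReplacingMergeTree
ORDER BY (event_type, toDate(created_at), id);
```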
Consider the posts table from the Stack Overflow dataset.
@@ -97,7 +97,7 @@ We use an `ORDER BY` key of `(PostTypeId, toDate(CreationDate), CreationDate, Id
## Querying ReplacingMergeTree {#querying-replacingmergetree}
-At merge time, the ReplacingMergeTree identifies duplicate rows, using the values of the `ORDER BY` columns as a unique identifier, and either retains only the highest version or removes all duplicates if the latest version indicates a delete. This, however, offers eventual correctness only - it does not guarantee rows will be deduplicated, and you should not rely on it. Queries can, therefore, produce incorrect answers due to update and delete rows being considered in queries.
+At merge time, the ReplacingMergeTree identifies duplicate rows, using the values of the `ORDER BY` columns as a unique identifier, and either retains only the highest version or removes all duplicates if the latest version indicates a delete. This, however, offers eventual correctness only - it doesn't guarantee rows will be deduplicated, and you shouldn't rely on it. Queries can, therefore, produce incorrect answers due to update and delete rows being considered in queries.
To obtain correct answers, you will need to complement background merges with query time deduplication and deletion removal. This can be achieved using the `FINAL` operator.
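For example, appending `FINAL` to the table reference applies this deduplication at read time:

```sql
-- Counts only the latest version of each row, excluding rows whose
-- latest version is marked as deleted:
SELECT count()
FROM posts
FINAL;
```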
@@ -222,16 +222,16 @@ Peak memory usage: 8.14 MiB.
## FINAL performance {#final-performance}
The `FINAL` operator does have a small performance overhead on queries.
-This will be most noticeable when queries are not filtering on primary key columns,
+This will be most noticeable when queries aren't filtering on primary key columns,
causing more data to be read and increasing the deduplication overhead. If you
filter on key columns using a `WHERE` condition, the data loaded and passed for
deduplication will be reduced.
-If the `WHERE` condition does not use a key column, ClickHouse does not currently utilize the `PREWHERE` optimization when using `FINAL`. This optimization aims to reduce the rows read for non-filtered columns. Examples of emulating this `PREWHERE` and thus potentially improving performance can be found [here](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-1#final-performance).
+If the `WHERE` condition doesn't use a key column, ClickHouse doesn't currently utilize the `PREWHERE` optimization when using `FINAL`. This optimization aims to reduce the rows read for non-filtered columns. Examples of emulating this `PREWHERE` and thus potentially improving performance can be found [here](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-1#final-performance).
## Exploiting partitions with ReplacingMergeTree {#exploiting-partitions-with-replacingmergetree}
-Merging of data in ClickHouse occurs at a partition level. When using ReplacingMergeTree, we recommend users partition their table according to best practices, provided you can ensure this **partitioning key does not change for a row**. This will ensure updates pertaining to the same row will be sent to the same ClickHouse partition. You may reuse the same partition key as Postgres provided you adhere to the best practices outlined here.
+Merging of data in ClickHouse occurs at a partition level. When using ReplacingMergeTree, we recommend users partition their table according to best practices, provided you can ensure this **partitioning key doesn't change for a row**. This will ensure updates pertaining to the same row will be sent to the same ClickHouse partition. You may reuse the same partition key as Postgres provided you adhere to the best practices outlined here.
Assuming this is the case, you can use the setting `do_not_merge_across_partitions_select_final=1` to improve `FINAL` query performance. This setting causes partitions to be merged and processed independently when using FINAL.
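For instance (assuming a table partitioned as described above):

```sql
-- Each partition is deduplicated independently, avoiding a global
-- merge of parts across partitions at query time:
SELECT count()
FROM posts
FINAL
SETTINGS do_not_merge_across_partitions_select_final = 1;
```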
@@ -335,7 +335,7 @@ For a more sustainable solution that maintains performance, partitioning the tab
### Partitioning and merging across partitions {#partitioning-and-merging-across-partitions}
-As discussed in Exploiting Partitions with ReplacingMergeTree, we recommend partitioning tables as a best practice. Partitioning isolates data for more efficient merges and avoids merging across partitions, particularly during query execution. This behavior is enhanced in versions from 23.12 onward: if the partition key is a prefix of the sorting key, merging across partitions is not performed at query time, leading to faster query performance.
+As discussed in Exploiting Partitions with ReplacingMergeTree, we recommend partitioning tables as a best practice. Partitioning isolates data for more efficient merges and avoids merging across partitions, particularly during query execution. This behavior is enhanced in versions from 23.12 onward: if the partition key is a prefix of the sorting key, merging across partitions isn't performed at query time, leading to faster query performance.
### Tuning merges for better query performance {#tuning-merges-for-better-query-performance}
diff --git a/docs/guides/developer/stored-procedures-and-prepared-statements.md b/docs/guides/developer/stored-procedures-and-prepared-statements.md
index b6bb0fb1f38..bb8129894da 100644
--- a/docs/guides/developer/stored-procedures-and-prepared-statements.md
+++ b/docs/guides/developer/stored-procedures-and-prepared-statements.md
@@ -18,7 +18,7 @@ This guide explains ClickHouse's approach to these concepts and provides recomme
## Alternatives to stored procedures in ClickHouse {#alternatives-to-stored-procedures}
-ClickHouse does not support traditional stored procedures with control flow logic (`IF`/`ELSE`, loops, etc.).
+ClickHouse doesn't support traditional stored procedures with control flow logic (`IF`/`ELSE`, loops, etc.).
This is an intentional design decision based on ClickHouse's architecture as an analytical database.
Loops are discouraged for analytical databases because processing O(n) simple queries is usually slower than processing fewer complex queries.
@@ -102,7 +102,7 @@ SELECT format_phone('5551234567');
**Limitations:**
- No loops or complex control flow
-- Cannot modify data (`INSERT`/`UPDATE`/`DELETE`)
+- Can't modify data (`INSERT`/`UPDATE`/`DELETE`)
- Recursive functions not allowed
See [`CREATE FUNCTION`](/sql-reference/statements/create/function) for complete syntax.
@@ -411,7 +411,7 @@ SELECT @status, @points;
:::note Query parameters
The example below uses query parameters in ClickHouse.
Skip ahead to ["Alternatives to prepared statements in ClickHouse"](/guides/developer/stored-procedures-and-prepared-statements#alternatives-to-prepared-statements-in-clickhouse)
-if you are not yet familiar with query parameters in ClickHouse.
+if you're not yet familiar with query parameters in ClickHouse.
:::
```python
@@ -788,15 +788,15 @@ SELECT count() FROM {table: Identifier};
For use of query parameters in [language clients](/integrations/language-clients), refer to the documentation for
-the specific language client you are interested in.
+the specific language client you're interested in.
### Limitations of query parameters {#limitations-of-query-parameters}
Query parameters are **not general text substitutions**. They have specific limitations:
-1. They are **primarily intended for SELECT statements** - the best support is in SELECT queries
-2. They **work as identifiers or literals** - they cannot substitute arbitrary SQL fragments
-3. They have **limited DDL support** - they are supported in `CREATE TABLE`, but not in `ALTER TABLE`
+1. They're **primarily intended for SELECT statements** - the best support is in SELECT queries
+2. They **work as identifiers or literals** - they can't substitute arbitrary SQL fragments
+3. They have **limited DDL support** - they're supported in `CREATE TABLE`, but not in `ALTER TABLE`
**What WORKS:**
```sql
@@ -870,11 +870,11 @@ ClickHouse's [MySQL interface](/interfaces/mysql) includes minimal support for p
**Key limitations:**
-- **Parameter binding is not supported** - You cannot use `?` placeholders with bound parameters
+- **Parameter binding isn't supported** - You can't use `?` placeholders with bound parameters
- Queries are stored but not parsed during `PREPARE`
- Implementation is minimal and designed for specific BI tool compatibility
-**Example of what does NOT work:**
+**Example of what doesn't work:**
```sql
-- This MySQL-style prepared statement with parameters does NOT work in ClickHouse
diff --git a/docs/guides/developer/time-series-filling-gaps.md b/docs/guides/developer/time-series-filling-gaps.md
index c3b8ea2cdce..337f4488e5e 100644
--- a/docs/guides/developer/time-series-filling-gaps.md
+++ b/docs/guides/developer/time-series-filling-gaps.md
@@ -161,7 +161,7 @@ We can see from the results that the buckets from `00:24:03.000` to `00:24:03.50
## WITH FILL...TO {#with-fillto}
We're still missing some buckets from the end of the time range though, which we can fill by providing a `TO` value.
-`TO` is not inclusive, so we'll add a small amount to the end time to make sure that it's included:
+`TO` isn't inclusive, so we'll add a small amount to the end time to make sure that it's included:
```sql
SELECT
diff --git a/docs/guides/developer/ttl.md b/docs/guides/developer/ttl.md
index f34bb14afbb..fff5f6440ba 100644
--- a/docs/guides/developer/ttl.md
+++ b/docs/guides/developer/ttl.md
@@ -62,7 +62,7 @@ Choose your partition granularity based on your TTL period:
## Triggering TTL events {#triggering-ttl-events}
-The deleting or aggregating of expired rows is not immediate - it only occurs during table merges. If you have a table that's not actively merging (for whatever reason), there are two settings that trigger TTL events:
+The deleting or aggregating of expired rows isn't immediate - it only occurs during table merges. If you have a table that's not actively merging (for whatever reason), there are two settings that trigger TTL events:
- `merge_with_ttl_timeout`: the minimum delay in seconds before repeating a merge with delete TTL. The default is 14400 seconds (4 hours).
- `merge_with_recompression_ttl_timeout`: the minimum delay in seconds before repeating a merge with recompression TTL (rules that roll up data before deleting). Default value: 14400 seconds (4 hours).
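Both settings can be tuned per table; as a sketch (the value here is illustrative, not a recommendation):

```sql
-- Allow TTL delete merges to be retried after 1 hour instead of the default 4:
ALTER TABLE hits MODIFY SETTING merge_with_ttl_timeout = 3600;
```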
@@ -155,7 +155,7 @@ Some notes on the `hits` table:
:::note
-If you are using ClickHouse Cloud, the steps in the lesson are not applicable. You do not need to worry about moving old data around in ClickHouse Cloud.
+If you're using ClickHouse Cloud, the steps in the lesson aren't applicable. You don't need to worry about moving old data around in ClickHouse Cloud.
:::
A common practice when working with large amounts of data is to move that data around as it gets older. Here are the steps for implementing a hot/warm/cold architecture in ClickHouse using the `TO DISK` and `TO VOLUME` clauses of the `TTL` command. (By the way, it doesn't have to be a hot and cold thing - you can use TTL to move data around for whatever use case you have.)
diff --git a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md
index d5bf396a86c..d563a8ccd9d 100644
--- a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md
+++ b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md
@@ -16,7 +16,7 @@ import Image from '@theme/IdealImage';
# Understanding query execution with the analyzer
-ClickHouse processes queries extremely quickly, but the execution of a query is not a simple story. Let's try to understand how a `SELECT` query gets executed. To illustrate it, let's add some data in a table in ClickHouse:
+ClickHouse processes queries extremely quickly, but the execution of a query isn't a simple story. Let's try to understand how a `SELECT` query gets executed. To illustrate it, let's add some data in a table in ClickHouse:
```sql
CREATE TABLE session_events(
@@ -38,7 +38,7 @@ Now that we have some data in ClickHouse, we want to run some queries and unders
-Let's look at each entity in action during query execution. We are going to take a few queries and then examine them using the `EXPLAIN` statement.
+Let's look at each entity in action during query execution. We're going to take a few queries and then examine them using the `EXPLAIN` statement.
## Parser {#parser}
@@ -69,11 +69,11 @@ The output is an Abstract Syntax Tree that can be visualized as shown below:
-Each node has corresponding children and the overall tree represents the overall structure of your query. This is a logical structure to help processing a query. From an end-user standpoint (unless interested in query execution), it is not super useful; this tool is mainly used by developers.
+Each node has corresponding children and the tree as a whole represents the structure of your query. This is a logical structure to help process a query. From an end-user standpoint (unless interested in query execution), it isn't super useful; this tool is mainly used by developers.
## Analyzer {#analyzer}
-ClickHouse currently has two architectures for the Analyzer. You can use the old architecture by setting: `enable_analyzer=0`. The new architecture is enabled by default. We are going to describe only the new architecture here, given the old one is going to be deprecated once the new analyzer is generally available.
+ClickHouse currently has two architectures for the Analyzer. You can use the old architecture by setting: `enable_analyzer=0`. The new architecture is enabled by default. We're going to describe only the new architecture here, given the old one is going to be deprecated once the new analyzer is generally available.
:::note
The new architecture should provide us with a better framework to improve ClickHouse's performance. However, given it is a fundamental component of the query processing steps, it also might have a negative impact on some queries and there are [known incompatibilities](/operations/analyzer#known-incompatibilities). You can revert back to the old analyzer by changing the `enable_analyzer` setting at the query or user level.
@@ -336,7 +336,7 @@ You can then copy this output and paste it [here](https://dreampuf.github.io/Gra
-A white rectangle corresponds to a pipeline node, the gray rectangle corresponds to the query plan steps, and the `x` followed by a number corresponds to the number of inputs/outputs that are being used. If you do not want to see them in a compact form, you can always add `compact=0`:
+A white rectangle corresponds to a pipeline node, the gray rectangle corresponds to the query plan steps, and the `x` followed by a number corresponds to the number of inputs/outputs that are being used. If you don't want to see them in a compact form, you can always add `compact=0`:
```sql
EXPLAIN PIPELINE graph = 1, compact = 0
@@ -435,8 +435,8 @@ digraph
-So the executor decided not to parallelize operations because the volume of data was not high enough. By adding more rows, the executor then decided to use multiple threads as shown in the graph.
+So the executor decided not to parallelize operations because the volume of data wasn't high enough. By adding more rows, the executor then decided to use multiple threads as shown in the graph.
## Executor {#executor}
-Finally the last step of the query execution is done by the executor. It will take the query pipeline and execute it. There are different types of executors, depending if you are doing a `SELECT`, an `INSERT`, or an `INSERT SELECT`.
+Finally, the last step of the query execution is done by the executor. It will take the query pipeline and execute it. There are different types of executors, depending on whether you're doing a `SELECT`, an `INSERT`, or an `INSERT SELECT`.
diff --git a/docs/guides/examples/aggregate_function_combinators/avgState.md b/docs/guides/examples/aggregate_function_combinators/avgState.md
index 1dec7554042..36a846695bf 100644
--- a/docs/guides/examples/aggregate_function_combinators/avgState.md
+++ b/docs/guides/examples/aggregate_function_combinators/avgState.md
@@ -36,7 +36,7 @@ ORDER BY (page_id, viewed_at);
```
Create the aggregate table that will store average response times. Note that
-`avg` cannot use the `SimpleAggregateFunction` type as it requires a complex
+`avg` can't use the `SimpleAggregateFunction` type as it requires a complex
state (a sum and a count). We therefore use the `AggregateFunction` type:
```sql
@@ -112,7 +112,7 @@ FROM page_performance
Notice that the `avg_response_time` column is of type `AggregateFunction(avg, UInt32)`
and stores intermediate state information. Also notice that the row data for the
-`avg_response_time` is not useful to us and we see strange text characters such
+`avg_response_time` isn't useful to us and we see strange text characters such
as `�, n, F, }`. This is the terminal's attempt to display binary data as text.
The reason for this is that `AggregateFunction` types store their state in a
binary format that's optimized for efficient storage and computation, not for
diff --git a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md
index 0e70bc48160..cc383192886 100644
--- a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md
+++ b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md
@@ -134,7 +134,7 @@ ORDER BY location_id;
```
Notice above that we have two inserted values for each location. This is because
-parts have not yet been merged (and aggregated by `AggregatingMergeTree`). To get
+parts haven't yet been merged (and aggregated by `AggregatingMergeTree`). To get
the final result from the partial states we need to add a `GROUP BY`:
```sql
@@ -160,7 +160,7 @@ We now get the expected result:
```
:::note
-With `SimpleState`, you do not need to use the `Merge` combinator to combine
+With `SimpleState`, you don't need to use the `Merge` combinator to combine
partial aggregation states.
:::
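That is, a plain aggregate function in the querying `GROUP BY` is enough; a sketch (table and column names are assumed to follow the example above):

```sql
-- No minMerge needed: min combines SimpleAggregateFunction partial
-- states directly.
SELECT
    location_id,
    min(temperature) AS min_temperature
FROM daily_min_temperatures
GROUP BY location_id
ORDER BY location_id;
```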
diff --git a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md
index a6a6d315f9c..516319a879b 100644
--- a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md
+++ b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md
@@ -23,7 +23,7 @@ Let's look at a practical example using a table that tracks votes on posts.
For each post, we want to maintain running totals of upvotes, downvotes, and an
overall score. Using the `SimpleAggregateFunction` type with sum is suited for
this use case as we only need to store the running totals, not the entire state
-of the aggregation. As a result, it will be faster and will not require merging
+of the aggregation. As a result, it will be faster and won't require merging
of partial aggregate states.
First, we create a table for the raw data:
diff --git a/docs/guides/joining-tables.md b/docs/guides/joining-tables.md
index 83d86dd9582..20dccb871e4 100644
--- a/docs/guides/joining-tables.md
+++ b/docs/guides/joining-tables.md
@@ -146,11 +146,11 @@ The supported `JOIN` types for each join algorithm are shown below and should be
A full detailed description of each `JOIN` algorithm can be found [here](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2), including their pros, cons, and scaling properties.
-Selecting the appropriate join algorithms depends on whether you are looking to optimize for memory or performance.
+Selecting the appropriate join algorithms depends on whether you're looking to optimize for memory or performance.
## Optimizing JOIN performance {#optimizing-join-performance}
-If your key optimization metric is performance and you are looking to execute the join as fast as possible, you can use the following decision tree for choosing the right join algorithm:
+If your key optimization metric is performance and you're looking to execute the join as fast as possible, you can use the following decision tree for choosing the right join algorithm:
@@ -183,6 +183,6 @@ If you want to optimize a join for the lowest memory usage instead of the fastes
- **(1)** If your table's physical row order matches the join key sort order, then the memory usage of the **full sorting merge join** is as low as it gets. With the additional benefit of good join speed because the sorting phase is [disabled](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3#utilizing-physical-row-order).
-- **(2)** The **grace hash join** can be tuned for very low memory usage by [configuring](https://github.com/ClickHouse/ClickHouse/blob/23.5/src/Core/Settings.h#L759) a high number of [buckets](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2#description-2) at the expense of join speed. The **partial merge join** intentionally uses a low amount of main memory. The **full sorting merge join** with external sorting enabled generally uses more memory than the partial merge join (assuming the row order does not match the key sort order), with the benefit of significantly better join execution time.
+- **(2)** The **grace hash join** can be tuned for very low memory usage by [configuring](https://github.com/ClickHouse/ClickHouse/blob/23.5/src/Core/Settings.h#L759) a high number of [buckets](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2#description-2) at the expense of join speed. The **partial merge join** intentionally uses a low amount of main memory. The **full sorting merge join** with external sorting enabled generally uses more memory than the partial merge join (assuming the row order doesn't match the key sort order), with the benefit of significantly better join execution time.
For users needing more details on the above, we recommend the following [blog series](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1).
diff --git a/docs/guides/separation-storage-compute.md b/docs/guides/separation-storage-compute.md
index 05e2b31949e..c591acbac5c 100644
--- a/docs/guides/separation-storage-compute.md
+++ b/docs/guides/separation-storage-compute.md
@@ -24,10 +24,10 @@ Using ClickHouse backed by S3 is especially useful for use cases where query per
Please note that implementing and managing a separation of storage and compute architecture is more complicated compared to standard ClickHouse deployments. While self-managed ClickHouse allows for separation of storage and compute as discussed in this guide, we recommend using [ClickHouse Cloud](https://clickhouse.com/cloud), which allows you to use ClickHouse in this architecture without configuration using the [`SharedMergeTree` table engine](/cloud/reference/shared-merge-tree).
-*This guide assumes you are using ClickHouse version 22.8 or higher.*
+*This guide assumes you're using ClickHouse version 22.8 or higher.*
:::warning
-Do not configure any AWS/GCS life cycle policy. This is not supported and could lead to broken tables.
+Don't configure any AWS/GCS life cycle policy. This isn't supported and could lead to broken tables.
:::
## 1. Use S3 as a ClickHouse disk {#1-use-s3-as-a-clickhouse-disk}
@@ -110,7 +110,7 @@ ORDER BY id
SETTINGS storage_policy = 's3_main';
```
-Note that we did not have to specify the engine as `S3BackedMergeTree`. ClickHouse automatically converts the engine type internally if it detects the table is using S3 for storage.
+Note that we didn't have to specify the engine as `S3BackedMergeTree`. ClickHouse automatically converts the engine type internally if it detects the table is using S3 for storage.
Show that the table was created with the correct policy:
@@ -157,14 +157,14 @@ SELECT * FROM my_s3_table;
In the AWS console, if your data was successfully inserted to S3, you should see that ClickHouse has created new files in your specified bucket.
-If everything worked successfully, you are now using ClickHouse with separated storage and compute!
+If everything worked successfully, you're now using ClickHouse with separated storage and compute!
## 3. Implementing replication for fault tolerance (optional) {#3-implementing-replication-for-fault-tolerance-optional}
:::warning
-Do not configure any AWS/GCS life cycle policy. This is not supported and could lead to broken tables.
+Don't configure any AWS/GCS life cycle policy. This isn't supported and could lead to broken tables.
:::
For fault tolerance, you can use multiple ClickHouse server nodes distributed across multiple AWS regions, with an S3 bucket for each node.
diff --git a/docs/guides/sre/keeper/index.md b/docs/guides/sre/keeper/index.md
index 39e18dc7baa..02fff5661fe 100644
--- a/docs/guides/sre/keeper/index.md
+++ b/docs/guides/sre/keeper/index.md
@@ -26,7 +26,7 @@ By default, ClickHouse Keeper provides the same guarantees as ZooKeeper: lineari
ClickHouse Keeper supports Access Control Lists (ACLs) the same way as [ZooKeeper](https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl) does. ClickHouse Keeper supports the same set of permissions and has the identical built-in schemes: `world`, `auth` and `digest`. The digest authentication scheme uses the pair `username:password`, the password is encoded in Base64.
:::note
-External integrations are not supported.
+External integrations aren't supported.
:::
### Configuration {#configuration}
@@ -46,8 +46,8 @@ The main ClickHouse Keeper configuration tag is `` and has the fo
| `snapshot_storage_path` | Path to coordination snapshots. | - |
| `enable_reconfiguration` | Enable dynamic cluster reconfiguration via [`reconfig`](#reconfiguration). | `False` |
| `max_memory_usage_soft_limit` | Soft limit in bytes of keeper max memory usage. | `max_memory_usage_soft_limit_ratio` * `physical_memory_amount` |
-| `max_memory_usage_soft_limit_ratio` | If `max_memory_usage_soft_limit` is not set or set to zero, we use this value to define the default soft limit. | `0.9` |
-| `cgroups_memory_observer_wait_time` | If `max_memory_usage_soft_limit` is not set or is set to `0`, we use this interval to observe the amount of physical memory. Once the memory amount changes, we will recalculate Keeper's memory soft limit by `max_memory_usage_soft_limit_ratio`. | `15` |
+| `max_memory_usage_soft_limit_ratio` | If `max_memory_usage_soft_limit` isn't set or set to zero, we use this value to define the default soft limit. | `0.9` |
+| `cgroups_memory_observer_wait_time` | If `max_memory_usage_soft_limit` isn't set or is set to `0`, we use this interval to observe the amount of physical memory. Once the memory amount changes, we will recalculate Keeper's memory soft limit by `max_memory_usage_soft_limit_ratio`. | `15` |
| `http_control` | Configuration of [HTTP control](#http-control) interface. | - |
| `digest_enabled` | Enable real-time data consistency check | `True` |
| `create_snapshot_on_exit` | Create a snapshot during shutdown | - |
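For example, the memory-related settings above could be set explicitly like this (the values shown are the documented defaults; the exact placement within the config is a sketch):

```xml
<keeper_server>
    <!-- Fall back to 90% of physical memory when no explicit soft limit is set -->
    <max_memory_usage_soft_limit_ratio>0.9</max_memory_usage_soft_limit_ratio>
    <!-- Re-check the physical memory amount every 15 seconds -->
    <cgroups_memory_observer_wait_time>15</cgroups_memory_observer_wait_time>
</keeper_server>
```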
@@ -68,8 +68,8 @@ Internal coordination settings are located in the `
-Seamless migration from ZooKeeper to ClickHouse Keeper is not possible. You have to stop your ZooKeeper cluster, convert data, and start ClickHouse Keeper. `clickhouse-keeper-converter` tool allows converting ZooKeeper logs and snapshots to ClickHouse Keeper snapshot. It works only with ZooKeeper > 3.4. Steps for migration:
+Seamless migration from ZooKeeper to ClickHouse Keeper isn't possible. You have to stop your ZooKeeper cluster, convert data, and start ClickHouse Keeper. `clickhouse-keeper-converter` tool allows converting ZooKeeper logs and snapshots to ClickHouse Keeper snapshot. It works only with ZooKeeper > 3.4. Steps for migration:
1. Stop all ZooKeeper nodes.
@@ -479,7 +479,7 @@ clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 --
4. Copy snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes, otherwise, empty nodes can be faster and one of them can become a leader.
:::note
-`keeper-converter` tool is not available from the Keeper standalone binary.
+`keeper-converter` tool isn't available from the Keeper standalone binary.
If you have ClickHouse installed, you can use the binary directly:
```bash
@@ -499,11 +499,11 @@ so to add/remove a node from the cluster you need to have a quorum. If you lose
of starting them again, Raft will stop working and not allow you to reconfigure your cluster using the conventional way.
Nevertheless, ClickHouse Keeper has a recovery mode which allows you to forcefully reconfigure your cluster with only 1 node.
-This should be done only as your last resort if you cannot start your nodes again, or start a new instance on the same endpoint.
+This should be done only as your last resort if you can't start your nodes again, or start a new instance on the same endpoint.
Important things to note before continuing:
-- Make sure that the failed nodes cannot connect to the cluster again.
-- Do not start any of the new nodes until it's specified in the steps.
+- Make sure that the failed nodes can't connect to the cluster again.
+- Don't start any of the new nodes until it's specified in the steps.
After making sure that the above things are true, you need to do following:
1. Pick a single Keeper node to be your new leader. Be aware that the data of that node will be used for the entire cluster, so we recommend using a node with the most up-to-date state.
@@ -569,9 +569,9 @@ To use a disk for state file, `keeper_server.state_storage_disk` config should b
Moving files between disks is safe and there is no risk of losing data if Keeper stops in the middle of transfer.
Until the file is completely moved to the new disk, it's not deleted from the old one.
-Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
+Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) can't satisfy some guarantees for all types of disks.
Right now, only disks of type `local` support persistent sync.
-If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
+If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` isn't used.
If `latest_log_storage_disk` is used, it should always be a `local` disk.
If `force_sync` is disabled, disks of all types can be used in any setup.
@@ -900,7 +900,7 @@ This guide provides simple and minimal settings to configure ClickHouse Keeper w
└────┴─────────┘
```
-6. You can create a `Distributed` table to represent the data on the two shards. Tables with the `Distributed` table engine do not store any data of their own, but allow distributed query processing on multiple servers. Reads hit all the shards, and writes can be distributed across the shards. Run the following query on `chnode1`:
+6. You can create a `Distributed` table to represent the data on the two shards. Tables with the `Distributed` table engine don't store any data of their own, but allow distributed query processing on multiple servers. Reads hit all the shards, and writes can be distributed across the shards. Run the following query on `chnode1`:
```sql
CREATE TABLE db1.dist_table (
id UInt64,
@@ -994,7 +994,7 @@ example for server 1:
```
:::note
-Notice that we define macros for `shard` and `replica`, but that `{uuid}` is not defined here, it is built-in and there is no need to define.
+Notice that we define macros for `shard` and `replica`, but that `{uuid}` isn't defined here; it's built-in and doesn't need to be defined.

:::
2. Create a Database
@@ -1259,7 +1259,7 @@ server.id2 = ...
```
- Each server entry is delimited by a newline.
-- `server_type` is either `participant` or `learner` ([learner](https://github.com/eBay/NuRaft/blob/master/docs/readonly_member.md) does not participate in leader elections).
+- `server_type` is either `participant` or `learner` ([learner](https://github.com/eBay/NuRaft/blob/master/docs/readonly_member.md) doesn't participate in leader elections).
- `server_priority` is a non-negative integer telling [which nodes should be prioritised on leader elections](https://github.com/eBay/NuRaft/blob/master/docs/leader_election_priority.md).
Priority of 0 means server will never be a leader.
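+
+For example, a hypothetical three-node entry list following this format could look like (hosts and priorities are illustrative):
+
+```
+server.1=host1:9234;participant;10
+server.2=host2:9234;participant;5
+server.3=host3:9234;learner;0
+```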
@@ -1316,9 +1316,9 @@ There are some caveats in Keeper reconfiguration implementation:
Changing server type (participant/learner) isn't possible either as it's not supported by NuRaft, and
the only way would be to remove and add server, which again would be misleading.
-- You cannot use the returned `znodestat` value.
-- The `from_version` field is not used. All requests with set `from_version` are declined.
- This is due to the fact `/keeper/config` is a virtual node, which means it is not stored in
+- You can't use the returned `znodestat` value.
+- The `from_version` field isn't used. All requests with `from_version` set are declined.
+ This is because `/keeper/config` is a virtual node, which means it isn't stored in
persistent storage, but rather generated on-the-fly with the specified node config for every request.
This decision was made as to not duplicate data as NuRaft already stores this config.
- Unlike ZooKeeper, there is no way to wait on cluster reconfiguration by submitting a `sync` command.
@@ -1344,10 +1344,10 @@ To get confident with the process, here's a [sandbox repository](https://github.
While ClickHouse Keeper aims to be fully compatible with ZooKeeper, there are some features that are currently not implemented (although development is ongoing):
-- [`create`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#create(java.lang.String,byte%5B%5D,java.util.List,org.apache.zookeeper.CreateMode,org.apache.zookeeper.data.Stat)) does not support returning `Stat` object
-- [`create`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#create(java.lang.String,byte%5B%5D,java.util.List,org.apache.zookeeper.CreateMode,org.apache.zookeeper.data.Stat)) does not support [TTL](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/CreateMode.html#PERSISTENT_WITH_TTL)
-- [`addWatch`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#addWatch(java.lang.String,org.apache.zookeeper.Watcher,org.apache.zookeeper.AddWatchMode)) does not work with [`PERSISTENT`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/AddWatchMode.html#PERSISTENT) watches
-- [`removeWatch`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#removeWatches(java.lang.String,org.apache.zookeeper.Watcher,org.apache.zookeeper.Watcher.WatcherType,boolean)) and [`removeAllWatches`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#removeAllWatches(java.lang.String,org.apache.zookeeper.Watcher.WatcherType,boolean)) are not supported
-- `setWatches` is not supported
-- Creating [`CONTAINER`](https://zookeeper.apache.org/doc/r3.5.1-alpha/api/org/apache/zookeeper/CreateMode.html) type znodes is not supported
-- [`SASL authentication`](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL) is not supported
+- [`create`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#create(java.lang.String,byte%5B%5D,java.util.List,org.apache.zookeeper.CreateMode,org.apache.zookeeper.data.Stat)) doesn't support returning a `Stat` object
+- [`create`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#create(java.lang.String,byte%5B%5D,java.util.List,org.apache.zookeeper.CreateMode,org.apache.zookeeper.data.Stat)) doesn't support [TTL](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/CreateMode.html#PERSISTENT_WITH_TTL)
+- [`addWatch`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#addWatch(java.lang.String,org.apache.zookeeper.Watcher,org.apache.zookeeper.AddWatchMode)) doesn't work with [`PERSISTENT`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/AddWatchMode.html#PERSISTENT) watches
+- [`removeWatch`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#removeWatches(java.lang.String,org.apache.zookeeper.Watcher,org.apache.zookeeper.Watcher.WatcherType,boolean)) and [`removeAllWatches`](https://zookeeper.apache.org/doc/r3.9.1/apidocs/zookeeper-server/org/apache/zookeeper/ZooKeeper.html#removeAllWatches(java.lang.String,org.apache.zookeeper.Watcher.WatcherType,boolean)) aren't supported
+- `setWatches` isn't supported
+- Creating [`CONTAINER`](https://zookeeper.apache.org/doc/r3.5.1-alpha/api/org/apache/zookeeper/CreateMode.html) type znodes isn't supported
+- [`SASL authentication`](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL) isn't supported
diff --git a/docs/guides/sre/network-ports.md b/docs/guides/sre/network-ports.md
index 7ca8020d4b1..9be32463a03 100644
--- a/docs/guides/sre/network-ports.md
+++ b/docs/guides/sre/network-ports.md
@@ -2,7 +2,7 @@
slug: /guides/sre/network-ports
sidebar_label: 'Network ports'
title: 'Network ports'
-description: 'Description of available network ports and what they are used for'
+description: 'Description of available network ports and what they''re used for'
doc_type: 'reference'
keywords: ['network', 'ports', 'configuration', 'security', 'firewall']
---
diff --git a/docs/guides/sre/scaling-clusters.md b/docs/guides/sre/scaling-clusters.md
index 18e278b6faf..700751dac2f 100644
--- a/docs/guides/sre/scaling-clusters.md
+++ b/docs/guides/sre/scaling-clusters.md
@@ -2,7 +2,7 @@
slug: /guides/sre/scaling-clusters
sidebar_label: 'Rebalancing shards'
sidebar_position: 20
-description: 'ClickHouse does not support automatic shard rebalancing, so we provide some best practices for how to rebalance shards.'
+description: 'ClickHouse doesn''t support automatic shard rebalancing, so we provide some best practices for how to rebalance shards.'
title: 'Rebalancing Data'
doc_type: 'guide'
keywords: ['scaling', 'clusters', 'horizontal scaling', 'capacity planning', 'performance']
@@ -10,12 +10,12 @@ keywords: ['scaling', 'clusters', 'horizontal scaling', 'capacity planning', 'pe
# Rebalancing data
-ClickHouse does not support automatic shard rebalancing. However, there are ways to rebalance shards in order of preference:
+ClickHouse doesn't support automatic shard rebalancing. However, there are ways to rebalance shards in order of preference:
-1. Adjust the shard for the [distributed table](/engines/table-engines/special/distributed.md), allowing writes to be biased to the new shard. This potentially will cause load imbalances and hot spots on the cluster but can be viable in most scenarios where write throughput is not extremely high. It does not require the user to change their write target i.e. It can remain as the distributed table. This does not assist with rebalancing existing data.
+1. Adjust the shard for the [distributed table](/engines/table-engines/special/distributed.md), allowing writes to be biased to the new shard. This can potentially cause load imbalances and hot spots on the cluster, but is viable in most scenarios where write throughput isn't extremely high. It doesn't require the user to change their write target, i.e. it can remain the distributed table. This doesn't assist with rebalancing existing data.
2. As an alternative to (1), modify the existing cluster and write exclusively to the new shard until the cluster is balanced - manually weighting writes. This has the same limitations as (1).
3. If you need to rebalance existing data and you have partitioned your data, consider detaching partitions and manually relocating them to another node before reattaching to the new shard. This is more manual than subsequent techniques but may be faster and less resource-intensive. This is a manual operation and thus needs to consider the rebalancing of the data.
-4. Export the data from the source cluster to the new cluster via an [INSERT FROM SELECT](/sql-reference/statements/insert-into.md/#inserting-the-results-of-select). This will not be performant on very large datasets and will potentially incur significant IO on the source cluster and use considerable network resources. This represents a last resort.
+4. Export the data from the source cluster to the new cluster via an [INSERT FROM SELECT](/sql-reference/statements/insert-into.md/#inserting-the-results-of-select). This won't be performant on very large datasets and will potentially incur significant IO on the source cluster and use considerable network resources. This represents a last resort.
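+
+For illustration, options (3) and (4) can be sketched as follows (table, partition, and host names are assumptions, not taken from a real cluster):
+
+```sql
+-- Option (3): detach a partition, copy the detached parts to the target node, then reattach:
+ALTER TABLE db.table DETACH PARTITION '202401';
+-- (copy the files under data/db/table/detached/ to the target node)
+ALTER TABLE db.table ATTACH PARTITION '202401';
+
+-- Option (4): pull data from the source cluster over the network:
+INSERT INTO db.table
+SELECT * FROM remoteSecure('source-host:9440', 'db', 'table', 'user', 'password');
+```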
diff --git a/docs/guides/sre/tls/configuring-tls-acme-client.md b/docs/guides/sre/tls/configuring-tls-acme-client.md
index 8908cc76bf5..548583de72c 100644
--- a/docs/guides/sre/tls/configuring-tls-acme-client.md
+++ b/docs/guides/sre/tls/configuring-tls-acme-client.md
@@ -43,7 +43,7 @@ To enable ACME, configure HTTP and HTTPS ports along with the `acme` block:
The HTTP port serves ACME `HTTP-01` challenge (more on challenge types [here](https://letsencrypt.org/docs/challenge-types/)) requests during domain validation. Once validation completes and a certificate is issued, the HTTPS port serves encrypted traffic using the obtained certificate.
-The HTTP port does not need to be 80 on the server itself; it may be remapped using `nftables` or similar tools. Check your ACME provider's documentation for accepted ports for `HTTP-01` challenges.
+The HTTP port doesn't need to be 80 on the server itself; it may be remapped using `nftables` or similar tools. Check your ACME provider's documentation for accepted ports for `HTTP-01` challenges.
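+
+As a hypothetical sketch of such a remap (it assumes an existing `ip nat` table with a `prerouting` chain hooked at prerouting; ports are examples):
+
+```bash
+# Redirect incoming traffic on port 80 to the port the ACME client actually listens on.
+nft add rule ip nat prerouting tcp dport 80 redirect to :8080
+```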
In the `acme` block, we're defining `email` for account creation, and accepting ACME service terms of service.
After that, the only thing we need is a list of domains.
@@ -52,7 +52,7 @@ After that, the only thing we need is a list of domains.
- Only `HTTP-01` challenge type is supported.
- Only `RSA 2048` keys are supported.
-- Rate limiting is not handled.
+- Rate limiting isn't handled.
## Configuration parameters {#configuration-parameters}
@@ -76,13 +76,13 @@ Note that configuration uses Let's Encrypt production directory by default. To a
When enabling the ACME client on a cluster with multiple replicas, additional care is required during the initial certificate issuance.
-The first replica that starts with ACME enabled will immediately attempt to create an ACME order and perform HTTP-01 challenge validation. If only a subset of replicas is serving traffic at that moment, the challenge is likely to fail, as other replicas will not be able to respond to validation requests.
+The first replica that starts with ACME enabled will immediately attempt to create an ACME order and perform HTTP-01 challenge validation. If only a subset of replicas is serving traffic at that moment, the challenge is likely to fail, as other replicas won't be able to respond to validation requests.
If possible, it is recommended to temporarily route traffic to a single replica (for example, by adjusting DNS records) and allow it to complete the initial certificate issuance. Once the certificate is successfully issued and stored in Keeper, ACME can be enabled on the remaining replicas. They will automatically reuse the existing certificate and participate in future renewals.
-If routing traffic to a single replica is not feasible, an alternative approach is to manually upload the existing certificate and private key into Keeper before enabling the ACME client. This avoids the initial validation step and allows all replicas to start with a valid certificate already present.
+If routing traffic to a single replica isn't feasible, an alternative approach is to manually upload the existing certificate and private key into Keeper before enabling the ACME client. This avoids the initial validation step and allows all replicas to start with a valid certificate already present.
-After the initial certificate has been issued or imported, certificate renewal does not require special handling, as all replicas will already be running the ACME client and sharing state through Keeper.
+After the initial certificate has been issued or imported, certificate renewal doesn't require special handling, as all replicas will already be running the ACME client and sharing state through Keeper.
## Keeper data structure {#keeper-data-structure}
diff --git a/docs/guides/sre/tls/configuring-tls.md b/docs/guides/sre/tls/configuring-tls.md
index 5541437b53e..1cfe4f0eb6e 100644
--- a/docs/guides/sre/tls/configuring-tls.md
+++ b/docs/guides/sre/tls/configuring-tls.md
@@ -40,7 +40,7 @@ View the [Quick Start](/getting-started/install/install.mdx) for more details on
## 2. Create TLS certificates {#2-create-tls-certificates}
:::note
-Using self-signed certificates are for demonstration purposes only and should not used in production. Certificate requests should be created to be signed by the organization and validated using the CA chain that will be configured in the settings. However, these steps can be used to configure and test settings, then can be replaced by the actual certificates that will be used.
+Self-signed certificates are for demonstration purposes only and shouldn't be used in production. Certificate requests should be created to be signed by the organization and validated using the CA chain that will be configured in the settings. However, these steps can be used to configure and test settings, and can then be replaced by the actual certificates that will be used.
:::
1. Generate a key that will be used for the new CA:
@@ -378,7 +378,7 @@ The settings below are configured in the ClickHouse server `config.xml`
|9444 | ClickHouse Keeper Raft port |
3. Verify ClickHouse Keeper health
-The typical [4 letter word (4lW)](/guides/sre/keeper/index.md#four-letter-word-commands) commands will not work using `echo` without TLS, here is how to use the commands with `openssl`.
+The typical [4 letter word (4lW)](/guides/sre/keeper/index.md#four-letter-word-commands) commands won't work using `echo` without TLS; here's how to use the commands with `openssl`.
- Start an interactive session with `openssl`
```bash
@@ -455,7 +455,7 @@ The typical [4 letter word (4lW)](/guides/sre/keeper/index.md#four-letter-word-c
:::note
- the browser will show an untrusted certificate since it is being reached from a workstation and the certificates are not in the root CA stores on the client machine.
+ the browser will show an untrusted certificate since it is being reached from a workstation and the certificates aren't in the root CA stores on the client machine.
When using certificates issued from a public authority or enterprise CA, it should show trusted.
:::
diff --git a/docs/guides/sre/user-management/configuring-ldap.md b/docs/guides/sre/user-management/configuring-ldap.md
index 40d9b015c53..8ecee52e71a 100644
--- a/docs/guides/sre/user-management/configuring-ldap.md
+++ b/docs/guides/sre/user-management/configuring-ldap.md
@@ -71,7 +71,7 @@ ClickHouse can be configured to use LDAP to authenticate ClickHouse database use
|tls_require_cert |whether to require certificate for connection|never|
:::note
- In this example, since the public server uses 389 and does not use a secure port, we disable TLS for demonstration purposes.
+ In this example, since the public server uses 389 and doesn't use a secure port, we disable TLS for demonstration purposes.
:::
:::note
diff --git a/docs/guides/sre/user-management/index.md b/docs/guides/sre/user-management/index.md
index 9b7267aa6ca..5af0894332d 100644
--- a/docs/guides/sre/user-management/index.md
+++ b/docs/guides/sre/user-management/index.md
@@ -34,14 +34,14 @@ You can't manage the same access entity by both configuration methods simultaneo
:::
:::note
-If you are looking to manage ClickHouse Cloud console users, please refer to this [page](/cloud/security/manage-cloud-users)
+If you're looking to manage ClickHouse Cloud console users, please refer to this [page](/cloud/security/manage-cloud-users)
:::
To see all users, roles, profiles, etc. and all their grants use [`SHOW ACCESS`](/sql-reference/statements/show#show-access) statement.
## Overview {#access-control-usage}
-By default, the ClickHouse server provides the `default` user account which is not allowed using SQL-driven access control and account management but has all the rights and permissions. The `default` user account is used in any cases when the username is not defined, for example, at login from client or in distributed queries. In distributed query processing a default user account is used, if the configuration of the server or cluster does not specify the [user and password](/engines/table-engines/special/distributed.md) properties.
+By default, the ClickHouse server provides the `default` user account, which can't be managed by SQL-driven access control and account management but has all rights and permissions. The `default` user account is used whenever the username isn't defined, for example, at login from the client or in distributed queries. In distributed query processing, the default user account is used if the configuration of the server or cluster doesn't specify the [user and password](/engines/table-engines/special/distributed.md) properties.
If you just started using ClickHouse, consider the following scenario:
@@ -51,8 +51,8 @@ If you just started using ClickHouse, consider the following scenario:
### Properties of current solution {#access-control-properties}
-- You can grant permissions for databases and tables even if they do not exist.
-- If a table is deleted, all the privileges that correspond to this table are not revoked. This means that even if you create a new table with the same name later, all the privileges remain valid. To revoke privileges corresponding to the deleted table, you need to execute, for example, the `REVOKE ALL PRIVILEGES ON db.table FROM ALL` query.
+- You can grant permissions for databases and tables even if they don't exist.
+- If a table is deleted, all the privileges that correspond to this table aren't revoked. This means that even if you create a new table with the same name later, all the privileges remain valid. To revoke privileges corresponding to the deleted table, you need to execute, for example, the `REVOKE ALL PRIVILEGES ON db.table FROM ALL` query.
- There are no lifetime settings for privileges.
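+
+For example, the explicit cleanup for a dropped table mentioned above looks like this (database and table names are placeholders):
+
+```sql
+-- Grants on a dropped table persist until revoked explicitly:
+REVOKE ALL PRIVILEGES ON db.table FROM ALL;
+```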
### User account {#user-account-management}
@@ -163,7 +163,7 @@ Management queries:
## Defining SQL users and roles {#defining-sql-users-and-roles}
:::tip
-If you are working in ClickHouse Cloud, please see [Cloud access management](/cloud/security/console-roles).
+If you're working in ClickHouse Cloud, please see [Cloud access management](/cloud/security/console-roles).
:::
This article shows the basics of defining SQL users and roles and applying those privileges and permissions to databases, tables, rows, and columns.
@@ -205,7 +205,7 @@ This article shows the basics of defining SQL users and roles and applying those
This article is intended to provide you with a better understanding of how to define permissions, and how permissions work when using `ALTER` statements for privileged users.
-The `ALTER` statements are divided into several categories, some of which are hierarchical and some of which are not and must be explicitly defined.
+The `ALTER` statements are divided into several categories, some of which are hierarchical and some of which aren't and must be explicitly defined.
**Example DB, table and user configuration**
1. With an admin user, create a sample user
@@ -306,7 +306,7 @@ Query id: 706befbc-525e-4ec1-a1a2-ba2508cc09e3
└──────────────────────────────────────────────────────────────┘
```
-This will grant all permissions under `ALTER TABLE` and `ALTER VIEW` from the example above, however, it will not grant certain other `ALTER` permissions such as `ALTER ROW POLICY` (Refer back to the hierarchy and you will see that `ALTER ROW POLICY` is not a child of `ALTER TABLE` or `ALTER VIEW`). Those must be explicitly granted or revoked.
+This will grant all permissions under `ALTER TABLE` and `ALTER VIEW` from the example above; however, it won't grant certain other `ALTER` permissions such as `ALTER ROW POLICY` (refer back to the hierarchy and you'll see that `ALTER ROW POLICY` isn't a child of `ALTER TABLE` or `ALTER VIEW`). Those must be explicitly granted or revoked.
If only a subset of `ALTER` permissions is needed then each can be granted separately, if there are sub-privileges to that permission then those would be automatically granted also.
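+
+As a sketch using the example names from this guide, granting only a subset of the `ALTER` permissions could look like:
+
+```sql
+-- Grant specific ALTER sub-privileges instead of the whole ALTER TABLE group:
+GRANT ALTER UPDATE, ALTER DELETE ON my_db.my_table TO my_user;
+```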
@@ -519,7 +519,7 @@ Query id: 1c7622fa-9df1-4c54-9fc3-f984c716aeba
Ok.
```
-8. Test granting a privilege that the alter admin user does not have is not a sub privilege of the grants for the admin user.
+8. Test granting a privilege that the alter admin user doesn't have and that isn't a sub-privilege of the admin user's grants.
```sql
GRANT ALTER UPDATE ON my_db.my_table TO my_user;
```
@@ -536,4 +536,4 @@ Code: 497. DB::Exception: Received from chnode1.marsnet.local:9440. DB::Exceptio
```
**Summary**
-The ALTER privileges are hierarchical for `ALTER` with tables and views but not for other `ALTER` statements. The permissions can be set in granular level or by grouping of permissions and also revoked similarly. The user granting or revoking must have `WITH GRANT OPTION` to set privileges on users, including the acting user themselves, and must have the privilege already. The acting user cannot revoke their own privileges if they do not have the grant option privilege themselves.
+The ALTER privileges are hierarchical for `ALTER` with tables and views but not for other `ALTER` statements. The permissions can be set at a granular level or by grouping permissions, and revoked similarly. The user granting or revoking must have `WITH GRANT OPTION` to set privileges on users, including the acting user themselves, and must already have the privilege. The acting user can't revoke their own privileges if they don't have the grant option privilege themselves.
diff --git a/docs/guides/sre/user-management/ssl-user-auth.md b/docs/guides/sre/user-management/ssl-user-auth.md
index a01c3c6cdec..3aa9e0934cc 100644
--- a/docs/guides/sre/user-management/ssl-user-auth.md
+++ b/docs/guides/sre/user-management/ssl-user-auth.md
@@ -72,7 +72,7 @@ For details on how to enable SQL users and set roles, refer to [Defining SQL Use
:::
:::note
- We recommend using SQL to define users and roles. However, if you are currently defining users and roles in configuration files, the user will look like:
+ We recommend using SQL to define users and roles. However, if you're currently defining users and roles in configuration files, the user will look like:
```xml
diff --git a/docs/guides/starter_guides/creating-tables.md b/docs/guides/starter_guides/creating-tables.md
index 5833ea044cc..80c0bc3d3f0 100644
--- a/docs/guides/starter_guides/creating-tables.md
+++ b/docs/guides/starter_guides/creating-tables.md
@@ -16,7 +16,7 @@ doc_type: 'guide'
CREATE DATABASE IF NOT EXISTS helloworld
```
-Similarly, use `CREATE TABLE` to define a new table. If you do not specify the database name, the table will be in the
+Similarly, use `CREATE TABLE` to define a new table. If you don't specify the database name, the table will be in the
`default` database.
The following table named `my_first_table` is created in the `helloworld` database:
diff --git a/docs/guides/starter_guides/inserting-data.md b/docs/guides/starter_guides/inserting-data.md
index d61c4aa5950..17622e80c8d 100644
--- a/docs/guides/starter_guides/inserting-data.md
+++ b/docs/guides/starter_guides/inserting-data.md
@@ -33,7 +33,7 @@ Therefore, sending a smaller amount of inserts that each contain more data, comp
Generally, we recommend inserting data in fairly large batches of at least 1,000 rows at a time, and ideally between 10,000 to 100,000 rows.
(Further details [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance)).
-If large batches are not possible, use asynchronous inserts described below.
+If large batches aren't possible, use asynchronous inserts described below.
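+
+As a sketch (table, columns, and values are illustrative), a single batched insert looks like:
+
+```sql
+INSERT INTO events (id, message) VALUES
+    (1, 'started'),
+    (2, 'running'),
+    (3, 'stopped'); -- ideally thousands of rows per statement
+```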
### Ensure consistent batches for idempotent retries {#ensure-consistent-batches-for-idempotent-retries}
@@ -58,10 +58,10 @@ It should be noted however that this approach is a little less performant as wri
### Use asynchronous inserts for small batches {#use-asynchronous-inserts-for-small-batches}
-There are scenarios where client-side batching is not feasible e.g. an observability use case with 100s or 1000s of single-purpose agents sending logs, metrics, traces, etc.
+There are scenarios where client-side batching isn't feasible, e.g. an observability use case with 100s or 1000s of single-purpose agents sending logs, metrics, traces, etc.
In this scenario real-time transport of that data is key to detect issues and anomalies as quickly as possible.
Furthermore, there is a risk of event spikes in the observed systems, which could potentially cause large memory spikes and related issues when trying to buffer observability data client-side.
-If large batches cannot be inserted, you can delegate batching to ClickHouse using [asynchronous inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts).
+If large batches can't be inserted, you can delegate batching to ClickHouse using [asynchronous inserts](/best-practices/selecting-an-insert-strategy#asynchronous-inserts).
With asynchronous inserts, data is inserted into a buffer first and then written to the database storage later in 3 steps, as illustrated by the diagram below:
@@ -78,7 +78,7 @@ The part created from the buffer flush will potentially contain the data from se
Generally, these mechanics shift the batching of data from the client side to the server side (ClickHouse instance).
:::note
-Note that the data is not searchable by queries before being flushed to the database storage and that the buffer flush is configurable.
+Note that the data isn't searchable by queries before being flushed to the database storage and that the buffer flush is configurable.
Full details on configuring asynchronous inserts can be found [here](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), with a deep dive [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse).
:::
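+
+As a minimal sketch (the table name is assumed), asynchronous inserts can be enabled per query via settings:
+
+```sql
+INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 1
+VALUES (1, 'started');
+```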
diff --git a/docs/guides/starter_guides/mutations.md b/docs/guides/starter_guides/mutations.md
index 565da16cb06..9399f2a044c 100644
--- a/docs/guides/starter_guides/mutations.md
+++ b/docs/guides/starter_guides/mutations.md
@@ -57,7 +57,7 @@ ALTER TABLE [.]
-1. If you cannot access the repository for any reason, download packages as described in the [install guide](../getting-started/install/install.mdx) article and install them manually using the `sudo dpkg -i ` command. You will also need the `tzdata` package.
+1. If you can't access the repository for any reason, download packages as described in the [install guide](../getting-started/install/install.mdx) article and install them manually using the `sudo dpkg -i ` command. You will also need the `tzdata` package.
-### Cannot update deb packages from ClickHouse repository with apt-get {#cannot-update-deb-packages-from-clickhouse-repository-with-apt-get}
+### Can't update deb packages from ClickHouse repository with apt-get {#cannot-update-deb-packages-from-clickhouse-repository-with-apt-get}
The issue may be happened when the GPG key is changed.
@@ -80,10 +80,10 @@ After that follow the [install guide](/install/redhat)
Possible issues:
-- The server is not running.
+- The server isn't running.
- Unexpected or wrong configuration parameters.
-### Server is not running {#server-is-not-running}
+### Server isn't running {#server-is-not-running}
#### Check if server is running {#check-if-server-is-running}
@@ -91,7 +91,7 @@ Possible issues:
sudo service clickhouse-server status
```
-If the server is not running, start it with the command:
+If the server isn't running, start it with the command:
```shell
sudo service clickhouse-server start
@@ -112,7 +112,7 @@ If `clickhouse-server` start failed with a configuration error, you should see t
2019.01.11 15:23:25.549505 [ 45 ] {} ExternalDictionaries: Failed reloading 'event2id' external dictionary: Poco::Exception. Code: 1000, e.code() = 111, e.displayText() = Connection refused, e.what() = Connection refused
```
-If you do not see an error at the end of the file, look through the entire file starting from the string:
+If you don't see an error at the end of the file, look through the entire file starting from the string:
```plaintext
Application: starting up.
@@ -136,7 +136,7 @@ Revision: 54413
#### See system.d logs {#see-systemd-logs}
-If you do not find any useful information in `clickhouse-server` logs or there aren't any logs, you can view `system.d` logs using the command:
+If you don't find any useful information in `clickhouse-server` logs or there aren't any logs, you can view `system.d` logs using the command:
```shell
sudo journalctl -u clickhouse-server
@@ -179,7 +179,7 @@ Check:
## Query processing {#query-processing}
-If ClickHouse is not able to process the query, it sends an error description to the client. In the `clickhouse-client` you get a description of the error in the console. If you are using the HTTP interface, ClickHouse sends the error description in the response body. For example:
+If ClickHouse isn't able to process the query, it sends an error description to the client. In the `clickhouse-client` you get a description of the error in the console. If you're using the HTTP interface, ClickHouse sends the error description in the response body. For example:
```shell
$ curl 'http://localhost:8123/' --data-binary "SELECT a"
diff --git a/docs/integrations/data-ingestion/apache-spark/databricks.md b/docs/integrations/data-ingestion/apache-spark/databricks.md
index 7b303b433a9..5b4a0025278 100644
--- a/docs/integrations/data-ingestion/apache-spark/databricks.md
+++ b/docs/integrations/data-ingestion/apache-spark/databricks.md
@@ -191,7 +191,7 @@ This example assumes preconfigured secret scopes in Databricks. For setup instru
### Access mode requirements {#access-mode}
-The ClickHouse Spark Connector requires **Dedicated** (formerly Single User) access mode. **Standard** (formerly Shared) access mode is not supported when Unity Catalog is enabled, as Databricks blocks external DataSource V2 connectors in that configuration.
+The ClickHouse Spark Connector requires **Dedicated** (formerly Single User) access mode. **Standard** (formerly Shared) access mode isn't supported when Unity Catalog is enabled, as Databricks blocks external DataSource V2 connectors in that configuration.
| Access Mode | Unity Catalog | Supported |
|-------------|---------------|-----------|
diff --git a/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md b/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md
index bf1f7a87488..3bc833552d9 100644
--- a/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md
+++ b/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md
@@ -521,7 +521,7 @@ The Spark connector (both TableProvider API and Catalog API) supports the follow
- **`overwrite`**: Replace all data in the table (truncates table)
:::important
-**Partition Overwrite Not Supported**: The connector does not currently support partition-level overwrite operations (e.g., `overwrite` mode with `partitionBy`). This feature is in progress. See [GitHub issue #34](https://github.com/ClickHouse/spark-clickhouse-connector/issues/34) for tracking this feature.
+**Partition Overwrite Not Supported**: The connector doesn't currently support partition-level overwrite operations (e.g., `overwrite` mode with `partitionBy`). This feature is in progress. See [GitHub issue #34](https://github.com/ClickHouse/spark-clickhouse-connector/issues/34) for tracking this feature.
:::
@@ -773,7 +773,7 @@ df.show()
## Write data {#write-data}
:::important
-**Partition Overwrite Not Supported**: The Catalog API does not currently support partition-level overwrite operations (e.g., `overwrite` mode with `partitionBy`). This feature is in progress. See [GitHub issue #34](https://github.com/ClickHouse/spark-clickhouse-connector/issues/34) for tracking this feature.
+**Partition Overwrite Not Supported**: The Catalog API doesn't currently support partition-level overwrite operations (e.g., `overwrite` mode with `partitionBy`). This feature is in progress. See [GitHub issue #34](https://github.com/ClickHouse/spark-clickhouse-connector/issues/34) for tracking this feature.
:::
@@ -1272,7 +1272,7 @@ VariantType write support varies by format:
| Format | Support | Notes |
|--------|---------|-------|
| JSON | ✅ Full | Supports both `JSON` and `Variant` types. Recommended for VariantType data |
-| Arrow | ⚠️ Partial | Supports writing to ClickHouse `JSON` type. Does not support ClickHouse `Variant` type. Full support is pending resolution of https://github.com/ClickHouse/ClickHouse/issues/92752 |
+| Arrow | ⚠️ Partial | Supports writing to ClickHouse `JSON` type. Doesn't support ClickHouse `Variant` type. Full support is pending resolution of https://github.com/ClickHouse/ClickHouse/issues/92752 |
Configure the write format:
@@ -1291,7 +1291,7 @@ If you need to write to a ClickHouse `Variant` type, use JSON format. Arrow form
3. **Enable experimental features**: Ensure ClickHouse has `allow_experimental_json_type = 1` enabled
4. **Use JSON format for writes**: JSON format is recommended for VariantType data for better compatibility
5. **Consider query patterns**: JSON/Variant types support ClickHouse's JSON path queries for efficient filtering
-6. **Column hints for performance**: When using JSON fields in ClickHouse, adding column hints improves query performance. Currently, adding column hints via Spark is not supported. See [GitHub issue #497](https://github.com/ClickHouse/spark-clickhouse-connector/issues/497) for tracking this feature.
+6. **Column hints for performance**: When using JSON fields in ClickHouse, adding column hints improves query performance. Currently, adding column hints via Spark isn't supported. See [GitHub issue #497](https://github.com/ClickHouse/spark-clickhouse-connector/issues/497) for tracking this feature.
### Example: Complete Workflow {#varianttype-example-workflow}
@@ -1495,7 +1495,7 @@ Alternatively, set them in `spark-defaults.conf` or when creating the Spark sess
| spark.clickhouse.write.batchSize | 10000 | The number of records per batch on writing to ClickHouse. | 0.1.0 |
| spark.clickhouse.write.compression.codec | lz4 | The codec used to compress data for writing. Supported codecs: none, lz4. | 0.3.0 |
| spark.clickhouse.write.distributed.convertLocal | false | When writing Distributed table, write local table instead of itself. If `true`, ignore `spark.clickhouse.write.distributed.useClusterNodes`. This bypasses ClickHouse's native routing, requiring Spark to evaluate the sharding key. When using unsupported sharding expressions, set `spark.clickhouse.ignoreUnsupportedTransform` to `false` to prevent silent data distribution errors. | 0.1.0 |
-| spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding | false | Allow writing to Distributed tables with `convertLocal=true` and `ignoreUnsupportedTransform=true` when the sharding key is unsupported. This is dangerous and may cause data corruption due to incorrect sharding. When set to `true`, you must ensure that your data is properly sorted/sharded before writing, as Spark cannot evaluate the unsupported sharding expression. Only set to `true` if you understand the risks and have verified your data distribution. By default, this combination will throw an error to prevent silent data corruption. | 0.10.0 |
+| spark.clickhouse.write.distributed.convertLocal.allowUnsupportedSharding | false | Allow writing to Distributed tables with `convertLocal=true` and `ignoreUnsupportedTransform=true` when the sharding key is unsupported. This is dangerous and may cause data corruption due to incorrect sharding. When set to `true`, you must ensure that your data is properly sorted/sharded before writing, as Spark can't evaluate the unsupported sharding expression. Only set to `true` if you understand the risks and have verified your data distribution. By default, this combination will throw an error to prevent silent data corruption. | 0.10.0 |
| spark.clickhouse.write.distributed.useClusterNodes | true | Write to all nodes of cluster when writing Distributed table. | 0.1.0 |
| spark.clickhouse.write.format | arrow | Serialize format for writing. Supported formats: json, arrow | 0.4.0 |
| spark.clickhouse.write.localSortByKey | true | If `true`, do local sort by sort keys before writing. | 0.3.0 |
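As an illustrative sketch (not connector code), the write-related options from the table above can be collected into a plain settings map before being applied when building the Spark session; the values below are examples, not recommendations:

```python
# Hypothetical settings map using option names from the table above.
# In a real job each pair would be passed to the session builder
# (e.g. SparkSession.builder.config(key, value)) -- omitted here.
write_options = {
    "spark.clickhouse.write.batchSize": "20000",        # default 10000
    "spark.clickhouse.write.compression.codec": "lz4",  # none or lz4
    "spark.clickhouse.write.format": "json",            # json or arrow
    "spark.clickhouse.write.localSortByKey": "true",
}

# Sanity check: all keys live under the documented namespace.
for key in write_options:
    assert key.startswith("spark.clickhouse.write.")
```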
diff --git a/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md b/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md
index 352502bae96..76ccf822494 100644
--- a/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md
+++ b/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md
@@ -93,7 +93,7 @@ Dataset.
:::warning
Make sure that **Allow storage account key access** is enabled for your storage
-account, otherwise you will not be able to use the account keys to access the
+account, otherwise you won't be able to use the account keys to access the
data.
:::
diff --git a/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md b/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md
index 08133fe61b0..bb83ca074df 100644
--- a/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md
+++ b/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md
@@ -267,7 +267,7 @@ Data](https://clickhouse.com/docs/getting-started/example-datasets/environmental
### Setting up an example dataset {#setting-up-an-example-dataset}
-In this example, we will not use the full Environmental Sensors Dataset, but
+In this example, we won't use the full Environmental Sensors Dataset, but
just a small subset available at the
[Sensors Dataset Sample](https://datasets-documentation.s3.eu-west-3.amazonaws.com/environmental/sensors.csv).
@@ -305,7 +305,7 @@ Now that we've configured both the input and output datasets, we can set up a
sensors table. Set **Request method** to POST. Ensure **HTTP compression
type** is set to **None**.
:::warning
- HTTP compression does not work correctly in Azure Data Factory's Copy Data
+ HTTP compression doesn't work correctly in Azure Data Factory's Copy Data
activity. When enabled, Azure sends a payload consisting of zero bytes only
— likely a bug in the service. Be sure to leave compression disabled.
:::
diff --git a/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md b/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md
index 3f356937ef3..f7486cbe1ac 100644
--- a/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md
+++ b/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md
@@ -49,7 +49,7 @@ ClickPipes reverse private endpoint can be configured with one of the following
### VPC resource {#vpc-resource}
:::info
-Cross-region is not supported.
+Cross-region isn't supported.
:::
Your VPC resources can be accessed in ClickPipes using [PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html). This approach doesn't require setting up a load balancer in front of your data source.
@@ -74,7 +74,7 @@ Your resource gateway attached subnets are recommended to have sufficient IP add
It's recommended to have at least a `/26` subnet mask for each subnet.
For each VPC endpoint (each Reverse Private Endpoint), AWS requires a consecutive block of 16 IP addresses (a `/28` subnet mask) per subnet.
-If this requirement is not met, Reverse Private Endpoint will transition to a failed state.
+If this requirement isn't met, the Reverse Private Endpoint will transition to a failed state.
:::
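The address arithmetic behind these requirements can be checked with Python's standard `ipaddress` module: a `/28` block holds exactly the 16 consecutive addresses AWS reserves per endpoint, and the recommended `/26` subnet holds 64 addresses, i.e. room for four such blocks:

```python
import ipaddress

# AWS requires a consecutive block of 16 IPs (/28) per subnet for each
# VPC endpoint; at least a /26 subnet mask is recommended per subnet.
per_endpoint = ipaddress.ip_network("10.0.0.0/28").num_addresses  # 16
recommended = ipaddress.ip_network("10.0.0.0/26").num_addresses   # 64

# A /26 subnet fits four /28 endpoint blocks.
blocks_per_subnet = recommended // per_endpoint  # 4
```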
You can create a resource gateway from the [AWS console](https://docs.aws.amazon.com/vpc/latest/privatelink/create-resource-gateway.html) or with the following command:
@@ -147,7 +147,7 @@ aws ram create-resource-share \
The output will contain a Resource-Share ARN, which you will need to set up a ClickPipe connection with VPC resource.
-You are ready to [create a ClickPipe with Reverse private endpoint](#creating-clickpipe) using VPC resource. You will need to:
+You're ready to [create a ClickPipe with Reverse private endpoint](#creating-clickpipe) using a VPC resource. You'll need to:
- Set `VPC endpoint type` to `VPC Resource`.
- Set `Resource configuration ID` to the ID of the Resource-Configuration created in step 2.
- Set `Resource share ARN` to the ARN of the Resource-Share created in step 3.
@@ -159,8 +159,8 @@ For more details on PrivateLink with VPC resource, see [AWS documentation](https
### MSK multi-VPC connectivity {#msk-multi-vpc}
[Multi-VPC connectivity](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html) is a built-in feature of AWS MSK that allows you to connect multiple VPCs to a single MSK cluster.
-Private DNS support is out of the box and does not require any additional configuration.
-Cross-region is not supported.
+Private DNS support works out of the box and doesn't require any additional configuration.
+Cross-region isn't supported.
It's the recommended option for ClickPipes for MSK.
See the [getting started](https://docs.aws.amazon.com/msk/latest/developerguide/mvpc-getting-started.html) guide for more details.
@@ -239,7 +239,7 @@ For same-region access, creating a VPC Resource is the recommended approach.
7. Click on `Create` and wait for the reverse private endpoint to be ready.
- If you are creating a new endpoint, it will take some time to set up the endpoint.
+ If you're creating a new endpoint, it will take some time to set up the endpoint.
The page will refresh automatically once the endpoint is ready.
VPC endpoint service might require accepting the connection request in your AWS console.
@@ -249,7 +249,7 @@ For same-region access, creating a VPC Resource is the recommended approach.
On a list of endpoints, you can see the DNS name for the available endpoint.
It can be either an internally ClickPipes provisioned DNS name or a private DNS name supplied by a PrivateLink service.
- DNS name is not a complete network address.
+ The DNS name isn't a complete network address.
Add the port according to the data source.
MSK connection string can be accessed in the AWS console.
@@ -283,17 +283,17 @@ You can manage existing reverse private endpoints in the ClickHouse Cloud servic
AWS PrivateLink support is limited to specific AWS regions for ClickPipes.
Please refer to the [ClickPipes regions list](/integrations/clickpipes#list-of-static-ips) to see the available regions.
-This restriction does not apply to PrivateLink VPC endpoint service with a cross-region connectivity enabled.
+This restriction doesn't apply to PrivateLink VPC endpoint service with cross-region connectivity enabled.
## Limitations {#limitations}
-AWS PrivateLink endpoints for ClickPipes created in ClickHouse Cloud are not guaranteed to be created
+AWS PrivateLink endpoints for ClickPipes created in ClickHouse Cloud aren't guaranteed to be created
in the same AWS region as the ClickHouse Cloud service.
Currently, only VPC endpoint service supports
cross-region connectivity.
-Private endpoints are linked to a specific ClickHouse service and are not transferable between services.
+Private endpoints are linked to a specific ClickHouse service and aren't transferable between services.
Multiple ClickPipes for a single ClickHouse service can reuse the same endpoint.
-AWS MSK supports only one PrivateLink (VPC endpoint) per MSK cluster per authentication type (SASL_IAM or SASL_SCRAM). As a result, multiple ClickHouse Cloud services or organizations cannot create separate PrivateLink connections to the same MSK cluster using the same auth type.
+AWS MSK supports only one PrivateLink (VPC endpoint) per MSK cluster per authentication type (SASL_IAM or SASL_SCRAM). As a result, multiple ClickHouse Cloud services or organizations can't create separate PrivateLink connections to the same MSK cluster using the same auth type.
diff --git a/docs/integrations/data-ingestion/clickpipes/index.md b/docs/integrations/data-ingestion/clickpipes/index.md
index 54610191a98..94a2932b3ea 100644
--- a/docs/integrations/data-ingestion/clickpipes/index.md
+++ b/docs/integrations/data-ingestion/clickpipes/index.md
@@ -125,7 +125,7 @@ ClickPipes will create a table next to your destination table with the postfix `
### System Errors {#system-errors}
Errors related to the operation of the ClickPipe will be stored in the `system.clickpipes_log` table. This will store all other errors related to the operation of your ClickPipe (network, connectivity, etc.). This table has a [TTL](/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl) of 7 days.
-If ClickPipes cannot connect to a data source after 15 min or to a destination after 1 hr, the ClickPipes instance stops and stores an appropriate message in the system error table (provided the ClickHouse instance is available).
+If ClickPipes can't connect to a data source after 15 minutes or to a destination after 1 hour, the ClickPipes instance stops and stores an appropriate message in the system error table (provided the ClickHouse instance is available).
## FAQ {#faq}
- **What is ClickPipes?**
diff --git a/docs/integrations/data-ingestion/clickpipes/kafka/02_schema-registries.md b/docs/integrations/data-ingestion/clickpipes/kafka/02_schema-registries.md
index 11386b23c57..9c22b7b4736 100644
--- a/docs/integrations/data-ingestion/clickpipes/kafka/02_schema-registries.md
+++ b/docs/integrations/data-ingestion/clickpipes/kafka/02_schema-registries.md
@@ -22,7 +22,7 @@ Schema registries that are API-compatible with the Confluent Schema Registry are
- Confluent Schema Registry
- Redpanda Schema Registry
-ClickPipes does not support AWS Glue Schema Registry or Azure Schema Registry yet. If you require support for these schema registries, [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
+ClickPipes doesn't support AWS Glue Schema Registry or Azure Schema Registry yet. If you require support for these schema registries, [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
## Configuration {#schema-registry-configuration}
@@ -38,13 +38,13 @@ ClickPipes with Avro data require a schema registry. This can be configured in o
ClickPipes dynamically retrieves and applies the Avro schema from the configured schema registry.
- If there's a schema id embedded in the message, it will use that to retrieve the schema.
- If there's no schema id embedded in the message, it will use the schema id or subject name specified in the ClickPipe configuration to retrieve the schema.
-- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema will not be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
-- If the message does not conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.
+- If the message is written without an embedded schema id, and no schema id or subject name is specified in the ClickPipe configuration, then the schema won't be retrieved and the message will be skipped with a `SOURCE_SCHEMA_ERROR` logged in the ClickPipes errors table.
+- If the message doesn't conform to the schema, then the message will be skipped with a `DATA_PARSING_ERROR` logged in the ClickPipes errors table.
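Hedged as a plain-Python restatement (this isn't ClickPipes code), the schema-retrieval rules above amount to:

```python
def resolve_schema_source(embedded_id, configured_id_or_subject):
    """Illustrative restatement of the retrieval rules above."""
    if embedded_id is not None:
        return ("registry", embedded_id)  # use the embedded schema id
    if configured_id_or_subject is not None:
        # fall back to the schema id or subject name in the pipe config
        return ("registry", configured_id_or_subject)
    # nothing to retrieve: message is skipped and the error is logged
    return ("skip", "SOURCE_SCHEMA_ERROR")

assert resolve_schema_source(42, None) == ("registry", 42)
assert resolve_schema_source(None, "my-subject") == ("registry", "my-subject")
assert resolve_schema_source(None, None) == ("skip", "SOURCE_SCHEMA_ERROR")
```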
## Schema mapping {#schema-mapping}
The following rules are applied to the mapping between the retrieved Avro schema and the ClickHouse destination table:
-- If the Avro schema contains a field that is not included in the ClickHouse destination mapping, that field is ignored.
-- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that DEFAULT expressions are not currently evaluated for ClickPipes inserts (this is temporary limitation pending updates to the ClickHouse server default processing).
+- If the Avro schema contains a field that isn't included in the ClickHouse destination mapping, that field is ignored.
+- If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that DEFAULT expressions aren't currently evaluated for ClickPipes inserts (this is a temporary limitation pending updates to the ClickHouse server default processing).
- If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro record field cannot be inserted into an Int32 ClickHouse column).
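As a sketch of the first two mapping rules above (illustrative only, not ClickPipes code): extra Avro fields are dropped, and destination columns missing from the record get a "zero" value rather than a DEFAULT expression:

```python
def map_row(avro_record: dict, destination_columns: dict) -> dict:
    """destination_columns maps column name -> its "zero" value (0, "", ...).

    Fields present only in the Avro record are ignored; columns missing
    from the record are zero-filled, since DEFAULT expressions aren't
    evaluated for ClickPipes inserts.
    """
    return {
        col: avro_record.get(col, zero)
        for col, zero in destination_columns.items()
    }

row = map_row({"id": 7, "extra": "ignored"}, {"id": 0, "name": ""})
assert row == {"id": 7, "name": ""}  # "extra" dropped, "name" zero-filled
```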
diff --git a/docs/integrations/data-ingestion/clickpipes/kafka/03_reference.md b/docs/integrations/data-ingestion/clickpipes/kafka/03_reference.md
index 27aa95d7bcc..c2492311f32 100644
--- a/docs/integrations/data-ingestion/clickpipes/kafka/03_reference.md
+++ b/docs/integrations/data-ingestion/clickpipes/kafka/03_reference.md
@@ -65,11 +65,11 @@ The following standard ClickHouse data types are currently supported in ClickPip
### Avro {#avro}
#### Supported Avro Data Types {#supported-avro-data-types}
-ClickPipes supports all Avro Primitive and Complex types, and all Avro Logical types except `time-millis`, `time-micros`, `local-timestamp-millis`, `local_timestamp-micros`, and `duration`. Avro `record` types are converted to Tuple, `array` types to Array, and `map` to Map (string keys only). In general the conversions listed [here](/interfaces/formats/Avro#data-type-mapping) are available. We recommend using exact type matching for Avro numeric types, as ClickPipes does not check for overflow or precision loss on type conversion.
+ClickPipes supports all Avro Primitive and Complex types, and all Avro Logical types except `time-millis`, `time-micros`, `local-timestamp-millis`, `local-timestamp-micros`, and `duration`. Avro `record` types are converted to Tuple, `array` types to Array, and `map` to Map (string keys only). In general, the conversions listed [here](/interfaces/formats/Avro#data-type-mapping) are available. We recommend using exact type matching for Avro numeric types, as ClickPipes doesn't check for overflow or precision loss on type conversion.
Alternatively, all Avro types can be inserted into a `String` column, and will be represented as a valid JSON string in that case.
#### Nullable types and Avro unions {#nullable-types-and-avro-unions}
-Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)` where T is the base Avro type. During schema inference, such unions will be mapped to a ClickHouse "Nullable" column. Note that ClickHouse does not support
+Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(null, T)` where T is the base Avro type. During schema inference, such unions will be mapped to a ClickHouse "Nullable" column. Note that ClickHouse doesn't support
`Nullable(Array)`, `Nullable(Map)`, or `Nullable(Tuple)` types. Avro null unions for these types will be mapped to non-nullable versions (Avro Record types are mapped to a ClickHouse named Tuple). Avro "nulls" for these types will be inserted as:
- An empty Array for a null Avro array
- An empty Map for a null Avro Map
@@ -78,10 +78,10 @@ Nullable types in Avro are defined by using a Union schema of `(T, null)` or `(n
#### Variant type support {#variant-type-support}
ClickPipes supports the Variant type in the following circumstances:
- Avro Unions. If your Avro schema contains a union with multiple non-null types, ClickPipes will infer the
- appropriate variant type. Variant types are not otherwise supported for Avro data.
+ appropriate variant type. Variant types aren't otherwise supported for Avro data.
- JSON fields. You can manually specify a Variant type (such as `Variant(String, Int64, DateTime)`) for any JSON field
- in the source data stream. Complex subtypes (arrays/maps/tuples) are not supported. In addition, because of the way ClickPipes determines
- the correct variant subtype to use, only one integer or datetime type can be used in the Variant definition - for example, `Variant(Int64, UInt32)` is not supported.
+ in the source data stream. Complex subtypes (arrays/maps/tuples) aren't supported. In addition, because of the way ClickPipes determines
+ the correct variant subtype to use, only one integer or datetime type can be used in the Variant definition - for example, `Variant(Int64, UInt32)` isn't supported.
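The restrictions on manually specified Variant definitions can be restated as a small validity check (an illustrative sketch, not ClickPipes code; the type-name sets below are assumptions for the example):

```python
INTEGER_TYPES = {"Int8", "Int16", "Int32", "Int64",
                 "UInt8", "UInt16", "UInt32", "UInt64"}
DATETIME_TYPES = {"DateTime", "DateTime64"}
COMPLEX_PREFIXES = ("Array", "Map", "Tuple")

def variant_definition_allowed(subtypes):
    """Check a Variant definition against the documented restrictions."""
    if any(t.startswith(COMPLEX_PREFIXES) for t in subtypes):
        return False  # complex subtypes (arrays/maps/tuples) unsupported
    if sum(t in INTEGER_TYPES for t in subtypes) > 1:
        return False  # at most one integer type
    if sum(t.split("(")[0] in DATETIME_TYPES for t in subtypes) > 1:
        return False  # at most one datetime type
    return True

assert variant_definition_allowed(["String", "Int64", "DateTime"])
assert not variant_definition_allowed(["Int64", "UInt32"])   # two integers
assert not variant_definition_allowed(["String", "Array(String)"])
```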
#### JSON type support {#json-type-support}
ClickPipes supports the JSON type in the following circumstances:
diff --git a/docs/integrations/data-ingestion/clickpipes/kafka/04_best_practices.md b/docs/integrations/data-ingestion/clickpipes/kafka/04_best_practices.md
index 0cf6f30d4ea..5a4e7807f50 100644
--- a/docs/integrations/data-ingestion/clickpipes/kafka/04_best_practices.md
+++ b/docs/integrations/data-ingestion/clickpipes/kafka/04_best_practices.md
@@ -20,7 +20,7 @@ To learn more about message compression in Kafka, we recommend starting with thi
## Limitations {#limitations}
-- [`DEFAULT`](/sql-reference/statements/create/table#default) is not supported.
+- [`DEFAULT`](/sql-reference/statements/create/table#default) isn't supported.
- Individual messages are limited to 8MB (uncompressed) by default when running with the smallest (XS) replica size, and 16MB (uncompressed) with larger replicas. Messages that exceed this limit will be rejected with an error. If you have a need for larger messages, please contact support.
## Delivery semantics {#delivery-semantics}
@@ -31,7 +31,7 @@ For Apache Kafka protocol data sources, ClickPipes supports [SASL/PLAIN](https:/
## Warpstream Fetch Size {#warpstream-settings}
ClickPipes relies on the Kafka setting `max.fetch_bytes` to limit the size of data processed in a single ClickPipes node at any one time. In some circumstances
-Warpstream does not respect this setting, which can cause unexpected pipe failures. We strongly recommend that the Warpstream specific setting `kafkaMaxFetchPartitionBytesUncompressedOverride`
+WarpStream doesn't respect this setting, which can cause unexpected pipe failures. We strongly recommend that the WarpStream-specific setting `kafkaMaxFetchPartitionBytesUncompressedOverride` be set
to 8MB (or lower) when configuring your WarpStream agent to prevent ClickPipes failures.
### IAM {#iam}
@@ -87,7 +87,7 @@ Below is an example of the required IAM policy for Apache Kafka APIs for MSK:
#### Configuring a trusted relationship {#configuring-a-trusted-relationship}
-If you are authenticating to MSK with a IAM role ARN, you will need to add a trusted relationship between your ClickHouse Cloud instance so the role can be assumed.
+If you're authenticating to MSK with an IAM role ARN, you'll need to add a trusted relationship so the role can be assumed by your ClickHouse Cloud instance.
:::note
Role-based access only works for ClickHouse Cloud instances deployed to AWS.
@@ -126,7 +126,7 @@ Batches are inserted when one of the following criteria has been met:
Latency (defined as the time between the Kafka message being produced and the message being available in ClickHouse) will be dependent on a number of factors (i.e. broker latency, network latency, message size/format). The [batching](#batching) described in the section above will also impact latency. We always recommend testing your specific use case with typical loads to determine the expected latency.
-ClickPipes does not provide any guarantees concerning latency. If you have specific low-latency requirements, please [contact us](https://clickhouse.com/company/contact?loc=clickpipes).
+ClickPipes doesn't provide any guarantees concerning latency. If you have specific low-latency requirements, please [contact us](https://clickhouse.com/company/contact?loc=clickpipes).
### Scaling {#scaling}
@@ -141,11 +141,11 @@ the ClickPipe will automatically restart the consumer and continue processing me
### Benchmarks {#benchmarks}
-Below are some informal benchmarks for ClickPipes for Kafka that can be used to get a general idea of the baseline performance. It's important to know that many factors can impact performance, including message size, data types, and data format. Your mileage may vary, and what we show here is not a guarantee of actual performance.
+Below are some informal benchmarks for ClickPipes for Kafka that can be used to get a general idea of the baseline performance. It's important to know that many factors can impact performance, including message size, data types, and data format. Your mileage may vary, and what we show here isn't a guarantee of actual performance.
Benchmark details:
-- We used production ClickHouse Cloud services with enough resources to ensure that throughput was not bottlenecked by the insert processing on the ClickHouse side.
+- We used production ClickHouse Cloud services with enough resources to ensure that throughput wasn't bottlenecked by the insert processing on the ClickHouse side.
- The ClickHouse Cloud service, the Kafka cluster (Confluent Cloud), and the ClickPipe were all running in the same region (`us-east-2`).
- The ClickPipe was configured with a single L-sized replica (4 GiB of RAM and 1 vCPU).
- The sample data included nested data with a mix of `UUID`, `String`, and `Int` datatypes. Other datatypes, such as `Float`, `Decimal`, and `DateTime`, may be less performant.
diff --git a/docs/integrations/data-ingestion/clickpipes/kafka/05_faq.md b/docs/integrations/data-ingestion/clickpipes/kafka/05_faq.md
index 48ab8ad521d..389c9a81c7b 100644
--- a/docs/integrations/data-ingestion/clickpipes/kafka/05_faq.md
+++ b/docs/integrations/data-ingestion/clickpipes/kafka/05_faq.md
@@ -81,7 +81,7 @@ No. ClickPipes requires the Event Hubs namespace to have the Kafka surface enabl
Does Azure Schema Registry work with ClickPipes?
-No. ClickPipes only supports schema registries that are API-compatible with the Confluent Schema Registry, which is not the case for Azure Schema Registry. If you require support for this schema registry, [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
+No. ClickPipes only supports schema registries that are API-compatible with the Confluent Schema Registry, which isn't the case for Azure Schema Registry. If you require support for this schema registry, [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
diff --git a/docs/integrations/data-ingestion/clickpipes/kinesis/01_overview.md b/docs/integrations/data-ingestion/clickpipes/kinesis/01_overview.md
index f9ff8c5d80d..d495748779a 100644
--- a/docs/integrations/data-ingestion/clickpipes/kinesis/01_overview.md
+++ b/docs/integrations/data-ingestion/clickpipes/kinesis/01_overview.md
@@ -117,7 +117,7 @@ The following ClickHouse data types are currently supported in ClickPipes:
### Variant type support {#variant-type-support}
You can manually specify a Variant type (such as `Variant(String, Int64, DateTime)`) for any JSON field
in the source data stream. Because of the way ClickPipes determines the correct variant subtype to use, only one integer or datetime
-type can be used in the Variant definition - for example, `Variant(Int64, UInt32)` is not supported.
+type can be used in the Variant definition - for example, `Variant(Int64, UInt32)` isn't supported.
### JSON type support {#json-type-support}
JSON fields that are always a JSON object can be assigned to a JSON destination column. You will have to manually change the destination
@@ -140,7 +140,7 @@ view). For such pipes, it may improve ClickPipes performance to delete all the
## Limitations {#limitations}
-- [DEFAULT](/sql-reference/statements/create/table#default) is not supported.
+- [DEFAULT](/sql-reference/statements/create/table#default) isn't supported.
- Individual messages are limited to 8MB (uncompressed) by default when running with the smallest (XS) replica size, and 16MB (uncompressed) with larger replicas. Messages that exceed this limit will be rejected with an error. If you have a need for larger messages, please contact support.
## Performance {#performance}
@@ -163,9 +163,9 @@ If you have specific low-latency requirements, please [contact us](https://click
We strongly recommend limiting the number of concurrently active shards to match your throughput requirements. For an "On Demand" Kinesis stream, AWS will automatically assign a matching number of shards based on throughput,
but for "Provisioned" streams, provisioning too many shards can cause latency as described below and increase costs, because Kinesis pricing for such streams is on a "per shard" basis.
-If your producer application writes continuously to a large number of active shards, this can cause latency if your pipe is not scaled high enough to efficiently process those shards. Based on Kinesis throughput limits,
+If your producer application writes continuously to a large number of active shards, this can cause latency if your pipe isn't scaled high enough to efficiently process those shards. Based on Kinesis throughput limits,
ClickPipes assigns a specific number of "workers" per replica to read shard data. For example, at the smallest size, a ClickPipes replica will have 4 of these worker threads. If the producer is writing
-to more than 4 shards at the same time, data will not be processed from the "extra" shards until a worker thread is available. In particular, if the pipe is using "enhanced fanout", each worker thread will subscribe to a
+to more than 4 shards at the same time, data won't be processed from the "extra" shards until a worker thread is available. In particular, if the pipe is using "enhanced fanout", each worker thread will subscribe to a
single shard for 5 minutes, and is unavailable to read any other shard during that time. This can cause latency "spikes" of 5 minute multiples.
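As a back-of-envelope sketch of the behaviour described above (assuming the smallest replica's 4 worker threads and the 5-minute enhanced fan-out subscription; not ClickPipes code), the worst-case extra latency grows in 5-minute steps as shards outnumber workers:

```python
import math

def worst_case_extra_latency_minutes(active_shards, workers=4,
                                     subscription_minutes=5):
    """With enhanced fan-out, each worker is pinned to one shard for a
    5-minute subscription, so "extra" shards wait for a free worker."""
    rounds = math.ceil(active_shards / workers)
    return (rounds - 1) * subscription_minutes

assert worst_case_extra_latency_minutes(4) == 0    # every shard has a worker
assert worst_case_extra_latency_minutes(8) == 5    # one extra round of waiting
assert worst_case_extra_latency_minutes(12) == 10
```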
### Scaling {#scaling}
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/controlling_sync.md b/docs/integrations/data-ingestion/clickpipes/mongodb/controlling_sync.md
index b98c052ae7e..90325a6e19c 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/controlling_sync.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/controlling_sync.md
@@ -26,7 +26,7 @@ There are two main ways to control the sync of a MongoDB ClickPipe. The ClickPip
### Sync interval {#interval}
-The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse is not included in this interval.
+The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse isn't included in this interval.
The default is **1 minute**.
Sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/index.md b/docs/integrations/data-ingestion/clickpipes/mongodb/index.md
index fce10ab66c8..60e21112e31 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/index.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/index.md
@@ -48,7 +48,7 @@ Once your source MongoDB database is set up, you can continue creating your Clic
## Create your ClickPipe {#create-your-clickpipe}
-Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
+Make sure you're logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.
@@ -75,7 +75,7 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have
#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
-You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.
+You can specify SSH tunneling details if your source MongoDB database isn't publicly accessible.
1. Enable the "Use SSH Tunnelling" toggle.
2. Fill in the SSH connection details.
@@ -124,4 +124,4 @@ Here are a few caveats to note when using this connector:
- We require MongoDB version 5.1.0+.
- We use MongoDB's native Change Streams API for CDC, which relies on the MongoDB oplog to capture real-time changes.
- Documents from MongoDB are replicated into ClickHouse as JSON type by default. This allows for flexible schema management and makes it possible to use the rich set of JSON operators in ClickHouse for querying and analytics. You can learn more about querying JSON data [here](https://clickhouse.com/docs/sql-reference/data-types/newjson).
-- Self-serve PrivateLink configuration is not currently available. If you are on AWS and require PrivateLink, please reach out to db-integrations-support@clickhouse.com or create a support ticket — we will work with you to enable it.
+- Self-serve PrivateLink configuration isn't currently available. If you're on AWS and require PrivateLink, please reach out to db-integrations-support@clickhouse.com or create a support ticket — we'll work with you to enable it.
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md b/docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md
index e9c83c9bb8d..96654a729bf 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md
@@ -39,7 +39,7 @@ Once the pipe is in the `Running` state, you can pause it. This will stop the CD
:::note
This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
:::
-When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we are in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
+When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we're in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
## Modifying {#modifying}
:::note
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/source/atlas.md b/docs/integrations/data-ingestion/clickpipes/mongodb/source/atlas.md
index 6e8d13af8ae..d519cb8c413 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/source/atlas.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/source/atlas.md
@@ -22,7 +22,7 @@ import Image from '@theme/IdealImage';
## Configure oplog retention {#enable-oplog-retention}
-Minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog is not truncated before the initial snapshot is completed. To set the oplog retention via UI:
+Minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog isn't truncated before the initial snapshot is completed. To set the oplog retention via UI:
1. Navigate to your cluster's `Overview` tab in the MongoDB Atlas console and click on the `Configuration` tab.
@@ -37,7 +37,7 @@ Minimum oplog retention of 24 hours is required for replication. We recommend se
## Configure a database user {#configure-database-user}
-Once you are logged in to your MongoDB Atlas console, click `Database Access` under the Security tab in the left navigation bar. Click on "Add New Database User".
+Once you're logged in to your MongoDB Atlas console, click `Database Access` under the Security tab in the left navigation bar. Click on "Add New Database User".
ClickPipes requires password authentication:
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md b/docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md
index 1a74a7fd730..f5ff5080714 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md
@@ -21,11 +21,11 @@ ClickPipes supports DocumentDB version 5.0.
## Configure change stream log retention {#configure-change-stream-log-retention}
-By default, Amazon DocumentDB has a 3-hour change stream log retention period, while initial load may take much longer depending on existing data volume in your DocumentDB. We recommend setting the change stream log retention to 72 hours or longer to ensure that it is not truncated before the initial snapshot is completed.
+By default, Amazon DocumentDB has a 3-hour change stream log retention period, while initial load may take much longer depending on existing data volume in your DocumentDB. We recommend setting the change stream log retention to 72 hours or longer to ensure that it isn't truncated before the initial snapshot is completed.
### Update change stream log retention via AWS Console {#update-change-stream-log-retention-via-aws-console}
-1. Click `Parameter groups` in the left panel, find the parameter group used by your DocumentDB cluster (if you are using the default parameter group, you will need to create a new parameter group first in order to modify it).
+1. Click `Parameter groups` in the left panel, find the parameter group used by your DocumentDB cluster (if you're using the default parameter group, you'll need to create a new parameter group first to modify it).
2. Search for `change_stream_log_retention_duration`, select and edit it to `259200` (72 hours)
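The value `259200` is just 72 hours expressed in seconds, the unit this parameter uses:

```python
# change_stream_log_retention_duration is set in seconds
hours = 72
print(hours * 60 * 60)  # 259200
```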
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/source/generic.md b/docs/integrations/data-ingestion/clickpipes/mongodb/source/generic.md
index 87b4c96f578..cf0607ede72 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/source/generic.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/source/generic.md
@@ -20,7 +20,7 @@ If you use MongoDB Atlas, please refer to the specific guide [here](./atlas).
## Enable oplog retention {#enable-oplog-retention}
-Minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog is not truncated before the initial snapshot is completed.
+Minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog isn't truncated before the initial snapshot is completed.
You can check your current oplog retention by running the following command in the MongoDB shell (you must have `clusterMonitor` role to run this command):
diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/table_resync.md b/docs/integrations/data-ingestion/clickpipes/mongodb/table_resync.md
index ee31e7aad1c..67bd53e8e79 100644
--- a/docs/integrations/data-ingestion/clickpipes/mongodb/table_resync.md
+++ b/docs/integrations/data-ingestion/clickpipes/mongodb/table_resync.md
@@ -23,7 +23,7 @@ This can be followed by following the [table removal guide](./removing_tables).
### 2. Truncate or drop the table on ClickHouse {#truncate-drop-table}
This step is to avoid data duplication when we add this table again in the next step. You can do this by heading over to the **SQL Console** tab in ClickHouse Cloud and running a query.
-Note that we have validation to block table addition if the table already exists in ClickHouse and is not empty.
+Note that we have validation to block table addition if the table already exists in ClickHouse and isn't empty.
### 3. Add the table to the ClickPipe again {#add-table-again}
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md b/docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md
index 551e0a23625..ecc1510cdfa 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md
@@ -26,7 +26,7 @@ There are two main ways to control the sync of a MySQL ClickPipe. The ClickPipe
### Sync interval {#interval}
-The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse is not included in this interval.
+The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse isn't included in this interval.
The default is **1 minute**.
Sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/faq.md b/docs/integrations/data-ingestion/clickpipes/mysql/faq.md
index cd7195e8ef6..4fe860d91ad 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/faq.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/faq.md
@@ -17,14 +17,14 @@ integration:
Yes, the MySQL ClickPipe supports MariaDB 10.0 and above. The configuration for it is very similar to MySQL, using GTID replication by default.
### Does the MySQL ClickPipe support PlanetScale, Vitess, or TiDB? {#does-the-clickpipe-support-planetscale-vitess}
-No, these do not support MySQL's binlog API.
+No, these don't support MySQL's binlog API.
### How is replication managed? {#how-is-replication-managed}
We support both `GTID` & `FilePos` replication. Unlike Postgres there is no slot to manage offset. Instead, you must configure your MySQL server to have a sufficient binlog retention period. If our offset into the binlog becomes invalidated *(eg, mirror paused too long, or database failover occurs while using `FilePos` replication)* then you will need to resync the pipe. Make sure to optimize materialized views depending on destination tables, as inefficient queries can slow down ingestion to fall behind the retention period.
It's also possible for an inactive database to rotate the log file without allowing ClickPipes to progress to a more recent offset. You may need to set up a heartbeat table with regularly scheduled updates.
-At the start of an initial load we record the binlog offset to start at. This offset must still be valid when the initial load finishes in order for CDC to progress. If you are ingesting a large amount of data be sure to configure an appropriate binlog retention period. While setting up tables you can speed up initial load by configuring *Use a custom partitioning key for initial load* for large tables under advanced settings so that we can load a single table in parallel.
+At the start of an initial load we record the binlog offset to start at. This offset must still be valid when the initial load finishes in order for CDC to progress. If you're ingesting a large amount of data be sure to configure an appropriate binlog retention period. While setting up tables you can speed up initial load by configuring *Use a custom partitioning key for initial load* for large tables under advanced settings so that we can load a single table in parallel.
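A heartbeat table like the one mentioned above can be as simple as a single row that a scheduled job touches. The sketch below is a hypothetical pattern, not a ClickPipes API — the table name, SQL, and `execute` callable are all assumptions:

```python
import time

# Hypothetical single-row heartbeat: a regularly scheduled UPDATE keeps the
# binlog advancing even when the database is otherwise idle, so the pipe can
# keep progressing to a recent offset.
HEARTBEAT_SQL = "UPDATE heartbeat SET last_beat = NOW() WHERE id = 1"

def beat(execute, beats: int, interval_seconds: float = 60.0) -> None:
    """Run the heartbeat `beats` times; `execute` is any callable that runs
    one SQL statement (e.g. a DB cursor's execute method)."""
    for _ in range(beats):
        execute(HEARTBEAT_SQL)
        time.sleep(interval_seconds)

# Dry run with a stub in place of a real connection:
issued = []
beat(issued.append, beats=3, interval_seconds=0)
print(len(issued))  # 3
```

In production the same UPDATE would typically run from a cron job or MySQL event scheduler rather than a long-lived Python loop.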
### Why am I getting a TLS certificate validation error when connecting to MySQL? {#tls-certificate-validation-error}
@@ -46,11 +46,11 @@ Please refer to the [ClickPipes for MySQL: Schema Changes Propagation Support](.
### Do you support replicating MySQL foreign key cascading deletes `ON DELETE CASCADE`? {#support-on-delete-cascade}
-Due to how MySQL [handles cascading deletes](https://dev.mysql.com/doc/refman/8.0/en/innodb-and-mysql-replication.html), they are not written to the binlog. Therefore it's not possible for ClickPipes (or any CDC tool) to replicate them. This can lead to inconsistent data. It's advised to use triggers instead for supporting cascading deletes.
+Due to how MySQL [handles cascading deletes](https://dev.mysql.com/doc/refman/8.0/en/innodb-and-mysql-replication.html), they're not written to the binlog. Therefore it's not possible for ClickPipes (or any CDC tool) to replicate them. This can lead to inconsistent data. It's advised to use triggers instead for supporting cascading deletes.
### Why can I not replicate my table which has a dot in it? {#replicate-table-dot}
-PeerDB has a limitation currently where dots in source table identifiers - aka either schema name or table name - is not supported for replication as PeerDB cannot discern, in that case, what is the schema and what is the table as it splits on dot.
+PeerDB currently has a limitation where dots in source table identifiers - that is, in either the schema name or the table name - aren't supported for replication, as PeerDB can't discern which part is the schema and which is the table because it splits on the dot.
Effort is being made to support input of schema and table separately to get around this limitation.
### Can I include columns I initially excluded from replication? {#include-excluded-columns}
-This is not yet supported, an alternative would be to [resync the table](./table_resync.md) whose columns you want to include.
+This isn't yet supported; an alternative is to [resync the table](./table_resync.md) whose columns you want to include.
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/index.md b/docs/integrations/data-ingestion/clickpipes/mysql/index.md
index 3bf3ed87f23..2c533ecc625 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/index.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/index.md
@@ -40,7 +40,7 @@ MySQL ClickPipes can be deployed and managed manually using the ClickPipes UI. I
## Prerequisites {#prerequisites}
-[//]: # "TODO Binlog replication configuration is not needed for one-time ingestion pipes. This has been a source of confusion in the past, so we should also provide the bare minimum requirements for bulk loads to avoid scaring users off."
+[//]: # "TODO Binlog replication configuration isn't needed for one-time ingestion pipes. This has been a source of confusion in the past, so we should also provide the bare minimum requirements for bulk loads to avoid scaring users off."
To get started, you first need to ensure that your MySQL database is correctly configured for binlog replication. The configuration steps depend on how you're deploying MySQL, so please follow the relevant guide below:
@@ -60,7 +60,7 @@ Once your source MySQL database is set up, you can continue creating your ClickP
## Create your ClickPipe {#create-your-clickpipe}
-Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
+Make sure you're logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
[//]: # ( TODO update image here)
1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.
@@ -88,7 +88,7 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have
#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
-You can specify SSH tunneling details if your source MySQL database is not publicly accessible.
+You can specify SSH tunneling details if your source MySQL database isn't publicly accessible.
1. Enable the "Use SSH Tunnelling" toggle.
2. Fill in the SSH connection details.
@@ -132,6 +132,6 @@ Finally, please refer to the ["ClickPipes for MySQL FAQ"](/integrations/clickpip
## What's next? {#whats-next}
-[//]: # "TODO Write a MySQL-specific migration guide and best practices similar to the existing one for PostgreSQL. The current migration guide points to the MySQL table engine, which is not ideal."
+[//]: # "TODO Write a MySQL-specific migration guide and best practices similar to the existing one for PostgreSQL. The current migration guide points to the MySQL table engine, which isn't ideal."
Once you've set up your ClickPipe to replicate data from MySQL to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance. For common questions around MySQL CDC and troubleshooting, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md b/docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md
index 6e92ce49b72..e12557d5a48 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md
@@ -39,7 +39,7 @@ Once the pipe is in the `Running` state, you can pause it. This will stop the CD
:::note
This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
:::
-When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we are in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
+When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we're in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
## Modifying {#modifying}
:::note
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md b/docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md
index 5a7efee5539..a4ac7ee5eb4 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md
@@ -51,6 +51,6 @@ Not really related to parallel snapshot, but this setting controls how many tabl
You can run **SHOW processlist** in MySQL to see the parallel snapshot in action. The ClickPipe will create multiple connections to the source database, each reading a different partition of the source table. If you see **SELECT** queries with different ranges, it means that the ClickPipe is reading the source tables. You can also see the COUNT(*) and the partitioning query here.
### Limitations {#limitations-parallel-mysql-snapshot}
-- The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
-- When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
-- The partition key column should not contain `NULL`s, as they are skipped by the partitioning logic.
+- The snapshot parameters can't be edited after pipe creation. If you want to change them, you'll have to create a new ClickPipe.
+- When adding tables to an existing ClickPipe, you can't change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
+- The partition key column shouldn't contain `NULL`s, as they're skipped by the partitioning logic.
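The partitioned reads described above can be sketched as follows (illustrative only, not the actual snapshot code): each connection gets one inclusive key range, and a row whose partition key is `NULL` matches no range, which is why such rows are skipped.

```python
def partition_ranges(min_key: int, max_key: int, parts: int) -> list[tuple[int, int]]:
    """Split min_key..max_key into `parts` inclusive [lo, hi] ranges."""
    step = -(-(max_key - min_key + 1) // parts)  # ceil division
    ranges = []
    lo = min_key
    while lo <= max_key:
        ranges.append((lo, min(lo + step - 1, max_key)))
        lo += step
    return ranges

# Three snapshot workers covering keys 1..10 would each SELECT one range:
print(partition_ranges(1, 10, 3))  # [(1, 4), (5, 8), (9, 10)]
# A row whose key is NULL falls inside none of these ranges, so it's never read.
```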
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/schema-changes.md b/docs/integrations/data-ingestion/clickpipes/mysql/schema-changes.md
index f12a8365f02..aa63067b2a5 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/schema-changes.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/schema-changes.md
@@ -16,7 +16,7 @@ ClickPipes for MySQL can detect schema changes in the source tables and, in some
| Schema Change Type | Behaviour |
| ----------------------------------------------------------------------------------- | ------------------------------------- |
| Adding a new column (`ALTER TABLE ADD COLUMN ...`) | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change |
-| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change, but existing rows will not show the default value without a full table refresh |
+| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change, but existing rows won't show the default value without a full table refresh |
| Dropping an existing column (`ALTER TABLE DROP COLUMN ...`) | Detected, but **not** propagated. The dropped column(s) will be populated with `NULL` for all rows replicated after the schema change |
-**Schema changes are not supported for MySQL 5.7 and older versions**. Reliably tracking columns depends on on table metadata not available in the binlog prior to [MySQL 8.0.1](https://dev.mysql.com/blog-archive/more-metadata-is-written-into-binary-log/).
+**Schema changes aren't supported for MySQL 5.7 and older versions**. Reliably tracking columns depends on table metadata that isn't available in the binlog prior to [MySQL 8.0.1](https://dev.mysql.com/blog-archive/more-metadata-is-written-into-binary-log/).
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/azure-flexible-server-mysql.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/azure-flexible-server-mysql.md
index d9d4dac6bcd..9c5a2e28e67 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/source/azure-flexible-server-mysql.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/azure-flexible-server-mysql.md
@@ -17,7 +17,7 @@ import TabItem from '@theme/TabItem';
This step-by-step guide shows you how to configure Azure Flexible Server for MySQL to replicate data into ClickHouse Cloud using the [MySQL ClickPipe](../index.md). Only **one-time ingestion** is supported for this service. For common questions around MySQL CDC, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).
:::warning
-Continuous ingestion via **CDC is not supported** for this service. Azure Flexible Server for MySQL does not allow configuring the [`binlog_row_metadata`](https://dev.mysql.com/doc/refman/en/replication-options-binary-log.html#sysvar_binlog_row_metadata) system variable to `FULL`, which is required for full-featured MySQL CDC in ClickPipes.
+Continuous ingestion via **CDC isn't supported** for this service. Azure Flexible Server for MySQL doesn't allow configuring the [`binlog_row_metadata`](https://dev.mysql.com/doc/refman/en/replication-options-binary-log.html#sysvar_binlog_row_metadata) system variable to `FULL`, which is required for full-featured MySQL CDC in ClickPipes.
Please submit a feature request in the [Azure feedback forum](https://feedback.azure.com/d365community/forum/47b1e71d-ee24-ec11-b6e6-000d3a4f0da0), upvote [this question](https://learn.microsoft.com/en-us/answers/questions/766047/setting-binlog-row-metadata-to-full-in-azure-db-fo), or [contact Azure support](https://azure.microsoft.com/en-us/support/create-ticket/) to request this capability.
:::
@@ -47,7 +47,7 @@ Connect to your Azure Flexible Server for MySQL instance as an admin user and ex
## Configure network access {#configure-network-access}
:::note
-ClickPipes does not support Azure Private Link connections. If you do not allow public access to your Azure Flexible Server for MySQL instance, you can [use an SSH tunnel](/integrations/clickpipes/mysql/source/azure-flexible-server-mysql#configure-network-access) to connect securely. Azure Private Link will be supported in the future.
+ClickPipes doesn't support Azure Private Link connections. If you don't allow public access to your Azure Flexible Server for MySQL instance, you can [use an SSH tunnel](/integrations/clickpipes/mysql/source/azure-flexible-server-mysql#configure-network-access) to connect securely. Azure Private Link will be supported in the future.
:::
Next, you must allow connections to your Azure Flexible Server for MySQL instance from ClickPipes.
@@ -68,7 +68,7 @@ Next, you must allow connections to your Azure Flexible Server for MySQL instanc
-If you do not allow public access to your Azure Flexible Server for MySQL instance, you must first set up an SSH bastion host to securely tunnel your connection. To set up an SSH bastion host on Azure:
+If you don't allow public access to your Azure Flexible Server for MySQL instance, you must first set up an SSH bastion host to securely tunnel your connection. To set up an SSH bastion host on Azure:
1. Create and start an Azure Virtual Machine (VM) following the [official documentation](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).
- Ensure the VM is in the same Virtual Network (VNet) as your Azure Flexible Server for MySQL instance, or in a peered VNet with connectivity.
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/generic.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/generic.md
index 3a7cbcac9cb..d345f9ebfac 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/source/generic.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/generic.md
@@ -91,7 +91,7 @@ You NEED to RESTART the MySQL instance for the changes to take effect.
:::note
-Column exclusion and schema changes are not supported for MySQL 5.7 and older versions. These features depend on table metadata not available in the binlog prior to [MySQL 8.0.1](https://dev.mysql.com/blog-archive/more-metadata-is-written-into-binary-log/).
+Column exclusion and schema changes aren't supported for MySQL 5.7 and older versions. These features depend on table metadata not available in the binlog prior to [MySQL 8.0.1](https://dev.mysql.com/blog-archive/more-metadata-is-written-into-binary-log/).
:::
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/generic_maria.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/generic_maria.md
index 0767506a45d..b6daa68b95f 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/source/generic_maria.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/generic_maria.md
@@ -60,7 +60,7 @@ You NEED to RESTART the MariaDB instance for the changes to take effect.
:::note
-Column exclusion is not supported for MariaDB \<= 10.4 because the `binlog_row_metadata` setting wasn't yet introduced.
+Column exclusion isn't supported for MariaDB \<= 10.4 because the `binlog_row_metadata` setting wasn't yet introduced.
:::
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/rds_maria.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/rds_maria.md
index fa372aa379b..d956b274fe8 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/source/rds_maria.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/rds_maria.md
@@ -40,7 +40,7 @@ The automated backups feature determines whether binary logging is turned on or
Setting backup retention to a reasonably long value depending on the replication use-case is advisable.
### 2. Binlog retention hours {#binlog-retention-hours-rds}
-Amazon RDS for MariaDB has a different method of setting binlog retention duration, which is the amount of time a binlog file containing changes is kept. If some changes are not read before the binlog file is removed, replication will be unable to continue. The default value of binlog retention hours is NULL, which means binary logs aren't retained.
+Amazon RDS for MariaDB has a different method of setting binlog retention duration, which is the amount of time a binlog file containing changes is kept. If some changes aren't read before the binlog file is removed, replication will be unable to continue. The default value of binlog retention hours is NULL, which means binary logs aren't retained.
To specify the number of hours to retain binary logs on a DB instance, use the mysql.rds_set_configuration function with a binlog retention period long enough for replication to occur. `24 hours` is the recommended minimum.
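Per the AWS RDS documentation for `mysql.rds_set_configuration`, setting and verifying the retention looks like this (shown with the 24-hour minimum; pick a longer window if your initial loads are large):

```sql
-- Retain binary logs for 24 hours (the recommended minimum)
CALL mysql.rds_set_configuration('binlog retention hours', 24);

-- Verify the current setting
CALL mysql.rds_show_configuration;
```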
diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/table_resync.md b/docs/integrations/data-ingestion/clickpipes/mysql/table_resync.md
index a4a68a757aa..48253a51c2b 100644
--- a/docs/integrations/data-ingestion/clickpipes/mysql/table_resync.md
+++ b/docs/integrations/data-ingestion/clickpipes/mysql/table_resync.md
@@ -23,7 +23,7 @@ This can be followed by following the [table removal guide](./removing_tables).
### 2. Truncate or drop the table on ClickHouse {#truncate-drop-table}
This step is to avoid data duplication when we add this table again in the next step. You can do this by heading over to the **SQL Console** tab in ClickHouse Cloud and running a query.
-Note that we have validation to block table addition if the table already exists in ClickHouse and is not empty.
+Note that we have validation to block table addition if the table already exists in ClickHouse and isn't empty.
### 3. Add the table to the ClickPipe again {#add-table-again}
diff --git a/docs/integrations/data-ingestion/clickpipes/object-storage/amazon-s3/01_overview.md b/docs/integrations/data-ingestion/clickpipes/object-storage/amazon-s3/01_overview.md
index 7bea63d371c..92888cf26c7 100644
--- a/docs/integrations/data-ingestion/clickpipes/object-storage/amazon-s3/01_overview.md
+++ b/docs/integrations/data-ingestion/clickpipes/object-storage/amazon-s3/01_overview.md
@@ -25,12 +25,12 @@ S3 ClickPipes can be deployed and managed manually using the ClickPipes UI, as w
| Name | Logo | Details |
|----------------------|-------------------------------------------------------------------------------------------|-------------------|
| **Amazon S3** | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order) by default, but can be configured to [ingest files in any order](#continuous-ingestion-any-order). |
-| **Cloudflare R2** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode is not supported. |
-| **DigitalOcean Spaces** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode is not supported. |
-| **OVH Object Storage** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode is not supported. |
+| **Cloudflare R2** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode isn't supported. |
+| **DigitalOcean Spaces** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode isn't supported. |
+| **OVH Object Storage** _S3-compatible_ | | Continuous ingestion requires [lexicographical order](#continuous-ingestion-lexicographical-order). Unordered mode isn't supported. |
:::tip
-Due to differences in URL formats and API implementations across object storage service providers, not all S3-compatible services are supported out-of-the-box. If you're running into issues with a service that is not listed above, please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
+Due to differences in URL formats and API implementations across object storage service providers, not all S3-compatible services are supported out-of-the-box. If you're running into issues with a service that isn't listed above, please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
:::
## Supported formats {#supported-formats}
@@ -53,7 +53,7 @@ When continuous ingestion is enabled, ClickPipes continuously ingests data from
#### Lexicographical order {#continuous-ingestion-lexicographical-order}
-By default, the S3 ClickPipe assumes files are added to a bucket in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the bucket, it will be **ignored** because the file name is not lexically greater than the last ingested file.
+By default, the S3 ClickPipe assumes files are added to a bucket in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the bucket, it will be **ignored** because the file name isn't lexically greater than the last ingested file.
In this mode, the S3 ClickPipe does an initial load of **all files** in the specified path, and then polls for new files at a configurable interval (by default, 30 seconds). It is **not possible** to start ingestion from a specific file or point in time — ClickPipes will always load all files in the specified path.
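The lexicographical-order rule above can be sketched in a few lines. This is an illustration of the comparison logic only, not ClickPipes internals; the function name and inputs are hypothetical:

```python
# Sketch (not ClickPipes internals): how lexicographical-order ingestion
# decides whether a newly discovered file is picked up or ignored.

def files_to_ingest(discovered, last_ingested):
    """Return discovered files that are lexically greater than the last
    ingested file name, in the order they would be ingested."""
    return sorted(name for name in discovered if name > last_ingested)

# file1..file3 were already ingested; "file 0" arrives late.
new = files_to_ingest({"file 0", "file4"}, last_ingested="file3")
print(new)  # "file 0" sorts before "file3", so only "file4" qualifies
```

Because `"file 0"` (with a space) sorts before `"file3"`, it never becomes lexically greater than the last ingested file and is silently skipped.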
@@ -198,7 +198,7 @@ ClickPipes provides sensible defaults that cover the requirements of most use ca
### Scaling {#scaling}
-Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings will not affect the ClickPipe size.
+Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings won't affect the ClickPipe size.
To increase the throughput on large ingest jobs, we recommend scaling the ClickHouse service before creating the ClickPipe.
@@ -210,10 +210,10 @@ ClickPipes will only attempt to ingest objects that are **10GB or smaller** in s
### Compatibility {#compatibility}
-Despite being S3-compatible, some services use a different URL structure that the S3 ClickPipe might not be able to parse (e.g., Backblaze B2), or require integration with provider-specific queue services for continuous, unordered ingestion. If you're running into issues with a service that is not listed under [Supported data sources](#supported-data-sources), please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
+Despite being S3-compatible, some services use a different URL structure that the S3 ClickPipe might not be able to parse (e.g., Backblaze B2), or require integration with provider-specific queue services for continuous, unordered ingestion. If you're running into issues with a service that isn't listed under [Supported data sources](#supported-data-sources), please [reach out to our team](https://clickhouse.com/company/contact?loc=clickpipes).
### View support {#view-support}
Materialized views on the target table are also supported. ClickPipes will create staging tables not only for the target table, but also any dependent materialized view.
-We do not create staging tables for non-materialized views. This means that if you have a target table with one of more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you are missing data in the materialized view.
+We don't create staging tables for non-materialized views. This means that if you have a target table with one or more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you're missing data in the materialized view.
diff --git a/docs/integrations/data-ingestion/clickpipes/object-storage/azure-blob-storage/01_overview.md b/docs/integrations/data-ingestion/clickpipes/object-storage/azure-blob-storage/01_overview.md
index 38c85bbe553..25355d1a33d 100644
--- a/docs/integrations/data-ingestion/clickpipes/object-storage/azure-blob-storage/01_overview.md
+++ b/docs/integrations/data-ingestion/clickpipes/object-storage/azure-blob-storage/01_overview.md
@@ -34,7 +34,7 @@ When continuous ingestion is enabled, ClickPipes continuously ingests data from
#### Lexicographical order {#continuous-ingestion-lexicographical-order}
-The ABS ClickPipe assumes files are added to a container in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the container, it will be **ignored** because the file name is not lexically greater than the last ingested file.
+The ABS ClickPipe assumes files are added to a container in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the container, it will be **ignored** because the file name isn't lexically greater than the last ingested file.
In this mode, the ABS ClickPipe does an initial load of **all files** in the specified path, and then polls for new files at a configurable interval (by default, 30 seconds). It is **not possible** to start ingestion from a specific file or point in time — ClickPipes will always load all files in the specified path.
@@ -89,7 +89,7 @@ Containers must allow the [`s3:GetObject`](https://docs.aws.amazon.com/AmazonS3/
### Authentication {#authentication}
:::note
-Microsoft Entra ID authentication (including Managed Identities) is not currently supported.
+Microsoft Entra ID authentication (including Managed Identities) isn't currently supported.
:::
Azure Blob Storage authentication uses a [connection string](https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string), which supports both access keys and shared access signatures (SAS).
@@ -119,7 +119,7 @@ Generate a SAS token in the Azure Portal under **Storage Account > Shared access
ABS ClickPipes use two distinct network paths for metadata discovery and data ingestion: the ClickPipes service and the ClickHouse Cloud service, respectively. If you want to configure an additional layer of network security (e.g., for compliance reasons), network access **must be configured for both paths**.
:::warning
-IP-based access control **does not work** if your Azure Blob Storage container is in the same Azure region as your ClickHouse Cloud service. When both services are co-located, traffic is routed through Azure's internal network, rather than the public internet.
+IP-based access control **doesn't work** if your Azure Blob Storage container is in the same Azure region as your ClickHouse Cloud service. When both services are co-located, traffic is routed through Azure's internal network, rather than the public internet.
:::
* For **IP-based access control**, the [IP network rules](https://learn.microsoft.com/en-us/azure/storage/common/storage-network-security) for your Azure Storage firewall must allow the static IPs for the ClickPipes service region listed [here](/integrations/clickpipes#list-of-static-ips), as well as the [static IPs](/manage/data-sources/cloud-endpoints-api) for the ClickHouse Cloud service. To obtain the static IPs for your ClickHouse Cloud region, open a terminal and run:
@@ -150,7 +150,7 @@ ClickPipes provides sensible defaults that cover the requirements of most use ca
### Scaling {#scaling}
-Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings will not affect the ClickPipe size.
+Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings won't affect the ClickPipe size.
To increase the throughput on large ingest jobs, we recommend scaling the ClickHouse service before creating the ClickPipe.
@@ -175,7 +175,7 @@ For [continuous ingestion](#continuous-ingestion), ClickPipes must scan the cont
Materialized views on the target table are also supported. ClickPipes will create staging tables not only for the target table, but also any dependent materialized view.
-We do not create staging tables for non-materialized views. This means that if you have a target table with one of more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you are missing data in the materialized view.
+We don't create staging tables for non-materialized views. This means that if you have a target table with one or more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you're missing data in the materialized view.
### Dependencies {#dependencies}
diff --git a/docs/integrations/data-ingestion/clickpipes/object-storage/google-cloud-storage/01_overview.md b/docs/integrations/data-ingestion/clickpipes/object-storage/google-cloud-storage/01_overview.md
index bf9a971d698..a1979faf9cc 100644
--- a/docs/integrations/data-ingestion/clickpipes/object-storage/google-cloud-storage/01_overview.md
+++ b/docs/integrations/data-ingestion/clickpipes/object-storage/google-cloud-storage/01_overview.md
@@ -36,7 +36,7 @@ When continuous ingestion is enabled, ClickPipes continuously ingests data from
#### Lexicographical order {#continuous-ingestion-lexicographical-order}
-The GCS ClickPipe assumes files are added to a bucket in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the bucket, it will be **ignored** because the file name is not lexically greater than the last ingested file.
+The GCS ClickPipe assumes files are added to a bucket in lexicographical order, and relies on this implicit order to ingest files sequentially. This means that any new file **must** be lexically greater than the last ingested file. For example, files named `file1`, `file2`, and `file3` will be ingested sequentially, but if a new `file 0` is added to the bucket, it will be **ignored** because the file name isn't lexically greater than the last ingested file.
In this mode, the GCS ClickPipe does an initial load of **all files** in the specified path, and then polls for new files at a configurable interval (by default, 30 seconds). It is **not possible** to start ingestion from a specific file or point in time — ClickPipes will always load all files in the specified path.
@@ -91,7 +91,7 @@ The [`roles/storage.objectViewer`](https://docs.cloud.google.com/storage/docs/ac
### Authentication {#authentication}
:::note
-Service account authentication is not currently supported.
+Service account authentication isn't currently supported.
:::
#### HMAC credentials {#hmac-credentials}
@@ -134,7 +134,7 @@ ClickPipes provides sensible defaults that cover the requirements of most use ca
### Scaling {#scaling}
-Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings will not affect the ClickPipe size.
+Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings won't affect the ClickPipe size.
To increase the throughput on large ingest jobs, we recommend scaling the ClickHouse service before creating the ClickPipe.
@@ -150,4 +150,4 @@ The GCS ClickPipe uses on the Cloud Storage [XML API](https://docs.cloud.google.
### View support {#view-support}
Materialized views on the target table are also supported. ClickPipes will create staging tables not only for the target table, but also any dependent materialized view.
-We do not create staging tables for non-materialized views. This means that if you have a target table with one of more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you are missing data in the materialized view.
+We don't create staging tables for non-materialized views. This means that if you have a target table with one or more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you're missing data in the materialized view.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/auth.md b/docs/integrations/data-ingestion/clickpipes/postgres/auth.md
index e6688a3a22f..520a5f88400 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/auth.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/auth.md
@@ -18,7 +18,7 @@ This article demonstrates how ClickPipes customers can leverage role-based acces
:::warning
For AWS RDS Postgres and Aurora Postgres you can only run `Initial Load Only` ClickPipes due to the limitations of the AWS IAM DB Authentication.
-For MySQL and MariaDB, this limitation does not apply, and you can run both `Initial Load Only` and `CDC` ClickPipes.
+For MySQL and MariaDB, this limitation doesn't apply, and you can run both `Initial Load Only` and `CDC` ClickPipes.
:::
## Setup {#setup}
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md b/docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md
index 748331c0a62..928b64e9a93 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md
@@ -26,7 +26,7 @@ There are two main ways to control the sync of a Postgres ClickPipe. The ClickPi
### Sync interval {#interval}
-The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse is not included in this interval.
+The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time to push what we have to ClickHouse isn't included in this interval.
The default is **1 minute**.
Sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.
@@ -60,7 +60,7 @@ This will open a flyout with the sync settings, where you can change the sync in
### Tweaking the sync settings to help with replication slot growth {#tweaking}
Let's talk about how to use these settings to handle a large replication slot of a CDC pipe.
-The pushing time to ClickHouse does not scale linearly with the pulling time from the source database. This can be leveraged to reduce the size of a large replication slot.
+The pushing time to ClickHouse doesn't scale linearly with the pulling time from the source database. This can be leveraged to reduce the size of a large replication slot.
By increasing both the sync interval and pull batch size, the ClickPipe will pull a whole lot of data from the source database in one go, and then push it to ClickHouse.
### Monitoring sync control behaviour {#monitoring}
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md
index ca58c67925d..0e86b4b2046 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md
@@ -23,7 +23,7 @@ ClickPipes uses [Postgres Logical Decoding](https://www.pgedge.com/blog/logical-
### ReplacingMergeTree {#replacingmergetree}
-ClickPipes maps Postgres tables to ClickHouse using the [ReplacingMergeTree](/engines/table-engines/mergetree-family/replacingmergetree) engine. ClickHouse performs best with append-only workloads and does not recommend frequent UPDATEs. This is where ReplacingMergeTree is particularly powerful.
+ClickPipes maps Postgres tables to ClickHouse using the [ReplacingMergeTree](/engines/table-engines/mergetree-family/replacingmergetree) engine. ClickHouse performs best with append-only workloads, and frequent UPDATEs aren't recommended. This is where ReplacingMergeTree is particularly powerful.
With ReplacingMergeTree, updates are modeled as inserts with a newer version (`_peerdb_version`) of the row, while deletes are inserts with a newer version and `_peerdb_is_deleted` marked as true. The ReplacingMergeTree engine deduplicates/merges data in the background, and retains the latest version of the row for a given primary key (id), enabling efficient handling of UPDATEs and DELETEs as versioned inserts.
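The versioned-insert model above can be sketched as follows. This is a simplified illustration of the deduplication semantics with hypothetical rows, not the actual merge algorithm:

```python
# Illustrative sketch: ReplacingMergeTree-style deduplication of versioned
# inserts. Updates arrive as inserts with a higher _peerdb_version; deletes
# arrive as inserts with a higher version and _peerdb_is_deleted = True.

def deduplicate(rows):
    """Keep the highest-version row per id, then drop rows marked deleted."""
    latest = {}
    for row in rows:
        key = row["id"]
        if key not in latest or row["_peerdb_version"] > latest[key]["_peerdb_version"]:
            latest[key] = row
    return [r for r in latest.values() if not r["_peerdb_is_deleted"]]

rows = [
    {"id": 1, "_peerdb_version": 1, "_peerdb_is_deleted": False, "v": "a"},
    {"id": 1, "_peerdb_version": 2, "_peerdb_is_deleted": False, "v": "b"},  # UPDATE
    {"id": 2, "_peerdb_version": 1, "_peerdb_is_deleted": False, "v": "c"},
    {"id": 2, "_peerdb_version": 2, "_peerdb_is_deleted": True,  "v": "c"},  # DELETE
]
print(deduplicate(rows))  # id 1 survives with v="b"; id 2 is gone
```

Until the background merge (or a query-time `FINAL`) applies this collapse, all four rows are visible, which is why deduplication is described as eventual.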
@@ -159,7 +159,7 @@ This section will explore techniques for deduplicating data while keeping the or
#### Views {#views}
-[Views](/sql-reference/statements/create/view#normal-view) are a great way to hide the FINAL keyword from the query, as they do not store any data and simply perform a read from another table on each access.
+[Views](/sql-reference/statements/create/view#normal-view) are a great way to hide the FINAL keyword from the query, as they don't store any data and simply perform a read from another table on each access.
Below is an example of creating views for each table of our database in ClickHouse with the FINAL keyword and filter for the deleted rows.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/faq.md b/docs/integrations/data-ingestion/clickpipes/postgres/faq.md
index 173e3b5d8e4..6d2dfa052d2 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/faq.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/faq.md
@@ -35,7 +35,7 @@ Please refer to the [Postgres Generated Columns: Gotchas and Best Practices](./g
For a table to be replicated using ClickPipes for Postgres, it must have either a primary key or a [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY) defined.
- **Primary Key**: The most straightforward approach is to define a primary key on the table. This provides a unique identifier for each row, which is crucial for tracking updates and deletions. You can have REPLICA IDENTITY set to `DEFAULT` (the default behavior) in this case.
-- **Replica Identity**: If a table does not have a primary key, you can set a replica identity. The replica identity can be set to `FULL`, which means that the entire row will be used to identify changes. Alternatively, you can set it to use a unique index if one exists on the table, and then set REPLICA IDENTITY to `USING INDEX index_name`.
+- **Replica Identity**: If a table doesn't have a primary key, you can set a replica identity. The replica identity can be set to `FULL`, which means that the entire row will be used to identify changes. Alternatively, you can set it to use a unique index if one exists on the table, and then set REPLICA IDENTITY to `USING INDEX index_name`.
To set the replica identity to FULL, you can use the following SQL command:
```sql
ALTER TABLE your_table_name REPLICA IDENTITY FULL;
@@ -44,7 +44,7 @@ REPLICA IDENTITY FULL also enables replication of unchanged TOAST columns. More
Note that using `REPLICA IDENTITY FULL` can have performance implications and also faster WAL growth, especially for tables without a primary key and with frequent updates or deletes, as it requires more data to be logged for each change. If you have any doubts or need assistance with setting up primary keys or replica identities for your tables, please reach out to our support team for guidance.
-It's important to note that if neither a primary key nor a replica identity is defined, ClickPipes will not be able to replicate changes for that table, and you may encounter errors during the replication process. Therefore, it's recommended to review your table schemas and ensure that they meet these requirements before setting up your ClickPipe.
+It's important to note that if neither a primary key nor a replica identity is defined, ClickPipes won't be able to replicate changes for that table, and you may encounter errors during the replication process. Therefore, it's recommended to review your table schemas and ensure that they meet these requirements before setting up your ClickPipe.
### Do you support partitioned tables as part of Postgres CDC? {#do-you-support-partitioned-tables-as-part-of-postgres-cdc}
@@ -65,7 +65,7 @@ Yes! ClickPipes for Postgres offers two ways to connect to databases in private
- us-east-2
- eu-central-1
- For detailed setup instructions, see our [PrivateLink documentation](/knowledgebase/aws-privatelink-setup-for-clickpipes)
- - For regions where PrivateLink is not available, please use SSH tunneling
+ - For regions where PrivateLink isn't available, please use SSH tunneling
### How do you handle UPDATEs and DELETEs? {#how-do-you-handle-updates-and-deletes}
@@ -73,7 +73,7 @@ ClickPipes for Postgres captures both INSERTs and UPDATEs from Postgres as new r
DELETEs from Postgres are propagated as new rows marked as deleted (using the `_peerdb_is_deleted` column). Since the deduplication process is asynchronous, you might temporarily see duplicates. To address this, you need to handle deduplication at the query layer.
-Also note that by default, Postgres does not send column values of columns that are not part of the primary key or replica identity during DELETE operations. If you want to capture the full row data during DELETEs, you can set the [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY) to FULL.
+Also note that by default, Postgres doesn't send values of columns that aren't part of the primary key or replica identity during DELETE operations. If you want to capture the full row data during DELETEs, you can set the [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY) to FULL.
For more details, refer to:
@@ -83,14 +83,14 @@ For more details, refer to:
### Can I update primary key columns in PostgreSQL? {#can-i-update-primary-key-columns-in-postgresql}
:::warning
-Primary key updates in PostgreSQL cannot be properly replayed in ClickHouse by default.
+Primary key updates in PostgreSQL can't be properly replayed in ClickHouse by default.
This limitation exists because `ReplacingMergeTree` deduplication works based on the `ORDER BY` columns (which typically correspond to the primary key). When a primary key is updated in PostgreSQL, it appears as a new row with a different key in ClickHouse, rather than an update to the existing row. This can lead to both the old and new primary key values existing in your ClickHouse table.
:::
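The warning above can be made concrete with a small sketch. The rows and function here are hypothetical illustrations of the grouping behavior, not ClickPipes code:

```python
# Sketch (hypothetical rows): why a primary key UPDATE in Postgres leaves
# both the old and the new key in a ReplacingMergeTree-style table.
# Deduplication groups by the ORDER BY key (id), so a changed id looks
# like a brand-new row rather than a replacement of the old one.

def deduplicate_by_key(rows):
    """Keep the highest-version row for each distinct id."""
    latest = {}
    for row in rows:
        if row["id"] not in latest or row["version"] > latest[row["id"]]["version"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])

# Postgres: INSERT a row with id=1, then UPDATE ... SET id = 2 on that row.
# CDC replays this as two inserts under different keys:
rows = [{"id": 1, "version": 1}, {"id": 2, "version": 2}]
print(deduplicate_by_key(rows))  # both id=1 and id=2 survive
```

Since the two rows never share a deduplication key, the engine has no basis to collapse them, and the stale id=1 row lingers.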
-Note that updating primary key columns is not a common practice in PostgreSQL database design, as primary keys are intended to be immutable identifiers. Most applications avoid primary key updates by design, making this limitation rarely encountered in typical use cases.
+Note that updating primary key columns isn't a common practice in PostgreSQL database design, as primary keys are intended to be immutable identifiers. Most applications avoid primary key updates by design, making this limitation rarely encountered in typical use cases.
-There is an experimental setting available that can enable primary key update handling, but it comes with significant performance implications and is not recommended for production use without careful consideration.
+There is an experimental setting available that can enable primary key update handling, but it comes with significant performance implications and isn't recommended for production use without careful consideration.
If your use case requires updating primary key columns in PostgreSQL and having those changes properly reflected in ClickHouse, please reach out to our support team at [db-integrations-support@clickhouse.com](mailto:db-integrations-support@clickhouse.com) to discuss your specific requirements and potential solutions.
@@ -108,7 +108,7 @@ If you're noticing that the size of your Postgres replication slot keeps increas
1. **Sudden Spikes in Database Activity**
- Large batch updates, bulk inserts, or significant schema changes can quickly generate a lot of WAL data.
- - The replication slot will hold these WAL records until they are consumed, causing a temporary spike in size.
+ - The replication slot will hold these WAL records until they're consumed, causing a temporary spike in size.
2. **Long-Running Transactions**
- An open transaction forces Postgres to keep all WAL segments generated since the transaction began, which can dramatically increase slot size.
@@ -152,26 +152,26 @@ Currently, we don't support defining custom data type mappings as part of the pi
### How are JSON and JSONB columns replicated from Postgres? {#how-are-json-and-jsonb-columns-replicated-from-postgres}
-JSON and JSONB columns are replicated as String type in ClickHouse. Since ClickHouse supports a native [JSON type](/sql-reference/data-types/newjson), you can create a materialized view over the ClickPipes tables to perform the translation if needed. Alternatively, you can use [JSON functions](/sql-reference/functions/json-functions) directly on the String column(s). We are actively working on a feature that replicates JSON and JSONB columns directly to the JSON type in ClickHouse. This feature is expected to be available in a few months.
+JSON and JSONB columns are replicated as String type in ClickHouse. Since ClickHouse supports a native [JSON type](/sql-reference/data-types/newjson), you can create a materialized view over the ClickPipes tables to perform the translation if needed. Alternatively, you can use [JSON functions](/sql-reference/functions/json-functions) directly on the String column(s). We're actively working on a feature that replicates JSON and JSONB columns directly to the JSON type in ClickHouse. This feature is expected to be available in a few months.
### What happens to inserts when a mirror is paused? {#what-happens-to-inserts-when-a-mirror-is-paused}
-When you pause the mirror, the messages are queued up in the replication slot on the source Postgres, ensuring they are buffered and not lost. However, pausing and resuming the mirror will re-establish the connection, which could take some time depending on the source.
+When you pause the mirror, the messages are queued up in the replication slot on the source Postgres, ensuring they're buffered and not lost. However, pausing and resuming the mirror will re-establish the connection, which could take some time depending on the source.
During this process, both the sync (pulling data from Postgres and streaming it into the ClickHouse raw table) and normalize (from raw table to target table) operations are aborted. However, they retain the state required to resume durably.
+- For sync, if it is canceled mid-way, the `confirmed_flush_lsn` in Postgres isn't advanced, so the next sync will start from the same position as the aborted one, ensuring data consistency.
+- For sync, if it is canceled mid-way, the confirmed_flush_lsn in Postgres isn't advanced, so the next sync will start from the same position as the aborted one, ensuring data consistency.
- For normalize, the ReplacingMergeTree insert order handles deduplication.
In summary, while sync and normalize processes are terminated during a pause, it is safe to do so as they can resume without data loss or inconsistency.
### Can ClickPipe creation be automated or done via API or CLI? {#can-clickpipe-creation-be-automated-or-done-via-api-or-cli}
-A Postgres ClickPipe can also be created and managed via [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi) endpoints. This feature is in beta, and the API reference can be found [here](https://clickhouse.com/docs/cloud/manage/api/swagger#tag/beta). We are actively working on Terraform support to create Postgres ClickPipes as well.
+A Postgres ClickPipe can also be created and managed via [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi) endpoints. This feature is in beta, and the API reference can be found [here](https://clickhouse.com/docs/cloud/manage/api/swagger#tag/beta). We're actively working on Terraform support to create Postgres ClickPipes as well.
### How do I speed up my initial load? {#how-do-i-speed-up-my-initial-load}
-You cannot speed up an already running initial load. However, you can optimize future initial loads by adjusting certain settings. By default, the settings are configured with 4 parallel threads and a snapshot number of rows per partition set to 100,000. These are advanced settings and are generally sufficient for most use cases.
+You can't speed up an already running initial load. However, you can optimize future initial loads by adjusting certain settings. By default, the settings are configured with 4 parallel threads and a snapshot number of rows per partition set to 100,000. These are advanced settings and are generally sufficient for most use cases.
For Postgres versions 13 or lower, CTID range scans are slower, and these settings become more critical. In such cases, consider the following process to improve performance:
@@ -179,7 +179,7 @@ For Postgres versions 13 or lower, CTID range scans are slower, and these settin
2. **Delete destination tables on ClickHouse**: Ensure that the tables created by the previous pipe are removed.
3. **Create a new pipe with optimized settings**: Typically, increase the snapshot number of rows per partition to between 1 million and 10 million, depending on your specific requirements and the load your Postgres instance can handle.
-These adjustments should significantly enhance the performance of the initial load, especially for older Postgres versions. If you are using Postgres 14 or later, these settings are less impactful due to improved support for CTID range scans.
+These adjustments should significantly enhance the performance of the initial load, especially for older Postgres versions. If you're using Postgres 14 or later, these settings are less impactful due to improved support for CTID range scans.
### How should I scope my publications when setting up replication? {#how-should-i-scope-my-publications-when-setting-up-replication}
@@ -222,7 +222,7 @@ For manually created publications, please add any tables you want to the publica
:::
:::warning
-If you're replicating from a Postgres read replica/hot standby, you will need to create your own publication on the primary instance, which will automatically propagate to the standby. The ClickPipe will not be able to manage the publication in this case as you're unable to create publications on a standby.
+If you're replicating from a Postgres read replica/hot standby, you will need to create your own publication on the primary instance, which will automatically propagate to the standby. The ClickPipe won't be able to manage the publication in this case as you're unable to create publications on a standby.
:::
### Recommended `max_slot_wal_keep_size` settings {#recommended-max_slot_wal_keep_size-settings}
@@ -277,7 +277,7 @@ The only way to recover ClickPipe is by triggering a resync, which you can do in
The most common cause of replication slot invalidation is a low `max_slot_wal_keep_size` setting on your PostgreSQL database (e.g., a few gigabytes). We recommend increasing this value. [Refer to this section](/integrations/clickpipes/postgres/faq#recommended-max_slot_wal_keep_size-settings) on tuning `max_slot_wal_keep_size`. Ideally, this should be set to at least 200GB to prevent replication slot invalidation.
-In rare cases, we have seen this issue occur even when `max_slot_wal_keep_size` is not configured. This could be due to an intricate and rare bug in PostgreSQL, although the cause remains unclear.
+In rare cases, we've seen this issue occur even when `max_slot_wal_keep_size` isn't configured. This could be due to an intricate and rare bug in PostgreSQL, although the cause remains unclear.
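If you're raising the setting yourself on a self-managed instance (managed providers usually expose this through their parameter groups instead), a minimal sketch:

```sql
-- max_slot_wal_keep_size is reload-safe; no restart is required.
ALTER SYSTEM SET max_slot_wal_keep_size = '200GB';
SELECT pg_reload_conf();
```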
### I am seeing out of memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help? {#i-am-seeing-out-of-memory-ooms-on-clickhouse-while-my-clickpipe-is-ingesting-data-can-you-help}
@@ -293,7 +293,7 @@ Another reason we've observed is the presence of downstream Materialized Views w
The `invalid snapshot identifier` error occurs when there is a connection drop between ClickPipes and your Postgres database. This can happen due to gateway timeouts, database restarts, or other transient issues.
-It is recommended that you do not carry out any disruptive operations like upgrades or restarts on your Postgres database while Initial Load is in progress and ensure that the network connection to your database is stable.
+It's recommended that you don't carry out any disruptive operations like upgrades or restarts on your Postgres database while Initial Load is in progress, and ensure that the network connection to your database is stable.
To resolve this issue, you can trigger a resync from the ClickPipes UI. This will restart the initial load process from the beginning.
@@ -320,7 +320,7 @@ WITH (publish_via_partition_root = true);
### What if I am seeing `Unexpected Datatype` errors or `Cannot parse type XX ...` {#what-if-i-am-seeing-unexpected-datatype-errors}
-This error typically occurs when the source Postgres database has a datatype which cannot be mapped during ingestion.
+This error typically occurs when the source Postgres database has a datatype which can't be mapped during ingestion.
For more specific issues, refer to the possibilities below.
### I'm seeing errors like `invalid memory alloc request size ` during replication/slot creation {#postgres-invalid-memalloc-bug}
@@ -335,10 +335,10 @@ CREATE PUBLICATION FOR TABLES IN SCHEMA WITH (publish =
```
Then when [setting up](https://clickhouse.com/docs/integrations/clickpipes/postgres#configuring-the-replication-settings) your Postgres ClickPipe, make sure this publication name is selected.
-Note that TRUNCATE operations are ignored by ClickPipes and will not be replicated to ClickHouse.
+Note that TRUNCATE operations are ignored by ClickPipes and won't be replicated to ClickHouse.
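For reference, a publication created this way might look like the following sketch; the publication name and schema are placeholders, and `FOR TABLES IN SCHEMA` needs Postgres 15 or later:

```sql
-- Publish only insert/update/delete, omitting truncate.
CREATE PUBLICATION clickpipes_publication
FOR TABLES IN SCHEMA public
WITH (publish = 'insert, update, delete');
```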
### Why can't I replicate my table which has a dot in its name? {#replicate-table-dot}
-PeerDB has a limitation currently where dots in source table identifiers - aka either schema name or table name - is not supported for replication as PeerDB cannot discern, in that case, what is the schema and what is the table as it splits on dot.
+PeerDB currently has a limitation where dots in source table identifiers (that is, in the schema name or table name) aren't supported for replication, as PeerDB can't discern what is the schema and what is the table because it splits on the dot.
Effort is being made to support input of schema and table separately to get around this limitation.
### Initial load completed but there is no/missing data on ClickHouse. What could be the issue? {#initial-load-issue}
@@ -356,16 +356,16 @@ If the source is configured accordingly, the slot is preserved after failovers t
### I am seeing errors like `Internal error encountered during logical decoding of aborted sub-transaction` {#transient-logical-decoding-errors}
-This error suggests a transient issue with the logical decoding of aborted sub-transaction, and is specific to custom implementations of Aurora Postgres. Given the error is coming from `ReorderBufferPreserveLastSpilledSnapshot` routine, this suggests that logical decoding is not able to read the snapshot spilled to disk. It may be worth trying to increase [`logical_decoding_work_mem`](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-LOGICAL-DECODING-WORK-MEM) to a higher value.
+This error suggests a transient issue with the logical decoding of aborted sub-transactions, and is specific to custom implementations of Aurora Postgres. Given the error comes from the `ReorderBufferPreserveLastSpilledSnapshot` routine, this suggests that logical decoding isn't able to read the snapshot spilled to disk. It may be worth trying to increase [`logical_decoding_work_mem`](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-LOGICAL-DECODING-WORK-MEM) to a higher value.
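A hedged example of raising that parameter on a self-managed instance (Aurora exposes it via the cluster parameter group instead):

```sql
-- Default is 64MB; a larger value reduces spilling of decoded
-- changes to disk during logical decoding.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();
```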
### I am seeing errors like `error converting new tuple to map` or `error parsing logical message` during CDC replication {#logical-message-processing-errors}
-Postgres sends information about changes in the form of messages that have a fixed protocol. These errors arise when the ClickPipe receives a message that it is unable to parse, either due to corruption in transit or invalid messages being sent. While the exact issue tends to vary, we've seen several cases from Neon Postgres sources. In case you are seeing this issue with Neon as well, please raise a support ticket with them. In other cases, please reach out to our support team for guidance.
+Postgres sends information about changes in the form of messages that follow a fixed protocol. These errors arise when the ClickPipe receives a message that it's unable to parse, either due to corruption in transit or invalid messages being sent. While the exact issue tends to vary, we've seen several cases from Neon Postgres sources. If you're seeing this issue with Neon, please raise a support ticket with them. In other cases, please reach out to our support team for guidance.
### Can I include columns I initially excluded from replication? {#include-excluded-columns}
-This is not yet supported, an alternative would be to [resync the table](./table_resync.md) whose columns you want to include.
+This isn't yet supported; an alternative would be to [resync the table](./table_resync.md) whose columns you want to include.
-### I am noticing that my ClickPipe has entered Snapshot but data is not flowing in, what could be the issue? {#snapshot-no-data-flow}
+### I am noticing that my ClickPipe has entered Snapshot but data isn't flowing in, what could be the issue? {#snapshot-no-data-flow}
This can be for a few reasons, mainly around some prerequisites for snapshotting taking longer than usual. For more information, do read our doc on parallel snapshotting [here](./parallel_initial_load.md).
#### Parallel snapshotting is taking time to obtain partitions {#parallel-snapshotting-taking-time}
Parallel snapshotting has a few initial steps to obtain logical partitions for your tables. If your tables are small, this will finish in a matter of seconds; however, for very large (order of terabytes) tables, it can take longer. You can monitor the queries running on your Postgres source in the **Source** tab to see if there are any long-running queries related to obtaining partitions for snapshotting. Once the partitions are obtained, data will start flowing in.
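A minimal query sketch for spotting those long-running queries from the Postgres side (standard `pg_stat_activity` columns, nothing ClickPipes-specific):

```sql
-- Longest-running active queries first; partition discovery typically
-- shows up as a COUNT(*) or a window-function query on the source table.
SELECT pid, now() - query_start AS duration, left(query, 100) AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
```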
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/index.md b/docs/integrations/data-ingestion/clickpipes/postgres/index.md
index 7b636118c92..b4787eedb5c 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/index.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/index.md
@@ -43,13 +43,13 @@ To get started, you first need to make sure that your Postgres database is set u
7. [Crunchy Bridge Postgres](./postgres/source/crunchy-postgres)
-8. [Generic Postgres Source](./postgres/source/generic), if you are using any other Postgres provider or using a self-hosted instance.
+8. [Generic Postgres Source](./postgres/source/generic), if you're using any other Postgres provider or using a self-hosted instance.
-9. [TimescaleDB](./postgres/source/timescale), if you are using the TimescaleDB extension on a managed service or self-hosted instance.
+9. [TimescaleDB](./postgres/source/timescale), if you're using the TimescaleDB extension on a managed service or self-hosted instance.
:::warning
-Postgres Proxies like PgBouncer, RDS Proxy, Supabase Pooler, etc., are not supported for CDC based replication. Please make sure to NOT use them for the ClickPipes setup and instead add connection details of the actual Postgres database.
+Postgres proxies like PgBouncer, RDS Proxy, Supabase Pooler, etc., aren't supported for CDC-based replication. Please make sure NOT to use them for the ClickPipes setup, and instead add the connection details of the actual Postgres database.
:::
@@ -57,7 +57,7 @@ Once your source Postgres database is set up, you can continue creating your Cli
## Creating your ClickPipe {#creating-your-clickpipe}
-Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
+Make sure you're logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
[//]: # ( TODO update image here)
1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.
@@ -93,7 +93,7 @@ You can follow the [setup guide to set up the connection](/integrations/clickpip
#### (Optional) Setting up SSH tunneling {#optional-setting-up-ssh-tunneling}
-You can specify SSH tunneling details if your source Postgres database is not publicly accessible.
+You can specify SSH tunneling details if your source Postgres database isn't publicly accessible.
1. Enable the "Use SSH Tunnelling" toggle.
2. Fill in the SSH connection details.
@@ -136,7 +136,7 @@ You can configure the Advanced settings if needed. A brief description of each s
7. You can select the tables you want to replicate from the source Postgres database. While selecting the tables, you can also choose to rename the tables in the destination ClickHouse database as well as exclude specific columns.
:::warning
- If you are defining an ordering key in ClickHouse differently than from the primary key in Postgres, don't forget to read all the [considerations](/integrations/clickpipes/postgres/ordering_keys) around it
+ If you're defining an ordering key in ClickHouse differently from the primary key in Postgres, don't forget to read all the [considerations](/integrations/clickpipes/postgres/ordering_keys) around it.
:::
### Review permissions and start the ClickPipe {#review-permissions-and-start-the-clickpipe}
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md b/docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md
index de9b34fe084..9e671af353c 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md
@@ -24,7 +24,7 @@ After a pipe is provisioned, it enters the `Setup` state. This state is where we
## Snapshot {#snapshot}
-Once setup is complete, we enter the `Snapshot` state (unless it's a CDC-only pipe, which would transition to `Running`). `Snapshot`, `Initial Snapshot` and `Initial Load` (more common) are interchangeable terms. In this state, we take a snapshot of the source database tables and load them into ClickHouse. This does not use logical replication, but the replication slot is created at this step, therefore your `max_slot_wal_keep_size` and storage parameters should account for slot growth during initial load. For more information on initial load, see the [parallel initial load documentation](./parallel_initial_load). The pipe will also enter the `Snapshot` state when a resync is triggered or when new tables are added to an existing pipe.
+Once setup is complete, we enter the `Snapshot` state (unless it's a CDC-only pipe, which would transition to `Running`). `Snapshot`, `Initial Snapshot` and `Initial Load` (more common) are interchangeable terms. In this state, we take a snapshot of the source database tables and load them into ClickHouse. This doesn't use logical replication, but the replication slot is created at this step; your `max_slot_wal_keep_size` and storage parameters should therefore account for slot growth during initial load. For more information on initial load, see the [parallel initial load documentation](./parallel_initial_load). The pipe will also enter the `Snapshot` state when a resync is triggered or when new tables are added to an existing pipe.
## Running {#running}
@@ -39,7 +39,7 @@ Once the pipe is in the `Running` state, you can pause it. This will stop the CD
:::note
This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
:::
-When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we are in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
+When you click on the Pause button, the pipe enters the `Pausing` state. This is a transient state where we're in the process of stopping the CDC process. Once the CDC process is fully stopped, the pipe will enter the `Paused` state.
## Modifying {#modifying}
:::note
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md b/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md
index 6da44276caa..295fe29c1ad 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md
@@ -45,9 +45,9 @@ For example, in a multi-tenant SaaS application, using (`tenant_id`, `id`) as th
For Postgres CDC to function as expected, it is important to modify the `REPLICA IDENTITY` on tables to include the ordering key columns. This is essential for handling DELETEs accurately.
-If the `REPLICA IDENTITY` does not include the ordering key columns, Postgres CDC will not capture the values of columns other than the primary key - this is a limitation of Postgres logical decoding. All ordering key columns besides the primary key in Postgres will have nulls. This affects deduplication, meaning the previous version of the row may not be deduplicated with the latest deleted version (where `_peerdb_is_deleted` is set to 1).
+If the `REPLICA IDENTITY` doesn't include the ordering key columns, Postgres CDC won't capture the values of columns other than the primary key - this is a limitation of Postgres logical decoding. All ordering key columns besides the primary key in Postgres will have nulls. This affects deduplication, meaning the previous version of the row may not be deduplicated with the latest deleted version (where `_peerdb_is_deleted` is set to 1).
-In the above example with `owneruserid` and `id`, if the primary key does not already include `owneruserid`, you need to have a `UNIQUE INDEX` on (`owneruserid`, `id`) and set it as the `REPLICA IDENTITY` for the table. This ensures that Postgres CDC captures the necessary column values for accurate replication and deduplication.
+In the above example with `owneruserid` and `id`, if the primary key doesn't already include `owneruserid`, you need to have a `UNIQUE INDEX` on (`owneruserid`, `id`) and set it as the `REPLICA IDENTITY` for the table. This ensures that Postgres CDC captures the necessary column values for accurate replication and deduplication.
Below is an example of how to do this on the events table. Make sure to apply this to all tables with modified ordering keys.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md b/docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md
index 54498bf741a..cdd38496a08 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md
@@ -21,7 +21,7 @@ Initial load is the first phase of a CDC ClickPipe, where the ClickPipe syncs th
However, the Postgres ClickPipe can parallelize this process, which can significantly speed up the initial load.
### CTID column in Postgres {#ctid-pg-snapshot}
-In Postgres, every row in a table has a unique identifier called the CTID. This is a system column that is not visible to you by default, but it can be used to uniquely identify rows in a table. The CTID is a combination of the block number and the offset within the block, which allows for efficient access to rows.
+In Postgres, every row in a table has a unique identifier called the CTID. This is a system column that isn't visible to you by default, but it can be used to uniquely identify rows in a table. The CTID is a combination of the block number and the offset within the block, which allows for efficient access to rows.
### Logical partitioning {#logical-partitioning-pg-snapshot}
The Postgres ClickPipe uses the CTID column to logically partition source tables. It obtains the partitions by first performing a COUNT(*) on the source table, followed by a window function partitioning query to get the CTID ranges for each partition. This allows the ClickPipe to read the source table in parallel, with each partition being processed by a separate thread.
@@ -48,6 +48,6 @@ You can analyze **pg_stat_activity** to see the parallel snapshot in action. The
### Limitations {#limitations-parallel-pg-snapshot}
-- The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
-- When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
-- The partition key column should not contain `NULL`s, as they are skipped by the partitioning logic.
+- The snapshot parameters can't be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
+- When adding tables to an existing ClickPipe, you can't change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
+- The partition key column shouldn't contain `NULL`s, as they're skipped by the partitioning logic.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/pause_and_resume.md b/docs/integrations/data-ingestion/clickpipes/postgres/pause_and_resume.md
index 0b08404b1b3..bdbe273ad84 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/pause_and_resume.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/pause_and_resume.md
@@ -35,7 +35,7 @@ There are scenarios where it would be useful to pause a Postgres ClickPipe. For
5. In around 5 seconds (and also on page refresh), the status of the pipe should be **Paused**.
:::warning
-Pausing a Postgres ClickPipe will not pause the growth of replication slots.
+Pausing a Postgres ClickPipe won't pause the growth of replication slots.
:::
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md b/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md
index e687fba413d..44c34f061de 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md
@@ -13,11 +13,11 @@ When using PostgreSQL's generated columns in tables that are being replicated, t
## The problem with generated columns {#the-problem-with-generated-columns}
-1. **Not Published via `pgoutput`:** Generated columns are not published through the `pgoutput` logical replication plugin. This means that when you're replicating data from PostgreSQL to another system, the values of generated columns are not included in the replication stream.
+1. **Not Published via `pgoutput`:** Generated columns aren't published through the `pgoutput` logical replication plugin. This means that when you're replicating data from PostgreSQL to another system, the values of generated columns aren't included in the replication stream.
-2. **Issues with Primary Keys:** If a generated column is part of your primary key, it can cause deduplication problems on the destination. Since the generated column values are not replicated, the destination system won't have the necessary information to properly identify and deduplicate rows.
+2. **Issues with Primary Keys:** If a generated column is part of your primary key, it can cause deduplication problems on the destination. Since the generated column values aren't replicated, the destination system won't have the necessary information to properly identify and deduplicate rows.
-3. **Issues with schema changes**: If you add a generated column to a table that is already being replicated, the new column will not be populated in the destination - as Postgres does not give us the RelationMessage for the new column. If you then add a new non-generated column to the same table, the ClickPipe, when trying to reconcile the schema, will not be able to find the generated column in the destination, leading to a failure in the replication process.
+3. **Issues with schema changes**: If you add a generated column to a table that is already being replicated, the new column won't be populated in the destination - as Postgres doesn't give us the RelationMessage for the new column. If you then add a new non-generated column to the same table, the ClickPipe, when trying to reconcile the schema, won't be able to find the generated column in the destination, leading to a failure in the replication process.
## Best practices {#best-practices}
@@ -29,7 +29,7 @@ To work around these limitations, consider the following best practices:
## Upcoming improvements to UI {#upcoming-improvements-to-ui}
-In upcoming versions, we are planning to add a UI to help you with the following:
+In upcoming versions, we're planning to add a UI to help you with the following:
1. **Identify Tables with Generated Columns:** The UI will have a feature to identify tables that contain generated columns. This will help you understand which tables are affected by this issue.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/schema-changes.md b/docs/integrations/data-ingestion/clickpipes/postgres/schema-changes.md
index 75af1c3ed36..3f6d7c5a4a7 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/schema-changes.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/schema-changes.md
@@ -16,7 +16,7 @@ ClickPipes for Postgres can detect schema changes in the source tables and, in s
| Schema Change Type | Behaviour |
| ----------------------------------------------------------------------------------- | ------------------------------------- |
| Adding a new column (`ALTER TABLE ADD COLUMN ...`) | Propagated automatically once the table gets an insert/update/delete. The new column(s) will be populated for all rows replicated after the schema change |
-| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically once the table gets an insert/update/delete. The new column(s) will be populated for all rows replicated after the schema change, but existing rows will not show the default value without a full table refresh |
+| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically once the table gets an insert/update/delete. The new column(s) will be populated for all rows replicated after the schema change, but existing rows won't show the default value without a full table refresh |
| Dropping an existing column (`ALTER TABLE DROP COLUMN ...`) | Detected, but **not** propagated. The dropped column(s) will be populated with `NULL` for all rows replicated after the schema change |
Note that column addition will be propagated at the end of a batch's sync, which could occur after the sync interval or pull batch size is reached. More information on controlling syncs [here](./controlling_sync.md)
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/alloydb.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/alloydb.md
index a0d704a8f04..2f376bfe7a2 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/alloydb.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/alloydb.md
@@ -106,7 +106,7 @@ Connect to your AlloyDB instance as an admin user and execute the following comm
## Configure network access {#configure-network-access}
:::note
-ClickPipes does not support Private Service Connect (PSC) connections. If you do not allow public access to your AlloyDB instance, you can [use an SSH tunnel](#configure-network-access) to connect securely. PSC will be supported in the future.
+ClickPipes doesn't support Private Service Connect (PSC) connections. If you don't allow public access to your AlloyDB instance, you can [use an SSH tunnel](#configure-network-access) to connect securely. PSC will be supported in the future.
:::
Next, you must allow connections to your AlloyDB instance from ClickPipes.
@@ -133,7 +133,7 @@ Next, you must allow connections to your AlloyDB instance from ClickPipes.
-If you do not allow public access to your AlloyDB instance, you must first set up an SSH bastion host to securely tunnel your connection. To set up an SSH bastion host on Google Cloud Platform:
+If you don't allow public access to your AlloyDB instance, you must first set up an SSH bastion host to securely tunnel your connection. To set up an SSH bastion host on Google Cloud Platform:
1. Create and start a Google Compute Engine (GCE) instance following the [official documentation](https://cloud.google.com/compute/docs/instances/create-start-instance).
- Ensure the GCE instance is in the same Virtual Private Network (VPC) as your AlloyDB instance.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md
index 772004c0614..90e28b0b809 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md
@@ -22,7 +22,7 @@ ClickPipes supports Postgres version 12 and later.
## Enable logical replication {#enable-logical-replication}
-**You don't need** to follow the below steps if `wal_level` is set to `logical`. This setting should mostly be pre-configured if you are migrating from another data replication tool.
+**You don't need** to follow the below steps if `wal_level` is set to `logical`. This setting should mostly be pre-configured if you're migrating from another data replication tool.
1. Click on the **Server parameters** section
@@ -91,7 +91,7 @@ Connect to your Azure Flexible Server Postgres through the admin user and run th
Please follow the below steps to add [ClickPipes IPs](../../index.md#list-of-static-ips) to your network.
1. Go to the **Networking** tab and add the [ClickPipes IPs](../../index.md#list-of-static-ips) to the Firewall
- of your Azure Flexible Server Postgres OR the Jump Server/Bastion if you are using SSH tunneling.
+ of your Azure Flexible Server Postgres OR the Jump Server/Bastion if you're using SSH tunneling.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md
index 56e7e3b8f49..e966fdeec52 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md
@@ -48,7 +48,7 @@ ClickPipes supports Postgres version 12 and later.
SHOW max_replication_slots;
```
- If the values do not match the recommended values, you can run the following SQL commands to set them:
+ If the values don't match the recommended values, you can run the following SQL commands to set them:
```sql
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
@@ -101,7 +101,7 @@ Connect to your Postgres instance as an admin user and execute the following com
## Enabling connections in pg_hba.conf to the ClickPipes User {#enabling-connections-in-pg_hbaconf-to-the-clickpipes-user}
-If you are self serving, you need to allow connections to the ClickPipes user from the ClickPipes IP addresses by following the below steps. If you are using a managed service, you can do the same by following the provider's documentation.
+If you're self-hosting, you need to allow connections to the ClickPipes user from the ClickPipes IP addresses by following the below steps. If you're using a managed service, you can do the same by following the provider's documentation.
1. Make necessary changes to the `pg_hba.conf` file to allow connections to the ClickPipes user from the ClickPipes IP addresses. An example entry in the `pg_hba.conf` file would look like:
```response
@@ -115,7 +115,7 @@ If you are self serving, you need to allow connections to the ClickPipes user fr
## Increase `max_slot_wal_keep_size` {#increase-max_slot_wal_keep_size}
-This is a recommended configuration change to ensure that large transactions/commits do not cause the replication slot to be dropped.
+This is a recommended configuration change to ensure that large transactions/commits don't cause the replication slot to be dropped.
You can increase the `max_slot_wal_keep_size` parameter for your PostgreSQL instance to a higher value (at least 100GB or `102400`) by updating the `postgresql.conf` file.
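As a sketch, assuming superuser access, the same change can be applied from a SQL session instead of editing `postgresql.conf` directly (the value is in megabytes, so `102400` is 100GB):

```sql
-- Retain up to 100GB of WAL per replication slot
ALTER SYSTEM SET max_slot_wal_keep_size = 102400;
-- Reload the configuration so the change takes effect
SELECT pg_reload_conf();
```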
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md
index 741d66ca28d..fd2580f6d85 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md
@@ -34,7 +34,7 @@ Anything on or after Postgres 12
## Enable logical replication {#enable-logical-replication}
-**You don't need** to follow the below steps if the settings `cloudsql. logical_decoding` is on and `wal_sender_timeout` is 0. These settings should mostly be pre-configured if you are migrating from another data replication tool.
+**You don't need** to follow the steps below if the setting `cloudsql.logical_decoding` is on and `wal_sender_timeout` is 0. These settings should mostly be pre-configured if you're migrating from another data replication tool.
1. Click on **Edit** button on the Overview page.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/planetscale.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/planetscale.md
index 74c4ecffbcb..0fc077850a0 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/planetscale.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/planetscale.md
@@ -94,7 +94,7 @@ Connect to your PlanetScale Postgres instance using the default `postgres.<...>`
## Caveats {#caveats}
1. To connect to PlanetScale Postgres, the current branch needs to be appended to the username created above. For example, if the created user was named `clickpipes_user`, the actual user provided during the ClickPipe creation needs to be `clickpipes_user`.`branch` where `branch` refers to the "id" of the current PlanetScale Postgres [branch](https://planetscale.com/docs/postgres/branching). To quickly determine this, you can refer to the username of the `postgres` user you used to create the user earlier, the part after the period would be the branch id.
-2. Do not use the `PSBouncer` port (currently `6432`) for CDC pipes connecting to PlanetScale Postgres, the normal port `5432` must be used. Either port may be used for initial-load only pipes.
+2. Don't use the `PSBouncer` port (currently `6432`) for CDC pipes connecting to PlanetScale Postgres; the normal port `5432` must be used. Either port may be used for initial-load only pipes.
3. Please ensure you're connecting only to the primary instance, [connecting to replica instances](https://planetscale.com/docs/postgres/scaling/replicas#how-to-query-postgres-replicas) is currently not supported.
## What's next? {#whats-next}
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md
index f560884ab6c..14388d0adc6 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md
@@ -126,7 +126,7 @@ If you want to restrict traffic to your RDS instance, please add the [documented
To connect to your RDS instance through a private network, you can use AWS PrivateLink. Follow our [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection.
### Workarounds for RDS Proxy {#workarounds-for-rds-proxy}
-RDS Proxy does not support logical replication connections. If you have dynamic IP addresses in RDS and cannot use DNS name or a lambda, here are some alternatives:
+RDS Proxy doesn't support logical replication connections. If you have dynamic IP addresses in RDS and can't use a DNS name or a Lambda, here are some alternatives:
1. Using a cron job, resolve the RDS endpoint's IP periodically and update the NLB if it has changed.
2. Using RDS Event Notifications with EventBridge/SNS: Trigger updates automatically using AWS RDS event notifications
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md
index abd074a4d07..464c142cf92 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md
@@ -90,7 +90,7 @@ Head over to your Supabase Project's `Project Settings` -> `Database` (under `Co
:::info
-The connection pooler is not supported for CDC based replication, hence it needs to be disabled.
+The connection pooler isn't supported for CDC-based replication, so it needs to be disabled.
:::
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md
index 4c9ad96771f..10388573e05 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md
@@ -50,7 +50,7 @@ For other managed services, please raise a support ticket with your provider to
it isn't already.
:::info
-Timescale Cloud does not support enabling logical replication, which is needed for Postgres pipes in CDC mode.
+Timescale Cloud doesn't support enabling logical replication, which is needed for Postgres pipes in CDC mode.
As a result, users of Timescale Cloud can only perform a one-time load of their data (`Initial Load Only`) with the
Postgres ClickPipe.
:::
@@ -58,7 +58,7 @@ Postgres ClickPipe.
## Configuration {#configuration}
Timescale hypertables don't store any data inserted into them. Instead, the data is stored in multiple corresponding
-"chunk" tables which are in the `_timescaledb_internal` schema. For running queries on the hypertables, this is not an
+"chunk" tables which are in the `_timescaledb_internal` schema. For running queries on the hypertables, this isn't an
issue. But during logical replication, instead of detecting changes in the hypertable we detect them in the chunk table
instead. The Postgres ClickPipe has logic to automatically remap changes from the chunk tables to the parent hypertable,
but this requires additional steps.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/table_resync.md b/docs/integrations/data-ingestion/clickpipes/postgres/table_resync.md
index 4c78e00b3f9..79918b00ad5 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/table_resync.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/table_resync.md
@@ -23,7 +23,7 @@ This can be followed by following the [table removal guide](./removing_tables).
### 2. Truncate or drop the table on ClickHouse {#truncate-drop-table}
This step is to avoid data duplication when we add this table again in the next step. You can do this by heading over to the **SQL Console** tab in ClickHouse Cloud and running a query.
-Note that we have validation to block table addition if the table already exists in ClickHouse and is not empty.
+Note that we have validation to block table addition if the table already exists in ClickHouse and isn't empty.
Alternatively, if you need to keep the old table you can simply rename it. This is also helpful when the table is very big and the drop operation might take some time.
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/toast.md b/docs/integrations/data-ingestion/clickpipes/postgres/toast.md
index 281514a08cf..39de1716e7b 100644
--- a/docs/integrations/data-ingestion/clickpipes/postgres/toast.md
+++ b/docs/integrations/data-ingestion/clickpipes/postgres/toast.md
@@ -15,7 +15,7 @@ When replicating data from PostgreSQL to ClickHouse, it's important to understan
TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL's mechanism for handling large field values. When a row exceeds the maximum row size (typically 2KB, but this can vary depending on the PostgreSQL version and exact settings), PostgreSQL automatically moves large field values into a separate TOAST table, storing only a pointer in the main table.
-It's important to note that during Change Data Capture (CDC), unchanged TOAST columns are not included in the replication stream. This can lead to incomplete data replication if not handled properly.
+It's important to note that during Change Data Capture (CDC), unchanged TOAST columns aren't included in the replication stream. This can lead to incomplete data replication if not handled properly.
During the initial load (snapshot), all column values, including TOAST columns, will be replicated correctly regardless of their size. The limitations described in this guide primarily affect the ongoing CDC process after the initial load.
@@ -49,14 +49,14 @@ ALTER TABLE your_table_name REPLICA IDENTITY FULL;
Refer to [this blog post](https://xata.io/blog/replica-identity-full-performance) for performance considerations when setting `REPLICA IDENTITY FULL`.
-## Replication behavior when REPLICA IDENTITY FULL is not set {#replication-behavior-when-replica-identity-full-is-not-set}
+## Replication behavior when REPLICA IDENTITY FULL isn't set {#replication-behavior-when-replica-identity-full-is-not-set}
-If `REPLICA IDENTITY FULL` is not set for a table with TOAST columns, you may encounter the following issues when replicating to ClickHouse:
+If `REPLICA IDENTITY FULL` isn't set for a table with TOAST columns, you may encounter the following issues when replicating to ClickHouse:
1. For INSERT operations, all columns (including TOAST columns) will be replicated correctly.
2. For UPDATE operations:
- - If a TOAST column is not modified, its value will appear as NULL or empty in ClickHouse.
+ - If a TOAST column isn't modified, its value will appear as NULL or empty in ClickHouse.
- If a TOAST column is modified, it will be replicated correctly.
3. For DELETE operations, TOAST column values will appear as NULL or empty in ClickHouse.
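To verify whether `REPLICA IDENTITY FULL` is already set for a table, you can query the Postgres catalog (`relreplident` is `f` for `FULL`):

```sql
-- 'd' = default (primary key), 'f' = full, 'n' = nothing, 'i' = index
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'your_table_name';
```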
diff --git a/docs/integrations/data-ingestion/data-formats/binary.md b/docs/integrations/data-ingestion/data-formats/binary.md
index 3160e09845a..d5772fde780 100644
--- a/docs/integrations/data-ingestion/data-formats/binary.md
+++ b/docs/integrations/data-ingestion/data-formats/binary.md
@@ -85,7 +85,7 @@ INTO OUTFILE 'data.binary' FORMAT RowBinary
This will generate [data.binary](assets/data.binary) file in a binary rows format.
### Exploring RowBinary files {#exploring-rowbinary-files}
-Automatic schema inference is not supported for this format, so to explore before loading, we have to define schema explicitly:
+Automatic schema inference isn't supported for this format, so to explore before loading, we have to define schema explicitly:
```sql
SELECT *
diff --git a/docs/integrations/data-ingestion/data-formats/csv-tsv.md b/docs/integrations/data-ingestion/data-formats/csv-tsv.md
index 00bef9d1257..f7cfb6faacd 100644
--- a/docs/integrations/data-ingestion/data-formats/csv-tsv.md
+++ b/docs/integrations/data-ingestion/data-formats/csv-tsv.md
@@ -69,7 +69,7 @@ clickhouse-client -q "INSERT INTO sometable FORMAT CSVWithNames" < data_small_he
In this case, ClickHouse skips the first row while importing data from the file.
:::tip
-Starting from [version](https://github.com/ClickHouse/ClickHouse/releases) 23.1, ClickHouse will automatically detect headers in CSV files when using the `CSV` format, so it is not necessary to use `CSVWithNames` or `CSVWithNamesAndTypes`.
+Starting from [version](https://github.com/ClickHouse/ClickHouse/releases) 23.1, ClickHouse will automatically detect headers in CSV files when using the `CSV` format, so it isn't necessary to use `CSVWithNames` or `CSVWithNamesAndTypes`.
:::
### CSV files with custom delimiters {#csv-files-with-custom-delimiters}
diff --git a/docs/integrations/data-ingestion/data-formats/intro.md b/docs/integrations/data-ingestion/data-formats/intro.md
index cc1ce5472d0..b270a7f82bd 100644
--- a/docs/integrations/data-ingestion/data-formats/intro.md
+++ b/docs/integrations/data-ingestion/data-formats/intro.md
@@ -33,4 +33,4 @@ Handle common Apache formats such as Parquet and Arrow.
Need a SQL dump to import into MySQL or Postgresql? Look no further.
-If you are looking to connect a BI tool like Grafana, Tableau and others, check out the [Visualize category](../../data-visualization/index.md) of the docs.
+If you're looking to connect a BI tool like Grafana, Tableau and others, check out the [Visualize category](../../data-visualization/index.md) of the docs.
diff --git a/docs/integrations/data-ingestion/data-formats/json/formats.md b/docs/integrations/data-ingestion/data-formats/json/formats.md
index 7c8f0e1c7a4..87c72d9a64b 100644
--- a/docs/integrations/data-ingestion/data-formats/json/formats.md
+++ b/docs/integrations/data-ingestion/data-formats/json/formats.md
@@ -83,7 +83,7 @@ LIMIT 2;
2 rows in set. Elapsed: 0.003 sec.
```
-The `JSONAsObject` format may also be useful for reading newline-delimited JSON in cases where the structure of the objects is inconsistent. For example, if a key varies in type across rows (it may sometimes be a string, but other times an object). In such cases, ClickHouse cannot infer a stable schema using `JSONEachRow`, and `JSONAsObject` allows the data to be ingested without strict type enforcement, storing each JSON row as a whole in a single column. For example, notice how `JSONEachRow` fails on the following example:
+The `JSONAsObject` format may also be useful for reading newline-delimited JSON in cases where the structure of the objects is inconsistent. For example, if a key varies in type across rows (it may sometimes be a string, but other times an object). In such cases, ClickHouse can't infer a stable schema using `JSONEachRow`, and `JSONAsObject` allows the data to be ingested without strict type enforcement, storing each JSON row as a whole in a single column. For example, notice how `JSONEachRow` fails on the following example:
```sql
SELECT count()
diff --git a/docs/integrations/data-ingestion/data-formats/json/inference.md b/docs/integrations/data-ingestion/data-formats/json/inference.md
index 5d476b1a839..ae875f47994 100644
--- a/docs/integrations/data-ingestion/data-formats/json/inference.md
+++ b/docs/integrations/data-ingestion/data-formats/json/inference.md
@@ -10,7 +10,7 @@ ClickHouse can automatically determine the structure of JSON data. This can be u
## When to use type inference {#when-to-use-type-inference}
-* **Consistent structure** - The data from which you are going to infer types contains all the keys that you are interested in. Type inference is based on sampling the data up to a [maximum number of rows](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) or [bytes](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference). Data after the sample, with additional columns, will be ignored and can't be queried.
+* **Consistent structure** - The data from which you're going to infer types contains all the keys that you're interested in. Type inference is based on sampling the data up to a [maximum number of rows](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) or [bytes](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference). Data after the sample, with additional columns, will be ignored and can't be queried.
* **Consistent types** - Data types for specific keys need to be compatible i.e. it must be possible to coerce one type to the other automatically.
If you have more dynamic JSON, to which new keys are added and multiple types are possible for the same path, see ["Working with semi-structured and dynamic data"](/integrations/data-formats/json/inference#working-with-semi-structured-data).
@@ -90,7 +90,7 @@ SETTINGS describe_compact_output = 1
└────────────────┴─────────────────────────────────────────────────────────────────────────┘
```
:::note Avoid nulls
-You can see a lot of the columns are detected as Nullable. We [do not recommend using the Nullable](/sql-reference/data-types/nullable#storage-features) type when not absolutely needed. You can use [schema_inference_make_columns_nullable](/operations/settings/formats#schema_inference_make_columns_nullable) to control the behavior of when Nullable is applied.
+You can see a lot of the columns are detected as Nullable. We [don't recommend using the Nullable](/sql-reference/data-types/nullable#storage-features) type when not absolutely needed. You can use [schema_inference_make_columns_nullable](/operations/settings/formats#schema_inference_make_columns_nullable) to control the behavior of when Nullable is applied.
:::
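For example, to prevent columns from being inferred as Nullable altogether, the setting can be disabled during inference (a sketch; `sample.json` is a placeholder path):

```sql
DESCRIBE file('sample.json')
SETTINGS describe_compact_output = 1, schema_inference_make_columns_nullable = 0
```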
We can see that most columns have automatically been detected as `String`, with `update_date` column correctly detected as a `Date`. The `versions` column has been created as an `Array(Tuple(created String, version String))` to store a list of objects, with `authors_parsed` being defined as `Array(Array(String))` for nested arrays.
@@ -147,7 +147,7 @@ Schema inference allows us to query JSON files without needing to specify the sc
## Creating tables {#creating-tables}
-We can rely on schema inference to create the schema for a table. The following `CREATE AS EMPTY` command causes the DDL for the table to be inferred and the table to created. This does not load any data:
+We can rely on schema inference to create the schema for a table. The following `CREATE AS EMPTY` command causes the DDL for the table to be inferred and the table to be created. This doesn't load any data:
```sql
CREATE TABLE arxiv
@@ -184,7 +184,7 @@ ENGINE = MergeTree
ORDER BY update_date
```
-The above is the correct schema for this data. Schema inference is based on sampling the data and reading the data row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference) (32MB by default). In the event detection is not correct, you can provide hints as described [here](/operations/settings/formats#schema_inference_make_columns_nullable).
+The above is the correct schema for this data. Schema inference is based on sampling the data, reading it row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference) (32MB by default). If the detection isn't correct, you can provide hints as described [here](/operations/settings/formats#schema_inference_make_columns_nullable).
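As a sketch (`data.json` is a placeholder path), a hint can override an incorrectly detected column type via the [`schema_inference_hints`](/operations/settings/formats#schema_inference_hints) setting:

```sql
-- Force update_date to be inferred as Date rather than String
DESCRIBE file('data.json')
SETTINGS schema_inference_hints = 'update_date Date'
```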
### Creating tables from snippets {#creating-tables-from-snippets}
@@ -275,7 +275,7 @@ FORMAT PrettyJSONEachRow
## Handling errors {#handling-errors}
-Sometimes, you might have bad data. For example, specific columns that do not have the right type or an improperly formatted JSON object. For this, you can use the settings [`input_format_allow_errors_num`](/operations/settings/formats#input_format_allow_errors_num) and [`input_format_allow_errors_ratio`](/operations/settings/formats#input_format_allow_errors_ratio) to allow a certain number of rows to be ignored if the data is triggering insert errors. Additionally, [hints](/operations/settings/formats#schema_inference_hints) can be provided to assist inference.
+Sometimes, you might have bad data. For example, specific columns that don't have the right type or an improperly formatted JSON object. For this, you can use the settings [`input_format_allow_errors_num`](/operations/settings/formats#input_format_allow_errors_num) and [`input_format_allow_errors_ratio`](/operations/settings/formats#input_format_allow_errors_ratio) to allow a certain number of rows to be ignored if the data is triggering insert errors. Additionally, [hints](/operations/settings/formats#schema_inference_hints) can be provided to assist inference.
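For example (a sketch; `data.json` is a placeholder path), an insert can be made to tolerate a bounded number of malformed rows:

```sql
-- Skip up to 10 bad rows, or 1% of the input, whichever is hit first
INSERT INTO arxiv
SELECT * FROM file('data.json', JSONEachRow)
SETTINGS input_format_allow_errors_num = 10, input_format_allow_errors_ratio = 0.01
```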
## Working with semi-structured and dynamic data {#working-with-semi-structured-data}
@@ -331,7 +331,7 @@ SETTINGS describe_compact_output = 1
1 row in set. Elapsed: 0.005 sec.
```
-This format is also essential in cases where columns have multiple types that cannot be reconciled. For example, consider a `sample.json` file with the following newline-delimited JSON:
+This format is also essential in cases where columns have multiple types that can't be reconciled. For example, consider a `sample.json` file with the following newline-delimited JSON:
```json
{"a":1}
@@ -362,7 +362,7 @@ However, some types are incompatible. Consider the following example:
{"a":{"b":2}}
```
-In this case any form of type conversion here is not possible. A `DESCRIBE` command thus fails:
+In this case, type conversion isn't possible. A `DESCRIBE` command thus fails:
```sql
DESCRIBE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/json/conflict_sample.json')
diff --git a/docs/integrations/data-ingestion/data-formats/json/loading.md b/docs/integrations/data-ingestion/data-formats/json/loading.md
index 7986925f96f..383bdbb67ef 100644
--- a/docs/integrations/data-ingestion/data-formats/json/loading.md
+++ b/docs/integrations/data-ingestion/data-formats/json/loading.md
@@ -39,7 +39,7 @@ In this simple case, our structure is static, our column names are known, and th
Whereas ClickHouse supports semi-structured data through a JSON type, where key names and their types can be dynamic, this is unnecessary here.
:::note Prefer static schemas where possible
-In cases where your columns have fixed names and types, and new columns are not expected, always prefer a statically defined schema in production.
+In cases where your columns have fixed names and types, and new columns aren't expected, always prefer a statically defined schema in production.
The JSON type is preferred for highly dynamic data, where the names and types of columns are subject to change. This type is also useful in prototyping and data exploration.
:::
@@ -78,7 +78,7 @@ LIMIT 1
1 row in set. Elapsed: 1.232 sec.
```
-Note how we are not required to specify the file format. Instead, we use a glob pattern to read all `*.json.gz` files in the bucket. ClickHouse automatically infers the format is `JSONEachRow` (ndjson) from the file extension and contents. A format can be manually specified through parameter functions in case ClickHouse is unable to detect it.
+Note how we're not required to specify the file format. Instead, we use a glob pattern to read all `*.json.gz` files in the bucket. ClickHouse automatically infers that the format is `JSONEachRow` (ndjson) from the file extension and contents. A format can be manually specified through parameter functions in case ClickHouse is unable to detect it.
```sql
SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/pypi/json/*.json.gz', JSONEachRow)
diff --git a/docs/integrations/data-ingestion/data-formats/json/other.md b/docs/integrations/data-ingestion/data-formats/json/other.md
index d76ade3ce82..ba201c8acf5 100644
--- a/docs/integrations/data-ingestion/data-formats/json/other.md
+++ b/docs/integrations/data-ingestion/data-formats/json/other.md
@@ -18,9 +18,9 @@ Different techniques may be applied to different objects in the same schema. For
If the objects are highly dynamic, with no predictable structure and contain arbitrary nested objects, you should use the `String` type. Values can be extracted at query time using JSON functions as we show below.
-Handling data using the structured approach described above is often not viable for those users with dynamic JSON, which is either subject to change or for which the schema is not well understood. For absolute flexibility, you can simply store JSON as `String`s before using functions to extract fields as required. This represents the extreme opposite of handling JSON as a structured object. This flexibility incurs costs with significant disadvantages - primarily an increase in query syntax complexity as well as degraded performance.
+Handling data using the structured approach described above is often not viable for those users with dynamic JSON, which is either subject to change or for which the schema isn't well understood. For absolute flexibility, you can simply store JSON as `String`s before using functions to extract fields as required. This represents the extreme opposite of handling JSON as a structured object. This flexibility comes at a cost, with significant disadvantages: primarily increased query syntax complexity and degraded performance.
-As noted earlier, for the [original person object](/integrations/data-formats/json/schema#static-vs-dynamic-json), we cannot ensure the structure of the `tags` column. We insert the original row (including `company.labels`, which we ignore for now), declaring the `Tags` column as a `String`:
+As noted earlier, for the [original person object](/integrations/data-formats/json/schema#static-vs-dynamic-json), we can't ensure the structure of the `tags` column. We insert the original row (including `company.labels`, which we ignore for now), declaring the `Tags` column as a `String`:
```sql
CREATE TABLE people
@@ -160,7 +160,7 @@ A faster and more strict set of functions are available. These `simpleJSON*` fun
- Field names must be constants
- Consistent encoding of field names e.g. `simpleJSONHas('{"abc":"def"}', 'abc') = 1`, but `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
- The field names are unique across all nested structures. No differentiation is made between nesting levels and matching is indiscriminate. In the event of multiple matching fields, the first occurrence is used.
-- No special characters outside of string literals. This includes spaces. The following is invalid and will not parse.
+- No special characters outside of string literals. This includes spaces. The following is invalid and won't parse.
```json
{"@timestamp": 893964617, "clientip": "40.135.0.0", "request": {"method": "GET",
diff --git a/docs/integrations/data-ingestion/data-formats/json/schema.md b/docs/integrations/data-ingestion/data-formats/json/schema.md
index 7b7d5b8eb8d..de0a87b1380 100644
--- a/docs/integrations/data-ingestion/data-formats/json/schema.md
+++ b/docs/integrations/data-ingestion/data-formats/json/schema.md
@@ -21,7 +21,7 @@ While [schema inference](/integrations/data-formats/json/inference) can be used
The principal task on defining a schema for JSON is to determine the appropriate type for each key's value. We recommended users apply the following rules recursively on each key in the JSON hierarchy to determine the appropriate type for each key.
1. **Primitive types** - If the key's value is a primitive type, irrespective of whether it is part of a sub-object or on the root, ensure you select its type according to general schema [design best practices](/data-modeling/schema-design) and [type optimization rules](/data-modeling/schema-design#optimizing-types). Arrays of primitives, such as `phone_numbers` below, can be modeled as `Array()` e.g., `Array(String)`.
-2. **Static vs dynamic** - If the key's value is a complex object i.e. either an object or an array of objects, establish whether it is subject to change. Objects that rarely have new keys, where the addition of a new key can be predicted and handled with a schema change via [`ALTER TABLE ADD COLUMN`](/sql-reference/statements/alter/column#add-column), can be considered **static**. This includes objects where only a subset of the keys may be provided on some JSON documents. Objects where new keys are added frequently and/or are not predictable should be considered **dynamic**. **The exception here is structures with hundreds or thousands of sub keys which can be considered dynamic for convenience purposes**.
+2. **Static vs dynamic** - If the key's value is a complex object i.e. either an object or an array of objects, establish whether it is subject to change. Objects that rarely have new keys, where the addition of a new key can be predicted and handled with a schema change via [`ALTER TABLE ADD COLUMN`](/sql-reference/statements/alter/column#add-column), can be considered **static**. This includes objects where only a subset of the keys may be provided on some JSON documents. Objects where new keys are added frequently and/or aren't predictable should be considered **dynamic**. **The exception here is structures with hundreds or thousands of sub keys which can be considered dynamic for convenience purposes**.
To establish whether a value is **static** or **dynamic**, see the relevant sections [**Handling static objects**](/integrations/data-formats/json/schema#handling-static-structures) and [**Handling dynamic objects**](/integrations/data-formats/json/schema#handling-semi-structured-dynamic-structures) below.
@@ -82,12 +82,12 @@ To illustrate these rules, we use the following JSON example representing a pers
Applying these rules:
- The root keys `name`, `username`, `email`, `website` can be represented as type `String`. The column `phone_numbers` is an Array primitive of type `Array(String)`, with `dob` and `id` type `Date` and `UInt32` respectively.
-- New keys will not be added to the `address` object (only new address objects), and it can thus be considered **static**. If we recurse, all of the sub-columns can be considered primitives (and type `String`) except `geo`. This is also a static structure with two `Float32` columns, `lat` and `lon`.
+- New keys won't be added to the `address` object (only new address objects), and it can thus be considered **static**. If we recurse, all of the sub-columns can be considered primitives (and type `String`) except `geo`. This is also a static structure with two `Float32` columns, `lat` and `lon`.
- The `tags` column is **dynamic**. We assume new arbitrary tags can be added to this object of any type and structure.
- The `company` object is **static** and will always contain at most the 3 keys specified. The subkeys `name` and `catchPhrase` are of type `String`. The key `labels` is **dynamic**. We assume new arbitrary tags can be added to this object. Values will always be key-value pairs of type string.
:::note
-Structures with hundreds or thousands of static keys can be considered dynamic, as it is rarely realistic to statically declare the columns for these. However, where possible [skip paths](#using-type-hints-and-skipping-paths) which are not needed to save both storage and inference overhead.
+Structures with hundreds or thousands of static keys can be considered dynamic, as it is rarely realistic to statically declare the columns for these. However, where possible, [skip paths](#using-type-hints-and-skipping-paths) that aren't needed, to save both storage and inference overhead.
:::
## Handling static structures {#handling-static-structures}
@@ -202,7 +202,7 @@ ORDER BY company.name
### Handling default values {#handling-default-values}
-Even if JSON objects are structured, they are often sparse with only a subset of the known keys provided. Fortunately, the `Tuple` type does not require all columns in the JSON payload. If not provided, default values will be used.
+Even if JSON objects are structured, they're often sparse with only a subset of the known keys provided. Fortunately, the `Tuple` type doesn't require all columns in the JSON payload. If not provided, default values will be used.
Consider our earlier `people` table and the following sparse JSON, missing the keys `suite`, `geo`, `phone_numbers`, and `catchPhrase`.
@@ -282,7 +282,7 @@ If you need to differentiate between a value being empty and not provided, the [
While a structured approach is simplest when the JSON keys are static, this approach can still be used if the changes to the schema can be planned, i.e., new keys are known in advance, and the schema can be modified accordingly.
-Note that ClickHouse will, by default, ignore JSON keys that are provided in the payload and are not present in the schema. Consider the following modified JSON payload with the addition of a `nickname` key:
+Note that ClickHouse will, by default, ignore JSON keys that are provided in the payload and aren't present in the schema. Consider the following modified JSON payload with the addition of a `nickname` key:
```json
{
@@ -327,7 +327,7 @@ Ok.
1 row in set. Elapsed: 0.002 sec.
```
-Columns can be added to a schema using the [`ALTER TABLE ADD COLUMN`](/sql-reference/statements/alter/column#add-column) command. A default can be specified via the `DEFAULT` clause, which will be used if it is not specified during the subsequent inserts. Rows for which this value is not present (as they were inserted prior to its creation) will also return this default value. If no `DEFAULT` value is specified, the default value for the type will be used.
+Columns can be added to a schema using the [`ALTER TABLE ADD COLUMN`](/sql-reference/statements/alter/column#add-column) command. A default can be specified via the `DEFAULT` clause, which will be used if it isn't specified during the subsequent inserts. Rows for which this value isn't present (as they were inserted prior to its creation) will also return this default value. If no `DEFAULT` value is specified, the default value for the type will be used.
For example:
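As a minimal sketch (using the `nickname` key from the modified payload above; the default value here is illustrative):

```sql
-- Illustrative: add the new key as a column. Rows inserted before this
-- ALTER, and inserts that omit the column, return the DEFAULT value.
ALTER TABLE people ADD COLUMN nickname String DEFAULT 'no_nickname';
```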
@@ -475,7 +475,7 @@ Given the dynamic nature of the `company.labels` column between objects, with re
- **Single JSON column** - represents the entire schema as a single `JSON` column, allowing all structures to be dynamic beneath this.
- **Targeted JSON column** - only use the `JSON` type for the `company.labels` column, retaining the structured schema used above for all other columns.
-While the first approach [does not align with previous methodology](#static-vs-dynamic-json), a single JSON column approach is useful for prototyping and data engineering tasks.
+While the first approach [doesn't align with previous methodology](#static-vs-dynamic-json), a single JSON column approach is useful for prototyping and data engineering tasks.
For production deployments of ClickHouse at scale, we recommend being specific with structure and using the JSON type for targeted dynamic sub-structures where possible.
@@ -490,7 +490,7 @@ A strict schema has a number of benefits:
This approach is useful for prototyping and data engineering tasks. For production, try to use `JSON` only for dynamic sub-structures where necessary.
:::note Performance considerations
-A single JSON column can be optimized by skipping (not storing) JSON paths that are not required and by using [type hints](#using-type-hints-and-skipping-paths). Type hints allow the user to explicitly define the type for a sub-column, thereby skipping inference and indirection processing at query time. This can be used to deliver the same performance as if an explicit schema was used. See ["Using type hints and skipping paths"](#using-type-hints-and-skipping-paths) for further details.
+A single JSON column can be optimized by skipping (not storing) JSON paths that aren't required and by using [type hints](#using-type-hints-and-skipping-paths). Type hints allow the user to explicitly define the type for a sub-column, thereby skipping inference and indirection processing at query time. This can be used to deliver the same performance as if an explicit schema was used. See ["Using type hints and skipping paths"](#using-type-hints-and-skipping-paths) for further details.
:::
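As a hedged illustration of the note above, a single-JSON-column table might declare a type hint and skip an unneeded path (the table name and paths here are assumptions, not the guide's actual schema):

```sql
-- Sketch only: the type hint for company.labels avoids inference and
-- indirection at query time; SKIP drops a path that isn't needed from
-- storage entirely.
CREATE TABLE people_json
(
    json JSON(company.labels Map(String, String), SKIP address.suite)
)
ENGINE = MergeTree
ORDER BY tuple();
```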
The schema for a single JSON column here is simple:
@@ -935,7 +935,7 @@ Additionally, by leveraging offsets, ClickHouse ensures that these sub-columns r
However, in scenarios with high-cardinality or highly variable JSON structures—such as telemetry pipelines, logs, or machine-learning feature stores - this behavior can lead to an explosion of column files. Each new unique JSON path results in a new column file, and each type variant under that path results in an additional column file. While this is optimal for read performance, it introduces operational challenges: file descriptor exhaustion, increased memory usage, and slower merges due to a high number of small files.
-To mitigate this, ClickHouse introduces the concept of an overflow subcolumn: once the number of distinct JSON paths exceeds a threshold, additional paths are stored in a single shared file using a compact encoded format. This file is still queryable but does not benefit from the same performance characteristics as dedicated subcolumns.
+To mitigate this, ClickHouse introduces the concept of an overflow subcolumn: once the number of distinct JSON paths exceeds a threshold, additional paths are stored in a single shared file using a compact encoded format. This file is still queryable but doesn't benefit from the same performance characteristics as dedicated subcolumns.
diff --git a/docs/integrations/data-ingestion/data-formats/templates-regex.md b/docs/integrations/data-ingestion/data-formats/templates-regex.md
index 4fb8bc12a32..b60c8539717 100644
--- a/docs/integrations/data-ingestion/data-formats/templates-regex.md
+++ b/docs/integrations/data-ingestion/data-formats/templates-regex.md
@@ -52,7 +52,7 @@ ${time:Escaped} [error] client: ${ip:CSV}, server: ${host:CSV} ${request:JSON}
We define a name of a column and escaping rule in a `${name:escaping}` format. Multiple options are available here, like CSV, JSON, Escaped, or Quoted, which implement [respective escaping rules](/interfaces/formats/Template).
-Now we can use the given file as an argument to the `format_template_row` settings option while importing data (*note, that template and data files **should not have** an extra `\n` symbol at the end of file*):
+Now we can use the given file as an argument to the `format_template_row` settings option while importing data (*note that template and data files **shouldn't have** an extra `\n` symbol at the end of the file*):
```sql
INSERT INTO error_log FROM INFILE 'error.log'
diff --git a/docs/integrations/data-ingestion/dbms/dynamodb/index.md b/docs/integrations/data-ingestion/dbms/dynamodb/index.md
index f292f60783d..fd356786d56 100644
--- a/docs/integrations/data-ingestion/dbms/dynamodb/index.md
+++ b/docs/integrations/data-ingestion/dbms/dynamodb/index.md
@@ -114,7 +114,7 @@ https://{bucket}.s3.amazonaws.com/{prefix}/AWSDynamoDB/{export-id}/data/*
- **Format**: JSONEachRow
- **Table**: Your snapshot table (e.g. `default.snapshot` in example above)
-Once created, data will begin populating in the snapshot and destination tables. You do not need to wait for the snapshot load to finish before moving on to the next step.
+Once created, data will begin populating in the snapshot and destination tables. You don't need to wait for the snapshot load to finish before moving on to the next step.
## 4. Create the Kinesis ClickPipe {#4-create-the-kinesis-clickpipe}
diff --git a/docs/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md b/docs/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md
index e601fa3b9bf..a06b4ce27f3 100644
--- a/docs/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md
+++ b/docs/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md
@@ -57,7 +57,7 @@ cd ~/clickhouse-jdbc-bridge
wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.0.7/clickhouse-jdbc-bridge-2.0.7-shaded.jar
```
-In order to be able to connect to MySQL we are creating a named data source:
+To connect to MySQL, we're creating a named data source:
```bash
cd ~/clickhouse-jdbc-bridge
@@ -82,7 +82,7 @@ In order to be able to connect to MySQL we are creating a named data source:
:::note
in the config file above
-- you are free to use any name you like for the datasource, we used `mysql8`
+- you're free to use any name you like for the datasource; we used `mysql8`
- in the value for the `jdbcUrl` you need to replace ``, and `` with appropriate values according to your running MySQL instance, e.g. `"jdbc:mysql://localhost:3306"`
- you need to replace `` and `` with your MySQL credentials, if you don't use a password, you can delete the `"password": ""` line in the config file above
- in the value for `driverUrls` we just specified a URL from which the current version of the MySQL JDBC driver can be downloaded. That's all we have to do, and the ClickHouse JDBC Bridge will automatically download that JDBC driver (into an OS-specific directory).
@@ -90,7 +90,7 @@ in the config file above
-Now we are ready to start the ClickHouse JDBC Bridge:
+Now we're ready to start the ClickHouse JDBC Bridge:
```bash
cd ~/clickhouse-jdbc-bridge
java -jar clickhouse-jdbc-bridge-2.0.7-shaded.jar
@@ -111,7 +111,7 @@ The easiest way to execute the following examples is to copy and paste them into
SELECT * FROM jdbc('mysql8', 'mydatabase', 'mytable');
```
:::note
-As the first parameter for the jdbc table function we are using the name of the named data source that we configured above.
+As the first parameter for the jdbc table function we're using the name of the named data source that we configured above.
:::
- JDBC Table Engine:
@@ -125,7 +125,7 @@ As the first parameter for the jdbc table function we are using the name of the
SELECT * FROM mytable;
```
:::note
- As the first parameter for the jdbc engine clause we are using the name of the named data source that we configured above
+ As the first parameter for the jdbc engine clause we're using the name of the named data source that we configured above
The schema of the ClickHouse JDBC engine table and schema of the connected MySQL table must be aligned, e.g. the column names and order must be the same, and the column data types must be compatible
:::
@@ -166,7 +166,7 @@ jdbc_bridge:
:::note
- you need to replace `JDBC-Bridge-Host` with the hostname or ip address of the dedicated ClickHouse JDBC Bridge host
-- we specified the default ClickHouse JDBC Bridge port `9019`, if you are using a different port for the JDBC Bridge then you must adapt the configuration above accordingly
+- we specified the default ClickHouse JDBC Bridge port `9019`; if you're using a different port for the JDBC Bridge, then you must adapt the configuration above accordingly
:::
[//]: # (## 4. Additional Info)
diff --git a/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md b/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md
index 7dda71918d9..5bcd4aca629 100644
--- a/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md
+++ b/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md
@@ -76,7 +76,7 @@ This article is to illustrate basic methods of integration using one table.
```
:::note
-If you are using this feature in ClickHouse Cloud, you may need the to allow the ClickHouse Cloud IP addresses to access your PostgreSQL instance.
+If you're using this feature in ClickHouse Cloud, you may need to allow the ClickHouse Cloud IP addresses to access your PostgreSQL instance.
Check the ClickHouse [Cloud Endpoints API](/cloud/get-started/query-endpoints) for egress traffic details.
:::
@@ -338,7 +338,7 @@ Query id: b0729816-3917-44d3-8d1a-fed912fb59ce
```
### 4. Summary {#4-summary}
-This integration guide focused on a simple example on how to replicate a database with a table, however, there exist more advanced options which include replicating the whole database or adding new tables and schemas to the existing replications. Although DDL commands are not supported for this replication, the engine can be set to detect changes and reload the tables when there are structural changes made.
+This integration guide focused on a simple example of how to replicate a database with a single table; however, more advanced options exist, including replicating the whole database or adding new tables and schemas to the existing replications. Although DDL commands aren't supported for this replication, the engine can be set to detect changes and reload the tables when structural changes are made.
:::info
For more features available for advanced options, please see the [reference documentation](/engines/database-engines/materialized-postgresql).
diff --git a/docs/integrations/data-ingestion/emqx/index.md b/docs/integrations/data-ingestion/emqx/index.md
index f2145768b1c..551dff90fcd 100644
--- a/docs/integrations/data-ingestion/emqx/index.md
+++ b/docs/integrations/data-ingestion/emqx/index.md
@@ -53,10 +53,10 @@ With the infrastructure provided by cloud providers, EMQX Cloud serves dozens of
### Assumptions {#assumptions}
-* You are familiar with the [MQTT protocol](https://mqtt.org/), which is designed as an extremely lightweight publish/subscribe messaging transport protocol.
-* You are using EMQX or EMQX Cloud for real-time message processing engine, powering event streaming for IoT devices at massive scale.
+* You're familiar with the [MQTT protocol](https://mqtt.org/), which is designed as an extremely lightweight publish/subscribe messaging transport protocol.
+* You're using EMQX or EMQX Cloud as a real-time message processing engine, powering event streaming for IoT devices at massive scale.
* You have prepared a Clickhouse Cloud instance to persist device data.
-* We are using [MQTT X](https://mqttx.app/) as an MQTT client testing tool to connect the deployment of EMQX Cloud to publish MQTT data. Or other methods connecting to the MQTT broker will do the job as well.
+* We're using [MQTT X](https://mqttx.app/) as an MQTT client testing tool to connect to the EMQX Cloud deployment and publish MQTT data; other methods of connecting to the MQTT broker will do the job as well.
## Get your ClickHouse Cloud service {#get-your-clickhouse-cloudservice}
@@ -105,7 +105,7 @@ Creating a dedicated MQTT broker on EMQX Cloud is as easy as a few clicks.
EMQX Cloud provides a 14-day free trial for both standard deployment and professional deployment for every account.
-Start at the [EMQX Cloud sign up](https://accounts.emqx.com/signup?continue=https%3A%2F%2Fwww.emqx.com%2Fen%2Fcloud) page and click start free to register an account if you are new to EMQX Cloud.
+Start at the [EMQX Cloud sign up](https://accounts.emqx.com/signup?continue=https%3A%2F%2Fwww.emqx.com%2Fen%2Fcloud) page and click 'Start Free' to register an account if you're new to EMQX Cloud.
@@ -127,7 +127,7 @@ Now click the panel to go to the cluster view. On this dashboard, you will see t
### Add client credential {#add-client-credential}
-EMQX Cloud does not allow anonymous connections by default,so you need add a client credential so you can use the MQTT client tool to send data to this broker.
+EMQX Cloud doesn't allow anonymous connections by default, so you need to add a client credential in order to use the MQTT client tool to send data to this broker.
Click 'Authentication & ACL' on the left menu and click 'Authentication' in the submenu. Click the 'Add' button on the right and give a username and password for the MQTT connection later. Here we will use `emqx` and `xxxxxx` for the username and password.
diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md
index d2b47731c6d..1d0a86993af 100644
--- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md
+++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md
@@ -63,14 +63,14 @@ your_profile_name:
```
### Schema vs Database {#schema-vs-database}
-The dbt model relation identifier `database.schema.table` is not compatible with Clickhouse because Clickhouse does not
+The dbt model relation identifier `database.schema.table` isn't compatible with ClickHouse because ClickHouse doesn't
support a `schema`.
So we use a simplified approach `schema.table`, where `schema` is the Clickhouse database. Using the `default` database
-is not recommended.
+isn't recommended.
### SET Statement Warning {#set-statement-warning}
-In many environments, using the SET statement to persist a ClickHouse setting across all DBT queries is not reliable
+In many environments, using the SET statement to persist a ClickHouse setting across all DBT queries isn't reliable
and can cause unexpected failures. This is particularly true when using HTTP connections through a load balancer that
distributes queries across multiple nodes (such as ClickHouse cloud), although in some circumstances this can also
happen with native ClickHouse connections. Accordingly, we recommend configuring any required ClickHouse settings in the
@@ -90,7 +90,7 @@ seeds:
When using a ClickHouse cluster, you need to consider two things:
- Setting the `cluster` setting.
-- Ensuring read-after-write consistency, especially if you are using more than one `threads`.
+- Ensuring read-after-write consistency, especially if you're using more than one `threads`.
#### Cluster Setting {#cluster-setting}
@@ -101,7 +101,7 @@ The `cluster` setting in profile enables dbt-clickhouse to run against a ClickHo
- Table and incremental materializations
- Distributed materializations
-Replicated engines will **not** include the `ON CLUSTER` clause, as they are designed to manage replication internally.
+Replicated engines will **not** include the `ON CLUSTER` clause, as they're designed to manage replication internally.
To **opt out** of cluster-based creation for a specific model, add the `disable_on_cluster` config:
@@ -115,7 +115,7 @@ To **opt out** of cluster-based creation for a specific model, add the `disable_
```
-table and incremental materializations with non-replicated engine will not be affected by `cluster` setting (model would
+Table and incremental materializations with a non-replicated engine won't be affected by the `cluster` setting (the model would
be created on the connected node only).
**Compatibility**
@@ -125,9 +125,9 @@ without `on cluster` clause for this model.
#### Read-after-write Consistency {#read-after-write-consistency}
-dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but there are some strategies depending on your cluster to have this guarantee in place:
-- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property. You can find more information about this setting [here](/operations/settings/settings#select_sequential_consistency).
-- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer on top of it, try using some `replica aware routing`/`sticky sessions` mechanism to be able to always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](/operations/settings/settings#select_sequential_consistency).
+dbt relies on a read-after-insert consistency model. This isn't compatible with ClickHouse clusters that have more than one replica if you can't guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but there are some strategies depending on your cluster to have this guarantee in place:
+- If you're using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property. You can find more information about this setting [here](/operations/settings/settings#select_sequential_consistency).
+- If you're using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer on top of it, try using some `replica aware routing`/`sticky sessions` mechanism to be able to always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](/operations/settings/settings#select_sequential_consistency).
## General information about features {#general-information-about-features}
@@ -389,7 +389,7 @@ caveats to using this strategy:
rare cases using non-deterministic incremental_predicates
this could result in a race condition for the updated/deleted items (and related log messages in the ClickHouse logs).
To ensure consistent results the
- incremental predicates should only include sub-queries on data that will not be modified during the incremental
+ incremental predicates should only include sub-queries on data that won't be modified during the incremental
materialization.
##### The Microbatch Strategy (Requires dbt-core >= 1.9) {#microbatch-strategy}
@@ -420,14 +420,14 @@ For detailed microbatch usage, refer to the [official documentation](https://doc
This strategy replaces the `inserts_only` setting in previous versions of dbt-clickhouse. This approach simply appends
new rows to the existing relation.
-As a result duplicate rows are not eliminated, and there is no temporary or intermediate table. It is the fastest
+As a result, duplicate rows aren't eliminated, and there is no temporary or intermediate table. It is the fastest
approach if duplicates are either permitted
in the data or excluded by the incremental query WHERE clause/filter.
##### The insert_overwrite Strategy (Experimental) {#insert-overwrite-strategy}
> [!IMPORTANT]
-> Currently, the insert_overwrite strategy is not fully functional with distributed materializations.
+> Currently, the insert_overwrite strategy isn't fully functional with distributed materializations.
Performs the following steps:
@@ -440,7 +440,7 @@ This approach has the following advantages:
- It is faster than the default strategy because it doesn't copy the entire table.
- It is safer than other strategies because it doesn't modify the original table until the INSERT operation completes
- successfully: in case of intermediate failure, the original table is not modified.
+ successfully: in case of intermediate failure, the original table isn't modified.
- It implements "partitions immutability" data engineering best practice. Which simplifies incremental and parallel data
processing, rollbacks, etc.
@@ -479,14 +479,14 @@ select a,b,c from {{ source('raw', 'table_2') }}
> IMPORTANT!
>
> When updating a model with multiple materialized views (MVs), especially when renaming one of the MV names,
-> dbt-clickhouse does not automatically drop the old MV. Instead,
+> dbt-clickhouse doesn't automatically drop the old MV. Instead,
> you will encounter the following warning:
`Warning - Table was detected with the same pattern as model name but was not found in this run. In case it is a renamed mv that was previously part of this model, drop it manually (!!!) `
#### How to iterate the target table schema {#how-to-iterate-the-target-table-schema}
Starting with dbt-clickhouse version 1.9.8, you can control how the target table schema is iterated when `dbt run` encounters different columns in the MV's SQL.
-By default, dbt will not apply any changes to the target table (`ignore` setting value), but you can change this setting to follow the same behavior as the `on_schema_change` config [in incremental models](https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change).
+By default, dbt won't apply any changes to the target table (`ignore` setting value), but you can change this setting to follow the same behavior as the `on_schema_change` config [in incremental models](https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change).
Also, you can use this setting as a safety mechanism. If you set it to `fail`, the build will fail if the columns in the MV's SQL differ from the target table that was created by the first `dbt run`.
@@ -532,7 +532,7 @@ refreshable config object):
|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------|
| refresh_interval | The interval clause (required) | Yes | |
| randomize | The randomization clause, will appear after `RANDOMIZE FOR` | | |
-| append | If set to `True`, each refresh inserts rows into the table without deleting existing rows. The insert is not atomic, just like a regular INSERT SELECT. | | False |
+| append | If set to `True`, each refresh inserts rows into the table without deleting existing rows. The insert isn't atomic, just like a regular INSERT SELECT. | | False |
| depends_on | A dependencies list for the refreshable mv. Please provide the dependencies in the following format `{schema}.{view_name}` | | |
| depends_on_validation | Whether to validate the existence of the dependencies provided in `depends_on`. In case a dependency doesn't contain a schema, the validation occurs on schema `default` | | False |
@@ -555,15 +555,15 @@ A config example for refreshable materialized view:
#### Limitations {#limitations}
-* When creating a refreshable materialized view (MV) in ClickHouse that has a dependency, ClickHouse does not throw an
- error if the specified dependency does not exist at the time of creation. Instead, the refreshable MV remains in an
+* When creating a refreshable materialized view (MV) in ClickHouse that has a dependency, ClickHouse doesn't throw an
+ error if the specified dependency doesn't exist at the time of creation. Instead, the refreshable MV remains in an
inactive state, waiting for the dependency to be satisfied before it starts processing updates or refreshing.
- This behavior is by design, but it may lead to delays in data availability if the required dependency is not addressed
+ This behavior is by design, but it may lead to delays in data availability if the required dependency isn't addressed
promptly. Users are advised to ensure all dependencies are correctly defined and exist before creating a refreshable
materialized view.
-* As of today, there is no actual "dbt linkage" between the mv and its dependencies, therefore the creation order is not
+* As of today, there is no actual "dbt linkage" between the mv and its dependencies; therefore, the creation order isn't
guaranteed.
-* The refreshable feature was not tested with multiple mvs directing to the same target model.
+* The refreshable feature wasn't tested with multiple MVs pointing to the same target model.
### Materialization: dictionary (experimental) {#materialization-dictionary}
@@ -638,7 +638,7 @@ strategies correctly.
2. _The Delete+Insert_ Strategy creates distributed temp table to work with all data on every shard.
3. _The Default (Legacy) Strategy_ creates distributed temp and intermediate tables for the same reason.
-Only shard tables are replacing, because distributed table does not keep data.
+Only shard tables are replaced, because the distributed table doesn't keep data.
The distributed table reloads only when the full_refresh mode is enabled or the table structure may have changed.
#### Distributed incremental model example {#distributed-incremental-model-example}
@@ -700,7 +700,7 @@ For more information on configuration, check out the [snapshot configs](https://
Only exact column type contracts are supported. For example, a contract with a UInt32 column type will fail if the model
returns a UInt64 or other integer type.
ClickHouse also support _only_ `CHECK` constraints on the entire table/model. Primary key, foreign key, unique, and
-column level CHECK constraints are not supported.
+column-level CHECK constraints aren't supported.
(See ClickHouse documentation on primary/order by keys.)
### Additional ClickHouse Macros {#additional-clickhouse-macros}
@@ -761,7 +761,7 @@ dbt-clickhouse supports most of the cross database macros now included in `dbt C
### dbt Catalog Integration Status {#dbt-catalog-integration-status}
-dbt Core v1.10 introduced catalog integration support, which allows adapters to materialize models into external catalogs that manage open table formats like Apache Iceberg. **This feature is not yet natively implemented in dbt-clickhouse.** You can track the progress of this feature implementation in [GitHub issue #489](https://github.com/ClickHouse/dbt-clickhouse/issues/489).
+dbt Core v1.10 introduced catalog integration support, which allows adapters to materialize models into external catalogs that manage open table formats like Apache Iceberg. **This feature isn't yet natively implemented in dbt-clickhouse.** You can track the progress of this feature implementation in [GitHub issue #489](https://github.com/ClickHouse/dbt-clickhouse/issues/489).
### ClickHouse Catalog Support {#clickhouse-catalog-support}
@@ -822,5 +822,5 @@ The good things about these workarounds are:
But there are currently some limitations:
* **Manual setup:** Iceberg tables and catalog databases must be created manually in ClickHouse before they can be referenced in dbt.
-* **No catalog-level DDL:** dbt cannot manage catalog-level operations like creating or dropping Iceberg tables in external catalogs. So you will not be able to create them right now from the dbt connector. Creating tables with the Iceberg() engines may be added in the future.
+* **No catalog-level DDL:** dbt can't manage catalog-level operations like creating or dropping Iceberg tables in external catalogs, so you won't be able to create them from the dbt connector right now. Support for creating tables with the Iceberg() engine may be added in the future.
* **Write operations:** Currently, writing into Iceberg/Data Catalog tables is limited. Check the ClickHouse documentation to understand which options are available.
diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/guides.md b/docs/integrations/data-ingestion/etl-tools/dbt/guides.md
index f1d80cba68b..93879f9813e 100644
--- a/docs/integrations/data-ingestion/etl-tools/dbt/guides.md
+++ b/docs/integrations/data-ingestion/etl-tools/dbt/guides.md
@@ -404,7 +404,7 @@ When using the view materialization, a model is rebuilt as a view on each run, v
## Creating a table materialization {#creating-a-table-materialization}
-In the previous example, our model was materialized as a view. While this might offer sufficient performance for some queries, more complex SELECTs or frequently executed queries may be better materialized as a table. This materialization is useful for models that will be queried by BI tools to ensure users have a faster experience. This effectively causes the query results to be stored as a new table, with the associated storage overheads - effectively, an `INSERT TO SELECT` is executed. Note that this table will be reconstructed each time i.e., it is not incremental. Large result sets may therefore result in long execution times - see [dbt Limitations](/integrations/dbt#limitations).
+In the previous example, our model was materialized as a view. While this might offer sufficient performance for some queries, more complex SELECTs or frequently executed queries may be better materialized as a table. This materialization is useful for models that will be queried by BI tools to ensure users have a faster experience. This effectively causes the query results to be stored as a new table, with the associated storage overheads - effectively, an `INSERT INTO ... SELECT` is executed. Note that this table will be reconstructed each time, i.e., it isn't incremental. Large result sets may therefore result in long execution times - see [dbt Limitations](/integrations/dbt#limitations).
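A rough sketch of what a table materialization amounts to under the hood (all names below are illustrative, not the adapter's actual generated statements):

```sql
-- Illustrative only: a table materialization is roughly a
-- create-and-populate of a new table from the model's SELECT.
CREATE TABLE actor_summary_example
ENGINE = MergeTree
ORDER BY actor_id
AS SELECT actor_id, count() AS num_roles
FROM roles_example
GROUP BY actor_id;
```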
1. Modify the file `actors_summary.sql` such that the `materialized` parameter is set to `table`. Notice how `ORDER BY` is defined, and notice we use the `MergeTree` table engine:
@@ -667,7 +667,7 @@ AND event_time > subtractMinutes(now(), 15) ORDER BY event_time LIMIT 100;
Adjust the above query to the period of execution. We leave result inspection to the user but highlight the general strategy used by the adapter to perform incremental updates:
1. The adapter creates a temporary table `actor_sumary__dbt_tmp`. Rows that have changed are streamed into this table.
-2. A new table, `actor_summary_new,` is created. The rows from the old table are, in turn, streamed from the old to new, with a check to make sure row ids do not exist in the temporary table. This effectively handles updates and duplicates.
+2. A new table, `actor_summary_new`, is created. The rows from the old table are, in turn, streamed from the old to the new, with a check to make sure row IDs don't exist in the temporary table. This effectively handles updates and duplicates.
3. The results from the temporary table are streamed into the new `actor_summary` table:
4. Finally, the new table is exchanged atomically with the old version via an `EXCHANGE TABLES` statement. The old and temporary tables are in turn dropped.
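The strategy above can be sketched in plain ClickHouse SQL (table names are illustrative; the adapter generates its own identifiers):

```sql
-- 2. copy unchanged rows from the old table into the new one
INSERT INTO actor_summary_new
SELECT * FROM actor_summary
WHERE id NOT IN (SELECT id FROM actor_summary__dbt_tmp);

-- 3. stream the changed rows in from the temporary table
INSERT INTO actor_summary_new SELECT * FROM actor_summary__dbt_tmp;

-- 4. atomically swap, then clean up
EXCHANGE TABLES actor_summary_new AND actor_summary;
DROP TABLE actor_summary_new;        -- now holds the superseded data
DROP TABLE actor_summary__dbt_tmp;
```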
@@ -814,7 +814,7 @@ Performs the following steps:
This approach has the following advantages:
* It is faster than the default strategy because it doesn't copy the entire table.
-* It is safer than other strategies because it doesn't modify the original table until the INSERT operation completes successfully: in case of intermediate failure, the original table is not modified.
+* It is safer than other strategies because it doesn't modify the original table until the INSERT operation completes successfully: in case of intermediate failure, the original table isn't modified.
* It implements "partitions immutability" data engineering best practice. Which simplifies incremental and parallel data processing, rollbacks, etc.
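In ClickHouse terms, the partition-immutability idea can be sketched as building the affected partition in a staging table and swapping it in atomically (names and partition value are illustrative):

```sql
-- rebuild only the affected partition in a staging table
INSERT INTO actor_summary__staging SELECT ... ;

-- atomically replace that partition in the target; the original table
-- is untouched until this point, so a failed INSERT changes nothing
ALTER TABLE actor_summary REPLACE PARTITION '2024-01' FROM actor_summary__staging;
```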
@@ -890,7 +890,7 @@ This example assumes you have completed [Creating an Incremental Table Model](#c
A few observations regarding this content:
* The select query defines the results you wish to snapshot over time. The function ref is used to reference our previously created actor_summary model.
-* We require a timestamp column to indicate record changes. Our updated_at column (see [Creating an Incremental Table Model](#creating-an-incremental-materialization)) can be used here. The parameter strategy indicates our use of a timestamp to denote updates, with the parameter updated_at specifying the column to use. If this is not present in your model you can alternatively use the [check strategy](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots#check-strategy). This is significantly more inefficient and requires the user to specify a list of columns to compare. dbt compares the current and historical values of these columns, recording any changes (or doing nothing if identical).
+* We require a timestamp column to indicate record changes. Our updated_at column (see [Creating an Incremental Table Model](#creating-an-incremental-materialization)) can be used here. The parameter `strategy` indicates our use of a timestamp to denote updates, with the parameter `updated_at` specifying the column to use. If this isn't present in your model you can alternatively use the [check strategy](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots#check-strategy). This is significantly less efficient and requires the user to specify a list of columns to compare. dbt compares the current and historical values of these columns, recording any changes (or doing nothing if identical).
3. Run the command `dbt snapshot`.
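A minimal sketch of such a snapshot definition, assuming the timestamp strategy described above (names are illustrative):

```sql
-- snapshots/actor_summary_snapshot.sql (illustrative only)
{% snapshot actor_summary_snapshot %}
{{ config(
       target_schema='snapshots',
       unique_key='id',
       strategy='timestamp',
       updated_at='updated_at'
) }}
select * from {{ ref('actor_summary') }}
{% endsnapshot %}
```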
@@ -996,7 +996,7 @@ For further details on dbt snapshots see [here](https://docs.getdbt.com/docs/bui
## Using seeds {#using-seeds}
-dbt provides the ability to load data from CSV files. This capability is not suited to loading large exports of a database and is more designed for small files typically used for code tables and [dictionaries](../../../../sql-reference/dictionaries/index.md), e.g. mapping country codes to country names. For a simple example, we generate and then upload a list of genre codes using the seed functionality.
+dbt provides the ability to load data from CSV files. This capability isn't suited to loading large exports of a database and is more designed for small files typically used for code tables and [dictionaries](../../../../sql-reference/dictionaries/index.md), e.g. mapping country codes to country names. For a simple example, we generate and then upload a list of genre codes using the seed functionality.
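As a hedged illustration (hypothetical codes, not the guide's actual export), a seed is just a small CSV under `seeds/` that `dbt seed` loads as a table:

```csv
code,genre
1,Comedy
2,Drama
3,Documentary
```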
1. We generate a list of genre codes from our existing dataset. From the dbt directory, use the `clickhouse-client` to create a file `seeds/genre_codes.csv`:
diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md
index 554dcca865b..ee5bca53196 100644
--- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md
+++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md
@@ -22,7 +22,7 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
## The dbt-clickhouse Adapter {#dbt-clickhouse-adapter}
**dbt** (data build tool) enables analytics engineers to transform data in their warehouses by simply writing select statements. dbt handles materializing these select statements into objects in the database in the form of tables and views - performing the T of [Extract Load and Transform (ELT)](https://en.wikipedia.org/wiki/Extract,_load,_transform). You can create a model defined by a SELECT statement.
-Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they are created in the appropriate order using a directed acyclic graph (DAG).
+Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they're created in the appropriate order using a directed acyclic graph (DAG).
dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse).
@@ -49,7 +49,7 @@ List of supported features:
- [x] ClickHouse-specific column configurations (Codec, TTL...)
- [x] ClickHouse-specific table settings (indexes, projections...)
-All features up to dbt-core 1.10 are supported, including `--sample` flag and all deprecation warnings fixed for future releases. **Catalog integrations** (e.g., Iceberg) introduced in dbt 1.10 are not yet natively supported in the adapter, but workarounds are available. See the [Catalog Support section](/integrations/dbt/features-and-configurations#catalog-support) for details.
+All features up to dbt-core 1.10 are supported, including the `--sample` flag, with all deprecation warnings fixed for future releases. **Catalog integrations** (e.g., Iceberg) introduced in dbt 1.10 aren't yet natively supported in the adapter, but workarounds are available. See the [Catalog Support section](/integrations/dbt/features-and-configurations#catalog-support) for details.
This adapter is still not available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support to get more information on this.
@@ -61,7 +61,7 @@ dbt provides 5 types of materialization. All of them are supported by `dbt-click
* **view** (default): The model is built as a view in the database. At ClickHouse this is built as a [view](/sql-reference/statements/create/view).
* **table**: The model is built as a table in the database. At ClickHouse this is built as a [table](/sql-reference/statements/create/table).
-* **ephemeral**: The model is not directly built in the database but is instead pulled into dependent models as CTEs (Common Table Expressions).
+* **ephemeral**: The model isn't directly built in the database but is instead pulled into dependent models as CTEs (Common Table Expressions).
* **incremental**: The model is initially materialized as a table, and in subsequent runs, dbt inserts new rows and updates changed rows in the table.
* **materialized view**: The model is built as a materialized view in the database. At ClickHouse this is built as a [materialized view](/sql-reference/statements/create/view#materialized-view).
@@ -126,7 +126,7 @@ Go to the [guides page](/integrations/dbt/guides) to learn more about how to use
### Testing and Deploying your models (CI/CD) {#testing-and-deploying-your-models-ci-cd}
-There are many ways to test and deploy your dbt project. dbt has some suggestions for [best practice workflows](https://docs.getdbt.com/best-practices/best-practice-workflows#pro-tips-for-workflows) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs). We are going to discuss several strategies, but keep in mind that these strategies may need to be deeply adjusted to fit your specific use case.
+There are many ways to test and deploy your dbt project. dbt has some suggestions for [best practice workflows](https://docs.getdbt.com/best-practices/best-practice-workflows#pro-tips-for-workflows) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs). We're going to discuss several strategies, but keep in mind that they may need significant adjustment to fit your specific use case.
#### CI/CD with simple data tests and unit tests {#ci-with-simple-data-tests-and-unit-tests}
@@ -170,12 +170,12 @@ Some operations may take longer than expected due to specific ClickHouse queries
The current ClickHouse adapter for dbt has several limitations you should be aware of:
-- The plugin uses syntax that requires ClickHouse version 25.3 or newer. We do not test older versions of Clickhouse. We also do not currently test Replicated tables.
-- Different runs of the `dbt-adapter` may collide if they are run at the same time as internally they can use the same table names for the same operations. For more information, check the issue [#420](https://github.com/ClickHouse/dbt-clickhouse/issues/420).
+- The plugin uses syntax that requires ClickHouse version 25.3 or newer. We don't test older versions of ClickHouse. We also don't currently test Replicated tables.
+- Different runs of the `dbt-adapter` may collide if they're run at the same time as internally they can use the same table names for the same operations. For more information, check the issue [#420](https://github.com/ClickHouse/dbt-clickhouse/issues/420).
- The adapter currently materializes models as tables using an [INSERT INTO SELECT](https://clickhouse.com/docs/sql-reference/statements/insert-into#inserting-the-results-of-select). This effectively means data duplication if the run is executed again. Very large datasets (PB) can result in extremely long run times, making some models unviable. To improve performance, use ClickHouse Materialized Views by implementing the view as `materialized: materialization_view`. Additionally, aim to minimize the number of rows returned by any query by utilizing `GROUP BY` where possible. Prefer models that summarize data over those that simply transform while maintaining row counts of the source.
-- To use Distributed tables to represent a model, you must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
+- To use Distributed tables to represent a model, you must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter doesn't manage cluster creation.
- When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{schema}}.{{ table/view id }}`, where `schema` is the ClickHouse database.
-- Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements.
+- Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This shouldn't affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements.
## Fivetran {#fivetran}
diff --git a/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md b/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md
index 44e39701190..5ce1a44e0c8 100644
--- a/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md
+++ b/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md
@@ -83,9 +83,9 @@ dataset_table_separator = "___" # Separator for dataset table names fro
:::note HTTP_PORT
The `http_port` parameter specifies the port number to use when connecting to the ClickHouse server's HTTP interface. This is different from default port 9000, which is used for the native TCP protocol.
-You must set `http_port` if you are not using external staging (i.e. you don't set the staging parameter in your pipeline). This is because the built-in ClickHouse local storage staging uses the clickhouse content library, which communicates with ClickHouse over HTTP.
+You must set `http_port` if you're not using external staging (i.e. you don't set the staging parameter in your pipeline). This is because the built-in ClickHouse local storage staging uses the clickhouse-connect library, which communicates with ClickHouse over HTTP.
-Make sure your ClickHouse server is configured to accept HTTP connections on the port specified by `http_port`. For example, if you set `http_port = 8443`, then ClickHouse should be listening for HTTP requests on port 8443. If you are using external staging, you can omit the `http_port` parameter, since clickhouse-connect will not be used in this case.
+Make sure your ClickHouse server is configured to accept HTTP connections on the port specified by `http_port`. For example, if you set `http_port = 8443`, then ClickHouse should be listening for HTTP requests on port 8443. If you're using external staging, you can omit the `http_port` parameter, since clickhouse-connect won't be used in this case.
:::
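A hedged sketch of the relevant `secrets.toml` fragment (values are placeholders):

```toml
[destination.clickhouse.credentials]
host = "localhost"
port = 9000         # native TCP protocol
http_port = 8443    # required unless you configure external staging
username = "default"
password = ""
```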
You can pass a database connection string similar to the one used by the `clickhouse-driver` library. The credentials above will look like this:
@@ -118,7 +118,7 @@ Data is loaded into ClickHouse using the most efficient method depending on the
## Datasets {#datasets}
-`Clickhouse` does not support multiple datasets in one database, whereas `dlt` relies on datasets due to multiple reasons. In order to make `Clickhouse` work with `dlt`, tables generated by `dlt` in your `Clickhouse` database will have their names prefixed with the dataset name, separated by the configurable `dataset_table_separator`. Additionally, a special sentinel table that does not contain any data will be created, allowing `dlt` to recognize which virtual datasets already exist in a `Clickhouse` destination.
+`Clickhouse` doesn't support multiple datasets in one database, whereas `dlt` relies on datasets for multiple reasons. To make `Clickhouse` work with `dlt`, tables generated by `dlt` in your `Clickhouse` database will have their names prefixed with the dataset name, separated by the configurable `dataset_table_separator`. Additionally, a special sentinel table that doesn't contain any data will be created, allowing `dlt` to recognize which virtual datasets already exist in a `Clickhouse` destination.
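For example, with a dataset named `my_dataset` and the default `___` separator, a `dlt` table `events` would appear in ClickHouse roughly as follows (illustrative names):

```sql
-- list the tables belonging to one virtual dataset
SELECT name
FROM system.tables
WHERE database = currentDatabase()
  AND name LIKE 'my_dataset___%'
-- e.g. my_dataset___events, plus the empty sentinel table marking the dataset
```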
## Supported file formats {#supported-file-formats}
@@ -128,10 +128,10 @@ Data is loaded into ClickHouse using the most efficient method depending on the
The `clickhouse` destination has a few specific deviations from the default sql destinations:
1. `Clickhouse` has an experimental `object` datatype, but we have found it to be a bit unpredictable, so the dlt clickhouse destination will load the complex datatype to a text column. If you need this feature, get in touch with our Slack community, and we will consider adding it.
-2. `Clickhouse` does not support the `time` datatype. Time will be loaded to a `text` column.
-3. `Clickhouse` does not support the `binary` datatype. Instead, binary data will be loaded into a `text` column. When loading from `jsonl`, the binary data will be a base64 string, and when loading from parquet, the `binary` object will be converted to `text`.
-5. `Clickhouse` accepts adding columns to a populated table that are not null.
-6. `Clickhouse` can produce rounding errors under certain conditions when using the float or double datatype. If you cannot afford to have rounding errors, make sure to use the decimal datatype. For example, loading the value 12.7001 into a double column with the loader file format set to `jsonl` will predictably produce a rounding error.
+2. `Clickhouse` doesn't support the `time` datatype. Time will be loaded to a `text` column.
+3. `Clickhouse` doesn't support the `binary` datatype. Instead, binary data will be loaded into a `text` column. When loading from `jsonl`, the binary data will be a base64 string, and when loading from parquet, the `binary` object will be converted to `text`.
+5. `Clickhouse` accepts adding columns to a populated table that aren't null.
+6. `Clickhouse` can produce rounding errors under certain conditions when using the float or double datatype. If you can't afford to have rounding errors, make sure to use the decimal datatype. For example, loading the value 12.7001 into a double column with the loader file format set to `jsonl` will predictably produce a rounding error.
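The rounding caveat in point 6 can be seen directly in ClickHouse (a hedged illustration):

```sql
SELECT
    toFloat64(12.7001)        AS as_double,   -- binary float; subject to rounding error
    toDecimal64('12.7001', 4) AS as_decimal   -- stored exactly
```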
## Supported column hints {#supported-column-hints}
ClickHouse supports the following column hints:
diff --git a/docs/integrations/data-ingestion/etl-tools/estuary.md b/docs/integrations/data-ingestion/etl-tools/estuary.md
index 7f9707f4194..57fa84dcdf8 100644
--- a/docs/integrations/data-ingestion/etl-tools/estuary.md
+++ b/docs/integrations/data-ingestion/etl-tools/estuary.md
@@ -19,7 +19,7 @@ import PartnerBadge from '@theme/badges/PartnerBadge';
[Estuary](https://estuary.dev/) is a right-time data platform that flexibly combines real-time and batch data in simple-to-setup ETL pipelines. With enterprise-grade security and deployment options, Estuary unlocks durable data flows from SaaS, database, and streaming sources to a variety of destinations, including ClickHouse.
-Estuary connects with ClickHouse via the Kafka ClickPipe. You do not need to maintain your own Kafka ecosystem with this integration.
+Estuary connects with ClickHouse via the Kafka ClickPipe. You don't need to maintain your own Kafka ecosystem with this integration.
## Setup guide {#setup-guide}
diff --git a/docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md b/docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md
index e61f3bbc5b2..6e96d7cfcbc 100644
--- a/docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md
+++ b/docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md
@@ -152,7 +152,7 @@ For a new setup, download the binary from https://nifi.apache.org/download.html
| Database Connection Pooling Service | ClickHouse JDBC | Select the ClickHouse controller service |
| Table Name | tbl | Input your table name here |
| Translate Field Names | false | Set to "false" so that field names inserted must match the column name |
- | Maximum Batch Size | 1000 | Maximum number of rows per insert. This value should not be lower than the value of "Minimum Number of Records" in `MergeRecord` processor |
+ | Maximum Batch Size | 1000 | Maximum number of rows per insert. This value shouldn't be lower than the value of "Minimum Number of Records" in the `MergeRecord` processor |
4. To confirm that each insert contains multiple rows, check that the row count in the table is incrementing by at least the value of "Minimum Number of Records" defined in `MergeRecord`.
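One hedged way to verify the batching, assuming the table name `tbl` from the configuration above:

```sql
-- rerun this while the flow is active; the count should grow in jumps
-- of at least the MergeRecord "Minimum Number of Records" value
SELECT count() FROM tbl;
```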
diff --git a/docs/integrations/data-ingestion/etl-tools/vector-to-clickhouse.md b/docs/integrations/data-ingestion/etl-tools/vector-to-clickhouse.md
index 86b58d64343..f0eb4079a05 100644
--- a/docs/integrations/data-ingestion/etl-tools/vector-to-clickhouse.md
+++ b/docs/integrations/data-ingestion/etl-tools/vector-to-clickhouse.md
@@ -45,7 +45,7 @@ Define a table to store the log events:
CREATE DATABASE IF NOT EXISTS nginxdb
```
-2. Insert the entire log event as a single string. Obviously this is not a great format for performing analytics on the log data, but we will figure that part out below using ***materialized views***.
+2. Insert the entire log event as a single string. Obviously this isn't a great format for performing analytics on the log data, but we will figure that part out below using ***materialized views***.
```sql
CREATE TABLE IF NOT EXISTS nginxdb.access_logs (
@@ -122,7 +122,7 @@ SELECT * FROM nginxdb.access_logs
## Parse the Logs {#4-parse-the-logs}
-Having the logs in ClickHouse is great, but storing each event as a single string does not allow for much data analysis.
+Having the logs in ClickHouse is great, but storing each event as a single string doesn't allow for much data analysis.
We'll next look at how to parse the log events using a [materialized view](/materialized-view/incremental-materialized-view).
A **materialized view** functions similarly to an insert trigger in SQL. When rows of data are inserted into a source table, the materialized view makes some transformation of these rows and inserts the results into a target table.
@@ -144,7 +144,7 @@ SELECT splitByWhitespace('192.168.208.1 - - [12/Oct/2021:15:32:43 +0000] "GET /
["192.168.208.1","-","-","[12/Oct/2021:15:32:43","+0000]","\"GET","/","HTTP/1.1\"","304","0","\"-\"","\"Mozilla/5.0","(Macintosh;","Intel","Mac","OS","X","10_15_7)","AppleWebKit/537.36","(KHTML,","like","Gecko)","Chrome/93.0.4577.63","Safari/537.36\""]
```
-A few of the strings have some extra characters, and the user agent (the browser details) did not need to be parsed, but
+A few of the strings have some extra characters, and the user agent (the browser details) didn't need to be parsed, but
the resulting array is close to what is needed.
Similar to `splitByWhitespace`, the [`splitByRegexp`](/sql-reference/functions/splitting-merging-functions#splitByRegexp) function splits a string into an array based on a regular expression.
@@ -175,7 +175,7 @@ However, if we change the separator from a colon (**:**) to a comma (**,**) then
SELECT parseDateTimeBestEffort(replaceOne(trim(LEADING '[' FROM '[12/Oct/2021:15:32:43'), ':', ' '))
```
-We are now ready to define the materialized view.
+We're now ready to define the materialized view.
The definition below includes `POPULATE`, which means the existing rows in **access_logs** will be processed and inserted right away.
Run the following SQL statement:
@@ -225,7 +225,7 @@ SELECT * FROM nginxdb.access_logs_view
:::note
The lesson above stored the data in two tables, but you could change the initial `nginxdb.access_logs` table to use the [`Null`](/engines/table-engines/special/null) table engine.
-The parsed data will still end up in the `nginxdb.access_logs_view` table, but the raw data will not be stored in a table.
+The parsed data will still end up in the `nginxdb.access_logs_view` table, but the raw data won't be stored in a table.
:::
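A hedged sketch of that alternative - the source table keeps its schema but stores nothing:

```sql
-- Illustrative: assumes the single-column layout used earlier in this guide
CREATE TABLE IF NOT EXISTS nginxdb.access_logs (
    message String
)
ENGINE = Null;
```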
diff --git a/docs/integrations/data-ingestion/gcs/index.md b/docs/integrations/data-ingestion/gcs/index.md
index 89d68b88eb9..3a9c2448c16 100644
--- a/docs/integrations/data-ingestion/gcs/index.md
+++ b/docs/integrations/data-ingestion/gcs/index.md
@@ -16,7 +16,7 @@ import GCS_examine_bucket_2 from '@site/static/images/integrations/data-ingestio
# Integrate Google Cloud Storage with ClickHouse
:::note
-If you are using ClickHouse Cloud on [Google Cloud](https://cloud.google.com), this page does not apply as your services will already be using [Google Cloud Storage](https://cloud.google.com/storage). If you are looking to `SELECT` or `INSERT` data from GCS, please see the [`gcs` table function](/sql-reference/table-functions/gcs).
+If you're using ClickHouse Cloud on [Google Cloud](https://cloud.google.com), this page doesn't apply as your services will already be using [Google Cloud Storage](https://cloud.google.com/storage). If you're looking to `SELECT` or `INSERT` data from GCS, please see the [`gcs` table function](/sql-reference/table-functions/gcs).
:::
ClickHouse recognizes that GCS represents an attractive storage solution if you're seeking to separate storage and compute. To help achieve this, support is provided for using GCS as the storage for a MergeTree engine. This will enable you to exploit the scalability and cost benefits of GCS, and the insert and query performance of the MergeTree engine.
@@ -30,7 +30,7 @@ To utilize a GCS bucket as a disk, we must first declare it within the ClickHous
#### Storage configuration > disks > gcs {#storage_configuration--disks--gcs}
This part of the configuration is shown in the highlighted section and specifies that:
-- Batch deletes are not to be performed. GCS does not currently support batch deletes, so the autodetect is disabled to suppress error messages.
+- Batch deletes aren't performed. GCS doesn't currently support batch deletes, so autodetection is disabled to suppress error messages.
- The type of the disk is `s3` because the S3 API is in use.
- The endpoint as provided by GCS
- The service account HMAC key and secret
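Put together, the highlighted disk declaration looks roughly like this (endpoint and HMAC values are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>false</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/BUCKET_NAME/folder/</endpoint>
                <access_key_id>GCS_HMAC_KEY</access_key_id>
                <secret_access_key>GCS_HMAC_SECRET</secret_access_key>
            </gcs>
        </disks>
    </storage_configuration>
</clickhouse>
```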
@@ -185,7 +185,7 @@ For further information on tuning threads, see [Optimizing for Performance](../s
## Using Google Cloud Storage (GCS) {#gcs-multi-region}
:::tip
-Object storage is used by default in ClickHouse Cloud, you do not need to follow this procedure if you are running in ClickHouse Cloud.
+Object storage is used by default in ClickHouse Cloud; you don't need to follow this procedure if you're running in ClickHouse Cloud.
:::
### Plan the deployment {#plan-the-deployment}
@@ -220,7 +220,7 @@ Deploy ClickHouse on two hosts, in the sample configurations these are named `ch
Place `chnode1` in one GCP region, and `chnode2` in a second. In this guide `us-east1` and `us-east4` are used for the compute engine VMs, and also for GCS buckets.
:::note
-Do not start `clickhouse server` until after it is configured. Just install it.
+Don't start `clickhouse server` until after it is configured. Just install it.
:::
Refer to the [installation instructions](/getting-started/install/install.mdx) when performing the deployment steps on the ClickHouse server nodes.
@@ -235,7 +235,7 @@ Refer to the [installation instructions](/getting-started/install/install.mdx) w
The two ClickHouse servers will be located in different regions for high availability. Each will have a GCS bucket in the same region.
-In **Cloud Storage > Buckets** choose **CREATE BUCKET**. For this tutorial two buckets are created, one in each of `us-east1` and `us-east4`. The buckets are single region, standard storage class, and not public. When prompted, enable public access prevention. Do not create folders, they will be created when ClickHouse writes to the storage.
+In **Cloud Storage > Buckets** choose **CREATE BUCKET**. For this tutorial two buckets are created, one in each of `us-east1` and `us-east4`. The buckets are single region, standard storage class, and not public. When prompted, enable public access prevention. Don't create folders; they'll be created when ClickHouse writes to the storage.
If you need step-by-step instructions to create buckets and an HMAC key, then expand **Create GCS buckets and an HMAC key** and follow along:
@@ -362,7 +362,7 @@ This file configures the hostname and port of each ClickHouse server in the clus
#### Replica identification {#replica-identification}
-This file configures settings related to the ClickHouse Keeper path. Specifically the macros used to identify which replica the data is part of. On one server the replica should be specified as `replica_1`, and on the other server `replica_2`. The names can be changed, based on our example of one replica being stored in South Carolina and the other in Northern Virginia the values could be `carolina` and `virginia`; just make sure that they are different on each machine.
+This file configures settings related to the ClickHouse Keeper path, specifically the macros used to identify which replica the data is part of. On one server the replica should be specified as `replica_1`, and on the other server `replica_2`. The names can be changed; based on our example of one replica stored in South Carolina and the other in Northern Virginia, the values could be `carolina` and `virginia`. Just make sure they're different on each machine.
```xml title=/etc/clickhouse-server/config.d/macros.xml
diff --git a/docs/integrations/data-ingestion/kafka/confluent/confluent-cloud.md b/docs/integrations/data-ingestion/kafka/confluent/confluent-cloud.md
index 0a56db5484d..f2f49c0216b 100644
--- a/docs/integrations/data-ingestion/kafka/confluent/confluent-cloud.md
+++ b/docs/integrations/data-ingestion/kafka/confluent/confluent-cloud.md
@@ -30,7 +30,7 @@ import Image from '@theme/IdealImage';
## Prerequisites {#prerequisites}
-We assume you are familiar with:
+We assume you're familiar with:
* [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md)
* Confluent Cloud
@@ -42,7 +42,7 @@ Creating a topic on Confluent Cloud is fairly simple, and there are detailed ins
#### Important notes {#important-notes}
* The Kafka topic name must be the same as the ClickHouse table name. The way to tweak this is by using a transformer (for example [`ExtractTopic`](https://docs.confluent.io/platform/current/connect/transforms/extracttopic.html)).
-* More partitions does not always mean more performance - see our upcoming guide for more details and performance tips.
+* More partitions don't always mean more performance - see our upcoming guide for more details and performance tips.
#### Gather your connection details {#gather-your-connection-details}
diff --git a/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md b/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md
index f710a0841d0..e07331cc45e 100644
--- a/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md
+++ b/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md
@@ -27,7 +27,7 @@ import AddCustomConnectorPlugin from '@site/static/images/integrations/data-inge
## Prerequisites {#prerequisites}
-We assume you are familiar with:
+We assume you're familiar with:
* [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md)
* Confluent Platform and [Custom Connectors](https://docs.confluent.io/cloud/current/connectors/bring-your-connector/overview.html).
@@ -43,7 +43,7 @@ Creating a topic on Confluent Platform is fairly simple, and there are detailed
#### Important notes {#important-notes}
* The Kafka topic name must be the same as the ClickHouse table name. The way to tweak this is by using a transformer (for example [`ExtractTopic`](https://docs.confluent.io/platform/current/connect/transforms/extracttopic.html)).
-* More partitions does not always mean more performance - see our upcoming guide for more details and performance tips.
+* More partitions don't always mean more performance - see our upcoming guide for more details and performance tips.
#### Install connector {#install-connector}
You can download the connector from our [repository](https://github.com/ClickHouse/clickhouse-kafka-connect/releases) - please feel free to submit comments and issues there as well!
diff --git a/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md b/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md
index 33d76572f4b..70e1f90c632 100644
--- a/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md
+++ b/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md
@@ -17,7 +17,7 @@ import httpAdvanced from '@site/static/images/integrations/data-ingestion/kafka/
import createMessageInTopic from '@site/static/images/integrations/data-ingestion/kafka/confluent/create_message_in_topic.png';
# Confluent HTTP sink connector
-The HTTP Sink Connector is data type agnostic and thus does not need a Kafka schema as well as supporting ClickHouse specific data types such as Maps and Arrays. This additional flexibility comes at a slight increase in configuration complexity.
+The HTTP Sink Connector is data type agnostic, so it doesn't need a Kafka schema, and it supports ClickHouse-specific data types such as Maps and Arrays. This additional flexibility comes at a slight increase in configuration complexity.
Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table.
@@ -105,7 +105,7 @@ and verify the created message's been written to your ClickHouse instance.
#### HTTP Sink doesn't batch messages {#http-sink-doesnt-batch-messages}
From the [Sink documentation](https://docs.confluent.io/kafka-connectors/http/current/overview.html#http-sink-connector-for-cp):
-> The HTTP Sink connector does not batch requests for messages containing Kafka header values that are different.
+> The HTTP Sink connector doesn't batch requests for messages containing Kafka header values that are different.
1. Verify your Kafka records have the same key.
2. When you add parameters to the HTTP API URL, each record can result in a unique URL. For this reason, batching is disabled when using additional URL parameters.
@@ -128,7 +128,7 @@ Note that this example preserves the Array fields of the Github dataset. We assu
Follow [these instructions](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#set-up-a-local-connect-worker-with-cp-install) for setting up Connect relevant to your installation type, noting the differences between a standalone and distributed cluster. If using Confluent Cloud, the distributed setup is relevant.
-The most important parameter is the `http.api.url`. The [HTTP interface](/interfaces/http) for ClickHouse requires you to encode the INSERT statement as a parameter in the URL. This must include the format (`JSONEachRow` in this case) and target database. The format must be consistent with the Kafka data, which will be converted to a string in the HTTP payload. These parameters must be URL escaped. An example of this format for the Github dataset (assuming you are running ClickHouse locally) is shown below:
+The most important parameter is the `http.api.url`. The [HTTP interface](/interfaces/http) for ClickHouse requires you to encode the INSERT statement as a parameter in the URL. This must include the format (`JSONEachRow` in this case) and target database. The format must be consistent with the Kafka data, which will be converted to a string in the HTTP payload. These parameters must be URL escaped. An example of this format for the Github dataset (assuming you're running ClickHouse locally) is shown below:
```response
<protocol>://<host>:<port>?query=INSERT%20INTO%20<database>.<table>%20FORMAT%20JSONEachRow
```
@@ -141,14 +141,14 @@ The following additional parameters are relevant to using the HTTP Sink with Cli
* `request.method` - Set to **POST**
* `retry.on.status.codes` - Set to 400-500 to retry on any error codes. Refine based on expected errors in data.
* `request.body.format` - In most cases this will be JSON.
-* `auth.type` - Set to BASIC if you security with ClickHouse. Other ClickHouse compatible authentication mechanisms are not currently supported.
+* `auth.type` - Set to BASIC if you use authentication with ClickHouse. Other ClickHouse compatible authentication mechanisms aren't currently supported.
* `ssl.enabled` - set to true if using SSL.
* `connection.user` - username for ClickHouse.
* `connection.password` - password for ClickHouse.
* `batch.max.size` - The number of rows to send in a single batch. Ensure this is set to an appropriately large number. Per ClickHouse [recommendations](/sql-reference/statements/insert-into#performance-considerations) a value of 1000 should be considered a minimum.
* `tasks.max` - The HTTP Sink connector supports running one or more tasks. This can be used to increase performance. Along with batch size this represents your primary means of improving performance.
* `key.converter` - set according to the types of your keys.
-* `value.converter` - set based on the type of data on your topic. This data does not need a schema. The format here must be consistent with the FORMAT specified in the parameter `http.api.url`. The simplest here is to use JSON and the org.apache.kafka.connect.json.JsonConverter converter. Treating the value as a string, via the converter org.apache.kafka.connect.storage.StringConverter, is also possible - although this will require the user to extract a value in the insert statement using functions. [Avro format](/interfaces/formats/Avro) is also supported in ClickHouse if using the io.confluent.connect.avro.AvroConverter converter.
+* `value.converter` - set based on the type of data on your topic. This data doesn't need a schema. The format here must be consistent with the FORMAT specified in the parameter `http.api.url`. The simplest approach is to use JSON with the `org.apache.kafka.connect.json.JsonConverter` converter. Treating the value as a string, via the `org.apache.kafka.connect.storage.StringConverter` converter, is also possible - although this will require the user to extract a value in the insert statement using functions. [Avro format](/interfaces/formats/Avro) is also supported in ClickHouse if using the `io.confluent.connect.avro.AvroConverter` converter.
A full list of settings, including how to configure a proxy, retries, and advanced SSL, can be found [here](https://docs.confluent.io/kafka-connect-http/current/connector_config.html).
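To make the required escaping concrete, here's a minimal Python sketch that builds such a `http.api.url` value. The host, port, database, and table names are placeholders, not values from the guide:

```python
from urllib.parse import quote

# Placeholder connection details -- substitute your own.
host, port = "localhost", 8123
database, table = "github", "github"

# The INSERT statement must be URL-escaped before it's embedded in http.api.url.
query = f"INSERT INTO {database}.{table} FORMAT JSONEachRow"
http_api_url = f"http://{host}:{port}?query={quote(query)}"
print(http_api_url)
# http://localhost:8123?query=INSERT%20INTO%20github.github%20FORMAT%20JSONEachRow
```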
diff --git a/docs/integrations/data-ingestion/kafka/index.md b/docs/integrations/data-ingestion/kafka/index.md
index d37d90a1d1d..419f6903eed 100644
--- a/docs/integrations/data-ingestion/kafka/index.md
+++ b/docs/integrations/data-ingestion/kafka/index.md
@@ -93,9 +93,9 @@ To get started using the Kafka table engine, see the [reference documentation](.
| Product | Strengths | Weaknesses |
|---------|-----------|------------|
-| **ClickPipes for Kafka** | • Scalable architecture for high throughput and low latency • Built-in monitoring and schema management • Private networking connections (via PrivateLink) • Supports SSL/TLS authentication and IAM authorization • Supports programmatic configuration (Terraform, API endpoints) | • Does not support pushing data to Kafka • At-least-once semantics |
-| **Kafka Connect Sink** | • Exactly-once semantics • Allows granular control over data transformation, batching and error handling • Can be deployed in private networks • Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Does not support pushing data to Kafka • Operationally complex to set up and maintain • Requires Kafka and Kafka Connect expertise |
-| **Kafka table engine** | • Supports [pushing data to Kafka](./kafka-table-engine.md/#clickhouse-to-kafka) • Operationally simple to set up | • At-least-once semantics • Limited horizontal scaling for consumers. Cannot be scaled independently from the ClickHouse server • Limited error handling and debugging options • Requires Kafka expertise |
+| **ClickPipes for Kafka** | • Scalable architecture for high throughput and low latency • Built-in monitoring and schema management • Private networking connections (via PrivateLink) • Supports SSL/TLS authentication and IAM authorization • Supports programmatic configuration (Terraform, API endpoints) | • Doesn't support pushing data to Kafka • At-least-once semantics |
+| **Kafka Connect Sink** | • Exactly-once semantics • Allows granular control over data transformation, batching and error handling • Can be deployed in private networks • Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Doesn't support pushing data to Kafka • Operationally complex to set up and maintain • Requires Kafka and Kafka Connect expertise |
+| **Kafka table engine** | • Supports [pushing data to Kafka](./kafka-table-engine.md/#clickhouse-to-kafka) • Operationally simple to set up | • At-least-once semantics • Limited horizontal scaling for consumers. Can't be scaled independently from the ClickHouse server • Limited error handling and debugging options • Requires Kafka expertise |
### Other options {#other-options}
diff --git a/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md b/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md
index 700237f1959..03b611f29f6 100644
--- a/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md
+++ b/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md
@@ -105,7 +105,7 @@ The full table of configuration options:
| `errors.retry.timeout` | ClickHouse JDBC Retry Timeout | `"60"` |
| `exactlyOnce` | Exactly Once Enabled | `"false"` |
| `topics` (Required) | The Kafka topics to poll - topic names must match table names | `""` |
-| `key.converter` (Required* - See Description) | Set according to the types of your keys. Required here if you are passing keys (and not defined in worker config). | `"org.apache.kafka.connect.storage.StringConverter"` |
+| `key.converter` (Required* - See Description) | Set according to the types of your keys. Required here if you're passing keys (and not defined in worker config). | `"org.apache.kafka.connect.storage.StringConverter"` |
| `value.converter` (Required* - See Description) | Set based on the type of data on your topic. Supported: - JSON, String, Avro or Protobuf formats. Required here if not defined in worker config. | `"org.apache.kafka.connect.json.JsonConverter"` |
| `value.converter.schemas.enable` | Connector Value Converter Schema Support | `"false"` |
| `errors.tolerance` | Connector Error Tolerance. Supported: none, all | `"none"` |
@@ -128,7 +128,7 @@ Each topic requires a dedicated target table in ClickHouse. The target table nam
### Pre-processing {#pre-processing}
-If you need to transform outbound messages before they are sent to ClickHouse Kafka Connect
+If you need to transform outbound messages before they're sent to ClickHouse Kafka Connect
Sink, use [Kafka Connect Transformations](https://docs.confluent.io/platform/current/connect/transforms/overview.html).
### Supported data types {#supported-data-types}
@@ -399,7 +399,7 @@ For detailed JMX metric definitions and Prometheus integration, see the [jmx-exp
### Limitations {#limitations}
-- Deletes are not supported.
+- Deletes aren't supported.
- Batch size is inherited from the Kafka Consumer properties.
- When using KeeperMap for exactly-once and the offset is changed or re-wound, you need to delete the content from KeeperMap for that specific topic. (See troubleshooting guide below for more details)
@@ -446,7 +446,7 @@ Kafka Connect (the framework) fetches messages from Kafka topics in the backgrou
- **`fetch.min.bytes`**: Minimum amount of data before the framework passes values to the connector (default: 1 byte)
- **`fetch.max.bytes`**: Maximum amount of data to fetch in a single request (default: 52428800 / 50 MB)
-- **`fetch.max.wait.ms`**: Maximum time to wait before returning data if `fetch.min.bytes` is not met (default: 500 ms)
+- **`fetch.max.wait.ms`**: Maximum time to wait before returning data if `fetch.min.bytes` isn't met (default: 500 ms)
:::note
On Confluent Cloud, adjustment of these settings requires opening a support case through Confluent Cloud.
@@ -767,7 +767,7 @@ Right now the focus is on identifying errors that are transient and can be retri
- 999 - KEEPER_EXCEPTION
- 1002 - UNKNOWN_EXCEPTION
- `SocketTimeoutException` - This is thrown when the socket times out.
-- `UnknownHostException` - This is thrown when the host cannot be resolved.
+- `UnknownHostException` - This is thrown when the host can't be resolved.
- `IOException` - This is thrown when there is a problem with the network.
#### "All my data is blank/zeroes" {#all-my-data-is-blankzeroes}
@@ -783,7 +783,7 @@ transforms.flatten.delimiter=_
This will transform your data from nested JSON to flattened JSON (using `_` as a delimiter). Fields in the table would then follow the "field1_field2_field3" format (i.e. "before_id", "after_id", etc.).
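As an illustration only (not the transform's actual implementation), the resulting field naming can be sketched in Python:

```python
def flatten(record, delimiter="_", prefix=""):
    """Recursively flatten nested dicts, joining key paths with the delimiter."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, name))
        else:
            flat[name] = value
    return flat

# A Debezium-style nested record becomes flat "field1_field2" columns.
print(flatten({"before": {"id": 1}, "after": {"id": 2}}))
# {'before_id': 1, 'after_id': 2}
```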
#### "I want to use my Kafka keys in ClickHouse" {#i-want-to-use-my-kafka-keys-in-clickhouse}
-Kafka keys are not stored in the value field by default, but you can use the `KeyToValue` transformation to move the key to the value field (under a new `_key` field name):
+Kafka keys aren't stored in the value field by default, but you can use the `KeyToValue` transformation to move the key to the value field (under a new `_key` field name):
```properties
transforms=keyToValue
diff --git a/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md b/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md
index f7530de766d..5832690c829 100644
--- a/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md
+++ b/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md
@@ -13,14 +13,14 @@ import ConnectionDetails from '@site/docs/_snippets/_gather_your_details_http.md
# JDBC connector
:::note
-This connector should only be used if your data is simple and consists of primitive data types e.g., int. ClickHouse specific types such as maps are not supported.
+This connector should only be used if your data is simple and consists of primitive data types, e.g. int. ClickHouse-specific types such as maps aren't supported.
:::
For our examples, we utilize the Confluent distribution of Kafka Connect.
-Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table. We recommend Confluent Cloud, which offers a generous free tier for those who do not have a Kafka environment.
+Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table. We recommend Confluent Cloud, which offers a generous free tier for those who don't have a Kafka environment.
-Note that a schema is required for the JDBC Connector (You cannot use plain JSON or CSV with the JDBC connector). Whilst the schema can be encoded in each message; it is [strongly advised to use the Confluent schema registry](https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#json-schemas)y to avoid the associated overhead. The insertion script provided automatically infers a schema from the messages and inserts this to the registry - this script can thus be reused for other datasets. Kafka's keys are assumed to be Strings. Further details on Kafka schemas can be found [here](https://docs.confluent.io/platform/current/schema-registry/index.html).
+Note that a schema is required for the JDBC Connector (you can't use plain JSON or CSV with the JDBC connector). Whilst the schema can be encoded in each message, it is [strongly advised to use the Confluent schema registry](https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#json-schemas) to avoid the associated overhead. The insertion script provided automatically infers a schema from the messages and inserts this to the registry - this script can thus be reused for other datasets. Kafka's keys are assumed to be Strings. Further details on Kafka schemas can be found [here](https://docs.confluent.io/platform/current/schema-registry/index.html).
### License {#license}
The JDBC Connector is distributed under the [Confluent Community License](https://www.confluent.io/confluent-community-license)
@@ -39,7 +39,7 @@ For sending data to ClickHouse from Kafka, we use the Sink component of the conn
#### 2. Download and install the JDBC Driver {#2-download-and-install-the-jdbc-driver}
-Download and install the ClickHouse JDBC driver `clickhouse-jdbc--shaded.jar` from [here](https://github.com/ClickHouse/clickhouse-java/releases). Install this into Kafka Connect following the details [here](https://docs.confluent.io/kafka-connect-jdbc/current/#installing-jdbc-drivers). Other drivers may work but have not been tested.
+Download and install the ClickHouse JDBC driver `clickhouse-jdbc-<version>-shaded.jar` from [here](https://github.com/ClickHouse/clickhouse-java/releases). Install this into Kafka Connect following the details [here](https://docs.confluent.io/kafka-connect-jdbc/current/#installing-jdbc-drivers). Other drivers may work but haven't been tested.
:::note
@@ -64,13 +64,13 @@ The following parameters are relevant to using the JDBC connector with ClickHous
* `pk.mode` - Not relevant to ClickHouse. Set to none.
* `auto.create` - Not supported and must be false.
* `auto.evolve` - We recommend false for this setting although it may be supported in the future.
-* `insert.mode` - Set to "insert". Other modes are not currently supported.
+* `insert.mode` - Set to "insert". Other modes aren't currently supported.
* `key.converter` - Set according to the types of your keys.
* `value.converter` - Set based on the type of data on your topic. This data must have a supported schema - JSON, Avro or Protobuf formats.
If using our sample dataset for testing, ensure the following are set:
-* `value.converter.schemas.enable` - Set to false as we utilize a schema registry. Set to true if you are embedding the schema in each message.
+* `value.converter.schemas.enable` - Set to false as we utilize a schema registry. Set to true if you're embedding the schema in each message.
* `key.converter` - Set to "org.apache.kafka.connect.storage.StringConverter". We utilise String keys.
* `value.converter` - Set to "io.confluent.connect.json.JsonSchemaConverter".
* `value.converter.schema.registry.url` - Set to the schema server url along with the credentials for the schema server via the parameter `value.converter.schema.registry.basic.auth.user.info`.
@@ -79,7 +79,7 @@ Example configuration files for the Github sample data can be found [here](https
#### 4. Create the ClickHouse table {#4-create-the-clickhouse-table}
-Ensure the table has been created, dropping it if it already exists from previous examples. An example compatible with the reduced Github dataset is shown below. Not the absence of any Array or Map types that are not currently not supported:
+Ensure the table has been created, dropping it if it already exists from previous examples. An example compatible with the reduced Github dataset is shown below. Note the absence of any Array or Map types, which aren't currently supported:
```sql
CREATE TABLE github
@@ -127,9 +127,9 @@ python producer.py -c github.config
This script can be used to insert any ndjson file into a Kafka topic. This will attempt to infer a schema for you automatically. The sample config provided will only insert 10k messages - [modify here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/data-ingestion/kafka/code/producer/github.config#L25) if required. This configuration also removes any incompatible Array fields from the dataset during insertion to Kafka.
-This is required for the JDBC connector to convert messages to INSERT statements. If you are using your own data, ensure you either insert a schema with every message (setting _value.converter.schemas.enable _to true) or ensure your client publishes messages referencing a schema to the registry.
+This is required for the JDBC connector to convert messages to INSERT statements. If you're using your own data, ensure you either insert a schema with every message (setting `value.converter.schemas.enable` to true) or ensure your client publishes messages referencing a schema to the registry.
-Kafka Connect should begin consuming messages and inserting rows into ClickHouse. Note that warnings regards "[JDBC Compliant Mode] Transaction is not supported." are expected and can be ignored.
+Kafka Connect should begin consuming messages and inserting rows into ClickHouse. Note that warnings regarding "[JDBC Compliant Mode] Transaction is not supported." are expected and can be ignored.
A simple read on the target table "Github" should confirm data insertion.
diff --git a/docs/integrations/data-ingestion/kafka/kafka-table-engine.md b/docs/integrations/data-ingestion/kafka/kafka-table-engine.md
index dd40d12083e..a5531d57d99 100644
--- a/docs/integrations/data-ingestion/kafka/kafka-table-engine.md
+++ b/docs/integrations/data-ingestion/kafka/kafka-table-engine.md
@@ -30,7 +30,7 @@ To use the Kafka table engine, you should be broadly familiar with [ClickHouse m
Initially, we focus on the most common use case: using the Kafka table engine to insert data into ClickHouse from Kafka.
-The Kafka table engine allows ClickHouse to read from a Kafka topic directly. Whilst useful for viewing messages on a topic, the engine by design only permits one-time retrieval, i.e. when a query is issued to the table, it consumes data from the queue and increases the consumer offset before returning results to the caller. Data cannot, in effect, be re-read without resetting these offsets.
+The Kafka table engine allows ClickHouse to read from a Kafka topic directly. Whilst useful for viewing messages on a topic, the engine by design only permits one-time retrieval, i.e. when a query is issued to the table, it consumes data from the queue and increases the consumer offset before returning results to the caller. Data can't, in effect, be re-read without resetting these offsets.
To persist this data from a read of the table engine, we need a means of capturing the data and inserting it into another table. Trigger-based materialized views natively provide this functionality. A materialized view initiates a read on the table engine, receiving batches of documents. The TO clause determines the destination of the data - typically a table of the [Merge Tree family](../../../engines/table-engines/mergetree-family/index.md). This process is visualized below:
@@ -44,7 +44,7 @@ If you have data populated on a target topic, you can adapt the following for us
##### 2. Configure ClickHouse {#2-configure-clickhouse}
-This step is required if you are connecting to a secure Kafka. These settings cannot be passed through the SQL DDL commands and must be configured in the ClickHouse config.xml. We assume you are connecting to a SASL secured instance. This is the simplest method when interacting with Confluent Cloud.
+This step is required if you're connecting to a secure Kafka. These settings can't be passed through the SQL DDL commands and must be configured in the ClickHouse config.xml. We assume you're connecting to a SASL secured instance. This is the simplest method when interacting with Confluent Cloud.
```xml
@@ -220,7 +220,7 @@ To stop message consumption, you can detach the Kafka engine table:
DETACH TABLE github_queue;
```
-This will not impact the offsets of the consumer group. To restart consumption, and continue from the previous offset, reattach the table.
+This won't impact the offsets of the consumer group. To restart consumption, and continue from the previous offset, reattach the table.
```sql
ATTACH TABLE github_queue;
```
@@ -289,11 +289,11 @@ The result looks like:
##### Modify Kafka engine settings {#modify-kafka-engine-settings}
-We recommend dropping the Kafka engine table and recreating it with the new settings. The materialized view does not need to be modified during this process - message consumption will resume once the Kafka engine table is recreated.
+We recommend dropping the Kafka engine table and recreating it with the new settings. The materialized view doesn't need to be modified during this process - message consumption will resume once the Kafka engine table is recreated.
##### Debugging Issues {#debugging-issues}
-Errors such as authentication issues are not reported in responses to Kafka engine DDL. For diagnosing issues, we recommend using the main ClickHouse log file clickhouse-server.err.log. Further trace logging for the underlying Kafka client library [librdkafka](https://github.com/edenhill/librdkafka) can be enabled through configuration.
+Errors such as authentication issues aren't reported in responses to Kafka engine DDL. For diagnosing issues, we recommend using the main ClickHouse log file clickhouse-server.err.log. Further trace logging for the underlying Kafka client library [librdkafka](https://github.com/edenhill/librdkafka) can be enabled through configuration.
```xml
@@ -303,10 +303,10 @@ Errors such as authentication issues are not reported in responses to Kafka engi
##### Handling malformed messages {#handling-malformed-messages}
-Kafka is often used as a "dumping ground" for data. This leads to topics containing mixed message formats and inconsistent field names. Avoid this and utilize Kafka features such Kafka Streams or ksqlDB to ensure messages are well-formed and consistent before insertion into Kafka. If these options are not possible, ClickHouse has some features that can help.
+Kafka is often used as a "dumping ground" for data. This leads to topics containing mixed message formats and inconsistent field names. Avoid this and utilize Kafka features such as Kafka Streams or ksqlDB to ensure messages are well-formed and consistent before insertion into Kafka. If these options aren't possible, ClickHouse has some features that can help.
-* Treat the message field as strings. Functions can be used in the materialized view statement to perform cleansing and casting if required. This should not represent a production solution but might assist in one-off ingestion.
-* If you're consuming JSON from a topic, using the JSONEachRow format, use the setting [`input_format_skip_unknown_fields`](/operations/settings/formats#input_format_skip_unknown_fields). When writing data, by default, ClickHouse throws an exception if input data contains columns that do not exist in the target table. However, if this option is enabled, these excess columns will be ignored. Again this is not a production-level solution and might confuse others.
+* Treat the message field as strings. Functions can be used in the materialized view statement to perform cleansing and casting if required. This shouldn't represent a production solution but might assist in one-off ingestion.
+* If you're consuming JSON from a topic, using the JSONEachRow format, use the setting [`input_format_skip_unknown_fields`](/operations/settings/formats#input_format_skip_unknown_fields). When writing data, by default, ClickHouse throws an exception if input data contains columns that don't exist in the target table. However, if this option is enabled, these excess columns will be ignored. Again this isn't a production-level solution and might confuse others.
* Consider the setting `kafka_skip_broken_messages`. This requires the user to specify the level of tolerance per block for malformed messages - considered in the context of kafka_max_block_size. If this tolerance is exceeded (measured in absolute messages) the usual exception behaviour will revert, and other messages will be skipped.
##### Delivery Semantics and challenges with duplicates {#delivery-semantics-and-challenges-with-duplicates}
@@ -459,7 +459,7 @@ Although an elaborate example, this illustrates the power of materialized views
#### Working with ClickHouse Clusters {#working-with-clickhouse-clusters}
-Through Kafka consumer groups, multiple ClickHouse instances can potentially read from the same topic. Each consumer will be assigned to a topic partition in a 1:1 mapping. When scaling ClickHouse consumption using the Kafka table engine, consider that the total number of consumers within a cluster cannot exceed the number of partitions on the topic. Therefore ensure partitioning is appropriately configured for the topic in advance.
+Through Kafka consumer groups, multiple ClickHouse instances can potentially read from the same topic. Each consumer will be assigned to a topic partition in a 1:1 mapping. When scaling ClickHouse consumption using the Kafka table engine, consider that the total number of consumers within a cluster can't exceed the number of partitions on the topic. Therefore ensure partitioning is appropriately configured for the topic in advance.
Multiple ClickHouse instances can all be configured to read from a topic using the same consumer group id - specified during the Kafka table engine creation. Therefore, each instance will read from one or more partitions, inserting segments to their local target table. The target tables can, in turn, be configured to use a ReplicatedMergeTree to handle duplication of the data. This approach allows Kafka reads to be scaled with the ClickHouse cluster, provided there are sufficient Kafka partitions.
@@ -469,14 +469,14 @@ Multiple ClickHouse instances can all be configured to read from a topic using t
Consider the following when looking to increase Kafka Engine table throughput performance:
-* The performance will vary depending on the message size, format, and target table types. 100k rows/sec on a single table engine should be considered obtainable. By default, messages are read in blocks, controlled by the parameter kafka_max_block_size. By default, this is set to the [max_insert_block_size](/operations/settings/settings#max_insert_block_size), defaulting to 1,048,576. Unless messages are extremely large, this should nearly always be increased. Values between 500k to 1M are not uncommon. Test and evaluate the effect on throughput performance.
+* The performance will vary depending on the message size, format, and target table types. 100k rows/sec on a single table engine should be considered obtainable. By default, messages are read in blocks, controlled by the parameter kafka_max_block_size. This defaults to the [max_insert_block_size](/operations/settings/settings#max_insert_block_size), i.e. 1,048,576. Unless messages are extremely large, this should nearly always be increased. Values between 500k and 1M aren't uncommon. Test and evaluate the effect on throughput performance.
* The number of consumers for a table engine can be increased using kafka_num_consumers. However, by default, inserts will be linearized in a single thread unless kafka_thread_per_consumer is changed from the default value of 1. Set this to 1 to ensure flushes are performed in parallel. Note that creating a Kafka engine table with N consumers (and kafka_thread_per_consumer=1) is logically equivalent to creating N Kafka engines, each with a materialized view and kafka_thread_per_consumer=0.
-* Increasing consumers is not a free operation. Each consumer maintains its own buffers and threads, increasing the overhead on the server. Be conscious of the overhead of consumers and scale linearly across your cluster first and if possible.
+* Increasing consumers isn't a free operation. Each consumer maintains its own buffers and threads, increasing the overhead on the server. Be conscious of the overhead of consumers and scale linearly across your cluster first, if possible.
* If the throughput of Kafka messages is variable and delays are acceptable, consider increasing the stream_flush_interval_ms to ensure larger blocks are flushed.
* [background_message_broker_schedule_pool_size](/operations/server-configuration-parameters/settings#background_message_broker_schedule_pool_size) sets the number of threads performing background tasks. These threads are used for Kafka streaming. This setting is applied at the ClickHouse server start and can't be changed in a user session, defaulting to 16. If you see timeouts in the logs, it may be appropriate to increase this.
* For communication with Kafka, the librdkafka library is used, which itself creates threads. Large numbers of Kafka tables, or consumers, can thus result in large numbers of context switches. Either distribute this load across the cluster, only replicating the target tables if possible, or consider using a table engine to read from multiple topics - a list of values is supported. Multiple materialized views can read from a single table, each filtering the data from a specific topic.
-Any settings changes should be tested. We recommend monitoring Kafka consumer lags to ensure you are properly scaled.
+Any settings changes should be tested. We recommend monitoring Kafka consumer lags to ensure you're properly scaled.
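The settings above can be sketched together in a Kafka engine DDL. This is a minimal sketch only; the broker, topic, consumer group, and column names are hypothetical:

```sql
-- Sketch only: broker, topic, group, and column names are hypothetical.
CREATE TABLE kafka_queue
(
    `id` UInt64,
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 4,         -- scale reads across partitions
         kafka_thread_per_consumer = 1,   -- flush each consumer in parallel
         kafka_max_block_size = 1048576;  -- larger blocks for higher throughput
```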
#### Additional settings {#additional-settings}
diff --git a/docs/integrations/data-ingestion/kafka/kafka-vector.md b/docs/integrations/data-ingestion/kafka/kafka-vector.md
index 6c3803b6010..923c21d0f5f 100644
--- a/docs/integrations/data-ingestion/kafka/kafka-vector.md
+++ b/docs/integrations/data-ingestion/kafka/kafka-vector.md
@@ -16,11 +16,11 @@ import ConnectionDetails from '@site/docs/_snippets/_gather_your_details_http.md
A [getting started](../etl-tools/vector-to-clickhouse.md) guide for Vector with ClickHouse focuses on the log use case and reading events from a file. We utilize the [Github sample dataset](https://datasets-documentation.s3.eu-west-3.amazonaws.com/kafka/github_all_columns.ndjson) with events held on a Kafka topic.
-Vector utilizes [sources](https://vector.dev/docs/about/concepts/#sources) for retrieving data through a push or pull model. [Sinks](https://vector.dev/docs/about/concepts/#sinks) meanwhile provide a destination for events. We, therefore, utilize the Kafka source and ClickHouse sink. Note that whilst Kafka is supported as a Sink, a ClickHouse source is not available. Vector is as a result not appropriate if you want to transfer data to Kafka from ClickHouse.
+Vector utilizes [sources](https://vector.dev/docs/about/concepts/#sources) for retrieving data through a push or pull model. [Sinks](https://vector.dev/docs/about/concepts/#sinks) meanwhile provide a destination for events. We, therefore, utilize the Kafka source and ClickHouse sink. Note that whilst Kafka is supported as a sink, a ClickHouse source isn't available. As a result, Vector isn't appropriate if you want to transfer data from ClickHouse to Kafka.
Vector also supports the [transformation](https://vector.dev/docs/reference/configuration/transforms/) of data. This is beyond the scope of this guide. The user is referred to the Vector documentation should they need this on their dataset.
-Note that the current implementation of the ClickHouse sink utilizes the HTTP interface. The ClickHouse sink does not support the use of a JSON schema at this time. Data must be published to Kafka in either plain JSON format or as Strings.
+Note that the current implementation of the ClickHouse sink utilizes the HTTP interface. The ClickHouse sink doesn't support the use of a JSON schema at this time. Data must be published to Kafka in either plain JSON format or as Strings.
### License {#license}
Vector is distributed under the [MPL-2.0 License](https://github.com/vectordotdev/vector/blob/master/LICENSE)
@@ -107,9 +107,9 @@ batch.timeout_secs = 1
A few important notes on this configuration and behavior of Vector:
- This example has been tested against Confluent Cloud. Therefore, the `sasl.*` and `ssl.enabled` security options may not be appropriate in self-managed cases.
-- A protocol prefix is not required for the configuration parameter `bootstrap_servers` e.g. `pkc-2396y.us-east-1.aws.confluent.cloud:9092`
+- A protocol prefix isn't required for the configuration parameter `bootstrap_servers` e.g. `pkc-2396y.us-east-1.aws.confluent.cloud:9092`
- The source parameter `decoding.codec = "json"` ensures the message is passed to the ClickHouse sink as a single JSON object. If handling messages as Strings and using the default `bytes` value, the contents of the message will be appended to a field `message`. In most cases this will require processing in ClickHouse as described in the [Vector getting started](../etl-tools/vector-to-clickhouse.md#4-parse-the-logs) guide.
-- Vector [adds a number of fields](https://vector.dev/docs/reference/configuration/sources/kafka/#output-data) to the messages. In our example, we ignore these fields in the ClickHouse sink via the configuration parameter `skip_unknown_fields = true`. This ignores fields that are not part of the target table schema. Feel free to adjust your schema to ensure these meta fields such as `offset` are added.
+- Vector [adds a number of fields](https://vector.dev/docs/reference/configuration/sources/kafka/#output-data) to the messages. In our example, we ignore these fields in the ClickHouse sink via the configuration parameter `skip_unknown_fields = true`. This ignores fields that aren't part of the target table schema. Feel free to adjust your schema to ensure these meta fields such as `offset` are added.
- Notice how the sink references the source of events via the parameter `inputs`.
- Note the behavior of the ClickHouse sink as described [here](https://vector.dev/docs/reference/configuration/sinks/clickhouse/#buffers-and-batches). For optimal throughput, you may wish to tune the `buffer.max_events`, `batch.timeout_secs` and `batch.max_bytes` parameters. Per ClickHouse [recommendations](/sql-reference/statements/insert-into#performance-considerations), a value of 1000 should be considered a minimum for the number of events in any single batch. For uniform high throughput use cases, you may increase the parameter `buffer.max_events`. More variable throughputs may require changes in the parameter `batch.timeout_secs`.
- The parameter `auto_offset_reset = "smallest"` forces the Kafka source to start from the start of the topic - thus ensuring we consume the messages published in step (1). You may require different behavior. See [here](https://vector.dev/docs/reference/configuration/sources/kafka/#auto_offset_reset) for further details.
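If messages are handled as Strings and land in a `message` field as described above, the JSON can be parsed on the ClickHouse side. A minimal sketch, assuming a hypothetical target table `github_raw` with a `message` String column:

```sql
-- Sketch: table name, column names, and JSON keys are hypothetical.
SELECT
    JSONExtractString(message, 'actor') AS actor,
    JSONExtractString(message, 'type')  AS event_type
FROM github_raw
LIMIT 5;
```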
diff --git a/docs/integrations/data-ingestion/kafka/msk/index.md b/docs/integrations/data-ingestion/kafka/msk/index.md
index b9dbad8a0c9..a2475049c03 100644
--- a/docs/integrations/data-ingestion/kafka/msk/index.md
+++ b/docs/integrations/data-ingestion/kafka/msk/index.md
@@ -31,8 +31,8 @@ import ConnectionDetails from '@site/docs/_snippets/_gather_your_details_http.md
## Prerequisites {#prerequisites}
We assume:
-* you are familiar with [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md),
-* you are familiar with Amazon MSK and MSK Connectors. We recommend the Amazon MSK [Getting Started guide](https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html) and [MSK Connect guide](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect.html).
+* you're familiar with [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md),
+* you're familiar with Amazon MSK and MSK Connectors. We recommend the Amazon MSK [Getting Started guide](https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html) and [MSK Connect guide](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect.html).
## The official Kafka connector from ClickHouse with Amazon MSK {#the-official-kafka-connector-from-clickhouse-with-amazon-msk}
@@ -157,7 +157,7 @@ You can find more details (both implementation and other considerations) in the
In order for MSK Connect to connect to ClickHouse, we recommend that your MSK cluster be in a private subnet with a Private NAT connected for internet access. Instructions on how to set this up are provided below. Note that public subnets are supported but not recommended due to the need to constantly assign an Elastic IP address to your ENI; [AWS provides more details here](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-internet-access.html)
-1. **Create a Private Subnet:** Create a new subnet within your VPC, designating it as a private subnet. This subnet should not have direct access to the internet.
+1. **Create a Private Subnet:** Create a new subnet within your VPC, designating it as a private subnet. This subnet shouldn't have direct access to the internet.
1. **Create a NAT Gateway:** Create a NAT gateway in a public subnet of your VPC. The NAT gateway enables instances in your private subnet to connect to the internet or other AWS services, but prevents the internet from initiating a connection with those instances.
1. **Update the Route Table:** Add a route that directs internet-bound traffic to the NAT gateway
1. **Ensure Security Group(s) and Network ACLs Configuration:** Configure your [security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html) and [network ACLs (Access Control Lists)](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html) to allow relevant traffic.
diff --git a/docs/integrations/data-ingestion/redshift/_snippets/_migration_guide.md b/docs/integrations/data-ingestion/redshift/_snippets/_migration_guide.md
index efe9c50665e..cba690ff1a2 100644
--- a/docs/integrations/data-ingestion/redshift/_snippets/_migration_guide.md
+++ b/docs/integrations/data-ingestion/redshift/_snippets/_migration_guide.md
@@ -21,7 +21,7 @@ From the ClickHouse instance standpoint, you can either:
3. **[PIVOT](#pivot-data-from-redshift-to-clickhouse-using-s3)** using S3 object storage using an "Unload then load" logic
:::note
-We used Redshift as a data source in this tutorial. However, the migration approaches presented here are not exclusive to Redshift, and similar steps can be derived for any compatible data source.
+We used Redshift as a data source in this tutorial. However, the migration approaches presented here aren't exclusive to Redshift, and similar steps can be derived for any compatible data source.
:::
## Push Data from Redshift to ClickHouse {#push-data-from-redshift-to-clickhouse}
@@ -57,7 +57,7 @@ In the pull scenario, the idea is to leverage the ClickHouse JDBC Bridge to conn
* Requires a ClickHouse JDBC Bridge instance which can turn into a potential scalability bottleneck
:::note
-Even though Redshift is based on PostgreSQL, using the ClickHouse PostgreSQL table function or table engine is not possible since ClickHouse requires PostgreSQL version 9 or above and the Redshift API is based on an earlier version (8.x).
+Even though Redshift is based on PostgreSQL, using the ClickHouse PostgreSQL table function or table engine isn't possible since ClickHouse requires PostgreSQL version 9 or above and the Redshift API is based on an earlier version (8.x).
:::
### Tutorial {#tutorial}
@@ -71,7 +71,7 @@ To use this option, you need to set up a ClickHouse JDBC Bridge. ClickHouse JDBC
Deploy the ClickHouse JDBC Bridge. For more details, see our user guide on [JDBC for External Data sources](/integrations/data-ingestion/dbms/jdbc-with-clickhouse.md)
:::note
-If you are using ClickHouse Cloud, you will need to run your ClickHouse JDBC Bridge on a separate environment and connect to ClickHouse Cloud using the [remoteSecure](/sql-reference/table-functions/remote/) function
+If you're using ClickHouse Cloud, you will need to run your ClickHouse JDBC Bridge on a separate environment and connect to ClickHouse Cloud using the [remoteSecure](/sql-reference/table-functions/remote/) function
:::
#### Configure your Redshift datasource {#configure-your-redshift-datasource}
diff --git a/docs/integrations/data-ingestion/s3/creating-an-s3-iam-role-and-bucket.md b/docs/integrations/data-ingestion/s3/creating-an-s3-iam-role-and-bucket.md
index f8a675d5286..47e1953ffb2 100644
--- a/docs/integrations/data-ingestion/s3/creating-an-s3-iam-role-and-bucket.md
+++ b/docs/integrations/data-ingestion/s3/creating-an-s3-iam-role-and-bucket.md
@@ -92,7 +92,7 @@ Click on the newly created user
The bucket name must be unique across all of AWS, not just your organization, or it will produce an error.
:::
-3. Leave `Block all Public Access` enabled; public access is not needed.
+3. Leave `Block all Public Access` enabled; public access isn't needed.
diff --git a/docs/integrations/data-ingestion/s3/index.md b/docs/integrations/data-ingestion/s3/index.md
index ec2a1af654c..f47537147dd 100644
--- a/docs/integrations/data-ingestion/s3/index.md
+++ b/docs/integrations/data-ingestion/s3/index.md
@@ -170,7 +170,7 @@ FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trip
LIMIT 10;
```
-Note that we are not required to list the columns since the `TabSeparatedWithNames` format encodes the column names in the first row. Other formats, such as `CSV` or `TSV`, will return auto-generated columns for this query, e.g., `c1`, `c2`, `c3` etc.
+Note that we're not required to list the columns since the `TabSeparatedWithNames` format encodes the column names in the first row. Other formats, such as `CSV` or `TSV`, will return auto-generated columns for this query, e.g., `c1`, `c2`, `c3` etc.
Queries additionally support [virtual columns](../sql-reference/table-functions/s3#virtual-columns), like `_path` and `_file`, that provide information regarding the bucket path and filename respectively. For example:
@@ -203,7 +203,7 @@ FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trip
└──────────┘
```
-While useful for sampling data and executing ae-hoc, exploratory queries, reading data directly from S3 is not something you want to do regularly. When it is time to get serious, import the data into a `MergeTree` table in ClickHouse.
+While useful for sampling data and executing ad-hoc, exploratory queries, reading data directly from S3 isn't something you want to do regularly. When it is time to get serious, import the data into a `MergeTree` table in ClickHouse.
### Using clickhouse-local {#using-clickhouse-local}
@@ -312,9 +312,9 @@ s3Cluster(cluster_name, source, [access_key_id, secret_access_key,] format, stru
* `format` — The [format](/interfaces/formats#formats-overview) of the file.
* `structure` — Structure of the table. Format 'column1_name column1_type, column2_name column2_type, ...'.
-Like any `s3` functions, the credentials are optional if the bucket is insecure or you define security through the environment, e.g., IAM roles. Unlike the s3 function, however, the structure must be specified in the request as of 22.3.1, i.e., the schema is not inferred.
+Like any of the `s3` functions, the credentials are optional if the bucket is insecure or you define security through the environment, e.g., IAM roles. Unlike the s3 function, however, the structure must be specified in the request as of 22.3.1, i.e., the schema isn't inferred.
-This function will be used as part of an `INSERT INTO SELECT` in most cases. In this case, you will often be inserting a distributed table. We illustrate a simple example below where trips_all is a distributed table. While this table uses the events cluster, the consistency of the nodes used for reads and writes is not a requirement:
+This function will be used as part of an `INSERT INTO SELECT` in most cases. In this case, you will often be inserting into a distributed table. We illustrate a simple example below where trips_all is a distributed table. While this table uses the events cluster, the consistency of the nodes used for reads and writes isn't a requirement:
```sql
INSERT INTO default.trips_all
@@ -330,7 +330,7 @@ Inserts will occur against the initiator node. This means that while reads will
## S3 table engines {#s3-table-engines}
-While the `s3` functions allow ad-hoc queries to be performed on data stored in S3, they are syntactically verbose. The `S3` table engine allows you to not have to specify the bucket URL and credentials over and over again. To address this, ClickHouse provides the S3 table engine.
+While the `s3` functions allow ad-hoc queries to be performed on data stored in S3, they're syntactically verbose. To avoid specifying the bucket URL and credentials over and over again, ClickHouse provides the `S3` table engine.
```sql
CREATE TABLE s3_engine_table (name String, value UInt32)
@@ -340,7 +340,7 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
* `path` — Bucket URL with a path to the file. Supports following wildcards in read-only mode: `*`, `?`, `{abc,def}` and `{N..M}` where N, M — numbers, 'abc', 'def' — strings. For more information, see [here](/engines/table-engines/integrations/s3#wildcards-in-path).
* `format` — The [format](/interfaces/formats#formats-overview) of the file.
-* `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the AWS account user. You can use these to authenticate your requests. The parameter is optional. If credentials are not specified, configuration file values are used. For more information, see [Managing credentials](#managing-credentials).
+* `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the AWS account user. You can use these to authenticate your requests. The parameter is optional. If credentials aren't specified, configuration file values are used. For more information, see [Managing credentials](#managing-credentials).
* `compression` — Compression type. Supported values: none, gzip/gz, brotli/br, xz/LZMA, zstd/zst. The parameter is optional. By default, it will autodetect compression by file extension.
### Reading data {#reading-data}
@@ -421,7 +421,7 @@ LIMIT 10;
### Inserting data {#inserting-data}
-The `S3` table engine supports parallel reads. Writes are only supported if the table definition does not contain glob patterns. The above table, therefore, would block writes.
+The `S3` table engine supports parallel reads. Writes are only supported if the table definition doesn't contain glob patterns. The above table, therefore, would block writes.
To demonstrate writes, create a table that points to a writable S3 bucket:
@@ -473,11 +473,11 @@ Both of these settings default to 0 - thus forcing the user to set one of them.
Some notes about the `S3` table engine:
-- Unlike a traditional `MergeTree` family table, dropping an `S3` table will not delete the underlying data.
+- Unlike a traditional `MergeTree` family table, dropping an `S3` table won't delete the underlying data.
- Full settings for this table type can be found [here](/engines/table-engines/integrations/s3.md/#settings).
- Be aware of the following caveats when using this engine:
- * ALTER queries are not supported
- * SAMPLE operations are not supported
+ * ALTER queries aren't supported
+ * SAMPLE operations aren't supported
* There is no notion of indexes, i.e. primary or skip.
## Managing credentials {#managing-credentials}
@@ -519,7 +519,7 @@ In the previous examples, we have passed credentials in the `s3` function or `S3
* Check performed in **$HOME/.aws**
* Temporary credentials obtained via the AWS Security Token Service - i.e. via [`AssumeRole`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API
* Checks for credentials in the ECS environment variables `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` or `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN`.
- * Obtains the credentials via [Amazon EC2 instance metadata](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-metadata.html) provided [AWS_EC2_METADATA_DISABLED](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html#envvars-list-AWS_EC2_METADATA_DISABLED) is not set to true.
+ * Obtains the credentials via [Amazon EC2 instance metadata](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-metadata.html) provided [AWS_EC2_METADATA_DISABLED](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html#envvars-list-AWS_EC2_METADATA_DISABLED) isn't set to true.
* These same settings can also be set for a specific endpoint, using the same prefix matching rule.
## Optimizing for performance {#s3-optimizing-performance}
@@ -532,7 +532,7 @@ Internally, the ClickHouse merge tree uses two primary storage formats: [`Wide`
## S3 backed MergeTree {#s3-backed-mergetree}
-The `s3` functions and associated table engine allow us to query data in S3 using familiar ClickHouse syntax. However, concerning data management features and performance, they are limited. There is no support for primary indexes, no-cache support, and files inserts need to be managed by the user.
+The `s3` functions and associated table engine allow us to query data in S3 using familiar ClickHouse syntax. However, concerning data management features and performance, they're limited. There is no support for primary indexes or caching, and file inserts need to be managed by the user.
ClickHouse recognizes that S3 represents an attractive storage solution, especially where query performance on "colder" data is less critical, and users seek to separate storage and compute. To help achieve this, support is provided for using S3 as the storage for a MergeTree engine. This will enable you to exploit the scalability and cost benefits of S3, and the insert and query performance of the MergeTree engine.
@@ -641,7 +641,7 @@ SELECT passenger_count, avg(tip_amount) AS avg_tip, avg(total_amount) AS avg_amo
### Modifying a table {#modifying-a-table}
-Occasionally you may need to modify the storage policy of a specific table. Whilst this is possible, it comes with limitations. The new target policy must contain all of the disks and volumes of the previous policy, i.e., data will not be migrated to satisfy a policy change. When validating these constraints, volumes and disks will be identified by their name, with attempts to violate resulting in an error. However, assuming you use the previous examples, the following changes are valid.
+Occasionally you may need to modify the storage policy of a specific table. Whilst this is possible, it comes with limitations. The new target policy must contain all of the disks and volumes of the previous policy, i.e., data won't be migrated to satisfy a policy change. When validating these constraints, volumes and disks are identified by their name; attempts to violate them will result in an error. However, assuming you use the previous examples, the following changes are valid.
```xml
@@ -670,7 +670,7 @@ Occasionally you may need to modify the storage policy of a specific table. Whil
ALTER TABLE trips_s3 MODIFY SETTING storage_policy='s3_tiered'
```
-Here we reuse the main volume in our new s3_tiered policy and introduce a new hot volume. This uses the default disk, which consists of only one disk configured via the parameter ``. Note that our volume names and disks do not change. New inserts to our table will reside on the default disk until this reaches move_factor * disk_size - at which data will be relocated to S3.
+Here we reuse the main volume in our new s3_tiered policy and introduce a new hot volume. This uses the default disk, which consists of only one disk configured via the parameter ``. Note that our volume names and disks don't change. New inserts to our table will reside on the default disk until this reaches move_factor * disk_size, at which point data will be relocated to S3.
### Handling replication {#handling-replication}
@@ -681,7 +681,7 @@ Replication with S3 disks can be accomplished by using the `ReplicatedMergeTree`
The following notes cover the implementation of S3 interactions with ClickHouse. Whilst generally only informative, it may help the readers when [Optimizing for Performance](#s3-optimizing-performance):
* By default, the maximum number of query processing threads used by any stage of the query processing pipeline is equal to the number of cores. Some stages are more parallelizable than others, so this value provides an upper bound. Multiple query stages may execute at once since data is streamed from the disk. The exact number of threads used for a query may thus exceed this. Modify through the setting [max_threads](/operations/settings/settings#max_threads).
-* Reads on S3 are asynchronous by default. This behavior is determined by setting `remote_filesystem_read_method`, set to the value `threadpool` by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contain many columns. A thread will read the columns for their granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. You will not need to change this setting in most cases - see [Optimizing for Performance](#s3-optimizing-performance).
+* Reads on S3 are asynchronous by default. This behavior is determined by setting `remote_filesystem_read_method`, set to the value `threadpool` by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contains many columns. A thread will read the columns for their granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. You won't need to change this setting in most cases - see [Optimizing for Performance](#s3-optimizing-performance).
* Writes are performed in parallel, with a maximum of 100 concurrent file writing threads. `max_insert_delayed_streams_for_parallel_write`, which has a default value of 1000, controls the number of S3 blobs written in parallel. Since a buffer is required for each file being written (~1MB), this effectively limits the memory consumption of an INSERT. It may be appropriate to lower this value in low server memory scenarios.
## Use S3 object storage as a ClickHouse disk {#configuring-s3-for-clickhouse-use}
@@ -731,7 +731,7 @@ vim /etc/clickhouse-server/config.d/storage_config.xml
The tags `s3_disk` and `s3_cache` within the `` tag are arbitrary labels. These can be set to something else, but the same label must be used in the `` tag under the `` tag to reference the disk.
The `` tag is also arbitrary and is the name of the policy which will be used as the identifier storage target when creating resources in ClickHouse.
-The configuration shown above is for ClickHouse version 22.8 or higher, if you are using an older version please see the [storing data](/operations/storing-data.md/#using-local-cache) docs.
+The configuration shown above is for ClickHouse version 22.8 or higher. If you're using an older version, please see the [storing data](/operations/storing-data.md/#using-local-cache) docs.
For more information about using S3:
Integrations Guide: [S3 Backed MergeTree](#s3-backed-mergetree)
@@ -817,7 +817,7 @@ You should see something like the following:
## Replicating a single shard across two AWS regions using S3 Object Storage {#s3-multi-region}
:::tip
-Object storage is used by default in ClickHouse Cloud, you do not need to follow this procedure if you are running in ClickHouse Cloud.
+Object storage is used by default in ClickHouse Cloud; you don't need to follow this procedure if you're running in ClickHouse Cloud.
:::
### Plan the deployment {#plan-the-deployment}
@@ -994,7 +994,7 @@ The above macros are for `chnode1`, on `chnode2` set `replica` to `replica_2`.
In ClickHouse versions 22.7 and lower the setting `allow_remote_fs_zero_copy_replication` is set to `true` by default for S3 and HDFS disks. This setting should be set to `false` for this disaster recovery scenario, and in version 22.8 and higher it is set to `false` by default.
-This setting should be false for two reasons: 1) this feature is not production ready; 2) in a disaster recovery scenario both the data and metadata need to be stored in multiple regions. Set `allow_remote_fs_zero_copy_replication` to `false`.
+This setting should be false for two reasons: 1) this feature isn't production ready; 2) in a disaster recovery scenario both the data and metadata need to be stored in multiple regions. Set `allow_remote_fs_zero_copy_replication` to `false`.
```xml title="/etc/clickhouse-server/config.d/remote-servers.xml"
@@ -1134,7 +1134,7 @@ When you added the [cluster configuration](#define-a-cluster) a single shard rep
```
- Understand the use of the macros defined earlier
- The macros `shard`, and `replica` were [defined earlier](#define-a-cluster), and in the highlighted line below you can see where the values are substituted on each ClickHouse node. Additionally, the value `uuid` is used; `uuid` is not defined in the macros as it is generated by the system.
+ The macros `shard` and `replica` were [defined earlier](#define-a-cluster), and in the highlighted line below you can see where the values are substituted on each ClickHouse node. Additionally, the value `uuid` is used; `uuid` isn't defined in the macros as it is generated by the system.
```sql
SELECT create_table_query
FROM system.tables
@@ -1213,7 +1213,7 @@ These tests will verify that data is being replicated across the two servers, an
536K /var/lib/clickhouse/disks/s3_disk/store/551
```
- Check the S3 data in each S3 Bucket (the totals are not shown, but both buckets have approximately 36 MiB stored after the inserts):
+ Check the S3 data in each S3 Bucket (the totals aren't shown, but both buckets have approximately 36 MiB stored after the inserts):
diff --git a/docs/integrations/data-ingestion/s3/performance.md b/docs/integrations/data-ingestion/s3/performance.md
index 3a84db89f70..e7541f50f6e 100644
--- a/docs/integrations/data-ingestion/s3/performance.md
+++ b/docs/integrations/data-ingestion/s3/performance.md
@@ -138,11 +138,11 @@ Ensure your buckets are located in the same region as your ClickHouse instances.
ClickHouse can read files stored in S3 buckets in the [supported formats](/interfaces/formats#formats-overview) using the `s3` function and `S3` engine. If reading raw files, some of these formats have distinct advantages:
-* Formats with encoded column names such as Native, Parquet, CSVWithNames, and TabSeparatedWithNames will be less verbose to query since the user will not be required to specify the column name is the `s3` function. The column names allow this information to be inferred.
-* Formats will differ in performance with respect to read and write throughputs. Native and parquet represent the most optimal formats for read performance since they are already column orientated and more compact. The native format additionally benefits from alignment with how ClickHouse stores data in memory - thus reducing processing overhead as data is streamed into ClickHouse.
+* Formats with encoded column names such as Native, Parquet, CSVWithNames, and TabSeparatedWithNames will be less verbose to query since the user won't be required to specify the column names in the `s3` function. The column names allow this information to be inferred.
+* Formats will differ in performance with respect to read and write throughputs. Native and Parquet represent the most optimal formats for read performance since they're already column-oriented and more compact. The Native format additionally benefits from alignment with how ClickHouse stores data in memory - thus reducing processing overhead as data is streamed into ClickHouse.
* The block size will often impact the latency of reads on large files. This is very apparent if you only sample the data, e.g., returning the top N rows. In the case of formats such as CSV and TSV, files must be parsed to return a set of rows. Formats such as Native and Parquet will allow faster sampling as a result.
* Each compression format brings pros and cons, often trading compression level against speed and biasing toward either compression or decompression performance. If compressing raw files such as CSV or TSV, lz4 offers the fastest decompression performance, sacrificing the compression level. Gzip typically compresses better at the expense of slightly slower read speeds. Xz takes this further by usually offering the best compression with the slowest compression and decompression performance. If exporting, gz and lz4 offer comparable compression speeds. Balance this against your connection speeds. Any gains from faster decompression or compression will be easily negated by a slower connection to your S3 buckets.
-* Formats such as native or parquet do not typically justify the overhead of compression. Any savings in data size are likely to be minimal since these formats are inherently compact. The time spent compressing and decompressing will rarely offset network transfer times - especially since s3 is globally available with higher network bandwidth.
+* Formats such as native or parquet don't typically justify the overhead of compression. Any savings in data size are likely to be minimal since these formats are inherently compact. The time spent compressing and decompressing will rarely offset network transfer times - especially since s3 is globally available with higher network bandwidth.
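To make the verbosity point above concrete, the following sketch contrasts reading Parquet and raw CSV with the `s3` function. The bucket path, column names, and types are illustrative, not from the original dataset:

```sql
-- Parquet: column names and types are inferred from the file metadata,
-- so no explicit structure is required (path is hypothetical):
SELECT count() FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet');

-- Plain CSV: the structure must be spelled out explicitly:
SELECT count() FROM s3(
    'https://my-bucket.s3.amazonaws.com/data/*.csv.lz4',
    'CSV',
    'id UInt64, ts DateTime, value Float64'
);
```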
## Example dataset {#example-dataset}
@@ -191,7 +191,7 @@ When reading from queries, the initial query can often appear slower than if the
## Using threads for reads {#using-threads-for-reads}
-Read performance on S3 will scale linearly with the number of cores, provided you are not limited by network bandwidth or local I/O. Increasing the number of threads also has memory overhead permutations that you should be aware of. The following can be modified to improve read throughput performance potentially:
+Read performance on S3 will scale linearly with the number of cores, provided you're not limited by network bandwidth or local I/O. Increasing the number of threads also has memory overhead implications that you should be aware of. The following can be modified to potentially improve read throughput:
* Usually, the default value of `max_threads` is sufficient, i.e., the number of cores. If the amount of memory used for a query is high, and this needs to be reduced, or the `LIMIT` on results is low, this value can be set lower. Users with plenty of memory may wish to experiment with increasing this value for possible higher read throughput from S3. Typically this is only beneficial on machines with lower core counts, i.e., < 10. The benefit from further parallelization typically diminishes as other resources act as a bottleneck, e.g., network and CPU contention.
* Versions of ClickHouse before 22.3.1 only parallelized reads across multiple files when using the `s3` function or `S3` table engine. This required the user to ensure files were split into chunks on S3 and read using a glob pattern to achieve optimal read performance. Later versions now parallelize downloads within a file.
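As a minimal sketch of tuning thread count per query (the value and bucket path are illustrative, not recommendations):

```sql
-- Lower max_threads to reduce memory usage when sampling a few rows;
-- raise it on low-core machines to experiment with higher S3 read throughput.
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet')
LIMIT 10
SETTINGS max_threads = 4;
```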
@@ -369,7 +369,7 @@ As expected, this reduces insert performance by 3x.
### Disable de-duplication {#disable-de-duplication}
-Insert operations can sometimes fail due to errors such as timeouts. When inserts fail, data may or may not have been successfully inserted. To allow inserts to be safely re-tried by the client, by default in distributed deployments such as ClickHouse Cloud, ClickHouse tries to determine whether the data has already been successfully inserted. If the inserted data is marked as a duplicate, ClickHouse does not insert it into the destination table. However, the user will still receive a successful operation status as if the data had been inserted normally.
+Insert operations can sometimes fail due to errors such as timeouts. When inserts fail, data may or may not have been successfully inserted. To allow inserts to be safely re-tried by the client, by default in distributed deployments such as ClickHouse Cloud, ClickHouse tries to determine whether the data has already been successfully inserted. If the inserted data is marked as a duplicate, ClickHouse doesn't insert it into the destination table. However, the user will still receive a successful operation status as if the data had been inserted normally.
While this behavior, which incurs an insert overhead, makes sense when loading data from a client or in batches it can be unnecessary when performing an `INSERT INTO SELECT` from object storage. By disabling this functionality at insert time, we can improve performance as shown below:
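A minimal sketch of disabling deduplication for an `INSERT INTO SELECT` from object storage (table and bucket names are hypothetical):

```sql
-- insert_deduplicate = 0 skips the duplicate check, trading safe retries
-- for higher insert throughput on one-off bulk loads from S3.
INSERT INTO destination_table
SELECT * FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet')
SETTINGS insert_deduplicate = 0;
```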
@@ -386,7 +386,7 @@ Peak memory usage: 26.57 GiB.
### Optimize on insert {#optimize-on-insert}
-In ClickHouse, the `optimize_on_insert` setting controls whether data parts are merged during the insert process. When enabled (`optimize_on_insert = 1` by default), small parts are merged into larger ones as they are inserted, improving query performance by reducing the number of parts that need to be read. However, this merging adds overhead to the insert process, potentially slowing down high-throughput insertions.
+In ClickHouse, the `optimize_on_insert` setting controls whether data parts are merged during the insert process. When enabled (`optimize_on_insert = 1` by default), small parts are merged into larger ones as they're inserted, improving query performance by reducing the number of parts that need to be read. However, this merging adds overhead to the insert process, potentially slowing down high-throughput insertions.
Disabling this setting (`optimize_on_insert = 0`) skips merging during inserts, allowing data to be written more quickly, especially when handling frequent small inserts. The merging process is deferred to the background, allowing for better insert performance but temporarily increasing the number of small parts, which may slow down queries until the background merge completes. This setting is ideal when insert performance is a priority, and the background merge process can handle optimization efficiently later. As shown below, disabling setting can improve insert throughput:
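Disabling the setting at insert time can be sketched as follows (table and bucket names are hypothetical):

```sql
-- optimize_on_insert = 0 defers part merging to background merges,
-- favoring insert throughput at the cost of temporarily more small parts.
INSERT INTO destination_table
SELECT * FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet')
SETTINGS optimize_on_insert = 0;
```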
diff --git a/docs/integrations/data-ingestion/streamkap/sql-server-clickhouse.md b/docs/integrations/data-ingestion/streamkap/sql-server-clickhouse.md
index 7f2c5e10c94..800aa5bcc4e 100644
--- a/docs/integrations/data-ingestion/streamkap/sql-server-clickhouse.md
+++ b/docs/integrations/data-ingestion/streamkap/sql-server-clickhouse.md
@@ -16,7 +16,7 @@ import image3 from '@site/static/images/integrations/data-ingestion/etl-tools/im
# Streaming Data from SQL Server to ClickHouse for Fast Analytics: Step-by-Step Guide
-In this article, we are breaking down a tutorial that shows you how to stream data from SQL Server to ClickHouse. ClickHouse is ideal if you want super fast analytics for reporting internal or customer-facing dashboards. We’ll walk step-by-step through getting both databases set up, how to connect them, and finally, how to use [Streamkap](https://streamkap.com) to stream your data. If SQL Server handles your day-to-day operations but you need the speed and smarts of ClickHouse for analytics, you’re in the right spot.
+In this article, we're breaking down a tutorial that shows you how to stream data from SQL Server to ClickHouse. ClickHouse is ideal if you want super fast analytics for reporting internal or customer-facing dashboards. We’ll walk step-by-step through getting both databases set up, how to connect them, and finally, how to use [Streamkap](https://streamkap.com) to stream your data. If SQL Server handles your day-to-day operations but you need the speed and smarts of ClickHouse for analytics, you’re in the right spot.
## Why Stream Data from SQL Server to ClickHouse? {#why-stream-data-from-sql-server-to-clickhouse}
diff --git a/docs/integrations/data-ingestion/streamkap/streamkap-and-clickhouse.md b/docs/integrations/data-ingestion/streamkap/streamkap-and-clickhouse.md
index 62d4c5a410f..0044811d921 100644
--- a/docs/integrations/data-ingestion/streamkap/streamkap-and-clickhouse.md
+++ b/docs/integrations/data-ingestion/streamkap/streamkap-and-clickhouse.md
@@ -93,7 +93,7 @@ By default, Streamkap uses an upsert ingestion mode. When it creates a table in
- **Updates** in the source are written as new rows in ClickHouse. During its background merge process, ReplacingMergeTree collapses these rows, keeping only the latest version based on the ordering key.
-- **Deletes** are handled by a metadata flag feeding the ReplacingMergeTree ```is_deleted``` parameter. Rows deleted at the source are not removed immediately but are marked as deleted.
+- **Deletes** are handled by a metadata flag feeding the ReplacingMergeTree ```is_deleted``` parameter. Rows deleted at the source aren't removed immediately but are marked as deleted.
- Optionally deleted records can be kept in ClickHouse for analytics purposes
### Metadata Columns {#metadata-columns}
diff --git a/docs/integrations/data-sources/mysql.md b/docs/integrations/data-sources/mysql.md
index d7827994e0b..1e2ca9153af 100644
--- a/docs/integrations/data-sources/mysql.md
+++ b/docs/integrations/data-sources/mysql.md
@@ -63,7 +63,7 @@ The `MySQL` table engine allows you to connect ClickHouse to MySQL. **SELECT** a
```
:::note
-If you are using this feature in ClickHouse Cloud, you may need the to allow the ClickHouse Cloud IP addresses to access your MySQL instance.
+If you're using this feature in ClickHouse Cloud, you may need to allow the ClickHouse Cloud IP addresses to access your MySQL instance.
Check the ClickHouse [Cloud Endpoints API](//cloud/get-started/query-endpoints.md) for egress traffic details.
:::
diff --git a/docs/integrations/data-visualization/community_integrations/astrato-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/astrato-and-clickhouse.md
index 64bd30c2a2e..35d38460fda 100644
--- a/docs/integrations/data-visualization/community_integrations/astrato-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/astrato-and-clickhouse.md
@@ -78,7 +78,7 @@ In our Data View editor, you will see all of your Tables and Schemas in ClickHou
Now that you have your data selected, go to define the **data view**. Click define on the top right of the webpage.
-In here, you are able to join data, as well as, **create governed dimensions and measures** - ideal for driving consistency in business logic across various teams.
+In here, you're able to join data, as well as **create governed dimensions and measures** - ideal for driving consistency in business logic across various teams.
diff --git a/docs/integrations/data-visualization/community_integrations/chartbrew-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/chartbrew-and-clickhouse.md
index 652f1dba137..d3af06b675c 100644
--- a/docs/integrations/data-visualization/community_integrations/chartbrew-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/chartbrew-and-clickhouse.md
@@ -38,7 +38,7 @@ In this guide, you will connect Chartbrew to ClickHouse, run a SQL query, and cr
:::tip Add some data
-If you do not have a dataset to work with, you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset.
+If you don't have a dataset to work with, you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
diff --git a/docs/integrations/data-visualization/community_integrations/explo-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/explo-and-clickhouse.md
index af32ba71de5..4165a214595 100644
--- a/docs/integrations/data-visualization/community_integrations/explo-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/explo-and-clickhouse.md
@@ -45,7 +45,7 @@ In this guide you will connect your data from ClickHouse to Explo and visualize
:::tip Add some data
-If you do not have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
+If you don't have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
diff --git a/docs/integrations/data-visualization/community_integrations/holistics-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/holistics-and-clickhouse.md
index cf5586ab8cc..3d6b3935cb5 100644
--- a/docs/integrations/data-visualization/community_integrations/holistics-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/holistics-and-clickhouse.md
@@ -70,7 +70,7 @@ Since Holistics is a cloud-based application, its servers must be able to reach
-2. **Reverse SSH Tunnel:** If your database is in a private network (VPC) and cannot be exposed publicly, use a [Reverse SSH Tunnel](https://docs.holistics.io/docs/connect/connect-tunnel).
+2. **Reverse SSH Tunnel:** If your database is in a private network (VPC) and can't be exposed publicly, use a [Reverse SSH Tunnel](https://docs.holistics.io/docs/connect/connect-tunnel).
## Add data source in Holistics {#step-3-add-data-source-in-holistics}
diff --git a/docs/integrations/data-visualization/community_integrations/luzmo-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/luzmo-and-clickhouse.md
index 70addac093c..71062ecdb12 100644
--- a/docs/integrations/data-visualization/community_integrations/luzmo-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/luzmo-and-clickhouse.md
@@ -52,7 +52,7 @@ You can now use your datasets to build beautiful (embedded) dashboards, or even
## Usage notes {#usage-notes}
1. The Luzmo ClickHouse connector uses the HTTP API interface (typically running on port 8123) to connect.
-2. If you use tables with the `Distributed` table engine some Luzmo-charts might fail when `distributed_product_mode` is `deny`. This should only occur, however, if you link the table to another table and use that link in a chart. In that case make sure to set the `distributed_product_mode` to another option that makes sense for you within your ClickHouse cluster. If you are using ClickHouse Cloud you can safely ignore this setting.
+2. If you use tables with the `Distributed` table engine some Luzmo-charts might fail when `distributed_product_mode` is `deny`. This should only occur, however, if you link the table to another table and use that link in a chart. In that case make sure to set the `distributed_product_mode` to another option that makes sense for you within your ClickHouse cluster. If you're using ClickHouse Cloud you can safely ignore this setting.
3. To ensure that e.g. only the Luzmo application can access your ClickHouse instance, it is highly recommended to **whitelist** the [Luzmo range of static IP addresses](https://academy.luzmo.com/article/u9on8gbm). We also recommend using a technical read-only user.
4. The ClickHouse connector currently supports following data types:
diff --git a/docs/integrations/data-visualization/community_integrations/mitzu-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/mitzu-and-clickhouse.md
index c69f2227df4..e6b952b14d0 100644
--- a/docs/integrations/data-visualization/community_integrations/mitzu-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/mitzu-and-clickhouse.md
@@ -31,17 +31,17 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
Mitzu is a no-code, warehouse-native product analytics application. Similar to tools like Amplitude, Mixpanel, and PostHog, Mitzu empowers users to analyze product usage data without requiring SQL or Python expertise.
-However, unlike these platforms, Mitzu does not duplicate the company's product usage data. Instead, it generates native SQL queries directly on the company's existing data warehouse or lake.
+However, unlike these platforms, Mitzu doesn't duplicate the company's product usage data. Instead, it generates native SQL queries directly on the company's existing data warehouse or lake.
## Goal {#goal}
-In this guide, we are going to cover the following:
+In this guide, we're going to cover the following:
- Warehouse-native product analytics
- How to integrate Mitzu to ClickHouse
:::tip Example datasets
-If you do not have a data set to use for Mitzu, you can work with NYC Taxi Data.
+If you don't have a data set to use for Mitzu, you can work with NYC Taxi Data.
This dataset is available in ClickHouse Cloud or [can be loaded with these instructions](/getting-started/example-datasets/nyc-taxi).
:::
@@ -162,7 +162,7 @@ If you encounter a limitation with Mitzu UI, copy the SQL code and continue your
## Mitzu support {#mitzu-support}
-If you are lost, feel free to contact us at [support@mitzu.io](email://support@mitzu.io)
+If you're lost, feel free to contact us at [support@mitzu.io](mailto:support@mitzu.io)
Or join our Slack community [here](https://join.slack.com/t/mitzu-io/shared_invite/zt-1h1ykr93a-_VtVu0XshfspFjOg6sczKg)
diff --git a/docs/integrations/data-visualization/community_integrations/zingdata-and-clickhouse.md b/docs/integrations/data-visualization/community_integrations/zingdata-and-clickhouse.md
index d7a1ae6a25f..0f3e53d7d09 100644
--- a/docs/integrations/data-visualization/community_integrations/zingdata-and-clickhouse.md
+++ b/docs/integrations/data-visualization/community_integrations/zingdata-and-clickhouse.md
@@ -55,7 +55,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
-5. If the connection is successful, Zing will proceed you to table selection. Select the required tables and click on **Save**. If Zing cannot connect to your data source, you'll see a message asking your to check your credentials and retry. If even after checking your credentials and retrying you still experience issues, reach out to Zing support here.
+5. If the connection is successful, Zing will take you to table selection. Select the required tables and click on **Save**. If Zing can't connect to your data source, you'll see a message asking you to check your credentials and retry. If even after checking your credentials and retrying you still experience issues, reach out to Zing support here.
diff --git a/docs/integrations/data-visualization/grafana/config.md b/docs/integrations/data-visualization/grafana/config.md
index dfaf440f5d6..59570eaee37 100644
--- a/docs/integrations/data-visualization/grafana/config.md
+++ b/docs/integrations/data-visualization/grafana/config.md
@@ -132,7 +132,7 @@ It is also required to configure these defaults for enabling [data links](./quer
To speed up [query building for logs](./query-builder.md#logs), you can set a default database/table as well as columns for the logs query. This will pre-load the query builder with a runnable logs query, which makes browsing on the explore page faster for observability.
-If you are using OpenTelemetry, you should enable the "**Use OTel**" switch, and set the **default log table** to `otel_logs`.
+If you're using OpenTelemetry, you should enable the "**Use OTel**" switch, and set the **default log table** to `otel_logs`.
This will automatically override the default columns to use the selected OTel schema version.
While OpenTelemetry isn't required for logs, using a single logs/trace dataset helps to enable a smoother observability workflow with [data linking](./query-builder.md#data-links).
@@ -160,7 +160,7 @@ jsonData:
To speed up [query building for traces](./query-builder.md#traces), you can set a default database/table as well as columns for the trace query. This will pre-load the query builder with a runnable trace search query, which makes browsing on the explore page faster for observability.
-If you are using OpenTelemetry, you should enable the "**Use OTel**" switch, and set the **default trace table** to `otel_traces`.
+If you're using OpenTelemetry, you should enable the "**Use OTel**" switch, and set the **default trace table** to `otel_traces`.
This will automatically override the default columns to use the selected OTel schema version.
While OpenTelemetry isn't required, this feature works best when using its schema for traces.
@@ -215,7 +215,7 @@ CREATE TABLE alias_example (
In the above example, we create an alias called `TimestampDate` that converts the nanoseconds timestamp to a `Date` type.
This data isn't stored on disk like the first column, it's calculated at query time.
-Table-defined aliases will not be returned with `SELECT *`, but this can be configured in the server settings.
+Table-defined aliases won't be returned with `SELECT *`, but this can be configured in the server settings.
For more info, read the documentation for the [ALIAS](/sql-reference/statements/create/table#alias) column type.
diff --git a/docs/integrations/data-visualization/grafana/index.md b/docs/integrations/data-visualization/grafana/index.md
index 16cc877b585..cee1ce34009 100644
--- a/docs/integrations/data-visualization/grafana/index.md
+++ b/docs/integrations/data-visualization/grafana/index.md
@@ -48,16 +48,16 @@ Grafana requires a plugin to connect to ClickHouse, which is easily installed wi
When connecting ClickHouse to a data visualization tool like Grafana, it is recommended to make a read-only user to protect your data from unwanted modifications.
-Grafana does not validate that queries are safe. Queries can contain any SQL statement, including `DELETE` and `INSERT`.
+Grafana doesn't validate that queries are safe. Queries can contain any SQL statement, including `DELETE` and `INSERT`.
To configure a read-only user, follow these steps:
1. Create a `readonly` user profile following the [Creating Users and Roles in ClickHouse](/operations/access-rights) guide.
2. Ensure the `readonly` user has enough permission to modify the `max_execution_time` setting required by the underlying [clickhouse-go client](https://github.com/ClickHouse/clickhouse-go).
-3. If you're using a public ClickHouse instance, it is not recommended to set `readonly=2` in the `readonly` profile. Instead, leave `readonly=1` and set the constraint type of `max_execution_time` to [changeable_in_readonly](/operations/settings/constraints-on-settings) to allow modification of this setting.
+3. If you're using a public ClickHouse instance, it isn't recommended to set `readonly=2` in the `readonly` profile. Instead, leave `readonly=1` and set the constraint type of `max_execution_time` to [changeable_in_readonly](/operations/settings/constraints-on-settings) to allow modification of this setting.
## 3. Install the ClickHouse plugin for Grafana {#3--install-the-clickhouse-plugin-for-grafana}
-Before Grafana can connect to ClickHouse, you need to install the appropriate Grafana plugin. Assuming you are logged in to Grafana, follow these steps:
+Before Grafana can connect to ClickHouse, you need to install the appropriate Grafana plugin. Assuming you're logged in to Grafana, follow these steps:
1. From the **Connections** page in the sidebar, select the **Add new connection** tab.
@@ -85,7 +85,7 @@ Before Grafana can connect to ClickHouse, you need to install the appropriate Gr
- **Server port:** the port for your ClickHouse service. Will be different depending on server configuration and protocol.
- **Protocol** the protocol used to connect to your ClickHouse service.
- **Secure connection** enable if your server requires a secure connection.
-- **Username** and **Password**: enter your ClickHouse user credentials. If you have not configured any users, try `default` for the username. It is recommended to [configure a read-only user](#2-making-a-read-only-user).
+- **Username** and **Password**: enter your ClickHouse user credentials. If you haven't configured any users, try `default` for the username. It is recommended to [configure a read-only user](#2-making-a-read-only-user).
For more settings, check the [plugin configuration](./config.md) documentation.
@@ -99,12 +99,12 @@ Your data source is now ready to use! Learn more about how to build queries with
For more details on configuration, check the [plugin configuration](./config.md) documentation.
-If you're looking for more information that is not included in these docs, check the [plugin repository on GitHub](https://github.com/grafana/clickhouse-datasource).
+If you're looking for more information that isn't included in these docs, check the [plugin repository on GitHub](https://github.com/grafana/clickhouse-datasource).
## Upgrading plugin versions {#upgrading-plugin-versions}
Starting with v4, configurations and queries are able to be upgraded as new versions are released.
-Configurations and queries from v3 are migrated to v4 as they are opened. While the old configurations and dashboards will load in v4, the migration is not persisted until they are saved again in the new version. If you notice any issues when opening an old configuration/query, discard your changes and [report the issue on GitHub](https://github.com/grafana/clickhouse-datasource/issues).
+Configurations and queries from v3 are migrated to v4 as they're opened. While the old configurations and dashboards will load in v4, the migration isn't persisted until they're saved again in the new version. If you notice any issues when opening an old configuration/query, discard your changes and [report the issue on GitHub](https://github.com/grafana/clickhouse-datasource/issues).
-The plugin cannot downgrade to previous versions if the configuration/query was created with a newer version.
+The plugin can't downgrade to previous versions if the configuration/query was created with a newer version.
diff --git a/docs/integrations/data-visualization/grafana/query-builder.md b/docs/integrations/data-visualization/grafana/query-builder.md
index a4eaae38f9e..884fc0a6f44 100644
--- a/docs/integrations/data-visualization/grafana/query-builder.md
+++ b/docs/integrations/data-visualization/grafana/query-builder.md
@@ -207,7 +207,7 @@ With the data links present, you can open traces and logs using the provided tra
"**View Trace**" will open a split panel with the trace, and "**View Logs**" will open a logs query filtered by the trace ID.
If the link is clicked from a dashboard instead of the explore view, the link will be opened in a new tab in the explore view.
-Having defaults configured for both [logs](./config.md#logs) and [traces](./config.md#traces) is required when crossing query types (logs to traces and traces to logs). Defaults are not required when opening a link of the same query type since the query can be simply copied.
+Having defaults configured for both [logs](./config.md#logs) and [traces](./config.md#traces) is required when crossing query types (logs to traces and traces to logs). Defaults aren't required when opening a link of the same query type since the query can be simply copied.
Example of viewing a trace (right panel) from a logs query (left panel)
@@ -262,4 +262,4 @@ This is a list of all macros available in the plugin:
| `$__timeInterval(columnName)` | Replaced by a function calculating the interval based on window size in seconds. | `toStartOfInterval(toDateTime(columnName), INTERVAL 20 second)` |
| `$__timeInterval_ms(columnName)` | Replaced by a function calculating the interval based on window size in milliseconds. | `toStartOfInterval(toDateTime64(columnName, 3), INTERVAL 20 millisecond)` |
| `$__interval_s` | Replaced by the dashboard interval in seconds. | `20` |
-| `$__conditionalAll(condition, $templateVar)` | Replaced by the first parameter when the template variable in the second parameter does not select every value. Replaced by the 1=1 when the template variable selects every value. | `condition` or `1=1` |
+| `$__conditionalAll(condition, $templateVar)` | Replaced by the first parameter when the template variable in the second parameter doesn't select every value. Replaced by `1=1` when the template variable selects every value. | `condition` or `1=1` |
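A short sketch of `$__conditionalAll` in a query, assuming a hypothetical `logs` table and a multi-value `$service` template variable:

```sql
-- When "All" is selected for $service, the condition collapses to 1=1,
-- so the filter is effectively dropped.
SELECT toStartOfInterval(EventTime, INTERVAL 1 hour) AS hour, count() AS c
FROM logs
WHERE $__timeFilter(EventTime)
  AND $__conditionalAll(Service IN (${service:singlequote}), $service)
GROUP BY hour
ORDER BY hour
```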
diff --git a/docs/integrations/data-visualization/looker-and-clickhouse.md b/docs/integrations/data-visualization/looker-and-clickhouse.md
index 9fba107b3b8..729d67a104b 100644
--- a/docs/integrations/data-visualization/looker-and-clickhouse.md
+++ b/docs/integrations/data-visualization/looker-and-clickhouse.md
@@ -39,7 +39,7 @@ Choose a name for your data source, and select `ClickHouse` from the dialect dro
-If you are using ClickHouse Cloud or your deployment requires SSL, make sure you have SSL turned on in the additional settings.
+If you're using ClickHouse Cloud or your deployment requires SSL, make sure you have SSL turned on in the additional settings.
@@ -54,7 +54,7 @@ Now you should be able to attach ClickHouse Datasource to your Looker project.
## 3. Known limitations {#3-known-limitations}
1. The following data types are handled as strings by default:
- * Array - serialization does not work as expected due to the JDBC driver limitations
+ * Array - serialization doesn't work as expected due to the JDBC driver limitations
* Decimal* - can be changed to number in the model
* LowCardinality(...) - can be changed to a proper type in the model
* Enum8, Enum16
@@ -69,5 +69,5 @@ Now you should be able to attach ClickHouse Datasource to your Looker project.
* Polygon
* Point
* Ring
-2. [Symmetric aggregate feature](https://cloud.google.com/looker/docs/reference/param-explore-symmetric-aggregates) is not supported
-3. [Full outer join](https://cloud.google.com/looker/docs/reference/param-explore-join-type#full_outer) is not yet implemented in the driver
+2. [Symmetric aggregate feature](https://cloud.google.com/looker/docs/reference/param-explore-symmetric-aggregates) isn't supported
+3. [Full outer join](https://cloud.google.com/looker/docs/reference/param-explore-join-type#full_outer) isn't yet implemented in the driver
diff --git a/docs/integrations/data-visualization/metabase-and-clickhouse.md b/docs/integrations/data-visualization/metabase-and-clickhouse.md
index c588332d812..9dc3c4c3901 100644
--- a/docs/integrations/data-visualization/metabase-and-clickhouse.md
+++ b/docs/integrations/data-visualization/metabase-and-clickhouse.md
@@ -38,7 +38,7 @@ In this guide you will ask some questions of your ClickHouse data with Metabase
:::tip Add some data
-If you do not have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
+If you don't have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
@@ -46,7 +46,7 @@ If you do not have a dataset to work with you can add one of the examples. This
## 2. Download the ClickHouse plugin for Metabase {#2--download-the-clickhouse-plugin-for-metabase}
-1. If you do not have a `plugins` folder, create one as a subfolder of where you have `metabase.jar` saved.
+1. If you don't have a `plugins` folder, create one as a subfolder of where you have `metabase.jar` saved.
2. The plugin is a JAR file named `clickhouse.metabase-driver.jar`. Download the latest version of the JAR file at https://github.com/clickhouse/metabase-clickhouse-driver/releases/latest
diff --git a/docs/integrations/data-visualization/powerbi-and-clickhouse.md b/docs/integrations/data-visualization/powerbi-and-clickhouse.md
index d1856d706ad..ed932d1a033 100644
--- a/docs/integrations/data-visualization/powerbi-and-clickhouse.md
+++ b/docs/integrations/data-visualization/powerbi-and-clickhouse.md
@@ -188,7 +188,7 @@ Fill in the connection details.
:::note
-If you are using a deployment that has SSL enabled (e.g. ClickHouse Cloud or a self-managed instance), in the `SSLMode` field you should supply `require`.
+If you're using a deployment that has SSL enabled (e.g. ClickHouse Cloud or a self-managed instance), in the `SSLMode` field you should supply `require`.
- `Host` should always have the protocol (i.e. `http://` or `https://`) omitted.
- `Timeout` is an integer representing seconds. Default value: `30 seconds`.
@@ -215,7 +215,7 @@ Select your previously created data source from the list.
:::note
-If you did not specify credentials during the data source creation, you will be prompted to specify username and password.
+If you didn't specify credentials during the data source creation, you will be prompted to specify username and password.
:::
diff --git a/docs/integrations/data-visualization/splunk-and-clickhouse.md b/docs/integrations/data-visualization/splunk-and-clickhouse.md
index 4adb4b619c5..6755b579a95 100644
--- a/docs/integrations/data-visualization/splunk-and-clickhouse.md
+++ b/docs/integrations/data-visualization/splunk-and-clickhouse.md
@@ -34,13 +34,13 @@ Looking to store ClickHouse audit logs to Splunk? Follow the ["Storing ClickHous
Splunk is a popular technology for security and observability. It is also a powerful search and dashboarding engine. There are hundreds of Splunk apps available to address different use cases.
-For ClickHouse specifically, we are leveraging the [Splunk DB Connect App](https://splunkbase.splunk.com/app/2686) which has a simple integration to the highly performant ClickHouse JDBC driver to query tables in ClickHouse directly.
+For ClickHouse specifically, we're leveraging the [Splunk DB Connect App](https://splunkbase.splunk.com/app/2686) which has a simple integration to the highly performant ClickHouse JDBC driver to query tables in ClickHouse directly.
-The ideal use case for this integration is when you are using ClickHouse for large data sources such as NetFlow, Avro or Protobuf binary data, DNS, VPC flow logs, and other OTEL logs that can be shared with your team on Splunk to search and create dashboards. By using this approach, the data is not ingested into the Splunk index layer and is simply queried directly from ClickHouse similarly to other visualization integrations such as [Metabase](https://www.metabase.com/) or [Superset](https://superset.apache.org/).
+The ideal use case for this integration is when you're using ClickHouse for large data sources such as NetFlow, Avro or Protobuf binary data, DNS, VPC flow logs, and other OTEL logs that can be shared with your team on Splunk to search and create dashboards. By using this approach, the data isn't ingested into the Splunk index layer and is simply queried directly from ClickHouse similarly to other visualization integrations such as [Metabase](https://www.metabase.com/) or [Superset](https://superset.apache.org/).
## Goal {#goal}
-In this guide, we will use the ClickHouse JDBC driver to connect ClickHouse to Splunk. We will install a local version of Splunk Enterprise but we are not indexing any data. Instead, we are using the search functions through the DB Connect query engine.
+In this guide, we'll use the ClickHouse JDBC driver to connect ClickHouse to Splunk. We'll install a local version of Splunk Enterprise, but we won't index any data. Instead, we'll use the search functions through the DB Connect query engine.
With this guide, you will be able to create a dashboard connected to ClickHouse similar to this:
@@ -133,7 +133,7 @@ If you receive an error, make sure that you have added the IP address of your Sp
We will now run a SQL query to test that everything works.
-Select your connection details in the SQL Explorer from the DataLab section of the DB Connect App. We are using the `trips` table for this demo:
+Select your connection details in the SQL Explorer from the DataLab section of the DB Connect App. We're using the `trips` table for this demo:
diff --git a/docs/integrations/data-visualization/superset-and-clickhouse.md b/docs/integrations/data-visualization/superset-and-clickhouse.md
index 224bb5aac53..d6b9e4504b8 100644
--- a/docs/integrations/data-visualization/superset-and-clickhouse.md
+++ b/docs/integrations/data-visualization/superset-and-clickhouse.md
@@ -42,7 +42,7 @@ In this guide you will build a dashboard in Superset with data from a ClickHouse
:::tip Add some data
-If you do not have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
+If you don't have a dataset to work with you can add one of the examples. This guide uses the [UK Price Paid](/getting-started/example-datasets/uk-price-paid.md) dataset, so you might choose that one. There are several others to look at in the same documentation category.
:::
## 1. Gather your connection details {#1-gather-your-connection-details}
@@ -93,11 +93,11 @@ If you do not have a dataset to work with you can add one of the examples. This
-3. Click the **ADD** button at the bottom of the dialog window and your table appears in the list of datasets. You are ready to build a dashboard and analyze your ClickHouse data!
+3. Click the **ADD** button at the bottom of the dialog window and your table appears in the list of datasets. You're ready to build a dashboard and analyze your ClickHouse data!
## 5. Creating charts and a dashboard in Superset {#5--creating-charts-and-a-dashboard-in-superset}
-If you are familiar with Superset, then you will feel right at home with this next section. If you are new to Superset, well...it's like a lot of the other cool visualization tools out there in the world - it doesn't take long to get started, but the details and nuances get learned over time as you use the tool.
+If you're familiar with Superset, then you'll feel right at home with this next section. If you're new to Superset, well...it's like a lot of the other cool visualization tools out there - it doesn't take long to get started, but you'll learn the details and nuances over time as you use the tool.
1. You start with a dashboard. From the top menu in Superset, select **Dashboards**. Click the button in the upper-right to add a new dashboard. The following dashboard is named **UK property prices**:
diff --git a/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md b/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
index 881c602a8f0..3ac210e6fda 100644
--- a/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
+++ b/docs/integrations/data-visualization/tableau/tableau-analysis-tips.md
@@ -17,7 +17,7 @@ integration:
- In Live mode the MEDIAN() and PERCENTILE() functions (since connector v0.1.3 release) use the [ClickHouse quantile()() function](/sql-reference/aggregate-functions/reference/quantile/), which significantly speeds up the calculation, but uses sampling. If you want to get accurate calculation results, then use functions `MEDIAN_EXACT()` and `PERCENTILE_EXACT()` (based on [quantileExact()()](/sql-reference/aggregate-functions/reference/quantileexact/)).
- In Extract mode you can't use MEDIAN_EXACT() and PERCENTILE_EXACT() because MEDIAN() and PERCENTILE() are always accurate (and slow).
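For example, in a Tableau Calculated Field the two variants differ only in the function name (`[fare_amount]` is a hypothetical numeric field):

```text
MEDIAN([fare_amount])        // fast, sampled — uses quantile()
MEDIAN_EXACT([fare_amount])  // exact but slower — uses quantileExact()
```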
## Additional functions for calculated fields in Live mode {#additional-functions-for-calculated-fields-in-live-mode}
-ClickHouse has a huge number of functions that can be used for data analysis — much more than Tableau supports. For the convenience of users, we have added new functions that are available for use in Live mode when creating Calculated Fields. Unfortunately, it is not possible to add descriptions to these functions in the Tableau interface, so we will add a description for them right here.
+ClickHouse has a huge number of functions that can be used for data analysis — much more than Tableau supports. For the convenience of users, we have added new functions that are available for use in Live mode when creating Calculated Fields. Unfortunately, it isn't possible to add descriptions to these functions in the Tableau interface, so we will add a description for them right here.
- **[`-If` Aggregation Combinator](/sql-reference/aggregate-functions/combinators/#-if)** *(added in v0.2.3)* - allows you to have Row-Level Filters right in the Aggregate Calculation. `SUM_IF(), AVG_IF(), COUNT_IF(), MIN_IF() & MAX_IF()` functions have been added.
- **`BAR([my_int], [min_val_int], [max_val_int], [bar_string_length_int])`** *(added in v0.2.1)* — Forget about boring bar charts! Use `BAR()` function instead (equivalent of [`bar()`](/sql-reference/functions/other-functions#bar) in ClickHouse). For example, this calculated field returns nice bars as String:
```text
@@ -44,7 +44,7 @@ ClickHouse has a huge number of functions that can be used for data analysis —
- **`KURTOSIS([my_number])`** — Computes the sample kurtosis of a sequence. Equivalent of [`kurtSamp()`](/sql-reference/aggregate-functions/reference/kurtsamp).
- **`KURTOSISP([my_number])`** — Computes the kurtosis of a sequence. The equivalent of [`kurtPop()`](/sql-reference/aggregate-functions/reference/kurtpop).
- **`MEDIAN_EXACT([my_number])`** *(added in v0.1.3)* — Exactly computes the median of a numeric data sequence. Equivalent of [`quantileExact(0.5)(...)`](/sql-reference/aggregate-functions/reference/quantileexact).
-- **`MOD([my_number_1], [my_number_2])`** — Calculates the remainder after division. If arguments are floating-point numbers, they are pre-converted to integers by dropping the decimal portion. Equivalent of [`modulo()`](/sql-reference/functions/arithmetic-functions/#modulo).
+- **`MOD([my_number_1], [my_number_2])`** — Calculates the remainder after division. If arguments are floating-point numbers, they're pre-converted to integers by dropping the decimal portion. Equivalent of [`modulo()`](/sql-reference/functions/arithmetic-functions/#modulo).
- **`PERCENTILE_EXACT([my_number], [level_float])`** *(added in v0.1.3)* — Exactly computes the percentile of a numeric data sequence. The recommended level range is [0.01, 0.99]. Equivalent of [`quantileExact()()`](/sql-reference/aggregate-functions/reference/quantileexact).
- **`PROPER([my_string])`** *(added in v0.2.5)* - Converts a text string so the first letter of each word is capitalized and the remaining letters are in lowercase. Spaces and non-alphanumeric characters such as punctuation also act as separators. For example:
```text
diff --git a/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md b/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
index b051b07afb5..af2ddc8fc7b 100644
--- a/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
+++ b/docs/integrations/data-visualization/tableau/tableau-and-clickhouse.md
@@ -120,7 +120,7 @@ could change, but for now you must use **default** as the database.)
-You are now ready to build some visualizations in Tableau!
+You're now ready to build some visualizations in Tableau!
## Building Visualizations in Tableau {#building-visualizations-in-tableau}
@@ -152,7 +152,7 @@ Now that we have a ClickHouse data source configured in Tableau, let's visualize
Not a very exciting line chart, but the dataset was generated by a script and built for testing query performance, so
-you will notice there is not a lot of variations in the simulated orders of the TCPD data.
+you will notice there isn't a lot of variation in the simulated orders of the TCPD data.
6. Suppose you want to know the average order amount (in dollars) by quarter and also by shipping mode (air, mail, ship,
truck, etc.):
diff --git a/docs/integrations/data-visualization/tableau/tableau-connection-tips.md b/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
index 7c4d6846bf2..df262062ad9 100644
--- a/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
+++ b/docs/integrations/data-visualization/tableau/tableau-connection-tips.md
@@ -31,7 +31,7 @@ SET my_setting=value;
In 99% of cases you don't need the Advanced tab; for the remaining 1% you can use the following settings:
- **Custom Connection Parameters**. By default, `socket_timeout` is already specified; this parameter may need to be changed if some extracts take a very long time to update. The value of this parameter is specified in milliseconds. The rest of the parameters can be found [here](https://github.com/ClickHouse/clickhouse-jdbc/blob/master/clickhouse-client/src/main/java/com/clickhouse/client/config/ClickHouseClientOption.java); add them in this field separated by commas.
- **JDBC Driver custom_http_params**. This field allows you to drop some parameters into the ClickHouse connection string by passing values to the [`custom_http_params` parameter of the driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). For example, this is how `session_id` is specified when the *Set Session ID* checkbox is activated
-- **JDBC Driver `typeMappings`**. This field allows you to [pass a list of ClickHouse data type mappings to Java data types used by the JDBC driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). The connector automatically displays large Integers as strings thanks to this parameter, you can change this by passing your mapping set *(I do not know why)* using
+- **JDBC Driver `typeMappings`**. This field allows you to [pass a list of ClickHouse data type mappings to Java data types used by the JDBC driver](https://github.com/ClickHouse/clickhouse-jdbc#configuration). The connector automatically displays large Integers as strings thanks to this parameter, you can change this by passing your mapping set *(I don't know why)* using
```text
UInt256=java.lang.Double,Int256=java.lang.Double
```
@@ -55,4 +55,4 @@ However, such fields are most often used to find the number of unique values *(I
```text
COUNTD([myUInt256]) // Works well too!
```
-When using the data preview (View data) of a table with UInt64 fields, an error does not appear now.
+When using the data preview (View data) of a table with UInt64 fields, an error doesn't appear now.
diff --git a/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md b/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
index 39392fd2eae..0ac10b454c4 100644
--- a/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
+++ b/docs/integrations/data-visualization/tableau/tableau-online-and-clickhouse.md
@@ -62,7 +62,7 @@ NB: if you want to use Tableau Online in combination with Tableau Desktop and sh
## Connecting Tableau Online to ClickHouse (cloud or on-premise setup with SSL) {#connecting-tableau-online-to-clickhouse-cloud-or-on-premise-setup-with-ssl}
-As it is not possible to provide the SSL certificates via the Tableau Online MySQL connection setup wizard,
+As it isn't possible to provide the SSL certificates via the Tableau Online MySQL connection setup wizard,
the only way is to use Tableau Desktop to set the connection up, and then export it to Tableau Online. This process is, however, pretty straightforward.
Run Tableau Desktop on a Windows or Mac machine, and select "Connect" -> "To a Server" -> "MySQL".
@@ -104,4 +104,4 @@ Finally, click "Publish", and your datasource with embedded credentials will be
## Known limitations (ClickHouse 23.11) {#known-limitations-clickhouse-2311}
-All the known limitations has been fixed in ClickHouse `23.11`. If you encounter any other incompatibilities, please do not hesitate to [contact us](https://clickhouse.com/company/contact) or create a [new issue](https://github.com/ClickHouse/ClickHouse/issues).
+All the known limitations have been fixed in ClickHouse `23.11`. If you encounter any other incompatibilities, please don't hesitate to [contact us](https://clickhouse.com/company/contact) or create a [new issue](https://github.com/ClickHouse/ClickHouse/issues).
diff --git a/docs/integrations/index.mdx b/docs/integrations/index.mdx
index aa44a96dbd4..6681926f6d8 100644
--- a/docs/integrations/index.mdx
+++ b/docs/integrations/index.mdx
@@ -18,7 +18,7 @@ ClickHouse integrations are organized by their support level:
| Tier | Description |
|------|-------------|
-| Core integrations | Built or maintained by ClickHouse, they are supported by ClickHouse and live in the ClickHouse GitHub organization |
+| Core integrations | Built or maintained by ClickHouse, they're supported by ClickHouse and live in the ClickHouse GitHub organization |
| Partner integrations | Built or maintained, and supported by, third-party software vendors |
| Community integrations | Built or maintained and supported by community members. No direct support is available besides the public GitHub repositories and community Slack channels |
diff --git a/docs/integrations/interfaces/http.md b/docs/integrations/interfaces/http.md
index ce74e574a3e..a005bc78ac3 100644
--- a/docs/integrations/interfaces/http.md
+++ b/docs/integrations/interfaces/http.md
@@ -92,7 +92,7 @@ curl 'http://localhost:8123/?query=SELECT%201'
```
In this example wget is used with the `-nv` (non-verbose) and `-O-` parameters to output the result to the terminal.
-In this case it is not necessary to use URL encoding for the space:
+In this case it isn't necessary to use URL encoding for the space:
```bash title="command"
wget -nv -O- 'http://localhost:8123/?query=SELECT 1'
@@ -125,7 +125,7 @@ X-ClickHouse-Exception-Tag: dngjzjnxkvlwkeua
```
As you can see, the `curl` command is somewhat inconvenient in that spaces must be URL escaped.
-Although `wget` escapes everything itself, we do not recommend using it because it does not work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
+Although `wget` escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
```bash
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@@ -277,7 +277,7 @@ To delete the table:
$ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @-
```
-For successful requests that do not return a data table, an empty response body is returned.
+For successful requests that don't return a data table, an empty response body is returned.
## Compression {#compression}
@@ -376,7 +376,7 @@ echo 'SELECT 1' | curl 'http://user:password@localhost:8123/' -d @-
2. In the `user` and `password` URL parameters
:::warning
-We do not recommend using this method as the parameter might be logged by web proxy and cached in the browser
+We don't recommend using this method, as the parameter might be logged by a web proxy and cached in the browser
:::
For example:
@@ -393,7 +393,7 @@ For example:
echo 'SELECT 1' | curl -H 'X-ClickHouse-User: user' -H 'X-ClickHouse-Key: password' 'http://localhost:8123/' -d @-
```
-If the user name is not specified, then the `default` name is used. If the password is not specified, then an empty password is used.
+If the user name isn't specified, then the `default` name is used. If the password isn't specified, then an empty password is used.
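As a sketch of the defaults described above (assumes a local server with a passwordless `default` user), these two invocations are equivalent:

```shell
# No credentials supplied: the `default` user and an empty password are assumed
curl 'http://localhost:8123/?query=SELECT%201'

# The same request with the defaults spelled out explicitly
echo 'SELECT 1' | curl -H 'X-ClickHouse-User: default' -H 'X-ClickHouse-Key: ' 'http://localhost:8123/' -d @-
```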
You can also use the URL parameters to specify any settings for processing a single query or entire profiles of settings.
For example:
@@ -450,7 +450,7 @@ The possible header fields are:
| `elapsed_ns` | Query runtime in nanoseconds.|
| `memory_usage` | Memory in bytes used by the query. (**Available from v25.11**) |
-Running requests do not stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server-side, and using the network might be ineffective.
+Running requests don't stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server side, and using the network might be ineffective.
The following optional parameters exist:
@@ -473,7 +473,7 @@ The following settings can be used:
`buffer_size` determines the number of bytes in the result to buffer in the server memory. If a result body is larger than this threshold, the buffer is written to the HTTP channel, and the remaining data is sent directly to the HTTP channel.
-To ensure that the entire response is buffered, set `wait_end_of_query=1`. In this case, the data that is not stored in memory will be buffered in a temporary server file.
+To ensure that the entire response is buffered, set `wait_end_of_query=1`. In this case, the data that isn't stored in memory will be buffered in a temporary server file.
For example:
@@ -490,7 +490,7 @@ Use buffering to avoid situations where a query processing error occurred after
This feature was added in ClickHouse 24.4.
In specific scenarios, setting the granted role first might be required before executing the statement itself.
-However, it is not possible to send `SET ROLE` and the statement together, as multi-statements are not allowed:
+However, it isn't possible to send `SET ROLE` and the statement together, as multi-statements aren't allowed:
```bash
curl -sS "http://localhost:8123" --data-binary "SET ROLE my_role;SELECT * FROM my_table;"
@@ -520,7 +520,7 @@ In this case, `?role=my_role&role=my_other_role` works similarly to executing `S
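The role URL parameter can be sketched like this (a hypothetical role and table; assumes a server on `localhost:8123`):

```shell
# Equivalent to running SET ROLE my_role before the query,
# without needing multi-statement support:
curl -sS "http://localhost:8123?role=my_role" --data-binary "SELECT * FROM my_table;"

# Multiple roles can be passed by repeating the parameter:
curl -sS "http://localhost:8123?role=my_role&role=my_other_role" --data-binary "SELECT * FROM my_table;"
```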
## HTTP response codes caveats {#http_response_codes_caveats}
-Because of limitations of the HTTP protocol, a HTTP 200 response code does not guarantee that a query was successful.
+Because of limitations of the HTTP protocol, an HTTP 200 response code doesn't guarantee that a query was successful.
Here is an example:
@@ -537,7 +537,7 @@ The reason for this behavior is the nature of the HTTP protocol. The HTTP header
This behavior is independent of the format used, whether it's `Native`, `TSV`, or `JSON`; the error message will always be in the middle of the response stream.
-You can mitigate this problem by enabling `wait_end_of_query=1` ([Response Buffering](#response-buffering)). In this case, sending of the HTTP header is delayed until the entire query is resolved. This however, does not completely solve the problem because the result must still fit within the [`http_response_buffer_size`](../../operations/settings/settings.md#http_response_buffer_size), and other settings like [`send_progress_in_http_headers`](../../operations/settings/settings.md#send_progress_in_http_headers) can interfere with the delay of the header.
+You can mitigate this problem by enabling `wait_end_of_query=1` ([Response Buffering](#response-buffering)). In this case, sending of the HTTP header is delayed until the entire query is resolved. This, however, doesn't completely solve the problem because the result must still fit within the [`http_response_buffer_size`](../../operations/settings/settings.md#http_response_buffer_size), and other settings like [`send_progress_in_http_headers`](../../operations/settings/settings.md#send_progress_in_http_headers) can interfere with the delay of the header.
:::tip
The only way to catch all errors is to analyze the HTTP body before parsing it using the required format.
@@ -638,7 +638,7 @@ curl -sS "http://localhost:8123?param_arg1=abc%09123" -d "SELECT splitByChar('\t
Code: 457. DB::Exception: Value abc 123 cannot be parsed as String for query parameter 'arg1' because it isn't parsed completely: only 3 of 7 bytes was parsed: abc. (BAD_QUERY_PARAMETER) (version 23.4.1.869 (official build))
```
-If you are using URL parameters, you will need to encode the `\t` as `%5C%09`. For example:
+If you're using URL parameters, you will need to encode the `\t` as `%5C%09`. For example:
```bash
curl -sS "http://localhost:8123?param_arg1=abc%5C%09123" -d "SELECT splitByChar('\t', {arg1:String})"
@@ -737,20 +737,20 @@ Configuration options for `http_handlers` work as follows.
Each of these is discussed below:
- `method` is responsible for matching the method part of the HTTP request. `method` fully conforms to the definition of [`method`]
- (https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it is not defined in the
- configuration file, it does not match the method portion of the HTTP request.
+ (https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it isn't defined in the
+ configuration file, it doesn't match the method portion of the HTTP request.
- `url` is responsible for matching the URL part (path and query string) of the HTTP request.
If the `url` prefixed with `regex:` it expects [RE2](https://github.com/google/re2)'s regular expressions.
- It is an optional configuration. If it is not defined in the configuration file, it does not match the URL portion of the HTTP request.
+ It is an optional configuration. If it isn't defined in the configuration file, it doesn't match the URL portion of the HTTP request.
- `full_url` is the same as `url`, but includes the complete URL, i.e. `schema://host:port/path?query_string`.
- Note, ClickHouse does not support "virtual hosts", so the `host` is an IP address (and not the value of `Host` header).
+ Note, ClickHouse doesn't support "virtual hosts", so the `host` is an IP address (and not the value of `Host` header).
- `empty_query_string` - ensures that there is no query string (`?query_string`) in the request
- `headers` are responsible for matching the header part of the HTTP request. It is compatible with RE2's regular expressions. It is an optional
- configuration. If it is not defined in the configuration file, it does not match the header portion of the HTTP request.
+ configuration. If it isn't defined in the configuration file, it doesn't match the header portion of the HTTP request.
- `handler` contains the main processing part.
@@ -770,7 +770,7 @@ Each of these are discussed below:
- `response_content` — used with the `static` type; the response content sent to the client. When the value uses the prefix 'file://' or 'config://', the content is read from the file or configuration and sent to the client.
- `user` - user to execute the query from (default user is `default`).
- **Note**, you do not need to specify password for this user.
+ **Note**, you don't need to specify a password for this user.
The configuration methods for different `type`s are discussed next.
@@ -856,7 +856,7 @@ In one `predefined_query_handler` only one `query` is supported.
In `dynamic_query_handler`, the query is written in the form of a parameter of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. `query_param_name` can be configured in `dynamic_query_handler`.
-ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query` . It is an optional configuration. If there is no definition in the configuration file, the parameter is not passed in.
+ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query`. It is an optional configuration. If there is no definition in the configuration file, the parameter isn't passed in.
To experiment with this functionality, the following example defines the values of the [`max_threads`](../../operations/settings/settings.md#max_threads) and `max_final_threads` settings, then queries whether the settings were set successfully.
diff --git a/docs/integrations/interfaces/mysql.md b/docs/integrations/interfaces/mysql.md
index af6e90d375e..6aa905ad455 100644
--- a/docs/integrations/interfaces/mysql.md
+++ b/docs/integrations/interfaces/mysql.md
@@ -16,24 +16,24 @@ import mysql3 from '@site/static/images/interfaces/mysql3.png';
# MySQL Interface
-ClickHouse supports the MySQL wire protocol. This allows certain clients that do not have native ClickHouse connectors leverage the MySQL protocol instead, and it has been validated with the following BI tools:
+ClickHouse supports the MySQL wire protocol. This allows certain clients that don't have native ClickHouse connectors to leverage the MySQL protocol instead, and it has been validated with the following BI tools:
- [Looker Studio](../data-visualization/looker-studio-and-clickhouse.md)
- [Tableau Online](../integrations/tableau-online)
- [QuickSight](../integrations/quicksight)
-If you are trying other untested clients or integrations, keep in mind that there could be the following limitations:
+If you're trying other untested clients or integrations, keep in mind that there could be the following limitations:
- SSL implementation might not be fully compatible; there could be potential [TLS SNI](https://www.cloudflare.com/learning/ssl/what-is-sni/) issues.
-- A particular tool might require dialect features (e.g., MySQL-specific functions or settings) that are not implemented yet.
+- A particular tool might require dialect features (e.g., MySQL-specific functions or settings) that aren't implemented yet.
-If there is a native driver available (e.g., [DBeaver](../integrations/dbeaver)), it is always preferred to use it instead of the MySQL interface. Additionally, while most of the MySQL language clients should work fine, MySQL interface is not guaranteed to be a drop-in replacement for a codebase with existing MySQL queries.
+If there is a native driver available (e.g., [DBeaver](../integrations/dbeaver)), it's always preferable to use it instead of the MySQL interface. Additionally, while most MySQL language clients should work fine, the MySQL interface isn't guaranteed to be a drop-in replacement for a codebase with existing MySQL queries.
-If your use case involves a particular tool that does not have a native ClickHouse driver, and you would like to use it via the MySQL interface and you found certain incompatibilities - please [create an issue](https://github.com/ClickHouse/ClickHouse/issues) in the ClickHouse repository.
+If your use case involves a particular tool that doesn't have a native ClickHouse driver, you would like to use it via the MySQL interface, and you found certain incompatibilities, please [create an issue](https://github.com/ClickHouse/ClickHouse/issues) in the ClickHouse repository.
::::note
To support the SQL dialect of above BI tools better, ClickHouse's MySQL interface implicitly runs SELECT queries with setting [prefer_column_name_to_alias = 1](/operations/settings/settings#prefer_column_name_to_alias).
-This cannot be turned off and it can lead in rare edge cases to different behavior between queries sent to ClickHouse's normal and MySQL query interfaces.
+This can't be turned off and it can lead in rare edge cases to different behavior between queries sent to ClickHouse's normal and MySQL query interfaces.
::::
## Enabling the MySQL Interface On ClickHouse Cloud {#enabling-the-mysql-interface-on-clickhouse-cloud}
@@ -163,7 +163,7 @@ If user password is specified using [SHA256](/sql-reference/functions/hash-funct
Restrictions:
-- prepared queries are not supported
+- prepared queries aren't supported
- some data types are sent as strings
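As a minimal sketch, assuming the MySQL interface is listening on port 9004 (whatever `mysql_port` is set to in the server configuration), any stock MySQL client can connect:

```shell
mysql --protocol tcp -h 127.0.0.1 -P 9004 -u default --password
```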
diff --git a/docs/integrations/interfaces/postgresql.md b/docs/integrations/interfaces/postgresql.md
index 888b4fedb91..f32d414f4b5 100644
--- a/docs/integrations/interfaces/postgresql.md
+++ b/docs/integrations/interfaces/postgresql.md
@@ -17,7 +17,7 @@ import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
Check out our [Managed Postgres](/docs/cloud/managed-postgres) service. Backed by NVMe storage that is physically collocated with compute, it delivers up to 10x faster performance for workloads that are disk-bound compared to alternatives using network-attached storage like EBS and allows you to replicate your Postgres data to ClickHouse using the Postgres CDC connector in ClickPipes.
:::
-ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that is not already directly supported by ClickHouse (for example, Amazon Redshift).
+ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that isn't already directly supported by ClickHouse (for example, Amazon Redshift).
To enable the PostgreSQL wire protocol, add the [postgresql_port](/operations/server-configuration-parameters/settings#postgresql_port) setting to your server's configuration file. For example, you could define the port in a new XML file in your `config.d` folder:
@@ -48,7 +48,7 @@ psql -p 9005 -h 127.0.0.1 -U alice default
```
:::note
-The `psql` client requires a login with a password, so you will not be able connect using the `default` user with no password. Either assign a password to the `default` user, or login as a different user.
+The `psql` client requires a login with a password, so you won't be able to connect using the `default` user with no password. Either assign a password to the `default` user, or log in as a different user.
:::
The `psql` client prompts for the password:
diff --git a/docs/integrations/interfaces/prometheus.md b/docs/integrations/interfaces/prometheus.md
index b58934239b4..16c4dfdc419 100644
--- a/docs/integrations/interfaces/prometheus.md
+++ b/docs/integrations/interfaces/prometheus.md
@@ -12,7 +12,7 @@ doc_type: 'reference'
## Exposing metrics {#expose}
:::note
-If you are using ClickHouse Cloud, you can expose metrics to Prometheus using the [Prometheus Integration](/integrations/prometheus).
+If you're using ClickHouse Cloud, you can expose metrics to Prometheus using the [Prometheus Integration](/integrations/prometheus).
:::
ClickHouse can expose its own metrics for scraping from Prometheus:
diff --git a/docs/integrations/interfaces/ssh.md b/docs/integrations/interfaces/ssh.md
index 2b5f59e5f9a..40f0f63cc98 100644
--- a/docs/integrations/interfaces/ssh.md
+++ b/docs/integrations/interfaces/ssh.md
@@ -25,7 +25,7 @@ After creating a [database user identified by an SSH key](/knowledgebase/how-to-
CREATE USER abcuser IDENTIFIED WITH ssh_key BY KEY '' TYPE 'ssh-ed25519';
```
-You are able to use this key to connect to a ClickHouse server. It will open a pseudoterminal (PTY) with an interactive session of clickhouse-client.
+You're able to use this key to connect to a ClickHouse server. It will open a pseudoterminal (PTY) with an interactive session of clickhouse-client.
```bash
> ssh -i ~/test_ssh/id_ed25519 abcuser@localhost -p 9022
@@ -83,7 +83,7 @@ ssh -o "StrictHostKeyChecking no" user@host
## Configuring embedded client {#configuring-embedded-client}
-You are able to pass options to an embedded client similar to the ordinary `clickhouse-client`, but with a few limitations.
+You're able to pass options to an embedded client similar to the ordinary `clickhouse-client`, but with a few limitations.
Since this is an SSH protocol, the only way to pass parameters to the target host is through environment variables.
For example setting the `format` can be done this way:
@@ -97,7 +97,7 @@ For example setting the `format` can be done this way:
└───┘
```
-You are able to change any user-level setting this way and additionally pass most of the ordinary `clickhouse-client` options (except ones which don't make sense in this setup.)
+You're able to change any user-level setting this way and additionally pass most of the ordinary `clickhouse-client` options (except ones which don't make sense in this setup.)
Important:
diff --git a/docs/integrations/interfaces/tcp.md b/docs/integrations/interfaces/tcp.md
index dfcd68595bf..c916c835c39 100644
--- a/docs/integrations/interfaces/tcp.md
+++ b/docs/integrations/interfaces/tcp.md
@@ -9,4 +9,4 @@ doc_type: 'reference'
# Native interface (TCP)
-The native protocol is used in the [command-line client](/interfaces/cli), for inter-server communication during distributed query processing, and also in other C++ programs. Unfortunately, native ClickHouse protocol does not have formal specification yet, but it can be reverse-engineered from ClickHouse source code (starting [around here](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)) and/or by intercepting and analyzing TCP traffic.
+The native protocol is used in the [command-line client](/interfaces/cli), for inter-server communication during distributed query processing, and also in other C++ programs. Unfortunately, the native ClickHouse protocol doesn't have a formal specification yet, but it can be reverse-engineered from ClickHouse source code (starting [around here](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)) and/or by intercepting and analyzing TCP traffic.
diff --git a/docs/integrations/language-clients/cpp.md b/docs/integrations/language-clients/cpp.md
index 800a2fc1a2a..c3eaa3b30a1 100644
--- a/docs/integrations/language-clients/cpp.md
+++ b/docs/integrations/language-clients/cpp.md
@@ -61,7 +61,7 @@ target_link_libraries(your-target PRIVATE clickhouse-cpp-lib)
### Setting the client object {#example-setup-client}
Create a `Client` instance to establish a connection to ClickHouse. The following example
-demonstrates connecting to a local ClickHouse instance, where no password is required and SSL is not
+demonstrates connecting to a local ClickHouse instance, where no password is required and SSL isn't
enabled.
```cpp
@@ -88,7 +88,7 @@ clickhouse::Client client{
### Creating tables and running queries without data {#example-create-table}
-To execute a query that does not return any data, such as creating tables, use the `Execute` method.
+To execute a query that doesn't return any data, such as creating tables, use the `Execute` method.
The same approach applies to other statements like `ALTER TABLE`, `DROP`, etc.
```cpp
diff --git a/docs/integrations/language-clients/csharp.md b/docs/integrations/language-clients/csharp.md
index 15d65a90886..8c97b07bb07 100644
--- a/docs/integrations/language-clients/csharp.md
+++ b/docs/integrations/language-clients/csharp.md
@@ -139,7 +139,7 @@ The `ClickHouseConnection` class normally allows for parallel operation (multipl
| Roles | `IReadOnlyList` | Empty | `Roles` | Comma-separated ClickHouse roles (e.g., `Roles=admin,reader`) |
:::note
-When using a connection string to set custom settings, use the `set_` prefix, e.g. "set_max_threads=4". When using a ClickHouseClientSettings object, do not use the `set_` prefix.
+When using a connection string to set custom settings, use the `set_` prefix, e.g. "set_max_threads=4". When using a ClickHouseClientSettings object, don't use the `set_` prefix.
For a full list of available settings, see [here](https://clickhouse.com/docs/operations/settings/settings).
:::
@@ -176,7 +176,7 @@ Choose **C#**. Connection details are displayed below.
-If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
+If you're using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
Using a connection string:
@@ -290,7 +290,7 @@ Console.WriteLine($"Rows written: {bulkCopy.RowsWritten}");
* Column names can be optionally provided via `ColumnNames` property if source data has fewer columns than target table.
* Configurable parameters: `Columns`, `BatchSize`, `MaxDegreeOfParallelism`.
* Before copying, a `SELECT * FROM <table> LIMIT 0` query is performed to get information about the target table structure. Types of provided objects must reasonably match the target table.
-* Sessions are not compatible with parallel insertion. Connection passed to `ClickHouseBulkCopy` must have sessions disabled, or `MaxDegreeOfParallelism` must be set to `1`.
+* Sessions aren't compatible with parallel insertion. Connection passed to `ClickHouseBulkCopy` must have sessions disabled, or `MaxDegreeOfParallelism` must be set to `1`.
:::
---
@@ -364,7 +364,7 @@ Console.WriteLine($"QueryId: {command.QueryId}");
```
:::tip
-If you are overriding the `QueryId` parameter, you need to ensure its uniqueness for every call. A random GUID is a good choice.
+If you're overriding the `QueryId` parameter, you need to ensure its uniqueness for every call. A random GUID is a good choice.
:::
---
@@ -416,9 +416,9 @@ For additional practical usage examples, see the [examples directory](https://gi
`ClickHouse.Driver` uses `System.Net.Http.HttpClient` under the hood. `HttpClient` has a per-endpoint connection pool. As a consequence:
-* A `ClickHouseConnection` object does not have 1:1 mapping to TCP connections - multiple database sessions will be multiplexed through several TCP connections per server.
+* A `ClickHouseConnection` object doesn't have a 1:1 mapping to TCP connections - multiple database sessions will be multiplexed through several TCP connections per server.
* `ClickHouseConnection` objects can be long-lived; the actual TCP connections underneath will be recycled by the connection pool.
-* Let `HttpClient` manage connection pooling internally. Do not pool `ClickHouseConnection` objects yourself.
+* Let `HttpClient` manage connection pooling internally. Don't pool `ClickHouseConnection` objects yourself.
* Connections can stay alive after `ClickHouseConnection` object was disposed.
* This behavior can be tweaked by passing a custom `HttpClientFactory` or `HttpClient` with custom `HttpClientHandler`.
@@ -467,7 +467,7 @@ settings.CustomSettings["wait_for_async_insert"] = 1; // Recommended: wait for f
| `wait_for_async_insert=0` | Insert returns immediately when data is buffered. No guarantee data will be persisted. | Only when data loss is acceptable |
:::warning
-With `wait_for_async_insert=0`, errors only surface during flush and cannot be traced back to the original insert. The client also provides no backpressure, risking server overload.
+With `wait_for_async_insert=0`, errors only surface during flush and can't be traced back to the original insert. The client also provides no backpressure, risking server overload.
:::
**Key settings:**
@@ -761,7 +761,7 @@ There is an important difference between HTTP parameter binding and bulk copy wh
**Bulk Copy** knows the target column's timezone and correctly interprets `Unspecified` values in that timezone.
-**HTTP Parameters** do not automatically know the column timezone. You must specify it in the parameter type hint:
+**HTTP Parameters** don't automatically know the column timezone. You must specify it in the parameter type hint:
```csharp
// CORRECT: Timezone in type hint
@@ -868,7 +868,7 @@ await bulkCopy.WriteToServerAsync(new[] { row1, row2 });
## Logging and diagnostics {#logging-and-diagnostics}
-The ClickHouse .NET client integrates with the `Microsoft.Extensions.Logging` abstractions to offer lightweight, opt-in logging. When enabled, the driver emits structured messages for connection lifecycle events, command execution, transport operations, and bulk copy uploads. Logging is entirely optional—applications that do not configure a logger continue to run without additional overhead.
+The ClickHouse .NET client integrates with the `Microsoft.Extensions.Logging` abstractions to offer lightweight, opt-in logging. When enabled, the driver emits structured messages for connection lifecycle events, command execution, transport operations, and bulk copy uploads. Logging is entirely optional—applications that don't configure a logger continue to run without additional overhead.
### Quick start {#logging-quick-start}
@@ -996,7 +996,7 @@ This will log:
### Debug mode: network tracing and diagnostics {#logging-debugmode}
-To help with diagnosing networking issues, the driver library includes a helper that enables low-level tracing of .NET networking internals. To enable it you must pass a LoggerFactory with the level set to Trace, and set EnableDebugMode to true (or manually enable it via the `ClickHouse.Driver.Diagnostic.TraceHelper` class). Events will be logged to the `ClickHouse.Driver.NetTrace` category. Warning: this will generate extremely verbose logs, and impact performance. It is not recommended to enable debug mode in production.
+To help with diagnosing networking issues, the driver library includes a helper that enables low-level tracing of .NET networking internals. To enable it, pass a `LoggerFactory` with the level set to `Trace` and set `EnableDebugMode` to `true` (or enable it manually via the `ClickHouse.Driver.Diagnostic.TraceHelper` class). Events will be logged to the `ClickHouse.Driver.NetTrace` category. Warning: this generates extremely verbose logs and impacts performance. Enabling debug mode in production isn't recommended.
```csharp
var loggerFactory = LoggerFactory.Create(builder =>
@@ -1123,7 +1123,7 @@ await connection.OpenAsync();
:::note
Important considerations when providing a custom HttpClient
-- **Automatic decompression**: You must enable `AutomaticDecompression` if compression is not disabled (compression is enabled by default).
+- **Automatic decompression**: You must enable `AutomaticDecompression` unless compression is disabled (it's enabled by default).
- **Idle timeout**: Set `PooledConnectionIdleTimeout` smaller than the server's `keep_alive_timeout` (10 seconds for ClickHouse Cloud) to avoid connection errors from half-open connections.
:::
@@ -1131,7 +1131,7 @@ Important considerations when providing a custom HttpClient
### Dapper {#orm-support-dapper}
-`ClickHouse.Driver` can be used with Dapper, but anonymous objects are not supported.
+`ClickHouse.Driver` can be used with Dapper, but anonymous objects aren't supported.
**Working example:**
@@ -1219,7 +1219,7 @@ Entity Framework Core is currently not supported.
### AggregateFunction columns {#aggregatefunction-columns}
-Columns of type `AggregateFunction(...)` cannot be queried or inserted directly.
+Columns of type `AggregateFunction(...)` can't be queried or inserted directly.
To insert:
diff --git a/docs/integrations/language-clients/go/index.md b/docs/integrations/language-clients/go/index.md
index 5815d68b8ba..fabfefa0415 100644
--- a/docs/integrations/language-clients/go/index.md
+++ b/docs/integrations/language-clients/go/index.md
@@ -198,7 +198,7 @@ Both interfaces encode data using the [native format](/native-protocol/basics.md
## Installation {#installation}
-v1 of the driver is deprecated and will not reach feature updates or support for new ClickHouse types. You should migrate to v2, which offers superior performance.
+v1 of the driver is deprecated and won't reach feature updates or support for new ClickHouse types. You should migrate to v2, which offers superior performance.
To install the 2.x version of the client, add the package to your go.mod file:
@@ -251,7 +251,7 @@ The client is released independently of ClickHouse. 2.x represents the current m
The client supports:
-- All currently supported versions of ClickHouse as recorded [here](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md). As ClickHouse versions are no longer supported they are also no longer actively tested against client releases.
+- All currently supported versions of ClickHouse as recorded [here](https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md). As ClickHouse versions are no longer supported, they're also no longer actively tested against client releases.
- All versions of ClickHouse 2 years from the release date of the client. Note only LTS versions are actively tested.
#### Golang compatibility {#golang-compatibility}
@@ -267,7 +267,7 @@ All code examples for the ClickHouse Client API can be found [here](https://gith
### Connecting {#connecting}
-The following example, which returns the server version, demonstrates connecting to ClickHouse - assuming ClickHouse is not secured and accessible with the default user.
+The following example, which returns the server version, demonstrates connecting to ClickHouse - assuming ClickHouse isn't secured and is accessible with the default user.
Note we use the default native port to connect.
@@ -402,7 +402,7 @@ fmt.Println(v.String())
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/ssl.go)
-This minimal `TLS.Config` is normally sufficient to connect to the secure native port (normally 9440) on a ClickHouse server. If the ClickHouse server does not have a valid certificate (expired, wrong hostname, not signed by a publicly recognized root Certificate Authority), `InsecureSkipVerify` can be true, but this is strongly discouraged.
+This minimal `TLS.Config` is normally sufficient to connect to the secure native port (normally 9440) on a ClickHouse server. If the ClickHouse server doesn't have a valid certificate (expired, wrong hostname, not signed by a publicly recognized root Certificate Authority), `InsecureSkipVerify` can be true, but this is strongly discouraged.
```go
conn, err := clickhouse.Open(&clickhouse.Options{
@@ -501,7 +501,7 @@ if err != nil {
### Execution {#execution}
-Arbitrary statements can be executed via the `Exec` method. This is useful for DDL and simple statements. It should not be used for larger inserts or query iterations.
+Arbitrary statements can be executed via the `Exec` method. This is useful for DDL and simple statements. It shouldn't be used for larger inserts or query iterations.
```go
conn.Exec(context.Background(), `DROP TABLE IF EXISTS example`)
@@ -586,7 +586,7 @@ return batch.Send()
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/batch.go)
-Recommendations for ClickHouse apply [here](/guides/inserting-data#best-practices-for-inserts). Batches should not be shared across go-routines - construct a separate batch per routine.
+Recommendations for ClickHouse apply [here](/guides/inserting-data#best-practices-for-inserts). Batches shouldn't be shared across go-routines - construct a separate batch per routine.
From the above example, note the need for variable types to align with the column type when appending rows. While the mapping is usually obvious, this interface tries to be flexible, and types will be converted provided no precision loss is incurred. For example, the following demonstrates inserting a string into a datetime64.
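The per-goroutine batching rule above can be sketched generically. The snippet below uses plain slices and a stand-in `insertBatch` function rather than the real client's `PrepareBatch`/`Append`/`Send`, but the ownership pattern is the same: each goroutine constructs and sends only its own batch, and batches are never shared.

```go
package main

import (
	"fmt"
	"sync"
)

// insertBatch stands in for building and sending one batch; with the real
// client each goroutine would call PrepareBatch, Append, and Send itself.
func insertBatch(rows []int) int {
	return len(rows) // pretend every row was sent
}

func main() {
	data := make([]int, 10_000)
	const workers = 4
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		sent int
	)
	chunk := (len(data) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, min((w+1)*chunk, len(data))
		wg.Add(1)
		go func(rows []int) { // each goroutine owns its own batch
			defer wg.Done()
			n := insertBatch(rows)
			mu.Lock()
			sent += n
			mu.Unlock()
		}(data[lo:hi])
	}
	wg.Wait()
	fmt.Println(sent) // 10000
}
```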
@@ -657,7 +657,7 @@ return rows.Err()
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/query_rows.go)
-Note in both cases, we are required to pass a pointer to the variables we wish to serialize the respective column values into. These must be passed in the order specified in the `SELECT` statement - by default, the order of column declaration will be used in the event of a `SELECT *` as shown above.
+Note in both cases, we're required to pass a pointer to the variables we wish to serialize the respective column values into. These must be passed in the order specified in the `SELECT` statement - by default, the order of column declaration will be used in the event of a `SELECT *` as shown above.
Similar to insertion, the Scan method requires the target variables to be of an appropriate type. This again aims to be flexible, with types converted where possible, provided no precision loss is possible, e.g., the above example shows a UUID column being read into a string variable. For a full list of supported go types for each Column type, see [Type Conversions](#type-conversions).
@@ -814,7 +814,7 @@ for i := 0; i < 1_000; i++ {
### Type conversions {#type-conversions}
-The client aims to be as flexible as possible concerning accepting variable types for both insertion and marshaling of responses. In most cases, an equivalent Golang type exists for a ClickHouse column type, e.g., [UInt64](/sql-reference/data-types/int-uint/) to [uint64](https://pkg.go.dev/builtin#uint64). These logical mappings should always be supported. You may wish to utilize variable types that can be inserted into columns or used to receive a response if the conversion of either the variable or received data takes place first. The client aims to support these conversions transparently, so users do not need to convert their data to align precisely before insertion and to provide flexible marshaling at query time. This transparent conversion does not allow for precision loss. For example, a uint32 cannot be used to receive data from a UInt64 column. Conversely, a string can be inserted into a datetime64 field provided it meets the format requirements.
+The client aims to be as flexible as possible concerning accepting variable types for both insertion and marshaling of responses. In most cases, an equivalent Golang type exists for a ClickHouse column type, e.g., [UInt64](/sql-reference/data-types/int-uint/) to [uint64](https://pkg.go.dev/builtin#uint64). These logical mappings should always be supported. You may wish to utilize variable types that can be inserted into columns or used to receive a response if the conversion of either the variable or received data takes place first. The client aims to support these conversions transparently, so users don't need to convert their data to align precisely before insertion and to provide flexible marshaling at query time. This transparent conversion doesn't allow for precision loss. For example, a uint32 can't be used to receive data from a UInt64 column. Conversely, a string can be inserted into a datetime64 field provided it meets the format requirements.
The type conversions currently supported for primitive types are captured [here](https://github.com/ClickHouse/clickhouse-go/blob/main/TYPES.md).
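The "no precision loss" rule can be illustrated with a small, hypothetical helper (`toUint32` is not part of the client): a narrowing conversion succeeds only when the value is representable in the target type, which is why a `uint32` can't receive arbitrary `UInt64` data.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// toUint32 refuses the conversion when it would lose information,
// mirroring the client's lossless-conversion rule.
func toUint32(v uint64) (uint32, error) {
	if v > math.MaxUint32 {
		return 0, errors.New("value does not fit in uint32")
	}
	return uint32(v), nil
}

func main() {
	if n, err := toUint32(42); err == nil {
		fmt.Println(n) // 42
	}
	if _, err := toUint32(math.MaxUint64); err != nil {
		fmt.Println("rejected:", err)
	}
}
```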
@@ -832,7 +832,7 @@ Handling of timezone information depends on the ClickHouse type and whether the
* At **insert** time the value is sent to ClickHouse in UNIX timestamp format. If no time zone is provided, the client will assume the client's local time zone. `time.Time{}` or `sql.NullTime` will be converted to epoch accordingly.
* At **select** time the timezone of the column will be used if set when returning a `time.Time` value. If not, the timezone of the server will be used.
* **Date/Date32**
- * At **insert** time, the timezone of any date is considered when converting the date to a unix timestamp, i.e., it will be offset by the timezone prior to storage as a date, as Date types have no locale in ClickHouse. If this is not specified in a string value, the local timezone will be used.
+ * At **insert** time, the timezone of any date is considered when converting the date to a unix timestamp, i.e., it will be offset by the timezone prior to storage as a date, as Date types have no locale in ClickHouse. If this isn't specified in a string value, the local timezone will be used.
* At **select** time, dates scanned into `time.Time{}` or `sql.NullTime{}` instances will be returned without timezone information.
#### Array {#array}
@@ -1108,7 +1108,7 @@ rows.Close()
[Full Example - `flatten_tested=0`](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/nested.go#L28-L118)
-If the default value of 1 is used for `flatten_nested`, nested columns are flattened to separate arrays. This requires using nested slices for insertion and retrieval. While arbitrary levels of nesting may work, this is not officially supported.
+If the default value of 1 is used for `flatten_nested`, nested columns are flattened to separate arrays. This requires using nested slices for insertion and retrieval. While arbitrary levels of nesting may work, this isn't officially supported.
```go
conn, err := GetNativeConnection(nil, nil, nil)
@@ -1312,7 +1312,7 @@ if err = conn.QueryRow(ctx, "SELECT * FROM example").Scan(&col1, &col2); err !=
Due to Go's lack of a built-in Decimal type, we recommend using the third-party package [github.com/shopspring/decimal](https://github.com/shopspring/decimal) to work with Decimal types natively without modifying your original queries.
:::note
-You may be tempted to use Float instead to avoid third-party dependencies. However, be aware that [Float types in ClickHouse are not recommended when accurate values are required](https://clickhouse.com/docs/sql-reference/data-types/float).
+You may be tempted to use Float instead to avoid third-party dependencies. However, be aware that [Float types in ClickHouse aren't recommended when accurate values are required](https://clickhouse.com/docs/sql-reference/data-types/float).
If you still choose to use Go's built-in Float type on the client side, you must explicitly convert Decimal to Float using the [toFloat64() function](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#toFloat64) or [its variants](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#toFloat64OrZero) in your ClickHouse queries. Be aware that this conversion may result in loss of precision.
:::
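A minimal, self-contained illustration of the precision loss the note warns about (plain Go, no client or third-party package involved): `float64` has a 53-bit significand, so integers above 2^53 stop being exactly representable.

```go
package main

import "fmt"

func main() {
	// float64 has a 53-bit significand, so not every integer
	// above 2^53 is exactly representable.
	var big int64 = 9007199254740993 // 2^53 + 1
	f := float64(big)
	fmt.Println(int64(f) == big) // false: the value was rounded
}
```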
@@ -1744,7 +1744,7 @@ rows.Close()
### Dynamic scanning {#dynamic-scanning}
-You may need to read tables for which they do not know the schema or type of the fields being returned. This is common in cases where ad-hoc data analysis is performed or generic tooling is written. To achieve this, column-type information is available on query responses. This can be used with Go reflection to create runtime instances of correctly typed variables which can be passed to Scan.
+You may need to read tables for which you don't know the schema or the types of the fields being returned. This is common in cases where ad-hoc data analysis is performed or generic tooling is written. To achieve this, column-type information is available on query responses. This can be used with Go reflection to create runtime instances of correctly typed variables which can be passed to Scan.
```go
const query = `
@@ -1883,7 +1883,7 @@ Full details on exploiting tracing can be found under [OpenTelemetry support](/o
## Database/SQL API {#databasesql-api}
-The `database/sql` or "standard" API allows you to use the client in scenarios where application code should be agnostic of the underlying databases by conforming to a standard interface. This comes at some expense - additional layers of abstraction and indirection and primitives which are not necessarily aligned with ClickHouse. These costs are, however, typically acceptable in scenarios where tooling needs to connect to multiple databases.
+The `database/sql` or "standard" API allows you to use the client in scenarios where application code should be agnostic of the underlying databases by conforming to a standard interface. This comes at some expense - additional layers of abstraction and indirection and primitives which aren't necessarily aligned with ClickHouse. These costs are, however, typically acceptable in scenarios where tooling needs to connect to multiple databases.
Additionally, this client supports using HTTP as the transport layer - data will still be encoded in the native format for optimal performance.
@@ -1893,7 +1893,7 @@ Full code examples for the standard API can be found [here](https://github.com/C
### Connecting {#connecting-1}
-Connection can be achieved either via a DSN string with the format `clickhouse://:?=` and `Open` method or via the `clickhouse.OpenDB` method. The latter is not part of the `database/sql` specification but returns a `sql.DB` instance. This method provides functionality such as profiling, for which there are no obvious means of exposing through the `database/sql` specification.
+Connection can be achieved either via a DSN string with the format `clickhouse://:?=` and `Open` method or via the `clickhouse.OpenDB` method. The latter isn't part of the `database/sql` specification but returns a `sql.DB` instance. This method provides functionality such as profiling, for which there are no obvious means of exposing through the `database/sql` specification.
```go
func Connect() error {
@@ -2168,7 +2168,7 @@ _, err = conn.Exec("INSERT INTO example VALUES (1, 'test-1')")
[Full Example](https://github.com/ClickHouse/clickhouse-go/blob/main/examples/std/exec.go)
-This method does not support receiving a context - by default, it executes with the background context. You can use `ExecContext` if this is needed - see [Using Context](#using-context).
+This method doesn't support receiving a context - by default, it executes with the background context. You can use `ExecContext` if this is needed - see [Using Context](#using-context).
### Batch Insert {#batch-insert-1}
@@ -2305,7 +2305,7 @@ Unless stated, complex type handling should be the same as the [ClickHouse API](
#### Maps {#maps}
-Unlike the ClickHouse API, the standard API requires maps to be strongly typed at scan type. For example, you cannot pass a `map[string]interface{}` for a `Map(String,String)` field and must use a `map[string]string` instead. An `interface{}` variable will always be compatible and can be used for more complex structures. Structs are not supported at read time.
+Unlike the ClickHouse API, the standard API requires maps to be strongly typed at scan time. For example, you can't pass a `map[string]interface{}` for a `Map(String,String)` field and must use a `map[string]string` instead. An `interface{}` variable will always be compatible and can be used for more complex structures. Structs aren't supported at read time.
```go
var (
@@ -2579,7 +2579,7 @@ if err := rows.Err(); err != nil {
### Dynamic scanning {#dynamic-scanning-1}
-Similar to the [ClickHouse API](#dynamic-scanning), column type information is available to allow you to create runtime instances of correctly typed variables which can be passed to Scan. This allows columns to be read where the type is not known.
+Similar to the [ClickHouse API](#dynamic-scanning), column type information is available to allow you to create runtime instances of correctly typed variables which can be passed to Scan. This allows columns to be read where the type isn't known.
```go
const query = `
@@ -2695,7 +2695,7 @@ fmt.Printf("external_table_1 UNION external_table_2: %d\n", count)
### Open telemetry {#open-telemetry-1}
-ClickHouse allows a [trace context](/operations/opentelemetry/) to be passed as part of the native protocol. The client allows a Span to be created via the function `clickhouse.withSpan` and passed via the Context to achieve this. This is not supported when HTTP is used as transport.
+ClickHouse allows a [trace context](/operations/opentelemetry/) to be passed as part of the native protocol. The client allows a Span to be created via the function `clickhouse.withSpan` and passed via the Context to achieve this. This isn't supported when HTTP is used as transport.
```go
var count uint64
diff --git a/docs/integrations/language-clients/java/client/client.mdx b/docs/integrations/language-clients/java/client/client.mdx
index 0dff8f9721c..7f6c4a71d4c 100644
--- a/docs/integrations/language-clients/java/client/client.mdx
+++ b/docs/integrations/language-clients/java/client/client.mdx
@@ -114,7 +114,7 @@ Client client = new Client.Builder()
```
:::note
-SSL Authentication may be hard to troubleshoot on production because many errors from SSL libraries provide not enough information. For example, if client certificate and key do not match then server will terminate connection immediately (in case of HTTP it will be connection initiation stage where no HTTP requests are send so no response is sent).
+SSL authentication may be hard to troubleshoot in production because many errors from SSL libraries don't provide enough information. For example, if the client certificate and key don't match, the server will terminate the connection immediately (for HTTP, this happens at the connection initiation stage, before any HTTP request is sent, so no response is returned).
Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) to verify certificates and keys:
- check key integrity: `openssl rsa -in [key-file.key] -check -noout`
@@ -127,7 +127,7 @@ Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) t
## Configuration {#configuration}
All settings are defined by instance methods (a.k.a configuration methods) that make the scope and context of each value clear.
-Major configuration parameters are defined in one scope (client or operation) and do not override each other.
+Major configuration parameters are defined in one scope (client or operation) and don't override each other.
Configuration is defined during client creation. See `com.clickhouse.client.api.Client.Builder`.
@@ -499,14 +499,14 @@ Configuration options for insert operations.
| `serverSetting(String name, String value)` | Sets individual server settings for an operation. |
| `serverSetting(String name, Collection values)` | Sets individual server settings with multiple values for an operation. Items of the collection should be `String` values. |
| `setDBRoles(Collection dbRoles)` | Sets DB roles to be set before executing an operation. Items of the collection should be `String` values. |
-| `setOption(String option, Object value)` | Sets a configuration option in raw format. This is not a server setting. |
+| `setOption(String option, Object value)` | Sets a configuration option in raw format. This isn't a server setting. |
### InsertResponse {#insertresponse}
Response object that holds result of insert operation. It is only available if the client got response from a server.
:::note
-This object should be closed as soon as possible to release a connection because the connection cannot be re-used until all data of previous response is fully read.
+This object should be closed as soon as possible to release a connection, because the connection can't be reused until all data of the previous response has been fully read.
:::
| Method | Description |
@@ -659,21 +659,21 @@ Configuration options for query operations.
|----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| `setQueryId(String queryId)` | Sets query ID that will be assigned to the operation. |
| `setFormat(ClickHouseFormat format)` | Sets response format. See `RowBinaryWithNamesAndTypes` for the full list. |
-| `setMaxExecutionTime(Integer maxExecutionTime)` | Sets operation execution time on server. Will not affect read timeout. |
+| `setMaxExecutionTime(Integer maxExecutionTime)` | Sets operation execution time on server. Won't affect read timeout. |
| `waitEndOfQuery(Boolean waitEndOfQuery)` | Requests the server to wait for the end of the query before sending a response. |
| `setUseServerTimeZone(Boolean useServerTimeZone)` | Server timezone (see client config) will be used to parse date/time types in the result of an operation. Default `false`. |
| `setUseTimeZone(String timeZone)` | Requests server to use `timeZone` for time conversion. See [session_timezone](/operations/settings/settings#session_timezone). |
| `serverSetting(String name, String value)` | Sets individual server settings for an operation. |
| `serverSetting(String name, Collection values)` | Sets individual server settings with multiple values for an operation. Items of the collection should be `String` values. |
| `setDBRoles(Collection dbRoles)` | Sets DB roles to be set before executing an operation. Items of the collection should be `String` values. |
-| `setOption(String option, Object value)` | Sets a configuration option in raw format. This is not a server setting. |
+| `setOption(String option, Object value)` | Sets a configuration option in raw format. This isn't a server setting. |
### QueryResponse {#queryresponse}
Response object that holds result of query execution. It is only available if the client got a response from a server.
:::note
+This object should be closed as soon as possible to release a connection, because the connection can't be reused until all data of the previous response has been fully read.
+This object should be closed as soon as possible to release a connection because the connection can't be re-used until all data of previous response is fully read.
:::
| Method | Description |
@@ -1374,7 +1374,7 @@ Here is a list of options to configure load balancing:
| Property | Default | Description |
|-----------------------|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| load_balancing_policy | `""` | The load-balancing policy can be one of:
`firstAlive` - request is sent to the first healthy node from the managed node list
`random` - request is sent to a random node from the managed node list
`roundRobin` - request is sent to each node from the managed node list, in turn.
full qualified class name implementing `ClickHouseLoadBalancingPolicy` - custom load balancing policy
If it is not specified the request is sent to the first node from the managed node list |
+| load_balancing_policy | `""` | The load-balancing policy can be one of:
`firstAlive` - request is sent to the first healthy node from the managed node list
`random` - request is sent to a random node from the managed node list
`roundRobin` - request is sent to each node from the managed node list, in turn.
fully qualified class name implementing `ClickHouseLoadBalancingPolicy` - custom load balancing policy
If it isn't specified, the request is sent to the first node from the managed node list |
| load_balancing_tags | `""` | Load balancing tags for filtering out nodes. Requests are sent only to nodes that have the specified tags |
| health_check_interval | `0` | Health check interval in milliseconds, zero or negative value means one-time. |
| health_check_method | `ClickHouseHealthCheckMethod.SELECT_ONE` | Health check method. Can be one of:
`ClickHouseHealthCheckMethod.SELECT_ONE` - check with `select 1` query
`ClickHouseHealthCheckMethod.PING` - protocol-specific check, which is generally faster
|
diff --git a/docs/integrations/language-clients/java/index.md b/docs/integrations/language-clients/java/index.md
index 3d0cecff088..1ad78306519 100644
--- a/docs/integrations/language-clients/java/index.md
+++ b/docs/integrations/language-clients/java/index.md
@@ -84,7 +84,7 @@ Java Client was developed far back in 2015. Its codebase became very hard to mai
[ClickHouse Data Types](/sql-reference/data-types)
:::note
-- AggregatedFunction - :warning: does not support `SELECT * FROM table ...`
+- AggregatedFunction - :warning: doesn't support `SELECT * FROM table ...`
- Decimal - `SET output_format_decimal_trailing_zeros=1` in 21.9+ for consistency
- Enum - can be treated as both string and integer
- UInt64 - mapped to `long` in client-v1
@@ -129,7 +129,7 @@ JDBC Drive inherits same features as underlying client implementation. Other JDB
### Logging {#logging}
Our Java language client uses [SLF4J](https://www.slf4j.org/) for logging. You can use any SLF4J-compatible logging framework, such as `Logback` or `Log4j`.
-For example, if you are using Maven you could add the following dependency to your `pom.xml` file:
+For example, if you're using Maven you could add the following dependency to your `pom.xml` file:
```xml title="pom.xml"
@@ -158,7 +158,7 @@ For example, if you are using Maven you could add the following dependency to yo
#### Configuring logging {#configuring-logging}
-This is going to depend on the logging framework you are using. For example, if you are using `Logback`, you could configure logging in a file called `logback.xml`:
+This is going to depend on the logging framework you're using. For example, if you're using `Logback`, you could configure logging in a file called `logback.xml`:
```xml title="logback.xml"
diff --git a/docs/integrations/language-clients/java/jdbc/jdbc.mdx b/docs/integrations/language-clients/java/jdbc/jdbc.mdx
index 58563d71959..89da1830c64 100644
--- a/docs/integrations/language-clients/java/jdbc/jdbc.mdx
+++ b/docs/integrations/language-clients/java/jdbc/jdbc.mdx
@@ -91,14 +91,14 @@ You can use the old JDBC implementation by setting the `clickhouse.jdbc.v1` prop
There are a few things to note about the URL syntax:
- **only** one endpoint is allowed in the URL
-- protocol should be specified when it is not the default one - 'HTTP'
-- port should be specified when it is not the default one '8123'
-- driver do not guess the protocol from the port, you need to specify it explicitly
-- `ssl` parameter is not required when protocol is specified.
+- the protocol should be specified when it isn't the default one ('HTTP')
+- the port should be specified when it isn't the default one ('8123')
+- the driver doesn't guess the protocol from the port; you need to specify it explicitly
+- the `ssl` parameter isn't required when the protocol is specified.
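Putting those rules together, URLs like the following are well-formed (a sketch with hypothetical hosts and databases, assuming the `jdbc:clickhouse:` prefix):

```text
jdbc:clickhouse:http://localhost:8123/default
jdbc:clickhouse:https://example-host:8443/mydb
```

Note that the second URL names the `https` protocol explicitly rather than relying on the port or an `ssl` parameter.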
### Connection Properties
Main configuration parameters are defined in the [java client](/integrations/language-clients/java/client#client-configuration). They should be passed
-as is to the driver. Driver has some own properties that are not part of the client configuration they are listed below.
+as is to the driver. The driver has some properties of its own that aren't part of the client configuration; they're listed below.
**Driver properties**:
| Property | Default | Description |
@@ -191,7 +191,7 @@ There are few ways to change the mapping:
- `rs.getObject(1, Float64.class)` will return `Float64` value of `Int8` column.
- `rs.getLong(1)` will return `Long` value of `Int8` column.
- `rs.getByte(1)` can return `Byte` value of `Int16` column if it fits into `Byte`.
-- conversion from wider to narrower type is not recommend because of data coruption risk.
+- conversion from wider to narrower types isn't recommended because of the risk of data corruption.
- `Bool` type acts as number, too.
- All number types can be read as `java.lang.String`.
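The corruption risk behind that recommendation is plain Java narrowing: casting a wider value into a narrower type silently drops the high bits. A minimal sketch (plain JDK code, not the driver API — imagine `wide` came from an `Int16` column read via a narrow getter):

```java
// Demonstrates why wider-to-narrower conversions risk data corruption:
// the cast keeps only the low 8 bits of the int.
public class NarrowingDemo {
    public static void main(String[] args) {
        int wide = 300;            // fits Int16, but not a signed byte
        byte narrow = (byte) wide; // silent truncation: 300 & 0xFF == 44
        System.out.println(narrow);
    }
}
```

This is why `rs.getByte(1)` on an `Int16` column is only safe when the value actually fits into a `Byte`.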
@@ -235,7 +235,7 @@ There are few ways to change the mapping:
- `rs.getObject(1, java.time.LocalDate.class)` will return `java.time.LocalDate` value of `Date` column.
- `rs.getObject(1, java.time.LocalDateTime.class)` will return `java.time.LocalDateTime` value of `DateTime` column.
- `rs.getObject(1, java.time.LocalTime.class)` will return `java.time.LocalTime` value of `Time` column.
-- `Date`, `Date32`, `Time`, `Time64` is not affected by the timezone of the server.
+- `Date`, `Date32`, `Time`, `Time64` aren't affected by the timezone of the server.
- `DateTime`, `DateTime64` is affected by the timezone of the server or session timezone.
- `DateTime` and `DateTime64` can be retrieved as `ZonedDateTime` by using `getObject(colIndex, ZonedDateTime.class)`.
@@ -250,10 +250,10 @@ There are few ways to change the mapping:
- `Array` is mapped to `java.sql.Array` by default to be compatible with JDBC. This is also done to give more information about returned array value. Useful for type inference.
- `Array` implements `getResultSet()` method to return `java.sql.ResultSet` with the same content as the original array.
-- Collection types should not be read as `java.lang.String` because it is not a valid way to represent the data (Ex. there is no quoting for string values in array).
+- Collection types shouldn't be read as `java.lang.String` because that isn't a valid way to represent the data (e.g. string values in an array aren't quoted).
- `Map` is mapped to `JAVA_OBJECT` because value can be read only with `getObject(columnIndex, Class)` method.
- - `Map` is not a `java.sql.Struct` because it doesn't have named columns.
-- `Tuple` is mapped to `Object[]` because it can contain different types and using `List` is not valid.
+ - `Map` isn't a `java.sql.Struct` because it doesn't have named columns.
+- `Tuple` is mapped to `Object[]` because it can contain different types and using `List` isn't valid.
- `Tuple` can be read as `Array` by using `getObject(columnIndex, Array.class)` method. In this case `Array#baseTypeName` will return `Tuple` column definition.
@@ -282,9 +282,9 @@ There are few ways to change the mapping:
| AggregateFunction | OTHER | (binary representation) |
| SimpleAggregateFunction | (wrapped type) | (wrapped class) |
-- `UUID` is not JDBC standard type. However it is part of JDK. By default `java.util.UUID` is returned on `getObject()` method.
+- `UUID` isn't a JDBC standard type. However, it's part of the JDK. By default `java.util.UUID` is returned by the `getObject()` method.
- `UUID` can be read/written as `String` by using `getObject(columnIndex, String.class)` method.
-- `IPv4` and `IPv6` are not JDBC standard types. However they are part of JDK. By default `java.net.Inet4Address` and `java.net.Inet6Address` are returned on `getObject()` method.
+- `IPv4` and `IPv6` aren't JDBC standard types. However, they're part of the JDK. By default `java.net.Inet4Address` and `java.net.Inet6Address` are returned by the `getObject()` method.
- `IPv4` and `IPv6` can be read/written as `String` by using `getObject(columnIndex, String.class)` method.
### Handling Dates, Times, and Timezones {#handling-dates-times-and-timezones}
@@ -444,7 +444,7 @@ properties.setProperty("socket_keepalive", "true");
| Streaming Data With `PreparedStatement` | Supported | Not supported |
- JDBC V2 is implemented to be more lightweight and some features were removed.
- - Streaming Data is not supported in JDBC V2 because it is not part of the JDBC spec and Java.
+ - Streaming Data isn't supported in JDBC V2 because it isn't part of the JDBC specification or standard Java APIs.
- JDBC V2 expects explicit configuration. No failover defaults.
- Protocol should be specified in the URL. No implicit protocol detection using port numbers.
@@ -457,7 +457,7 @@ There are only two enums:
Connection properties are parsed in the following way:
- URL is parsed first for properties. They override all other properties.
-- Driver properties are not passed to the client.
+- Driver properties aren't passed to the client.
- Endpoints (host, port, protocol) are parsed from the URL.
Example:
@@ -545,7 +545,7 @@ try (Connection conn = DriverManager.getConnection(url, properties)) {
- In V2 `Array` is mapped to `java.sql.Array` by default to be compatible with JDBC. This is also done to give more information about returned array value. Useful for type inference.
- In V2 `Array` implements `getResultSet()` method to return `java.sql.ResultSet` with the same content as the original array.
- V1 uses `STRUCT` for `Map` but returns `java.util.Map` object always. V2 fixes this by mapping `Map` to `JAVA_OBJECT`. Besides `STRUCT` is invalid for `Map` because it doesn't have named columns.
-- V1 uses `STRUCT` for `Tuple` but returns `List