diff --git a/integrations/connectors/data-ingestion/kafka/kafka-clickhouse-connect-sink.mdx b/integrations/connectors/data-ingestion/kafka/kafka-clickhouse-connect-sink.mdx
index 1495b545..58a6f7ed 100644
--- a/integrations/connectors/data-ingestion/kafka/kafka-clickhouse-connect-sink.mdx
+++ b/integrations/connectors/data-ingestion/kafka/kafka-clickhouse-connect-sink.mdx
@@ -233,9 +233,7 @@ The connector can consume data from multiple topics
 }
 ```
 
-#### Using with different data formats {#using-with-different-data-formats}
-
-##### Avro schema support {#avro-schema-support}
+### Avro schema support {#avro-schema-support}
 
 ```json
 {
@@ -250,7 +248,7 @@ The connector can consume data from multiple topics
 }
 ```
 
-###### Avro type mapping {#avro-type-mapping}
+#### Avro type mapping {#avro-type-mapping}
 The type mapping below is defined by `io.confluent.connect.avro.AvroConverter`, the official Avro serializer/deserializer implementation in Kafka Connect. See the Kafka Connect [docs](https://docs.confluent.io/platform/current/connect/userguide.html#avro) for advanced information on conversion logic.
 
 ✅: Supported
@@ -259,26 +257,26 @@ The type mapping below is defined by `io.confluent.connect.avro.AvroConverter`,
 
 ️⚠️: Partially supported
 
-| Avro Type | Kafka Connect Type | Supported | Notes |
-|-----------|--------------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| null | _N/A_ | ❌ | Not supported as a standalone type, but can be used in unions |
-| boolean | BOOLEAN | ✅ | |
-| int | INT8/INT16/INT32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has property `connect.type=int8` (analagously for INT16 if `connect.type=int16`) |
-| long | INT64 | ✅ | |
-| float | FLOAT32 | ✅ | |
-| double | FLOAT64 | ✅ | |
-| bytes | BYTES | ✅ | |
-| string | STRING | ✅ | |
-| record | STRUCT | ✅ | |
-| enum | STRING | ✅ | |
-| array | ARRAY/MAP | ✅ | Defaults to ARRAY. Resolves to MAP if the field was originally constructed via `AvroData.fromConnectSchema` ([source](https://github.com/confluentinc/schema-registry/blob/174907bfc0d9424e8d02e788f450f4afcdda1750/avro-data/src/main/java/io/confluent/connect/avro/AvroData.java#L943)) |
-| map | MAP | ✅ | |
-| union | STRUCT/`` | ⚠️ | Defaults to STRUCT. Resolves to the singleton type `T` in the union definition if `flatten.singleton.unions=true` (see [docs](https://docs.confluent.io/cloud/current/connectors/reference/connector-configuration.html#value-converter-flatten-singleton-unions)) |
-| fixed | BYTES | ⚠️ | Fixed `decimal` logical type is not supported (see below) |
+| Avro Type | Kafka Connect Type | Supported | Notes |
+|---|---|---|---|
+| null | _N/A_ | ❌ | Not supported as a standalone type, but can be used in unions |
+| boolean | BOOLEAN | ✅ | |
+| int | INT8/INT16/INT32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has property `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
+| long | INT64 | ✅ | |
+| float | FLOAT32 | ✅ | |
+| double | FLOAT64 | ✅ | |
+| bytes | BYTES | ✅ | |
+| string | STRING | ✅ | |
+| record | STRUCT | ✅ | |
+| enum | STRING | ✅ | |
+| array | ARRAY/MAP | ✅ | Defaults to ARRAY. Resolves to MAP if the field was originally constructed via `AvroData.fromConnectSchema` ([source](https://github.com/confluentinc/schema-registry/blob/174907bfc0d9424e8d02e788f450f4afcdda1750/avro-data/src/main/java/io/confluent/connect/avro/AvroData.java#L943)) |
+| map | MAP | ✅ | |
+| union | STRUCT/`` | ⚠️ | Defaults to STRUCT. Resolves to the singleton type `T` in the union definition if `flatten.singleton.unions=true` (see [docs](https://docs.confluent.io/cloud/current/connectors/reference/connector-configuration.html#value-converter-flatten-singleton-unions)) |
+| fixed | BYTES | ⚠️ | Fixed `decimal` logical type is not supported (see below) |
 
 Refer to [Supported data types](#supported-data-types) for the mapping between Kafka Connect types and ClickHouse types.
 
-###### Unsupported Avro schemas {#unsupported-avro-schemas}
+#### Unsupported Avro schemas {#unsupported-avro-schemas}
 
 The following Avro schemas are unsupported by the connector:
 - fixed `decimal` logical type
@@ -318,7 +316,7 @@ The following Avro schemas are unsupported by the connector:
 }
 ```
 
-##### Protobuf schema support {#protobuf-schema-support}
+### Protobuf schema support {#protobuf-schema-support}
 
 ```json
 {
@@ -335,7 +333,7 @@ The following Avro schemas are unsupported by the connector:
 
 Please note: if you encounter issues with missing classes, not every environment comes with the protobuf converter and you may need an alternate release of the jar bundled with dependencies.
 
-###### Protobuf type mapping {#proto-type-mapping}
+#### Protobuf type mapping {#proto-type-mapping}
 The type mapping below is defined by `io.confluent.connect.protobuf.ProtobufConverter`, the official Protobuf serializer/deserializer implementation in Kafka Connect. See the Kafka Connect [docs](https://docs.confluent.io/platform/current/connect/userguide.html#json-schema-and-protobuf) for advanced information on conversion logic.
 
 ✅: Supported
@@ -344,44 +342,47 @@ The type mapping below is defined by `io.confluent.connect.protobuf.ProtobufConv
 
 ️⚠️: Partially supported
 
-| Protobuf Type | Kafka Connect Type | Supported | Notes |
-|-----------------------------------------|-----------------------------------------|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| double | FLOAT64 | ✅ | |
-| float | FLOAT32 | ✅ | |
-| int32 | INT8/INT16/INT32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
-| sint32 | INT8/INT16/INT32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
-| sfixed32 | INT8/INT16/INT32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
-| uint32 | INT64 | ✅ | |
-| fixed32 | INT64 | ✅ | |
-| int64 | INT64 | ✅ | |
-| uint64 | INT64 | ✅ | |
-| sint64 | INT64 | ✅ | |
-| fixed64 | INT64 | ✅ | |
-| sfixed64 | INT64 | ✅ | |
-| bool | BOOLEAN | ✅ | |
-| string | STRING | ✅ | |
-| bytes | BYTES | ✅ | |
-| enum | INT32/STRING | ✅ | Defaults to STRING. Resolves to INT32 if `int.for.enums=true` (see [schema registry docs](https://docs.confluent.io/platform/current/schema-registry/connect.html#protobuf)) |
-| message | STRUCT | ⚠️ | See Unsupported schemas section below |
-| repeated T (where T is not a map entry) | ARRAY | ✅ | |
-| `map` | MAP | ✅ | |
-| oneof | STRUCT | ⚠️ | See section below on translating oneof to ClickHouse schema |
-| google.protobuf.DoubleValue | FLOAT64 | ✅ | |
-| google.protobuf.FloatValue | FLOAT32 | ✅ | |
-| google.protobuf.Int64Value | INT64 | ✅ | |
-| google.protobuf.UInt64Value | INT64 | ✅ | |
-| google.protobuf.UInt32Value | INT64 | ✅ | |
-| google.protobuf.Int32Value | INT32 | ✅ | |
-| google.protobuf.BoolValue | BOOLEAN | ✅ | |
-| google.protobuf.StringValue | STRING | ✅ | |
-| google.protobuf.BytesValue | BYTES | ✅ | |
-| google.protobuf.Timestamp | org.apache.kafka.connect.data.Timestamp | ✅ | |
-| google.type.Date | org.apache.kafka.connect.data.Date | ✅ | |
-| google.type.TimeOfDay | org.apache.kafka.connect.data.Time | ✅ | |
+| Protobuf Type | Kafka Connect Type | ClickHouse Type | Supported | Notes |
+|:---|:---|:---|:---|:---|
+| double | FLOAT64 | Float64 | ✅ | |
+| float | FLOAT32 | Float32 | ✅ | |
+| int32 | INT8/INT16/INT32 | Int32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
+| sint32 | INT8/INT16/INT32 | Int32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
+| sfixed32 | INT8/INT16/INT32 | Int32 | ✅ | Defaults to INT32. Resolves to INT8 if the schema has option `connect.type=int8` (analogously for INT16 if `connect.type=int16`) |
+| uint32 | INT64 | UInt32 | ✅ | |
+| fixed32 | INT64 | UInt32 | ✅ | |
+| int64 | INT64 | Int64 | ✅ | |
+| uint64 | INT64 | UInt64 | ✅ | |
+| sint64 | INT64 | Int64 | ✅ | |
+| fixed64 | INT64 | UInt64 | ✅ | |
+| sfixed64 | INT64 | Int64 | ✅ | |
+| bool | BOOLEAN | Bool | ✅ | |
+| string | STRING | String | ✅ | |
+| bytes | BYTES | String | ✅ | |
+| enum | INT32/STRING | Int32 | ✅ | Defaults to STRING. Resolves to INT32 if `int.for.enums=true` (see [schema registry docs](https://docs.confluent.io/platform/current/schema-registry/connect.html#protobuf)) |
+| message | STRUCT | Tuple / JSON | ⚠️ | See Unsupported schemas section below |
+| repeated T (where T is not a map entry) | ARRAY | Array(T) | ✅ | |
+| `map` | MAP | Map(K, V) | ✅ | |
+| oneof | STRUCT | Tuple / Variant | ⚠️ | See section below on translating oneof to ClickHouse schema |
+| google.protobuf.DoubleValue | FLOAT64 | Nullable(Float64) | ✅ | |
+| google.protobuf.FloatValue | FLOAT32 | Nullable(Float32) | ✅ | |
+| google.protobuf.Int64Value | INT64 | Nullable(Int64) | ✅ | |
+| google.protobuf.UInt64Value | INT64 | Nullable(UInt64) | ✅ | |
+| google.protobuf.UInt32Value | INT64 | Nullable(UInt32) | ✅ | |
+| google.protobuf.Int32Value | INT32 | Nullable(Int32) | ✅ | |
+| google.protobuf.BoolValue | BOOLEAN | Nullable(Bool) | ✅ | |
+| google.protobuf.StringValue | STRING | Nullable(String) | ✅ | |
+| google.protobuf.BytesValue | BYTES | Nullable(String) | ✅ | |
+| google.protobuf.Timestamp | org.apache.kafka.connect.data.Timestamp | DateTime64(3) | ✅ | |
+| google.type.Date | org.apache.kafka.connect.data.Date | Date | ✅ | |
+| google.type.TimeOfDay | org.apache.kafka.connect.data.Time | Int32 / Int64 | ✅ | |
+| google.protobuf.Duration | STRUCT | Tuple(`seconds` Int64, `nano` Nullable(Int32)) | ✅ | |
+| google.protobuf.Any | _N/A_ | _N/A_ | ❌ | |
+| google.protobuf.Empty | _N/A_ | _N/A_ | ❌ | |
 
 Refer to [Supported data types](#supported-data-types) for the mapping between Kafka Connect types and ClickHouse types.
 
-###### Note on translating `oneof` fields to ClickHouse columns {#oneof-translation}
+#### Note on translating `oneof` fields to ClickHouse columns {#oneof-translation}
 
 The connector does not support translating Protobuf unions (`oneof`) to the ClickHouse Variant type. Instead, list the `oneof` fields as individual nullable fields in your ClickHouse table schema. For example:
 
@@ -409,7 +410,7 @@ CREATE TABLE IF NOT EXISTS `StringIntUnion`
 ) ENGINE = ...;
 ```
 
-###### Unsupported Protobuf schemas {#unsupported-proto-schemas}
+#### Unsupported Protobuf schemas {#unsupported-proto-schemas}
 The following Protobuf schemas are unsupported by the connector:
 - multi-message unions (**before CH version 26.1**)
 ```protobuf
@@ -437,7 +438,7 @@ message TwoRecords {
 
 From CH version 26.1 onwards, this schema is supported when `allow_experimental_nullable_tuple_type=1` (see [this documentation page](/reference/settings/session-settings#allow_experimental_nullable_tuple_type)).
 
-##### JSON schema support {#json-schema-support}
+### JSON schema support {#json-schema-support}
 
 ```json
 {
@@ -450,7 +451,7 @@ From CH version 26.1 onwards, this schema is supported when `allow_experimental_
 }
 ```
 
-##### String support {#string-support}
+### String support {#string-support}
 
 The connector supports the String Converter in different ClickHouse formats: [JSON](/reference/formats/JSON/JSONEachRow), [CSV](/reference/formats/CSV/CSV), and [TSV](/reference/formats/TabSeparated/TabSeparated).