Support mapping to Postgres JSON type#230

Merged
theory merged 1 commit into main from binary-json-both on May 11, 2026

@theory theory commented May 6, 2026

In addition to the JSONB type mapping, add support for mapping the Postgres JSON type to the ClickHouse JSON type. This already worked for the http driver but had not been tested; adding it to the binary driver required a bit more work.

Mainly, that work consists of making the binary type handling functions aware of the Postgres type used in columns. By default, the binary driver supports a short list of types that map to ClickHouse types, but allows for other types by using the functions in `convert.c` to convert between compatible types. These conversions require casts. And while Postgres supports casts between JSON and JSONB, they're `COERCION_PATH_COERCEVIAIO` casts, which `convert.c` doesn't handle.

We could consider adding support for these conversions, but since it requires using the _in and _out functions to do the conversion, and since for JSON all we really need is the text version, this seems unnecessarily wasteful in terms of CPU cycles and memory.

Instead, refactor things so that the binary conversion functions are aware of not only explicitly supported type mappings provided by `get_corr_postgres_type()`, but also the types used for the columns in the foreign tables. Then teach `get_corr_postgres_type()` to inspect these values in the context of the ClickHouse JSON type and allow both.
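
A minimal sketch of that lookup, under simplifying assumptions: `pick_pg_type()` and the `SK_*` names are hypothetical stand-ins, not the real `get_corr_postgres_type()` signature, and falling back to JSONB when the declared type is unknown is an assumption. The OID values mirror Postgres's built-in `json` (114), `jsonb` (3802), and `text` (25) types.

```c
typedef unsigned int Oid;

/* Stand-ins for Postgres type OIDs; the SK_ names are hypothetical. */
#define SK_INVALID_OID 0u
#define SK_TEXTOID     25u
#define SK_JSONOID     114u
#define SK_JSONBOID    3802u

/* Simplified ClickHouse type tags (hypothetical). */
typedef enum { SK_CH_STRING, SK_CH_JSON } sk_ch_type;

/*
 * Return the Postgres type to produce for a ClickHouse column.
 * `declared` is the foreign table column's type, or SK_INVALID_OID
 * when it is not known.
 */
Oid
pick_pg_type(sk_ch_type ch, Oid declared)
{
    switch (ch)
    {
        case SK_CH_JSON:
            /* Honor a declared json column; otherwise assume jsonb. */
            return declared == SK_JSONOID ? SK_JSONOID : SK_JSONBOID;
        case SK_CH_STRING:
            return SK_TEXTOID;
    }
    return SK_INVALID_OID;
}
```

The point of the extra argument is that the ClickHouse type alone no longer determines the Postgres type: a ClickHouse JSON column can legitimately surface as either `json` or `jsonb`.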

For fetches (selects), on the first call to `ch_binary_read_row()`, populate `coltypes` with the actual column types and teach `make_datum()` to examine it when considering JSON types. This requires that the `List` of attributes be passed to `ch_binary_read_row()`, and also that `coltypes` be initialized with zero values. Ideally `coltypes` would be populated by an earlier hook, before fetching starts, but this will do for now.
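
The populate-on-first-read pattern could look roughly like this. It is a standalone sketch: `sk_read_state` and the `sk_*` functions are hypothetical, the real driver state differs, and the sketch assumes declared column types are never zero so that a zero entry can mean "not yet populated".

```c
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;

/* Hypothetical read state; the real driver's struct differs. */
typedef struct
{
    size_t  ncols;
    Oid    *coltypes;   /* all zero until the first row is read */
} sk_read_state;

/* Allocate state with coltypes zeroed so 0 means "not yet populated". */
sk_read_state *
sk_state_new(size_t ncols)
{
    sk_read_state *st = malloc(sizeof(*st));

    if (st == NULL)
        return NULL;
    st->ncols = ncols;
    st->coltypes = calloc(ncols, sizeof(Oid));
    if (st->coltypes == NULL)
    {
        free(st);
        return NULL;
    }
    return st;
}

/*
 * Called at the top of each row read. Copies the declared column types
 * once; later calls see a non-zero first entry and return immediately.
 * Returns 1 if it populated the array, 0 otherwise.
 */
int
sk_ensure_coltypes(sk_read_state *st, const Oid *declared)
{
    if (st->ncols == 0 || st->coltypes[0] != 0)
        return 0;
    for (size_t i = 0; i < st->ncols; i++)
        st->coltypes[i] = declared[i];
    return 1;
}

void
sk_state_free(sk_read_state *st)
{
    free(st->coltypes);
    free(st);
}
```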

For updates (inserts), on the other hand, create a new `List` and populate it with the requisite types in `clickhouseBeginForeignInsert` and `clickhousePlanForeignModify` (not sure why it needs both, but it follows the pattern of setting column attribute numbers, which are also in both places). Then pass it as a new argument to the `prepare_insert` function, now with an updated signature (the HTTP version currently ignores the new argument). Then fetch the types from it in `ch_binary_prepare_insert()` to pass to `get_corr_postgres_type()`.
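
As an illustration of the shared signature, here is a sketch in which both drivers accept the column types but only the binary one uses them. All names (`sk_insert_state`, `sk_prepare_insert_fn`, and the two functions) are hypothetical, and a plain array stands in for the Postgres `List`.

```c
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical insert state shared by both drivers. */
typedef struct
{
    Oid first_coltype;  /* what the driver saw, for illustration only */
    int used_types;     /* 1 if the driver consulted coltypes */
} sk_insert_state;

/* One signature for every driver: the column types are always passed. */
typedef void (*sk_prepare_insert_fn) (sk_insert_state *st,
                                      const Oid *coltypes, size_t ncols);

/* Binary driver: reads the types to choose a per-column mapping. */
void
sk_binary_prepare_insert(sk_insert_state *st, const Oid *coltypes, size_t ncols)
{
    st->used_types = 1;
    st->first_coltype = ncols > 0 ? coltypes[0] : 0;
}

/* HTTP driver: accepts the argument to match the signature, ignores it. */
void
sk_http_prepare_insert(sk_insert_state *st, const Oid *coltypes, size_t ncols)
{
    (void) coltypes;
    (void) ncols;
    st->used_types = 0;
    st->first_coltype = 0;
}
```

Keeping one signature lets callers invoke either driver through the same function pointer without knowing which one is active.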

Teach `init_output_convert_state()` not to try to convert between `JSON` and `JSONB` types, and add comments to explain what's happening with all this type management stuff.
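
The check involved can be sketched as a small predicate. This is an assumption about the shape of the logic, not the actual body of `init_output_convert_state()`; `sk_is_json_kind()` and `sk_needs_cast()` are hypothetical names.

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Postgres OIDs for json and jsonb; the SK_ names are hypothetical. */
#define SK_JSONOID  114u
#define SK_JSONBOID 3802u

bool
sk_is_json_kind(Oid t)
{
    return t == SK_JSONOID || t == SK_JSONBOID;
}

/*
 * Decide whether a cast between two types must be looked up.
 * Identical types never need one, and the json/jsonb pair is skipped
 * because both are handled directly from their text form.
 */
bool
sk_needs_cast(Oid from, Oid to)
{
    if (from == to)
        return false;
    if (sk_is_json_kind(from) && sk_is_json_kind(to))
        return false;
    return true;
}
```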

Finally, update the function and operator pushdown functions to support the `->` and `->>` operators for JSON, as well as the `json_extract_path_text()` and `json_extract_path()` functions. Rename the relevant constants to `JSON_*` instead of `JSONB_*`, since they now handle both, and `JSON_*` is the more generic.
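
To illustrate why type-neutral constant names fit, the sketch below maps the `json_` and `jsonb_` variants of each function to a single constant. The enum and `sk_classify_json_func()` are hypothetical, and matching on function names is an assumption made to keep the sketch standalone; it does not show how the pushdown code identifies functions or what ClickHouse expression each one becomes.

```c
#include <string.h>

/* Hypothetical constants, one per function regardless of json/jsonb. */
typedef enum
{
    SK_FUNC_NONE = 0,
    SK_JSON_EXTRACT_PATH,
    SK_JSON_EXTRACT_PATH_TEXT
} sk_json_func;

/* Map a Postgres function name to the shared constant, if any. */
sk_json_func
sk_classify_json_func(const char *name)
{
    if (strcmp(name, "json_extract_path") == 0 ||
        strcmp(name, "jsonb_extract_path") == 0)
        return SK_JSON_EXTRACT_PATH;
    if (strcmp(name, "json_extract_path_text") == 0 ||
        strcmp(name, "jsonb_extract_path_text") == 0)
        return SK_JSON_EXTRACT_PATH_TEXT;
    return SK_FUNC_NONE;
}
```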

Expand the JSON tests to cover all the same patterns for JSON that were previously covered for JSONB, for both the binary and http drivers. This requires additional alternate expected output files because earlier ClickHouse versions did not support JSON and returned varying error messages about it.

Document the support for JSON, as well as the operators and functions. In fact, the `jsonb_()` functions weren't previously documented, so add them, too. Also fix reversed descriptions for the `->` and `->>` operators and mention that they work for both JSON and JSONB.

While at it, update the exception raised when `column_append()` can't handle a specific Postgres type to emit the name of the unsupported type.

@theory theory self-assigned this May 6, 2026
@theory theory added the enhancement, pushdown, data types, casts, functions, drivers, and operators labels May 6, 2026
@theory theory requested a review from serprex May 6, 2026 19:17
@theory theory marked this pull request as ready for review May 6, 2026 19:17
@theory theory force-pushed the binary-json-both branch from 1006c1e to 61e68c6 Compare May 6, 2026 19:39
Comment thread src/convert.c
Comment thread src/convert.c Outdated
@theory theory force-pushed the binary-json-both branch from 61e68c6 to 253fdca Compare May 6, 2026 20:18
@theory theory requested a review from serprex May 6, 2026 20:19
@theory theory force-pushed the binary-json-both branch from 3d75450 to 4a37d3c Compare May 8, 2026 20:03
Base automatically changed from binary-json to main May 11, 2026 15:10
@theory theory merged commit 4a37d3c into main May 11, 2026
36 checks passed
@theory theory deleted the binary-json-both branch May 11, 2026 15:13
Labels

casts, data types, drivers, enhancement, functions, operators, pushdown

2 participants