Skip to content

Comments

reject pipe with empty tuple during validation#3972

Open
jgao54 wants to merge 1 commit intomainfrom
handle-empty-tuple
Open

reject pipe with empty tuple during validation#3972
jgao54 wants to merge 1 commit intomainfrom
handle-empty-tuple

Conversation

@jgao54
Copy link
Contributor

@jgao54 jgao54 commented Feb 20, 2026

In ClickHouse 25.12, ORDER BY TUPLE() is no longer supported. This has never worked properly in the past because snapshot would end up collapsing into a single row when PK is not specified in source database. So this PR will throw a validation error when order by empty tuple is detected.

The scenarios where empty tuple is allowed:

  • NullEngine: since it just forwards the data to MV without persisting them (the empty tuple error only applies to MergeTree)
  • When allow_suspicious_primary_key is enabled

@codecov
Copy link

codecov bot commented Feb 20, 2026

❌ 6 Tests Failed:

Tests completed Failed Passed Skipped
1651 6 1645 167
View the top 3 failed test(s) by shortest run time
github.com/PeerDB-io/peerdb/flow/e2e::TestApiMy
Stack Traces | 0.01s run time
=== RUN   TestApiMy
=== PAUSE TestApiMy
=== CONT  TestApiMy
--- FAIL: TestApiMy (0.01s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg
Stack Traces | 0.01s run time
=== RUN   TestApiPg
=== PAUSE TestApiPg
=== CONT  TestApiPg
--- FAIL: TestApiPg (0.01s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
Stack Traces | 0.02s run time
=== RUN   TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
    api_test.go:368: 
        	Error Trace:	.../flow/e2e/api_test.go:368
        	Error:      	Received unexpected error:
        	            	rpc error: code = FailedPrecondition desc = failed to validate destination connector e2e_test_api_io1qkkxr_api_io1qkkxr: source table e2e_test_api_io1qkkxr.no_pkey has no primary key columns; add a primary key to the source table, specify custom ordering columns, or set allow_suspicious_primary_key=1 in your ClickHouse server profile
        	Test:       	TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
--- FAIL: TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering (0.02s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
Stack Traces | 0.05s run time
=== RUN   TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
2026/02/20 07:31:19 INFO Executing and processing query x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id"
2026/02/20 07:31:19 INFO Executing and processing query stream x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id"
2026/02/20 07:31:19 INFO [pg_query_executor] declared cursor x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart cursorQuery="DECLARE peerdb_cursor_16660648387943637927 CURSOR FOR SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id" args=[]
2026/02/20 07:31:19 INFO [pg_query_executor] fetching rows start x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id" channelLen=0
2026/02/20 07:31:19 INFO [pg_query_executor] fetching from cursor x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart cursor=peerdb_cursor_16660648387943637927
2026/02/20 07:31:19 INFO processed row stream x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart cursor=peerdb_cursor_16660648387943637927 records=1 bytes=9 channelLen=0
2026/02/20 07:31:19 INFO [pg_query_executor] fetched rows x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id" rows=1 bytes=9 channelLen=0
2026/02/20 07:31:19 INFO [pg_query_executor] fetching from cursor x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart cursor=peerdb_cursor_16660648387943637927
2026/02/20 07:31:19 INFO processed row stream x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart cursor=peerdb_cursor_16660648387943637927 records=0 bytes=0 channelLen=0
2026/02/20 07:31:19 INFO [pg_query_executor] fetched rows x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id" rows=0 bytes=0 channelLen=0
2026/02/20 07:31:19 INFO [pg_query_executor] committing transaction x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart
2026/02/20 07:31:19 INFO [pg_query_executor] committed transaction for query x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} partitionId=testpart query="SELECT id,good_column FROM e2e_test_sf_4uk1e0pp_20260220073106.\"test_lost_column_bug\" ORDER BY id" rows=1 bytes=9 channelLen=0
    api_test.go:368: 
        	Error Trace:	.../flow/e2e/api_test.go:368
        	Error:      	Received unexpected error:
        	            	rpc error: code = FailedPrecondition desc = failed to validate destination connector e2e_test_api_ec20rlmv_api_ec20rlmv: source table e2e_test_api_ec20rlmv.no_pkey has no primary key columns; add a primary key to the source table, specify custom ordering columns, or set allow_suspicious_primary_key=1 in your ClickHouse server profile
        	Test:       	TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering
--- FAIL: TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey/Null_engine_allows_empty_ordering (0.05s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey
Stack Traces | 0.08s run time
=== RUN   TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey
=== PAUSE TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey
=== CONT  TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey
2026/02/20 07:34:35 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/02/20 07:34:35 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
    api_test.go:45: begin tearing down postgres schema api_io1qkkxr
--- FAIL: TestApiMy/TestClickHouseMirrorValidation_NoPrimaryKey (0.08s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey
Stack Traces | 0.15s run time
=== RUN   TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey
=== PAUSE TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey
=== CONT  TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey
2026/02/20 07:31:19 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/02/20 07:31:19 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
    api_test.go:45: begin tearing down postgres schema api_ec20rlmv
--- FAIL: TestApiPg/TestClickHouseMirrorValidation_NoPrimaryKey (0.15s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@jgao54 jgao54 force-pushed the handle-empty-tuple branch from eb124bf to dd0c3aa Compare February 20, 2026 07:23
@jgao54 jgao54 force-pushed the handle-empty-tuple branch from dd0c3aa to b81f1a5 Compare February 20, 2026 08:21
@jgao54 jgao54 marked this pull request as ready for review February 20, 2026 08:23
@jgao54 jgao54 requested review from ilidemi and serprex February 20, 2026 08:24
slices.ContainsFunc(tableMapping.Columns, func(col *protos.ColumnSetting) bool {
return col.Ordering > 0
})
if !hasOrderingKeys && tableMapping.Engine != protos.TableEngine_CH_ENGINE_NULL {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& tableMapping.Engine != protos.TableEngine_CH_ENGINE_MERGE_TREE
(if we're following the breaking change)

var settingVal string
if err := QueryRow(ctx, logger, conn,
"SELECT value FROM system.settings WHERE name = 'allow_suspicious_primary_key'",
).Scan(&settingVal); err == nil && settingVal == "1" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A network error here would result in a confusing message

}

return fmt.Errorf(
"source table %s has no primary key columns; add a primary key to the source table, "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't really know about primary keys on the source, it only checks a setting? It also implicitly depends on the caller validating the dest sorting keys and table engine

Comment on lines +83 to +86
hasOrderingKeys := len(processedSchema.PrimaryKeyColumns) > 0 ||
slices.ContainsFunc(tableMapping.Columns, func(col *protos.ColumnSetting) bool {
return col.Ordering > 0
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the behavior be for missing primary key, missing sorting keys but CDC on and replica identity full? As I understand it, setup flow would just take all columns as src primary key + dest sorting key in this case, and this change would validate away that scenario (but I might be wrong). Not sure if there's a way to unify all that logic somehow, possible there's not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants