Describe the bug
PushDownLeafProjections throws an error when it tries to push down an expression through an Unnest node. For example, given this table:
CREATE TABLE struct_and_list_table
AS VALUES
(struct(1, 2), [10, 20, 30]),
(struct(3, 4), [40, 50]);
When trying to run this query:
SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
FROM (SELECT column1 AS s, column2 AS arr
FROM struct_and_list_table));
the optimizer throws this error:
Internal error: Assertion failed: expr.is_empty(): Unnest(Unnest { input: Projection(Projection { expr: [Alias(Alias { expr: Column(Column { relation: Some(Bare { table: "struct_and_list_table" }), name: "column1" }), relation: None, name: "s", metadata: None }), Alias(Alias { expr: Column(Column { relation: Some(Bare { table: "struct_and_list_table" }), name: "column2" }), relation: None, name: "__unnest_placeholder(arr)", metadata: None })], input: TableScan(TableScan { table_name: Bare { table: "struct_and_list_table" }, source: "...", projection: Some([0, 1]), projected_schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "column2", data_type: List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [Some(Bare { table: "struct_and_list_table" }), Some(Bare { table: "struct_and_list_table" })], functional_dependencies: FunctionalDependencies { deps: [] } }, filters: [], fetch: None, .. }), schema: DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "__unnest_placeholder(arr)", data_type: List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [None, None], functional_dependencies: FunctionalDependencies { deps: [] } } }), exec_columns: [Column { relation: None, name: "__unnest_placeholder(arr)" }], list_type_columns: [(1, ColumnUnnestList { output_column: Column { relation: None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 })], struct_type_columns: [], dependency_indices: [0, 1], schema: DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "__unnest_placeholder(arr,depth=1)", data_type: Int64, nullable: true }], metadata: {} }, field_qualifiers: [None, None], functional_dependencies: FunctionalDependencies { deps: [] } }, options: UnnestOptions { preserve_nulls: false, recursions: [RecursionUnnestOption { input_column: Column { relation: None, name: "__unnest_placeholder(arr)" }, output_column: Column { relation: None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 }] } }) should have no exprs, got [Column(Column { relation: None, name: "__unnest_placeholder(arr)" })].
The problem is in this line in function try_push_into_inputs in extract_leaf_expressions.rs:
let new_node = node.with_new_exprs(node.expressions(), new_inputs)?;
This doesn't work with Unnest nodes, as these nodes return the columns to unnest in the expression() functions, but don't accept them in the with_new_expressions, contrary to other nodes.
But there is a secondary problem that would appear when the leaf expression refers to the unnested column, like in this case:
CREATE TABLE list_struct_table
AS VALUES
([struct(1, 'a'), struct(2, 'b')]),
([struct(3, 'c')]);
SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;
The problem is in the logic that try_push_into_inputs uses to verify that an expression can be pushed through a node. It needs to make sure that the expressions to push through a node are in the input of that node to allow it to be pushed through. The problem is that for Unnest, the columns being unnest are going to have the same name as the input columns. Because the check compares just the names, it assumes this means that the columns referred in the expression are in the Unnest input and the check passes. But these are not the columns that the expression refers to. The expression is referring to the unnested columns, not to the input list columns, and therefore we can't push the expressions.
To Reproduce
First case:
CREATE TABLE struct_and_list_table
AS VALUES
(struct(1, 2), [10, 20, 30]),
(struct(3, 4), [40, 50]);
SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
FROM (SELECT column1 AS s, column2 AS arr
FROM struct_and_list_table));
Second case:
CREATE TABLE list_struct_table
AS VALUES
([struct(1, 'a'), struct(2, 'b')]),
([struct(3, 'c')]);
query IT
SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;
Both queries fail with:
Internal error: Assertion failed: expr.is_empty():
Expected behavior
Both queries work.
First query result:
Second query result:
Additional context
No response
Describe the bug
PushDownLeafProjectionsthrows an error when it tries to push down an expression through anUnnestnode. For example, given this table:When trying to run this query:
the optimizer throws this error:
The problem is in this line in function
try_push_into_inputsinextract_leaf_expressions.rs:This doesn't work with
Unnestnodes, as these nodes return the columns to unnest in theexpression()functions, but don't accept them in thewith_new_expressions, contrary to other nodes.But there is a secondary problem that would appear when the leaf expression refers to the unnested column, like in this case:
The problem is in the logic that
try_push_into_inputsuses to verify that an expression can be pushed through a node. It needs to make sure that the expressions to push through a node are in the input of that node to allow it to be pushed through. The problem is that for Unnest, the columns being unnest are going to have the same name as the input columns. Because the check compares just the names, it assumes this means that the columns referred in the expression are in theUnnestinput and the check passes. But these are not the columns that the expression refers to. The expression is referring to the unnested columns, not to the input list columns, and therefore we can't push the expressions.To Reproduce
First case:
Second case:
Both queries fail with:
Expected behavior
Both queries work.
First query result:
Second query result:
Additional context
No response