Skip to content

PushDownLeafProjections error with Unnest nodes #22615

@pabadrubio

Description

@pabadrubio

Describe the bug

PushDownLeafProjections throws an error when it tries to push down an expression through an Unnest node. For example, given this table:

CREATE TABLE struct_and_list_table
AS VALUES
    (struct(1, 2), [10, 20, 30]),
    (struct(3, 4), [40, 50]);

When trying to run this query:

SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
      FROM (SELECT column1 AS s, column2 AS arr
            FROM struct_and_list_table));

the optimizer throws this error:

Internal error: Assertion failed: expr.is_empty(): Unnest(Unnest { input: Projection(Projection { expr: [Alias(Alias { expr: Column(Column { relation: Some(Bare { table: "struct_and_list_table" }), name: "column1" }), relation: None, name: "s", metadata: None }), Alias(Alias { expr: Column(Column { relation: Some(Bare { table: "struct_and_list_table" }), name: "column2" }), relation: None, name: "__unnest_placeholder(arr)", metadata: None })], input: TableScan(TableScan { table_name: Bare { table: "struct_and_list_table" }, source: "...", projection: Some([0, 1]), projected_schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "column2", data_type: List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [Some(Bare { table: "struct_and_list_table" }), Some(Bare { table: "struct_and_list_table" })], functional_dependencies: FunctionalDependencies { deps: [] } }, filters: [], fetch: None, .. }), schema: DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "__unnest_placeholder(arr)", data_type: List(Field { data_type: Int64, nullable: true }), nullable: true }], metadata: {} }, field_qualifiers: [None, None], functional_dependencies: FunctionalDependencies { deps: [] } } }), exec_columns: [Column { relation: None, name: "__unnest_placeholder(arr)" }], list_type_columns: [(1, ColumnUnnestList { output_column: Column { relation: None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 })], struct_type_columns: [], dependency_indices: [0, 1], schema: DFSchema { inner: Schema { fields: [Field { name: "s", data_type: Struct([Field { name: "c0", data_type: Int64, nullable: true }, Field { name: "c1", data_type: Int64, nullable: true }]), nullable: true }, Field { name: "__unnest_placeholder(arr,depth=1)", data_type: Int64, nullable: true }], metadata: {} }, field_qualifiers: [None, None], functional_dependencies: FunctionalDependencies { deps: [] } }, options: UnnestOptions { preserve_nulls: false, recursions: [RecursionUnnestOption { input_column: Column { relation: None, name: "__unnest_placeholder(arr)" }, output_column: Column { relation: None, name: "__unnest_placeholder(arr,depth=1)" }, depth: 1 }] } }) should have no exprs, got [Column(Column { relation: None, name: "__unnest_placeholder(arr)" })].

The problem is in this line in function try_push_into_inputs in extract_leaf_expressions.rs:

let new_node = node.with_new_exprs(node.expressions(), new_inputs)?;

This doesn't work with Unnest nodes, as these nodes return the columns to unnest in the expression() functions, but don't accept them in the with_new_expressions, contrary to other nodes.

But there is a secondary problem that would appear when the leaf expression refers to the unnested column, like in this case:

CREATE TABLE list_struct_table
AS VALUES
    ([struct(1, 'a'), struct(2, 'b')]),
    ([struct(3, 'c')]);

SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;

The problem is in the logic that try_push_into_inputs uses to verify that an expression can be pushed through a node. It needs to make sure that the expressions to push through a node are in the input of that node to allow it to be pushed through. The problem is that for Unnest, the columns being unnest are going to have the same name as the input columns. Because the check compares just the names, it assumes this means that the columns referred in the expression are in the Unnest input and the check passes. But these are not the columns that the expression refers to. The expression is referring to the unnested columns, not to the input list columns, and therefore we can't push the expressions.

To Reproduce

First case:

CREATE TABLE struct_and_list_table
AS VALUES
    (struct(1, 2), [10, 20, 30]),
    (struct(3, 4), [40, 50]);

SELECT sum(get_field(s, 'c0'))
FROM (SELECT s, unnest(arr)
      FROM (SELECT column1 AS s, column2 AS arr
            FROM struct_and_list_table));

Second case:

CREATE TABLE list_struct_table
AS VALUES
    ([struct(1, 'a'), struct(2, 'b')]),
    ([struct(3, 'c')]);

query IT
SELECT get_field(unnest(column1), 'c0'), get_field(unnest(column1), 'c1')
FROM list_struct_table;

Both queries fail with:

Internal error: Assertion failed: expr.is_empty():

Expected behavior

Both queries work.
First query result:

9

Second query result:

1 a
2 b
3 c

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions