
feat: support sort_array expression #3706

Open
grorge123 wants to merge 1 commit into apache:main from grorge123:sort_array


@grorge123

Which issue does this PR close?

Closes #3159.

Rationale for this change

Comet does not currently support the sort_array expression, so any query that uses sort_array(...) falls back to Spark. This PR adds sort_array support so that these queries can run natively.

The SortArray expression sorts the elements of an array in either ascending or descending order.

What changes are included in this PR?

  • Add CometSortArray in arrays.scala to serialize Spark SortArray as DataFusion array_sort.
  • Register SortArray in QueryPlanSerde.scala.
  • Preserve Spark sort semantics:
    • sort_array(arr) / sort_array(arr, true) -> ascending with NULLS FIRST
    • sort_array(arr, false) -> descending with NULLS LAST
  • Mark floating-point array sorting as Incompatible only when spark.comet.exec.strictFloatingPoint=true.
  • Explicitly reject unsupported nested complex cases such as array<array<struct<...>>> at planning time so they cleanly fall back to Spark instead of failing at runtime.
  • Update the supported-expression documentation in spark_expressions_support.md.
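
The null-ordering rules listed above can be modeled in a few lines. This is a minimal sketch in plain Python, not code from this PR: `sort_array_model` is a hypothetical name, and the NULL-array behavior (a NULL input yields NULL) is an assumption based on how Spark's sort_array is generally understood to behave.

```python
from typing import List, Optional


def sort_array_model(
    arr: Optional[List[Optional[int]]], ascending: bool = True
) -> Optional[List[Optional[int]]]:
    """Illustrative model of the ordering described above (hypothetical helper)."""
    if arr is None:
        # Assumption: a NULL array stays NULL rather than becoming an empty array.
        return None
    nulls = [x for x in arr if x is None]
    values = sorted((x for x in arr if x is not None), reverse=not ascending)
    # Ascending places NULLs first; descending places NULLs last.
    return nulls + values if ascending else values + nulls


print(sort_array_model([3, None, 1, 2]))
print(sort_array_model([3, None, 1, 2], ascending=False))
```

Note that descending order here is the exact reverse of ascending order, which is why NULLS FIRST pairs with ascending and NULLS LAST with descending.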

How are these changes tested?

  • Added SQL-file coverage in sort_array.sql for:
    - array
    - array
    - array including NaN, -0.0, and 0.0
    - array<decimal(10,0)>
    - array
    - array<struct<...>>
    - array<array>
    - array literal case
    - empty arrays
    - null arrays
    - explicit ascending / descending paths
    - literal and table-column inputs
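
The NaN, -0.0, and 0.0 cases matter because two reasonable floating-point orderings can disagree on them. The sketch below, in plain Python rather than Comet or Spark code, contrasts a comparator that treats -0.0 and 0.0 as equal with a simplified total-order key that places -0.0 before 0.0. Both helpers are hypothetical, and which ordering each engine actually applies is an assumption that should be checked against the respective sources.

```python
import math
from functools import cmp_to_key


def equal_zero_compare(x: float, y: float) -> int:
    """Comparator where -0.0 == 0.0 and NaN sorts after every other value."""
    if x == y:
        return 0  # also covers -0.0 vs 0.0
    x_nan, y_nan = math.isnan(x), math.isnan(y)
    if x_nan and y_nan:
        return 0
    if x_nan:
        return 1
    if y_nan:
        return -1
    return -1 if x < y else 1


def total_order_key(x: float):
    """Simplified total-order key: -0.0 before 0.0, NaN last."""
    if math.isnan(x):
        return (1, 0.0, 0.0)
    # copysign breaks the tie between -0.0 and 0.0.
    return (0, x, math.copysign(1.0, x))


data = [0.0, float("nan"), -0.0, 1.0, -1.0]
by_equal_zero = sorted(data, key=cmp_to_key(equal_zero_compare))
by_total_order = sorted(data, key=total_order_key)
print(by_equal_zero)
print(by_total_order)
```

With the first comparator the two zeros keep their input order because the sort is stable; with the second, -0.0 always precedes 0.0. A difference of this kind is one plausible reason to gate floating-point arrays behind a strictness flag.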

Reference: https://github.com/apache/spark/blob/04b821c69e85be5f51a1270b3a9a4155afdb5334/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala#L706-L760


Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: sort_array
