
@Prajwal-banakar
Contributor

Purpose

Linked issue: close #2247

This PR adds support for aggregate functions (specifically MIN and MAX) over complex data types such as ARRAY and ROW. Previously, using these functions on complex types would result in an exception because they were considered incomparable. LAST_VALUE and FIRST_VALUE were already supported but are now verified to work with complex types.

Brief change log

  • Updated InternalRowUtils.compare() to support recursive comparison for ARRAY and ROW types.
  • Updated FieldMinAgg and FieldMaxAgg to use the new InternalRowUtils.compare() method that accepts DataType (preserving nested type information) instead of just DataTypeRoot.
  • Added ComplexTypeAggregationTest to verify LAST_VALUE, MIN, and MAX aggregations on ARRAY and ROW types.
  • Note: MAP type comparison is explicitly not supported and will throw an exception, consistent with SQL standards.
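The recursive comparison described in the change log behaves like lexicographic, element-wise ordering. The snippet below is a simplified, hypothetical illustration for plain int arrays, not the actual `InternalRowUtils.compare()` code, which recurses on the element `DataType`:

```java
// Simplified sketch (not actual Fluss code) of element-wise array
// comparison: compare shared elements pairwise, then fall back to
// length, i.e. lexicographic ordering.
public class ArrayCompareSketch {
    public static int compareIntArrays(int[] left, int[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(left[i], right[i]);
            if (c != 0) {
                return c;
            }
        }
        // All shared elements are equal: the shorter array sorts first.
        return Integer.compare(left.length, right.length);
    }
}
```

For nested ROW types the real implementation applies the same idea field by field, recursing into each field's type.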

Tests

  • Added ComplexTypeAggregationTest which includes:
    • testArrayLastValue: Verifies LAST_VALUE on ARRAY<INT>.
    • testArrayMinMax: Verifies MIN/MAX on ARRAY<INT>.
    • testRowMinMax: Verifies MIN/MAX on nested ROW<INT, STRING>.

API and Format

  • No API or storage format changes. This only enhances the runtime capability of existing aggregation functions.

Documentation

  • No documentation changes needed as this supports standard SQL behavior for these types.

@Prajwal-banakar
Contributor Author

Hi @wuchong, I've submitted the PR to support aggregate functions over complex data types.

The current CI failure is in FlinkUnionReadPrimaryKeyTableITCase; it appears to be caused by hardcoded year values (2025) in the test cases, which now mismatch because it is 2026.

@wuchong
Member

wuchong commented Jan 4, 2026

Hi @Prajwal-banakar, thank you for the contribution!

Just a quick note: the test failure in FlinkUnionReadPrimaryKeyTableITCase has already been resolved by PR #2295.

Regarding this PR, MAX and MIN are not intended to support complex data types such as ARRAY, MAP, or ROW. These types are not orderable, and therefore Fluss (like other engines such as Apache Spark and Flink) cannot define a consistent total ordering for them. As a result, using MAX/MIN on such columns should be disallowed. The FIP also declares the supported types for MAX/MIN: https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine
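The lack of a natural ordering shows up even in plain Java, where arrays and maps do not implement Comparable. The check below is a minimal illustration of that point, not Fluss code:

```java
import java.util.Map;

// Illustration only: Java defines no natural ordering for arrays or
// maps, mirroring why MIN/MAX over ARRAY or MAP has no consistent
// total ordering.
public class OrderingCheck {
    // Returns true only for values with a natural ordering.
    public static boolean isComparable(Object value) {
        return value instanceof Comparable;
    }
}
```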

Moreover, the original issue #2247 was actually aimed at supporting aggregation functions that return complex types, for example ARRAY_AGG, which returns an ARRAY type.

As a gentle reminder: to avoid wasted effort, we encourage contributors to discuss the design and scope with committers in the GitHub issue and request to be assigned before opening a PR. This is part of our official contribution process, which helps ensure alignment and smoother reviews.

Thanks again for your engagement with Fluss!

@wuchong
Member

wuchong commented Jan 4, 2026

One important point you raised and that we should address is early type validation. We should fail fast during table creation if an aggregation function is applied to an unsupported data type. Specifically, in:

org.apache.fluss.server.utils.TableDescriptorValidation#validateAggregationFunctionParameters

we should validate that the column type is within the supported data types for the given aggregation function. If not, we should throw a clear error immediately. This validation also needs dedicated test coverage. I think this is what was missing from the last pull request, right? @platinumhamburg
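A fail-fast check of this shape could look roughly like the following. The class, enum, and method names here are hypothetical stand-ins for illustration, not the real Fluss API:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of fail-fast validation at table-creation time:
// reject MIN/MAX on non-orderable column types with a clear error.
public class AggTypeValidationSketch {
    enum TypeRoot { INT, BIGINT, STRING, ARRAY, MAP, ROW }

    // Illustrative subset of orderable types for MIN/MAX.
    private static final Set<TypeRoot> ORDERABLE =
            EnumSet.of(TypeRoot.INT, TypeRoot.BIGINT, TypeRoot.STRING);

    // Throws immediately if the column type is unsupported for MIN/MAX.
    public static void validateMinMaxColumn(String column, TypeRoot type) {
        if (!ORDERABLE.contains(type)) {
            throw new IllegalArgumentException(
                    "Aggregation function MIN/MAX does not support column '"
                            + column + "' of type " + type);
        }
    }
}
```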

@Prajwal-banakar
Contributor Author

Hi @wuchong,

Thank you for the detailed explanation! I now understand the reasoning behind not supporting MIN/MAX for complex types like ARRAY or ROW due to the lack of total ordering. I also appreciate the clarification on the original intent of #2247.

As a beginner in the codebase, I'd like to help implement the early type validation you mentioned in TableDescriptorValidation#validateAggregationFunctionParameters.

To make sure I stay on the right track:

Should I pivot this PR to focus on adding those "fail-fast" validations and tests?

Or would you prefer I close this PR and open a new one specifically for the validation logic?

I will make sure to discuss design in the issue tracker moving forward to better align with the project's contribution process. Thanks for your patience!

@wuchong
Member

wuchong commented Jan 4, 2026

@Prajwal-banakar Yeah, I think it’s best to close this issue and open a new one specifically for that purpose, along with a dedicated pull request. This will help keep the scope clear and the discussion focused.

@wuchong
Member

wuchong commented Jan 4, 2026

@Prajwal-banakar I created #2302 for this problem. You can comment in that issue and I can assign it to you.

@wuchong
Member

wuchong commented Jan 4, 2026

Closing this issue as discussed.

@wuchong wuchong closed this Jan 4, 2026
@Prajwal-banakar Prajwal-banakar deleted the Task/complex-type-aggregation branch January 4, 2026 14:46
@platinumhamburg
Contributor

> One important point you raised and that we should address is early type validation. We should fail fast during table creation if an aggregation function is applied to an unsupported data type. Specifically, in:
>
> org.apache.fluss.server.utils.TableDescriptorValidation#validateAggregationFunctionParameters
>
> we should validate that the column type is within the supported data types for the given aggregation function. If not, we should throw a clear error immediately. This validation also needs dedicated test coverage. I think this is what was missing from the last pull request, right? @platinumhamburg

Yes, the previous PR indeed missed the strict type validation.
