-
Notifications
You must be signed in to change notification settings - Fork 470
Add support for aggregate functions over complex data types #2296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for aggregate functions over complex data types #2296
Conversation
|
"Hi @wuchong, I've submitted the PR to support aggregate functions over complex data types. The current CI failure in FlinkUnionReadPrimaryKeyTableITCase. It appears to be caused by hardcoded year values (2025) in the test cases that are now mismatching because it is 2026. |
|
Hi @Prajwal-banakar, thank you for the contribution! Just a quick note: the test failure in Regarding this PR, Moreover, the original issue #2247 was actually aimed at supporting aggregation functions that operate on complex types, for example, the As a gentle reminder: to avoid wasted effort, we encourage contributors to discuss the design and scope with committers in the GitHub issue and request to be assigned before opening a PR. This is part of our official contribution process, which helps ensure alignment and smoother reviews. Thanks again for your engagement with Fluss! |
|
One important point you raised and that we should address is early type validation. We should fail fast during table creation if an aggregation function is applied to an unsupported data type. Specifically, in: org.apache.fluss.server.utils.TableDescriptorValidation#validateAggregationFunctionParameterswe should validate that the column type is within the supported data types for the given aggregation function. If not, we should throw a clear error immediately. This validation also needs dedicated test coverage. I think this is the missing thing in last pull request, right? @platinumhamburg |
|
Hi @wuchong, Thank you for the detailed explanation! I now understand the reasoning behind not supporting MIN/MAX for complex types like ARRAY or ROW due to the lack of total ordering. I also appreciate the clarification on the original intent of #2247. As a beginner in the codebase, I'd like to help implement the early type validation you mentioned in TableDescriptorValidation#validateAggregationFunctionParameters. To make sure I stay on the right track: Should I pivot this PR to focus on adding those "fail-fast" validations and tests? Or would you prefer I close this PR and open a new one specifically for the validation logic? I will make sure to discuss design in the issue tracker moving forward to better align with the project's contribution process. Thanks for your patience! |
|
@Prajwal-banakar Yeah, I think it’s best to close this issue and open a new one specifically for that purpose, along with a dedicated pull request. This will help keep the scope clear and the discussion focused. |
|
@Prajwal-banakar I created #2302 for this problem. You can comment in that issue and I can assign it to you. |
|
Closing this issue as discussed. |
Yes, the previous PR indeed missed the strict type validation. |
Purpose
Linked issue: close #2247
This PR adds support for aggregate functions (specifically
MINandMAX) over complex data types such asARRAYandROW. Previously, using these functions on complex types would result in an exception because they were considered incomparable.LAST_VALUEandFIRST_VALUEwere already supported but are now verified to work with complex types.Brief change log
InternalRowUtils.compare()to support recursive comparison forARRAYandROWtypes.FieldMinAggandFieldMaxAggto use the newInternalRowUtils.compare()method that acceptsDataType(preserving nested type information) instead of justDataTypeRoot.ComplexTypeAggregationTestto verifyLAST_VALUE,MIN, andMAXaggregations onARRAYandROWtypes.MAPtype comparison is explicitly not supported and will throw an exception, consistent with SQL standards.Tests
ComplexTypeAggregationTestwhich includes:testArrayLastValue: VerifiesLAST_VALUEonARRAY<INT>.testArrayMinMax: VerifiesMIN/MAXonARRAY<INT>.testRowMinMax: VerifiesMIN/MAXon nestedROW<INT, STRING>.API and Format
Documentation