Handle Object-returning scalar transform functions#18628
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18628 +/- ##
============================================
- Coverage 64.40% 64.38% -0.02%
Complexity 1137 1137
============================================
Files 3337 3337
Lines 206069 206106 +37
Branches 32128 32135 +7
============================================
- Hits 132710 132707 -3
- Misses 62726 62766 +40
Partials 10633 10633
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xiangfu0
left a comment
There was a problem hiding this comment.
Found a high-signal issue; see inline comment.
| @@ -66,7 +66,8 @@ public ScalarTransformFunctionWrapper(FunctionInfo functionInfo) { | |||
| ColumnDataType resultType = FunctionUtils.getColumnDataType(resultClass); | |||
| if (resultType != null) { | |||
| _resultType = resultType; | |||
There was a problem hiding this comment.
This changes every OBJECT-returning scalar UDF to advertise STRING output and stringify the value. That silently changes query semantics for functions like nullIf/caseWhen: NULLIF(int_col, 0) now becomes a VARCHAR result instead of preserving the input type, so downstream arithmetic/comparisons will see the wrong type. We need a type-preserving resolution here rather than coercing OBJECT to STRING.
yashmayya
left a comment
There was a problem hiding this comment.
The core fix looks right — mapping OBJECT-result functions to STRING metadata and the (String) -> String.valueOf swap. The new helper's type coverage has a couple of gaps though (inline).
| case BYTES: | ||
| return transformFunction.transformToBytesValuesSV(valueBlock); | ||
| default: | ||
| throw new IllegalStateException("Unsupported data type: " + dataType); |
There was a problem hiding this comment.
JSON isn't handled, so a JSON column or JSON-returning expr passed to an Object param (e.g. nullIf(jsonCol, '...')) hits this default and throws — even though JSON is stored as STRING. Can we add case JSON: next to STRING (here and in the MV switch)? There's a jsonSV test column handy to cover it.
| case BYTES: | ||
| return transformFunction.transformToBytesValuesMV(valueBlock); | ||
| default: | ||
| throw new IllegalStateException("Unsupported multi-value data type: " + dataType); |
There was a problem hiding this comment.
MV switch is missing BOOLEAN and TIMESTAMP, which the SV switch above handles — an MV boolean/timestamp child feeding an Object param would throw here. Intended, or should it mirror the SV cases?
| } | ||
| } | ||
|
|
||
| private static Object[] transformToObjectValues(TransformFunction transformFunction, ValueBlock valueBlock) { |
There was a problem hiding this comment.
Most of this helper (the numeric/decimal/timestamp/boolean/bytes SV branches and the whole MV switch) isn't exercised by the tests — only the STRING path is. Worth a couple more cases, especially one where the Object function returns null so the null-placeholder path gets hit.
Summary
Objectwithout exposingColumnDataType.OBJECTthrough transform result metadataObjectfrom child transform functionsnullIfwith anObjectresult and anObjectparameterFixes #11743
Tests
JAVA_HOME=/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home ./mvnw -pl pinot-core -am -Dtest=ScalarTransformFunctionWrapperTest#testObjectResultScalarTransformFunction+testObjectParameterScalarTransformFunction -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false testJAVA_HOME=/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home ./mvnw -pl pinot-core -am -Dtest=ScalarTransformFunctionWrapperTest -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false testgit diff --check