-
Notifications
You must be signed in to change notification settings - Fork 478
[lake/paimon] Support map types for tiering paimon #2308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lake/paimon] Support map types for tiering paimon #2308
Conversation
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XuQianJin-Stars Thanks for the pr. Please also add a IT to make sure
tiering and union read works for map type just like we did for array & nest row type
| } | ||
|
|
||
| @Test | ||
| void testMapTypeWithIntegerKeyValue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these test method be put into one single test method?
| } | ||
|
|
||
| @Test | ||
| void testMapTypeWithIntegerKeyValue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dito
Can these test methods put into one single test method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for MAP data types when tiering data from Fluss to Apache Paimon. Previously, MAP types threw UnsupportedOperationException, limiting the ability to tier tables with MAP columns. The implementation follows the existing pattern used for ARRAY types and includes bidirectional conversion support.
Key Changes:
- Added
FlussMapAsPaimonMapadapter class to convert Fluss InternalMap to Paimon InternalMap format - Enhanced
FlussRowAsPaimonRowandFlussArrayAsPaimonArrayto support MAP type conversion with proper type information propagation - Added comprehensive test coverage including nested maps, null handling, empty maps, and maps within arrays
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/source/FlussMapAsPaimonMap.java | New adapter class that implements Paimon's InternalMap interface by wrapping Fluss InternalMap and converting key/value arrays with proper type information |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/source/FlussRowAsPaimonRow.java | Replaced UnsupportedOperationException with MAP type conversion logic that extracts type information and creates FlussMapAsPaimonMap adapter |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/source/FlussArrayAsPaimonArray.java | Added MAP type support for arrays containing maps, enabling nested MAP types within ARRAY columns |
| fluss-lake/fluss-lake-paimon/src/test/java/org/apache/fluss/lake/paimon/tiering/FlussRecordAsPaimonRowTest.java | Added 9 comprehensive test cases covering integer/string keys, nested maps, null handling, empty maps, nullable values, maps in arrays, and complex value types |
| fluss-lake/fluss-lake-paimon/src/test/java/org/apache/fluss/lake/paimon/source/FlussRowAsPaimonRowTest.java | Added 4 test cases for row-level MAP type conversion including nested maps and maps within arrays |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| void testMapTypeWithIntegerKeyValue() { | ||
| int tableBucket = 0; | ||
| RowType tableRowType = | ||
| RowType.of( | ||
| new org.apache.paimon.types.MapType( | ||
| new org.apache.paimon.types.IntType(), | ||
| new org.apache.paimon.types.IntType()), | ||
| // system columns | ||
| new org.apache.paimon.types.IntType(), | ||
| new org.apache.paimon.types.BigIntType(), | ||
| new org.apache.paimon.types.LocalZonedTimestampType(3)); | ||
|
|
||
| FlussRecordAsPaimonRow flussRecordAsPaimonRow = | ||
| new FlussRecordAsPaimonRow(tableBucket, tableRowType); | ||
| long logOffset = 0; | ||
| long timeStamp = System.currentTimeMillis(); | ||
| GenericRow genericRow = new GenericRow(1); | ||
| Map<Object, Object> mapData = new HashMap<>(); | ||
| mapData.put(1, 100); | ||
| mapData.put(2, 200); | ||
| mapData.put(3, 300); | ||
| genericRow.setField(0, new GenericMap(mapData)); | ||
|
|
||
| LogRecord logRecord = new GenericRecord(logOffset, timeStamp, APPEND_ONLY, genericRow); | ||
| flussRecordAsPaimonRow.setFlussRecord(logRecord); | ||
|
|
||
| InternalMap map = flussRecordAsPaimonRow.getMap(0); | ||
| assertThat(map).isNotNull(); | ||
| assertThat(map.size()).isEqualTo(3); | ||
|
|
||
| InternalArray keys = map.keyArray(); | ||
| InternalArray values = map.valueArray(); | ||
| assertThat(keys.size()).isEqualTo(3); | ||
| assertThat(values.size()).isEqualTo(3); | ||
| } |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test methods testMapTypeWithIntegerKeyValue, testMapTypeWithStringKeyIntValue, testNestedMapType, testMapWithComplexTypes, testMapInArray cover the basic functionality but don't verify the actual key and value contents retrieved from the InternalMap. The tests only assert the size of keys and values arrays but don't check if the actual data is correctly converted. Consider adding assertions to verify the actual key-value pairs are correctly accessible through the Paimon InternalMap interface, for example by checking specific key and value retrievals like keys.getInt(0), values.getInt(0), etc.
| void testMapTypeWithIntegerKeyValue() { | ||
| RowType tableRowType = | ||
| RowType.of( | ||
| new org.apache.paimon.types.MapType( | ||
| new org.apache.paimon.types.IntType(), | ||
| new org.apache.paimon.types.IntType())); | ||
|
|
||
| long logOffset = 0; | ||
| long timeStamp = System.currentTimeMillis(); | ||
| GenericRow genericRow = new GenericRow(1); | ||
| Map<Object, Object> mapData = new HashMap<>(); | ||
| mapData.put(1, 100); | ||
| mapData.put(2, 200); | ||
| mapData.put(3, 300); | ||
| genericRow.setField(0, new GenericMap(mapData)); | ||
|
|
||
| LogRecord logRecord = new GenericRecord(logOffset, timeStamp, APPEND_ONLY, genericRow); | ||
| FlussRowAsPaimonRow flussRowAsPaimonRow = | ||
| new FlussRowAsPaimonRow(logRecord.getRow(), tableRowType); | ||
|
|
||
| InternalMap map = flussRowAsPaimonRow.getMap(0); | ||
| assertThat(map).isNotNull(); | ||
| assertThat(map.size()).isEqualTo(3); | ||
|
|
||
| InternalArray keys = map.keyArray(); | ||
| InternalArray values = map.valueArray(); | ||
| assertThat(keys.size()).isEqualTo(3); | ||
| assertThat(values.size()).isEqualTo(3); | ||
| } |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the tiering tests, these test methods only verify the size of the maps but don't assert the actual content of the keys and values. For example, in testMapTypeWithIntegerKeyValue, after getting the keys and values arrays, the test should verify that the arrays contain the expected data (e.g., keys contain 1, 2, 3 and values contain 100, 200, 300). This would ensure the data conversion is working correctly end-to-end.
| } | ||
|
|
||
| @Test | ||
| void testNestedMapType() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this test method be moved to testMapWithAllTypes?
| } | ||
|
|
||
| @Test | ||
| void testMapInArray() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this test method be removed since testMapWithAllTypes already verify this case?
| } | ||
|
|
||
| @Test | ||
| void testNestedMapType() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to move to testMapWithAllTypes?
|
@luoyuxia Hi, i already updated the pr. Please help review when you got some time. |
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XuQianJin-Stars Thanks. LGTM!
Purpose
Linked issue: close #2257
This PR adds support for MAP data types when tiering data from Fluss to Apache Paimon. Previously, MAP types were not properly handled during the tiering process, which limited the ability to tier tables containing MAP columns to Paimon.
Brief change log
FlussMapAsPaimonMapadapter class to convert Fluss InternalMap to Paimon InternalMap formatPaimonMapAsFlussMapadapter class to wrap Paimon InternalMap as Fluss InternalMap for readingFlussRowAsPaimonRowto supportgetMap()method for handling MAP type fields by wrapping Fluss maps with proper key and value type informationFlussArrayAsPaimonArrayto support nested MAP types within arraysPaimonRowAsFlussRowandPaimonArrayAsFlussArrayto support bidirectional MAP type conversionTests
FlussRecordAsPaimonRowTest.testMapTypeWithIntegerKeyValue: Verifies basic MAP type with integer keys and valuesFlussRecordAsPaimonRowTest.testMapTypeWithStringKeyIntValue: Verifies MAP type with string keys and integer valuesFlussRecordAsPaimonRowTest.testNestedMapType: Verifies nested MAP types (MAP containing MAP as values)FlussRecordAsPaimonRowTest.testMapWithComplexTypes: Verifies MAP type with complex value types like DECIMALFlussRowAsPaimonRowTest: Contains additional test cases for MAP type handling in row conversionAPI and Format
This change does not affect API or storage format. It enables an existing data type (MAP) to be properly handled during the tiering process to Paimon.
Documentation
The documentation has been updated to include MAP<kt, vt> in the Paimon data type mapping table.