feat(format): introduce avro direct encoding #94
Open
zjw1111 wants to merge 1 commit into
Pull request overview
Adds Avro “direct” encoding/decoding utilities (Arrow arrays ↔ Avro binary) under src/paimon/format/avro, migrated/adapted from Apache Iceberg C++, along with unit tests and updated third-party attributions.
Changes:
- Introduce AvroDirectEncoder and AvroDirectDecoder implementations for direct Arrow↔Avro conversion (bypassing GenericDatum).
- Add extensive encoder/decoder unit tests, including projection and error-case coverage.
- Update LICENSE and NOTICE to attribute Apache Iceberg C++ for the adapted code.
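To illustrate what writing Avro binary directly from column values looks like, here is a minimal sketch that does not use the PR's code. It relies only on the Avro specification's zigzag variable-length encoding for long values and its union layout (branch index followed by the value). The names EncodeLong, EncodeNullableLong, and NullableInt64Column are hypothetical, and the std::vector stands in for an Arrow Int64 array; the actual AvroDirectEncoder API may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Avro writes int/long values as zigzag-coded variable-length integers.
inline void EncodeLong(int64_t value, std::vector<uint8_t>* out) {
    // Zigzag maps small negative numbers to small unsigned numbers.
    uint64_t zz = (static_cast<uint64_t>(value) << 1) ^
                  static_cast<uint64_t>(value >> 63);
    // Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
    while (zz >= 0x80) {
        out->push_back(static_cast<uint8_t>(zz | 0x80));
        zz >>= 7;
    }
    out->push_back(static_cast<uint8_t>(zz));
}

// Hypothetical stand-in for a nullable Arrow Int64 column.
using NullableInt64Column = std::vector<std::optional<int64_t>>;

// Encodes one row of a ["null", "long"] union straight into `out`,
// with no intermediate datum object.
inline void EncodeNullableLong(const NullableInt64Column& column, size_t row,
                               std::vector<uint8_t>* out) {
    assert(row < column.size());
    if (!column[row].has_value()) {
        EncodeLong(0, out);  // branch index 0 selects "null", which has no payload
        return;
    }
    EncodeLong(1, out);             // branch index 1 selects "long"
    EncodeLong(*column[row], out);  // followed by the value itself
}
```

In this sketch a null row costs one byte and a small non-null value costs two, and nothing is allocated per row beyond the output buffer, which is presumably the kind of saving that motivates bypassing GenericDatum.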
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/paimon/format/avro/avro_direct_encoder.h | Declares direct Arrow→Avro encoding API and scratch context type. |
| src/paimon/format/avro/avro_direct_encoder.cpp | Implements direct Arrow→Avro encoding across several Avro/Arrow types. |
| src/paimon/format/avro/avro_direct_decoder.h | Declares direct Avro→Arrow builder decoding API and scratch buffers. |
| src/paimon/format/avro/avro_direct_decoder.cpp | Implements direct Avro→Arrow decoding, including projection skipping. |
| src/paimon/format/avro/avro_direct_encoder_decoder_test.cpp | Adds unit tests for round-trip correctness, projection, and error cases. |
| NOTICE | Adds Apache Iceberg C++ attribution entry. |
| LICENSE | Adds Apache Iceberg C++ attribution block for the adapted direct encoder/decoder code. |
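The decoder row above mentions projection skipping. As a rough sketch of the idea (not the PR's implementation), the code below decodes a record of long and string fields from Avro binary and skips fields that are not projected. Avro binary has no field offsets, so a skipped field still has to be parsed far enough to find its end. ByteReader, FieldKind, DecodedRow, and DecodeRow are hypothetical names, and DecodedRow stands in for Arrow builders.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical bounds-checked cursor over an Avro-encoded buffer.
class ByteReader {
public:
    explicit ByteReader(const std::vector<uint8_t>& data) : data_(data) {}
    uint8_t Next() {
        if (pos_ >= data_.size()) throw std::runtime_error("unexpected end of data");
        return data_[pos_++];
    }
    void Skip(size_t n) {
        if (n > data_.size() - pos_) throw std::runtime_error("skip past end of data");
        pos_ += n;
    }
    std::string Take(size_t n) {
        if (n > data_.size() - pos_) throw std::runtime_error("read past end of data");
        std::string s(reinterpret_cast<const char*>(data_.data() + pos_), n);
        pos_ += n;
        return s;
    }
    size_t position() const { return pos_; }

private:
    const std::vector<uint8_t>& data_;
    size_t pos_ = 0;
};

// Reads a zigzag-coded variable-length long as defined by the Avro specification.
inline int64_t DecodeLong(ByteReader* in) {
    uint64_t zz = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        const uint8_t byte = in->Next();
        zz |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            return static_cast<int64_t>(zz >> 1) ^ -static_cast<int64_t>(zz & 1);
        }
    }
    throw std::runtime_error("varint too long");
}

enum class FieldKind { kLong, kString };

// Hypothetical stand-in for the Arrow builders a real decoder would append to.
struct DecodedRow {
    std::vector<int64_t> longs;
    std::vector<std::string> strings;
};

// Decodes one record; fields with projected[i] == false are skipped, not materialized.
inline void DecodeRow(const std::vector<FieldKind>& fields,
                      const std::vector<bool>& projected,
                      ByteReader* in, DecodedRow* row) {
    for (size_t i = 0; i < fields.size(); ++i) {
        switch (fields[i]) {
            case FieldKind::kLong: {
                // A varint has no length prefix, so it is read even when skipped.
                const int64_t value = DecodeLong(in);
                if (projected[i]) row->longs.push_back(value);
                break;
            }
            case FieldKind::kString: {
                const int64_t len = DecodeLong(in);
                if (len < 0) throw std::runtime_error("negative string length");
                if (projected[i]) {
                    row->strings.push_back(in->Take(static_cast<size_t>(len)));
                } else {
                    in->Skip(static_cast<size_t>(len));  // jump over the bytes, no copy
                }
                break;
            }
        }
    }
}
```

Skipping a string only advances the cursor, so an unprojected column costs no allocation, while the reader position still ends up at the start of the next record.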
Comment on lines +56 to +58

    if (branch_0->type() == ::avro::AVRO_NULL && branch_1->type() != ::avro::AVRO_NULL) {
        return UnionBranches{.null_index = 0, .value_index = 1, .value_node = branch_1};
    }
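The snippet above resolves the null-first ordering of a two-branch union. A self-contained sketch of how both orderings might be handled follows. It assumes the helper should also accept a value-first union and reject anything else, which the visible lines do not confirm. NodeType and ResolveNullableUnion are hypothetical names, and NodeType stands in for ::avro::Type.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for ::avro::Type, reduced to what the sketch needs.
enum class NodeType { kNull, kLong, kString };

struct UnionBranches {
    size_t null_index;    // position of the null branch in the union
    size_t value_index;   // position of the non-null branch
    NodeType value_type;  // type of the non-null branch
};

// Returns the branch layout for a nullable union such as ["null", T] or [T, "null"].
// Returns std::nullopt when the union is not exactly one null plus one value branch.
inline std::optional<UnionBranches> ResolveNullableUnion(
    const std::vector<NodeType>& branches) {
    if (branches.size() != 2) return std::nullopt;
    const bool first_is_null = branches[0] == NodeType::kNull;
    const bool second_is_null = branches[1] == NodeType::kNull;
    // Exactly one branch must be null: reject [T, U] and ["null", "null"].
    if (first_is_null == second_is_null) return std::nullopt;
    if (first_is_null) return UnionBranches{0, 1, branches[1]};
    return UnionBranches{1, 0, branches[0]};
}
```

Returning an optional keeps the unsupported cases explicit at the call site; a real implementation would more likely surface them through the project's own status or error type.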
Comment on lines +25 to +26

    #include <set>

    #pragma once

    #include <vector>
Comment on lines +19 to +21

    #include <memory>
    #include <string>

    std::shared_ptr<arrow::Array> input_array;
    ASSERT_TRUE(builder.Finish(&input_array).ok());

    ASSERT_THROW(auto status = AvroDirectEncoder::EncodeArrowToAvro(

    * src/paimon/format/avro/avro_direct_decoder.h
    * src/paimon/format/avro/avro_direct_encoder.cpp
    * src/paimon/format/avro/avro_direct_encoder.h
    * Avro input stream in src/paimon/format/avro/avro_direct_decoder.cpp
Purpose
Linked issue: None
Introduce the Avro direct encoder/decoder implementation migrated from Alibaba Paimon C++:
- src/paimon/format/avro/avro_direct_encoder.h
- src/paimon/format/avro/avro_direct_encoder.cpp
- src/paimon/format/avro/avro_direct_decoder.h
- src/paimon/format/avro/avro_direct_decoder.cpp
- src/paimon/format/avro/avro_direct_encoder_decoder_test.cpp
This also updates LICENSE and NOTICE with the Apache Iceberg C++ attribution required by the adapted direct encoder/decoder files.
External contributor analysis found no external contribution threshold hits, so no Co-authored-by trailer or thank-you comment is required.
Validation performed:
- check_migration_batch.py --skip-deps --files ...
- analyze_external_contributors.py --files ...
- git diff --check
- git diff --cached --check
Tests
API and Format
This adds Avro direct encoding/decoding implementation files under src/paimon/format/avro. It does not change public headers under include/.
Documentation
No documentation changes.
Generative AI tooling
Migrated-by: OpenAI Codex