feat(format): introduce avro direct encoding #94

Open
zjw1111 wants to merge 1 commit into apache:main from zjw1111:migrate/avro-direct-encoder-decoder

zjw1111 (Contributor) commented Jun 18, 2026

Purpose

Linked issue: None

Introduce the Avro direct encoder/decoder implementation migrated from Alibaba Paimon C++:

  • src/paimon/format/avro/avro_direct_encoder.h
  • src/paimon/format/avro/avro_direct_encoder.cpp
  • src/paimon/format/avro/avro_direct_decoder.h
  • src/paimon/format/avro/avro_direct_decoder.cpp
  • src/paimon/format/avro/avro_direct_encoder_decoder_test.cpp

This also updates LICENSE and NOTICE with the Apache Iceberg C++ attribution required by the adapted direct encoder/decoder files.

External contributor analysis found no external contribution threshold hits, so no Co-authored-by trailer or thank-you comment is required.

Validation performed:

  • check_migration_batch.py --skip-deps --files ...
  • analyze_external_contributors.py --files ...
  • git diff --check
  • git diff --cached --check

Tests

API and Format

This adds Avro direct encoding/decoding implementation files under src/paimon/format/avro. It does not change public headers under include/.

Documentation

No documentation changes.

Generative AI tooling

Migrated-by: OpenAI Codex

Copilot AI review requested due to automatic review settings June 18, 2026 02:43

Copilot AI left a comment

Pull request overview

Adds Avro “direct” encoding/decoding utilities (Arrow arrays ↔ Avro binary) under src/paimon/format/avro, migrated/adapted from Apache Iceberg C++, along with unit tests and updated third-party attributions.

Changes:

  • Introduce AvroDirectEncoder and AvroDirectDecoder implementations for direct Arrow↔Avro conversion (bypassing GenericDatum).
  • Add extensive encoder/decoder unit tests, including projection and error-case coverage.
  • Update LICENSE and NOTICE to attribute Apache Iceberg C++ for the adapted code.
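To make "direct" encoding concrete, here is a minimal sketch of writing values straight into an Avro binary buffer with no intermediate datum object. It follows the Avro specification's binary encoding (zigzag, variable-length longs; strings as a length followed by UTF-8 bytes). The helper names `WriteLong`, `WriteString`, and `EncodeLongColumn` are hypothetical and are not taken from this PR; a plain `std::vector<int64_t>` stands in for an Arrow array.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): append an Avro long to `out`.
// Avro encodes int/long as a zigzag value in base-128 variable-length form.
inline void WriteLong(int64_t value, std::vector<uint8_t>* out) {
    // Zigzag maps small magnitudes of either sign to small unsigned values.
    uint64_t zz = (static_cast<uint64_t>(value) << 1) ^
                  static_cast<uint64_t>(value >> 63);
    while (zz >= 0x80) {
        // Low 7 bits plus a continuation bit.
        out->push_back(static_cast<uint8_t>(zz | 0x80));
        zz >>= 7;
    }
    out->push_back(static_cast<uint8_t>(zz));
}

// Hypothetical helper: an Avro string is its byte length (as a long)
// followed by the raw UTF-8 bytes.
inline void WriteString(const std::string& s, std::vector<uint8_t>* out) {
    WriteLong(static_cast<int64_t>(s.size()), out);
    out->insert(out->end(), s.begin(), s.end());
}

// Stand-in for a column of non-null longs: each value is appended directly,
// without building a per-row generic datum first.
inline void EncodeLongColumn(const std::vector<int64_t>& column,
                             std::vector<uint8_t>* out) {
    for (int64_t v : column) {
        WriteLong(v, out);
    }
}
```

With these helpers, `WriteLong(64, &buf)` appends the two bytes `0x80 0x01`, and `WriteString("foo", &buf)` appends `0x06 0x66 0x6f 0x6f`, matching the examples in the Avro specification.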

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/paimon/format/avro/avro_direct_encoder.h | Declares direct Arrow→Avro encoding API and scratch context type. |
| src/paimon/format/avro/avro_direct_encoder.cpp | Implements direct Arrow→Avro encoding across several Avro/Arrow types. |
| src/paimon/format/avro/avro_direct_decoder.h | Declares direct Avro→Arrow builder decoding API and scratch buffers. |
| src/paimon/format/avro/avro_direct_decoder.cpp | Implements direct Avro→Arrow decoding, including projection skipping. |
| src/paimon/format/avro/avro_direct_encoder_decoder_test.cpp | Adds unit tests for round-trip correctness, projection, and error cases. |
| NOTICE | Adds Apache Iceberg C++ attribution entry. |
| LICENSE | Adds Apache Iceberg C++ attribution block for the adapted direct encoder/decoder code. |

Comment on lines +56 to +58
if (branch_0->type() == ::avro::AVRO_NULL && branch_1->type() != ::avro::AVRO_NULL) {
    return UnionBranches{.null_index = 0, .value_index = 1, .value_node = branch_1};
}
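
For context on the quoted lines, which pick out the null and value branches of a two-branch union: the Avro specification encodes a union value as the zero-based branch index (zigzag varint) followed by the value encoded under that branch's schema, and a null value contributes zero bytes. The sketch below illustrates this for a nullable long. `BranchIndexes` and `WriteNullableLong` are hypothetical names that only mirror the `null_index`/`value_index` fields in the snippet; they are not the PR's API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of the null_index / value_index fields quoted above.
struct BranchIndexes {
    int null_index;
    int value_index;
};

// Append an Avro long (zigzag + base-128 varint) to `out`.
inline void WriteLong(int64_t value, std::vector<uint8_t>* out) {
    uint64_t zz = (static_cast<uint64_t>(value) << 1) ^
                  static_cast<uint64_t>(value >> 63);
    while (zz >= 0x80) {
        out->push_back(static_cast<uint8_t>(zz | 0x80));
        zz >>= 7;
    }
    out->push_back(static_cast<uint8_t>(zz));
}

// Encode an optional long as a two-branch union: write the branch index,
// then the value if present. A null branch has no payload bytes.
inline void WriteNullableLong(const std::optional<int64_t>& value,
                              const BranchIndexes& branches,
                              std::vector<uint8_t>* out) {
    if (!value.has_value()) {
        WriteLong(branches.null_index, out);
        return;
    }
    WriteLong(branches.value_index, out);
    WriteLong(*value, out);
}
```

The indexes are passed in rather than hard-coded because a schema may declare the null branch second, in which case the branch bytes swap while the value encoding stays the same.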
Comment on lines +25 to +26
#include <set>


#pragma once

#include <vector>
Comment on lines +19 to +21
#include <memory>
#include <string>

std::shared_ptr<arrow::Array> input_array;
ASSERT_TRUE(builder.Finish(&input_array).ok());

ASSERT_THROW(auto status = AvroDirectEncoder::EncodeArrowToAvro(
Comment thread LICENSE
* src/paimon/format/avro/avro_direct_decoder.h
* src/paimon/format/avro/avro_direct_encoder.cpp
* src/paimon/format/avro/avro_direct_encoder.h
* Avro input stream in src/paimon/format/avro/avro_direct_decoder.cpp