fix(format): reset avro reader on schema change #374
Open
zjw1111 wants to merge 2 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Resets the underlying Avro reader when AvroFileBatchReader::SetReadSchema() is called so subsequent NextBatch() reads restart from the beginning and row numbering aligns with the FileBatchReader contract.
Changes:
- Recreate the underlying `avro::DataFileReaderBase` on successful schema changes.
- Rebuild the Arrow `ArrayBuilder` and update field projection on schema changes.
- Add a regression test ensuring schema changes reset the reader back to the first row.
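As a rough illustration of the behavior these changes aim for, here is a minimal toy model. It does not use the Avro or Paimon APIs: `ToyFileReader`, `ToyBatchReader`, and `SchemaChangeRestartsAtRowZero` are hypothetical names invented for this sketch, and the in-memory row list merely stands in for a data file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Rows = std::shared_ptr<const std::vector<int>>;

// Hypothetical stand-in for the underlying data file reader: it only
// tracks a cursor into an in-memory list of rows.
class ToyFileReader {
 public:
  explicit ToyFileReader(Rows rows) : rows_(std::move(rows)) {}
  bool HasNext() const { return pos_ < rows_->size(); }
  int Next() {
    assert(HasNext());
    return (*rows_)[pos_++];
  }

 private:
  Rows rows_;
  std::size_t pos_ = 0;
};

// Hypothetical batch reader that reports the first row number of each batch.
class ToyBatchReader {
 public:
  explicit ToyBatchReader(std::vector<int> rows)
      : rows_(std::make_shared<std::vector<int>>(std::move(rows))),
        reader_(std::make_unique<ToyFileReader>(rows_)) {}

  // Reads up to max_rows values into *out and returns the row number of
  // the first value in the batch.
  std::uint64_t NextBatch(std::size_t max_rows, std::vector<int>* out) {
    out->clear();
    const std::uint64_t first_row = next_row_;
    while (out->size() < max_rows && reader_->HasNext()) {
      out->push_back(reader_->Next());
    }
    next_row_ += out->size();
    return first_row;
  }

  // Mirrors the idea of the fix: recreate the underlying reader AND reset
  // the row counter, so data position and reported row number stay aligned.
  void SetReadSchema() {
    reader_ = std::make_unique<ToyFileReader>(rows_);
    next_row_ = 0;
  }

 private:
  Rows rows_;
  std::unique_ptr<ToyFileReader> reader_;
  std::uint64_t next_row_ = 0;
};

// Consumes part of the "file", changes the schema, then checks that the
// next batch starts again at row 0 with the first value.
bool SchemaChangeRestartsAtRowZero() {
  ToyBatchReader reader(std::vector<int>{10, 20, 30, 40});
  std::vector<int> batch;
  reader.NextBatch(2, &batch);
  reader.SetReadSchema();
  const std::uint64_t first_row = reader.NextBatch(2, &batch);
  return first_row == 0 && !batch.empty() && batch.front() == 10;
}
```

If `SetReadSchema()` in this toy only reset `next_row_` and kept the old `reader_`, the second batch would report row 0 while actually containing the third value, which is the kind of mismatch the PR describes.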
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/paimon/format/avro/avro_file_batch_reader.cpp | Recreates the Avro reader and resets internal state on SetReadSchema() success. |
| src/paimon/format/avro/avro_file_batch_reader_test.cpp | Adds a test asserting SetReadSchema() causes reading to restart from row 0. |
Comment on lines +154 to +165
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<::avro::DataFileReaderBase> reader,
                           CreateDataFileReader(input_stream_, pool_));

    if (reader_) {
        reader_->close();
    }
    reader_ = std::move(reader);
    read_fields_projection_ = std::move(read_fields_projection);
    array_builder_ = std::move(array_builder);
    previous_first_row_ = std::numeric_limits<uint64_t>::max();
    next_row_to_read_ = std::numeric_limits<uint64_t>::max();
    close_ = false;
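The snippet above appears to create the replacement reader before touching any member, so a failure (assuming `PAIMON_ASSIGN_OR_RAISE` returns early on error, as Arrow-style macros typically do) would leave the existing state untouched. A minimal sketch of that build-then-commit ordering, using hypothetical names (`ToyReader`, `OpenToyReader`, `ToyHolder`) that are not part of Paimon or Avro:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resource standing in for the underlying data file reader.
struct ToyReader {
  std::string source;
  bool closed = false;
  void Close() { closed = true; }
};

// Hypothetical factory: returns nullptr to simulate a failed open.
std::unique_ptr<ToyReader> OpenToyReader(const std::string& source) {
  if (source.empty()) {
    return nullptr;
  }
  auto reader = std::make_unique<ToyReader>();
  reader->source = source;
  return reader;
}

class ToyHolder {
 public:
  // Returns false and leaves every member untouched if the new reader
  // cannot be created; otherwise closes the old reader and swaps in the
  // new state in one step.
  bool Reset(const std::string& source) {
    std::unique_ptr<ToyReader> fresh = OpenToyReader(source);
    if (!fresh) {
      return false;  // early return: nothing has been modified yet
    }
    if (reader_) {
      reader_->Close();
    }
    reader_ = std::move(fresh);
    rows_read_ = 0;
    return true;
  }

  void Advance(std::uint64_t n) { rows_read_ += n; }
  const ToyReader* reader() const { return reader_.get(); }
  std::uint64_t rows_read() const { return rows_read_; }

 private:
  std::unique_ptr<ToyReader> reader_;
  std::uint64_t rows_read_ = 0;
};

// Shows that a failed Reset keeps the previous reader and progress.
void DemoFailedResetKeepsState() {
  ToyHolder holder;
  assert(holder.Reset("a.avro"));
  holder.Advance(3);
  assert(!holder.Reset(""));  // simulated failure
  assert(holder.reader()->source == "a.avro");
  assert(holder.rows_read() == 3);
}
```

The design choice illustrated here is only the ordering: acquire everything that can fail first, then mutate members. Whether the real code relies on this guarantee is not stated in the PR.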
    auto read_schema = arrow::schema({arrow::field("f1", arrow::int32())});
    std::unique_ptr<ArrowSchema> c_schema = std::make_unique<ArrowSchema>();
    ASSERT_TRUE(arrow::ExportSchema(*read_schema, c_schema.get()).ok());
Purpose
Linked issue: none
`AvroFileBatchReader::SetReadSchema()` reset the reported row numbers and Arrow builder, but left the underlying Avro `DataFileReaderBase` at its current read position. If a caller consumed part of a file and then changed the read schema, the next batch continued from the old Avro reader position while reporting row number 0.
This patch recreates the Avro data file reader when `SetReadSchema()` succeeds, so the next `NextBatch()` call reads from the first row as required by the `FileBatchReader` contract.
Tests
- `git diff --cached --check`
- `cmake --build build --target paimon-avro-format-test -j64`
- `./build/release/paimon-avro-format-test --gtest_filter=AvroFileBatchReaderTest.TestSetReadSchemaResetsReaderToFirstRow`
- `./build/release/paimon-avro-format-test`
API and Format
No public API, storage format, or protocol changes.
Documentation
No.
Generative AI tooling
Generated-by: OpenAI Codex