feat: add file store write and restore utilities #91
Open
lucasfang wants to merge 1 commit into
Pull request overview
Adds new write-session configuration and file-store write/restore utilities to the paimon-cpp core operation layer, along with initial unit tests.
Changes:
- Introduces `WriteContext`/`WriteContextBuilder` as the configuration carrier for write operations.
- Implements `FileStoreWrite::Create(...)` wiring for append-only vs PK tables and adds an `AbstractFileStoreWrite` base implementation.
- Adds write-restore plumbing (`WriteRestore`, `FileSystemWriteRestore`, `RestoreFiles`) and corresponding gtests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/paimon/core/operation/write_restore.h | Declares WriteRestore interface + helper API. |
| src/paimon/core/operation/write_restore.cpp | Implements WriteRestore::ExtractDataFiles. |
| src/paimon/core/operation/write_restore_test.cpp | Unit tests for ExtractDataFiles. |
| src/paimon/core/operation/write_context.cpp | Implements WriteContext and WriteContextBuilder. |
| src/paimon/core/operation/write_context_test.cpp | Unit tests for write-context builder behavior. |
| src/paimon/core/operation/restore_files.h | Defines RestoreFiles container for restored snapshot/files. |
| src/paimon/core/operation/file_system_write_restore.h | Implements restore-by-scanning filesystem snapshots. |
| src/paimon/core/operation/file_system_write_restore_test.cpp | Unit tests for filesystem restore behavior. |
| src/paimon/core/operation/file_store_write.cpp | Implements FileStoreWrite::Create(...) factory logic. |
| src/paimon/core/operation/file_store_write_test.cpp | Unit tests for create-path validation and table kinds. |
| src/paimon/core/operation/abstract_file_store_write.h | Declares base writer implementation. |
| src/paimon/core/operation/abstract_file_store_write.cpp | Implements base writer behavior, restore integration, and commit prep. |
| include/paimon/write_context.h | Public API header for write context + builder. |
| include/paimon/file_store_write.h | Public API header for file store write interface. |
Comment on lines +131 to +134
| PAIMON_ASSIGN_OR_RAISE(BinaryRow partition, | ||
| file_store_path_factory_->ToBinaryRow(batch->GetPartition())) | ||
| PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<BatchWriter> writer, | ||
| GetWriter(partition, batch->GetBucket())); |
Comment on lines +60 to +63
| // TODO(yonghao.fyh): java paimon doesn't use snapshot_manager.LatestSnapshot() here, | ||
| // because they don't want to flood the catalog with high concurrency | ||
| PAIMON_ASSIGN_OR_RAISE(std::optional<Snapshot> snapshot, | ||
| snapshot_manager_->LatestSnapshot()); |
Comment on lines +68 to +70
| PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FileStoreScan::RawPlan> plan, | ||
| scan_->WithSnapshot(snapshot.value())->CreatePlan()); | ||
| std::vector<ManifestEntry> entries = plan->Files(); |
Comment on lines +187 to +192
| Result<std::unique_ptr<WriteContext>> WriteContextBuilder::Finish() { | ||
| PAIMON_ASSIGN_OR_RAISE(impl_->root_path_, PathUtil::NormalizePath(impl_->root_path_)); | ||
| if (impl_->root_path_.empty()) { | ||
| return Status::Invalid("root path is empty"); | ||
| } | ||
| bool enable_multi_thread_spill = impl_->spill_thread_number_ > 0; |
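The quoted `Finish()` normalizes the root path, rejects an empty result, and derives the multi-thread spill flag from the configured thread count. A minimal, self-contained sketch of that flow follows. Every name ending in `Sketch`, the `WithSpillThreadNumber` setter, and the `TrimTrailingSlashes` helper are hypothetical stand-ins; in particular, the real `PathUtil::NormalizePath` is not shown in the diff and may behave differently, and the real method returns a `Result<...>` rather than a null pointer plus an error string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the real WriteContext.
struct WriteContextSketch {
    std::string root_path;
    int spill_thread_number = 0;
    bool enable_multi_thread_spill = false;
};

class WriteContextBuilderSketch {
 public:
    explicit WriteContextBuilderSketch(std::string root_path)
        : root_path_(std::move(root_path)) {}

    // Hypothetical setter; the real builder's option names are not shown in the diff.
    WriteContextBuilderSketch& WithSpillThreadNumber(int n) {
        spill_thread_number_ = n;
        return *this;
    }

    // Returns nullptr and fills *error on invalid input (stand-in for Result<...>).
    std::unique_ptr<WriteContextSketch> Finish(std::string* error) const {
        // Assumed normalization: only trailing slashes are trimmed here.
        std::string normalized = TrimTrailingSlashes(root_path_);
        if (normalized.empty()) {
            *error = "root path is empty";
            return nullptr;
        }
        auto context = std::make_unique<WriteContextSketch>();
        context->root_path = std::move(normalized);
        context->spill_thread_number = spill_thread_number_;
        // Same derivation as the quoted line: any positive thread count enables it.
        context->enable_multi_thread_spill = spill_thread_number_ > 0;
        return context;
    }

 private:
    static std::string TrimTrailingSlashes(std::string path) {
        while (path.size() > 1 && path.back() == '/') {
            path.pop_back();
        }
        return path;
    }

    std::string root_path_;
    int spill_thread_number_ = 0;
};

// Usage example: a valid path succeeds, an empty one is rejected.
inline void WriteContextBuilderSketchExample() {
    std::string error;
    auto ok = WriteContextBuilderSketch("/warehouse/db/table/")
                  .WithSpillThreadNumber(2)
                  .Finish(&error);
    assert(ok != nullptr && ok->enable_multi_thread_spill);
    auto bad = WriteContextBuilderSketch("").Finish(&error);
    assert(bad == nullptr && error == "root path is empty");
}
```

In this sketch the emptiness check runs on the normalized value, mirroring the order in the quoted code, so whatever the normalizer returns for degenerate input decides whether the builder rejects it.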
Comment on lines +28 to +32
| Result<std::optional<int32_t>> WriteRestore::ExtractDataFiles( | ||
| const std::vector<ManifestEntry>& entries, | ||
| std::vector<std::shared_ptr<DataFileMeta>>* data_files) { | ||
| std::optional<int32_t> total_buckets; | ||
| for (const auto& entry : entries) { |
Comment on lines +34 to +37
| return Status::Invalid(fmt::format( | ||
| "Bucket data files has different total bucket number, {} vs {}, this should " | ||
| "be a bug.", | ||
| total_buckets.value(), entry.TotalBuckets())); |
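The two `ExtractDataFiles` snippets above show a loop over manifest entries that tracks a single total bucket count and fails when entries disagree. The sketch below reconstructs that check with simplified stand-in types. The `*Sketch` names, the plain result struct used instead of `Result<...>`, and the step that appends each entry's file to `data_files` are assumptions, since the diff only shows the signature and the error branch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for DataFileMeta and ManifestEntry (only the fields used here).
struct DataFileMetaSketch {
    std::string file_name;
};

struct ManifestEntrySketch {
    int32_t total_buckets;
    std::shared_ptr<DataFileMetaSketch> file;
};

// Stand-in for Result<std::optional<int32_t>>: ok == false means `error` is set.
struct ExtractResultSketch {
    bool ok = true;
    std::optional<int32_t> total_buckets;
    std::string error;
};

ExtractResultSketch ExtractDataFilesSketch(
    const std::vector<ManifestEntrySketch>& entries,
    std::vector<std::shared_ptr<DataFileMetaSketch>>* data_files) {
    ExtractResultSketch result;
    for (const auto& entry : entries) {
        if (result.total_buckets.has_value() &&
            result.total_buckets.value() != entry.total_buckets) {
            // All files of one bucket are expected to agree on the total bucket count.
            result.ok = false;
            result.error = "bucket data files have different total bucket numbers: " +
                           std::to_string(result.total_buckets.value()) + " vs " +
                           std::to_string(entry.total_buckets);
            return result;
        }
        result.total_buckets = entry.total_buckets;
        data_files->push_back(entry.file);  // Assumed: files are collected in entry order.
    }
    return result;  // total_buckets stays nullopt when `entries` is empty.
}

// Usage example: consistent entries succeed, a mismatch is reported.
inline void ExtractDataFilesSketchExample() {
    auto a = std::make_shared<DataFileMetaSketch>(DataFileMetaSketch{"a"});
    auto b = std::make_shared<DataFileMetaSketch>(DataFileMetaSketch{"b"});
    std::vector<std::shared_ptr<DataFileMetaSketch>> files;
    ExtractResultSketch same = ExtractDataFilesSketch({{4, a}, {4, b}}, &files);
    assert(same.ok && same.total_buckets.value() == 4 && files.size() == 2);
    files.clear();
    ExtractResultSketch mixed = ExtractDataFilesSketch({{4, a}, {8, b}}, &files);
    assert(!mixed.ok);
}
```

Note that in this sketch a mismatch leaves `data_files` partially filled, so a caller should not read the output vector after a failed call.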
Comment on lines +78 to +80
| if (table_schema == std::nullopt) { | ||
| return Status::Invalid(fmt::format("cannot found latest schema in branch {}", branch)); | ||
| } |
Comment on lines +75 to +82
| std::vector<std::shared_ptr<IndexFileMeta>> deletion_vectors_index; | ||
| if (scan_deletion_vectors_index) { | ||
| PAIMON_ASSIGN_OR_RAISE( | ||
| deletion_vectors_index, | ||
| index_file_handler_->Scan( | ||
| snapshot.value(), std::string(DeletionVectorsIndexFile::DELETION_VECTORS_INDEX), | ||
| partition, bucket)); | ||
| } |
Comment on lines +49 to +63
| std::optional<Snapshot> GetSnapshot() const { | ||
| return snapshot_; | ||
| } | ||
| std::optional<int32_t> TotalBuckets() const { | ||
| return total_buckets_; | ||
| } | ||
| std::vector<std::shared_ptr<DataFileMeta>> DataFiles() const { | ||
| return data_files_; | ||
| } | ||
| std::shared_ptr<IndexFileMeta> DynamicBucketIndex() const { | ||
| return dynamic_bucket_index_; | ||
| } | ||
| std::vector<std::shared_ptr<IndexFileMeta>> DeleteVectorsIndex() const { | ||
| return delete_vectors_index_; | ||
| } |
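The accessors quoted above return `std::vector<std::shared_ptr<...>>` members by value, so each call copies the vector and increments every element's reference count. The review comment attached to these lines is not preserved here, so the following is only one possible alternative and not something the PR is known to adopt: returning the containers by const reference. `RestoreFilesSketch` and `DataFileMetaSketch` are hypothetical, trimmed stand-ins for the real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataFileMeta.
struct DataFileMetaSketch {
    int64_t row_count = 0;
};

class RestoreFilesSketch {
 public:
    RestoreFilesSketch(std::optional<int32_t> total_buckets,
                       std::vector<std::shared_ptr<DataFileMetaSketch>> data_files)
        : total_buckets_(total_buckets), data_files_(std::move(data_files)) {}

    // Small value type: returning a copy is cheap.
    std::optional<int32_t> TotalBuckets() const {
        return total_buckets_;
    }

    // Const reference: no vector copy and no shared_ptr refcount traffic per call.
    // The reference is only valid while this object is alive.
    const std::vector<std::shared_ptr<DataFileMetaSketch>>& DataFiles() const {
        return data_files_;
    }

 private:
    std::optional<int32_t> total_buckets_;
    std::vector<std::shared_ptr<DataFileMetaSketch>> data_files_;
};

// Usage example: repeated calls observe the same underlying vector.
inline void RestoreFilesSketchExample() {
    std::vector<std::shared_ptr<DataFileMetaSketch>> files = {
        std::make_shared<DataFileMetaSketch>()};
    RestoreFilesSketch restored(4, files);
    assert(&restored.DataFiles() == &restored.DataFiles());
    assert(restored.DataFiles().size() == 1);
}
```

The trade-off is lifetime: a by-value accessor stays safe if the caller outlives the `RestoreFiles` object, while the reference-returning form requires the caller to copy explicitly when it needs ownership.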
Purpose
Linked issue: No linked issue
This change adds file store write and restore utilities for the paimon-cpp core operation module.
Included changes:
- Public API (`include/paimon/`):
  - `write_context.h` defining write session context and configuration.
  - `file_store_write.h` defining the file store write interface.
- Core Operation Implementation (`src/paimon/core/operation/`):
  - `abstract_file_store_write.h/.cpp` providing base implementation for file store writers.
  - `file_store_write.cpp` implementing concrete file store write logic.
  - `write_context.cpp` implementing write context management.
  - `write_restore.h/.cpp` providing write restore capabilities.
  - `file_system_write_restore.h` defining file-system-level restore interface.
  - `restore_files.h` defining restore file metadata structures.
- Test Coverage:
  - `file_store_write_test.cpp` covering file store write functionality.
  - `write_context_test.cpp` covering write context behavior.
  - `file_system_write_restore_test.cpp` covering file system restore logic.
  - `write_restore_test.cpp` covering write restore operations.
Tests
Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.
Test coverage included in this change:
- `FileStoreWriteTest`
- `WriteContextTest`
- `FileSystemWriteRestoreTest`
- `WriteRestoreTest`
API and Format
This change adds public API in `include/paimon/write_context.h` and `include/paimon/file_store_write.h`.
No storage format or protocol changes.
Documentation
No documentation changes required.
Generative AI tooling
Migrate-by: Aone Copilot (Qwen-3.7-Max)