Skip to content

feat: add file store write and restore utilities#91

Open
lucasfang wants to merge 1 commit into
apache:mainfrom
lucasfang:migrate_2
Open

feat: add file store write and restore utilities#91
lucasfang wants to merge 1 commit into
apache:mainfrom
lucasfang:migrate_2

Conversation

@lucasfang

Copy link
Copy Markdown
Contributor

Purpose

Linked issue: No linked issue

This change adds file store write and restore utilities for the paimon-cpp core operation module.

Included changes:

  • Public API (include/paimon/):

    • Adds write_context.h defining write session context and configuration.
    • Adds file_store_write.h defining the file store write interface.
  • Core Operation Implementation (src/paimon/core/operation/):

    • Adds abstract_file_store_write.h/.cpp providing base implementation for file store writers.
    • Adds file_store_write.cpp implementing concrete file store write logic.
    • Adds write_context.cpp implementing write context management.
    • Adds write_restore.h/.cpp providing write restore capabilities.
    • Adds file_system_write_restore.h defining file-system-level restore interface.
    • Adds restore_files.h defining restore file metadata structures.
  • Test Coverage:

    • Adds file_store_write_test.cpp covering file store write functionality.
    • Adds write_context_test.cpp covering write context behavior.
    • Adds file_system_write_restore_test.cpp covering file system restore logic.
    • Adds write_restore_test.cpp covering write restore operations.

Tests

Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.

Test coverage included in this change:

  • FileStoreWriteTest
  • WriteContextTest
  • FileSystemWriteRestoreTest
  • WriteRestoreTest

API and Format

This change adds public API in include/paimon/write_context.h and include/paimon/file_store_write.h.

No storage format or protocol changes.

Documentation

No documentation changes required.

Generative AI tooling

Migrate-by: Aone Copilot (Qwen-3.7-Max)

Copilot AI review requested due to automatic review settings June 18, 2026 02:37

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new write-session configuration and file-store write/restore utilities to the paimon-cpp core operation layer, along with initial unit tests.

Changes:

  • Introduces WriteContext / WriteContextBuilder as the configuration carrier for write operations.
  • Implements FileStoreWrite::Create(...) wiring for append-only vs PK tables and adds an AbstractFileStoreWrite base implementation.
  • Adds write-restore plumbing (WriteRestore, FileSystemWriteRestore, RestoreFiles) and corresponding gtests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/paimon/core/operation/write_restore.h Declares WriteRestore interface + helper API.
src/paimon/core/operation/write_restore.cpp Implements WriteRestore::ExtractDataFiles.
src/paimon/core/operation/write_restore_test.cpp Unit tests for ExtractDataFiles.
src/paimon/core/operation/write_context.cpp Implements WriteContext and WriteContextBuilder.
src/paimon/core/operation/write_context_test.cpp Unit tests for write-context builder behavior.
src/paimon/core/operation/restore_files.h Defines RestoreFiles container for restored snapshot/files.
src/paimon/core/operation/file_system_write_restore.h Implements restore-by-scanning filesystem snapshots.
src/paimon/core/operation/file_system_write_restore_test.cpp Unit tests for filesystem restore behavior.
src/paimon/core/operation/file_store_write.cpp Implements FileStoreWrite::Create(...) factory logic.
src/paimon/core/operation/file_store_write_test.cpp Unit tests for create-path validation and table kinds.
src/paimon/core/operation/abstract_file_store_write.h Declares base writer implementation.
src/paimon/core/operation/abstract_file_store_write.cpp Implements base writer behavior, restore integration, and commit prep.
include/paimon/write_context.h Public API header for write context + builder.
include/paimon/file_store_write.h Public API header for file store write interface.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +131 to +134
PAIMON_ASSIGN_OR_RAISE(BinaryRow partition,
file_store_path_factory_->ToBinaryRow(batch->GetPartition()))
PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<BatchWriter> writer,
GetWriter(partition, batch->GetBucket()));
Comment on lines +60 to +63
// TODO(yonghao.fyh): java paimon doesn't use snapshot_manager.LatestSnapshot() here,
// because they don't want to flood the catalog with high concurrency
PAIMON_ASSIGN_OR_RAISE(std::optional<Snapshot> snapshot,
snapshot_manager_->LatestSnapshot());
Comment on lines +68 to +70
PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FileStoreScan::RawPlan> plan,
scan_->WithSnapshot(snapshot.value())->CreatePlan());
std::vector<ManifestEntry> entries = plan->Files();
Comment on lines +187 to +192
Result<std::unique_ptr<WriteContext>> WriteContextBuilder::Finish() {
PAIMON_ASSIGN_OR_RAISE(impl_->root_path_, PathUtil::NormalizePath(impl_->root_path_));
if (impl_->root_path_.empty()) {
return Status::Invalid("root path is empty");
}
bool enable_multi_thread_spill = impl_->spill_thread_number_ > 0;
Comment on lines +28 to +32
Result<std::optional<int32_t>> WriteRestore::ExtractDataFiles(
const std::vector<ManifestEntry>& entries,
std::vector<std::shared_ptr<DataFileMeta>>* data_files) {
std::optional<int32_t> total_buckets;
for (const auto& entry : entries) {
Comment on lines +34 to +37
return Status::Invalid(fmt::format(
"Bucket data files has different total bucket number, {} vs {}, this should "
"be a bug.",
total_buckets.value(), entry.TotalBuckets()));
Comment on lines +78 to +80
if (table_schema == std::nullopt) {
return Status::Invalid(fmt::format("cannot found latest schema in branch {}", branch));
}
Comment on lines +75 to +82
std::vector<std::shared_ptr<IndexFileMeta>> deletion_vectors_index;
if (scan_deletion_vectors_index) {
PAIMON_ASSIGN_OR_RAISE(
deletion_vectors_index,
index_file_handler_->Scan(
snapshot.value(), std::string(DeletionVectorsIndexFile::DELETION_VECTORS_INDEX),
partition, bucket));
}
Comment on lines +49 to +63
std::optional<Snapshot> GetSnapshot() const {
return snapshot_;
}
std::optional<int32_t> TotalBuckets() const {
return total_buckets_;
}
std::vector<std::shared_ptr<DataFileMeta>> DataFiles() const {
return data_files_;
}
std::shared_ptr<IndexFileMeta> DynamicBucketIndex() const {
return dynamic_bucket_index_;
}
std::vector<std::shared_ptr<IndexFileMeta>> DeleteVectorsIndex() const {
return delete_vectors_index_;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants