feat(parquet): support parquet metadata cache #373

Open
gripleaf wants to merge 1 commit into alibaba:main from gripleaf:feat/parquet-meta-cache-v2

@gripleaf

Contributor

Purpose

Support caching the raw bytes of the Parquet metadata footer, so that opening the same Parquet file more than once in one process avoids repeated footer reads and metadata deserialization.

This change:

  • Adds CacheKind::PARQUET_METADATA.
  • Adds CacheKey::ForParquetMeta(file_uri) for footer cache keys.
  • Injects the cache into reader builders through WithCache.
  • Caches raw Parquet footer bytes in ParquetReaderBuilder.
  • Passes cached parquet::FileMetaData into ParquetFileBatchReader::Create.
  • Reuses a shared counting cache test utility for manifest and Parquet metadata cache tests.
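
As a rough illustration of the idea behind the changes listed above, here is a minimal sketch of a footer-bytes cache keyed by file URI. It is not the code in this PR: `FooterCache`, `GetOrLoad`, `FooterBytes`, and `DemoReopenSameFile` are hypothetical names, and the real change goes through the project's own cache interface (`CacheKind::PARQUET_METADATA`, `CacheKey::ForParquetMeta`, `WithCache`), whose signatures are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for cached footer bytes; not the project's real type.
using FooterBytes = std::shared_ptr<const std::vector<uint8_t>>;

// Hypothetical in-process cache keyed by file URI.
class FooterCache {
 public:
  // Returns the cached footer for `file_uri`, or calls `load` and stores the result.
  FooterBytes GetOrLoad(const std::string& file_uri,
                        const std::function<std::vector<uint8_t>()>& load) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = entries_.find(file_uri);
      if (it != entries_.end()) {
        return it->second;  // Hit: no footer read needed.
      }
    }
    // Miss: read outside the lock so slow I/O does not block other readers.
    FooterBytes bytes = std::make_shared<std::vector<uint8_t>>(load());
    std::lock_guard<std::mutex> lock(mu_);
    // emplace keeps the existing entry if another thread stored one first.
    return entries_.emplace(file_uri, std::move(bytes)).first->second;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, FooterBytes> entries_;
};

// Opens the "same file" twice and reports how many footer reads happened.
int DemoReopenSameFile() {
  FooterCache cache;
  int loads = 0;
  auto load = [&loads]() {
    ++loads;  // Counts simulated footer reads.
    return std::vector<uint8_t>{0x50, 0x41, 0x52, 0x31};
  };
  FooterBytes first = cache.GetOrLoad("file:///tmp/a.parquet", load);
  FooterBytes second = cache.GetOrLoad("file:///tmp/a.parquet", load);
  assert(first == second);  // Both opens share the same cached buffer.
  return loads;
}
```

Storing the bytes behind a `shared_ptr<const ...>` lets several readers share one immutable buffer without copying, which is one plausible reason to cache raw bytes rather than a mutable parsed object.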

Tests

  • ParquetFileBatchReaderTest.TestParquetMetadataRawBytesCacheReusesFooter
  • ParquetFileBatchReaderTest.TestParquetMetadataCacheBypassesWhenGetUriFails
  • WriteAndReadInteTest.TestAppendWithParquetMetadataCache
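
The behaviour these test names suggest (reuse the footer on a second open, bypass the cache when the file URI is unavailable) can be sketched with a counting cache. Everything below is an assumption made for illustration: `CountingCache`, `FakeFile`, and `LoadFooter` are hypothetical and do not reproduce the shared test utility or the reader code in this PR.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache that counts lookups and stores, for use in assertions.
struct CountingCache {
  std::unordered_map<std::string, std::vector<uint8_t>> entries;
  int gets = 0;
  int puts = 0;

  const std::vector<uint8_t>* Get(const std::string& key) {
    ++gets;
    auto it = entries.find(key);
    return it == entries.end() ? nullptr : &it->second;
  }

  void Put(const std::string& key, std::vector<uint8_t> value) {
    ++puts;
    entries[key] = std::move(value);
  }
};

// Hypothetical file handle; an empty `uri` models a failed GetUri call.
struct FakeFile {
  std::optional<std::string> uri;
  std::vector<uint8_t> footer;
  int footer_reads = 0;

  std::vector<uint8_t> ReadFooter() {
    ++footer_reads;
    return footer;
  }
};

// Reads the footer through the cache when possible, directly otherwise.
std::vector<uint8_t> LoadFooter(FakeFile& file, CountingCache* cache) {
  // Without a cache or a usable key, fall back to a plain footer read.
  if (cache == nullptr || !file.uri.has_value()) {
    return file.ReadFooter();
  }
  if (const std::vector<uint8_t>* hit = cache->Get(*file.uri)) {
    return *hit;
  }
  std::vector<uint8_t> bytes = file.ReadFooter();
  cache->Put(*file.uri, bytes);
  return bytes;
}
```

Bypassing instead of failing keeps the read path working for files that cannot produce a stable key, at the cost of re-reading the footer on every open.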

API and Format

This change affects the public API in include:

  • Adds CacheKind::PARQUET_METADATA.
  • Adds CacheKey::ForParquetMeta.
  • Adds ReaderBuilder::WithCache.

This change also updates the internal Parquet reader creation path to accept an optional parquet::FileMetaData.

No storage format or protocol change.

Documentation

Yes. Added docs/source/user_guide/parquet_metadata_cache.rst and linked it from the user guide.

Generative AI tooling

Generated-by: OpenAI Codex (GPT-5)

gripleaf force-pushed the feat/parquet-meta-cache-v2 branch from 9c00c86 to 1927cc6 on June 17, 2026 07:47