Skip to content

Core: Skip length check in ContentCache.tryCache for cached locations#16787

Draft
rahulsmahadev wants to merge 1 commit into
apache:mainfrom
rahulsmahadev:core-content-cache-skip-length-probe
Draft

Core: Skip length check in ContentCache.tryCache for cached locations#16787
rahulsmahadev wants to merge 1 commit into
apache:mainfrom
rahulsmahadev:core-content-cache-skip-length-probe

Conversation

@rahulsmahadev

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #16762, recommended by @szehon-ho in his review there.

Unlike manifests, manifest-list files do not track their length in table metadata, so the InputFile handed to ContentCache.tryCache resolves its length with a remote getFileStatus/HEAD call. Since tryCache gates on input.getLength() <= maxContentLength before wrapping, every BaseSnapshot.cacheManifests call pays that round-trip — including cache hits, which is exactly the table-refresh/streaming-poll case that manifest-list caching targets.

This change probes the cache for the location first and only consults getLength() for uncached locations, making cache hits round-trip-free:

  • The probe uses cache.asMap().containsKey(...) rather than getIfPresent(...) so it does not record cache statistics — a getIfPresent probe would count a hit before any read happens and skew stats consumers.
  • An entry can only be present if it previously passed the maxContentLength gate, so skipping the length check for cached locations does not change which files are cacheable.
  • If the entry is evicted between the probe and the read, CachingInputFile falls back to loading through the regular path, same as today.

The change is independent of #16762 (tryCache is used for manifest caching on main today, where lengths are known locally and the gate is free), but it matters once manifest lists flow through the cache.

Test plan

New TestContentCache covering:

  • a cached location is wrapped without calling getLength() on the input (the test input throws if its length is resolved), and reads are served from the cache
  • the probe records no hit/miss stats; the subsequent read records exactly one hit and no miss
  • uncached locations still resolve length, and oversized files are still returned unwrapped and uncached

Also ran TestManifestCaching (cache stats/size expectations unchanged) plus spotless locally.

This pull request and its description were written by Isaac, an AI coding assistant (Claude Code), on behalf of @rahulsmahadev.

@github-actions github-actions Bot added the core label Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant