Cache iceberg metadata to reduce redundant requests to storage#509
cbb330 wants to merge 15 commits into linkedin:main
Conversation
Force-pushed fe80df8 to b646513
```java
@Autowired MeterRegistry meterRegistry;

/** Cache FileIO by TableIdentifier to avoid redundant HTS lookups within the same request. */
private final Cache<TableIdentifier, FileIO> fileIOCache =
```
Did we do tracing showing this actually reduces the 6 callsites to 1? Presumably this is visible on the flame graph.
I will add this (and HDFS logs) to the PR after e2e testing in the test cluster.
Reuse request-scoped metadata and toggle lookups so repeated refreshes in a single request avoid extra HDFS reads and duplicate HTS calls without weakening commit concurrency checks. Made-with: Cursor
Force-pushed b646513 to 552381d
```java
@Autowired MeterRegistry meterRegistry;

@Autowired(required = false)
```
Will fix this; it requires test fixes.
Register the request cache from internalcatalog configuration so table service contexts can inject it consistently without optional autowiring fallbacks. Made-with: Cursor
Clarify why RequestScopedCache is provided from internal catalog config and replace bespoke RequestAttributes test helpers with Spring's servlet-backed request attributes. Made-with: Cursor
Force-pushed 6d979e1 to a7c4ab9
Replace the hand-rolled request cache with a Spring CacheManager backed by Caffeine so request-local metadata reuse is easier to extend and observe later. Made-with: Cursor
The metadata write path already rethrows cache seeding failures before house table mapping runs, so the extra variable never changes behavior. Made-with: Cursor
Use a dedicated ConfigurationProperties bean so Spring applies defaults and binds cache overrides from cluster YAML without manual parsing in InternalCatalogConfig. Made-with: Cursor
This keeps the cache manager owned by the tables app so metadata cache settings bind through the shared Spring environment without creating the cache in other services. Made-with: Cursor
Keep the metadata cache properties and cache manager in the tables service so Spring Boot can bind them directly without redundant internal catalog config types. Made-with: Cursor
Keep the tables metadata-cache namespace owned by the deployable while leaving cache configuration and tests in internalcatalog. Made-with: Cursor
Verify the tables-service metadata cache properties configure the internal catalog Caffeine cache end to end, and add the test-only dependencies needed to assert that wiring directly. Made-with: Cursor
Keep services/tables focused on generic cache property propagation and move Caffeine-specific verification into internalcatalog so backend changes stay local to that module. Made-with: Cursor
Differentiate raw Spring-bound properties, module settings, and cache bean wiring so the tables app boundary and internalcatalog runtime config are easier to follow. Made-with: Cursor
Rely on Spring's default method-based bean name for the internal catalog cache manager so the wiring stays explicit in tests without duplicating the same name in configuration. Made-with: Cursor
```java
@Setter
public static class MetadataCache {
  private Duration ttl = Duration.ofMinutes(5);
  private long maxSize = 1000;
```
If this is a per request cache, why do we need more than one entity stored and a TTL?
The per-request cache is not able to emit Prometheus metrics, so we can't monitor it.
Hence I've changed strategy to use a per-pod cache keyed by UUID -- we get metrics, a better cache-hit ratio, and don't affect our concurrency model.
> don't affect our concurrency model.

I'm not sure.

T0 - Table Manifest State X
T1 - Manifest is cached
T2 - Table Manifest updated to state Y
T3 - Cached value is read vs latest value and the transaction fails.
T4 - Cached value is read vs latest value and the transaction fails.
T5 - Cache entry TTLs out.

A per-request cache's limited lifecycle makes this a non-issue.
"the manifest" refers to a manifest file (the Avro file that tracks a list of data files). this is distinct from "the metadata" which specifically refers to a json file that represents table state e.g. /data/openhouse/<db>/<table>/01226-03d77993-b04c-46a2-b3a8-dbb7b6b45783.metadata.json
and the metadata is immutable, once the UUID is generated and file is written it can never change.
so in the same format listed above it more accurately this:
T0 - Table Metadata State X
T1 - Metadata UUIDv1 is cached
T2 - Table Metadata updated to state Y
T3 - Cache miss because UUID is not found
T4 - Metadata UUIDv2 is cached
T5 - Metadata UUIDv1 TTLs
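This timeline can be sketched with a location-keyed map, a minimal stdlib stand-in for the real cache (class and method names here are illustrative, not the PR's code): because each metadata location embeds a fresh UUID and the file is immutable, a newer commit always produces a new key, so an older cached entry can never be served for it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch, not the PR's implementation. A cache keyed by an
// immutable, UUID-bearing metadata location can never return stale state
// for a newer location: a new commit writes a new file, i.e. a new key,
// which is always a cache miss.
class MetadataCacheSketch {
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  // Returns the cached value for this location, loading it at most once.
  String load(String metadataLocation, Supplier<String> loader) {
    return cache.computeIfAbsent(metadataLocation, k -> loader.get());
  }
}
```

The stale-entry case (UUIDv1 still cached at T3-T5) is harmless because nothing ever looks it up by the new location; it simply ages out.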
Trying to understand the approach here: why a per-pod cache? As there are 56 tables-service pods, the issue with a per-pod cache is that cache misses could be more frequent, since most requests could land on different pods. What are we missing with a request-level cache? I believe the idea was to optimize multiple metadata calls for a given request.
```java
FileIOManager fileIOManager;

TableMetadataCache tableMetadataCache;
```
This can be a generic file cache, no? I might call it a file cache but have the variable name be metadata.
There are two layers here: internalCatalogCacheManager can be generic for any file since it is just the Spring CacheManager, but the injected wrapper is still intentionally TableMetadataCache and specific to table metadata.
Right now the cache namespace is tableMetadata and the value type is still TableMetadata.
We can easily reuse the generic internalCatalogCacheManager for other use cases later, like this:
```java
@CachePut(
    cacheManager = "internalCatalogCacheManager",
    cacheNames = "newUseCase",
    key = "#newUseCaseUUID")
```

```java
@CachePut(
    cacheManager = "internalCatalogCacheManager",
    cacheNames = "tableMetadata",
    key = "#metadataLocation")
```
So this abstracts the cache as a Spring component?
Yes. SpringTableMetadataCache is the Spring-managed bean we inject as the implementation. This gives us runtime dependency injection, so it allows us to override in the li-openhouse variant at runtime if needed.
And @Cacheable / @CachePut delegate the actual cache read/write through Spring's cache abstraction rather than us calling Caffeine directly. This gives us auto-instrumentation, among other things.
internalCatalogCacheManager is a named reference to the logic actually used for the implementation, and in our config that's currently a CaffeineCacheManager.
So the end result is that callers stay behind the TableMetadataCache interface and don't need to know about the concrete cache impl, and the impl is mostly framework config.
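As a hedged sketch of the wiring described here (bean and cache names taken from this discussion; the builder settings are assumptions, not necessarily the PR's actual values), a CaffeineCacheManager registered under the name internalCatalogCacheManager is what the annotations resolve via cacheManager = "internalCatalogCacheManager":

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
class InternalCatalogCacheConfigSketch {

  // The bean name defaults to the method name, which is how @Cacheable /
  // @CachePut reference it via cacheManager = "internalCatalogCacheManager".
  @Bean
  CacheManager internalCatalogCacheManager() {
    CaffeineCacheManager manager = new CaffeineCacheManager("tableMetadata");
    manager.setCaffeine(
        Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofMinutes(5)) // assumed TTL default
            .maximumSize(1000)                       // assumed max entries
            .recordStats());                         // exposes hit/eviction stats to Micrometer
    return manager;
  }
}
```

Relying on the default method-based bean name matches the commit note above about keeping the wiring explicit without duplicating the name in configuration.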
New file:

```java
package com.linkedin.openhouse.tables.config;
```
I think we need a test where there are two racing commits, so we can ensure stale metadata being returned from the cache does not impact correctness.
I think the racing-commits case is safe here because the metadata is UUID-keyed, so concurrent commits are not reusing the same cached object.
For example, commit 1 writes uuid-1.json and atomically swaps that into HTS. If commit 2 races, it reads uuid-1.json from HTS, evolves that state, writes the updated metadata back to HTS, and then caches it as uuid-2.json. So even if an older cache entry exists, the next commit is still advancing from the HTS-backed metadata, not from stale cached metadata.
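That argument can be sketched with stdlib stand-ins (an AtomicReference modeling the atomic HTS pointer swap, a ConcurrentHashMap modeling the metadata cache; all names are illustrative, not the PR's code): the cache only memoizes immutable files by location, while commit correctness comes from the compare-and-swap on the HTS pointer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of why UUID-keyed caching is safe under racing
// commits: the authoritative pointer lives in HTS (modeled as a CAS on an
// AtomicReference), while the cache merely memoizes immutable metadata
// files by their unique location.
class CommitSketch {
  final AtomicReference<String> htsPointer = new AtomicReference<>("uuid-1.json");
  final Map<String, String> cache = new ConcurrentHashMap<>();

  // Returns true if this commit won the atomic swap; a losing racer must
  // re-read the current pointer from HTS and retry from that state.
  boolean commit(String expectedLocation, String newLocation, String newMetadata) {
    cache.put(newLocation, newMetadata);            // cache the new immutable file
    return htsPointer.compareAndSet(expectedLocation, newLocation); // atomic HTS swap
  }
}
```

A stale cache entry never decides the race: even if an old location is still cached, the losing commit fails the CAS and restarts from the HTS-backed pointer.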
mkuchenbecker left a comment:
Is this ready for review?
Force-pushed ca6b5ed to fbf3a65
```java
@Getter
@Setter
public static class MetadataCache {
  private Duration ttl = Duration.ofMinutes(5);
```
Was any profiling done on per-entry TableMetadata memory size to validate the 1000-entry max?
```java
public class SpringTableMetadataCache implements TableMetadataCache {

  @Override
  @Cacheable(
```
What would happen if there are concurrent requests for the same metadata location? Probably both would miss and read from HDFS. Consider https://docs.spring.io/spring-framework/reference/integration/cache/annotations.html#cache-annotations-cacheable-synchronized ?
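A sketch of that suggestion applied to the load method shown in this thread (signature copied from the diff; treat as illustrative, not a committed change): adding sync = true makes concurrent misses on the same key block on a single load instead of all falling through to HDFS.

```java
@Override
@Cacheable(
    cacheManager = "internalCatalogCacheManager",
    cacheNames = "tableMetadata",
    key = "#metadataLocation",
    sync = true) // serialize concurrent loads of the same key
public TableMetadata load(String metadataLocation, Supplier<TableMetadata> metadataLoader) {
  return metadataLoader.get();
}
```

Note that Spring documents restrictions on sync = true, e.g. only one cache per operation and no unless attribute.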
```java
    cacheNames = "tableMetadata",
    key = "#metadataLocation")
public TableMetadata load(String metadataLocation, Supplier<TableMetadata> metadataLoader) {
  return metadataLoader.get();
```
Is the caching here on a per-request basis? What happens in the scenario of an immediate get after a put, or multiple concurrent gets after a put?
Design Doc Summary
Full design doc: https://docs.google.com/document/d/1J4yNWVePjm5nknsiuUmp-eYEJ_eXIeCLhjASgawi2tY
Problem
Kafka ETL reported that OpenHouse `PUT /snapshots` (commit) p99 latency is missing SLOs, sometimes exceeding 1 minute. Root cause: `refreshMetadata` is called 6 times per commit request, but 3 of those are redundant real HDFS reads of the same pre-commit metadata file. This happens because each `loadTable()` creates a fresh `TableOperations` instance with no shared cache.

The 6 refreshes break down within `TablesService.putTable` across `findById`, `save()`, `existsById`, `save()`, `catalog.loadTable`, and `ops.refresh()`, each classified as initial, redundant, or real.

Solution: TableMetadata Cache
A `TableMetadata` cache keyed by metadata location eliminates redundant HDFS reads across `loadTable()` calls within the same pod.

Cache Configuration
`hitRate()` and eviction stats to guide future TTL tuning.

Before/After Validation
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Internal API Changes:
- Add semantic cache interfaces `TableMetadataCache` and `ToggleStatusCache`, and wire their defaults through Spring `CacheManager` instead of the custom request-cache abstraction.
- Add `OpenHouseCacheProperties` plus internal Spring configurations so each semantic cache can choose `request` or `pod` lifetime while downstream repos can still override the semantic bean directly.

Performance Improvements:
- Support `simple` and `caffeine` backends for longer-lived caching when enabled.
- Enable `recordStats()` so pod-scoped caches fit Spring and Micrometer cache instrumentation.

Refactoring:
- Add `Cache` and `CacheManager` adapters, including a lightweight `RequestAttributes`-backed cache manager for request lifetime.

Tests:
- Added coverage in `InternalCatalogConfigTest` and `ToggleStatusCacheConfigTest`.

Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Added new tests for the changes made:
Added lifecycle-selection and no-request fallback coverage in
`InternalCatalogConfigTest` and `ToggleStatusCacheConfigTest`.

Updated existing tests to reflect the changes made:
Updated
`OpenHouseInternalTableOperationsTest`, `RepositoryTestWithSettableComponents`, and `ToggleStatusesRepositoryImplTest` to consume the new semantic cache interfaces and lifecycle-neutral wiring.

Validation run:
```shell
JAVA_HOME=$(/usr/libexec/java_home -v 17) ./gradlew \
  :services:common:spotlessJavaCheck \
  :iceberg:openhouse:internalcatalog:spotlessJavaCheck \
  :services:tables:spotlessJavaCheck \
  :iceberg:openhouse:internalcatalog:test \
  --tests com.linkedin.openhouse.internal.catalog.InternalCatalogConfigTest \
  --tests com.linkedin.openhouse.internal.catalog.OpenHouseInternalTableOperationsTest \
  :services:tables:test \
  --tests com.linkedin.openhouse.tables.config.ToggleStatusCacheConfigTest \
  --tests com.linkedin.openhouse.tables.toggle.repository.ToggleStatusesRepositoryImplTest \
  --tests com.linkedin.openhouse.tables.e2e.h2.RepositoryTestWithSettableComponents
```

Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.