Instrument LocalDiskBackend.read_file() and write_file(), which already compute timing and bandwidth internally (logged via logger.debug). New metrics (mirroring the remote_* pattern):
lmcache:local_disk_read_bytes_total Counter
lmcache:local_disk_write_bytes_total Counter
lmcache:local_disk_read_latency Histogram (seconds)
lmcache:local_disk_write_latency Histogram (seconds)
Track disk backend evictions via lmcache:local_disk_evict_count counter. Mirrors existing local_cpu_evict_count pattern. Called from LocalDiskBackend.remove() when cached entries are evicted.
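The instrumentation pattern for the two commits above can be sketched with stdlib stand-ins (the real PR registers prometheus_client Counters and Histograms under the lmcache:local_disk_* names; `instrumented_read_file`/`instrumented_write_file` are hypothetical wrappers, not LMCache's actual method bodies):

```python
import time
from collections import defaultdict

# Stand-ins for the Prometheus metrics: counters hold running totals,
# histograms collect raw latency observations in seconds.
counters = defaultdict(int)
histograms = defaultdict(list)

def instrumented_read_file(path: str) -> bytes:
    """Mirror of the read_file() instrumentation: time the I/O, count bytes."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    counters["local_disk_read_bytes_total"] += len(data)
    histograms["local_disk_read_latency"].append(time.perf_counter() - start)
    return data

def instrumented_write_file(path: str, data: bytes) -> None:
    """Mirror of the write_file() instrumentation."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
    counters["local_disk_write_bytes_total"] += len(data)
    histograms["local_disk_write_latency"].append(time.perf_counter() - start)

def on_evict(key: str) -> None:
    """Mirror of the eviction hook in LocalDiskBackend.remove()."""
    counters["local_disk_evict_count"] += 1
```

The eviction counter follows the same increment-on-event pattern as the existing local_cpu_evict_count.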
Split aggregate hit tokens into per-backend (cpu/disk/remote) counters.
_process_tokens_internal already iterates block_mapping keyed by backend
name. This PR counts tokens per backend and exposes them via the
num_hit_tokens counter with tier={cpu,disk,remote} labels.
Before: lmcache:num_hit_tokens{tier="local"} 5000
After: lmcache:num_hit_tokens{tier="cpu"} 4200
lmcache:num_hit_tokens{tier="disk"} 800
lmcache:num_hit_tokens{tier="remote"} 0
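The per-tier split can be sketched as below; the block_mapping shape here (backend name to a list of (key, num_tokens) blocks) is an illustrative simplification of the structure _process_tokens_internal actually iterates:

```python
from collections import Counter

def count_hit_tokens_per_tier(block_mapping: dict) -> Counter:
    """Sum hit tokens per backend tier instead of one aggregate total."""
    hits = Counter()
    for tier, blocks in block_mapping.items():
        hits[tier] += sum(num_tokens for _key, num_tokens in blocks)
    return hits

# Reproduces the Before/After example: 4200 cpu + 800 disk = 5000 total.
mapping = {
    "cpu": [("k0", 2200), ("k1", 2000)],
    "disk": [("k2", 800)],
    "remote": [],
}
hits = count_hit_tokens_per_tier(mapping)
```

Each per-tier count is then reported with the matching tier label, and the old aggregate is recoverable as the sum over labels.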
Time each batched_get call per backend in _process_tokens_internal.
Exposes lmcache:tier_get_latency{tier="cpu|disk|remote"} histogram
so operators can identify which cache tier causes retrieval latency.
New metric:
lmcache:tier_get_latency{tier="cpu"} Histogram (seconds)
lmcache:tier_get_latency{tier="disk"} Histogram (seconds)
lmcache:tier_get_latency{tier="remote"} Histogram (seconds)
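The timing pattern is a thin wrapper around each backend's batched_get call; a minimal sketch (the real PR observes into the labeled Prometheus histogram, and `timed_batched_get` plus the fake backend below are illustrative, not LMCache API):

```python
import time
from collections import defaultdict

# Per-tier latency observations, standing in for
# lmcache:tier_get_latency{tier=...}.
tier_get_latency = defaultdict(list)

def timed_batched_get(tier: str, batched_get, keys):
    """Time one batched_get call and record the latency under its tier label."""
    start = time.perf_counter()
    result = batched_get(keys)
    tier_get_latency[tier].append(time.perf_counter() - start)
    return result

# Usage with a fake in-memory backend:
fake_cpu_get = lambda keys: {k: b"obj" for k in keys}
objs = timed_batched_get("cpu", fake_cpu_get, ["k0", "k1"])
```

Because the label is the tier name, one histogram family covers all backends while still letting operators compare cpu, disk, and remote latencies side by side.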
Track which cache tier served each retrieve request:
lmcache:request_tier_served{tier="cpu|disk|remote|mixed|miss"} Counter
lmcache:request_tier_hit_tokens{tier="cpu|disk|remote"} Histogram
Dominant tier = tier serving >50% of tokens. "mixed" = no majority.
Enables tier migration heatmaps and per-request routing analysis.
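The dominant-tier rule stated above can be sketched directly (a stand-alone illustration of the >50% rule, not the actual LMCache code):

```python
def dominant_tier(hit_tokens: dict) -> str:
    """Classify which tier served a retrieve request.

    hit_tokens maps tier name -> tokens served from that tier.
    A tier is dominant when it serves a strict majority (>50%) of
    tokens; otherwise the request is "mixed". No hit tokens at all
    is a "miss".
    """
    total = sum(hit_tokens.values())
    if total == 0:
        return "miss"
    for tier, tokens in hit_tokens.items():
        if tokens * 2 > total:  # strict majority
            return tier
    return "mixed"
```

At most one tier can hold a strict majority, so the classification is unambiguous.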
Layerwise path (retrieve_layer): the location is already known via storage_manager.contains(), so set cpu/disk/remote_hit_tokens on retrieve_stats before on_retrieve_finished.
Async path (_async_process_tokens_internal): track which backend each key came from via a key_to_backend map built during result unpacking, and propagate the per-backend token counts to retrieve_stats.
Also update cleanup_memory_objs to handle the new result format from gather_with_keys, which now includes backend names.
Modify gather_with_keys() to include backend_name alongside each tier's key-memobj pairs, enabling per-tier hit attribution in the async path. Update prefetch_all_done_callback to unpack the new tuple format. Update existing tests to match.
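The async-path unpacking can be sketched as follows; the (backend_name, key, memory_obj) tuple shape is an assumption based on the description above, and the exact format in LMCache may differ:

```python
def unpack_gather_results(results):
    """Build the key_to_backend attribution map from gather_with_keys results.

    Assumes each result is a (backend_name, key, memory_obj) tuple.
    Returns the memory objects plus the map later used to split
    per-request hit tokens by tier.
    """
    key_to_backend = {}
    memory_objs = []
    for backend_name, key, memory_obj in results:
        key_to_backend[key] = backend_name
        memory_objs.append(memory_obj)
    return memory_objs, key_to_backend

results = [("LocalCPUBackend", "k0", b"a"), ("LocalDiskBackend", "k1", b"b")]
objs, key_to_backend = unpack_gather_results(results)
```

Carrying the backend name in each tuple is what lets both prefetch_all_done_callback and cleanup_memory_objs consume the same result stream without a second lookup.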
Clean up comments referencing internal PR numbers and reuse disk_latency_buckets for tier_get_latency histogram.
Prevents dataclass initialization errors by providing sensible defaults for all fields (0 for ints, 0.0 for floats, field(default_factory=list) for lists).
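The defaulting pattern looks like this; StatsSketch and its field names are hypothetical stand-ins for LMCacheStats:

```python
from dataclasses import dataclass, field

@dataclass
class StatsSketch:
    """Every field gets a sensible default, so partial construction works."""
    num_hit_tokens: int = 0                 # ints default to 0
    tier_get_latency_s: float = 0.0         # floats default to 0.0
    per_request_hit_tokens: list = field(default_factory=list)  # fresh list per instance

# All fields are now optional, and each instance owns its own list:
a = StatsSketch()
b = StatsSketch()
a.per_request_hit_tokens.append(100)
```

field(default_factory=list) is required for the list fields because a plain `= []` default would raise ValueError at class definition time (mutable defaults are disallowed in dataclasses) and a shared default would leak state between instances.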
Change DEFAULT_PROMETHEUS_CONFIG to enabled=True so metrics are collected out of the box.
What this PR does / why we need it:
This PR updates the observability and telemetry subsystems to track metrics by storage tier.
When we use multiple storage backends (CPU, disk, remote) or async retrieve operations, aggregating all hits into a single counter makes it impossible to debug performance bottlenecks.
Main changes:
Add tier="cpu", tier="disk", and tier="remote" labels to the Prometheus hit token counters.
Add sensible defaults (0, 0.0, field(default_factory=list)) to LMCacheStats fields to fix dataclass initialization errors.