Track cache token metrics for Anthropic and OpenAI #144
Add Prometheus counters for cache creation tokens (with a duration label) and cache read tokens. Anthropic responses include per-duration breakdowns (5m/1h) for non-streaming requests, and flat totals for streaming that are resolved using the request's `cache_control` TTL hint. OpenAI cached read tokens are extracted from `prompt_tokens_details`/`input_tokens_details`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
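As a minimal sketch of the resolution step described above: non-streaming Anthropic responses carry a per-duration breakdown, while streaming responses only report a flat total that gets attributed to the TTL hinted by the request's `cache_control` block. The types and function below are hypothetical illustrations, not the PR's actual code:

```rust
use std::collections::HashMap;

/// Duration label applied to the cache-creation counter.
/// (Hypothetical enum; the real code may use plain label strings.)
#[derive(Debug, PartialEq, Eq, Hash, Clone, Copy)]
enum CacheDuration {
    FiveMinutes,
    OneHour,
    Unknown,
}

/// Resolve cache-creation tokens into duration-labeled buckets.
/// `breakdown` comes from non-streaming responses (e.g. {"5m": 30, "1h": 70});
/// streaming responses only supply `flat_total`, which is attributed to the
/// TTL hinted by the request's `cache_control` block, or "unknown" otherwise.
fn resolve_cache_creation(
    breakdown: Option<HashMap<String, u64>>,
    flat_total: Option<u64>,
    request_ttl_hint: Option<&str>,
) -> HashMap<CacheDuration, u64> {
    let mut buckets = HashMap::new();
    if let Some(per_duration) = breakdown {
        // Non-streaming: trust the per-duration breakdown directly.
        for (ttl, tokens) in per_duration {
            let label = match ttl.as_str() {
                "5m" => CacheDuration::FiveMinutes,
                "1h" => CacheDuration::OneHour,
                _ => CacheDuration::Unknown,
            };
            *buckets.entry(label).or_insert(0) += tokens;
        }
    } else if let Some(total) = flat_total {
        // Streaming: fall back to the request-side TTL hint.
        let label = match request_ttl_hint {
            Some("5m") => CacheDuration::FiveMinutes,
            Some("1h") => CacheDuration::OneHour,
            _ => CacheDuration::Unknown,
        };
        buckets.insert(label, total);
    }
    buckets
}
```

The "unknown" bucket ensures tokens are never silently dropped when a streaming response arrives without a usable TTL hint.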
aviau reviewed Apr 27, 2026
```rust
assert_eq!(metadata.output_tokens, Some(150));
}

// --- Cache token tests ---
```
aviau approved these changes Apr 27, 2026
ref https://linear.app/flared/issue/AGENTIC-50/track-cached-token-usage
Summary

- Add `lacuna_provider_tokens_cache_creation_total` (with `cache_duration` label: 5m/1h/unknown) and `lacuna_provider_tokens_cache_read_total` for cache token tracking
- Inspect request `cache_control` blocks; resolve flat streaming totals into duration-labeled buckets via `RequestInspectionMetadata` passed to `response_inspector`
- Extract `cached_tokens` from OpenAI `prompt_tokens_details` (Chat) and `input_tokens_details` (Responses)

Test plan

- `cache_creation` breakdown -> "5m" and "1h" keys

🤖 Generated with Claude Code
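The OpenAI extraction in the summary can be sketched as follows. The struct and field names below mirror the OpenAI response fields named above but are otherwise hypothetical, assuming the Chat Completions field is preferred and the Responses API field is the fallback:

```rust
/// Hypothetical shapes mirroring the OpenAI usage detail objects;
/// the PR's real deserialization types are not shown in this page.
#[derive(Default)]
struct PromptTokensDetails {
    cached_tokens: Option<u64>, // Chat Completions API
}

#[derive(Default)]
struct InputTokensDetails {
    cached_tokens: Option<u64>, // Responses API
}

struct Usage {
    prompt_tokens_details: Option<PromptTokensDetails>,
    input_tokens_details: Option<InputTokensDetails>,
}

/// Cached read tokens: check `prompt_tokens_details` (Chat), then
/// `input_tokens_details` (Responses); missing details count as zero.
fn cached_read_tokens(usage: &Usage) -> u64 {
    usage
        .prompt_tokens_details
        .as_ref()
        .and_then(|d| d.cached_tokens)
        .or_else(|| {
            usage
                .input_tokens_details
                .as_ref()
                .and_then(|d| d.cached_tokens)
        })
        .unwrap_or(0)
}
```

Defaulting to zero keeps the `lacuna_provider_tokens_cache_read_total` counter well-defined for responses that omit token details entirely.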