Feature/issue 227 expose token usage stats#249
Conversation
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughThis PR adds TokenUsage tracking: a TokenUsage model, Context.usage and record_usage, provider async ctx parameters and instrumentation, threading ctx through orchestration and storage, snapshotting usage into result objects, documentation updates, and tests. ChangesToken Usage Tracking
Sequence Diagram(s)sequenceDiagram
participant Client
participant GraphRAG
participant LLM
participant Embedder
participant VectorStore
Client->>GraphRAG: call retrieve()/completion() (passes ctx)
GraphRAG->>LLM: ainvoke / ainvoke_messages (ctx=ctx)
LLM-->>GraphRAG: LLMResponse (usage)
GraphRAG->>Embedder: aembed_query / aembed_documents (ctx=ctx)
Embedder-->>GraphRAG: embeddings (usage)
GraphRAG->>VectorStore: index_chunks(..., ctx=ctx)
VectorStore-->>GraphRAG: indexing result
GraphRAG-->>Client: return result (usage snapshot from ctx)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
graphrag_sdk/src/graphrag_sdk/core/context.py (1)
58-72: ⚡ Quick winDocument child-context usage loss in
_ingest_batchworkflow.The clarified contract is good ("child starts with zero counters, NOT propagated back"). However,
GraphRAG._ingest_batchinapi/main.pypassesparent_ctx.child()to each_ingest_singlecall — meaning per-source token usage is captured on eachIngestionResult.usage(good) but the batch caller'sctx.usagewill stay at zero, which may surprise users who pass their ownctxexpecting an aggregated total. Consider mentioning this in the public docstring ofGraphRAG.ingest()for the list-source case, e.g. "for batch ingest, inspect eachIngestionResult.usage— the passed-inctxis not updated."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphrag_sdk/src/graphrag_sdk/core/context.py` around lines 58 - 72, The child Context.child(...) method resets usage counters, and GraphRAG._ingest_batch currently calls parent_ctx.child() for each source, so the original ctx.usage remains zero after batch ingest; update the public docstring for GraphRAG.ingest (and optionally mention in GraphRAG._ingest_batch docs) to explicitly state: when ingesting a list of sources the passed-in ctx will not be updated with aggregated token usage and callers should inspect each returned IngestionResult.usage for per-source usage (or pass the parent ctx directly to _ingest_single if they want aggregation). Reference Context.child, GraphRAG._ingest_batch, GraphRAG.ingest, IngestionResult.usage and ctx in the docstring to make this behavior discoverable.graphrag_sdk/src/graphrag_sdk/core/providers/base.py (1)
63-91: 💤 Low valueAsync defaults silently drop
ctx; document or make this explicit at call sites.
aembed_queryandaembed_documentsacceptctxbut never forward it toself.embed_query/self.embed_documents—ctxis documented as ignored in the default async path. That's a reasonable backwards-compat choice (the default-async path can't be sure a sync subclass acceptsctx), but note the asymmetry:embed_documents(line 75-85) does forwardctxtoembed_querywhen non-None. Consider either:
- Making the sync default also drop
ctx(consistency, simplest), or- Documenting in the class-level docstring that subclasses wanting usage tracking from
ctxmust override the async methods (aembed_query,aembed_documents,ainvoke), since the defaults won't propagate it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py` around lines 63 - 91, The async defaults (aembed_query, aembed_documents) accept ctx but do not forward it to sync methods, while embed_documents forwards ctx to embed_query, causing asymmetry; either make the sync path drop ctx for consistency or document the behavior. Fix by updating the class-level docstring to state explicitly that the default async implementations (aembed_query, aembed_documents, ainvoke) do not propagate ctx and subclasses must override them to support usage-tracking via ctx, and/or make embed_documents stop forwarding ctx to embed_query so sync and async defaults behave consistently (choose one approach and apply it to embed_documents, aembed_query, aembed_documents, and mention ainvoke in the docstring).graphrag_sdk/tests/test_token_usage.py (1)
251-251: ⚡ Quick winRemove unnecessary
@pytest.mark.asynciodecorators.Based on learnings, the project's
pytest-asyncioconfiguration usesasyncio_mode = "auto", which auto-detects async test functions. The explicit@pytest.mark.asynciodecorators throughout this file are unnecessary and can be removed.♻️ Cleanup: remove decorators from all async tests
The decorators appear on lines 251, 266, 278, 293, 304, 319, 331, 346, 358, 376, 391, 406, 432, 453, 470, 482, 499, 515, 529, 619, and 643. Simply delete the
@pytest.mark.asyncioline above each async test method.Example diff for the first occurrence:
- `@pytest.mark.asyncio` async def test_ainvoke_records_usage_when_ctx_provided(self):Apply the same pattern to all async test methods in this file.
Based on learnings from
graphrag_sdk/tests/test_coref_resolvers.py(PR#187): pytest-asyncio auto-detects async tests whenasyncio_mode = "auto".Also applies to: 266-266, 278-278, 293-293, 304-304, 319-319, 331-331, 346-346, 358-358, 376-376, 391-391, 406-406, 432-432, 453-453, 470-470, 482-482, 499-499, 515-515, 529-529, 619-619, 643-643
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphrag_sdk/tests/test_token_usage.py` at line 251, Remove the redundant `@pytest.mark.asyncio` decorators that appear above async test functions in graphrag_sdk/tests/test_token_usage.py; locate each occurrence of the literal decorator line ("@pytest.mark.asyncio") (not the test function bodies) and delete that line so the async test defs remain (e.g., the decorator currently above the async test functions at the listed occurrences); ensure you leave the async def test_... functions intact and run tests to verify asyncio_mode="auto" handles them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Around line 10-59: The CHANGELOG.md contains unresolved git conflict markers
around the [1.0.2] release and the new TokenUsage notes; remove the conflict
markers (<<<<<<<, =======, >>>>>>>), pick a single [1.0.2] header (delete the
duplicate), and merge the TokenUsage / implementation notes (the block
referencing TokenUsage, IngestionResult, RagResult, RetrieverResult,
VectorStore.index_chunks, and ctx.record_usage) into the appropriate changelog
section (e.g., Unreleased or 1.0.2) so the file conforms to Keep a Changelog
format; ensure the final file has no leftover markers and that the TokenUsage
text is integrated cleanly under a logical heading (Added/Changed) rather than
duplicated.
In `@graphrag_sdk/src/graphrag_sdk/api/main.py`:
- Around line 552-554: The retriever_result.usage is being aliased to the
mutable ctx.usage (set in retrieve()), which later gets completion tokens via
self.llm.ainvoke_messages in completion(), causing retriever_result to
incorrectly include completion tokens; fix by snapshotting the TokenUsage when
attaching it: replace direct assignments like retriever_result.usage = ctx.usage
and RagResult(usage=ctx.usage) with pydantic model copies (e.g.,
ctx.usage.model_copy() or an equivalent deep copy) so retriever_result.usage
captures the retrieval-only usage and RagResult.usage gets an independent
snapshot, leaving ctx.usage free to be mutated afterwards.
In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 117-120: Remove the unused ctx parameter from the synchronous
abstract method signature so callers cannot pass ctx into sync providers: update
the abstractmethod def invoke in BaseProvider (or the class where this ABC
lives) to be def invoke(self, prompt: str, **kwargs: Any) -> LLMResponse (drop
ctx from the signature and related type hint), adjust the docstring to note sync
path does not accept usage-tracking ctx, and ensure concrete sync
implementations (LiteLLM.invoke and OpenRouterLLM.invoke) match this signature;
leave ainvoke (async def ainvoke(..., ctx: Any | None = None, ...)) unchanged so
only async paths accept ctx.
---
Nitpick comments:
In `@graphrag_sdk/src/graphrag_sdk/core/context.py`:
- Around line 58-72: The child Context.child(...) method resets usage counters,
and GraphRAG._ingest_batch currently calls parent_ctx.child() for each source,
so the original ctx.usage remains zero after batch ingest; update the public
docstring for GraphRAG.ingest (and optionally mention in GraphRAG._ingest_batch
docs) to explicitly state: when ingesting a list of sources the passed-in ctx
will not be updated with aggregated token usage and callers should inspect each
returned IngestionResult.usage for per-source usage (or pass the parent ctx
directly to _ingest_single if they want aggregation). Reference Context.child,
GraphRAG._ingest_batch, GraphRAG.ingest, IngestionResult.usage and ctx in the
docstring to make this behavior discoverable.
In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 63-91: The async defaults (aembed_query, aembed_documents) accept
ctx but do not forward it to sync methods, while embed_documents forwards ctx to
embed_query, causing asymmetry; either make the sync path drop ctx for
consistency or document the behavior. Fix by updating the class-level docstring
to state explicitly that the default async implementations (aembed_query,
aembed_documents, ainvoke) do not propagate ctx and subclasses must override
them to support usage-tracking via ctx, and/or make embed_documents stop
forwarding ctx to embed_query so sync and async defaults behave consistently
(choose one approach and apply it to embed_documents, aembed_query,
aembed_documents, and mention ainvoke in the docstring).
In `@graphrag_sdk/tests/test_token_usage.py`:
- Line 251: Remove the redundant `@pytest.mark.asyncio` decorators that appear
above async test functions in graphrag_sdk/tests/test_token_usage.py; locate
each occurrence of the literal decorator line ("@pytest.mark.asyncio") (not the
test function bodies) and delete that line so the async test defs remain (e.g.,
the decorator currently above the async test functions at the listed
occurrences); ensure you leave the async def test_... functions intact and run
tests to verify asyncio_mode="auto" handles them.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54576f3f-53dd-4a63-a71d-d0ea039cb457
📒 Files selected for processing (20)
CHANGELOG.mddocs/api-reference.mddocs/getting-started.mddocs/index.mddocs/ingestion.mddocs/providers.mddocs/retrieval.mddocs/token-usage.mdgraphrag_sdk/src/graphrag_sdk/__init__.pygraphrag_sdk/src/graphrag_sdk/api/main.pygraphrag_sdk/src/graphrag_sdk/core/context.pygraphrag_sdk/src/graphrag_sdk/core/models.pygraphrag_sdk/src/graphrag_sdk/core/providers/base.pygraphrag_sdk/src/graphrag_sdk/core/providers/litellm.pygraphrag_sdk/src/graphrag_sdk/core/providers/openrouter.pygraphrag_sdk/src/graphrag_sdk/ingestion/extraction_strategies/graph_extraction.pygraphrag_sdk/src/graphrag_sdk/ingestion/pipeline.pygraphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.pygraphrag_sdk/src/graphrag_sdk/storage/vector_store.pygraphrag_sdk/tests/test_token_usage.py
… tokens in completion results
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 87-91: The async fallback aembed_documents drops the tracking
context because it calls self.embed_documents without passing ctx; update the
asyncio.to_thread call in aembed_documents to pass ctx through (i.e., call
self.embed_documents with texts, ctx=ctx, and existing **kwargs) so the
usage-tracking context is preserved when using the default sync-in-thread path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c0bf0d08-e5b6-4fec-a9d3-91a2d2af9ea6
📒 Files selected for processing (4)
CHANGELOG.mdgraphrag_sdk/src/graphrag_sdk/api/main.pygraphrag_sdk/src/graphrag_sdk/core/providers/base.pygraphrag_sdk/tests/test_facade.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
- graphrag_sdk/src/graphrag_sdk/api/main.py
Summary
This is a propose for LLM token usage tracking at provider level. This ensures that LLM and embedding token consumption is accurately recorded, aggregated and exposed across the SDK's publbic API.
Changes
Test Plan
pytest tests/ -q)ruff check src/)Notes
The feature was based on #227 issue
Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fixes