Skip to content

Feature/issue 227 expose token usage stats#249

Open
drr00t wants to merge 9 commits into
FalkorDB:mainfrom
drr00t:feature/issue-227-expose-token-usage-stats
Open

Feature/issue 227 expose token usage stats#249
drr00t wants to merge 9 commits into
FalkorDB:mainfrom
drr00t:feature/issue-227-expose-token-usage-stats

Conversation

@drr00t
Copy link
Copy Markdown
Contributor

@drr00t drr00t commented May 11, 2026

Summary

This is a propose for LLM token usage tracking at provider level. This ensures that LLM and embedding token consumption is accurately recorded, aggregated and exposed across the SDK's publbic API.

Changes

  • Verification: Confirmed TokenUsage model implementation in core/models.py with support for additive aggregation (+, +=)
  • Test Validation: Validated that tests/test_token_usage.py
  • Documentation Audit: Verified that docs/totken-usage.md is comprehensive, covering quick-start examples, cost estimation patterns, and provider integration guides.

Test Plan

  • All existing tests pass (pytest tests/ -q)
  • New tests added for new functionality (if applicable)
  • Lint passes (ruff check src/)

Notes

The feature was based on #227 issue

Summary by CodeRabbit

  • New Features

    • Token usage tracking added to ingestion, retrieval, and completion results with context-aware propagation, snapshotting, and provider instrumentation; SDK now exposes TokenUsage at top level.
  • Documentation

    • Comprehensive Token Usage guide plus updates to API, getting-started, providers, retrieval, and index docs with examples and integration notes.
  • Tests

    • New tests for token-usage models, context recording, provider wiring, and end-to-end snapshots.
  • Bug Fixes

    • Changelog updated; lint issues resolved and CI now enforces lint rules.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

This PR adds TokenUsage tracking: a TokenUsage model, Context.usage and record_usage, provider async ctx parameters and instrumentation, threading ctx through orchestration and storage, snapshotting usage into result objects, documentation updates, and tests.

Changes

Token Usage Tracking

Layer / File(s) Summary
TokenUsage Data Model
graphrag_sdk/src/graphrag_sdk/core/models.py
New TokenUsage Pydantic model with prompt_tokens, completion_tokens, embedding_tokens fields (default 0) and __add__/__iadd__ aggregation.
Result Model Extensions
graphrag_sdk/src/graphrag_sdk/core/models.py
IngestionResult, RagResult, and RetrieverResult each gain usage: TokenUsage = Field(default_factory=TokenUsage).
Context Usage Tracking
graphrag_sdk/src/graphrag_sdk/core/context.py
Context gains usage: TokenUsage and record_usage(*, prompt_tokens=0, completion_tokens=0, embedding_tokens=0); child contexts start with zeroed usage.
Public API Export
graphrag_sdk/src/graphrag_sdk/__init__.py
TokenUsage re-exported and added to __all__.
Provider Interface Updates
graphrag_sdk/src/graphrag_sdk/core/providers/base.py
LLMInterface and Embedder async methods (ainvoke, ainvoke_messages, abatch_invoke, aembed_query, aembed_documents, astream) accept optional keyword-only `ctx: Any
LiteLLM Provider
graphrag_sdk/src/graphrag_sdk/core/providers/litellm.py
Adds _extract_llm_usage/_extract_embedding_usage; ainvoke, ainvoke_messages, _raw_embed_async, aembed_query, aembed_documents accept ctx and call ctx.record_usage(...) with extracted counts.
OpenRouter Provider
graphrag_sdk/src/graphrag_sdk/core/providers/openrouter.py
Adds OpenAI-style usage extractors; ainvoke, ainvoke_messages, _raw_embed_async, aembed_query, aembed_documents accept ctx and record extracted usage; batched embedding propagation accepts ctx.
API Orchestration
graphrag_sdk/src/graphrag_sdk/api/main.py
retrieve() sets retriever_result.usage from ctx.usage.model_copy(); _rewrite_question_with_history() and completion() forward ctx into LLM calls; completion() returns RagResult(usage=ctx.usage).
Pipeline & Integration
graphrag_sdk/src/graphrag_sdk/ingestion/pipeline.py, ingestion/extraction_strategies/graph_extraction.py, retrieval/strategies/multi_path.py, storage/vector_store.py
Threads ctx into entity extraction, batched LLM calls, embedder calls, and VectorStore.index_chunks(..., ctx=ctx); pipeline snapshots ctx.usage into metadata.
Documentation
docs/api-reference.md, docs/providers.md, docs/retrieval.md, docs/token-usage.md, docs/getting-started.md, docs/index.md, docs/ingestion.md
Adds TokenUsage section, updates provider override signatures/tips, expands Context API docs, updates examples to show result.usage.*, and adds a dedicated token-usage guide with aggregation and cost estimation examples.
Test Coverage
graphrag_sdk/tests/test_token_usage.py, graphrag_sdk/tests/test_facade.py
New tests for TokenUsage arithmetic/serialization, Context.record_usage behavior/isolation, result usage fields, provider instrumentation (LiteLLM/OpenRouter), backward compatibility when ctx is omitted, export presence, and accumulation snapshots; facade test updated with new usage isolation test.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant GraphRAG
  participant LLM
  participant Embedder
  participant VectorStore

  Client->>GraphRAG: call retrieve()/completion() (passes ctx)
  GraphRAG->>LLM: ainvoke / ainvoke_messages (ctx=ctx)
  LLM-->>GraphRAG: LLMResponse (usage)
  GraphRAG->>Embedder: aembed_query / aembed_documents (ctx=ctx)
  Embedder-->>GraphRAG: embeddings (usage)
  GraphRAG->>VectorStore: index_chunks(..., ctx=ctx)
  VectorStore-->>GraphRAG: indexing result
  GraphRAG-->>Client: return result (usage snapshot from ctx)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • FalkorDB/GraphRAG-SDK#253: Overlaps changes to GraphRAG.completion() and related call paths; both PRs touch the same API orchestration code path.

Suggested reviewers

  • Naseem77

"🐰 Hops with glee through the token‑tracked trees,
Counting prompts, completions, and embeddings with ease,
Contexts keep tally while providers report true,
Results now carry the totals back to you! "

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: exposing token usage statistics through the SDK public API, which aligns with the extensive modifications across models, providers, documentation, and tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
graphrag_sdk/src/graphrag_sdk/core/context.py (1)

58-72: ⚡ Quick win

Document child-context usage loss in _ingest_batch workflow.

The clarified contract is good ("child starts with zero counters, NOT propagated back"). However, GraphRAG._ingest_batch in api/main.py passes parent_ctx.child() to each _ingest_single call — meaning per-source token usage is captured on each IngestionResult.usage (good) but the batch caller's ctx.usage will stay at zero, which may surprise users who pass their own ctx expecting an aggregated total. Consider mentioning this in the public docstring of GraphRAG.ingest() for the list-source case, e.g. "for batch ingest, inspect each IngestionResult.usage — the passed-in ctx is not updated."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/src/graphrag_sdk/core/context.py` around lines 58 - 72, The
child Context.child(...) method resets usage counters, and
GraphRAG._ingest_batch currently calls parent_ctx.child() for each source, so
the original ctx.usage remains zero after batch ingest; update the public
docstring for GraphRAG.ingest (and optionally mention in GraphRAG._ingest_batch
docs) to explicitly state: when ingesting a list of sources the passed-in ctx
will not be updated with aggregated token usage and callers should inspect each
returned IngestionResult.usage for per-source usage (or pass the parent ctx
directly to _ingest_single if they want aggregation). Reference Context.child,
GraphRAG._ingest_batch, GraphRAG.ingest, IngestionResult.usage and ctx in the
docstring to make this behavior discoverable.
graphrag_sdk/src/graphrag_sdk/core/providers/base.py (1)

63-91: 💤 Low value

Async defaults silently drop ctx; document or make this explicit at call sites.

aembed_query and aembed_documents accept ctx but never forward it to self.embed_query/self.embed_documentsctx is documented as ignored in the default async path. That's a reasonable backwards-compat choice (the default-async path can't be sure a sync subclass accepts ctx), but note the asymmetry: embed_documents (line 75-85) does forward ctx to embed_query when non-None. Consider either:

  1. Making the sync default also drop ctx (consistency, simplest), or
  2. Documenting in the class-level docstring that subclasses wanting usage tracking from ctx must override the async methods (aembed_query, aembed_documents, ainvoke), since the defaults won't propagate it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py` around lines 63 - 91,
The async defaults (aembed_query, aembed_documents) accept ctx but do not
forward it to sync methods, while embed_documents forwards ctx to embed_query,
causing asymmetry; either make the sync path drop ctx for consistency or
document the behavior. Fix by updating the class-level docstring to state
explicitly that the default async implementations (aembed_query,
aembed_documents, ainvoke) do not propagate ctx and subclasses must override
them to support usage-tracking via ctx, and/or make embed_documents stop
forwarding ctx to embed_query so sync and async defaults behave consistently
(choose one approach and apply it to embed_documents, aembed_query,
aembed_documents, and mention ainvoke in the docstring).
graphrag_sdk/tests/test_token_usage.py (1)

251-251: ⚡ Quick win

Remove unnecessary @pytest.mark.asyncio decorators.

Based on learnings, the project's pytest-asyncio configuration uses asyncio_mode = "auto", which auto-detects async test functions. The explicit @pytest.mark.asyncio decorators throughout this file are unnecessary and can be removed.

♻️ Cleanup: remove decorators from all async tests

The decorators appear on lines 251, 266, 278, 293, 304, 319, 331, 346, 358, 376, 391, 406, 432, 453, 470, 482, 499, 515, 529, 619, and 643. Simply delete the @pytest.mark.asyncio line above each async test method.

Example diff for the first occurrence:

-    `@pytest.mark.asyncio`
     async def test_ainvoke_records_usage_when_ctx_provided(self):

Apply the same pattern to all async test methods in this file.

Based on learnings from graphrag_sdk/tests/test_coref_resolvers.py (PR #187): pytest-asyncio auto-detects async tests when asyncio_mode = "auto".

Also applies to: 266-266, 278-278, 293-293, 304-304, 319-319, 331-331, 346-346, 358-358, 376-376, 391-391, 406-406, 432-432, 453-453, 470-470, 482-482, 499-499, 515-515, 529-529, 619-619, 643-643

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/tests/test_token_usage.py` at line 251, Remove the redundant
`@pytest.mark.asyncio` decorators that appear above async test functions in
graphrag_sdk/tests/test_token_usage.py; locate each occurrence of the literal
decorator line ("@pytest.mark.asyncio") (not the test function bodies) and
delete that line so the async test defs remain (e.g., the decorator currently
above the async test functions at the listed occurrences); ensure you leave the
async def test_... functions intact and run tests to verify asyncio_mode="auto"
handles them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-59: The CHANGELOG.md contains unresolved git conflict markers
around the [1.0.2] release and the new TokenUsage notes; remove the conflict
markers (<<<<<<<, =======, >>>>>>>), pick a single [1.0.2] header (delete the
duplicate), and merge the TokenUsage / implementation notes (the block
referencing TokenUsage, IngestionResult, RagResult, RetrieverResult,
VectorStore.index_chunks, and ctx.record_usage) into the appropriate changelog
section (e.g., Unreleased or 1.0.2) so the file conforms to Keep a Changelog
format; ensure the final file has no leftover markers and that the TokenUsage
text is integrated cleanly under a logical heading (Added/Changed) rather than
duplicated.

In `@graphrag_sdk/src/graphrag_sdk/api/main.py`:
- Around line 552-554: The retriever_result.usage is being aliased to the
mutable ctx.usage (set in retrieve()), which later gets completion tokens via
self.llm.ainvoke_messages in completion(), causing retriever_result to
incorrectly include completion tokens; fix by snapshotting the TokenUsage when
attaching it: replace direct assignments like retriever_result.usage = ctx.usage
and RagResult(usage=ctx.usage) with pydantic model copies (e.g.,
ctx.usage.model_copy() or an equivalent deep copy) so retriever_result.usage
captures the retrieval-only usage and RagResult.usage gets an independent
snapshot, leaving ctx.usage free to be mutated afterwards.

In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 117-120: Remove the unused ctx parameter from the synchronous
abstract method signature so callers cannot pass ctx into sync providers: update
the abstractmethod def invoke in BaseProvider (or the class where this ABC
lives) to be def invoke(self, prompt: str, **kwargs: Any) -> LLMResponse (drop
ctx from the signature and related type hint), adjust the docstring to note sync
path does not accept usage-tracking ctx, and ensure concrete sync
implementations (LiteLLM.invoke and OpenRouterLLM.invoke) match this signature;
leave ainvoke (async def ainvoke(..., ctx: Any | None = None, ...)) unchanged so
only async paths accept ctx.

---

Nitpick comments:
In `@graphrag_sdk/src/graphrag_sdk/core/context.py`:
- Around line 58-72: The child Context.child(...) method resets usage counters,
and GraphRAG._ingest_batch currently calls parent_ctx.child() for each source,
so the original ctx.usage remains zero after batch ingest; update the public
docstring for GraphRAG.ingest (and optionally mention in GraphRAG._ingest_batch
docs) to explicitly state: when ingesting a list of sources the passed-in ctx
will not be updated with aggregated token usage and callers should inspect each
returned IngestionResult.usage for per-source usage (or pass the parent ctx
directly to _ingest_single if they want aggregation). Reference Context.child,
GraphRAG._ingest_batch, GraphRAG.ingest, IngestionResult.usage and ctx in the
docstring to make this behavior discoverable.

In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 63-91: The async defaults (aembed_query, aembed_documents) accept
ctx but do not forward it to sync methods, while embed_documents forwards ctx to
embed_query, causing asymmetry; either make the sync path drop ctx for
consistency or document the behavior. Fix by updating the class-level docstring
to state explicitly that the default async implementations (aembed_query,
aembed_documents, ainvoke) do not propagate ctx and subclasses must override
them to support usage-tracking via ctx, and/or make embed_documents stop
forwarding ctx to embed_query so sync and async defaults behave consistently
(choose one approach and apply it to embed_documents, aembed_query,
aembed_documents, and mention ainvoke in the docstring).

In `@graphrag_sdk/tests/test_token_usage.py`:
- Line 251: Remove the redundant `@pytest.mark.asyncio` decorators that appear
above async test functions in graphrag_sdk/tests/test_token_usage.py; locate
each occurrence of the literal decorator line ("@pytest.mark.asyncio") (not the
test function bodies) and delete that line so the async test defs remain (e.g.,
the decorator currently above the async test functions at the listed
occurrences); ensure you leave the async def test_... functions intact and run
tests to verify asyncio_mode="auto" handles them.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54576f3f-53dd-4a63-a71d-d0ea039cb457

📥 Commits

Reviewing files that changed from the base of the PR and between 58e7764 and fadba63.

📒 Files selected for processing (20)
  • CHANGELOG.md
  • docs/api-reference.md
  • docs/getting-started.md
  • docs/index.md
  • docs/ingestion.md
  • docs/providers.md
  • docs/retrieval.md
  • docs/token-usage.md
  • graphrag_sdk/src/graphrag_sdk/__init__.py
  • graphrag_sdk/src/graphrag_sdk/api/main.py
  • graphrag_sdk/src/graphrag_sdk/core/context.py
  • graphrag_sdk/src/graphrag_sdk/core/models.py
  • graphrag_sdk/src/graphrag_sdk/core/providers/base.py
  • graphrag_sdk/src/graphrag_sdk/core/providers/litellm.py
  • graphrag_sdk/src/graphrag_sdk/core/providers/openrouter.py
  • graphrag_sdk/src/graphrag_sdk/ingestion/extraction_strategies/graph_extraction.py
  • graphrag_sdk/src/graphrag_sdk/ingestion/pipeline.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py
  • graphrag_sdk/src/graphrag_sdk/storage/vector_store.py
  • graphrag_sdk/tests/test_token_usage.py

Comment thread CHANGELOG.md Outdated
Comment thread graphrag_sdk/src/graphrag_sdk/api/main.py
Comment thread graphrag_sdk/src/graphrag_sdk/core/providers/base.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 87-91: The async fallback aembed_documents drops the tracking
context because it calls self.embed_documents without passing ctx; update the
asyncio.to_thread call in aembed_documents to pass ctx through (i.e., call
self.embed_documents with texts, ctx=ctx, and existing **kwargs) so the
usage-tracking context is preserved when using the default sync-in-thread path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0bf0d08-e5b6-4fec-a9d3-91a2d2af9ea6

📥 Commits

Reviewing files that changed from the base of the PR and between fadba63 and 83c9eca.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • graphrag_sdk/src/graphrag_sdk/api/main.py
  • graphrag_sdk/src/graphrag_sdk/core/providers/base.py
  • graphrag_sdk/tests/test_facade.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • graphrag_sdk/src/graphrag_sdk/api/main.py

Comment thread graphrag_sdk/src/graphrag_sdk/core/providers/base.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant