Feature/issue 227 expose token usage stats by drr00t · Pull Request #249 · FalkorDB/GraphRAG-SDK

drr00t · 2026-05-11T00:19:51Z

Summary

This is a proposal for LLM token usage tracking at the provider level. It ensures that LLM and embedding token consumption is accurately recorded, aggregated, and exposed across the SDK's public API.

Changes

Verification: Confirmed TokenUsage model implementation in core/models.py with support for additive aggregation (+, +=)
Test Validation: Validated that tests/test_token_usage.py
Documentation Audit: Verified that docs/token-usage.md is comprehensive, covering quick-start examples, cost estimation patterns, and provider integration guides.
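As a rough illustration of the cost-estimation pattern mentioned above, the sketch below multiplies token counts by per-million-token prices. The three field names follow the `TokenUsage` model described in this PR; the `Usage` stand-in class, the `estimate_cost` helper, and the prices are hypothetical placeholders, not values taken from the SDK or from any provider.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for the SDK's TokenUsage (same three field names)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    embedding_tokens: int = 0


# Hypothetical USD prices per one million tokens; substitute real rates.
PRICE_PER_MILLION = {"prompt": 1.00, "completion": 2.00, "embedding": 0.10}


def estimate_cost(usage: Usage, prices: dict = PRICE_PER_MILLION) -> float:
    """Return an estimated cost in USD for the given token counts."""
    weighted = (
        usage.prompt_tokens * prices["prompt"]
        + usage.completion_tokens * prices["completion"]
        + usage.embedding_tokens * prices["embedding"]
    )
    return weighted / 1_000_000


usage = Usage(prompt_tokens=1_000_000, completion_tokens=500_000, embedding_tokens=2_000_000)
cost = estimate_cost(usage)  # roughly 2.2 with the placeholder prices above
```

Keeping the prices in a plain mapping makes it easy to swap rates per model without touching the arithmetic.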

Test Plan

All existing tests pass (pytest tests/ -q)
New tests added for new functionality (if applicable)
Lint passes (ruff check src/)

Notes

The feature is based on issue #227.

Summary by CodeRabbit

New Features
- Token usage tracking added to ingestion, retrieval, and completion results with context-aware propagation, snapshotting, and provider instrumentation; SDK now exposes TokenUsage at top level.
Documentation
- Comprehensive Token Usage guide plus updates to API, getting-started, providers, retrieval, and index docs with examples and integration notes.
Tests
- New tests for token-usage models, context recording, provider wiring, and end-to-end snapshots.
Bug Fixes
- Changelog updated; lint issues resolved and CI now enforces lint rules.


coderabbitai · 2026-05-11T00:20:04Z

Caution

Review failed

Failed to post review comments

Walkthrough

This PR adds TokenUsage tracking: a TokenUsage model, Context.usage and record_usage, provider async ctx parameters and instrumentation, threading ctx through orchestration and storage, snapshotting usage into result objects, documentation updates, and tests.

Changes

Token Usage Tracking

| Layer / File(s) | Summary |
|---|---|
| TokenUsage Data Model `graphrag_sdk/src/graphrag_sdk/core/models.py` | New `TokenUsage` Pydantic model with `prompt_tokens`, `completion_tokens`, `embedding_tokens` fields (default 0) and `__add__`/`__iadd__` aggregation. |
| Result Model Extensions `graphrag_sdk/src/graphrag_sdk/core/models.py` | `IngestionResult`, `RagResult`, and `RetrieverResult` each gain `usage: TokenUsage = Field(default_factory=TokenUsage)`. |
| Context Usage Tracking `graphrag_sdk/src/graphrag_sdk/core/context.py` | `Context` gains `usage: TokenUsage` and `record_usage(*, prompt_tokens=0, completion_tokens=0, embedding_tokens=0)`; child contexts start with zeroed usage. |
| Public API Export `graphrag_sdk/src/graphrag_sdk/__init__.py` | `TokenUsage` re-exported and added to `__all__`. |
| Provider Interface Updates `graphrag_sdk/src/graphrag_sdk/core/providers/base.py` | `LLMInterface` and `Embedder` async methods (`ainvoke`, `ainvoke_messages`, `abatch_invoke`, `aembed_query`, `aembed_documents`, `astream`) accept optional keyword-only `ctx: Any |
| LiteLLM Provider `graphrag_sdk/src/graphrag_sdk/core/providers/litellm.py` | Adds `_extract_llm_usage`/`_extract_embedding_usage`; `ainvoke`, `ainvoke_messages`, `_raw_embed_async`, `aembed_query`, `aembed_documents` accept `ctx` and call `ctx.record_usage(...)` with extracted counts. |
| OpenRouter Provider `graphrag_sdk/src/graphrag_sdk/core/providers/openrouter.py` | Adds OpenAI-style usage extractors; `ainvoke`, `ainvoke_messages`, `_raw_embed_async`, `aembed_query`, `aembed_documents` accept `ctx` and record extracted usage; batched embedding propagation accepts `ctx`. |
| API Orchestration `graphrag_sdk/src/graphrag_sdk/api/main.py` | `retrieve()` sets `retriever_result.usage` from `ctx.usage.model_copy()`; `_rewrite_question_with_history()` and `completion()` forward `ctx` into LLM calls; `completion()` returns `RagResult(usage=ctx.usage)`. |
| Pipeline & Integration `graphrag_sdk/src/graphrag_sdk/ingestion/pipeline.py`, `ingestion/extraction_strategies/graph_extraction.py`, `retrieval/strategies/multi_path.py`, `storage/vector_store.py` | Threads `ctx` into entity extraction, batched LLM calls, embedder calls, and `VectorStore.index_chunks(..., ctx=ctx)`; pipeline snapshots `ctx.usage` into metadata. |
| Documentation `docs/api-reference.md`, `docs/providers.md`, `docs/retrieval.md`, `docs/token-usage.md`, `docs/getting-started.md`, `docs/index.md`, `docs/ingestion.md` | Adds TokenUsage section, updates provider override signatures/tips, expands Context API docs, updates examples to show `result.usage.*`, and adds a dedicated token-usage guide with aggregation and cost estimation examples. |
| Test Coverage `graphrag_sdk/tests/test_token_usage.py`, `graphrag_sdk/tests/test_facade.py` | New tests for `TokenUsage` arithmetic/serialization, `Context.record_usage` behavior/isolation, result usage fields, provider instrumentation (LiteLLM/OpenRouter), backward compatibility when `ctx` is omitted, export presence, and accumulation snapshots; facade test updated with new usage isolation test. |
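The data-model and context rows above can be pictured with a minimal sketch. It uses plain dataclasses rather than the Pydantic model the PR describes, so that it runs without third-party dependencies; the method bodies are assumptions about the behavior summarized in the table, not the SDK's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    embedding_tokens: int = 0

    def __add__(self, other: TokenUsage) -> TokenUsage:
        # Return a new object; neither operand is modified.
        return TokenUsage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
            self.embedding_tokens + other.embedding_tokens,
        )

    def __iadd__(self, other: TokenUsage) -> TokenUsage:
        # Accumulate in place so existing references see the new totals.
        self.prompt_tokens += other.prompt_tokens
        self.completion_tokens += other.completion_tokens
        self.embedding_tokens += other.embedding_tokens
        return self


@dataclass
class Context:
    usage: TokenUsage = field(default_factory=TokenUsage)

    def record_usage(
        self,
        *,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        embedding_tokens: int = 0,
    ) -> None:
        self.usage += TokenUsage(prompt_tokens, completion_tokens, embedding_tokens)

    def child(self) -> Context:
        # Child contexts start from zero and do not report back to the parent.
        return Context()


ctx = Context()
ctx.record_usage(prompt_tokens=120, completion_tokens=30)
ctx.record_usage(embedding_tokens=512)
total = ctx.usage + TokenUsage(prompt_tokens=5)
```

Here `+` produces a fresh total while `+=` mutates the running counter, which is why code that attaches `ctx.usage` to a result has to decide between sharing and copying.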

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant GraphRAG
  participant LLM
  participant Embedder
  participant VectorStore

  Client->>GraphRAG: call retrieve()/completion() (passes ctx)
  GraphRAG->>LLM: ainvoke / ainvoke_messages (ctx=ctx)
  LLM-->>GraphRAG: LLMResponse (usage)
  GraphRAG->>Embedder: aembed_query / aembed_documents (ctx=ctx)
  Embedder-->>GraphRAG: embeddings (usage)
  GraphRAG->>VectorStore: index_chunks(..., ctx=ctx)
  VectorStore-->>GraphRAG: indexing result
  GraphRAG-->>Client: return result (usage snapshot from ctx)
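The flow in the diagram can be approximated with a fake provider. Everything here is a hypothetical stand-in (`FakeLLM`, `UsageContext`, the word-count token estimate); it only illustrates how an optional keyword-only `ctx` lets a provider report usage while remaining callable without one.

```python
from __future__ import annotations

import asyncio
from typing import Any


class UsageContext:
    """Hypothetical stand-in for the SDK's Context: just three counters."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.embedding_tokens = 0

    def record_usage(
        self,
        *,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        embedding_tokens: int = 0,
    ) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.embedding_tokens += embedding_tokens


class FakeLLM:
    async def ainvoke(self, prompt: str, *, ctx: Any | None = None) -> str:
        # A real provider would read these counts from the API response;
        # here they are faked from the prompt's word count.
        response = {
            "text": "ok",
            "usage": {"prompt_tokens": len(prompt.split()), "completion_tokens": 1},
        }
        if ctx is not None:
            ctx.record_usage(**response["usage"])
        return response["text"]


async def main() -> UsageContext:
    llm = FakeLLM()
    ctx = UsageContext()
    await llm.ainvoke("what is a graph", ctx=ctx)
    await llm.ainvoke("no tracking here")  # ctx omitted: nothing is recorded
    return ctx


ctx = asyncio.run(main())
```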

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

FalkorDB/GraphRAG-SDK#253: Overlaps changes to GraphRAG.completion() and related call paths; both PRs touch the same API orchestration code path.

Suggested reviewers

Naseem77

"🐰 Hops with glee through the token‑tracked trees,
Counting prompts, completions, and embeddings with ease,
Contexts keep tally while providers report true,
Results now carry the totals back to you! "

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: exposing token usage statistics through the SDK public API, which aligns with the extensive modifications across models, providers, documentation, and tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

graphrag_sdk/src/graphrag_sdk/core/context.py (1)
58-72: ⚡ Quick win

Document child-context usage loss in _ingest_batch workflow.

The clarified contract is good ("child starts with zero counters, NOT propagated back"). However, GraphRAG._ingest_batch in api/main.py passes parent_ctx.child() to each _ingest_single call — meaning per-source token usage is captured on each IngestionResult.usage (good) but the batch caller's ctx.usage will stay at zero, which may surprise users who pass their own ctx expecting an aggregated total. Consider mentioning this in the public docstring of GraphRAG.ingest() for the list-source case, e.g. "for batch ingest, inspect each IngestionResult.usage — the passed-in ctx is not updated."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/src/graphrag_sdk/core/context.py` around lines 58 - 72, The
child Context.child(...) method resets usage counters, and
GraphRAG._ingest_batch currently calls parent_ctx.child() for each source, so
the original ctx.usage remains zero after batch ingest; update the public
docstring for GraphRAG.ingest (and optionally mention in GraphRAG._ingest_batch
docs) to explicitly state: when ingesting a list of sources the passed-in ctx
will not be updated with aggregated token usage and callers should inspect each
returned IngestionResult.usage for per-source usage (or pass the parent ctx
directly to _ingest_single if they want aggregation). Reference Context.child,
GraphRAG._ingest_batch, GraphRAG.ingest, IngestionResult.usage and ctx in the
docstring to make this behavior discoverable.
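Given that the passed-in `ctx` is not updated during batch ingest, one way for a caller to obtain a batch total is to sum the per-source usage. The sketch below assumes only what the review states: each result exposes a `usage` with the three counters and supports `+`. `Usage`, `Result`, the file names, and the numbers are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Usage:
    """Hypothetical stand-in for TokenUsage with additive aggregation."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    embedding_tokens: int = 0

    def __add__(self, other: Usage) -> Usage:
        return Usage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
            self.embedding_tokens + other.embedding_tokens,
        )


@dataclass
class Result:
    """Hypothetical stand-in for IngestionResult: one per ingested source."""

    source: str
    usage: Usage = field(default_factory=Usage)


# Pretend these came back from a batch ingest of two sources.
results = [
    Result("a.txt", Usage(100, 20, 300)),
    Result("b.txt", Usage(50, 10, 200)),
]

# sum() accepts a custom start value, so an empty Usage works as the identity.
batch_total = sum((r.usage for r in results), Usage())
```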
graphrag_sdk/src/graphrag_sdk/core/providers/base.py (1)
63-91: 💤 Low value

Async defaults silently drop ctx; document or make this explicit at call sites.

aembed_query and aembed_documents accept ctx but never forward it to self.embed_query/self.embed_documents — ctx is documented as ignored in the default async path. That's a reasonable backwards-compat choice (the default-async path can't be sure a sync subclass accepts ctx), but note the asymmetry: embed_documents (line 75-85) does forward ctx to embed_query when non-None. Consider either:

Making the sync default also drop ctx (consistency, simplest), or

Documenting in the class-level docstring that subclasses wanting usage tracking from ctx must override the async methods (aembed_query, aembed_documents, ainvoke), since the defaults won't propagate it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py` around lines 63 - 91,
The async defaults (aembed_query, aembed_documents) accept ctx but do not
forward it to sync methods, while embed_documents forwards ctx to embed_query,
causing asymmetry; either make the sync path drop ctx for consistency or
document the behavior. Fix by updating the class-level docstring to state
explicitly that the default async implementations (aembed_query,
aembed_documents, ainvoke) do not propagate ctx and subclasses must override
them to support usage-tracking via ctx, and/or make embed_documents stop
forwarding ctx to embed_query so sync and async defaults behave consistently
(choose one approach and apply it to embed_documents, aembed_query,
aembed_documents, and mention ainvoke in the docstring).
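The documentation option implies that a custom embedder wanting usage tracking would override the async method itself. A minimal sketch under that assumption follows; `BaseEmbedder`, `CountingEmbedder`, `Recorder`, and the word-count token estimate are hypothetical and only mirror the method names mentioned in the review.

```python
from __future__ import annotations

import asyncio
from typing import Any


class Recorder:
    """Hypothetical ctx stand-in that only tracks embedding tokens."""

    def __init__(self) -> None:
        self.embedding_tokens = 0

    def record_usage(self, *, embedding_tokens: int = 0, **_: Any) -> None:
        self.embedding_tokens += embedding_tokens


class BaseEmbedder:
    def embed_query(self, text: str) -> list[float]:
        raise NotImplementedError

    async def aembed_query(self, text: str, *, ctx: Any | None = None) -> list[float]:
        # Default async path: runs the sync method in a thread and ignores ctx.
        return await asyncio.to_thread(self.embed_query, text)


class CountingEmbedder(BaseEmbedder):
    def embed_query(self, text: str) -> list[float]:
        # Toy embedding: a single number derived from the text length.
        return [float(len(text))]

    async def aembed_query(self, text: str, *, ctx: Any | None = None) -> list[float]:
        # Override the async method so usage is recorded when ctx is given.
        vector = await super().aembed_query(text)
        if ctx is not None:
            ctx.record_usage(embedding_tokens=len(text.split()))
        return vector


recorder = Recorder()
vector = asyncio.run(CountingEmbedder().aembed_query("hello graph world", ctx=recorder))
```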
graphrag_sdk/tests/test_token_usage.py (1)
251-251: ⚡ Quick win

Remove unnecessary @pytest.mark.asyncio decorators.

Based on learnings, the project's pytest-asyncio configuration uses asyncio_mode = "auto", which auto-detects async test functions. The explicit @pytest.mark.asyncio decorators throughout this file are unnecessary and can be removed.
♻️ Cleanup: remove decorators from all async tests

The decorators appear on lines 251, 266, 278, 293, 304, 319, 331, 346, 358, 376, 391, 406, 432, 453, 470, 482, 499, 515, 529, 619, and 643. Simply delete the @pytest.mark.asyncio line above each async test method.

Example diff for the first occurrence:
-    @pytest.mark.asyncio
     async def test_ainvoke_records_usage_when_ctx_provided(self):
Apply the same pattern to all async test methods in this file.
Based on learnings from graphrag_sdk/tests/test_coref_resolvers.py (PR #187): pytest-asyncio auto-detects async tests when asyncio_mode = "auto".

Also applies to: 266-266, 278-278, 293-293, 304-304, 319-319, 331-331, 346-346, 358-358, 376-376, 391-391, 406-406, 432-432, 453-453, 470-470, 482-482, 499-499, 515-515, 529-529, 619-619, 643-643
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/tests/test_token_usage.py` at line 251, Remove the redundant
`@pytest.mark.asyncio` decorators that appear above async test functions in
graphrag_sdk/tests/test_token_usage.py; locate each occurrence of the literal
decorator line ("@pytest.mark.asyncio") (not the test function bodies) and
delete that line so the async test defs remain (e.g., the decorator currently
above the async test functions at the listed occurrences); ensure you leave the
async def test_... functions intact and run tests to verify asyncio_mode="auto"
handles them.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-59: The CHANGELOG.md contains unresolved git conflict markers
around the [1.0.2] release and the new TokenUsage notes; remove the conflict
markers (<<<<<<<, =======, >>>>>>>), pick a single [1.0.2] header (delete the
duplicate), and merge the TokenUsage / implementation notes (the block
referencing TokenUsage, IngestionResult, RagResult, RetrieverResult,
VectorStore.index_chunks, and ctx.record_usage) into the appropriate changelog
section (e.g., Unreleased or 1.0.2) so the file conforms to Keep a Changelog
format; ensure the final file has no leftover markers and that the TokenUsage
text is integrated cleanly under a logical heading (Added/Changed) rather than
duplicated.

In `@graphrag_sdk/src/graphrag_sdk/api/main.py`:
- Around line 552-554: The retriever_result.usage is being aliased to the
mutable ctx.usage (set in retrieve()), which later gets completion tokens via
self.llm.ainvoke_messages in completion(), causing retriever_result to
incorrectly include completion tokens; fix by snapshotting the TokenUsage when
attaching it: replace direct assignments like retriever_result.usage = ctx.usage
and RagResult(usage=ctx.usage) with pydantic model copies (e.g.,
ctx.usage.model_copy() or an equivalent deep copy) so retriever_result.usage
captures the retrieval-only usage and RagResult.usage gets an independent
snapshot, leaving ctx.usage free to be mutated afterwards.

In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 117-120: Remove the unused ctx parameter from the synchronous
abstract method signature so callers cannot pass ctx into sync providers: update
the abstractmethod def invoke in BaseProvider (or the class where this ABC
lives) to be def invoke(self, prompt: str, **kwargs: Any) -> LLMResponse (drop
ctx from the signature and related type hint), adjust the docstring to note sync
path does not accept usage-tracking ctx, and ensure concrete sync
implementations (LiteLLM.invoke and OpenRouterLLM.invoke) match this signature;
leave ainvoke (async def ainvoke(..., ctx: Any | None = None, ...)) unchanged so
only async paths accept ctx.
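The aliasing problem described for `api/main.py` and the snapshot fix can be reproduced in isolation. This sketch uses a dataclass and `copy.copy` in place of the Pydantic `model_copy()` named in the review; the `Usage` class and the token counts are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for the mutable TokenUsage held on a context."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    embedding_tokens: int = 0


ctx_usage = Usage()

# Retrieval phase: only embedding tokens have been spent so far.
ctx_usage.embedding_tokens += 64

aliased = ctx_usage              # same object as the live counter
snapshot = copy.copy(ctx_usage)  # independent copy taken at this point

# Completion phase: the live counter keeps accumulating.
ctx_usage.prompt_tokens += 200
ctx_usage.completion_tokens += 40

# aliased now reports completion tokens it never incurred;
# snapshot still reflects retrieval-only usage.
```

A shallow copy is enough here because the counters are plain integers.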



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54576f3f-53dd-4a63-a71d-d0ea039cb457

📥 Commits

Reviewing files that changed from the base of the PR and between 58e7764 and fadba63.

📒 Files selected for processing (20)

CHANGELOG.md
docs/api-reference.md
docs/getting-started.md
docs/index.md
docs/ingestion.md
docs/providers.md
docs/retrieval.md
docs/token-usage.md
graphrag_sdk/src/graphrag_sdk/__init__.py
graphrag_sdk/src/graphrag_sdk/api/main.py
graphrag_sdk/src/graphrag_sdk/core/context.py
graphrag_sdk/src/graphrag_sdk/core/models.py
graphrag_sdk/src/graphrag_sdk/core/providers/base.py
graphrag_sdk/src/graphrag_sdk/core/providers/litellm.py
graphrag_sdk/src/graphrag_sdk/core/providers/openrouter.py
graphrag_sdk/src/graphrag_sdk/ingestion/extraction_strategies/graph_extraction.py
graphrag_sdk/src/graphrag_sdk/ingestion/pipeline.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py
graphrag_sdk/src/graphrag_sdk/storage/vector_store.py
graphrag_sdk/tests/test_token_usage.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/core/providers/base.py`:
- Around line 87-91: The async fallback aembed_documents drops the tracking
context because it calls self.embed_documents without passing ctx; update the
asyncio.to_thread call in aembed_documents to pass ctx through (i.e., call
self.embed_documents with texts, ctx=ctx, and existing **kwargs) so the
usage-tracking context is preserved when using the default sync-in-thread path.
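A sketch of the suggested change, assuming the default async method delegates to the sync one via `asyncio.to_thread`. The `Embedder` and `Recorder` classes and the word-count token estimate are hypothetical; only the method names come from the review.

```python
from __future__ import annotations

import asyncio
from typing import Any


class Recorder:
    """Hypothetical ctx stand-in that only tracks embedding tokens."""

    def __init__(self) -> None:
        self.embedding_tokens = 0

    def record_usage(self, *, embedding_tokens: int = 0, **_: Any) -> None:
        self.embedding_tokens += embedding_tokens


class Embedder:
    def embed_documents(
        self, texts: list[str], *, ctx: Any | None = None, **kwargs: Any
    ) -> list[list[float]]:
        if ctx is not None:
            ctx.record_usage(embedding_tokens=sum(len(t.split()) for t in texts))
        # Toy embedding: one number per text, derived from its length.
        return [[float(len(t))] for t in texts]

    async def aembed_documents(
        self, texts: list[str], *, ctx: Any | None = None, **kwargs: Any
    ) -> list[list[float]]:
        # asyncio.to_thread forwards positional and keyword arguments to the
        # callable, so ctx reaches the sync implementation running in the thread.
        return await asyncio.to_thread(self.embed_documents, texts, ctx=ctx, **kwargs)


recorder = Recorder()
vectors = asyncio.run(Embedder().aembed_documents(["a b", "c d e"], ctx=recorder))
```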


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0bf0d08-e5b6-4fec-a9d3-91a2d2af9ea6

📥 Commits

Reviewing files that changed from the base of the PR and between fadba63 and 83c9eca.

📒 Files selected for processing (4)

CHANGELOG.md
graphrag_sdk/src/graphrag_sdk/api/main.py
graphrag_sdk/src/graphrag_sdk/core/providers/base.py
graphrag_sdk/tests/test_facade.py

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

🚧 Files skipped from review as they are similar to previous changes (1)

graphrag_sdk/src/graphrag_sdk/api/main.py

drr00t added 3 commits May 10, 2026 21:01

feat: implement built-in token usage tracking and cost observability across the RAG pipeline

e0fb8f8

Refactor code structure for improved readability and maintainability

9abd03b

refactor: improve formatting of method signatures for better readability

fadba63

coderabbitai Bot reviewed May 11, 2026


Comment thread CHANGELOG.md Outdated

Comment thread graphrag_sdk/src/graphrag_sdk/api/main.py

Comment thread graphrag_sdk/src/graphrag_sdk/core/providers/base.py

drr00t added 4 commits May 10, 2026 22:06

fix: enhance token usage tracking to isolate retrieval and generation tokens in completion results

dfe4eaa

fix: update changelog to include token usage tracking and remove outdated entries

5dc520c

refactor: simplify method signatures in Embedder and LLMInterface classes

869db43

fix: correct formatting in embed_documents method docstring

83c9eca

coderabbitai Bot reviewed May 11, 2026


Comment thread graphrag_sdk/src/graphrag_sdk/core/providers/base.py Outdated

drr00t added 2 commits May 10, 2026 22:43

fix: pass context parameter to embed_documents in aembed_documents method

c256666

Merge branch 'main' into feature/issue-227-expose-token-usage-stats

e9f987e

Feature/issue 227 expose token usage stats #249
drr00t wants to merge 9 commits into FalkorDB:main from drr00t:feature/issue-227-expose-token-usage-stats
