Skip to content

fix(task): centralize token tracking and preserve usage during guardrail retries#6251

Open
Jacopos311 wants to merge 17 commits into
crewAIInc:mainfrom
Jacopos311:fix/task-token-tracking
Open

fix(task): centralize token tracking and preserve usage during guardrail retries#6251
Jacopos311 wants to merge 17 commits into
crewAIInc:mainfrom
Jacopos311:fix/task-token-tracking

Conversation

@Jacopos311

@Jacopos311 Jacopos311 commented Jun 19, 2026

Copy link
Copy Markdown

This Pull Request addresses critical bugs and structural flaws within the token tracking lifecycle of Task executions, specifically optimizing metrics continuity across core execution boundaries and guardrail loops.

The implementation was highly challenging, refactoring the state management across both synchronous (_execute_core) and asynchronous (_aexecute_core) runtime flows. Ensuring state persistence during multi-layered asynchronous retry loops proved to be an incredibly complex task.

Key Fixes Implemented
Bug 1: Guardrail Token Loss Recovery

Issue: Token metrics were dropped or wiped clean whenever a guardrail failed validation and triggered a sub-task retry.

Fix: Introduced an internal token snapshot (current_token_usage) at the top of the retry loop within _ainvoke_guardrail_function. If a guardrail yields a new TaskOutput, the preceding metrics are explicitly preserved and restored.

Bug 2: Duplicate Core Tracking Redundancy

Issue: Decentralized and fragmented token calculations were being redundantly invoked inside background async workers and parallel threads, causing unstable delta spikes.

Fix: Centralized token delta computations exclusively inside the core pipeline. Wrapper functions now stream results without interfering with state counters.

Bug 3: Premature Usage Snapshots

Issue: The final snapshot of the LLM token usage was captured prior to exporting outputs into structured Pydantic models or JSON dictionaries. Any extra token overhead consumed by formatting models was entirely lost.

Fix: Shifted the tokens_after capture and _calculate_token_delta invocation structurally below the _export_output / _aexport_output blocks, ensuring 100% telemetry coverage.

Testing
Verified alignment across synchronous and asynchronous interfaces.

Ensured zero NameError bounds or uninitialized variable leaks during back-to-back guardrail re-evaluations.

Summary by CodeRabbit

  • New Features
    • Added a Mimir Memory Backend for persistent memory, including configurable database path and example setup.
    • Improved memory backend resolution to support selecting storage with optional configuration (including built-in Mimir support).
  • Bug Fixes
    • Improved task LLM token accounting, ensuring token usage is consistently recorded in task outputs, including across guardrail retries.
  • Documentation
    • Added a Mimir Memory Backend section covering prerequisites and a Python configuration example.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

No new commits to review since the last review.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4c118d60-0d8a-478b-a9f9-f30affb53634

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds MimirStorage, a new persistent memory backend that delegates to the mimir CLI via subprocess for store and search operations. The storage factory gains a config parameter and a built-in mimir spec fallback. Separately, Task execution now computes and propagates token-usage deltas through both core execution and guardrail retry paths.

Changes

Mimir Memory Backend

Layer / File(s) Summary
MimirStorage class
lib/crewai/src/crewai/memory/storage/mimir_storage.py
New StorageBackend subclass that resolves db_path, finds the mimir binary at init, validates category and query inputs to prevent flag injection, stores MemoryRecords via mimir store subprocess with MD5-derived keys, and searches via mimir search returning (MemoryRecord, score) tuples.
Factory wiring and README docs
lib/crewai/src/crewai/memory/storage/factory.py, lib/crewai/README.md
resolve_memory_storage gains an optional config parameter and a built-in mimir spec fallback that constructs MimirStorage(config=config). README documents the Mimir binary prerequisite, the mcp Python package dependency, and a MimirStorage configuration example.

Task Token Usage Tracking

Layer / File(s) Summary
Token helpers and core execution capture
lib/crewai/src/crewai/task.py
Adds _get_agent_token_usage and _calculate_token_delta helpers. Both sync (_execute_core) and async (_aexecute_core) paths snapshot agent.llm.token_usage before and after execution, compute a delta, and pass it into TaskOutput.token_usage.
Token preservation through guardrail retries
lib/crewai/src/crewai/task.py
In both sync and async guardrail invocation, current_token_usage is captured at each attempt. It is backfilled onto a returned TaskOutput that lacks token_usage, and explicitly set when constructing a retry TaskOutput.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: centralizing token tracking and preserving token usage during guardrail retries, which are core to all modifications across task.py and supporting memory infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@corridor-security corridor-security Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: This PR adds a Mimir-backed memory storage integration and adjusts task token tracking behavior around execution and guardrail retries; no exploitable security vulnerabilities were identified.

Risk: Low risk. The new external subprocess integration is not directly public-facing, uses argument-list subprocess execution without shell invocation, and validates CLI-sensitive inputs such as categories and hyphen-prefixed queries.

Recommendations:

  • Ensure deployments only enable the Mimir backend with a trusted installed mimir binary and trusted db_path configuration.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/task.py (1)

691-704: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Finalize token usage after guardrail retries finish.

token_delta is attached before guardrails run, then retry TaskOutputs copy the prior current_token_usage after executing the agent again. Tokens consumed by LLM-backed guardrails, retry execute_task/aexecute_task, and retry _export_output/_aexport_output are still omitted.

Suggested direction
-            tokens_after = self._get_agent_token_usage(agent)
-            token_delta = self._calculate_token_delta(tokens_before, tokens_after)
-
             task_output = TaskOutput(
                 name=self.name or self.description,
                 description=self.description,
                 expected_output=self.expected_output,
                 raw=raw,
@@
                 output_format=self._get_output_format(),
                 messages=agent.last_messages,  # type: ignore[attr-defined]
-                token_usage=token_delta,
+                token_usage=None,
             )

             if self._guardrails:
                 for idx, guardrail in enumerate(self._guardrails):
@@
             if self._guardrail:
                 task_output = self._invoke_guardrail_function(
                     task_output=task_output,
                     agent=agent,
                     tools=tools,
                     guardrail=self._guardrail,
                 )
+
+            tokens_after = self._get_agent_token_usage(agent)
+            task_output.token_usage = self._calculate_token_delta(
+                tokens_before, tokens_after
+            )

Apply the same placement in _aexecute_core, or alternatively accumulate each retry delta inside the guardrail helpers before constructing the retry TaskOutput.

Also applies to: 820-833, 1376-1376, 1489-1489

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/task.py` around lines 691 - 704, The token_delta
calculation is being performed and attached to TaskOutput before guardrails
execute their retries, which causes tokens consumed by guardrail execution,
retry task execution, and retry output export operations to be excluded from the
final token_usage. Move the token_delta calculation to after all guardrail retry
operations complete by either applying the same placement in the _aexecute_core
method (the async counterpart), or accumulate each retry delta within the
guardrail helper functions before constructing the retry TaskOutput objects.
This same fix should be applied at the additional locations mentioned (820-833,
1376-1376, 1489-1489) to ensure consistent token usage tracking across all code
paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/memory/storage/factory.py`:
- Around line 47-63: The `config` parameter in the `resolve_memory_storage`
function is unused because no callers pass it and it cannot be forwarded to
custom factories (which only accept `spec` per the `MemoryStorageFactory`
signature at line 27). Remove the `config` parameter from the
`resolve_memory_storage` function signature and its use in the MimirStorage
instantiation call (line 63), or alternatively, extend the
`MemoryStorageFactory` protocol/type signature to accept an optional `config`
parameter, update the custom factory invocation (line 56) to pass `config`, and
ensure all places that set memory storage factories are aware of the new
signature.

In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 89-93: The search method in MimirStorage has a signature mismatch
with the storage contract expected by shallow memory retrieval. The method
currently only accepts query as a string along with limit and scope_prefix, but
the shallow retrieval path invokes it with an embedding vector, categories
parameter, and min_score threshold. Update the search method signature to accept
all required parameters including categories and min_score, and add logic to
handle the incoming query appropriately whether it is a string or an embedding
vector. This will ensure the method aligns with the actual contract consumed by
the memory runtime and prevents runtime failures due to argument mismatches.

In `@lib/crewai/src/crewai/task.py`:
- Around line 1313-1314: The condition `and current_token_usage` performs a
truthiness check, but valid token usage objects can be falsy (zero or empty),
which causes the code to incorrectly skip backfilling token usage. Replace the
truthiness check with an explicit None check by changing `and
current_token_usage` to `and current_token_usage is not None` in both locations:
the code block shown around lines 1313-1314, and the identical pattern in the
`_ainvoke_guardrail_function` method around lines 1426-1427.
- Around line 1167-1174: The _calculate_token_delta method in the task.py file
attempts to subtract two values directly using the minus operator, but when
these values are dictionaries (as token_usage from agent.llm is dict[str, int]),
the subtraction fails silently in the exception handler and returns the
cumulative after value unchanged, causing duplicate token spikes. Fix this by
first adding explicit type checks to detect when both before and after are
dictionaries, and if they are, iterate through the keys in the after dictionary
to explicitly subtract the numeric values (int or float types) from their
corresponding before values for each key, building a delta dictionary with the
differences. Keep the fallback try-except logic for non-dict numeric types to
maintain backward compatibility.

---

Outside diff comments:
In `@lib/crewai/src/crewai/task.py`:
- Around line 691-704: The token_delta calculation is being performed and
attached to TaskOutput before guardrails execute their retries, which causes
tokens consumed by guardrail execution, retry task execution, and retry output
export operations to be excluded from the final token_usage. Move the
token_delta calculation to after all guardrail retry operations complete by
either applying the same placement in the _aexecute_core method (the async
counterpart), or accumulate each retry delta within the guardrail helper
functions before constructing the retry TaskOutput objects. This same fix should
be applied at the additional locations mentioned (820-833, 1376-1376, 1489-1489)
to ensure consistent token usage tracking across all code paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 905ee92f-52d4-49d2-b8c7-6d0bb921220c

📥 Commits

Reviewing files that changed from the base of the PR and between 9db2d44 and cd0fbaf.

📒 Files selected for processing (4)
  • lib/crewai/README.md
  • lib/crewai/src/crewai/memory/storage/factory.py
  • lib/crewai/src/crewai/memory/storage/mimir_storage.py
  • lib/crewai/src/crewai/task.py

Comment thread lib/crewai/src/crewai/memory/storage/factory.py
Comment thread lib/crewai/src/crewai/memory/storage/mimir_storage.py Outdated
Comment thread lib/crewai/src/crewai/task.py
Comment thread lib/crewai/src/crewai/task.py Outdated
@Jacopos311

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/memory/storage/mimir_storage.py (1)

59-63: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix MemoryRecord field mismatch (value vs content) to avoid runtime failures.

Line 59 reads record.value, but MemoryRecord exposes content. Line 132 constructs MemoryRecord(value=...), which misses required content. This will break both save and search paths at runtime.

Proposed patch
-            value_str = str(record.value)
+            value_str = str(record.content)
@@
-                content_text = res.get("value", res.get("text", ""))
+                content_text = res.get("value", res.get("text", ""))
@@
-                record = MemoryRecord(value=content_text, metadata=meta)
+                record = MemoryRecord(content=content_text, metadata=meta)

Also applies to: 132-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py` around lines 59 - 63,
There is a field mismatch in the MemoryRecord class usage. The code at line 59
attempts to access record.value, but the MemoryRecord class exposes a content
field instead. Similarly, at line 132, MemoryRecord is being instantiated with a
value parameter, but it requires the content parameter. Replace all occurrences
of record.value with record.content and change the MemoryRecord constructor call
from using value= to using content= to match the actual field name exposed by
the MemoryRecord class.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 92-97: The search_memory method (and the related method around
lines 121-135) accept **kwargs but do not apply the metadata_filter parameter,
which violates the storage interface contract and can return unexpected records.
Additionally, the default values for limit (currently 3) and min_score
(currently None) do not match the protocol defaults. Fix this by extracting
metadata_filter from kwargs or adding it as an explicit parameter and applying
it during the search operation, and update the default values for limit and
min_score to align with the storage protocol specification. Apply these same
changes consistently to both affected methods.

---

Outside diff comments:
In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 59-63: There is a field mismatch in the MemoryRecord class usage.
The code at line 59 attempts to access record.value, but the MemoryRecord class
exposes a content field instead. Similarly, at line 132, MemoryRecord is being
instantiated with a value parameter, but it requires the content parameter.
Replace all occurrences of record.value with record.content and change the
MemoryRecord constructor call from using value= to using content= to match the
actual field name exposed by the MemoryRecord class.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9d17a345-755b-4623-97bb-91b92621e6e1

📥 Commits

Reviewing files that changed from the base of the PR and between bba8015 and 539d2f9.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/memory/storage/factory.py
  • lib/crewai/src/crewai/memory/storage/mimir_storage.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/crewai/src/crewai/memory/storage/factory.py

Comment thread lib/crewai/src/crewai/memory/storage/mimir_storage.py Outdated
@Jacopos311

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants