fix(task): centralize token tracking and preserve usage during guardrail retries#6251
fix(task): centralize token tracking and preserve usage during guardrail retries#6251Jacopos311 wants to merge 17 commits into
Conversation
… protocol and addressing CodeRabbit feedback
|
Important Review skippedNo new commits to review since the last review. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds ChangesMimir Memory Backend
Task Token Usage Tracking
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Summary: This PR adds a Mimir-backed memory storage integration and adjusts task token tracking behavior around execution and guardrail retries; no exploitable security vulnerabilities were identified.
Risk: Low risk. The new external subprocess integration is not directly public-facing, uses argument-list subprocess execution without shell invocation, and validates CLI-sensitive inputs such as categories and hyphen-prefixed queries.
Recommendations:
- Ensure deployments only enable the Mimir backend with a trusted installed
mimirbinary and trusteddb_pathconfiguration.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/task.py (1)
691-704:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftFinalize token usage after guardrail retries finish.
token_deltais attached before guardrails run, then retryTaskOutputs copy the priorcurrent_token_usageafter executing the agent again. Tokens consumed by LLM-backed guardrails, retryexecute_task/aexecute_task, and retry_export_output/_aexport_outputare still omitted.Suggested direction
- tokens_after = self._get_agent_token_usage(agent) - token_delta = self._calculate_token_delta(tokens_before, tokens_after) - task_output = TaskOutput( name=self.name or self.description, description=self.description, expected_output=self.expected_output, raw=raw, @@ output_format=self._get_output_format(), messages=agent.last_messages, # type: ignore[attr-defined] - token_usage=token_delta, + token_usage=None, ) if self._guardrails: for idx, guardrail in enumerate(self._guardrails): @@ if self._guardrail: task_output = self._invoke_guardrail_function( task_output=task_output, agent=agent, tools=tools, guardrail=self._guardrail, ) + + tokens_after = self._get_agent_token_usage(agent) + task_output.token_usage = self._calculate_token_delta( + tokens_before, tokens_after + )Apply the same placement in
_aexecute_core, or alternatively accumulate each retry delta inside the guardrail helpers before constructing the retryTaskOutput.Also applies to: 820-833, 1376-1376, 1489-1489
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/task.py` around lines 691 - 704, The token_delta calculation is being performed and attached to TaskOutput before guardrails execute their retries, which causes tokens consumed by guardrail execution, retry task execution, and retry output export operations to be excluded from the final token_usage. Move the token_delta calculation to after all guardrail retry operations complete by either applying the same placement in the _aexecute_core method (the async counterpart), or accumulate each retry delta within the guardrail helper functions before constructing the retry TaskOutput objects. This same fix should be applied at the additional locations mentioned (820-833, 1376-1376, 1489-1489) to ensure consistent token usage tracking across all code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/memory/storage/factory.py`:
- Around line 47-63: The `config` parameter in the `resolve_memory_storage`
function is unused because no callers pass it and it cannot be forwarded to
custom factories (which only accept `spec` per the `MemoryStorageFactory`
signature at line 27). Remove the `config` parameter from the
`resolve_memory_storage` function signature and its use in the MimirStorage
instantiation call (line 63), or alternatively, extend the
`MemoryStorageFactory` protocol/type signature to accept an optional `config`
parameter, update the custom factory invocation (line 56) to pass `config`, and
ensure all places that set memory storage factories are aware of the new
signature.
In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 89-93: The search method in MimirStorage has a signature mismatch
with the storage contract expected by shallow memory retrieval. The method
currently only accepts query as a string along with limit and scope_prefix, but
the shallow retrieval path invokes it with an embedding vector, categories
parameter, and min_score threshold. Update the search method signature to accept
all required parameters including categories and min_score, and add logic to
handle the incoming query appropriately whether it is a string or an embedding
vector. This will ensure the method aligns with the actual contract consumed by
the memory runtime and prevents runtime failures due to argument mismatches.
In `@lib/crewai/src/crewai/task.py`:
- Around line 1313-1314: The condition `and current_token_usage` performs a
truthiness check, but valid token usage objects can be falsy (zero or empty),
which causes the code to incorrectly skip backfilling token usage. Replace the
truthiness check with an explicit None check by changing `and
current_token_usage` to `and current_token_usage is not None` in both locations:
the code block shown around lines 1313-1314, and the identical pattern in the
`_ainvoke_guardrail_function` method around lines 1426-1427.
- Around line 1167-1174: The _calculate_token_delta method in the task.py file
attempts to subtract two values directly using the minus operator, but when
these values are dictionaries (as token_usage from agent.llm is dict[str, int]),
the subtraction fails silently in the exception handler and returns the
cumulative after value unchanged, causing duplicate token spikes. Fix this by
first adding explicit type checks to detect when both before and after are
dictionaries, and if they are, iterate through the keys in the after dictionary
to explicitly subtract the numeric values (int or float types) from their
corresponding before values for each key, building a delta dictionary with the
differences. Keep the fallback try-except logic for non-dict numeric types to
maintain backward compatibility.
---
Outside diff comments:
In `@lib/crewai/src/crewai/task.py`:
- Around line 691-704: The token_delta calculation is being performed and
attached to TaskOutput before guardrails execute their retries, which causes
tokens consumed by guardrail execution, retry task execution, and retry output
export operations to be excluded from the final token_usage. Move the
token_delta calculation to after all guardrail retry operations complete by
either applying the same placement in the _aexecute_core method (the async
counterpart), or accumulate each retry delta within the guardrail helper
functions before constructing the retry TaskOutput objects. This same fix should
be applied at the additional locations mentioned (820-833, 1376-1376, 1489-1489)
to ensure consistent token usage tracking across all code paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 905ee92f-52d4-49d2-b8c7-6d0bb921220c
📒 Files selected for processing (4)
lib/crewai/README.mdlib/crewai/src/crewai/memory/storage/factory.pylib/crewai/src/crewai/memory/storage/mimir_storage.pylib/crewai/src/crewai/task.py
…t guardrail none checks
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/memory/storage/mimir_storage.py (1)
59-63:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix
MemoryRecordfield mismatch (valuevscontent) to avoid runtime failures.Line 59 reads
record.value, butMemoryRecordexposescontent. Line 132 constructsMemoryRecord(value=...), which misses requiredcontent. This will break both save and search paths at runtime.Proposed patch
- value_str = str(record.value) + value_str = str(record.content) @@ - content_text = res.get("value", res.get("text", "")) + content_text = res.get("value", res.get("text", "")) @@ - record = MemoryRecord(value=content_text, metadata=meta) + record = MemoryRecord(content=content_text, metadata=meta)Also applies to: 132-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py` around lines 59 - 63, There is a field mismatch in the MemoryRecord class usage. The code at line 59 attempts to access record.value, but the MemoryRecord class exposes a content field instead. Similarly, at line 132, MemoryRecord is being instantiated with a value parameter, but it requires the content parameter. Replace all occurrences of record.value with record.content and change the MemoryRecord constructor call from using value= to using content= to match the actual field name exposed by the MemoryRecord class.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 92-97: The search_memory method (and the related method around
lines 121-135) accept **kwargs but do not apply the metadata_filter parameter,
which violates the storage interface contract and can return unexpected records.
Additionally, the default values for limit (currently 3) and min_score
(currently None) do not match the protocol defaults. Fix this by extracting
metadata_filter from kwargs or adding it as an explicit parameter and applying
it during the search operation, and update the default values for limit and
min_score to align with the storage protocol specification. Apply these same
changes consistently to both affected methods.
---
Outside diff comments:
In `@lib/crewai/src/crewai/memory/storage/mimir_storage.py`:
- Around line 59-63: There is a field mismatch in the MemoryRecord class usage.
The code at line 59 attempts to access record.value, but the MemoryRecord class
exposes a content field instead. Similarly, at line 132, MemoryRecord is being
instantiated with a value parameter, but it requires the content parameter.
Replace all occurrences of record.value with record.content and change the
MemoryRecord constructor call from using value= to using content= to match the
actual field name exposed by the MemoryRecord class.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9d17a345-755b-4623-97bb-91b92621e6e1
📒 Files selected for processing (2)
lib/crewai/src/crewai/memory/storage/factory.pylib/crewai/src/crewai/memory/storage/mimir_storage.py
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/crewai/src/crewai/memory/storage/factory.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
This Pull Request addresses critical bugs and structural flaws within the token tracking lifecycle of Task executions, specifically optimizing metrics continuity across core execution boundaries and guardrail loops.
The implementation was highly challenging, refactoring the state management across both synchronous (_execute_core) and asynchronous (_aexecute_core) runtime flows. Ensuring state persistence during multi-layered asynchronous retry loops proved to be an incredibly complex task.
Key Fixes Implemented
Bug 1: Guardrail Token Loss Recovery
Issue: Token metrics were dropped or wiped clean whenever a guardrail failed validation and triggered a sub-task retry.
Fix: Introduced an internal token snapshot (current_token_usage) at the top of the retry loop within _ainvoke_guardrail_function. If a guardrail yields a new TaskOutput, the preceding metrics are explicitly preserved and restored.
Bug 2: Duplicate Core Tracking Redundancy
Issue: Decentralized and fragmented token calculations were being redundantly invoked inside background async workers and parallel threads, causing unstable delta spikes.
Fix: Centralized token delta computations exclusively inside the core pipeline. Wrapper functions now stream results without interfering with state counters.
Bug 3: Premature Usage Snapshots
Issue: The final snapshot of the LLM token usage was captured prior to exporting outputs into structured Pydantic models or JSON dictionaries. Any extra token overhead consumed by formatting models was entirely lost.
Fix: Shifted the tokens_after capture and _calculate_token_delta invocation structurally below the _export_output / _aexport_output blocks, ensuring 100% telemetry coverage.
Testing
Verified alignment across synchronous and asynchronous interfaces.
Ensured zero NameError bounds or uninitialized variable leaks during back-to-back guardrail re-evaluations.
Summary by CodeRabbit