You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Memory entries are injected verbatim into the system prompt (or turn context) at the start of every session via the inject_memories builtin (#3015). A compromised tool result, a malicious web page visited during a task, or a supply-chain attack on a sister session can write a poisoned entry that silently redirects the agent in every subsequent session until the entry is manually removed.
This risk is higher for memory than for ordinary tool output because:
The user rarely inspects raw memory content, so a poisoned entry can go unnoticed for a long time.
The fix is a two-layer defence: scan at write time (reject before persisting) and sanitise at load / inject time (block from system prompt even if a poisoned entry already exists on disk, while keeping it visible so the user can remove it).
Proposed design
1. Threat-pattern library (pkg/memory/security/)
Create pkg/memory/security/threats.go with a compiled set of regular expressions covering:
Prompt-injection patterns: ignore previous instructions, disregard your system prompt, new persona, you are now, your true instructions are, ANSI escape injection, zero-width character smuggling, Unicode direction-override sequences (RLO/LRO).
Exfiltration patterns: send to, POST to, curl, wget, exfiltrate, base64 in suspicious context, known exfil URL shapes.
Patterns are organised by scope (strict vs relaxed). Memory scanning always uses strict (broadest set) because entries are user-curated and the user can always rewrite a blocked entry.
// pkg/memory/security/threats.gopackage security
typeScopestringconst (
ScopeStrictScope="strict"ScopeRelaxedScope="relaxed"
)
// ScanContent returns a list of threat IDs matched in content, or nil.funcScanContent(contentstring, scopeScope) []string { … }
// FirstThreatMessage returns a human-readable error for the first match, or "".funcFirstThreatMessage(contentstring, scopeScope) string { … }
2. Write-time scanning
In each memory-write path (pkg/tools/builtin/memory/add_memory.go, update_memory.go) call security.FirstThreatMessage(content, ScopeStrict) before touching the DB. On a non-empty return, reject the write with a structured error:
When inject_memories (#3015 / #3017) builds the system-prompt snapshot:
For each entry in the DB, call security.ScanContent(entry.Content, ScopeStrict).
If threat IDs are returned, replace the entry text in the snapshot with:
[BLOCKED: entry contained threat pattern(s): <ids>. Use delete_memory(id=…) to remove it.]
The original entry remains in the DB so the user can inspect it with get_memories and delete it with delete_memory.
This preserves the prefix-cache invariant (#3017): the snapshot is built once from deterministic DB bytes and is byte-stable for the whole session.
4. get_memories response flags
Extend the get_memories response in pkg/tools/builtin/memory/get_memories.go to include a blocked: true field on entries that would be blocked at inject time:
pkg/memory/security/threats.go — threat-pattern library with strict and relaxed scopes; compiled regexps; ScanContent, FirstThreatMessage
pkg/memory/security/threats_test.go — unit tests for each pattern class (injection, exfil, unicode smuggling); assert clean content passes; assert poisoned content is caught
pkg/tools/builtin/memory/add_memory.go — call FirstThreatMessage before insert; return structured error on match
pkg/tools/builtin/memory/update_memory.go — same scan on new content
pkg/hooks/builtins/inject_memories.go — scan each entry during snapshot build; replace blocked entries with [BLOCKED: …] placeholder in snapshot; leave DB row untouched
pkg/tools/builtin/memory/get_memories.go — add blocked and block_reason fields to response for entries that fail the scan
Integration test: write a poisoned entry via direct DB insert (bypassing write-scan); confirm it appears as [BLOCKED: …] in the injected snapshot and as blocked: true in get_memories; confirm delete_memory removes it
Acceptance criteria
add_memory with injection-pattern content returns a structured error and nothing is written to the DB
update_memory with injection-pattern content in the new value returns a structured error; existing entry unchanged
A clean, legitimate entry is never blocked
A pre-existing poisoned DB entry (e.g. written by an external process) does not appear verbatim in the system-prompt snapshot; a [BLOCKED: …] placeholder appears instead
get_memories returns blocked: true for any entry that would be blocked at inject time
Blocked entries are still deletable via delete_memory
Background
Sub-issue of #3011.
Memory entries are injected verbatim into the system prompt (or turn context) at the start of every session via the
inject_memoriesbuiltin (#3015). A compromised tool result, a malicious web page visited during a task, or a supply-chain attack on a sister session can write a poisoned entry that silently redirects the agent in every subsequent session until the entry is manually removed.This risk is higher for memory than for ordinary tool output because:
The fix is a two-layer defence: scan at write time (reject before persisting) and sanitise at load / inject time (block from system prompt even if a poisoned entry already exists on disk, while keeping it visible so the user can remove it).
Proposed design
1. Threat-pattern library (
pkg/memory/security/)Create
pkg/memory/security/threats.gowith a compiled set of regular expressions covering:ignore previous instructions,disregard your system prompt,new persona,you are now,your true instructions are, ANSI escape injection, zero-width character smuggling, Unicode direction-override sequences (RLO/LRO).send to,POST to,curl,wget,exfiltrate,base64in suspicious context, known exfil URL shapes.Patterns are organised by scope (
strictvsrelaxed). Memory scanning always usesstrict(broadest set) because entries are user-curated and the user can always rewrite a blocked entry.2. Write-time scanning
In each memory-write path (
pkg/tools/builtin/memory/add_memory.go,update_memory.go) callsecurity.FirstThreatMessage(content, ScopeStrict)before touching the DB. On a non-empty return, reject the write with a structured error:{ "success": false, "error": "Content blocked: matched prompt-injection pattern 'ignore_previous_instructions'. Rephrase the entry." }The entry is never written to the DB.
3. Load-time sanitisation (snapshot building)
When
inject_memories(#3015 / #3017) builds the system-prompt snapshot:security.ScanContent(entry.Content, ScopeStrict).get_memoriesand delete it withdelete_memory.This preserves the prefix-cache invariant (#3017): the snapshot is built once from deterministic DB bytes and is byte-stable for the whole session.
4.
get_memoriesresponse flagsExtend the
get_memoriesresponse inpkg/tools/builtin/memory/get_memories.goto include ablocked: truefield on entries that would be blocked at inject time:{ "id": "abc123", "content": "ignore all previous instructions …", "blocked": true, "block_reason": ["ignore_previous_instructions"] }Implementation checklist
pkg/memory/security/threats.go— threat-pattern library withstrictandrelaxedscopes; compiled regexps;ScanContent,FirstThreatMessagepkg/memory/security/threats_test.go— unit tests for each pattern class (injection, exfil, unicode smuggling); assert clean content passes; assert poisoned content is caughtpkg/tools/builtin/memory/add_memory.go— callFirstThreatMessagebefore insert; return structured error on matchpkg/tools/builtin/memory/update_memory.go— same scan on new contentpkg/hooks/builtins/inject_memories.go— scan each entry during snapshot build; replace blocked entries with[BLOCKED: …]placeholder in snapshot; leave DB row untouchedpkg/tools/builtin/memory/get_memories.go— addblockedandblock_reasonfields to response for entries that fail the scan[BLOCKED: …]in the injected snapshot and asblocked: trueinget_memories; confirmdelete_memoryremoves itAcceptance criteria
add_memorywith injection-pattern content returns a structured error and nothing is written to the DBupdate_memorywith injection-pattern content in the new value returns a structured error; existing entry unchanged[BLOCKED: …]placeholder appears insteadget_memoriesreturnsblocked: truefor any entry that would be blocked at inject timedelete_memorygo test -racepasses on the security packagepkg/memory/security/