# Agent Memory
Agents are stateless by default -- each invocation starts fresh. But some agents need to accumulate knowledge over time: a reviewer that remembers patterns from past reviews, a Fibonacci generator that tracks its position in the sequence, a support bot that learns user preferences. MemoryBank provides this persistent state.
MemoryBank is an in-memory persistent store backed by ConcurrentHashMap<String, String>. Each agent writes under its own name as the key. The API is minimal:
```kotlin
class MemoryBank(val maxLines: Int = Int.MAX_VALUE) {
    fun read(key: String): String           // returns "" if key does not exist
    fun write(key: String, content: String) // overwrites previous content
    fun entries(): Map<String, String>      // snapshot of all stored entries
}
```

Defined in agents_engine.core.Memory.kt.
Key properties:

- Thread-safe. `ConcurrentHashMap` handles concurrent reads and writes.
- String-valued. Memory content is always a string. Structure it however you like -- CSV, JSON, pipe-delimited, natural language.
- Overwrite semantics. `write` replaces the entire content for a key. There is no append operation -- the agent is responsible for reading, modifying, and writing back.
- No persistence beyond the JVM. When the process exits, memory is gone. For durable storage, write an adapter that saves to disk or a database.
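Since there is no append operation, appending amounts to read-modify-write. A minimal standalone sketch of that pattern (`appendTo` is a hypothetical helper, not framework API; it operates on the stored string itself):

```kotlin
// Sketch of read-modify-write append. MemoryBank has no append, so the
// caller reads the current content, extends it, and writes it back.
// appendTo is a hypothetical helper name.
fun appendTo(existing: String, line: String): String =
    if (existing.isEmpty()) line else existing + "\n" + line

// Usage against the documented API:
// bank.write("reviewer", appendTo(bank.read("reviewer"), "new pattern"))
```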
When you call memory(bank) in the agent DSL, the framework automatically creates three tools and registers them in the agent's tool map:
- `memory_read() -> String` -- Retrieves the stored memory for this agent. Returns the content stored under the agent's name, or an empty string if nothing has been written yet.
- `memory_write(content: String) -> "ok"` -- Overwrites the agent's memory with new content. The content argument replaces whatever was previously stored. The tool returns "ok" on success.
- `memory_search(query: String) -> String` -- Searches the agent's memory for lines matching a query. Returns all lines that contain the query string (case-insensitive), joined with newlines. Returns an empty string if nothing matches or memory is empty.
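The search behavior described above amounts to a case-insensitive line filter. A standalone sketch of that logic (`searchLines` is a hypothetical name; the framework's actual implementation may differ):

```kotlin
// Sketch of memory_search's documented matching: keep lines that contain
// the query (case-insensitive), joined with newlines. An empty memory or
// no matches yields "".
fun searchLines(memory: String, query: String): String =
    memory.lines()
        .filter { it.contains(query, ignoreCase = true) }
        .joinToString("\n")
```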
These tools are available to the LLM during the agentic loop. The LLM decides when to read, write, or search memory -- the framework does not force a specific pattern.
Add memory to an agent with the memory(bank) DSL call:
```kotlin
val bank = MemoryBank()
val reviewer = agent<String, String>("reviewer") {
    prompt("You are a code reviewer. Use memory to remember patterns you have seen.")
    memory(bank)
    model { ollama("qwen2.5:7b") }
    skills {
        skill<String, String>("review", "Review code") {
            tools() // marks as agentic -- LLM can call memory_read, memory_write, memory_search
        }
    }
}
```

The memory(bank) call does two things:
- Stores a reference to the bank on the agent (`agent.memoryBank`).
- Registers `memory_read`, `memory_write`, and `memory_search` as tools in the agent's tool map.
The tools are keyed to the agent's name. When the LLM calls memory_write, the content is stored under "reviewer" in the bank. When it calls memory_read, it retrieves the content stored under "reviewer".
If you define a tool with the same name before calling memory(bank), the auto-injected tool does not overwrite it:
```kotlin
val agent = agent<String, String>("a") {
    tools { tool("memory_read") { _ -> "custom implementation" } }
    memory(bank) // memory_read is NOT overwritten
}
```

Pass the same MemoryBank to multiple agents. Each agent reads and writes under its own name, so data is isolated by default:
```kotlin
val bank = MemoryBank()
val agentA = agent<String, String>("agent-a") {
    memory(bank)
    // ...
}
val agentB = agent<String, String>("agent-b") {
    memory(bank)
    // ...
}
// agent-a writes "from-a" under key "agent-a"
// agent-b writes "from-b" under key "agent-b"
// agent-a reads "" when reading "agent-b" (different key)
```

For agents that need to read each other's data, use `bank.read("other-agent-name")` in a custom tool or in pre-seeding logic. The auto-injected tools only access the current agent's key.
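One way to sketch such a cross-agent read, assuming the `tools { tool(...) }` DSL shown earlier ("read_peer" is a hypothetical tool name, not part of the framework):

```kotlin
// Sketch: give agent-b a custom tool that reads agent-a's memory directly.
// bank.read bypasses the per-agent keying of the auto-injected tools.
val bank = MemoryBank()
val agentB = agent<String, String>("agent-b") {
    memory(bank)
    tools {
        tool("read_peer") { _ -> bank.read("agent-a") } // hypothetical tool
    }
    // ...
}
```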
You can also inspect the entire bank:
```kotlin
bank.entries() // {"agent-a": "from-a", "agent-b": "from-b"}
```

Write initial content to the bank before the first agent run:
```kotlin
val bank = MemoryBank()
bank.write("reviewer", "Known pattern: prefer val over var\nKnown pattern: use data classes for DTOs")
val reviewer = agent<String, String>("reviewer") {
    memory(bank)
    // ...
}
// When the LLM calls memory_read, it gets the pre-seeded content immediately.
```

This is useful for:
- Bootstrapping an agent with domain knowledge.
- Resuming from a previously saved state.
- Injecting test fixtures.
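Resuming from saved state presupposes durable storage, which MemoryBank itself does not provide. A minimal sketch of a file-backed adapter over `entries()` and `write()` (the helper names `saveEntries`/`loadEntries` and the tab-separated format are assumptions, not framework API):

```kotlin
import java.io.File

// Sketch of a durable-storage adapter. Each entry is serialized as one
// "key<TAB>value" line; newlines inside values are escaped so an entry
// never spans multiple lines.
fun saveEntries(entries: Map<String, String>, file: File) {
    file.writeText(entries.entries.joinToString("\n") { (k, v) ->
        "$k\t${v.replace("\n", "\\n")}"
    })
}

fun loadEntries(file: File): Map<String, String> =
    file.readLines().filter { it.isNotBlank() }.associate { line ->
        val (k, v) = line.split("\t", limit = 2)
        k to v.replace("\\n", "\n")
    }

// Usage against the documented API:
// saveEntries(bank.entries(), File("memory.tsv"))
// loadEntries(File("memory.tsv")).forEach { (k, v) -> bank.write(k, v) }
```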
The test in FibonacciMemoryTest.kt demonstrates the full memory pattern. An agent maintains a Fibonacci sequence using only memory tools -- no external state.
```kotlin
val bank = MemoryBank()
val fib = agent<String, Int>("fibonacci") {
    prompt("""You maintain a Fibonacci sequence in memory.
Memory format: "prev|curr" (example: "5|8" means prev=5 curr=8).
Empty memory means no numbers generated yet.
PROCEDURE -- do this EVERY time, no exceptions:
1. Call memory_read
2. Look at the result:
   - If empty -> new prev=0, new curr=1, answer=1
   - If "A|B" -> compute next=A+B, new prev=B, new curr=next, answer=next
3. Call memory_write with content "new_prev|new_curr"
4. Reply with ONLY the answer number
Worked examples:
memory="" -> answer=1, write "0|1"
memory="0|1" -> 0+1=1, answer=1, write "1|1"
memory="1|1" -> 1+1=2, answer=2, write "1|2"
memory="1|2" -> 1+2=3, answer=3, write "2|3"
memory="2|3" -> 2+3=5, answer=5, write "3|5"
Rules: exactly one memory_read, exactly one memory_write, then reply with just the number.""")
    memory(bank)
    model { ollama("gpt-oss:120b-cloud"); temperature = 0.0 }
    budget { maxTurns = 5 }
    skills {
        skill<String, Int>("fib", "Generate next Fibonacci number") {
            tools()
            transformOutput { it.trim().toIntOrNull() ?: error("No int in: $it") }
        }
    }
}
```

Each invocation follows the same pattern: read memory, compute, write memory, reply.
```
Call 1: memory=""    -> answer=1, write "0|1"
Call 2: memory="0|1" -> 0+1=1, answer=1, write "1|1"
Call 3: memory="1|1" -> 1+1=2, answer=2, write "1|2"
Call 4: memory="1|2" -> 1+2=3, answer=3, write "2|3"
Call 5: memory="2|3" -> 2+3=5, answer=5, write "3|5"
```
```kotlin
assertEquals(1, fib("do it")) // first call
assertEquals(1, fib("do it")) // second call
assertEquals(2, fib("do it")) // third call
assertEquals(3, fib("do it")) // fourth call
assertEquals(5, fib("do it")) // fifth call
```

You can inspect the bank directly between calls:
```kotlin
fib("do it"); assertEquals("0|1", bank.read("fibonacci"))
fib("do it"); assertEquals("1|1", bank.read("fibonacci"))
fib("do it"); assertEquals("1|2", bank.read("fibonacci"))
fib("do it"); assertEquals("2|3", bank.read("fibonacci"))
```

Pre-seeding the bank lets the agent resume mid-sequence:

```kotlin
val bank = MemoryBank()
bank.write("fibonacci", "21|34")
val fib = fibAgent(bank)
assertEquals(55, fib("do it"))  // 21+34
assertEquals(89, fib("do it"))  // 34+55
assertEquals(144, fib("do it")) // 55+89
```

This pattern -- system prompt teaches the algorithm, memory maintains state -- generalizes to any agent that needs to accumulate knowledge across invocations.
Memory is the right choice when an agent improves with experience or needs to maintain state across calls. It is the wrong choice for stateless transformations.
Good fits for memory:
- An agent that learns patterns from past inputs (code reviewer, support bot).
- An agent that maintains running state (Fibonacci, counters, conversation context).
- An agent that needs to remember user preferences or corrections.
- Agents in a pipeline where early stages accumulate context for later stages.
Not needed:
- Pure transformation agents (`implementedBy` skills).
- Agents that receive all necessary context in their input.
- One-shot agents that are invoked exactly once.
A practical test: if you would lose important information by restarting the agent, it needs memory. If every invocation is self-contained, it does not.
The maxLines constructor parameter truncates memory content, keeping only the last N lines:
```kotlin
val bank = MemoryBank(maxLines = 3)
bank.write("a", "line1\nline2\nline3\nline4\nline5")
bank.read("a") // "line3\nline4\nline5"
```

Truncation happens at write time. The oldest lines are dropped, keeping the most recent ones. This is useful for:
- Preventing unbounded memory growth in long-running agents.
- Implementing a sliding window of recent observations.
- Keeping memory focused on the most relevant recent information.
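The write-time truncation described above amounts to keeping the last N lines of the incoming content. A standalone sketch of that logic (`truncate` is a hypothetical name; the framework's actual implementation may differ):

```kotlin
// Sketch of write-time truncation: split into lines, keep the last
// maxLines of them, and rejoin. Content within the limit is unchanged.
fun truncate(content: String, maxLines: Int): String =
    content.lines().takeLast(maxLines).joinToString("\n")
```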
If maxLines is not specified, it defaults to Int.MAX_VALUE -- effectively unlimited:
```kotlin
val unlimited = MemoryBank()            // no truncation
val capped = MemoryBank(maxLines = 100) // keeps last 100 lines
```

Truncation applies through the memory_write tool as well. When the LLM writes content that exceeds the limit, only the last N lines are stored:

```kotlin
val bank = MemoryBank(maxLines = 2)
val agent = agent<String, String>("a") { memory(bank); /* ... */ }
// LLM calls memory_write with "a\nb\nc\nd"
// Bank stores "c\nd"
```

See also:

- @Generable & @Guide -- Typed LLM output with annotations and lenient parsing.
- Sealed Types & Branching -- Multi-shape outputs that drive conditional routing.
- Model & Tool Calling -- The agentic loop that memory tools participate in.
- Composition: Pipeline -- Chain agents into sequential workflows.