
Agent Memory

skobeltsyn edited this page Mar 28, 2026 · 1 revision

Agents are stateless by default -- each invocation starts fresh. But some agents need to accumulate knowledge over time: a reviewer that remembers patterns from past reviews, a Fibonacci generator that tracks its position in the sequence, a support bot that learns user preferences. MemoryBank provides this persistent state.


What is MemoryBank?

MemoryBank is a key-value store backed by ConcurrentHashMap<String, String>. It persists state across agent invocations within a single JVM; each agent reads and writes under its own name as the key. The API is minimal:

class MemoryBank(val maxLines: Int = Int.MAX_VALUE) {
    fun read(key: String): String        // returns "" if key does not exist
    fun write(key: String, content: String)  // overwrites previous content
    fun entries(): Map<String, String>    // snapshot of all stored entries
}

Defined in agents_engine.core.Memory.kt.

Key properties:

  • Thread-safe. ConcurrentHashMap handles concurrent reads and writes.
  • String-valued. Memory content is always a string. Structure it however you like -- CSV, JSON, pipe-delimited, natural language.
  • Overwrite semantics. write replaces the entire content for a key. There is no append operation -- the agent is responsible for reading, modifying, and writing back.
  • No persistence beyond JVM. When the process exits, memory is gone. For durable storage, write an adapter that saves to disk or a database.
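Because write replaces the whole entry, an "append" has to be built from read plus write. A minimal sketch, assuming only the documented API (the class below is an illustrative stand-in, not the framework source in agents_engine.core.Memory.kt):

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Illustrative stand-in mirroring the documented API; the real class may differ.
class MemoryBank(val maxLines: Int = Int.MAX_VALUE) {
    private val store = ConcurrentHashMap<String, String>()

    fun read(key: String): String = store[key] ?: ""

    // Overwrite semantics, with the documented write-time line cap applied.
    fun write(key: String, content: String) {
        store[key] = content.lines().takeLast(maxLines).joinToString("\n")
    }

    fun entries(): Map<String, String> = store.toMap()
}

// There is no append operation, so appending is read-modify-write.
// appendLine is a hypothetical helper, not part of the framework.
fun MemoryBank.appendLine(key: String, line: String) {
    val old = read(key)
    write(key, if (old.isEmpty()) line else old + "\n" + line)
}
```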

Auto-Injected Tools

When you call memory(bank) in the agent DSL, the framework automatically creates three tools and registers them in the agent's tool map:

memory_read

Retrieves the stored memory for this agent.

memory_read() -> String

Returns the content stored under the agent's name, or an empty string if nothing has been written yet.

memory_write

Overwrites the agent's memory with new content.

memory_write(content: String) -> "ok"

The content argument replaces whatever was previously stored. The tool returns "ok" on success.

memory_search

Searches the agent's memory for lines matching a query.

memory_search(query: String) -> String

Returns all lines from the agent's memory that contain the query string (case-insensitive). Lines are joined with newlines. Returns an empty string if nothing matches or memory is empty.

These tools are available to the LLM during the agentic loop. The LLM decides when to read, write, or search memory -- the framework does not force a specific pattern.
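The search behavior described above can be pinned down with a small sketch of the matching rule (illustrative; the framework's actual implementation may differ):

```kotlin
// Sketch of the documented memory_search behavior: case-insensitive substring
// match per line, matches joined with newlines, "" when nothing matches.
fun searchMemory(content: String, query: String): String =
    content.lines()
        .filter { it.contains(query, ignoreCase = true) }
        .joinToString("\n")
```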


Basic Usage

Add memory to an agent with the memory(bank) DSL call:

val bank = MemoryBank()

val reviewer = agent<String, String>("reviewer") {
    prompt("You are a code reviewer. Use memory to remember patterns you have seen.")
    memory(bank)
    model { ollama("qwen2.5:7b") }
    skills {
        skill<String, String>("review", "Review code") {
            tools()  // marks as agentic -- LLM can call memory_read, memory_write, memory_search
        }
    }
}

The memory(bank) call does two things:

  1. Stores a reference to the bank on the agent (agent.memoryBank).
  2. Registers memory_read, memory_write, and memory_search as tools in the agent's tool map.

The tools are keyed to the agent's name. When the LLM calls memory_write, the content is stored under "reviewer" in the bank. When it calls memory_read, it retrieves the content stored under "reviewer".
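Conceptually, the keying works because each injected tool is a closure over the agent's name. A sketch of that idea (not the framework's actual source; memoryTools and store are hypothetical names):

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Conceptual sketch: every tool closes over agentName, so all reads and
// writes land on that agent's entry in the shared store.
val store = ConcurrentHashMap<String, String>()

fun memoryTools(agentName: String): Map<String, (String) -> String> = mapOf(
    "memory_read" to { _: String -> store[agentName] ?: "" },
    "memory_write" to { content: String -> store[agentName] = content; "ok" },
    "memory_search" to { query: String ->
        (store[agentName] ?: "").lines()
            .filter { it.contains(query, ignoreCase = true) }
            .joinToString("\n")
    },
)
```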

User-defined tools take precedence

If you define a tool with the same name before calling memory(bank), the auto-injected tool does not overwrite it:

val agent = agent<String, String>("a") {
    tools { tool("memory_read") { _ -> "custom implementation" } }
    memory(bank)  // memory_read is NOT overwritten
}
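This precedence behaves like putIfAbsent on the tool map: auto-injection only fills names the user has not already taken. A sketch of that semantics (illustrative, not the framework's source):

```kotlin
// Precedence sketch: the user-defined tool is registered first, then the
// memory(bank) call conceptually fills only the still-free names.
val toolMap = mutableMapOf<String, (String) -> String>()

toolMap["memory_read"] = { _ -> "custom implementation" }      // user tool first

// What auto-injection conceptually does next:
toolMap.putIfAbsent("memory_read", { _ -> "auto-injected read" })
toolMap.putIfAbsent("memory_write", { _ -> "ok" })
```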

Shared Memory

Pass the same MemoryBank to multiple agents. Each agent reads and writes under its own name, so data is isolated by default:

val bank = MemoryBank()

val agentA = agent<String, String>("agent-a") {
    memory(bank)
    // ...
}

val agentB = agent<String, String>("agent-b") {
    memory(bank)
    // ...
}

// agent-a's memory_write stores "from-a" under key "agent-a"
// agent-b's memory_write stores "from-b" under key "agent-b"
// agent-a's memory_read sees only the "agent-a" entry, never agent-b's data

For agents that need to read each other's data, use bank.read("other-agent-name") in a custom tool or in pre-seeding logic. The auto-injected tools only access the current agent's key.

You can also inspect the entire bank:

bank.entries()  // {"agent-a": "from-a", "agent-b": "from-b"}

Pre-Seeding

Write initial content to the bank before the first agent run:

val bank = MemoryBank()
bank.write("reviewer", "Known pattern: prefer val over var\nKnown pattern: use data classes for DTOs")

val reviewer = agent<String, String>("reviewer") {
    memory(bank)
    // ...
}

// When the LLM calls memory_read, it gets the pre-seeded content immediately.

This is useful for:

  • Bootstrapping an agent with domain knowledge.
  • Resuming from a previously saved state.
  • Injecting test fixtures.

The Fibonacci Example

The test in FibonacciMemoryTest.kt demonstrates the full memory pattern. An agent maintains a Fibonacci sequence using only memory tools -- no external state.

The agent

val bank = MemoryBank()

val fib = agent<String, Int>("fibonacci") {
    prompt("""You maintain a Fibonacci sequence in memory.

Memory format: "prev|curr" (example: "5|8" means prev=5 curr=8).
Empty memory means no numbers generated yet.

PROCEDURE -- do this EVERY time, no exceptions:
1. Call memory_read
2. Look at the result:
   - If empty -> new prev=0, new curr=1, answer=1
   - If "A|B" -> compute next=A+B, new prev=B, new curr=next, answer=next
3. Call memory_write with content "new_prev|new_curr"
4. Reply with ONLY the answer number

Worked examples:
  memory="" -> answer=1, write "0|1"
  memory="0|1" -> 0+1=1, answer=1, write "1|1"
  memory="1|1" -> 1+1=2, answer=2, write "1|2"
  memory="1|2" -> 1+2=3, answer=3, write "2|3"
  memory="2|3" -> 2+3=5, answer=5, write "3|5"

Rules: exactly one memory_read, exactly one memory_write, then reply with just the number.""")
    memory(bank)
    model { ollama("gpt-oss:120b-cloud"); temperature = 0.0 }
    budget { maxTurns = 5 }
    skills {
        skill<String, Int>("fib", "Generate next Fibonacci number") {
            tools()
            transformOutput { it.trim().toIntOrNull() ?: error("No int in: $it") }
        }
    }
}

The sequence

Each invocation follows the same pattern: read memory, compute, write memory, reply.

Call 1: memory="" -> answer=1, write "0|1"
Call 2: memory="0|1" -> 0+1=1, answer=1, write "1|1"
Call 3: memory="1|1" -> 1+1=2, answer=2, write "1|2"
Call 4: memory="1|2" -> 1+2=3, answer=3, write "2|3"
Call 5: memory="2|3" -> 2+3=5, answer=5, write "3|5"

assertEquals(1, fib("do it"))   // first call
assertEquals(1, fib("do it"))   // second call
assertEquals(2, fib("do it"))   // third call
assertEquals(3, fib("do it"))   // fourth call
assertEquals(5, fib("do it"))   // fifth call

Verifying memory state

You can inspect the bank directly between calls:

fib("do it"); assertEquals("0|1", bank.read("fibonacci"))
fib("do it"); assertEquals("1|1", bank.read("fibonacci"))
fib("do it"); assertEquals("1|2", bank.read("fibonacci"))
fib("do it"); assertEquals("2|3", bank.read("fibonacci"))
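The read-compute-write transition the prompt teaches is deterministic, so it can also be expressed as a pure function and checked without an LLM. A sketch (fibStep is a hypothetical helper, not framework API):

```kotlin
// The transition from the prompt: given the stored "prev|curr" string
// (or "" for empty memory), return the answer and the new memory content.
fun fibStep(memory: String): Pair<Int, String> =
    if (memory.isEmpty()) {
        1 to "0|1"
    } else {
        val (prev, curr) = memory.split("|").map { it.toInt() }
        val next = prev + curr
        next to "$curr|$next"
    }
```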

Pre-seeding resumes from an arbitrary point

val bank = MemoryBank()
bank.write("fibonacci", "21|34")
val fib = fibAgent(bank)

assertEquals(55,  fib("do it"))   // 21+34
assertEquals(89,  fib("do it"))   // 34+55
assertEquals(144, fib("do it"))   // 55+89

This pattern -- system prompt teaches the algorithm, memory maintains state -- generalizes to any agent that needs to accumulate knowledge across invocations.


When to Use Memory

Memory is the right choice when an agent improves with experience or needs to maintain state across calls. It is the wrong choice for stateless transformations.

Good fits for memory:

  • An agent that learns patterns from past inputs (code reviewer, support bot).
  • An agent that maintains running state (Fibonacci, counters, conversation context).
  • An agent that needs to remember user preferences or corrections.
  • Agents in a pipeline where early stages accumulate context for later stages.

Not needed:

  • Pure transformation agents (implementedBy skills).
  • Agents that receive all necessary context in their input.
  • One-shot agents that are invoked exactly once.

A practical test: if you would lose important information by restarting the agent, it needs memory. If every invocation is self-contained, it does not.


Line Cap

The maxLines constructor parameter truncates memory content, keeping only the last N lines:

val bank = MemoryBank(maxLines = 3)
bank.write("a", "line1\nline2\nline3\nline4\nline5")
bank.read("a")  // "line3\nline4\nline5"

Truncation happens at write time. The oldest lines are dropped, keeping the most recent ones. This is useful for:

  • Preventing unbounded memory growth in long-running agents.
  • Implementing a sliding window of recent observations.
  • Keeping memory focused on the most relevant recent information.

If maxLines is not specified, it defaults to Int.MAX_VALUE -- effectively unlimited:

val unlimited = MemoryBank()          // no truncation
val capped    = MemoryBank(maxLines = 100)  // keeps last 100 lines

Truncation applies through the memory_write tool as well. When the LLM writes content that exceeds the limit, only the last N lines are stored:

val bank = MemoryBank(maxLines = 2)
val agent = agent<String, String>("a") { memory(bank); /* ... */ }

// LLM calls memory_write with "a\nb\nc\nd"
// Bank stores "c\nd"
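The cap described above reduces to keeping the tail of the line list. A one-line sketch of the documented semantics (illustrative, not the framework's source):

```kotlin
// Write-time truncation: keep only the last maxLines lines of the content.
fun truncate(content: String, maxLines: Int): String =
    content.lines().takeLast(maxLines).joinToString("\n")
```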
