LLM-supervised persistent memory for AI agents.
LLM agents forget everything between sessions. Context compaction drops critical decisions, cross-session knowledge vanishes, and long conversations push early information out of the window.
Mnemon gives your agent persistent, cross-session memory — a four-graph knowledge store with intent-aware recall, importance decay, and automatic deduplication. Single binary, zero API keys, one setup command.
Claude Max / Pro subscriber? Mnemon works entirely through your existing subscription — no separate API key required. Your LLM subscription is the intelligence layer. Two commands and you're done.
Most memory tools embed their own LLM inside the pipeline. Mnemon takes a different approach: your host LLM is the supervisor. The binary handles deterministic computation (storage, graph indexing, search, decay); the LLM makes judgment calls (what to remember, how to link, when to forget). No middleman, no extra inference cost.
| Pattern | LLM Role | Representative |
|---|---|---|
| LLM-Embedded | Executor inside the pipeline | Mem0, Letta |
| File Injection | None — reads file at session start | Claude Code Memory |
| MCP Server | Tool provider via MCP protocol | claude-mem |
| LLM-Supervised | External supervisor of a standalone binary | Mnemon |
Mnemon also addresses a gap in the protocol stack. MCP standardizes how LLMs discover and invoke tools. ODBC/JDBC standardizes how applications access databases. But there is no protocol for how LLMs interact with databases using memory semantics. Mnemon's three primitives — `remember`, `link`, `recall` — form an intent-native protocol: command names map to the LLM's cognitive vocabulary (`remember`, not `INSERT`; `recall`, not `SELECT`), and output is structured JSON with signal transparency rather than raw database rows.
The LLM-Supervised pattern: hooks drive the lifecycle, the host LLM makes judgment calls, the binary handles deterministic computation.
Memory has a compound interest effect — the longer it accumulates, the greater its value. LLM engines iterate constantly, skill files cost nearly nothing to write, but memory is a private asset that grows with the user. It is the only component in the agent ecosystem worth deep investment.
A real knowledge graph built by Mnemon — 87 insights, 2150 edges across four graph types.
See Design & Architecture for details.
Homebrew (macOS / Linux):
```sh
brew install mnemon-dev/tap/mnemon
```

Go install:

```sh
go install github.com/mnemon-dev/mnemon@latest
```

From source:

```sh
git clone https://github.com/mnemon-dev/mnemon.git && cd mnemon
make install
```

Verify installation:

```sh
mnemon --version
```

```sh
mnemon setup
```

`mnemon setup` auto-detects Claude Code, then interactively deploys the skill, hooks, and behavioral guide. Start a new session — memory just works.
```sh
mnemon setup --target openclaw --yes
```

One command deploys the skill, hook, plugin, and behavioral guide to `~/.openclaw/`. Restart the OpenClaw gateway to activate.
NanoClaw runs agents inside Linux containers. Use the /add-mnemon skill to integrate:
- Install mnemon on the host (see above)
- In your NanoClaw project, run `/add-mnemon` — Claude Code will modify the Dockerfile, add a container skill, and set up volume mounts
- Each WhatsApp group gets its own isolated memory store, with optional global shared memory (read-only)
The skill is available at .claude/skills/add-mnemon/ in the NanoClaw repo.
```sh
mnemon setup --eject
```

Once set up, memory operates transparently — you use your LLM CLI as usual. Mnemon integrates via Claude Code's hook system, injecting memory operations at key lifecycle points:
```
Session starts
      │
      ▼
Prime (SessionStart) ─── prime.sh ──→ load guide.md (memory execution manual)
      │
      ▼
User sends message
      │
      ▼
Remind (UserPromptSubmit) ─── user_prompt.sh ──→ remind agent to recall & remember
      │
      ▼
LLM generates response (guided by skill + guide.md rules)
      │
      ▼
Nudge (Stop) ─── stop.sh ──→ remind agent to remember
      │
      ▼
(when context compacts)
Compact (PreCompact) ─── compact.sh ──→ extract critical insights to remember
```
Four hooks drive the memory lifecycle. Prime loads the behavioral guide — a detailed execution manual for recall, remember, and sub-agent delegation. Remind prompts the agent to evaluate recall and remember before starting work. Nudge reminds the agent to consider remember after finishing work. Compact instructs the agent to extract and save critical insights before context compression. The skill file teaches command syntax. The guide (~/.mnemon/prompt/guide.md) defines the detailed rules for when to recall, what to remember, and how to delegate.
You don't run mnemon commands yourself. The agent does — driven by hooks and guided by the skill and behavioral guide.
- Zero user-side operation — install once, memory runs in the background via hooks
- LLM-supervised — the host LLM decides what to remember, update, and forget; no embedded LLM, no API keys
- Hook-based integration — four lifecycle hooks: Prime (load guide), Remind (recall & remember), Nudge (remember), and Compact (save before compression)
- Four-graph architecture — temporal, entity, causal, and semantic edges, not just vector similarity
- Intent-native protocol — three primitives (`remember`, `link`, `recall`) map to the LLM's cognitive vocabulary, not database syntax; structured JSON output with signal transparency
- Intent-aware recall — graph traversal + optional vector search (RRF fusion), enabled by default for all queries
- Built-in deduplication — `remember` auto-detects duplicates and conflicts; skips or auto-replaces
- Retention lifecycle — importance decay, access-count boosting, and garbage collection
- Optional embeddings — works fully without Ollama; add local Ollama for enhanced vector+keyword hybrid search
All your local agentic AIs — across sessions and frameworks — sharing one pool of live memory.
```
Claude Code ──┐
              │
OpenClaw ─────┤
              │
NanoClaw ─────┤
              ├──▶ ~/.mnemon ◀── shared memory
OpenCode ─────┤
              │
Gemini CLI ───┘
```
The foundation is in place: a single ~/.mnemon database that any agent can read and write. Claude Code's hook integration is the reference implementation; OpenClaw uses a plugin-based approach; NanoClaw integrates via container skills and volume mounts. The same pattern can be replicated for any LLM CLI that supports event hooks or system prompts.
The longer-term direction is a memory gateway: protocol decoupled from storage engine. The current SQLite backend is the first adapter; the protocol surface (remember / link / recall) can sit on top of PostgreSQL, Neo4j, or any graph database. Agent-side optimization (when to recall, what to remember) and storage-side optimization (indexing, graph algorithms) evolve independently. See Future Direction for details.
Do different sessions share memory?
Yes. By default, all sessions use the same default store — a decision remembered in one session is available in every future session.
Can I isolate memory per project or agent?
Yes. Use named stores to separate memory:
```sh
mnemon store create work                  # create a new store
mnemon store set work                     # set as default
MNEMON_STORE=work mnemon recall "query"   # or use env var per-process
```

Different agents/processes can use different stores via the `MNEMON_STORE` environment variable — no global state contention.
Local or global mode?
mnemon setup defaults to local (project-scoped .claude/), recommended for most users. Global (mnemon setup --global, installed to ~/.claude/) activates mnemon across all projects — convenient if you want other frameworks (e.g., OpenClaw) to share memory by forwarding requests through Claude Code CLI, but may add maintenance overhead.
How do I customize the behavior?
Edit ~/.mnemon/prompt/guide.md. This file controls when the agent recalls memories and what it considers worth remembering. The skill file (SKILL.md) is auto-deployed and should not need manual editing.
What is sub-agent delegation?
Memory writes don't happen in the main conversation. The host LLM (e.g., Opus) decides what to remember, then delegates the actual mnemon remember execution to a lightweight sub-agent (e.g., Sonnet). This saves tokens and keeps memory operations out of the main context.
| Environment Variable | Default | Description |
|---|---|---|
| `MNEMON_DATA_DIR` | `~/.mnemon` | Base data directory |
| `MNEMON_STORE` | (active file or default) | Named memory store for data isolation |
Ollama-specific (only relevant if using embeddings):
| Environment Variable | Default | Description |
|---|---|---|
| `MNEMON_EMBED_ENDPOINT` | `http://localhost:11434` | Ollama API endpoint |
| `MNEMON_EMBED_MODEL` | `nomic-embed-text` | Embedding model name |
```sh
make build           # build binary
make install         # build + install to $GOBIN
make test            # run E2E test suite
mnemon setup         # interactive setup
mnemon setup --eject # remove all integrations
make help            # show all targets
```

Dependencies: Go 1.24+, `modernc.org/sqlite`, `spf13/cobra`, `google/uuid`
- Design & Architecture — philosophy, algorithms, integration design
- Usage & Reference — CLI commands, embedding support, architecture overview
- Architecture Diagrams — system architecture, pipelines, lifecycle management
Mnemon combines the paradigm of one paper with the methodology of another, grounded in the structural insight that graph memory is isomorphic to LLM attention. See Theoretical Foundations for details.
- RLM — Zhang, Kraska & Khattab. Recursive Language Models. 2025. Establishes the paradigm: LLMs are more effective as orchestrators of external environments than as direct data processors.
- MAGMA — Zou et al. A Multi-Graph based Agentic Memory Architecture. 2025. Provides the methodology: four-graph model (temporal, entity, causal, semantic) with intent-adaptive retrieval.
- Graph-LLM Structural Insight — Joshi & Zhu. Building Powerful GNNs from Transformers. 2025; and the Graph-based Agent Memory survey (Chang Yang et al., 2026). Confirms that LLM attention is computationally equivalent to GNN operations — graph memory is a structural match, not an engineering convenience.