Give your AI agents a memory that persists, searches by meaning, and lives in plain files on your own machine.
AI agents forget everything between sessions. Every conversation starts from scratch — no recall of past decisions, no accumulated knowledge, no continuity. You can wire up a database, but then you're running infrastructure and writing queries instead of building your agent.
Engram gives agents persistent memory through a simple REST API. Write a memory, search for it later by meaning, and everything is stored as readable Markdown files you control. No cloud service, no API keys, no database to manage. Point it at a directory and start storing memories.
When you write a memory, Engram checks whether you already have a similar one. If it's genuinely new, it's added. If it's a duplicate, the existing one is kept. If it's an update, the old memory is replaced — preserving its importance score. You decide how strict the deduplication is, and you can bring your own LLM to make the call when similarity is ambiguous.
Each agent gets its own namespace, so multiple agents can share the same Engram instance without stepping on each other. Search combines vector similarity with keyword matching, weighted by importance. If embeddings aren't available, CRUD still works — search returns 503 until you fix the embedding provider.
- Persistent memory — store text memories that survive across sessions, each one a human-readable Markdown file
- Smart deduplication — every write checks for similar memories: add new ones, ignore duplicates, or update existing ones with preserved importance
- LLM-assisted decisions — when similarity is ambiguous, consult a local LLM to decide whether to add, update, or ignore
- Semantic search — find memories by meaning, not just exact keyword matches
- Importance scoring — tag memories with priority; scores decay over time and get bumped on retrieval
- Multi-agent isolation — each agent gets its own namespace; no overlap, no conflicts
- Local-first and private — no cloud, no API keys, no telemetry. Your data stays on your machine
- Human-readable storage — every memory is a Markdown file you can read, edit, and version-control
- Automatic indexing — memories are chunked and indexed as you write them, no manual rebuilds
- Graceful degradation — CRUD works even without embeddings; search returns 503 until the provider is available
Engram is for developers whose AI agents need to remember things across conversations. If you're building LLM-based tools, chatbots, or autonomous agents and need persistent, searchable memory without running a database server, Engram solves that problem.
Engram runs a local HTTP server. Agents interact with it through a REST API — create, read, list, delete, and search memories. Each memory is stored as a Markdown file with YAML frontmatter inside a vault directory you choose. A LanceDB index handles search, combining vector embeddings with keyword matching and importance-weighted reranking.
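The hybrid ranking can be pictured with a small reciprocal rank fusion (RRF) sketch. This is an illustration of the general technique, not Engram's actual implementation; the function name is made up, and `k` stands in for the `ENGRAM_RRF_K` setting:

```python
def rrf_fuse(vector_ranking, keyword_ranking, k=10):
    """Fuse two rankings (lists of doc IDs, best first) with reciprocal rank fusion."""
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); appearing high in both lists wins.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks well in both lists, so it beats "a", which tops only one list.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "a"])  # → ["b", "a", "c"]
```

A smaller `k` makes top ranks dominate; a larger `k` flattens the contribution of rank position. An importance-weighted reranking step would then adjust these fused scores.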
When you create a memory, the smart write pipeline runs: the content is embedded, compared against existing memories, and a decision is made — add it as new, update an existing one, or ignore it as a duplicate. When you search, results are ranked by relevance and importance, and each result's importance score is decayed and bumped so frequently accessed memories stay fresh.
When you create a memory, it looks like this on disk:
```
---
agent: my-agent
created: "2026-05-03T12:00:00+00:00"
id: deployed-v2-to-production-my-agent-2026-05-03
importance: 0.9
importance_updated: "2026-05-03T12:00:00+00:00"
tags:
  - deploy
  - production
type: memory
updated: "2026-05-03T12:00:00+00:00"
---
Deployed v2 to production
```

You can open this file in any text editor, edit it directly, or put the vault directory under version control. The search index rebuilds from these files automatically.
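Because each memory is plain text, other tools can read it back. Here is a minimal sketch of splitting such a file into frontmatter and body; it hand-parses flat `key: value` lines only (list values like `tags` are skipped), and a real reader would use a YAML library instead:

```python
def parse_memory(text):
    """Split a memory file into (frontmatter dict, body string)."""
    # Frontmatter sits between the first two "---" delimiter lines.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        # Only flat "key: value" lines; skip YAML list items and indentation.
        if ":" in line and not line.startswith(("-", " ")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()

meta, body = parse_memory(
    "---\nagent: my-agent\nimportance: 0.9\ntype: memory\n---\nDeployed v2 to production"
)
# meta["agent"] → "my-agent"; body → "Deployed v2 to production"
```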
```
uv sync --extra dev
```

This command works on all platforms. It creates a virtual environment and installs Engram with all dependencies. Do not create a virtual environment manually — uv manages its own `.venv`.
Engram stores memories as Markdown files inside a vault directory. Choose any location — for example:
| Platform | Example path |
|---|---|
| macOS | /Users/you/.engram/vault |
| Linux | /home/you/.engram/vault |
| Windows | C:\Users\you\.engram\vault |
You can also point Engram at an existing Obsidian vault — any directory works.
Create the directory you chose:
```
# macOS / Linux
mkdir -p ~/.engram/vault
```

```
# Windows PowerShell
New-Item -ItemType Directory -Path "$env:USERPROFILE\.engram\vault" -Force
```

```
# Windows CMD
mkdir "%USERPROFILE%\.engram\vault"
```

Copy the example configuration file:
```
# macOS / Linux
cp .env.example .env
```

```
# Windows PowerShell
Copy-Item .env.example .env
```

```
# Windows CMD
copy .env.example .env
```

Then edit `.env` and set `ENGRAM_VAULT_PATH` to the directory you created:
```
# macOS / Linux
ENGRAM_VAULT_PATH=~/.engram/vault
```

```
# Windows
ENGRAM_VAULT_PATH=C:\Users\you\.engram\vault
```

`ENGRAM_VAULT_PATH` is the only required variable. All others have defaults.
```
uv run engram start
```

This command works on all platforms. When the server starts, you will see:
```
2026-05-03 12:00:00.000 | INFO | engram.cli.cli:start:142 - Starting Engram on 127.0.0.1:7777
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit)
```
The timestamp, line number, and PID vary each time. The key line is `Uvicorn running on http://127.0.0.1:7777`.
```
curl http://127.0.0.1:7777/health
```

This command is the same on all platforms.
Response:
```
{
  "status": "healthy",
  "version": "1.2.0",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy"
  }
}
```

On Windows PowerShell, use `Invoke-RestMethod http://127.0.0.1:7777/health | ConvertTo-Json` instead.
Press Ctrl+C in the terminal running the server. If running as a daemon:
```
uv run engram stop
```

This command works on all platforms.
Start the Engram server.
```
# Start in foreground (default)
uv run engram start

# Start on a custom host and port
uv run engram start --host 0.0.0.0 --port 9000

# Start as a background daemon
uv run engram start --daemon
```

| Flag | Default | Description |
|---|---|---|
| `--host TEXT` | `127.0.0.1` (from `ENGRAM_HOST`) | Host address to bind to |
| `--port INTEGER` | `7777` (from `ENGRAM_PORT`) | Port to bind to |
| `--daemon`, `-d` | off | Run as a background daemon |
In foreground mode, press Ctrl+C to stop. In daemon mode, use engram stop.
On Windows, daemon mode uses CREATE_NEW_PROCESS_GROUP (start_new_session=True). If it does not work as expected, use foreground mode (the default).
Stop a running Engram daemon.
```
uv run engram stop
```

If no server is running:

```
No running Engram server found
```
Exit code: 1. On Windows, the stop command uses taskkill instead of SIGTERM.
All memory endpoints are prefixed with /agents/{agent_id}. The agent_id is a string that identifies the agent namespace (for example, my-agent). The following characters are rejected with a 400 error: path separators (/, \, ..) and Windows-illegal filename characters (< > : " | ? *).
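The rejection rule can be sketched as a small check. This mirrors the description above, but the function name and exact logic are illustrative assumptions, not Engram's internals:

```python
# Windows-illegal filename characters rejected with a 400, per the rule above.
ILLEGAL_CHARS = set('<>:"|?*')

def is_valid_agent_id(agent_id: str) -> bool:
    """Reject path separators, '..' traversal, and Windows-illegal characters."""
    if "/" in agent_id or "\\" in agent_id or ".." in agent_id:
        return False
    return not any(ch in ILLEGAL_CHARS for ch in agent_id)

is_valid_agent_id("my-agent")   # → True
is_valid_agent_id("bad<agent")  # → False (illegal character)
```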
```
curl http://127.0.0.1:7777/health
```

This command is the same on all platforms.
Response (200):
```
{
  "status": "healthy",
  "version": "1.2.0",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy"
  }
}
```

Response when the vault directory is missing (503):

```
{
  "status": "unhealthy",
  "version": "1.2.0",
  "components": {
    "vault": "unhealthy",
    "lancedb": "unhealthy",
    "embeddings": "unhealthy"
  }
}
```

When you write a memory, Engram checks for similar existing memories first. There are three possible outcomes:
- added — no similar memory found, or the similarity is below the add threshold. A new memory is created. Returns `201`.
- updated — a similar memory exists and the LLM (or threshold) decides the incoming content replaces it. The old memory is deleted and a new one is created with preserved importance. Returns `200`.
- ignored — a very similar memory already exists (above the ignore threshold). No new memory is written. Returns `200` with the existing memory's ID.
macOS / Linux:
```
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"The quick brown fox jumps over the lazy dog","tags":["test","example"],"importance":0.8}'
```

Windows CMD:

```
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"The quick brown fox jumps over the lazy dog\",\"tags\":[\"test\",\"example\"],\"importance\":0.8}"
```

Request body fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `content` | string | yes | Memory text (minimum 1 character) |
| `tags` | string[] | no | List of tags (default: `[]`) |
| `importance` | float | no | Importance score 0.0 to 1.0 (default: 0.5) |
Response for added (201):
```
{
  "decision": "added",
  "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
  "similarity_score": null
}
```

Response for updated (200):

```
{
  "decision": "updated",
  "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
  "similarity_score": 0.72
}
```

Response for ignored (200):

```
{
  "decision": "ignored",
  "id": "existing-memory-id-my-agent-2026-05-03",
  "similarity_score": 0.95
}
```

The `id` is generated from the content, agent ID, and current date. The `similarity_score` shows how similar the incoming content was to the best match (`null` for added memories with no match). The `id` and `similarity_score` values will differ each time — use the values returned in your own responses.
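The `content-agent-date` shape of these IDs can be approximated with a small slug helper. Engram's exact algorithm is not specified here, so treat this as a sketch that happens to reproduce the example ID:

```python
import re
from datetime import date

def memory_id(content: str, agent: str, day: date) -> str:
    """Build a content-agent-date style ID: lowercase slug + agent + ISO date."""
    slug = re.sub(r"[^a-z0-9]+", "-", content.lower()).strip("-")
    return f"{slug}-{agent}-{day.isoformat()}"

memory_id("The quick brown fox jumps over the lazy dog", "my-agent", date(2026, 5, 3))
# → "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03"
```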
```
curl http://127.0.0.1:7777/agents/my-agent/memories
```

This command is the same on all platforms.
Response (200): an array of memory objects. Returns [] if the agent has no memories.
```
[
  {
    "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
    "agent": "my-agent",
    "type": "memory",
    "importance": 0.8,
    "tags": ["test", "example"],
    "created": "2026-05-03T12:00:00+00:00",
    "updated": "2026-05-03T12:00:00+00:00",
    "importance_updated": "2026-05-03T12:00:00+00:00",
    "body": "The quick brown fox jumps over the lazy dog"
  }
]
```

```
curl http://127.0.0.1:7777/agents/my-agent/memories/the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03
```

This command is the same on all platforms. Response (200): a single memory object in the same format as List Memories.
Response (404):
```
{ "detail": "Memory not found" }
```

```
curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03
```

This command is the same on all platforms. Response (204): empty body on success.
Response (404):
```
{ "detail": "Memory not found" }
```

```
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=python&limit=5"
```

This command is the same on all platforms. Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `q` | string | yes | Search query (minimum 1 character) |
| `limit` | integer | no | Maximum results to return, 1–100 (default: 10) |
Response (200): an array of search result objects ranked by relevance. Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.
```
[
  {
    "id": "python-programming-basics-my-agent-2026-05-03",
    "score": 0.87,
    "importance": 0.5,
    "chunk": "Python is a versatile programming language...",
    "agent": "my-agent",
    "created": "2026-05-03T12:00:00+00:00"
  }
]
```

Missing query parameter (422):
```
{
  "detail": [
    {
      "type": "missing",
      "loc": ["query", "q"],
      "msg": "Field required"
    }
  ]
}
```

Search unavailable (503):

```
{ "detail": "Search index not available" }
```

```
{ "detail": "Search unavailable: embedding provider is not configured" }
```

| Status | When |
|---|---|
| 400 | Agent ID or memory ID contains path separators or Windows-illegal characters |
| 404 | Memory not found (on read or delete) |
| 422 | Request body validation failed (for example, empty content) |
Agent ID with illegal characters (400):
```
{ "detail": "agent_id contains illegal characters: 'bad<agent'" }
```

Memory ID with path traversal (400):

```
{ "detail": "Invalid memory_id: 'test-bad..agent-2026-05-03'" }
```

Memory not found (404):

```
{ "detail": "Memory not found" }
```

Empty content (422):
```
{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character",
      "input": "",
      "ctx": { "min_length": 1 }
    }
  ]
}
```

When you write a memory, the smart write pipeline runs automatically. You don't configure it separately — it's built into the write endpoint. Here's how it decides what to do:
- Embed the incoming content
- Find similar — search the index for the top 3 most similar memories for the same agent
- Threshold check:
  - Similarity below `SIMILARITY_ADD_THRESHOLD` (default 0.3) → add as new
  - Similarity at or above `SIMILARITY_IGNORE_THRESHOLD` (default 0.92) → ignore as duplicate
  - Similarity between the two thresholds → consult the LLM
- LLM consultation — send the incoming content and similar memories to a local Ollama model, which decides add, update, or ignore
- Execute — add a new memory, update the existing one (preserving its importance), or do nothing
If the LLM is unreachable, the pipeline falls back to "add" — keeping your data is always preferred over losing it.
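The decision logic above can be sketched in a few lines. This is a simplified illustration assuming the decision depends only on the best similarity score; the function names are made up:

```python
def decide(best_similarity, add_threshold=0.3, ignore_threshold=0.92, llm=None):
    """Map a similarity score to a smart-write decision: added, updated, or ignored."""
    # No match, or clearly dissimilar: always add as new.
    if best_similarity is None or best_similarity < add_threshold:
        return "added"
    # Near-duplicate: keep the existing memory.
    if best_similarity >= ignore_threshold:
        return "ignored"
    # Ambiguous zone: consult the LLM; fall back to "added" if unreachable.
    if llm is None:
        return "added"
    return llm(best_similarity)  # expected to return "added", "updated", or "ignored"

decide(0.95)  # → "ignored"
decide(0.7)   # → "added" (no LLM available, so the fallback applies)
```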
Every memory has an importance score between 0.0 and 1.0. You set it when you create a memory (default: 0.5). The score changes in two ways:
- Decay — importance decreases over time based on a half-life (default: 7 days). A memory that hasn't been accessed in 7 days has its importance halved.
- Retrieval bump — every time a memory appears in search results, its importance is bumped by the hit increment (default: 0.05), then clamped to 1.0.
Decay is lazy — it's only calculated when a memory is retrieved, not on a schedule. This means importance stays accurate without any background jobs.
When the smart write pipeline updates a memory, the old memory's importance is preserved on the new one.
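The two rules reduce to a simple formula: exponential decay by elapsed time, then an additive bump on retrieval, clamped to 1.0. A sketch with the default settings (the function name is illustrative):

```python
def refreshed_importance(importance, days_since_update, halflife=7.0, hit_increment=0.05):
    """Decay importance by half-life, then bump for this retrieval, clamped to 1.0."""
    decayed = importance * 0.5 ** (days_since_update / halflife)
    return min(1.0, decayed + hit_increment)

refreshed_importance(0.8, 7.0)  # one half-life: 0.8 → 0.4, bumped to 0.45
refreshed_importance(1.0, 0.0)  # no decay; the bump is clamped back to 1.0
```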
Search requires an embedding provider to vectorize memories. Engram supports two providers:
- Ollama (default) — runs locally at `http://localhost:11434` using the `nomic-embed-text` model. Start Ollama before Engram: `ollama serve`, then pull the model: `ollama pull nomic-embed-text`.
- fastembed — runs in-process with no external service. Uses the `BAAI/bge-small-en-v1.5` model. Fallback only; set `ENGRAM_EMBEDDING_PROVIDER=fastembed` to use it directly.
When Ollama is unavailable and ENGRAM_EMBEDDING_AUTOFALLBACK=true (the default), Engram automatically falls back to fastembed. If both providers fail, the server starts without search — CRUD still works, search returns 503.
On Windows, the onnxruntime dependency that fastembed requires may fail to load. If you see a 503 error from search, start Ollama and let Engram use it as the embedding provider instead.
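The fallback behavior amounts to a try-in-order chain. A hedged sketch of that control flow (the loader callables are stand-ins, not Engram's real internals):

```python
def pick_provider(load_ollama, load_fastembed, autofallback=True):
    """Return (name, provider) for the first embedding provider that loads, else (None, None)."""
    try:
        return "ollama", load_ollama()
    except Exception:
        # Ollama unreachable; only continue if auto-fallback is enabled.
        if not autofallback:
            return None, None
    try:
        return "fastembed", load_fastembed()
    except Exception:
        # Both failed: the server starts without search, and search returns 503.
        return None, None
```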
All configuration uses environment variables with the ENGRAM_ prefix. Set them directly or via a .env file in the working directory.
Required:
| Variable | Default | Description |
|---|---|---|
| `ENGRAM_VAULT_PATH` | — | Path to the vault directory where memory files are stored |
Optional:
| Variable | Default | Description |
|---|---|---|
| `ENGRAM_HOST` | `127.0.0.1` | Server bind address |
| `ENGRAM_PORT` | `7777` | Server bind port |
| `ENGRAM_IMPORTANCE_INITIAL_SCORE` | `0.5` | Default importance score for new memories |
| `ENGRAM_LOG_LEVEL` | `INFO` | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| `ENGRAM_LOG_FILE` | `~/.engram/logs/engram.log` | Path to the log file |
| `ENGRAM_LOG_ROTATION` | `10 MB` | Log rotation size threshold |
| `ENGRAM_LOG_RETENTION` | `7 days` | Log retention period |
| `ENGRAM_STATE_FILE` | `~/.engram/state.json` | Path to the PID state file (used by start and stop) |
| `ENGRAM_EMBEDDING_PROVIDER` | `ollama` | Embedding provider: ollama or fastembed |
| `ENGRAM_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name (provider-specific) |
| `ENGRAM_EMBEDDING_AUTOFALLBACK` | `true` | Auto-fallback to fastembed if Ollama is unavailable |
| `ENGRAM_CHUNK_MAX_TOKENS` | `512` | Maximum tokens per chunk for semantic chunking |
| `ENGRAM_CHUNK_OVERLAP_TOKENS` | `50` | Overlap tokens between adjacent chunks |
| `ENGRAM_RRF_K` | `10` | RRF constant for hybrid search fusion |
| `ENGRAM_IMPORTANCE_RERANK_WEIGHT` | `0.3` | Weight for importance score in reranking (0.0 to 1.0) |
| `ENGRAM_INDEX_PATH` | `~/.engram/index` | Path to the LanceDB index directory |
| `ENGRAM_SIMILARITY_ADD_THRESHOLD` | `0.3` | Below this similarity, always add as new memory |
| `ENGRAM_SIMILARITY_IGNORE_THRESHOLD` | `0.92` | At or above this similarity, ignore as duplicate |
| `ENGRAM_IMPORTANCE_DECAY_HALFLIFE` | `7.0` | Half-life in days for importance decay |
| `ENGRAM_IMPORTANCE_HIT_INCREMENT` | `0.05` | Importance bump on each search retrieval |
| `ENGRAM_LLM_MODEL` | `llama3` | Ollama model name for smart write LLM consultation |
| `ENGRAM_LLM_HOST` | `http://localhost:11434` | Ollama host URL for smart write LLM consultation |
Reserved (accepted but unused):
| Variable | Default | Note |
|---|---|---|
| `ENGRAM_OBSIDIAN_MODE` | `true` | No effect in current version |
| `ENGRAM_SHARED_MODE` | `false` | No effect in current version |
The .env.example file in the repository root contains all variables with their defaults.
This walkthrough creates a memory, reads it, searches for it, and deletes it. Use the my-agent agent ID throughout.
Step 1: Start the server
```
uv run engram start
```

Step 2: Create a memory
POST requests with JSON bodies require different quoting on Windows CMD. See Write a Memory for the Windows CMD variant.
macOS / Linux:
```
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'
```

Windows CMD:

```
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"Deployed v2 to production on Saturday\",\"tags\":[\"deploy\",\"production\"],\"importance\":0.9}"
```

The response includes `decision` and `id` fields — note the `id` for the next steps. The `id`, `decision`, and `similarity_score` will differ based on whether similar memories exist:
```
{
  "decision": "added",
  "id": "deployed-v2-to-production-on-saturday-my-agent-2026-05-03",
  "similarity_score": null
}
```

Step 3: Read the memory
Use the id from step 2. Your id will contain today's date instead of 2026-05-03:
```
curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-on-saturday-my-agent-2026-05-03
```

This command is the same on all platforms.
Step 4: Search for the memory
macOS / Linux:
```
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"
```

This command is the same on all platforms. The search returns ranked results with relevance scores:
```
[
  {
    "id": "deployed-v2-to-production-on-saturday-my-agent-2026-05-03",
    "score": 0.87,
    "importance": 0.9,
    "chunk": "Deployed v2 to production on Saturday",
    "agent": "my-agent",
    "created": "2026-05-03T12:00:00+00:00"
  }
]
```

Step 5: List all memories
```
curl http://127.0.0.1:7777/agents/my-agent/memories
```

Returns an array containing the memory from step 2. This command is the same on all platforms.
Step 6: Delete the memory
Use the id from step 2:
```
curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-on-saturday-my-agent-2026-05-03
```

Returns 204 with an empty body. This command is the same on all platforms.
Step 7: Verify deletion
```
curl http://127.0.0.1:7777/agents/my-agent/memories
```

Returns `[]`. This command is the same on all platforms.
Step 8: Stop the server
Press Ctrl+C in the terminal running the server, or:
```
uv run engram stop
```

This command works on all platforms.
```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
vault_path
  Field required
```
The ENGRAM_VAULT_PATH environment variable is not set. Set it before starting the server.
```
# macOS / Linux
export ENGRAM_VAULT_PATH="$HOME/.engram/vault"
```

```
# Windows PowerShell
$env:ENGRAM_VAULT_PATH = "$env:USERPROFILE\.engram\vault"
```

```
# Windows CMD
set ENGRAM_VAULT_PATH=%USERPROFILE%\.engram\vault
```

Or edit the `.env` file and set `ENGRAM_VAULT_PATH` to the path you chose for your vault directory.
No running Engram server found
The engram stop command cannot find a running server. Either the server was never started, or it crashed without cleaning up its state file. If a stale state file exists, engram start removes it automatically before starting.
```
ERROR: [Errno 98] Address already in use
```

(On Windows: `[WinError 10048]`.) Another process is using port 7777. Use a different port:

```
uv run engram start --port 8080
```

Or find and stop the process using port 7777:
macOS / Linux:

```
lsof -i :7777
kill <PID>
```

Windows PowerShell:

```
Get-NetTCPConnection -LocalPort 7777 | Select-Object OwningProcess
Stop-Process -Id <PID>
```

Windows CMD:

```
netstat -ano | findstr :7777
taskkill /PID <PID> /F
```

```
'cp' is not recognized as an internal or external command
```

The `cp` command is Unix-only. On Windows CMD, use `copy .env.example .env`. On Windows PowerShell, use `Copy-Item .env.example .env`.
```
{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character"
    }
  ]
}
```

The `content` field is required and must be at least 1 character. Provide non-empty content in the request body.
```
Daemon failed to start on 127.0.0.1:7777. Process may have exited (PID 12345).
```

On the first run, the server may need more than a few seconds to initialize (embedding model downloads, index creation). The daemon timeout is 30 seconds. If it still fails, try running in foreground mode first to see startup logs:

```
uv run engram start
```

If foreground mode works, the daemon should work on subsequent attempts since model files are cached.
```
{ "detail": "Search unavailable: embedding provider is not configured" }
```

Neither Ollama nor fastembed could be loaded. On Windows, this is typically caused by the onnxruntime DLL failing to load. The server starts without search, but CRUD operations still work. To resolve:

- Start Ollama: `ollama serve` (then pull the model: `ollama pull nomic-embed-text`)
- Or set `ENGRAM_EMBEDDING_PROVIDER=fastembed` in your `.env` file (may require the Visual C++ Redistributable on Windows)

```
{ "detail": "Search index not available" }
```

The search index has not been initialized. This means the server started without embedding support. See the resolution steps above.
If Ollama is not running, the LLM consultation falls back to "add" every time. Deduplication still works at the threshold level — memories with similarity at or above ENGRAM_SIMILARITY_IGNORE_THRESHOLD (default 0.92) are still ignored. Only the ambiguous zone between 0.3 and 0.92 defaults to "add" instead of consulting the LLM.
To enable LLM-assisted decisions in the ambiguous zone:
- Install Ollama: see ollama.com
- Pull a model: `ollama pull llama3`
- Start Ollama: `ollama serve`
- If Ollama runs on a non-default host, set `ENGRAM_LLM_HOST` in your `.env` file
When running uv sync --extra dev, you may see:
```
Resolved 72 packages in 2ms
Checked 69 packages in 13ms
```
The exact package count and time vary. This is normal — uv is resolving and checking dependencies. No action required.
When running engram start for the first time, Engram creates the engram subdirectory inside your vault path. This is expected — the health check verifies this directory exists.
If you created a virtual environment manually before running uv sync, you may see:
```
warning: `VIRTUAL_ENV=venv` does not match the project environment path `.venv` and will be ignored
```
This is harmless. uv run uses its own .venv and ignores the manual environment. You can delete your manually created virtual environment directory.
Retrieval evaluation (24 queries, golden set), comparing v1.1 and v1.2:
| Metric | v1.1 | v1.2 | Change |
|---|---|---|---|
| P@1 | 0.3478 | 0.6957 | +100% |
| R@5 | 1.0 | 1.0 | = |
| MRR@10 | 0.5841 | 0.8152 | +39% |
| Latency@10 | 5324 ms | 18561 ms | +249% |
Precision and MRR improved significantly with the smart write pipeline and importance-weighted reranking. Latency increased because v1.2 updates importance scores on every search result (decay + bump).
See CHANGELOG.md for the full history.
v1.2 adds smart write deduplication with LLM consultation, importance scoring with time-based decay and retrieval bumps, configurable similarity thresholds, and 6 new environment variables for the intelligence features.
v1.1 adds semantic search with LanceDB, embedding providers (Ollama and fastembed), semantic chunking with configurable overlap, importance-weighted reranking, and a health endpoint that reports component status. Search works alongside CRUD — if embeddings aren't available, CRUD still works and search returns 503.
```
uv sync --extra dev
uv run pytest --cov=engram -v
```

Output: 276 passed with 92.58% coverage. The time varies by machine.
Lint and format:
```
uv run ruff check src/ tests/
uv run ruff format src/ tests/
```