Engram

Give your AI agents a memory that persists, searches by meaning, and lives in plain files on your own machine.

Why Engram?

AI agents forget everything between sessions. Every conversation starts from scratch — no recall of past decisions, no accumulated knowledge, no continuity. You can wire up a database, but then you're running infrastructure and writing queries instead of building your agent.

Engram gives agents persistent memory through a simple REST API. Write a memory, search for it later by meaning, and everything is stored as readable Markdown files you control. No cloud service, no API keys, no database to manage. Point it at a directory and start storing memories.

When you write a memory, Engram checks whether you already have a similar one. If it's genuinely new, it's added. If it's a duplicate, the existing one is kept. If it's an update, the old memory is replaced — preserving its importance score. You decide how strict the deduplication is, and you can bring your own LLM to make the call when similarity is ambiguous.

Each agent gets its own namespace, so multiple agents can share the same Engram instance without stepping on each other. Search combines vector similarity with keyword matching, weighted by importance. If embeddings aren't available, CRUD still works — search returns 503 until you fix the embedding provider.

Features

  • Persistent memory — store text memories that survive across sessions, each one a human-readable Markdown file
  • Smart deduplication — every write checks for similar memories: add new ones, ignore duplicates, or update existing ones with preserved importance
  • LLM-assisted decisions — when similarity is ambiguous, consult a local LLM to decide whether to add, update, or ignore
  • Semantic search — find memories by meaning, not just exact keyword matches
  • Importance scoring — tag memories with priority; scores decay over time and get bumped on retrieval
  • Multi-agent isolation — each agent gets its own namespace; no overlap, no conflicts
  • Local-first and private — no cloud, no API keys, no telemetry. Your data stays on your machine
  • Human-readable storage — every memory is a Markdown file you can read, edit, and version-control
  • Automatic indexing — memories are chunked and indexed as you write them, no manual rebuilds
  • Graceful degradation — CRUD works even without embeddings; search returns 503 until the provider is available

Who It's For

Engram is for developers building AI agents that need to remember things across conversations. If you're working with LLM-based tools, chatbots, or autonomous agents and need persistent, searchable memory without running a database server, Engram solves that problem.

How It Works

Engram runs a local HTTP server. Agents interact with it through a REST API — create, read, list, delete, and search memories. Each memory is stored as a Markdown file with YAML frontmatter inside a vault directory you choose. A LanceDB index handles search, combining vector embeddings with keyword matching and importance-weighted reranking.

When you create a memory, the smart write pipeline runs: the content is embedded, compared against existing memories, and a decision is made — add it as new, update an existing one, or ignore it as a duplicate. When you search, results are ranked by relevance and importance, and each result's importance score is decayed and bumped so frequently accessed memories stay fresh.
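The write endpoint can be driven from any HTTP client. A minimal sketch using Python's standard library — `write_memory_request` is a hypothetical helper name, and the endpoint path follows the API reference below; it builds the request without sending it:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:7777"

def write_memory_request(agent, content, tags=(), importance=0.5):
    """Build the POST request for Engram's write endpoint (not sent here).

    To actually send it: urllib.request.urlopen(req) with the server running.
    """
    body = json.dumps({"content": content, "tags": list(tags),
                       "importance": importance}).encode()
    return urllib.request.Request(
        f"{BASE}/agents/{agent}/memories", data=body,
        headers={"Content-Type": "application/json"}, method="POST")

req = write_memory_request("my-agent", "Deployed v2", tags=["deploy"])
```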

When you create a memory, it looks like this on disk:

---
agent: my-agent
created: "2026-05-03T12:00:00+00:00"
id: deployed-v2-to-production-my-agent-2026-05-03
importance: 0.9
importance_updated: "2026-05-03T12:00:00+00:00"
tags:
  - deploy
  - production
type: memory
updated: "2026-05-03T12:00:00+00:00"
---

Deployed v2 to production

You can open this file in any text editor, edit it directly, or put the vault directory under version control. The search index rebuilds from these files automatically.
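Because each memory is plain Markdown with simple YAML frontmatter, the vault can be processed with ordinary scripts. A minimal, hand-rolled parser for the flat frontmatter shown above (illustrative only, not part of Engram — a real tool would use a YAML library):

```python
def parse_memory(text: str) -> tuple[dict, str]:
    """Split a memory file into (frontmatter dict, body).

    Handles flat `key: value` pairs and simple `- item` lists,
    which is all the format above uses.
    """
    _, fm, body = text.split("---\n", 2)
    meta, current_list = {}, None
    for line in fm.splitlines():
        if line.startswith("  - ") and current_list is not None:
            meta[current_list].append(line[4:])
        elif ":" in line:
            key, _, value = line.partition(":")
            value = value.strip().strip('"')
            if value == "":
                meta[key] = []          # start of a list (e.g. tags:)
                current_list = key
            else:
                meta[key] = value
                current_list = None
    return meta, body.strip()

example = """---
agent: my-agent
importance: 0.9
tags:
  - deploy
  - production
type: memory
---

Deployed v2 to production
"""

meta, body = parse_memory(example)
```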

Quick Start

1. Install dependencies

uv sync --extra dev

This command works on all platforms. It creates a virtual environment and installs Engram with all dependencies. Do not create a virtual environment manually — uv manages its own .venv.

2. Create a vault directory

Engram stores memories as Markdown files inside a vault directory. Choose any location — for example:

Platform  Example path
macOS     /Users/you/.engram/vault
Linux     /home/you/.engram/vault
Windows   C:\Users\you\.engram\vault

You can also point Engram at an existing Obsidian vault — any directory works.

Create the directory you chose:

# macOS / Linux
mkdir -p ~/.engram/vault
# Windows PowerShell
New-Item -ItemType Directory -Path "$env:USERPROFILE\.engram\vault" -Force
# Windows CMD
mkdir "%USERPROFILE%\.engram\vault"

3. Configure environment variables

Copy the example configuration file:

# macOS / Linux
cp .env.example .env
# Windows PowerShell
Copy-Item .env.example .env
# Windows CMD
copy .env.example .env

Then edit .env and set ENGRAM_VAULT_PATH to the directory you created:

# macOS / Linux
ENGRAM_VAULT_PATH=~/.engram/vault
# Windows
ENGRAM_VAULT_PATH=C:\Users\you\.engram\vault

ENGRAM_VAULT_PATH is the only required variable. All others have defaults.

4. Start the server

uv run engram start

This command works on all platforms. When the server starts, you will see:

2026-05-03 12:00:00.000 | INFO     | engram.cli.cli:start:142 - Starting Engram on 127.0.0.1:7777
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit)

The timestamp, line number, and PID vary each time. The key line is Uvicorn running on http://127.0.0.1:7777.

5. Verify the server is running

curl http://127.0.0.1:7777/health

This command is the same on all platforms.

Response:

{
  "status": "healthy",
  "version": "1.2.0",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy"
  }
}

On Windows PowerShell, use Invoke-RestMethod http://127.0.0.1:7777/health | ConvertTo-Json instead.

6. Stop the server

Press Ctrl+C in the terminal running the server. If running as a daemon:

uv run engram stop

This command works on all platforms.

CLI Reference

engram start

Start the Engram server.

# Start in foreground (default)
uv run engram start

# Start on a custom host and port
uv run engram start --host 0.0.0.0 --port 9000

# Start as a background daemon
uv run engram start --daemon
Flag Default Description
--host TEXT 127.0.0.1 (from ENGRAM_HOST) Host address to bind to
--port INTEGER 7777 (from ENGRAM_PORT) Port to bind to
--daemon, -d off Run as a background daemon

In foreground mode, press Ctrl+C to stop. In daemon mode, use engram stop.

On Windows, daemon mode uses CREATE_NEW_PROCESS_GROUP (start_new_session=True). If it does not work as expected, use foreground mode (the default).

engram stop

Stop a running Engram daemon.

uv run engram stop

If no server is running:

No running Engram server found

Exit code: 1. On Windows, the stop command uses taskkill instead of SIGTERM.

REST API Reference

All memory endpoints are prefixed with /agents/{agent_id}. The agent_id is a string that identifies the agent namespace (for example, my-agent). IDs containing path separators (/, \), the .. sequence, or Windows-illegal filename characters (< > : " | ? *) are rejected with a 400 error.
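The validation rule can be approximated client-side to fail fast before making a request. A sketch of the check described above (illustrative; the server's actual implementation may differ):

```python
ILLEGAL_CHARS = set('<>:"|?*')  # Windows-illegal filename characters

def is_valid_agent_id(agent_id: str) -> bool:
    """Approximate the 400-level validation described above:
    reject path separators, the '..' sequence, and Windows-illegal
    filename characters."""
    if "/" in agent_id or "\\" in agent_id or ".." in agent_id:
        return False
    return not any(c in ILLEGAL_CHARS for c in agent_id)
```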

Health Check

curl http://127.0.0.1:7777/health

This command is the same on all platforms.

Response (200):

{
  "status": "healthy",
  "version": "1.2.0",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy"
  }
}

Response when the vault directory is missing (503):

{
  "status": "unhealthy",
  "version": "1.2.0",
  "components": {
    "vault": "unhealthy",
    "lancedb": "unhealthy",
    "embeddings": "unhealthy"
  }
}

Write a Memory

When you write a memory, Engram checks for similar existing memories first. There are three possible outcomes:

  • added — no similar memory found, or the similarity is below the add threshold. A new memory is created. Returns 201.
  • updated — a similar memory exists and the LLM (or threshold) decides the incoming content replaces it. The old memory is deleted and a new one is created with preserved importance. Returns 200.
  • ignored — a very similar memory already exists (above the ignore threshold). No new memory is written. Returns 200 with the existing memory's ID.

macOS / Linux:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"The quick brown fox jumps over the lazy dog","tags":["test","example"],"importance":0.8}'

Windows CMD:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"The quick brown fox jumps over the lazy dog\",\"tags\":[\"test\",\"example\"],\"importance\":0.8}"

Request body fields:

Field       Type      Required  Description
content     string    yes       Memory text (minimum 1 character)
tags        string[]  no        List of tags (default: [])
importance  float     no        Importance score 0.0 to 1.0 (default: 0.5)

Response for added (201):

{
  "decision": "added",
  "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
  "similarity_score": null
}

Response for updated (200):

{
  "decision": "updated",
  "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
  "similarity_score": 0.72
}

Response for ignored (200):

{
  "decision": "ignored",
  "id": "existing-memory-id-my-agent-2026-05-03",
  "similarity_score": 0.95
}

The id is generated from the content, agent ID, and current date. The similarity_score shows how similar the incoming content was to the best match (null for added memories with no match). The id and similarity_score values will differ each time — use the values returned in your own response.
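Judging from the example responses, the ID appears to be a slug of the content plus the agent ID and ISO date. A rough sketch of that scheme (an assumption — the real algorithm may truncate or normalize differently):

```python
import re
from datetime import date

def memory_id(content: str, agent: str, on: date) -> str:
    """Sketch of the ID scheme implied by the examples:
    slugified content + agent ID + ISO date."""
    slug = re.sub(r"[^a-z0-9]+", "-", content.lower()).strip("-")
    return f"{slug}-{agent}-{on.isoformat()}"
```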

List Memories

curl http://127.0.0.1:7777/agents/my-agent/memories

This command is the same on all platforms.

Response (200): an array of memory objects. Returns [] if the agent has no memories.

[
  {
    "id": "the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03",
    "agent": "my-agent",
    "type": "memory",
    "importance": 0.8,
    "tags": ["test", "example"],
    "created": "2026-05-03T12:00:00+00:00",
    "updated": "2026-05-03T12:00:00+00:00",
    "importance_updated": "2026-05-03T12:00:00+00:00",
    "body": "The quick brown fox jumps over the lazy dog"
  }
]

Read a Memory

curl http://127.0.0.1:7777/agents/my-agent/memories/the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03

This command is the same on all platforms. Response (200): a single memory object in the same format as List Memories.

Response (404):

{ "detail": "Memory not found" }

Delete a Memory

curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/the-quick-brown-fox-jumps-over-the-lazy-dog-my-agent-2026-05-03

This command is the same on all platforms. Response (204): empty body on success.

Response (404):

{ "detail": "Memory not found" }

Search Memories

curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=python&limit=5"

This command is the same on all platforms.

Query parameters:

Parameter  Type     Required  Description
q          string   yes       Search query (minimum 1 character)
limit      integer  no        Maximum results to return, 1–100 (default: 10)

Response (200): an array of search result objects ranked by relevance. Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.

[
  {
    "id": "python-programming-basics-my-agent-2026-05-03",
    "score": 0.87,
    "importance": 0.5,
    "chunk": "Python is a versatile programming language...",
    "agent": "my-agent",
    "created": "2026-05-03T12:00:00+00:00"
  }
]

Missing query parameter (422):

{
  "detail": [
    {
      "type": "missing",
      "loc": ["query", "q"],
      "msg": "Field required"
    }
  ]
}

Search unavailable (503):

{ "detail": "Search index not available" }
{ "detail": "Search unavailable: embedding provider is not configured" }

Error Responses

Status When
400 Agent ID or memory ID contains path separators or Windows-illegal characters
404 Memory not found (on read or delete)
422 Request body validation failed (for example, empty content)

Agent ID with illegal characters (400):

{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

Memory ID with path traversal (400):

{ "detail": "Invalid memory_id: 'test-bad..agent-2026-05-03'" }

Memory not found (404):

{ "detail": "Memory not found" }

Empty content (422):

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character",
      "input": "",
      "ctx": { "min_length": 1 }
    }
  ]
}

Smart Write Pipeline

When you write a memory, the smart write pipeline runs automatically. You don't configure it separately — it's built into the write endpoint. Here's how it decides what to do:

  1. Embed the incoming content
  2. Find similar — search the index for the top 3 most similar memories for the same agent
  3. Threshold check:
    • Similarity below SIMILARITY_ADD_THRESHOLD (default 0.3) → add as new
    • Similarity at or above SIMILARITY_IGNORE_THRESHOLD (default 0.92) → ignore as duplicate
    • Similarity between the two thresholds → consult LLM
  4. LLM consultation — send the incoming content and similar memories to a local Ollama model, which decides add, update, or ignore
  5. Execute — add a new memory, update the existing one (preserving its importance), or do nothing

If the LLM is unreachable, the pipeline falls back to "add" — keeping your data is always preferred over losing it.
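The threshold logic above reduces to a small decision function. A sketch using the documented defaults, with the Ollama consultation stubbed out as a callable:

```python
def smart_write_decision(similarity, llm=None,
                         add_threshold=0.3, ignore_threshold=0.92):
    """Decide added/updated/ignored per the pipeline above.

    `llm` is a stand-in for the Ollama consultation: a callable that
    returns "added", "updated", or "ignored". When it is missing or
    fails, the pipeline falls back to "added" (keep the data).
    """
    if similarity is None or similarity < add_threshold:
        return "added"                       # clearly new content
    if similarity >= ignore_threshold:
        return "ignored"                     # near-duplicate
    try:
        return llm(similarity) if llm else "added"
    except Exception:
        return "added"                       # LLM unreachable: fall back
```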

Importance Scoring

Every memory has an importance score between 0.0 and 1.0. You set it when you create a memory (default: 0.5). The score changes in two ways:

  • Decay — importance decreases over time based on a half-life (default: 7 days). A memory that hasn't been accessed in 7 days has its importance halved.
  • Retrieval bump — every time a memory appears in search results, its importance is bumped by the hit increment (default: 0.05), then clamped to 1.0.

Decay is lazy — it's only calculated when a memory is retrieved, not on a schedule. This means importance stays accurate without any background jobs.

When the smart write pipeline updates a memory, the old memory's importance is preserved on the new one.
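The decay-then-bump behavior reduces to exponential half-life decay plus a clamped increment. A sketch using the documented defaults (the exact formula Engram uses is an assumption here):

```python
def refreshed_importance(importance, days_since_update,
                         halflife_days=7.0, hit_increment=0.05):
    """Decay importance by elapsed time, then bump it for the retrieval.

    Standard half-life decay: after `halflife_days` without access,
    the score is halved. The result is clamped to 1.0.
    """
    decayed = importance * 0.5 ** (days_since_update / halflife_days)
    return min(1.0, decayed + hit_increment)
```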

Embedding Providers

Search requires an embedding provider to vectorize memories. Engram supports two providers:

  1. Ollama (default) — runs locally at http://localhost:11434 using the nomic-embed-text model. Start Ollama before Engram: ollama serve, then pull the model: ollama pull nomic-embed-text.

  2. fastembed — runs in-process with no external service, using the BAAI/bge-small-en-v1.5 model. Used as a fallback by default; set ENGRAM_EMBEDDING_PROVIDER=fastembed to make it the primary provider.

When Ollama is unavailable and ENGRAM_EMBEDDING_AUTOFALLBACK=true (the default), Engram automatically falls back to fastembed. If both providers fail, the server starts without search — CRUD still works, search returns 503.

On Windows, the onnxruntime dependency that fastembed requires may fail to load. If you see a 503 error from search, start Ollama and let Engram use it as the embedding provider instead.
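The fallback behavior described above can be sketched as a small selection function (illustrative, not Engram's actual code):

```python
def select_provider(configured="ollama", autofallback=True,
                    ollama_up=True, fastembed_loads=True):
    """Sketch of the provider fallback described above.

    Returns the provider name, or None: the server then starts
    without search (CRUD works, search returns 503).
    """
    if configured == "fastembed":
        return "fastembed" if fastembed_loads else None
    if ollama_up:
        return "ollama"
    if autofallback and fastembed_loads:
        return "fastembed"
    return None
```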

Configuration

All configuration uses environment variables with the ENGRAM_ prefix. Set them directly or via a .env file in the working directory.

Required:

Variable           Default  Description
ENGRAM_VAULT_PATH  (none)   Path to the vault directory where memory files are stored

Optional:

Variable Default Description
ENGRAM_HOST 127.0.0.1 Server bind address
ENGRAM_PORT 7777 Server bind port
ENGRAM_IMPORTANCE_INITIAL_SCORE 0.5 Default importance score for new memories
ENGRAM_LOG_LEVEL INFO Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
ENGRAM_LOG_FILE ~/.engram/logs/engram.log Path to the log file
ENGRAM_LOG_ROTATION 10 MB Log rotation size threshold
ENGRAM_LOG_RETENTION 7 days Log retention period
ENGRAM_STATE_FILE ~/.engram/state.json Path to the PID state file (used by start and stop)
ENGRAM_EMBEDDING_PROVIDER ollama Embedding provider: ollama or fastembed
ENGRAM_EMBEDDING_MODEL nomic-embed-text Embedding model name (provider-specific)
ENGRAM_EMBEDDING_AUTOFALLBACK true Auto-fallback to fastembed if Ollama is unavailable
ENGRAM_CHUNK_MAX_TOKENS 512 Maximum tokens per chunk for semantic chunking
ENGRAM_CHUNK_OVERLAP_TOKENS 50 Overlap tokens between adjacent chunks
ENGRAM_RRF_K 10 RRF constant for hybrid search fusion
ENGRAM_IMPORTANCE_RERANK_WEIGHT 0.3 Weight for importance score in reranking (0.0 to 1.0)
ENGRAM_INDEX_PATH ~/.engram/index Path to the LanceDB index directory
ENGRAM_SIMILARITY_ADD_THRESHOLD 0.3 Below this similarity, always add as new memory
ENGRAM_SIMILARITY_IGNORE_THRESHOLD 0.92 At or above this similarity, ignore as duplicate
ENGRAM_IMPORTANCE_DECAY_HALFLIFE 7.0 Half-life in days for importance decay
ENGRAM_IMPORTANCE_HIT_INCREMENT 0.05 Importance bump on each search retrieval
ENGRAM_LLM_MODEL llama3 Ollama model name for smart write LLM consultation
ENGRAM_LLM_HOST http://localhost:11434 Ollama host URL for smart write LLM consultation
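ENGRAM_RRF_K and ENGRAM_IMPORTANCE_RERANK_WEIGHT shape hybrid search. A sketch of standard reciprocal rank fusion plus an importance-weighted blend — the fusion is the textbook RRF formula, but the blend is illustrative, not Engram's exact math:

```python
def rrf_fuse(vector_ids, keyword_ids, k=10):
    """Reciprocal rank fusion: score(d) = sum over lists of 1/(k + rank).

    k corresponds to ENGRAM_RRF_K (default 10).
    """
    scores = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def importance_rerank(fused_ids, importance, weight=0.3):
    """Blend fused rank position with stored importance.

    weight corresponds to ENGRAM_IMPORTANCE_RERANK_WEIGHT; unknown
    documents default to the initial importance of 0.5.
    """
    n = len(fused_ids)
    def blended(pair):
        pos, doc_id = pair
        rank_score = 1.0 - pos / n          # 1.0 for the top fused hit
        return (1 - weight) * rank_score + weight * importance.get(doc_id, 0.5)
    ordered = sorted(enumerate(fused_ids), key=blended, reverse=True)
    return [doc_id for _, doc_id in ordered]
```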

Reserved (accepted but unused):

Variable Default Note
ENGRAM_OBSIDIAN_MODE true No effect in current version
ENGRAM_SHARED_MODE false No effect in current version

The .env.example file in the repository root contains all variables with their defaults.

End-to-End Walkthrough

This walkthrough creates a memory, reads it, searches for it, and deletes it. Use the my-agent agent ID throughout.

Step 1: Start the server

uv run engram start

Step 2: Create a memory

POST requests with JSON bodies require different quoting on Windows CMD, so both variants are shown below.

macOS / Linux:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

Windows CMD:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"Deployed v2 to production on Saturday\",\"tags\":[\"deploy\",\"production\"],\"importance\":0.9}"

The response includes a decision and id field — note the id for the next steps. The id, decision, and similarity_score will differ based on whether similar memories exist:

{
  "decision": "added",
  "id": "deployed-v2-to-production-on-saturday-my-agent-2026-05-03",
  "similarity_score": null
}

Step 3: Read the memory

Use the id from step 2. Your id will contain today's date instead of 2026-05-03:

curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-on-saturday-my-agent-2026-05-03

This command is the same on all platforms.

Step 4: Search for the memory

curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"

This command is the same on all platforms.

The search returns ranked results with relevance scores:

[
  {
    "id": "deployed-v2-to-production-on-saturday-my-agent-2026-05-03",
    "score": 0.87,
    "importance": 0.9,
    "chunk": "Deployed v2 to production on Saturday",
    "agent": "my-agent",
    "created": "2026-05-03T12:00:00+00:00"
  }
]

Step 5: List all memories

curl http://127.0.0.1:7777/agents/my-agent/memories

Returns an array containing the memory from step 2. This command is the same on all platforms.

Step 6: Delete the memory

Use the id from step 2:

curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-on-saturday-my-agent-2026-05-03

Returns 204 with an empty body. This command is the same on all platforms.

Step 7: Verify deletion

curl http://127.0.0.1:7777/agents/my-agent/memories

Returns []. This command is the same on all platforms.

Step 8: Stop the server

Press Ctrl+C in the terminal running the server, or:

uv run engram stop

This command works on all platforms.

Troubleshooting

ValidationError: vault_path field required

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
vault_path
  Field required

The ENGRAM_VAULT_PATH environment variable is not set. Set it before starting the server.

# macOS / Linux
export ENGRAM_VAULT_PATH="$HOME/.engram/vault"
# Windows PowerShell
$env:ENGRAM_VAULT_PATH = "$env:USERPROFILE\.engram\vault"
# Windows CMD
set ENGRAM_VAULT_PATH=%USERPROFILE%\.engram\vault

Or edit the .env file and set ENGRAM_VAULT_PATH to the path you chose for your vault directory.

No running Engram server found

No running Engram server found

The engram stop command cannot find a running server. Either the server was never started, or it crashed without cleaning up its state file. If a stale state file exists, engram start removes it automatically before starting.

Port 7777 already in use

ERROR:    [Errno 98] Address already in use (or [WinError 10048] on Windows)

Another process is using port 7777. Use a different port:

uv run engram start --port 8080

Or find and stop the process using port 7777:

macOS / Linux:

lsof -i :7777
kill <PID>

Windows PowerShell:

Get-NetTCPConnection -LocalPort 7777 | Select-Object OwningProcess
Stop-Process -Id <PID>

Windows CMD:

netstat -ano | findstr :7777
taskkill /PID <PID> /F

'cp' is not recognized as an internal or external command

'cp' is not recognized as an internal or external command

The cp command is Unix-only. On Windows CMD, use copy .env.example .env. On Windows PowerShell, use Copy-Item .env.example .env.

Empty content rejected with 422

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character"
    }
  ]
}

The content field is required and must be at least 1 character. Provide non-empty content in the request body.

Daemon fails to start on first attempt

Daemon failed to start on 127.0.0.1:7777. Process may have exited (PID 12345).

On the first run, the server may need more than a few seconds to initialize (embedding model downloads, index creation). The daemon timeout is 30 seconds. If it still fails, try running in foreground mode first to see startup logs:

uv run engram start

If foreground mode works, the daemon should work on subsequent attempts since model files are cached.

Search returns 503

{ "detail": "Search unavailable: embedding provider is not configured" }

Neither Ollama nor fastembed could be loaded. On Windows, this is typically caused by the onnxruntime DLL failing to load. The server starts without search, but CRUD operations still work. To resolve:

  • Start Ollama: ollama serve (then pull the model: ollama pull nomic-embed-text)
  • Or set ENGRAM_EMBEDDING_PROVIDER=fastembed in your .env file (may require Visual C++ Redistributable on Windows)

You may also see:

{ "detail": "Search index not available" }

The search index has not been initialized. This means the server started without embedding support. See the resolution steps above.

Smart write always adds (never deduplicates)

If Ollama is not running, the LLM consultation falls back to "add" every time. Deduplication still works at the threshold level — memories with similarity at or above ENGRAM_SIMILARITY_IGNORE_THRESHOLD (default 0.92) are still ignored. Only the ambiguous zone between 0.3 and 0.92 defaults to "add" instead of consulting the LLM.

To enable LLM-assisted decisions in the ambiguous zone:

  1. Install Ollama: see ollama.com
  2. Pull a model: ollama pull llama3
  3. Start Ollama: ollama serve
  4. If Ollama runs on a non-default host, set ENGRAM_LLM_HOST in your .env file

Known Runtime Warnings

When running uv sync --extra dev, you may see:

Resolved 72 packages in 2ms
Checked 69 packages in 13ms

The exact package count and time vary. This is normal — uv is resolving and checking dependencies. No action required.

When running engram start for the first time, Engram creates the engram subdirectory inside your vault path. This is expected — the health check verifies this directory exists.

If you created a virtual environment manually before running uv sync, you may see:

warning: `VIRTUAL_ENV=venv` does not match the project environment path `.venv` and will be ignored

This is harmless. uv run uses its own .venv and ignores the manual environment. You can delete your manually created virtual environment directory.

Evaluation Results

Retrieval evaluation on a 24-query golden set, comparing v1.1 and v1.2:

Metric v1.1 v1.2 Change
P@1 0.3478 0.6957 +100%
R@5 1.0 1.0 =
MRR@10 0.5841 0.8152 +39%
Latency@10 5324 ms 18561 ms +249%

Precision and MRR improved significantly with the smart write pipeline and importance-weighted reranking. Latency increased because v1.2 updates importance scores on every search result (decay + bump).

What's New

See CHANGELOG.md for the full history.

v1.2 adds smart write deduplication with LLM consultation, importance scoring with time-based decay and retrieval bumps, configurable similarity thresholds, and 6 new environment variables for the intelligence features.

v1.1 adds semantic search with LanceDB, embedding providers (Ollama and fastembed), semantic chunking with configurable overlap, importance-weighted reranking, and a health endpoint that reports component status. Search works alongside CRUD — if embeddings aren't available, CRUD still works and search returns 503.

Development

uv sync --extra dev
uv run pytest --cov=engram -v

Output: 276 passed with 92.58% coverage. The time varies by machine.

Lint and format:

uv run ruff check src/ tests/
uv run ruff format src/ tests/