Sahith59/AgentMem-OS

AgentMem OS (MemNAI)

A local-first, persistent memory operating system for long-horizon LLM agents.

Overview

Modern LLM agents forget everything when a conversation ends. AgentMem OS solves this with a four-tier memory hierarchy that persists knowledge across sessions, compresses it intelligently, and retrieves the most relevant context at inference time, all running locally with no cloud dependencies.

The system ships four novel ML algorithms as its core contributions, each targeting a different failure mode of naive long-context approaches (truncation, full-history replay, and flat retrieval).


Architecture

┌─────────────────────────────────────────────────────────────┐
│                        LLM Agent                            │
│               (Claude / Ollama / any LiteLLM model)         │
└──────────────────────┬──────────────────────────────────────┘
                       │  query
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  Context Assembler                          │
│          (MMR retrieval across all 4 tiers)                 │
└────┬────────────┬───────────────┬──────────────┬────────────┘
     │            │               │              │
     ▼            ▼               ▼              ▼
┌─────────┐  ┌─────────┐  ┌───────────┐  ┌──────────────┐
│  Redis  │  │ SQLite  │  │ ChromaDB  │  │  Procedural  │
│ Tier 1  │  │ Tier 2  │  │  Tier 3   │  │   Tier 4     │
│Hot Cache│  │Episodic │  │ Semantic  │  │   Patterns   │
│ <100ms  │  │ History │  │ Retrieval │  │   Library    │
└─────────┘  └─────────┘  └───────────┘  └──────────────┘
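
The retrieval step in the Context Assembler can be pictured with a minimal sketch, assuming that MMR here means Maximal Marginal Relevance: each pick trades relevance to the query against redundancy with what has already been selected. The names `cosine`, `mmr_select`, and `lambda_` are hypothetical and are not taken from `context_assembler.py`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, candidates, k=3, lambda_=0.7):
    """Greedy MMR over (text, vector) pairs; returns up to k texts.

    lambda_ near 1.0 favors relevance, near 0.0 favors diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            relevance = cosine(query_vec, item[1])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(item[1], s[1]) for s in selected), default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]

# Illustrative candidates pooled from several tiers (toy 2-d embeddings).
candidates = [
    ("turn about Redis", [1.0, 0.0]),
    ("near-duplicate turn about Redis", [1.0, 0.0]),
    ("turn about SQLite", [0.0, 1.0]),
]
picked = mmr_select([1.0, 1.0], candidates, k=2)
```

In this toy setup the near-duplicate is skipped in favor of the less redundant turn, which is the behavior that distinguishes MMR from plain top-k similarity.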

Memory Tiers

| Tier | Backend | Role | Latency |
|------|---------|------|---------|
| 1: Working Memory | Redis | Recent turns, immediate context | < 5 ms |
| 2: Episodic Memory | SQLite | Full session history, structured recall | < 20 ms |
| 3: Semantic Memory | ChromaDB | Vector similarity search across all sessions | < 50 ms |
| 4: Procedural Memory | SQLite + NetworkX | Recurring interaction patterns | < 30 ms |

Novel ML Algorithms

1. MemoryImportanceScorer

Scores each conversation turn with an EMA-weighted importance signal combining entity density, semantic novelty, and recency decay. This drives selective retention: only high-signal turns survive consolidation.

from agentmem_os.llm.importance_scorer import MemoryImportanceScorer
scorer = MemoryImportanceScorer(get_db)
score = scorer.score_turn(session_id, turn_content, role="user")
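
As a rough illustration of how the three signals might combine, the sketch below mixes entity density, novelty, and an exponential recency decay into a raw score and then smooths it with an exponential moving average (EMA). The weights, the half-life, `alpha`, and the function names are all assumptions made for this example; the actual formula inside `MemoryImportanceScorer` is not shown in this README.

```python
def raw_importance(entity_density, novelty, age_seconds,
                   half_life=3600.0, weights=(0.4, 0.4, 0.2)):
    """Weighted mix of the three signals, each expected in [0, 1].

    Recency halves every `half_life` seconds (hypothetical default).
    """
    recency = 0.5 ** (age_seconds / half_life)
    w_entity, w_novelty, w_recency = weights
    return w_entity * entity_density + w_novelty * novelty + w_recency * recency

def ema_update(previous, observation, alpha=0.3):
    """Exponential moving average: alpha controls how fast new evidence wins."""
    return alpha * observation + (1 - alpha) * previous

# A fresh, entity-rich turn pulls a neutral running score upward.
running = ema_update(0.5, raw_importance(0.8, 0.6, age_seconds=0))
```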

2. SleepConsolidationEngine

Runs offline compression using DBSCAN clustering over turn embeddings. Groups semantically similar turns into clusters, extracts representative summaries, and writes them to the Summary table, analogous to hippocampal replay during sleep.

from agentmem_os.llm.consolidation_engine import SleepConsolidationEngine
engine = SleepConsolidationEngine(get_db, summarizer, chroma, scorer, get_embedding)
engine.consolidate(session_id)
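
To make the clustering step concrete, here is a small dependency-free DBSCAN over cosine distance. It is a sketch of the technique only: the engine itself presumably relies on a library implementation, and `eps`, `min_samples`, and the function names are assumptions chosen for the example.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps=0.05, min_samples=2):
    """Return one cluster label per vector; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(vectors)

    def neighbors(i):
        # Includes i itself, so min_samples counts the point being tested.
        return [j for j in range(len(vectors))
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = 0
    for i in range(len(vectors)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels

# Toy turn embeddings: two tight groups and one outlier.
turn_vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99], [-1.0, -1.0]]
labels = dbscan(turn_vectors)
```

Each non-noise label would then correspond to one group of turns to summarize; turns labeled -1 stay unclustered.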

3. EntityKnowledgeGraph

Builds a persistent co-occurrence graph (NetworkX) of named entities extracted from conversation turns. Supports subgraph retrieval for world-model queries, updated incrementally as new turns arrive.

from agentmem_os.db.knowledge_graph import EntityKnowledgeGraph
kg = EntityKnowledgeGraph(get_db)
subgraph = kg.get_relevant_subgraph("Tell me about Sahith AgentMem", agent_id=None)
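
The idea of an incrementally updated co-occurrence graph can be sketched without any dependencies. In this example plain dictionaries stand in for NetworkX, entity extraction is assumed to have already happened, and the class and method names are hypothetical rather than the `EntityKnowledgeGraph` API.

```python
from collections import Counter
from itertools import combinations

class CooccurrenceGraph:
    """Undirected graph whose edge weights count shared turns."""

    def __init__(self):
        self.edge_weights = Counter()

    def add_turn(self, entities):
        # Every unordered pair of distinct entities in a turn gains one count.
        for a, b in combinations(sorted(set(entities)), 2):
            self.edge_weights[(a, b)] += 1

    def subgraph(self, query_entities):
        """Edges touching any query entity, heaviest first."""
        hits = set(query_entities)
        edges = [(a, b, w) for (a, b), w in self.edge_weights.items()
                 if a in hits or b in hits]
        return sorted(edges, key=lambda e: (-e[2], e[0], e[1]))

graph = CooccurrenceGraph()
graph.add_turn(["Sahith", "AgentMem OS", "Redis"])
graph.add_turn(["Sahith", "AgentMem OS"])
related = graph.subgraph(["Sahith"])
```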

4. ProceduralMemory

Mines recurring interaction patterns from session history (e.g., user always asks for code before explanation). Patterns are scored by frequency and recency, retrieved at inference time to pre-shape responses.

from agentmem_os.llm.procedural_memory import ProceduralMemory
pm = ProceduralMemory(get_db)
patterns = pm.get_relevant_patterns("explain research methodology", agent_id=None)
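
One simple way to score a pattern by both frequency and recency is to sum an exponential decay over its past occurrences, so that many recent sightings outrank a few old ones. This is an assumed scoring rule for illustration; the half-life and the names `pattern_score` and `rank_patterns` are not from `procedural_memory.py`.

```python
def pattern_score(occurrence_ages_days, half_life_days=7.0):
    """Each occurrence contributes 1.0 when fresh and halves every half-life."""
    return sum(0.5 ** (age / half_life_days) for age in occurrence_ages_days)

def rank_patterns(patterns, top_k=3):
    """patterns maps a description to the ages (in days) of its occurrences."""
    scored = [(pattern_score(ages), name) for name, ages in patterns.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

observed = {
    "asks for code before explanation": [0, 1, 2],
    "asks for citations": [30, 40],
}
ranking = rank_patterns(observed)
```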

Benchmark Metrics

Three novel metrics designed for memory-augmented LLM evaluation:

CRS: Context Relevance Score

Measures how relevant the assembled memory context is to the current query vs. a random baseline context.

CRS      = cosine_sim(embed(query), embed(assembled_context))
baseline = cosine_sim(embed(query), embed(random_context))
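
The comparison can be tried end to end with a toy embedding. The bag-of-words `embed` below is only a stand-in so the example runs without a model; the project is described as using real embeddings (for example via Ollama), and the sample strings are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase token counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine_sim(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

query = "what is the project deadline"
assembled_context = "the project deadline is in May"   # hypothetical memory hit
random_context = "Redis stores recent turns"           # hypothetical random draw

crs = cosine_sim(embed(query), embed(assembled_context))
baseline = cosine_sim(embed(query), embed(random_context))
```

A useful memory system should produce a CRS clearly above the random-context baseline, as it does in this toy case.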

TES: Token Efficiency Score

Measures compression quality: how much token reduction is achieved while preserving key entities.

TES = √(compression_ratio × entity_preservation_rate)

compression_ratio        = 1 - (tokens_after / tokens_before)
entity_preservation_rate = |entities_in_summary ∩ entities_in_original| / |entities_in_original|
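
A direct transcription of the formula into code shows how the two factors interact. The function name and the guard clauses for empty inputs are assumptions added for the example; the entity lists are invented.

```python
import math

def token_efficiency_score(tokens_before, tokens_after,
                           entities_original, entities_summary):
    """Geometric mean of compression ratio and entity preservation rate."""
    if tokens_before <= 0:
        return 0.0
    compression_ratio = 1 - tokens_after / tokens_before
    original = set(entities_original)
    if not original:
        return 0.0
    preservation_rate = len(original & set(entities_summary)) / len(original)
    # A summary longer than its source gets no credit for compression.
    return math.sqrt(max(compression_ratio, 0.0) * preservation_rate)

tes = token_efficiency_score(
    tokens_before=1000,
    tokens_after=250,
    entities_original=["Sahith", "Redis", "SQLite", "ChromaDB"],
    entities_summary=["Sahith", "Redis", "SQLite"],
)
```

Because the score is a geometric mean, a summary that compresses heavily but drops every entity scores zero, as does one that keeps every entity but does not shrink the text.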

LCS: Long-Horizon Continuity Score

Measures whether the agent can answer factual questions about things said K turns ago. This is the core capability that distinguishes a memory system from naive context truncation.

LCS = (facts correctly recalled with AgentMem OS) / (total facts seeded)
baseline = (facts recalled with recent-only context)
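
The ratio can be computed with a simple substring check, shown here as a sketch. How the evaluator in `eval_harness.py` actually decides that a fact was "correctly recalled" is not specified in this README, so the case-insensitive match, the function name, and the sample facts and answers are all assumptions.

```python
def continuity_score(seeded_facts, answers):
    """Fraction of seeded facts whose expected value appears in the answer.

    seeded_facts: fact key -> expected value
    answers:      fact key -> agent's reply to the probe for that fact
    """
    if not seeded_facts:
        return 0.0
    recalled = sum(
        1 for key, expected in seeded_facts.items()
        if expected.lower() in answers.get(key, "").lower()
    )
    return recalled / len(seeded_facts)

seeded = {"name": "Sahith", "deadline": "May"}          # hypothetical facts
with_memory = {"name": "Your name is Sahith.",
               "deadline": "The deadline is in May."}
recent_only = {"name": "I don't know.",
               "deadline": "I am not sure."}

lcs = continuity_score(seeded, with_memory)
lcs_baseline = continuity_score(seeded, recent_only)
```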

Project Structure

agentmem_os/
├── agents/                  # Multi-agent memory federation
│   ├── memory_federation.py
│   ├── namespace_manager.py
│   └── trust_network.py
├── api/                     # FastAPI REST interface
│   └── app.py
├── benchmarks/
│   └── eval_harness.py      # CRS / TES / LCS evaluators
├── cache/
│   └── redis_client.py      # Tier 1: Redis working memory
├── cli/
│   └── main.py              # Typer CLI
├── db/
│   ├── chroma_client.py     # Tier 3: ChromaDB semantic store
│   ├── engine.py            # SQLAlchemy engine + session factory
│   ├── knowledge_graph.py   # EntityKnowledgeGraph (NetworkX)
│   └── models.py            # Turn, Session, Summary, CostLog, Pattern
├── llm/
│   ├── adapters.py          # LiteLLM universal adapter + prompt caching
│   ├── consolidation_engine.py   # SleepConsolidationEngine (DBSCAN)
│   ├── context_assembler.py      # MMR retrieval across all tiers
│   ├── importance_scorer.py      # MemoryImportanceScorer (EMA)
│   ├── procedural_memory.py      # ProceduralMemory (pattern mining)
│   ├── summarizer.py             # Extractive + LLM-based summarizer
│   └── token_counter.py          # tiktoken wrapper
├── storage/
│   └── store.py             # ConversationStore (coordinates all tiers)
├── tests/
│   └── test_e2e_claude.py   # Full end-to-end benchmark test
├── config.yaml
├── requirements.txt
└── .env.example

Installation

Requirements: Python 3.11+, Redis running locally, and optionally Ollama for local embeddings.

# Clone
git clone https://github.com/Sahith59/AgentMem-OS.git
cd AgentMem-OS

# Virtual environment
python3 -m venv venv
source venv/bin/activate      # Windows: venv\Scripts\activate

# Dependencies
pip install -r requirements.txt

# Environment
cp .env.example .env
# Edit .env and set ANTHROPIC_API_KEY (or GROQ_API_KEY for free tier)

# Database
python -c "from agentmem_os.db.engine import init_db; init_db()"

Running the End-to-End Benchmark

The E2E test runs a full 25-turn conversation with Claude, then evaluates all three benchmark metrics:

# Start Redis (required for Tier 1)
redis-server &

# Optional: pull the Ollama embedding model for best CRS scores (768-dim embeddings)
ollama pull nomic-embed-text

# Run
python tests/test_e2e_claude.py

What the test does:

  • Turns 1–5: Seeds 5 grounding facts (name, project, tiers, algorithms, deadline)
  • Turns 6–15: Work turns that push the grounding turns beyond the context window
  • Turns 16–25: Probe turns with no hints; the agent must retrieve from memory
  • Step 6: Forces sleep consolidation to generate summaries for TES
  • Step 7: Verifies Entity Knowledge Graph population
  • Step 8: Mines procedural patterns
  • Step 9: Measures token cost and prompt caching savings
  • Step 10: Evaluates CRS / TES / LCS against baselines

Example output:

╔══════════════════════════════════════════════════╗
║   AgentMem OS - Benchmark Report                 ║
╠══════════════════════════════════════════════════╣
║ Metric      :  Ours   Baseline      Δ            ║
║ CRS         : 0.6821    0.4103  +0.2718  ↑       ║
║ TES         : 0.5940    0.3211  +0.2729  ↑       ║
║ LCS         : 1.0000    0.0000  +1.0000  ↑       ║
╚══════════════════════════════════════════════════╝

Configuration

Edit config.yaml to change models and storage paths:

models:
  default_model: "anthropic/claude-haiku-4-5-20251001"   # cheapest Claude
  fallback_model: "ollama/llama3.1"                       # local fallback
  compression_threshold: 0.70                             # trigger consolidation at 70% context

storage:
  base_path: "~/.agentmem_os/"
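
The `compression_threshold` setting reads as a fill ratio of the model's context window. A minimal sketch of that check follows; the function name and the 200,000-token window used in the example are assumptions, not values taken from the project.

```python
def should_consolidate(tokens_in_context, context_window, threshold=0.70):
    """True once the context is at least `threshold` full."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_in_context / context_window >= threshold

# With a hypothetical 200,000-token window, consolidation would start
# somewhere between these two fill levels.
below = should_consolidate(100_000, 200_000)
above = should_consolidate(150_000, 200_000)
```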

Supported model strings (LiteLLM format):

| Model | String | Use case |
|-------|--------|----------|
| Claude Haiku 4.5 | anthropic/claude-haiku-4-5-20251001 | E2E testing (cheap) |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | Best benchmark quality |
| Llama 3.1 (local) | ollama/llama3.1 | Free, no API key needed |
| Groq Llama | groq/llama-3.1-8b-instant | Free tier fallback |

Cost Efficiency

A key claim of the paper is that AgentMem OS reduces API token costs through prompt caching and aggressive context compression. The E2E test measures this automatically:

Input tokens    : 45,230
Cached tokens   : 38,190   (84.4% of input, charged at 10% rate)
Est. session cost: $0.0089
Cache savings   : 75.8% reduction in effective input token cost

Prompt caching works because AgentMem OS always places the system context (assembled memory) at the beginning of the message, making it eligible for Anthropic's cache prefix matching.
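
The arithmetic behind the savings figure can be approximated with a simplified model in which cached tokens are billed at 10% of the normal input rate and everything else at full rate. This sketch ignores cache-write surcharges and any other pricing details, so it lands close to, but not exactly on, the number in the sample report; the function name is hypothetical.

```python
def effective_input_cost_ratio(input_tokens, cached_tokens, cached_rate=0.10):
    """Billed input cost relative to paying full price for every token."""
    if not 0 <= cached_tokens <= input_tokens or input_tokens == 0:
        raise ValueError("expected 0 <= cached_tokens <= input_tokens, input_tokens > 0")
    uncached = input_tokens - cached_tokens
    billed_equivalent = uncached + cached_rate * cached_tokens
    return billed_equivalent / input_tokens

# Token counts from the sample report above.
cached_share = 38_190 / 45_230
savings = 1 - effective_input_cost_ratio(45_230, 38_190)
```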


Research Context

This project is being developed as part of PhD research on persistent memory architectures for LLM agents. The NeurIPS 2026 workshop paper will include:

  • Formal definitions of CRS, TES, and LCS metrics
  • Ablation study: individual contribution of each memory tier
  • Comparison against baselines: full history replay, sliding window, RAG-only
  • Cost-efficiency analysis across session lengths

Environment Variables

# .env
ANTHROPIC_API_KEY=sk-ant-...      # Required for Claude models
GROQ_API_KEY=gsk_...              # Optional: free fallback
OLLAMA_BASE_URL=http://localhost:11434   # Optional: local embeddings
REDIS_URL=redis://localhost:6379  # Default Redis location
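
A sketch of how these variables might be read at startup is shown below. Treating the listed Redis and Ollama URLs as fallbacks when a variable is unset is an assumption made for the example, as is the `load_settings` helper itself; the project's real configuration loader is not shown in this README.

```python
import os

def load_settings(env=None):
    """Collect connection settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "anthropic_api_key": env.get("ANTHROPIC_API_KEY"),   # None if unset
        "groq_api_key": env.get("GROQ_API_KEY"),             # None if unset
        # Assumed fallbacks: the local URLs shown in the .env example.
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "redis_url": env.get("REDIS_URL", "redis://localhost:6379"),
    }

settings = load_settings({"REDIS_URL": "redis://cache:6380"})
```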

License

MIT License. See the LICENSE file.

