From 9b7b455cbe2565cd402a9eb0056b85486e434b96 Mon Sep 17 00:00:00 2001 From: Auri Wren Date: Mon, 9 Feb 2026 19:13:43 +0000 Subject: [PATCH 1/4] Add documents table, update hooks for OpenClaw compatibility - schema.sql: Add documents table for workspace file/knowledge registry - Tracks path, title, doc_type, description, tags - Unique constraint on path, GIN index on tags - hooks/session-init: Updated to query PostgreSQL for recent events/decisions/lessons and inject as SESSION_CONTEXT.md bootstrap file (OpenClaw format) - hooks/memory-extract: Rewritten for OpenClaw with LLM-based extraction - Heuristic gate to avoid unnecessary API calls - 5-minute cooldown per session - Extracts facts, events, decisions, lessons via Claude Haiku - Stores via memory-db CLI - README.md: Document the documents table and session-init hook --- README.md | 47 +++- hooks/memory-extract/HOOK.md | 30 ++- hooks/memory-extract/handler.ts | 406 ++++++++++++++++++++++++++------ hooks/session-init/HOOK.md | 34 +-- hooks/session-init/handler.ts | 199 +++++++++++----- schema.sql | 86 +++++++ 6 files changed, 640 insertions(+), 162 deletions(-) diff --git a/README.md b/README.md index d267af7..75c715f 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ The schema (`schema.sql`) includes tables for: - **preferences** - User/system preferences - **sops** - Standard Operating Procedures for various tasks and workflows - **agents** - Registry of AI agent instances for delegation +- **documents** - Registry of workspace files and knowledge documents for cross-session discovery ### Access Control Architecture @@ -556,6 +557,39 @@ SELECT title, theme, quality_score, FROM artwork ORDER BY created_at DESC LIMIT 5; ``` +### Documents Table (Document Registry) + +The `documents` table tracks workspace files and knowledge documents: + +| Column | Type | Purpose | +|--------|------|---------| +| `id` | int | Primary key | +| `path` | text | Unique file path (relative to workspace) | +| `title` | 
varchar(255) | Human-readable title | +| `doc_type` | varchar(50) | Category: config, memory, tool, hook, skill, etc. | +| `description` | text | What this document contains/does | +| `tags` | text[] | Searchable tags | + +**Use Cases:** +- Discover what files exist without scanning the filesystem +- Find relevant documents by type or keyword across sessions +- Track which workspace files are registered vs unregistered + +**Example:** +```sql +-- Register a document +INSERT INTO documents (path, title, doc_type, description) +VALUES ('memory/research-craft-cli.md', 'Craft CLI Research', 'memory', 'Research notes on Craft.do API integration') +ON CONFLICT (path) DO UPDATE SET title=EXCLUDED.title, updated_at=now(); + +-- Find all memory documents +SELECT title, path FROM documents WHERE doc_type = 'memory' ORDER BY title; + +-- Search documents +SELECT title, path, description FROM documents +WHERE title ILIKE '%research%' OR description ILIKE '%research%'; +``` + ### Setup ```bash @@ -618,7 +652,18 @@ git add schema.sql && git commit -m "Update schema: [description]" git push ``` -## Clawdbot Hook (Automatic Extraction) +## Hooks + +### Session Init Hook (`hooks/session-init/`) + +Automatically injects recent activity context from PostgreSQL into the agent's bootstrap. On `agent:bootstrap`, queries events (48h), decisions (7d), and lessons (7d), then formats them as a `SESSION_CONTEXT.md` bootstrap file. Falls back silently if PostgreSQL is unavailable. + +```bash +cp -r hooks/session-init ~/clawd/hooks/ +openclaw hooks enable session-init +``` + +### Memory Extract Hook (`hooks/memory-extract/`) The `hooks/memory-extract/` directory contains a Clawdbot hook that automatically extracts memories from incoming messages. 
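Reviewer's note on the per-session cooldown named in the commit message ("5-minute cooldown per session"): a minimal sketch of how such a gate could be isolated and tested on its own. The names `CooldownState`, `tryAcquire`, and `PRUNE_AFTER_MS` are hypothetical and do not appear in the patch; only the 5-minute window and the one-day pruning of stale entries are taken from it.

```typescript
// Hypothetical helper, not part of the patch: a pure cooldown gate that takes
// the clock as a parameter so it can be exercised without touching the filesystem.
const COOLDOWN_MS = 5 * 60 * 1000;          // window taken from the patch
const PRUNE_AFTER_MS = 24 * 60 * 60 * 1000; // one-day pruning, as in the patch

type CooldownState = { lastExtractTime: Record<string, number> };

// Returns true when the session may run an extraction now, and records the attempt.
function tryAcquire(state: CooldownState, sessionId: string, now: number): boolean {
  const last = state.lastExtractTime[sessionId] ?? 0;
  if (now - last < COOLDOWN_MS) return false;

  state.lastExtractTime[sessionId] = now;

  // Drop entries older than a day so the persisted state does not grow unbounded.
  for (const [key, ts] of Object.entries(state.lastExtractTime)) {
    if (now - ts > PRUNE_AFTER_MS) delete state.lastExtractTime[key];
  }
  return true;
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside the function, is a design choice for this sketch only; it keeps the gate deterministic under test.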
diff --git a/hooks/memory-extract/HOOK.md b/hooks/memory-extract/HOOK.md index 2fd21fd..63e470e 100644 --- a/hooks/memory-extract/HOOK.md +++ b/hooks/memory-extract/HOOK.md @@ -1,9 +1,33 @@ --- name: memory-extract -description: "Extracts memories from incoming messages and stores in database" -metadata: {"clawdbot":{"emoji":"🧠","events":["message:received"]}} +description: "Extract and store memories from assistant responses after each turn" +metadata: + { + "openclaw": + { + "emoji": "🧠", + "events": ["message:sent"], + "requires": { "bins": ["node"] }, + }, + } --- # Memory Extraction Hook -Automatically extracts entities, facts, opinions, and relationships from incoming messages and stores them in the PostgreSQL memory database. +Automatically extracts facts, events, decisions, and lessons from conversation turns and stores them in the PostgreSQL memory database via `memory-db` CLI. + +## How It Works + +When `message:sent` fires: + +1. Reads the last few messages from the session transcript +2. Applies lightweight heuristics to detect extractable content (skips heartbeats, short messages, routine responses) +3. If heuristics pass, calls the Anthropic API (Claude Haiku) to parse structured memory entries +4. 
Shells out to `memory-db` CLI to store extracted facts, events, and lessons + +## Safeguards + +- **Cooldown:** Minimum 5 minutes between extractions per session +- **Heuristic gate:** LLM is only called when keywords/patterns suggest extractable content +- **Graceful failure:** All errors are caught and logged; never blocks message delivery +- **Skips:** Heartbeats, commands, short messages, isolated/spawn sessions diff --git a/hooks/memory-extract/handler.ts b/hooks/memory-extract/handler.ts index ab0af79..db1e9c3 100644 --- a/hooks/memory-extract/handler.ts +++ b/hooks/memory-extract/handler.ts @@ -1,92 +1,342 @@ -import { exec } from "child_process"; -import { appendFileSync, existsSync, readFileSync, writeFileSync } from "fs"; - -const ACTIVITY_STATE = "/home/nova/clawd/logs/activity-state.json"; -const ACTIVITY_LOG = "/home/nova/clawd/logs/session-activity.jsonl"; - -interface ActivityState { - activeMinutesToday: number; - lastActiveAt: number | null; - todayDate: string | null; - userMessages: number; - heartbeats: number; +/** + * memory-extract: Extract and store memories from conversation turns. + * + * On message:sent (assistant response): + * 1. Read recent transcript lines + * 2. Heuristic check — skip heartbeats, short messages, routine responses + * 3. If extractable content detected, call Claude Haiku to parse structured entries + * 4. Shell out to memory-db CLI to store facts, events, lessons + * + * Never blocks message delivery. All errors are swallowed and logged. 
+ */ + +import { execSync } from "child_process"; +import { readFileSync, existsSync, readdirSync, statSync, appendFileSync, writeFileSync } from "fs"; +import { join } from "path"; + +const MEMORY_DB = "/home/openclaw/.openclaw/workspace/tools/memory-db"; +const SESSIONS_DIR = "/home/openclaw/.openclaw/agents/main/sessions"; +const LOG_FILE = "/home/openclaw/.openclaw/workspace/db/memory-extract.log"; +const STATE_FILE = "/home/openclaw/.openclaw/workspace/db/memory-extract-state.json"; +const COOLDOWN_MS = 5 * 60 * 1000; // 5 minutes between extractions +const MIN_MESSAGE_LENGTH = 80; + +// Heuristic keywords that suggest extractable content +const EXTRACT_SIGNALS = [ + // People / entities + /\b(?:name is|called|known as|nickname)\b/i, + /\b(?:lives in|moved to|works at|job is|employed)\b/i, + /\b(?:birthday|born on|anniversary)\b/i, + /\b(?:allergic|allergy|intolerant)\b/i, + /\b(?:prefer|favorite|loves|hates|dislikes)\b/i, + // Events + /\b(?:happened|occurred|took place|event|trip|visited|traveled)\b/i, + /\b(?:meeting|appointment|scheduled|booked)\b/i, + /\b(?:bought|purchased|ordered|received|shipped)\b/i, + // Decisions + /\b(?:decided|decision|going to|plan to|will switch|chose|picked)\b/i, + /\b(?:signed up|subscribed|cancelled|enrolled)\b/i, + // Lessons / insights + /\b(?:learned|lesson|realize|figured out|turns out|note to self)\b/i, + /\b(?:important to remember|don't forget|keep in mind)\b/i, + // Projects / goals + /\b(?:project|goal|milestone|deadline|launch|shipped|deployed)\b/i, +]; + +const SKIP_PATTERNS = [ + /HEARTBEAT/i, + /heartbeat poll/i, + /DASHBOARD UPDATE/i, + /^System: \[/, + /^HEARTBEAT_OK$/, + /^\//, // commands +]; + +interface ExtractState { + lastExtractTime: Record<string, number>; // sessionId -> timestamp +} + +function log(msg: string) { + try { + const ts = new Date().toISOString(); + appendFileSync(LOG_FILE, `${ts} ${msg}\n`); + } catch {} +} + +function loadState(): ExtractState { + try { + if (existsSync(STATE_FILE)) { + 
return 
JSON.parse(readFileSync(STATE_FILE, "utf-8")); + } + } catch {} + return { lastExtractTime: {} }; +} + +function saveState(state: ExtractState) { + try { + writeFileSync(STATE_FILE, JSON.stringify(state, null, 2)); + } catch {} +} + +function getRecentTranscript(sessionId?: string, lineCount = 20): string[] { + try { + let sessionFile: string | null = null; + + if (sessionId) { + const candidate = join(SESSIONS_DIR, `${sessionId}.jsonl`); + if (existsSync(candidate)) sessionFile = candidate; + } + + if (!sessionFile) { + const files = readdirSync(SESSIONS_DIR) + .filter(f => f.endsWith(".jsonl")) + .map(f => ({ name: f, mtime: statSync(join(SESSIONS_DIR, f)).mtimeMs })) + .sort((a, b) => b.mtime - a.mtime); + if (files.length > 0) sessionFile = join(SESSIONS_DIR, files[0].name); + } + + if (!sessionFile) return []; + + const content = readFileSync(sessionFile, "utf-8"); + const lines = content.trim().split("\n").slice(-lineCount); + const messages: string[] = []; + + for (const line of lines) { + try { + const entry = JSON.parse(line); + const msg = entry.message || entry; + const role = msg.role; + const rawContent = msg.content; + if (!role || !rawContent) continue; + + const text = typeof rawContent === "string" + ? rawContent + : Array.isArray(rawContent) + ? 
rawContent.filter((b: any) => b.type === "text").map((b: any) => b.text).join(" ") + : null; + + if (text) messages.push(`${role}: ${text}`); + } catch { continue; } + } + + return messages; + } catch { + return []; + } +} + +function shouldSkip(messages: string[]): boolean { + if (messages.length < 2) return true; + + const lastAssistant = [...messages].reverse().find(m => m.startsWith("assistant:")); + const lastUser = [...messages].reverse().find(m => m.startsWith("user:")); + + if (!lastAssistant || !lastUser) return true; + + // Skip short responses + if (lastAssistant.length < MIN_MESSAGE_LENGTH && lastUser.length < MIN_MESSAGE_LENGTH) return true; + + // Skip heartbeats and commands + for (const pattern of SKIP_PATTERNS) { + if (pattern.test(lastUser) || pattern.test(lastAssistant)) return true; + } + + return false; } -function updateActivityState(isUserMessage: boolean) { - let state: ActivityState = { - activeMinutesToday: 0, - lastActiveAt: null, - todayDate: null, - userMessages: 0, - heartbeats: 0 - }; - +function hasExtractableContent(messages: string[]): boolean { + const combined = messages.slice(-6).join("\n"); + return EXTRACT_SIGNALS.some(pattern => pattern.test(combined)); +} + +interface MemoryEntry { + type: "fact" | "event" | "lesson" | "decision"; + entity?: string; + key?: string; + value?: string; + title?: string; + description?: string; + lesson?: string; + context?: string; +} + +async function extractWithLLM(messages: string[]): Promise<MemoryEntry[]> { + const apiKey = process.env.ANTHROPIC_API_KEY; + if (!apiKey) { + log("no ANTHROPIC_API_KEY"); + return []; + } + + const transcript = messages.slice(-8).join("\n"); + + const systemPrompt = `You extract structured memories from conversations. Output ONLY a JSON array of memory entries. Each entry has a "type" field: + +- fact: { "type": "fact", "entity": "PersonName", "key": "fact_key", "value": "fact value" } + For facts about people, places, things. 
Use snake_case keys like "lives_in", "birthday", "job_title", "favorite_food" +- event: { "type": "event", "title": "Short title", "description": "What happened" } + For things that happened or were scheduled +- lesson: { "type": "lesson", "lesson": "The insight", "context": "Where it came from" } + For insights, lessons learned, things to remember +- decision: { "type": "decision", "title": "What was decided", "description": "Details and rationale" } + +Rules: +- Only extract NEW information explicitly stated in the conversation +- Skip greetings, meta-discussion, status updates, and routine exchanges +- Skip anything the assistant already knew (don't re-extract existing knowledge) +- If nothing is worth extracting, return an empty array [] +- Entity names should be proper names (e.g., "Eiwe" not "the user") +- Be conservative — only extract clear, concrete facts`; + try { - if (existsSync(ACTIVITY_STATE)) { - state = JSON.parse(readFileSync(ACTIVITY_STATE, 'utf8')); + const resp = await fetch("https://api.anthropic.com/v1/messages", { + method: "POST", + headers: { + "x-api-key": apiKey, + "content-type": "application/json", + "anthropic-version": "2023-06-01", + }, + body: JSON.stringify({ + model: "claude-haiku-4-20250414", + max_tokens: 1024, + system: systemPrompt, + messages: [{ role: "user", content: `Extract memories from this conversation:\n\n${transcript}` }], + }), + }); + + if (!resp.ok) { + log(`LLM error: ${resp.status}`); + return []; } - } catch (e) {} - - const today = new Date().toISOString().split('T')[0]; - const now = Date.now(); - - // Reset if new day - if (state.todayDate !== today) { - state = { activeMinutesToday: 0, lastActiveAt: null, todayDate: today, userMessages: 0, heartbeats: 0 }; + + const data = await resp.json(); + const text = data.content?.[0]?.text || "[]"; + + // Parse JSON from response (handle markdown code blocks) + const jsonMatch = text.match(/\[[\s\S]*\]/); + if (!jsonMatch) return []; + + const entries: MemoryEntry[] 
= JSON.parse(jsonMatch[0]); + return Array.isArray(entries) ? entries : []; + } catch (err) { + log(`LLM parse error: ${err}`); + return []; } - - if (isUserMessage) { - state.userMessages++; - if (state.lastActiveAt) { - const gap = (now - state.lastActiveAt) / 60000; - if (gap <= 5) state.activeMinutesToday += gap; +} + +function storeMemory(entry: MemoryEntry) { + try { + const esc = (s: string) => s.replace(/'/g, "'\\''"); + + switch (entry.type) { + case "fact": + if (entry.entity && entry.key && entry.value) { + execSync(`${MEMORY_DB} add-fact '${esc(entry.entity)}' '${esc(entry.key)}' '${esc(entry.value)}'`, { + timeout: 5000, + stdio: "pipe", + }); + log(`stored fact: ${entry.entity}.${entry.key} = ${entry.value}`); + } + break; + + case "event": + if (entry.title) { + const today = new Date().toISOString().split("T")[0]; + const desc = entry.description ? `--desc '${esc(entry.description)}'` : ""; + execSync(`${MEMORY_DB} log-event '${today}' '${esc(entry.title)}' ${desc}`, { + timeout: 5000, + stdio: "pipe", + }); + log(`stored event: ${entry.title}`); + } + break; + + case "lesson": + if (entry.lesson) { + const ctx = entry.context ? `'${esc(entry.context)}'` : ""; + execSync(`${MEMORY_DB} add-lesson '${esc(entry.lesson)}' ${ctx}`, { + timeout: 5000, + stdio: "pipe", + }); + log(`stored lesson: ${entry.lesson.slice(0, 60)}`); + } + break; + + case "decision": + if (entry.title) { + // Store decisions as events with a "decision:" prefix + const today = new Date().toISOString().split("T")[0]; + const desc = entry.description ? 
`--desc '${esc(entry.description)}'` : ""; + execSync(`${MEMORY_DB} log-event '${today}' 'Decision: ${esc(entry.title)}' ${desc}`, { + timeout: 5000, + stdio: "pipe", + }); + log(`stored decision: ${entry.title}`); + } + break; } - state.lastActiveAt = now; - } else { - state.heartbeats++; + } catch (err) { + log(`store error (${entry.type}): ${err}`); } - - writeFileSync(ACTIVITY_STATE, JSON.stringify(state, null, 2)); - appendFileSync(ACTIVITY_LOG, JSON.stringify({ timestamp: new Date().toISOString(), isUserMessage, activeMinutes: state.activeMinutesToday }) + '\n'); } -const handler = async (event) => { - const LOG = "/home/nova/clawd/logs/memory-extract-hook.log"; - const ts = new Date().toISOString(); - - appendFileSync(LOG, `${ts} | Event: ${event.type}:${event.action}\n`); - - // Track activity for cost/hour calculations - if (event.type === "message") { +const handler = async (event: any) => { + try { + // Fire on message:sent (after assistant responds) + if (event.type !== "message" || event.action !== "sent") return; + const ctx = event.context ?? {}; - const rawBody = ctx.rawBody ?? ctx.message ?? 
""; - const isHeartbeat = rawBody.includes("HEARTBEAT") || rawBody.includes("DASHBOARD UPDATE") || rawBody.startsWith("System: ["); - updateActivityState(!isHeartbeat); + const sessionKey = ctx.sessionKey || event.sessionKey || ""; + + // Skip isolated/spawn sessions and subagents + if (sessionKey.includes("isolated") || sessionKey.includes("spawn") || sessionKey.includes("subagent")) return; + + const sessionId = ctx.sessionId as string | undefined; + + // Cooldown check + const state = loadState(); + const sid = sessionId || sessionKey || "default"; + const now = Date.now(); + const lastExtract = state.lastExtractTime[sid] || 0; + if (now - lastExtract < COOLDOWN_MS) return; + + // Get recent transcript + const messages = getRecentTranscript(sessionId); + if (shouldSkip(messages)) return; + + // Heuristic gate + if (!hasExtractableContent(messages)) { + return; + } + + log(`extracting from session ${sid}`); + + // Update cooldown before async work + state.lastExtractTime[sid] = now; + // Prune old entries (older than 1 day) + for (const key of Object.keys(state.lastExtractTime)) { + if (now - state.lastExtractTime[key] > 86400000) delete state.lastExtractTime[key]; + } + saveState(state); + + // Call LLM for structured extraction + const entries = await extractWithLLM(messages); + if (entries.length === 0) { + log("no entries extracted"); + return; + } + + log(`extracted ${entries.length} entries`); + + // Store each entry + for (const entry of entries) { + storeMemory(entry); + } + + log(`extraction complete: ${entries.length} entries stored`); + } catch (err) { + log(`handler error: ${err}`); + // Never throw — don't block message delivery } - - if (event.type !== "message" || event.action !== "received") return; - - const ctx = event.context ?? {}; - const rawBody = ctx.rawBody ?? ctx.message ?? 
""; - if (!rawBody || rawBody.trim().length < 10) return; - - // Skip commands - if (rawBody.startsWith("/")) return; - - // Get sender info for attribution - const senderName = ctx.senderName ?? "unknown"; - const senderId = ctx.senderId ?? ""; // Phone number or UUID for unique matching - const isGroup = ctx.isGroup ?? false; - - appendFileSync(LOG, `${ts} | From: ${senderName} (${senderId}) (group: ${isGroup}) | Message: ${rawBody.substring(0, 80)}...\n`); - - // Run extraction with attribution env vars (include senderId for unique matching) - const escaped = rawBody.replace(/'/g, "'\\''"); - const envVars = `SENDER_NAME='${senderName}' SENDER_ID='${senderId}' IS_GROUP='${isGroup}'`; - - exec(`${envVars} /home/nova/clawd/scripts/process-input.sh '${escaped}'`, (err) => { - appendFileSync(LOG, `${ts} | ${err ? 'Error: ' + err.message : 'Extraction complete for ' + senderName}\n`); - }); }; export default handler; diff --git a/hooks/session-init/HOOK.md b/hooks/session-init/HOOK.md index c8067dc..bb53013 100644 --- a/hooks/session-init/HOOK.md +++ b/hooks/session-init/HOOK.md @@ -1,38 +1,30 @@ --- name: session-init -description: "Generate privacy-filtered context when session starts" +description: "Inject recent activity context from PostgreSQL into agent bootstrap" metadata: { - "clawdbot": + "openclaw": { - "emoji": "🔐", - "events": ["message:received"], + "emoji": "📋", + "events": ["agent:bootstrap"], + "requires": { "bins": ["psql", "node"] }, }, } --- # Session Init Hook -Generates privacy-filtered context based on session participants. +Automatically queries PostgreSQL for recent activity and injects a summary into the agent's bootstrap context, replacing daily log file loading. ## What It Does -When a message is received: -1. Checks if session context file is stale (>5 min old or participants changed) -2. Resolves participant phone numbers to entity IDs -3. Queries entity_facts with privacy filtering -4. 
Writes filtered context to `~/clawd/SESSION_CONTEXT.md` +When `agent:bootstrap` fires: -## Privacy Filtering +1. Queries `events` table for last 48 hours of activity +2. Queries `decisions` table for last 7 days +3. Queries `lessons` table for last 7 days (non-superseded) +4. Formats results and injects as `SESSION_CONTEXT.md` bootstrap file -Only includes facts where: -- `visibility = 'public'`, OR -- `source_entity_id` matches a participant (their own data), OR -- `privacy_scope` includes a participant (explicitly shared) +## Fallback -## Output - -`SESSION_CONTEXT.md` contains: -- Participant names and entity IDs -- Privacy-filtered facts from database -- Timestamp for staleness checking +If PostgreSQL is unavailable or no recent data exists, the hook silently skips. diff --git a/hooks/session-init/handler.ts b/hooks/session-init/handler.ts index 9f9fc63..f44a699 100644 --- a/hooks/session-init/handler.ts +++ b/hooks/session-init/handler.ts @@ -1,64 +1,145 @@ -import { execSync, exec } from "child_process"; -import { existsSync, statSync, readFileSync, writeFileSync } from "fs"; -import { join } from "path"; - -const CONTEXT_FILE = join(process.env.HOME || "", "clawd", "SESSION_CONTEXT.md"); -const SCRIPTS_DIR = join(process.env.HOME || "", "clawd", "scripts"); -const STALE_MINUTES = 5; - -// Track current session participants -let currentParticipantHash = ""; - -const handler = async (event) => { - if (event.type !== "message" || event.action !== "received") return; - - const ctx = event.context ?? {}; - const senderId = ctx.senderId ?? ""; - const isGroup = ctx.isGroup ?? false; - - // Skip if no sender ID - if (!senderId) return; - - // For now, just use the sender. In group chats, we'd need all participant IDs. 
- // TODO: Get full participant list for groups - const participants = [senderId]; - const participantHash = participants.sort().join(","); - - // Check if context needs refresh - let needsRefresh = false; - - if (!existsSync(CONTEXT_FILE)) { - needsRefresh = true; - } else { - // Check staleness - const stats = statSync(CONTEXT_FILE); - const ageMinutes = (Date.now() - stats.mtimeMs) / 1000 / 60; - if (ageMinutes > STALE_MINUTES) { - needsRefresh = true; - } - - // Check if participants changed - if (participantHash !== currentParticipantHash) { - needsRefresh = true; - } +/** + * session-init: Inject recent activity context into agent bootstrap. + * + * On agent:bootstrap, queries PostgreSQL for: + * - Events from the last 48 hours + * - Decisions from the last 7 days + * - Lessons from the last 7 days + * Formats and injects as SESSION_CONTEXT.md bootstrap file. + */ + +import { execSync } from "child_process"; +import { appendFileSync } from "fs"; + +const LOG_FILE = "/home/openclaw/.openclaw/workspace/db/session-init.log"; + +function log(msg: string) { + try { + appendFileSync(LOG_FILE, `${new Date().toISOString()} ${msg}\n`); + } catch {} +} + +function pgAvailable(): boolean { + try { + execSync("pg_isready -q", { timeout: 2000 }); + return true; + } catch { + return false; } - - if (!needsRefresh) return; - - // Update participant hash - currentParticipantHash = participantHash; - - // Generate new context (async to not block message processing) - const scriptPath = join(SCRIPTS_DIR, "generate-session-context.sh"); - const args = participants.map(p => `"${p}"`).join(" "); - - exec(`"${scriptPath}" "${CONTEXT_FILE}" ${args}`, (err) => { - if (err) { - console.error(`[session-init] Error generating context: ${err.message}`); - } else { - console.log(`[session-init] Context refreshed for participants: ${participantHash}`); - } +} + +function pgQuery(sql: string): any[] { + const script = ` + const { Pool } = require('pg'); + const pool = new Pool({ database: 
'auri_memory', host: '/var/run/postgresql' }); + (async () => { + const { rows } = await pool.query(${JSON.stringify(sql)}); + console.log(JSON.stringify(rows)); + await pool.end(); + })().catch(() => { process.stdout.write('[]'); process.exit(0); }); + `; + try { + const result = execSync(`node -e ${JSON.stringify(script)}`, { + encoding: "utf-8", + timeout: 8000, + cwd: "/home/openclaw/.openclaw/workspace", + }).trim(); + return JSON.parse(result || "[]"); + } catch { + return []; + } +} + +function formatEvents(rows: any[]): string { + if (!rows.length) return ""; + let out = "## Recent Events (48h)\n\n"; + for (const r of rows) { + const date = r.event_date?.slice(0, 10) || "?"; + const time = r.event_time?.slice(0, 5) || ""; + const desc = r.description ? `: ${r.description.slice(0, 200)}` : ""; + out += `- **${date}${time ? " " + time : ""}** ${r.title}${desc}\n`; + } + return out + "\n"; +} + +function formatDecisions(rows: any[]): string { + if (!rows.length) return ""; + let out = "## Recent Decisions (7d)\n\n"; + for (const r of rows) { + const date = r.decided_at?.slice(0, 10) || "?"; + const ctx = r.context ? ` (${r.context.slice(0, 120)})` : ""; + out += `- **${date}** ${r.decision}${ctx}\n`; + } + return out + "\n"; +} + +function formatLessons(rows: any[]): string { + if (!rows.length) return ""; + let out = "## Active Lessons (7d)\n\n"; + for (const r of rows) { + const ctx = r.context ? 
` — ${r.context.slice(0, 120)}` : ""; + out += `- ${r.lesson}${ctx}\n`; + } + return out + "\n"; +} + +const handler = async (event: any) => { + if (event.type !== "agent" || event.action !== "bootstrap") return; + + const ctx = event.context; + if (!ctx?.bootstrapFiles) return; + + const sessionKey = ctx.sessionKey || event.sessionKey || ""; + if (sessionKey.includes("isolated") || sessionKey.includes("spawn")) return; + + if (!pgAvailable()) { + log("skipped: PG unavailable"); + return; + } + + const events = pgQuery( + `SELECT event_date::text, event_time::text, title, description + FROM events + WHERE event_date >= (CURRENT_DATE - INTERVAL '2 days') + ORDER BY event_date DESC, event_time DESC NULLS LAST + LIMIT 30` + ); + + const decisions = pgQuery( + `SELECT decided_at::text, decision, context + FROM decisions + WHERE decided_at >= (NOW() - INTERVAL '7 days') + ORDER BY decided_at DESC + LIMIT 10` + ); + + const lessons = pgQuery( + `SELECT lesson, context + FROM lessons + WHERE learned_at >= (NOW() - INTERVAL '7 days') + AND superseded_by IS NULL + ORDER BY learned_at DESC + LIMIT 10` + ); + + if (!events.length && !decisions.length && !lessons.length) { + log("skipped: no recent data"); + return; + } + + let content = "# Session Context\n"; + content += "*Auto-generated from PostgreSQL memory. 
Recent activity summary.*\n\n"; + content += formatEvents(events); + content += formatDecisions(decisions); + content += formatLessons(lessons); + + (ctx.bootstrapFiles as any[]).push({ + name: "SESSION_CONTEXT.md", + content, + missing: false, }); + + log(`injected: ${events.length} events, ${decisions.length} decisions, ${lessons.length} lessons`); }; export default handler; diff --git a/schema.sql b/schema.sql index be4a768..72908eb 100644 --- a/schema.sql +++ b/schema.sql @@ -5597,3 +5597,89 @@ CREATE EVENT TRIGGER schema_change_trigger ON ddl_command_end \unrestrict ft6e6VEhwIjHLqwKSzFuygOFIO1cXGO3Wra820aceqB9mDli6mmOOKVwko0fEzl + +-- +-- Name: documents; Type: TABLE; Schema: public; Owner: - +-- + +CREATE TABLE public.documents ( + id integer NOT NULL, + path text NOT NULL, + title character varying(255), + doc_type character varying(50), + description text, + tags text[], + created_at timestamp with time zone DEFAULT now(), + updated_at timestamp with time zone DEFAULT now() +); + + +-- +-- Name: TABLE documents; Type: COMMENT; Schema: public; Owner: - +-- + +COMMENT ON TABLE public.documents IS 'Registry of workspace documents, configs, and knowledge files. 
Tracks what files exist and their purpose for cross-session discovery.'; + + +-- +-- Name: documents_id_seq; Type: SEQUENCE; Schema: public; Owner: - +-- + +CREATE SEQUENCE public.documents_id_seq + AS integer + START WITH 1 + INCREMENT BY 1 + NO MINVALUE + NO MAXVALUE + CACHE 1; + + +-- +-- Name: documents_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: - +-- + +ALTER SEQUENCE public.documents_id_seq OWNED BY public.documents.id; + + +-- +-- Name: documents id; Type: DEFAULT; Schema: public; Owner: - +-- + +ALTER TABLE ONLY public.documents ALTER COLUMN id SET DEFAULT nextval('public.documents_id_seq'::regclass); + + +-- +-- Name: documents documents_pkey; Type: CONSTRAINT; Schema: public; Owner: - +-- + +ALTER TABLE ONLY public.documents + ADD CONSTRAINT documents_pkey PRIMARY KEY (id); + + +-- +-- Name: documents documents_path_key; Type: CONSTRAINT; Schema: public; Owner: - +-- + +ALTER TABLE ONLY public.documents + ADD CONSTRAINT documents_path_key UNIQUE (path); + + +-- +-- Name: idx_documents_doc_type; Type: INDEX; Schema: public; Owner: - +-- + +CREATE INDEX idx_documents_doc_type ON public.documents USING btree (doc_type); + + +-- +-- Name: idx_documents_path; Type: INDEX; Schema: public; Owner: - +-- + +CREATE INDEX idx_documents_path ON public.documents USING btree (path); + + +-- +-- Name: idx_documents_tags; Type: INDEX; Schema: public; Owner: - +-- + +CREATE INDEX idx_documents_tags ON public.documents USING gin (tags); From 9f53c7edde2cfaa49af7b78394ea24c374ee8ab3 Mon Sep 17 00:00:00 2001 From: Auri Wren Date: Mon, 9 Feb 2026 19:25:55 +0000 Subject: [PATCH 2/4] Fix critical issues from code review - memory-extract: Replace execSync with execFileSync to prevent command injection from LLM-generated content (CVE-grade fix) - session-init: Switch from node -e/require(pg) to psql for queries, reducing bootstrap overhead from ~1s to ~150ms - Both hooks: Use HOME/WORKSPACE env vars instead of hardcoded paths - Remove redundant idx_documents_path 
  (UNIQUE constraint already indexes)
- memory-extract: Use tail for efficient transcript reading
---
 hooks/memory-extract/handler.ts | 53 +++++++++++++---------------
 hooks/session-init/handler.ts   | 62 +++++++++++++++------------------
 schema.sql                      |  1 -
 3 files changed, 53 insertions(+), 63 deletions(-)

diff --git a/hooks/memory-extract/handler.ts b/hooks/memory-extract/handler.ts
index db1e9c3..3967030 100644
--- a/hooks/memory-extract/handler.ts
+++ b/hooks/memory-extract/handler.ts
@@ -10,14 +10,16 @@
  * Never blocks message delivery. All errors are swallowed and logged.
  */
 
-import { execSync } from "child_process";
+import { execSync, execFileSync } from "child_process";
 import { readFileSync, existsSync, readdirSync, statSync, appendFileSync, writeFileSync } from "fs";
 import { join } from "path";
 
-const MEMORY_DB = "/home/openclaw/.openclaw/workspace/tools/memory-db";
-const SESSIONS_DIR = "/home/openclaw/.openclaw/agents/main/sessions";
-const LOG_FILE = "/home/openclaw/.openclaw/workspace/db/memory-extract.log";
-const STATE_FILE = "/home/openclaw/.openclaw/workspace/db/memory-extract-state.json";
+const HOME = process.env.HOME || "/home/openclaw";
+const WORKSPACE = process.env.OPENCLAW_WORKSPACE || join(HOME, ".openclaw/workspace");
+const MEMORY_DB = join(WORKSPACE, "tools/memory-db");
+const SESSIONS_DIR = join(HOME, ".openclaw/agents/main/sessions");
+const LOG_FILE = join(WORKSPACE, "db/memory-extract.log");
+const STATE_FILE = join(WORKSPACE, "db/memory-extract-state.json");
 
 const COOLDOWN_MS = 5 * 60 * 1000; // 5 minutes between extractions
 const MIN_MESSAGE_LENGTH = 80;
@@ -97,8 +99,13 @@ function getRecentTranscript(sessionId?: string, lineCount = 20): string[] {
 
   if (!sessionFile) return [];
 
-  const content = readFileSync(sessionFile, "utf-8");
-  const lines = content.trim().split("\n").slice(-lineCount);
+  let content: string;
+  try {
+    content = execSync(`tail -n ${lineCount} ${JSON.stringify(sessionFile)}`, { encoding: "utf-8" }).trim();
+  } catch {
+    content = readFileSync(sessionFile, "utf-8").trim();
+  }
+  const lines = content.split("\n");
   const messages: string[] = [];
 
   for (const line of lines) {
@@ -225,15 +232,12 @@ Rules:
 
 function storeMemory(entry: MemoryEntry) {
   try {
-    const esc = (s: string) => s.replace(/'/g, "'\\''");
+    const execOpts = { timeout: 5000, stdio: "pipe" as const };
 
     switch (entry.type) {
       case "fact":
         if (entry.entity && entry.key && entry.value) {
-          execSync(`${MEMORY_DB} add-fact '${esc(entry.entity)}' '${esc(entry.key)}' '${esc(entry.value)}'`, {
-            timeout: 5000,
-            stdio: "pipe",
-          });
+          execFileSync(MEMORY_DB, ["add-fact", entry.entity, entry.key, entry.value], execOpts);
           log(`stored fact: ${entry.entity}.${entry.key} = ${entry.value}`);
         }
         break;
@@ -241,35 +245,28 @@ function storeMemory(entry: MemoryEntry) {
       case "event":
         if (entry.title) {
           const today = new Date().toISOString().split("T")[0];
-          const desc = entry.description ? `--desc '${esc(entry.description)}'` : "";
-          execSync(`${MEMORY_DB} log-event '${today}' '${esc(entry.title)}' ${desc}`, {
-            timeout: 5000,
-            stdio: "pipe",
-          });
+          const args = ["log-event", today, entry.title];
+          if (entry.description) args.push(entry.description);
+          execFileSync(MEMORY_DB, args, execOpts);
           log(`stored event: ${entry.title}`);
         }
         break;
 
       case "lesson":
         if (entry.lesson) {
-          const ctx = entry.context ? `'${esc(entry.context)}'` : "";
-          execSync(`${MEMORY_DB} add-lesson '${esc(entry.lesson)}' ${ctx}`, {
-            timeout: 5000,
-            stdio: "pipe",
-          });
+          const args = ["add-lesson", entry.lesson];
+          if (entry.context) args.push(entry.context);
+          execFileSync(MEMORY_DB, args, execOpts);
           log(`stored lesson: ${entry.lesson.slice(0, 60)}`);
         }
         break;
 
       case "decision":
         if (entry.title) {
-          // Store decisions as events with a "decision:" prefix
           const today = new Date().toISOString().split("T")[0];
-          const desc = entry.description ? `--desc '${esc(entry.description)}'` : "";
-          execSync(`${MEMORY_DB} log-event '${today}' 'Decision: ${esc(entry.title)}' ${desc}`, {
-            timeout: 5000,
-            stdio: "pipe",
-          });
+          const args = ["log-event", today, `Decision: ${entry.title}`];
+          if (entry.description) args.push(entry.description);
+          execFileSync(MEMORY_DB, args, execOpts);
           log(`stored decision: ${entry.title}`);
         }
         break;
diff --git a/hooks/session-init/handler.ts b/hooks/session-init/handler.ts
index f44a699..80afc7d 100644
--- a/hooks/session-init/handler.ts
+++ b/hooks/session-init/handler.ts
@@ -10,8 +10,12 @@
 
 import { execSync } from "child_process";
 import { appendFileSync } from "fs";
+import { join } from "path";
 
-const LOG_FILE = "/home/openclaw/.openclaw/workspace/db/session-init.log";
+const HOME = process.env.HOME || "/home/openclaw";
+const WORKSPACE = process.env.OPENCLAW_WORKSPACE || join(HOME, ".openclaw/workspace");
+const LOG_FILE = join(WORKSPACE, "db/session-init.log");
+const DB_NAME = "auri_memory";
 
 function log(msg: string) {
   try {
@@ -29,56 +33,45 @@ function pgAvailable(): boolean {
 }
 
 function pgQuery(sql: string): any[] {
-  const script = `
-    const { Pool } = require('pg');
-    const pool = new Pool({ database: 'auri_memory', host: '/var/run/postgresql' });
-    (async () => {
-      const { rows } = await pool.query(${JSON.stringify(sql)});
-      console.log(JSON.stringify(rows));
-      await pool.end();
-    })().catch(() => { process.stdout.write('[]'); process.exit(0); });
-  `;
   try {
-    const result = execSync(`node -e ${JSON.stringify(script)}`, {
-      encoding: "utf-8",
-      timeout: 8000,
-      cwd: "/home/openclaw/.openclaw/workspace",
-    }).trim();
-    return JSON.parse(result || "[]");
+    const result = execSync(
+      `psql -d ${DB_NAME} -h /var/run/postgresql -t -A -F '\t' -c ${JSON.stringify(sql)}`,
+      { encoding: "utf-8", timeout: 5000, stdio: ["pipe", "pipe", "pipe"] }
+    ).trim();
+    if (!result) return [];
+    return result.split("\n").map(line => line.split("\t"));
   } catch {
     return [];
   }
 }
 
-function formatEvents(rows: any[]): string {
+function formatEvents(rows: any[][]): string {
   if (!rows.length) return "";
   let out = "## Recent Events (48h)\n\n";
-  for (const r of rows) {
-    const date = r.event_date?.slice(0, 10) || "?";
-    const time = r.event_time?.slice(0, 5) || "";
-    const desc = r.description ? `: ${r.description.slice(0, 200)}` : "";
-    out += `- **${date}${time ? " " + time : ""}** ${r.title}${desc}\n`;
+  for (const [eventDate, eventTime, title, description] of rows) {
+    const time = eventTime ? ` ${eventTime}` : "";
+    const desc = description ? `: ${description.slice(0, 200)}` : "";
+    out += `- **${eventDate || "?"}${time}** ${title}${desc}\n`;
   }
   return out + "\n";
 }
 
-function formatDecisions(rows: any[]): string {
+function formatDecisions(rows: any[][]): string {
   if (!rows.length) return "";
   let out = "## Recent Decisions (7d)\n\n";
-  for (const r of rows) {
-    const date = r.decided_at?.slice(0, 10) || "?";
-    const ctx = r.context ? ` (${r.context.slice(0, 120)})` : "";
-    out += `- **${date}** ${r.decision}${ctx}\n`;
+  for (const [decidedAt, decision, context] of rows) {
+    const ctx = context ? ` (${context.slice(0, 120)})` : "";
+    out += `- **${decidedAt || "?"}** ${decision}${ctx}\n`;
   }
   return out + "\n";
 }
 
-function formatLessons(rows: any[]): string {
+function formatLessons(rows: any[][]): string {
   if (!rows.length) return "";
   let out = "## Active Lessons (7d)\n\n";
-  for (const r of rows) {
-    const ctx = r.context ? ` — ${r.context.slice(0, 120)}` : "";
-    out += `- ${r.lesson}${ctx}\n`;
+  for (const [lesson, context] of rows) {
+    const ctx = context ? ` — ${context.slice(0, 120)}` : "";
+    out += `- ${lesson}${ctx}\n`;
   }
   return out + "\n";
 }
@@ -90,7 +83,7 @@ const handler = async (event: any) => {
   if (!ctx?.bootstrapFiles) return;
 
   const sessionKey = ctx.sessionKey || event.sessionKey || "";
-  if (sessionKey.includes("isolated") || sessionKey.includes("spawn")) return;
+  if (sessionKey.includes("isolated") || sessionKey.includes("spawn") || sessionKey.includes("subagent")) return;
 
   if (!pgAvailable()) {
     log("skipped: PG unavailable");
@@ -98,7 +91,7 @@ const handler = async (event: any) => {
   }
 
   const events = pgQuery(
-    `SELECT event_date::text, event_time::text, title, description
+    `SELECT event_date::text, to_char(event_time, 'HH24:MI'), title, description
      FROM events
      WHERE event_date >= (CURRENT_DATE - INTERVAL '2 days')
      ORDER BY event_date DESC, event_time DESC NULLS LAST
@@ -106,7 +99,7 @@ const handler = async (event: any) => {
   );
 
   const decisions = pgQuery(
-    `SELECT decided_at::text, decision, context
+    `SELECT decided_at::date::text, decision, context
      FROM decisions
      WHERE decided_at >= (NOW() - INTERVAL '7 days')
      ORDER BY decided_at DESC
@@ -118,6 +111,7 @@ const handler = async (event: any) => {
      FROM lessons
      WHERE learned_at >= (NOW() - INTERVAL '7 days')
        AND superseded_by IS NULL
+       AND confidence > 0.3
      ORDER BY learned_at DESC
      LIMIT 10`
   );
diff --git a/schema.sql b/schema.sql
index 72908eb..f3ae6f3 100644
--- a/schema.sql
+++ b/schema.sql
@@ -5675,7 +5675,6 @@ CREATE INDEX idx_documents_doc_type ON public.documents USING btree (doc_type);
 -- Name: idx_documents_path; Type: INDEX; Schema: public; Owner: -
 --
 
-CREATE INDEX idx_documents_path ON public.documents USING btree (path);
 
 
 --

From 34bd32604fe4db43d63dcd8f9262361de755b836 Mon Sep 17 00:00:00 2001
From: Auri Wren
Date: Mon, 9 Feb 2026 19:34:37 +0000
Subject: [PATCH 3/4] Move Why section to top of README with detailed motivation

Explains the fundamental problems with file-based memory (scaling,
queryability, dual-write, structure enforcement,
discoverability) and why PostgreSQL is the right foundation for AI assistant
memory.
---
 README.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/README.md b/README.md
index 75c715f..4db5b4f 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,26 @@
 
 A PostgreSQL-based long-term memory system for AI assistants, with natural language extraction scripts.
 
+## Why This Exists
+
+AI assistants have a fundamental problem: they forget everything between sessions. Every conversation starts from zero. The workaround most people reach for is flat files: markdown logs, daily journals, a MEMORY.md that grows until it blows out the context window. That works for a while, then it doesn't.
+
+Here's what breaks:
+
+**Files don't scale.** A daily log from last Tuesday is 15KB. Multiply that across weeks and you're burning context tokens on information that's 95% irrelevant to the current conversation. You can't selectively load "just the facts about this person" from a wall of chronological text.
+
+**Files aren't queryable.** "What did Eiwe decide about the Vietnam hotel?" requires reading every daily log until you find it, or hoping your keyword search hits. A `SELECT decision, context FROM decisions WHERE decided_at > '2026-02-01'` returns the answer in milliseconds.
+
+**Files create dual-write problems.** Information lives in daily logs AND memory summaries AND reference docs. When something changes, you update one and forget the others. Now your memory contradicts itself.
+
+**Files can't enforce structure.** An entity fact stored as a line in a markdown file has no schema, no confidence score, no timestamp, no source attribution, no visibility controls. You can't decay stale facts, detect contradictions, or control what gets shared in group contexts.
+
+**Files are invisible across sessions.** If you create `memory/research-humidifiers.md` in one session, a future session has no way to know it exists unless it happens to search for the right keywords. There's no registry, no index, no discoverability layer.
+
+PostgreSQL solves all of these. Structured tables for entities, facts, events, decisions, and lessons. pgvector for semantic search across everything. Row-level confidence scores that decay over time. Privacy-scoped facts that filter based on who's in the conversation. A document registry so files don't become orphans. And hooks that automatically extract memories from conversations and hydrate context at session start, so none of this requires manual discipline.
+
+This isn't about replacing files entirely. Files are still great for reference documents, research notes, and human-readable archives. But the *memory system*, the thing that makes an AI assistant actually remember and reason about what it knows, belongs in a database.
+
 **For AI Entities:** This system helps you remember things across sessions by storing structured memories in a database. Follow the Quick Start below to set it up.
 
 ## Quick Start (For AI Assistants)

From 6183d698dd4da9f8d4b86c2205c26b63a3dc9402 Mon Sep 17 00:00:00 2001
From: Auri Wren
Date: Mon, 9 Feb 2026 19:35:55 +0000
Subject: [PATCH 4/4] Revert "Move Why section to top of README with detailed motivation"

This reverts commit 34bd32604fe4db43d63dcd8f9262361de755b836.
---
 README.md | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/README.md b/README.md
index 4db5b4f..75c715f 100644
--- a/README.md
+++ b/README.md
@@ -2,26 +2,6 @@
 
 A PostgreSQL-based long-term memory system for AI assistants, with natural language extraction scripts.
 
-## Why This Exists
-
-AI assistants have a fundamental problem: they forget everything between sessions. Every conversation starts from zero. The workaround most people reach for is flat files: markdown logs, daily journals, a MEMORY.md that grows until it blows out the context window. That works for a while, then it doesn't.
-
-Here's what breaks:
-
-**Files don't scale.** A daily log from last Tuesday is 15KB. Multiply that across weeks and you're burning context tokens on information that's 95% irrelevant to the current conversation. You can't selectively load "just the facts about this person" from a wall of chronological text.
-
-**Files aren't queryable.** "What did Eiwe decide about the Vietnam hotel?" requires reading every daily log until you find it, or hoping your keyword search hits. A `SELECT decision, context FROM decisions WHERE decided_at > '2026-02-01'` returns the answer in milliseconds.
-
-**Files create dual-write problems.** Information lives in daily logs AND memory summaries AND reference docs. When something changes, you update one and forget the others. Now your memory contradicts itself.
-
-**Files can't enforce structure.** An entity fact stored as a line in a markdown file has no schema, no confidence score, no timestamp, no source attribution, no visibility controls. You can't decay stale facts, detect contradictions, or control what gets shared in group contexts.
-
-**Files are invisible across sessions.** If you create `memory/research-humidifiers.md` in one session, a future session has no way to know it exists unless it happens to search for the right keywords. There's no registry, no index, no discoverability layer.
-
-PostgreSQL solves all of these. Structured tables for entities, facts, events, decisions, and lessons. pgvector for semantic search across everything. Row-level confidence scores that decay over time. Privacy-scoped facts that filter based on who's in the conversation. A document registry so files don't become orphans. And hooks that automatically extract memories from conversations and hydrate context at session start, so none of this requires manual discipline.
-
-This isn't about replacing files entirely. Files are still great for reference documents, research notes, and human-readable archives. But the *memory system*, the thing that makes an AI assistant actually remember and reason about what it knows, belongs in a database.
-
 **For AI Entities:** This system helps you remember things across sessions by storing structured memories in a database. Follow the Quick Start below to set it up.
 
 ## Quick Start (For AI Assistants)