feat: hindsight-agent CLI + self-learning agent skills by nicoloboschi · Pull Request #1191 · vectorize-io/hindsight

nicoloboschi · 2026-04-22T07:30:36Z

Summary

hindsight-agent: New Python CLI for agent scaffolding and runtime. One command (setup) creates the Hindsight bank and knowledge pages, ingests reference docs, configures the harness agent, and installs the skill. Runtime commands (pages, recall, documents, retain) let the agent and plugin interact with Hindsight using just an agent ID; no bank, URL, or KB knowledge is needed.
OpenClaw plugin: Lightweight retain plugin that reads ~/.hindsight-agent/config.json for bank resolution, filters to user/assistant text, and retains async. No child_process, no complex routing logic.
agent-knowledge skill: Updated with recall, documents, source_query patterns, and delta mode. Auto-loaded at session startup via AGENTS.md patch.
Control plane UI: KB selector in mental models view, KB badge on cards, KB field in create dialog, kb_id in API responses and bank templates.
Bank templates: kb_id support on BankTemplateMentalModel so templates can pre-populate KB-scoped pages.
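Runtime commands can work from an agent ID alone because the rest comes from ~/.hindsight-agent/config.json. A minimal sketch of that lookup follows. The config layout it assumes (an `agents` object keyed by ID, each with `bank_id`, `api_url`, and optional `api_token`) is hypothetical, not the file's documented schema.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical default location and schema; the real config.json may differ.
DEFAULT_CONFIG = Path.home() / ".hindsight-agent" / "config.json"


@dataclass(frozen=True)
class AgentTarget:
    bank_id: str
    api_url: str
    api_token: Optional[str]


def resolve_agent(agent_id: str, config_path: Path = DEFAULT_CONFIG) -> AgentTarget:
    """Map an agent ID to the bank, base URL, and token the runtime commands need."""
    config = json.loads(config_path.read_text())
    agents = config.get("agents", {})
    if agent_id not in agents:
        known = ", ".join(sorted(agents)) or "none"
        raise KeyError(f"unknown agent {agent_id!r} (configured: {known})")
    entry = agents[agent_id]
    return AgentTarget(
        bank_id=entry["bank_id"],
        api_url=entry["api_url"].rstrip("/"),  # normalize so paths can be appended
        api_token=entry.get("api_token"),
    )
```

The point of the design is that neither the skill nor the plugin ever hard-codes a bank ID or URL; they only carry the agent ID.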

Architecture

[hindsight-agent setup] → creates bank + KB + pages + ingests docs + configures harness
[OpenClaw plugin]       → retains conversations (reads config.json → async POST)
[agent skill]           → reads pages at startup, creates pages, recalls memories
[consolidation]         → extracts observations → refreshes pages via source_query (delta)

The agent decides what to track (page creation). The system handles capture (plugin) and synthesis (consolidation + refresh).
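The capture/synthesis split can be modeled as a refresh loop that only ever touches pages the agent has already created. This is an illustrative sketch under assumptions: `Page`, `refresh_pages`, and the `synthesize` callback are hypothetical stand-ins for consolidation plus the source_query refresh, and "delta" is modeled as feeding a page only the observations newer than its last refresh.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Page:
    page_id: str
    source_query: str        # the question this page answers
    content: str = ""
    last_refreshed: int = 0  # sequence number of the newest observation already merged


Observation = Tuple[int, str]  # (sequence number, observation text)
# synthesize(source_query, current_content, new_observations) -> updated content;
# deciding which observations are relevant to the query is left to this callback.
Synthesizer = Callable[[str, str, List[str]], str]


def refresh_pages(pages: List[Page], observations: List[Observation],
                  synthesize: Synthesizer) -> None:
    """Refresh existing pages in delta mode. Never creates new pages."""
    for page in pages:
        fresh = [(seq, text) for seq, text in observations if seq > page.last_refreshed]
        if not fresh:
            continue  # nothing new since the last refresh
        page.content = synthesize(page.source_query, page.content, [t for _, t in fresh])
        page.last_refreshed = max(seq for seq, _ in fresh)
```

Because `refresh_pages` has no code path that appends to `pages`, page creation stays an agent decision by construction.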

Test plan

- hindsight-agent setup with --template and --content creates the bank, KB, and pages, and ingests docs
- hindsight-agent pages list/create/update/delete works
- hindsight-agent recall returns relevant memories
- OpenClaw plugin retains conversations async
- Control plane KB selector filters mental models
- Bank template kb_id assigns pages to KB on import
- Agent reads skill at session startup and executes mandatory sequence

New skill that gives the agent a `memory/` directory in its workspace where it creates and maintains markdown files, one per topic, as a growing wiki of everything it has learned across sessions. Key design choices:

- Pure files, no external dependency. The agent is the only writer.
- One file per topic (user preferences, procedures, setup facts, etc.).
- Every fact must have an `## Evidence` section with dated entries citing the conversation or event that established it.
- Updates are in-place rewrites (current state of knowledge), not append-only logs. Superseded evidence is struck through, not deleted.
- The agent reads relevant memory files at the start of each task and treats them as ground truth unless the user contradicts them.
- Write-after-act: the agent finishes the user's task first, then updates memory. No mid-task interruptions for note-taking.

This is intentionally independent of Hindsight: it's the agent's own procedural memory, self-organized, with built-in provenance via the evidence trail. It can coexist with Hindsight-backed skills (which handle semantic recall, consolidation, and mental models) or stand alone.
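The supersession rule (strike through, don't delete) can be illustrated with a small helper. This is a sketch under assumptions: the layout it expects (an `## Evidence` heading as the last section, followed by `- YYYY-MM-DD: ...` bullets) and the function name are hypothetical; in the skill itself the agent performs these edits directly.

```python
from datetime import date


def supersede_evidence(markdown: str, old_entry: str, new_entry: str, when: date) -> str:
    """Strike through an outdated evidence bullet and append a dated replacement."""
    lines = markdown.splitlines()
    if not any(line.strip() == "## Evidence" for line in lines):
        raise ValueError("memory file has no '## Evidence' section")
    out = []
    in_evidence = False
    for line in lines:
        if line.startswith("## "):
            in_evidence = line.strip() == "## Evidence"
        elif (in_evidence and line.startswith("- ") and old_entry in line
              and not line.startswith("- ~~")):
            line = f"- ~~{line[2:]}~~"  # keep the old evidence visible, just struck through
        out.append(line)
    # Assumes Evidence is the final section, so the new entry lands under it.
    out.append(f"- {when.isoformat()}: {new_entry}")
    return "\n".join(out) + "\n"
```

Updating the fact text above the evidence section is still an in-place rewrite the agent does itself; the helper only covers the provenance trail.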

Knowledge files track what the agent knows (preferences, rules, setup). Activity logs track what the agent did (feed items delivered, sweep results, user reactions): append-only, dated entries, pruned after ~30 days. The agent reads the log before each task run and skips items it already delivered. This solves the "showed the same headline twice" problem and gives the agent an answer to "what did I show you yesterday?"
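The log rules (dated append-only entries, ~30-day pruning, skipping already-delivered items) fit in a few lines. This sketch assumes a hypothetical line format of `YYYY-MM-DD delivered: <item id>`; the skill does not prescribe one.

```python
from datetime import date, timedelta
from typing import Iterable, List

RETENTION = timedelta(days=30)


def prune_log(lines: Iterable[str], today: date) -> List[str]:
    """Drop entries older than the retention window; preserve order otherwise."""
    kept = []
    for line in lines:
        stamp = date.fromisoformat(line[:10])  # assumes each line starts with an ISO date
        if today - stamp <= RETENTION:
            kept.append(line)
    return kept


def undelivered(candidates: Iterable[str], log_lines: Iterable[str]) -> List[str]:
    """Return candidate item IDs that the log does not show as already delivered."""
    delivered = {
        line.split("delivered:", 1)[1].strip()
        for line in log_lines
        if "delivered:" in line
    }
    return [item for item in candidates if item not in delivered]
```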

…lf-review Four additions to push the file-based approach to its ceiling:

1. Git: the memory dir is a git repo, auto-committed after every write. Full diff history, rollback, and blame answer "when did this change?" without relying on the evidence section alone.
2. Index file: _index.md, maintained by the agent, with one line per file and a short description. The agent reads the index first, then only the relevant files; this scales retrieval to ~100 files before it breaks down.
3. Strict post-response writes: the skill now mandates respond first, write memory second, commit third. Memory updates are a visible post-script the user can see, not a silent side effect or a blocker.
4. Periodic self-review: every ~10 sessions (or on user request) the agent reads ALL files and does a consolidation pass: resolve contradictions, merge duplicates, split overgrown files, prune stale log entries, and flag facts without evidence. The result is committed as a single "memory review" git commit.

Also: evidence supersession is now shown inline with ~~strikethrough~~ AND preserved in git history, so both quick scanning and deep digging work.
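Since the index is one line per file, it can also be regenerated mechanically. A sketch under assumptions: each topic file's `# ` title is used as its short description (a hypothetical convention; otherwise the filename stem is used), and the git auto-commit step is left out.

```python
from pathlib import Path


def build_index(memory_dir: Path) -> str:
    """Render _index.md: one line per topic file with a short description."""
    entries = []
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == "_index.md":
            continue  # never index the index itself
        description = path.stem
        for line in path.read_text().splitlines():
            if line.startswith("# "):
                description = line[2:].strip()  # hypothetical convention: title = description
                break
        entries.append(f"- {path.name}: {description}")
    return "# Memory index\n\n" + "\n".join(entries) + "\n"


def write_index(memory_dir: Path) -> Path:
    index = memory_dir / "_index.md"
    index.write_text(build_index(memory_dir))
    return index
```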

…ite order Incorporates the agent's own feedback on where the skill was ambiguous:

- If _index.md is missing but files exist, the agent MUST create it during the post-response phase of that same turn, not "later" or "next session". Leaving the directory without an index is a bug.
- Duplicate topic files (e.g. preferences.md + news-feed-preferences.md) are treated as hygiene errors: merge immediately during post-response, don't wait for the periodic self-review.
- Write order now has 4 explicit numbered steps:
  1. Respond (user sees their answer)
  2. Repair (missing index, duplicates, name normalization)
  3. Update (knowledge files + activity logs)
  4. Commit (git)
- Clarified that structural reads (checking if _index.md exists) are allowed at any time, but all writes (repairs included) happen after the user's response is delivered.
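The Repair step needs some way to notice duplicate topic files. The skill leaves that judgment to the agent; the following is only one hypothetical heuristic, sketched for illustration: flag a file whose name ends with another file's stem, as in the preferences.md / news-feed-preferences.md case.

```python
from itertools import permutations
from typing import Iterable, List, Tuple


def likely_duplicates(filenames: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (general, specific) pairs where one topic name extends another.

    Hypothetical heuristic: "news-feed-preferences.md" is flagged against
    "preferences.md" because its stem ends with "-preferences".
    """
    stems = sorted({name[:-3] for name in filenames
                    if name.endswith(".md") and not name.startswith("_")})
    pairs = []
    for general, specific in permutations(stems, 2):
        if specific.endswith("-" + general):
            pairs.append((general + ".md", specific + ".md"))
    return pairs
```

A flagged pair is a candidate for merging, not proof of duplication; the agent still reads both files before merging them.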

The agent knows the write rules but drops the post-response steps, because the LLM's natural stopping point is right after the response. Adding more prose to the "write after responding" section doesn't help: the agent already read and agreed to the rule, then forgot. Fix: force the agent to print a visible checklist as the LAST thing it outputs every turn:

📝 Memory: [wrote: <files> | logged: <yes/no> | committed: <yes/no>]

If the agent completed a task run and the checklist says "logged: no", it has a bug and must fix it before ending the turn. The checklist is a self-check that makes skipping the write step visibly wrong, both to the agent (it has to fill in the fields) and to the user (they can see if logging was missed).
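Because the checklist has a fixed shape, a harness or test could also check it mechanically. The regex and the `task_run` flag below are assumptions for illustration; the skill itself relies on the agent's self-check, not on a parser.

```python
import re
from typing import Dict, Optional

# Matches a reply that ends with, e.g.:
# 📝 Memory: [wrote: prefs.md | logged: yes | committed: yes]
CHECKLIST = re.compile(
    r"📝 Memory: \[wrote: (?P<wrote>[^|\]]*)\| logged: (?P<logged>yes|no) "
    r"\| committed: (?P<committed>yes|no)\]\s*$"
)


def parse_checklist(reply: str) -> Optional[Dict[str, str]]:
    """Return the checklist fields if the reply ends with one, else None."""
    match = CHECKLIST.search(reply)
    if match is None:
        return None
    fields = match.groupdict()
    fields["wrote"] = fields["wrote"].strip()
    return fields


def checklist_problem(reply: str, task_run: bool) -> Optional[str]:
    """Describe what is wrong with how the turn ended, or None if it looks fine."""
    fields = parse_checklist(reply)
    if fields is None:
        return "missing memory checklist"
    if task_run and fields["logged"] == "no":
        return "task run finished but activity log not written"
    return None
```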

New `knowledge_bases` table + KB CRUD in engine + HTTP endpoints + `kb_id` filter on list_mental_models. See commit body for details.

- Wire kb_id through create_mental_model engine method + HTTP endpoint
- Include kb_id in _row_to_mental_model output
- Include kb_id in list_mental_models SELECT
- Add scripts/hindsight-mount: renders a bank's mental models (optionally filtered by KB) as local markdown files at ~/.agent-knowledge/<bank>/ with _index.md for browsing. The agent reads the files with normal tools.

Fix NameError in KB HTTP endpoints: they were using a bare `engine` instead of `app.state.memory` (copy-paste error from the endpoint template).

New skill: `agent-knowledge`, the Hindsight-backed read-only counterpart to `agent-memory`. The agent mounts its KB as local files via `hindsight-mount`, browses them with normal tools, and never writes. The system handles all knowledge maintenance through the retain → consolidation → KB update → MM refresh pipeline.

- Fix consolidator: use `llm_config.call()` instead of the nonexistent `call_llm` import. knowledge_base_update was crashing with ImportError on every consolidation run.
- Pipeline tests: handle the `no_new_memories` status (the background worker may process memories before the test's explicit consolidation call).
- FINDINGS.md: document the current approach (KB + direct CLI reads), architecture diagram, what's implemented vs. not, comparison table.

Drop auto-create from the KB pipeline. The agent decides what pages to create (the same judgment it used with local files); the system keeps them current via consolidation + refresh_after_consolidation. The agent calls `mental-model create` with a good source_query when it discovers a recurring topic. The system never creates pages on its own; it only refreshes existing ones. This fixes the junk-page problem: a cheap LLM reading decontextualized observations can't distinguish signal from noise. The agent can, because it has the full conversation context.

- hindsight-agent: Python CLI for agent scaffolding and runtime
  - setup: one-shot onboarding (bank, KB, skill, harness config, content ingestion)
  - retain: pipe content to Hindsight (always async)
  - pages: list/get/create/update/delete knowledge pages
  - recall: search memories for ad-hoc research
  - documents: list retained reference docs
  - --template: import bank template at setup
  - --content: ingest directory of reference docs at setup
- OpenClaw plugin: lightweight retain plugin
  - Reads ~/.hindsight-agent/config.json for bank/URL resolution
  - Filters to user/assistant text only
  - Retains async via Hindsight API
- Control plane: Knowledge Base UI
  - KB selector in mental models view
  - KB badge on cards, KB field in create dialog
  - kb_id in MentalModelResponse and BankTemplateMentalModel
- agent-knowledge skill: recall, documents, source_query patterns
- Remove broken _run_knowledge_base_updates reference from consolidator
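The plugin's capture path comes down to two steps: keep only user/assistant text, then hand it off without blocking the conversation. The Python sketch below shows the same idea; the real plugin runs inside OpenClaw, the message shape (`role` and `content` keys) is an assumption, and `send` is a hypothetical stand-in for the Hindsight API call.

```python
import threading
from typing import Callable, Dict, List

Message = Dict[str, object]


def retainable_text(messages: List[Message]) -> List[str]:
    """Keep plain-text user/assistant turns; drop system prompts, tool calls, etc."""
    kept = []
    for msg in messages:
        if msg.get("role") not in ("user", "assistant"):
            continue
        content = msg.get("content")
        if isinstance(content, str) and content.strip():  # skip structured/tool content
            kept.append(f"{msg['role']}: {content.strip()}")
    return kept


def retain_async(messages: List[Message], send: Callable[[str], None]) -> threading.Thread:
    """Fire-and-forget: format the transcript and send it on a background thread."""
    transcript = "\n".join(retainable_text(messages))
    worker = threading.Thread(target=send, args=(transcript,), daemon=True)
    worker.start()
    return worker
```

Returning the thread is only for testability; a caller in the conversation path would not wait on it.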

KB added complexity without value: each agent already has its own bank, which provides the isolation boundary. The bank IS the grouping. Removed from:

- API: KB CRUD endpoints, kb_id on MentalModelResponse, CreateMentalModelRequest, BankTemplateMentalModel, list_mental_models kb filter
- Engine: KB CRUD methods, _row_to_knowledge_base, kb_id on create/list MM, _run_knowledge_base_updates consolidation step
- Control plane: KB selector, KB badge, KB in create dialog, listKnowledgeBases, kb route proxy
- hindsight-agent: kb_id from config/api/setup/pages/template

Migration and test files left in place (DB will be erased).

…o templates
- Remove KB migration, tests, Rust CLI kb command and types
- Reset Rust CLI to main (only changes were KB-related)
- Remove agent-memory skill (superseded by agent-knowledge)
- Remove hindsight-mount script (superseded by direct CLI reads)
- Remove DEMO.md and marketing-seo templates (moved to nicolo-agents repo)

- Add --api-token to setup (also reads HINDSIGHT_API_TOKEN env var)
- Store api_token in agent config, pass to all API calls (Bearer auth)
- OpenClaw plugin reads and passes token from config
- Sync skills/agent-knowledge/SKILL.md with hindsight-agent/skill/SKILL.md
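Token handling as described (a setup flag, the HINDSIGHT_API_TOKEN env var, the stored config value, and a Bearer header on every call) can be sketched with the standard library. The precedence order below (flag, then env var, then stored config) is an assumption for illustration, not the CLI's documented behavior, and the helper names are hypothetical.

```python
import os
import urllib.request
from typing import Mapping, Optional


def resolve_token(flag_value: Optional[str], config: Mapping[str, str],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the API token: explicit flag, then HINDSIGHT_API_TOKEN, then stored config."""
    return flag_value or env.get("HINDSIGHT_API_TOKEN") or config.get("api_token")


def authorized_request(url: str, token: Optional[str], body: bytes) -> urllib.request.Request:
    """Build a JSON POST that carries the token as a Bearer credential when present."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Omitting the header when no token is configured keeps unauthenticated local deployments working unchanged.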

…s for self-learning agents Technical draft covering: unreliable writer problem, capture/synthesis separation, source_query abstraction, comparison with file-based memory, pipeline auto-creation, and Memento-Skills. References: Karpathy LLM Wiki, MemGPT, Reflexion, Voyager, CoALA.

…swers Comprehensive reviewer anticipation: empirical methodology gaps, missing experiments, failure modes, architectural positions, related work (Generative Agents, MemoryBank, A-MEM, practitioner tools), framing/positioning, deployment data. Each question has a drafted answer based on our actual experience building this.

Resolved in PAPER_DRAFT.md:
- Added Generative Agents [8] as key related work with explicit comparison
- Added MemoryBank [9], A-MEM [10], practitioner tools (mem0, Letta, Zep)
- Qualified "100% capture" as "deterministic for completed sessions"
- Added failure mode analysis (Section 5.6): overlapping pages, contradictions, bad queries, adversarial content, unbounded accumulation
- Added privacy/deletion discussion (Section 5.7)
- Added cost analysis (Section 5.8)
- Added page discovery, versioning, and pages vs. recall sections
- Softened convergence claim to "empirically observed"
- Softened steerable claim to "observed in practice, formal analysis needed"
- Added thesis statement to introduction
- Standardized terminology: "synthesis query" (not source_query), "knowledge page"
- Sharpened "is this a new paradigm" argument (Section 5.5)
- Added production anecdote to Section 5.3
- Expanded limitations to 9 items with specific future work

Remaining in OPEN_QUESTIONS.md (12 items, all need experiments/data):
- Hard blockers: end-to-end quality benchmark, cross-session transfer numbers
- Should do: formalized write reliability, extraction faithfulness, real demo data
- Can acknowledge: phrasing stability, delta vs. full, scaling, latency distribution

- New memory provider plugin at plugin/hermes/ implementing the MemoryProvider ABC
- Buffers turns via sync_turn, retains on session end (async)
- Reads ~/.hindsight-agent/config.json for bank/URL/token
- No tools or prefetch; the skill handles reads via the CLI
- setup --harness hermes: installs skill to ~/.hermes/skills/, copies plugin to ~/.hermes/plugins/hindsight-agent/
- Updated README with Hermes plugin docs
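The buffering behavior (collect turns via sync_turn, retain at session end, and, per a later commit in this PR, also flush every 5 turns or every 2 minutes) can be modeled as a small class. This is a standalone sketch: it does not subclass Hermes's actual MemoryProvider ABC, whose signatures are not shown here, and the `retain` callback and constructor parameters are hypothetical.

```python
import time
from typing import Callable, List, Optional


class BufferedRetainer:
    """Buffer conversation turns; flush on a turn-count or age threshold, or at session end."""

    def __init__(self, retain: Callable[[List[str]], None], max_turns: int = 5,
                 max_age_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._retain = retain
        self._max_turns = max_turns
        self._max_age_s = max_age_s
        self._clock = clock  # injectable so tests can control time
        self._buffer: List[str] = []
        self._first_at: Optional[float] = None

    def sync_turn(self, user: str, assistant: str) -> None:
        if not self._buffer:
            self._first_at = self._clock()  # age is measured from the oldest buffered turn
        self._buffer.append(f"user: {user}\nassistant: {assistant}")
        too_many = len(self._buffer) >= self._max_turns
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def on_session_end(self) -> None:
        self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._retain(batch)
```

Later commits found that the Hermes gateway creates new provider instances, so any in-memory buffer is lost, and switched to retaining each turn immediately. In this sketch that corresponds to `max_turns=1`.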

- Plugin name uses underscore (hindsight_agent) to match Hermes conventions
- Install to ~/.hermes/hermes-agent/plugins/memory/ (not ~/.hermes/plugins/)
- Fallback agent resolution: if agent_identity doesn't match, find any hermes-harness agent in config
- Auto-set memory.provider via hermes config set
- Tested end-to-end: setup → chat → retain → recall works

- New 'list' command shows all configured agents
- Skill tells agent how to find its own ID (baked-in → profile name → list)
- Fixes shared-skill problem in Hermes multi-profile setups
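The resolution order above (baked-in ID, then the Hermes profile name, then the configured list), combined with the fallback to a hermes-harness agent and the strict handling of ambiguous matches from the Hermes commits, might look like the function below. It is a hypothetical sketch; the skill expresses this as instructions to the agent, and the plugin's exact matching rules may differ.

```python
from typing import Dict, Optional


def resolve_agent_id(baked_in: Optional[str], profile: Optional[str],
                     agents: Dict[str, str]) -> str:
    """Pick this agent's ID; `agents` maps configured agent IDs to their harness name."""
    # 1. An ID written into the skill at setup time wins if it is still configured.
    if baked_in and baked_in in agents:
        return baked_in
    # 2. Otherwise try the Hermes profile name as the agent ID.
    if profile and profile in agents:
        return profile
    # 3. Fall back to the configured list, accepting only an unambiguous Hermes match.
    hermes = sorted(agent for agent, harness in agents.items() if harness == "hermes")
    if len(hermes) == 1:
        return hermes[0]
    if not hermes:
        raise LookupError("no configured agent matches; run `hindsight-agent list`")
    raise LookupError(f"ambiguous agent match: {', '.join(hermes)}")
```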


nicoloboschi added 30 commits April 21, 2026 15:46

fix(agent-memory): use ~/.agent-memory/<agent-name>/ outside workspace

93b8766

fix(agent-memory): make activity logging a mandatory trigger, not optional

9e2c642

fix(agent-memory): memory is internal — never ask the user about file structure

1087bfb

docs: add FINDINGS.md — self-improving agent skill research log

ee1e266

docs(FINDINGS): add requirements for the external system + Hindsight gap analysis

7644300

docs(FINDINGS): add LLM Wiki Maintainer PoC spec (Karpathy-inspired)

0b83cec

docs(FINDINGS): Knowledge Base (KB) feature proposal for Hindsight

9d5d9bd

feat: add Knowledge Base (KB) entity — CRUD, migration, API endpoints

b2e65d2

feat(cli): add hindsight kb subcommand — create/list/get/update/delete

2d14a7d

feat: knowledge_base_update operation + 13 KB tests

1e3cdc4

fix(agent-knowledge): agent bootstraps its own KB on first session

db1b6ae

feat(agent-knowledge): drop mount, use direct CLI calls for live reads

b3a017a

fix: KB update prompt — source_query must be a question, not raw observations

a53ced8

fix: KB-created MMs use exclude_mental_models=true + mode=delta

328c54d

fix: KB update prompt — strict mission scope, no agent internals or task output pages

6f5db0a

fix: KB auto-create — default is [], require 3+ observations, much stricter prompt

638198a

fix: pre-filter noise observations before KB update LLM call

843d448

revert: remove code-level observation filter, rely on prompt only

299c534

feat: KB update can now DROP duplicate/junk pages + CREATE new ones in same call

9457e5e

feat(agent-knowledge): agent can update source_query, rename, and delete pages

77f1163

fix(agent-knowledge): use API for page create so kb_id is set

cb8c700

nicoloboschi added 29 commits April 21, 2026 15:47

feat(cli): --kb flag on mental-model list + create; skill uses CLI everywhere

0d10105

docs: rewrite FINDINGS.md — key discoveries, current architecture, why each choice

d3146ab

docs: add hindsight-agent README with all commands and deployment modes

1582a32

cleanup: remove duplicate agent-knowledge skill (canonical copy is in hindsight-agent/skill/)

8f56c22

docs: add self-learning loop Excalidraw diagram

36442e0

fix(hermes): strict multi-profile resolution, warn on ambiguous match

9879519

fix: hermes setup shows correct chat command instead of gateway restart

80fc8fa

fix(hermes): auto-create profile, profile-scoped config, correct chat instructions

573c60d

feat: health check before setup, clear error if Hindsight is unreachable

ec67d9d

fix(hermes): resolve real HOME at import time, not at runtime

f38eda3

feat: add 'hindsight-agent list' command, skill self-resolves agent ID

d11d77d

fix: create_page defaults to observation-only fact type

28e499e

feat(hermes): auto-patch SOUL.md, improve capture guidance in skill

10d4b81

fix(hermes): flush buffered turns on reinit — gateway reuses provider without calling on_session_end

123561c

feat(hermes): periodic retain by turn count (every 5) and time (every 2min)

81f05c1

feat: add ingest-document command for direct content upload

123022b

fix(hermes): retain on every turn — gateway creates new provider instances, buffer is lost

0fcd0fd

feat(hermes): use append mode — retain each turn immediately, no buffering

15eb7cf

fix: make page ID a required positional argument on create

a350415

fix(skill): tell agent to never summarize before ingesting, use -f for large content

a50c761

nicoloboschi wants to merge 59 commits into main from feat/agent-procedural-memory
