feat: hindsight-agent CLI + self-learning agent skills #1191

Draft
nicoloboschi wants to merge 59 commits into main from
feat/agent-procedural-memory

Conversation

@nicoloboschi
Collaborator

Summary

  • hindsight-agent: New Python CLI for agent scaffolding and runtime. One command (setup) creates the Hindsight bank, knowledge pages, ingests reference docs, configures the harness agent, and installs the skill. Runtime commands (pages, recall, documents, retain) let the agent and plugin interact with Hindsight using just an agent ID — no bank/URL/KB knowledge needed.

  • OpenClaw plugin: Lightweight retain plugin that reads ~/.hindsight-agent/config.json for bank resolution, filters to user/assistant text, and retains async. No child_process, no complex routing logic.

  • agent-knowledge skill: Updated with recall, documents, source_query patterns, and delta mode. Auto-loaded at session startup via AGENTS.md patch.

  • Control plane UI: KB selector in mental models view, KB badge on cards, KB field in create dialog, kb_id in API responses and bank templates.

  • Bank templates: kb_id support on BankTemplateMentalModel so templates can pre-populate KB-scoped pages.

Architecture

[hindsight-agent setup] → creates bank + KB + pages + ingests docs + configures harness
[OpenClaw plugin]       → retains conversations (reads config.json → async POST)
[agent skill]           → reads pages at startup, creates pages, recalls memories
[consolidation]         → extracts observations → refreshes pages via source_query (delta)

The agent decides what to track (page creation). The system handles capture (plugin) and synthesis (consolidation + refresh).
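
As a rough illustration of the capture step, the sketch below filters a transcript down to user and assistant text before it would be retained. It is written in Python for readability only. The message shape (`role`, `content`) and the name `retainable_text` are assumptions for this example, not the plugin's actual code.

```python
def retainable_text(messages):
    """Keep only user/assistant messages that carry non-empty plain text.

    Assumed message shape: {"role": str, "content": str | other}.
    """
    kept = []
    for msg in messages:
        if msg.get("role") not in ("user", "assistant"):
            continue  # drop system, tool, and any other roles
        content = msg.get("content")
        if isinstance(content, str) and content.strip():
            kept.append(f"{msg['role']}: {content.strip()}")
    return "\n".join(kept)
```

Anything that is not plain text (tool calls, structured blocks) is dropped here; whether the real plugin flattens such content instead is not stated in this PR.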

Test plan

  • hindsight-agent setup with --template and --content creates bank, KB, pages, ingests docs
  • hindsight-agent pages list/create/update/delete works
  • hindsight-agent recall returns relevant memories
  • OpenClaw plugin retains conversations async
  • Control plane KB selector filters mental models
  • Bank template kb_id assigns pages to KB on import
  • Agent reads skill at session startup and executes mandatory sequence

New skill that gives the agent a `memory/` directory in its workspace
where it creates and maintains markdown files — one per topic — as a
growing wiki of everything it learned across sessions.

Key design choices:
- Pure files, no external dependency. The agent is the only writer.
- One file per topic (user preferences, procedures, setup facts, etc.)
- Every fact must have an `## Evidence` section with dated entries
  citing the conversation/event that established it.
- Updates are in-place rewrites (current state of knowledge), not
  append-only logs. Superseded evidence is struck through, not deleted.
- Agent reads relevant memory files at the start of each task and
  treats them as ground truth unless the user contradicts them.
- Write-after-act: the agent finishes the user's task first, then
  updates memory. No mid-task interruptions for note-taking.
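
A minimal sketch of how the evidence rule could be checked mechanically, for example during a review pass. The entry format (`- YYYY-MM-DD: ...`) and the function name `has_evidence` are assumptions; the skill itself only requires dated entries under `## Evidence`. Struck-through (superseded) entries are deliberately not counted as live evidence here.

```python
import re

# Assumed entry format: "- 2025-03-01: what established the fact"
DATED_ENTRY = re.compile(r"^- \d{4}-\d{2}-\d{2}\b")


def has_evidence(markdown):
    """Return True if the file has an Evidence section with a live dated entry."""
    in_evidence = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering or leaving the Evidence section
            in_evidence = line.strip() == "## Evidence"
        elif in_evidence and DATED_ENTRY.match(line):
            return True
    return False
```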

This is intentionally independent of Hindsight — it's the agent's own
procedural memory, self-organized, with built-in provenance via the
evidence trail. Can coexist with Hindsight-backed skills (which handle
semantic recall / consolidation / mental models) or stand alone.

Knowledge files track what the agent knows (preferences, rules, setup).
Activity logs track what the agent did (feed items delivered, sweep
results, user reactions). Append-only, dated entries, pruned at ~30 days.

The agent reads the log before each task run and skips items it already
delivered. Solves the "showed the same headline twice" problem and gives
the agent a "what did I show you yesterday" answer.
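
The read-before-run check can be sketched as follows. The log line format (`- YYYY-MM-DD | item-id | note`) and all function names are hypothetical, and the 30-day cutoff is taken from the "~30 days" above.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # "~30 days" in the commit message; exact cutoff is an assumption


def parse_log(text):
    """Parse lines of the assumed form '- YYYY-MM-DD | item-id | note'."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("- "):
            continue
        day, item_id, note = [part.strip() for part in line[2:].split("|", 2)]
        entries.append((date.fromisoformat(day), item_id, note))
    return entries


def prune(entries, today):
    """Drop entries older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [entry for entry in entries if entry[0] >= cutoff]


def undelivered(candidates, entries):
    """Return candidate item ids that do not appear in the log yet."""
    seen = {item_id for _, item_id, _ in entries}
    return [c for c in candidates if c not in seen]
```

Note that pruning also shortens the dedupe window: an item delivered more than 30 days ago could be shown again.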
…lf-review

Four additions to push the file-based approach to its ceiling:

1. Git: memory dir is a git repo, auto-committed after every write.
   Full diff history, rollback, blame — answers "when did this change"
   without relying on the evidence section alone.

2. Index file: _index.md maintained by the agent, one line per file
   with a short description. Agent reads the index first, then only
   the relevant files — scales retrieval to ~100 files before breaking.

3. Strict post-response writes: the skill now mandates respond first,
   write memory second, commit third. Memory updates are a visible
   post-script the user can see, not a silent side-effect or a blocker.

4. Periodic self-review: every ~10 sessions (or on user request) the
   agent reads ALL files and does a consolidation pass — resolve
   contradictions, merge duplicates, split overgrown files, prune
   stale log entries, flag facts without evidence. Committed as a
   single "memory review" git commit.

Also: evidence supersession now shown inline with ~~strikethrough~~ AND
preserved in git history, so both quick-scan and deep-dig work.
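
For reference, an index of the kind described in point 2 could be regenerated with something like the sketch below. In the skill the agent maintains `_index.md` itself; this helper, its name `build_index`, and the choice of the first heading as the description are assumptions for illustration.

```python
from pathlib import Path


def describe(path):
    """Use the first Markdown heading as the description, else the file stem."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return path.stem


def build_index(memory_dir):
    """Write _index.md with one line per memory file and return its path."""
    memory_dir = Path(memory_dir)
    lines = ["# Memory index", ""]
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == "_index.md":
            continue  # the index does not list itself
        lines.append(f"- {path.name}: {describe(path)}")
    index = memory_dir / "_index.md"
    index.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index
```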
…ite order

Incorporates the agent's own feedback on where the skill was ambiguous:

- If _index.md is missing but files exist, the agent MUST create it
  during the post-response phase of that same turn — not "later" or
  "next session". Leaving the directory without an index is a bug.

- Duplicate topic files (e.g. preferences.md + news-feed-preferences.md)
  are treated as hygiene errors: merge immediately during post-response,
  don't wait for the periodic self-review.

- Write order now has 4 explicit numbered steps:
  1. Respond (user sees their answer)
  2. Repair (missing index, duplicates, name normalization)
  3. Update (knowledge files + activity logs)
  4. Commit (git)

- Clarified that structural reads (checking if _index.md exists) are
  allowed at any time, but all writes (repairs included) happen after
  the user's response is delivered.

The agent knows the write rules but drops the post-response steps
because the LLM's natural stopping point is after the response.
Adding more prose to the "write after responding" section doesn't
help — the agent already read and agreed to the rule, then forgot.

Fix: force the agent to print a visible checklist as the LAST thing
it outputs every turn:

  📝 Memory: [wrote: <files> | logged: <yes/no> | committed: <yes/no>]

If the agent completed a task run and the checklist says "logged: no",
it has a bug and must fix it before ending the turn. The checklist is
a self-check that makes skipping the write step visibly wrong — both
to the agent (it has to fill in the fields) and to the user (they can
see if logging was missed).
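
The checklist line and the "logged: no" rule can be expressed as a small sketch. The function names and the `none` placeholder for an empty file list are assumptions; only the line format comes from the commit message.

```python
def _yes_no(flag):
    return "yes" if flag else "no"


def memory_checklist(wrote, logged, committed):
    """Format the end-of-turn checklist line."""
    files = ", ".join(wrote) if wrote else "none"  # "none" is an assumed placeholder
    return (
        f"📝 Memory: [wrote: {files} | logged: {_yes_no(logged)} "
        f"| committed: {_yes_no(committed)}]"
    )


def turn_is_complete(task_ran, logged):
    """A completed task run with 'logged: no' counts as a bug, so the turn is not done."""
    return not (task_ran and not logged)
```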

New `knowledge_bases` table + KB CRUD in engine + HTTP endpoints +
`kb_id` filter on list_mental_models. See commit body for details.

- Wire kb_id through create_mental_model engine method + HTTP endpoint
- Include kb_id in _row_to_mental_model output
- Include kb_id in list_mental_models SELECT
- Add scripts/hindsight-mount: renders a bank's mental models (optionally
  filtered by KB) as local markdown files at ~/.agent-knowledge/<bank>/
  with _index.md for browsing. Agent reads files with normal tools.

Fix NameError in KB HTTP endpoints — was using bare `engine` instead
of `app.state.memory` (copy-paste error from the endpoint template).

New skill: `agent-knowledge` — the Hindsight-backed read-only
counterpart to `agent-memory`. Agent mounts its KB as local files
via `hindsight-mount`, browses with normal tools, never writes.
The system handles all knowledge maintenance through the
retain → consolidation → KB update → MM refresh pipeline.

- Fix consolidator: use `llm_config.call()` instead of nonexistent
  `call_llm` import — the knowledge_base_update was crashing with
  ImportError on every consolidation run
- Pipeline tests: handle `no_new_memories` status (background worker
  may process memories before the test's explicit consolidation call)
- FINDINGS.md: document the current approach (KB + direct CLI reads),
  architecture diagram, what's implemented vs not, comparison table

Drop auto-create from the KB pipeline. The agent decides what pages
to create (same judgment it used with local files), the system keeps
them current via consolidation + refresh_after_consolidation.

The agent calls `mental-model create` with a good source_query when
it discovers a recurring topic. The system never creates pages on its
own — it only refreshes existing ones.

This fixes the junk-page problem: a cheap LLM reading decontextualized
observations can't distinguish signal from noise. The agent can,
because it has the full conversation context.

- hindsight-agent: Python CLI for agent scaffolding and runtime
  - setup: one-shot onboarding (bank, KB, skill, harness config, content ingestion)
  - retain: pipe content to Hindsight (always async)
  - pages: list/get/create/update/delete knowledge pages
  - recall: search memories for ad-hoc research
  - documents: list retained reference docs
  - --template: import bank template at setup
  - --content: ingest directory of reference docs at setup

- OpenClaw plugin: lightweight retain plugin
  - Reads ~/.hindsight-agent/config.json for bank/URL resolution
  - Filters to user/assistant text only
  - Retains async via Hindsight API

- Control plane: Knowledge Base UI
  - KB selector in mental models view
  - KB badge on cards, KB field in create dialog
  - kb_id in MentalModelResponse and BankTemplateMentalModel

- agent-knowledge skill: recall, documents, source_query patterns

- Remove broken _run_knowledge_base_updates reference from consolidator

KB added complexity without value — each agent already has its own bank,
which provides the isolation boundary. The bank IS the grouping.

Removed from:
- API: KB CRUD endpoints, kb_id on MentalModelResponse, CreateMentalModelRequest,
  BankTemplateMentalModel, list_mental_models kb filter
- Engine: KB CRUD methods, _row_to_knowledge_base, kb_id on create/list MM,
  _run_knowledge_base_updates consolidation step
- Control plane: KB selector, KB badge, KB in create dialog, listKnowledgeBases,
  kb route proxy
- hindsight-agent: kb_id from config/api/setup/pages/template

Migration and test files left in place (DB will be erased).
…o templates

- Remove KB migration, tests, Rust CLI kb command and types
- Reset Rust CLI to main (only changes were KB-related)
- Remove agent-memory skill (superseded by agent-knowledge)
- Remove hindsight-mount script (superseded by direct CLI reads)
- Remove DEMO.md and marketing-seo templates (moved to nicolo-agents repo)
- Add --api-token to setup (also reads HINDSIGHT_API_TOKEN env var)
- Store api_token in agent config, pass to all API calls (Bearer auth)
- OpenClaw plugin reads and passes token from config
- Sync skills/agent-knowledge/SKILL.md with hindsight-agent/skill/SKILL.md
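
A sketch of the token handling described above. `HINDSIGHT_API_TOKEN` and Bearer auth come from the commit message; the function names and the precedence (explicit `--api-token` value wins over the environment variable) are assumptions.

```python
import os


def resolve_api_token(cli_token=None, env=None):
    """Pick the token from the CLI flag first, then HINDSIGHT_API_TOKEN (assumed order)."""
    env = os.environ if env is None else env
    return cli_token or env.get("HINDSIGHT_API_TOKEN") or None


def auth_headers(token):
    """Build the Authorization header; no header when no token is configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}
```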
…s for self-learning agents

Technical draft covering: unreliable writer problem, capture/synthesis separation,
source_query abstraction, comparison with file-based memory, pipeline auto-creation,
and Memento-Skills. References: Karpathy LLM Wiki, MemGPT, Reflexion, Voyager, CoALA.
…swers

Comprehensive reviewer anticipation: empirical methodology gaps, missing
experiments, failure modes, architectural positions, related work (Generative
Agents, MemoryBank, A-MEM, practitioner tools), framing/positioning,
deployment data. Each question has a drafted answer based on our actual
experience building this.

Resolved in PAPER_DRAFT.md:
- Added Generative Agents [8] as key related work with explicit comparison
- Added MemoryBank [9], A-MEM [10], practitioner tools (mem0, Letta, Zep)
- Qualified "100% capture" as "deterministic for completed sessions"
- Added failure mode analysis (Section 5.6): overlapping pages, contradictions,
  bad queries, adversarial content, unbounded accumulation
- Added privacy/deletion discussion (Section 5.7)
- Added cost analysis (Section 5.8)
- Added page discovery, versioning, pages vs recall sections
- Softened convergence claim to "empirically observed"
- Softened steerable claim to "observed in practice, formal analysis needed"
- Added thesis statement to introduction
- Standardized terminology: "synthesis query" (not source_query), "knowledge page"
- Sharpened "is this a new paradigm" argument (Section 5.5)
- Added production anecdote to Section 5.3
- Expanded limitations to 9 items with specific future work

Remaining in OPEN_QUESTIONS.md (12 items, all need experiments/data):
- Hard blockers: end-to-end quality benchmark, cross-session transfer numbers
- Should do: formalized write reliability, extraction faithfulness, real demo data
- Can acknowledge: phrasing stability, delta vs full, scaling, latency distribution

- New memory provider plugin at plugin/hermes/ implementing MemoryProvider ABC
  - Buffers turns via sync_turn, retains on session end (async)
  - Reads ~/.hindsight-agent/config.json for bank/URL/token
  - No tools or prefetch — skill handles reads via CLI
- setup --harness hermes: installs skill to ~/.hermes/skills/,
  copies plugin to ~/.hermes/plugins/hindsight-agent/
- Updated README with Hermes plugin docs

- Plugin name uses underscore (hindsight_agent) to match Hermes conventions
- Install to ~/.hermes/hermes-agent/plugins/memory/ (not ~/.hermes/plugins/)
- Fallback agent resolution: if agent_identity doesn't match, find any
  hermes-harness agent in config
- Auto-set memory.provider via hermes config set
- Tested end-to-end: setup → chat → retain → recall works
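
The fallback agent resolution can be sketched roughly as below. The config shape (a mapping from agent ID to an entry with a `harness` field) is hypothetical and only stands in for whatever `~/.hindsight-agent/config.json` actually stores.

```python
def resolve_agent(agents, agent_identity):
    """Return the matching agent ID, else any hermes-harness agent, else None.

    `agents` is an assumed mapping: {agent_id: {"harness": "hermes" | ...}}.
    """
    if agent_identity in agents:
        return agent_identity
    for agent_id, entry in agents.items():
        if entry.get("harness") == "hermes":
            return agent_id  # first hermes agent found in config order
    return None
```

With several Hermes agents configured, "find any" means the result depends on config order, which is worth keeping in mind for multi-profile setups.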

- New 'list' command shows all configured agents
- Skill tells agent how to find its own ID (baked-in → profile name → list)
- Fixes shared-skill problem in Hermes multi-profile setups