feat: hindsight-agent CLI + self-learning agent skills#1191
Draft
nicoloboschi wants to merge 59 commits intomainfrom
Draft
feat: hindsight-agent CLI + self-learning agent skills#1191nicoloboschi wants to merge 59 commits intomainfrom
nicoloboschi wants to merge 59 commits intomainfrom
Conversation
New skill that gives the agent a `memory/` directory in its workspace where it creates and maintains markdown files — one per topic — as a growing wiki of everything it learned across sessions. Key design choices: - Pure files, no external dependency. The agent is the only writer. - One file per topic (user preferences, procedures, setup facts, etc.) - Every fact must have an `## Evidence` section with dated entries citing the conversation/event that established it. - Updates are in-place rewrites (current state of knowledge), not append-only logs. Superseded evidence is struck through, not deleted. - Agent reads relevant memory files at the start of each task and treats them as ground truth unless the user contradicts them. - Write-after-act: the agent finishes the user's task first, then updates memory. No mid-task interruptions for note-taking. This is intentionally independent of Hindsight — it's the agent's own procedural memory, self-organized, with built-in provenance via the evidence trail. Can coexist with Hindsight-backed skills (which handle semantic recall / consolidation / mental models) or stand alone.
Knowledge files track what the agent knows (preferences, rules, setup). Activity logs track what the agent did (feed items delivered, sweep results, user reactions). Append-only, dated entries, pruned at ~30 days. The agent reads the log before each task run and skips items it already delivered. Solves the "showed the same headline twice" problem and gives the agent a "what did I show you yesterday" answer.
…lf-review Four additions to push the file-based approach to its ceiling: 1. Git: memory dir is a git repo, auto-committed after every write. Full diff history, rollback, blame — answers "when did this change" without relying on the evidence section alone. 2. Index file: _index.md maintained by the agent, one line per file with a short description. Agent reads the index first, then only the relevant files — scales retrieval to ~100 files before breaking. 3. Strict post-response writes: the skill now mandates respond first, write memory second, commit third. Memory updates are a visible post-script the user can see, not a silent side-effect or a blocker. 4. Periodic self-review: every ~10 sessions (or on user request) the agent reads ALL files and does a consolidation pass — resolve contradictions, merge duplicates, split overgrown files, prune stale log entries, flag facts without evidence. Committed as a single "memory review" git commit. Also: evidence supersession now shown inline with ~~strikethrough~~ AND preserved in git history, so both quick-scan and deep-dig work.
…ite order Incorporates the agent's own feedback on where the skill was ambiguous: - If _index.md is missing but files exist, the agent MUST create it during the post-response phase of that same turn — not "later" or "next session". Leaving the directory without an index is a bug. - Duplicate topic files (e.g. preferences.md + news-feed-preferences.md) are treated as hygiene errors: merge immediately during post-response, don't wait for the periodic self-review. - Write order now has 4 explicit numbered steps: 1. Respond (user sees their answer) 2. Repair (missing index, duplicates, name normalization) 3. Update (knowledge files + activity logs) 4. Commit (git) - Clarified that structural reads (checking if _index.md exists) are allowed at any time, but all writes (repairs included) happen after the user's response is delivered.
The agent knows the write rules but drops the post-response steps because the LLM's natural stopping point is after the response. Adding more prose to the "write after responding" section doesn't help — the agent already read and agreed to the rule, then forgot. Fix: force the agent to print a visible checklist as the LAST thing it outputs every turn: 📝 Memory: [wrote: <files> | logged: <yes/no> | committed: <yes/no>] If the agent completed a task run and the checklist says "logged: no", it has a bug and must fix it before ending the turn. The checklist is a self-check that makes skipping the write step visibly wrong — both to the agent (it has to fill in the fields) and to the user (they can see if logging was missed).
New `knowledge_bases` table + KB CRUD in engine + HTTP endpoints + `kb_id` filter on list_mental_models. See commit body for details.
- Wire kb_id through create_mental_model engine method + HTTP endpoint - Include kb_id in _row_to_mental_model output - Include kb_id in list_mental_models SELECT - Add scripts/hindsight-mount: renders a bank's mental models (optionally filtered by KB) as local markdown files at ~/.agent-knowledge/<bank>/ with _index.md for browsing. Agent reads files with normal tools.
Fix NameError in KB HTTP endpoints — was using bare `engine` instead of `app.state.memory` (copy-paste error from the endpoint template). New skill: `agent-knowledge` — the Hindsight-backed read-only counterpart to `agent-memory`. Agent mounts its KB as local files via `hindsight-mount`, browses with normal tools, never writes. The system handles all knowledge maintenance through the retain → consolidation → KB update → MM refresh pipeline.
- Fix consolidator: use `llm_config.call()` instead of nonexistent `call_llm` import — the knowledge_base_update was crashing with ImportError on every consolidation run - Pipeline tests: handle `no_new_memories` status (background worker may process memories before the test's explicit consolidation call) - FINDINGS.md: document the current approach (KB + direct CLI reads), architecture diagram, what's implemented vs not, comparison table
Drop auto-create from the KB pipeline. The agent decides what pages to create (same judgment it used with local files), the system keeps them current via consolidation + refresh_after_consolidation. The agent calls `mental-model create` with a good source_query when it discovers a recurring topic. The system never creates pages on its own — it only refreshes existing ones. This fixes the junk-page problem: a cheap LLM reading decontextualized observations can't distinguish signal from noise. The agent can, because it has the full conversation context.
- hindsight-agent: Python CLI for agent scaffolding and runtime - setup: one-shot onboarding (bank, KB, skill, harness config, content ingestion) - retain: pipe content to Hindsight (always async) - pages: list/get/create/update/delete knowledge pages - recall: search memories for ad-hoc research - documents: list retained reference docs - --template: import bank template at setup - --content: ingest directory of reference docs at setup - OpenClaw plugin: lightweight retain plugin - Reads ~/.hindsight-agent/config.json for bank/URL resolution - Filters to user/assistant text only - Retains async via Hindsight API - Control plane: Knowledge Base UI - KB selector in mental models view - KB badge on cards, KB field in create dialog - kb_id in MentalModelResponse and BankTemplateMentalModel - agent-knowledge skill: recall, documents, source_query patterns - Remove broken _run_knowledge_base_updates reference from consolidator
KB added complexity without value — each agent already has its own bank, which provides the isolation boundary. The bank IS the grouping. Removed from: - API: KB CRUD endpoints, kb_id on MentalModelResponse, CreateMentalModelRequest, BankTemplateMentalModel, list_mental_models kb filter - Engine: KB CRUD methods, _row_to_knowledge_base, kb_id on create/list MM, _run_knowledge_base_updates consolidation step - Control plane: KB selector, KB badge, KB in create dialog, listKnowledgeBases, kb route proxy - hindsight-agent: kb_id from config/api/setup/pages/template Migration and test files left in place (DB will be erased).
…o templates - Remove KB migration, tests, Rust CLI kb command and types - Reset Rust CLI to main (only changes were KB-related) - Remove agent-memory skill (superseded by agent-knowledge) - Remove hindsight-mount script (superseded by direct CLI reads) - Remove DEMO.md and marketing-seo templates (moved to nicolo-agents repo)
- Add --api-token to setup (also reads HINDSIGHT_API_TOKEN env var) - Store api_token in agent config, pass to all API calls (Bearer auth) - OpenClaw plugin reads and passes token from config - Sync skills/agent-knowledge/SKILL.md with hindsight-agent/skill/SKILL.md
… hindsight-agent/skill/)
…s for self-learning agents Technical draft covering: unreliable writer problem, capture/synthesis separation, source_query abstraction, comparison with file-based memory, pipeline auto-creation, and Memento-Skills. References: Karpathy LLM Wiki, MemGPT, Reflexion, Voyager, CoALA.
…swers Comprehensive reviewer anticipation: empirical methodology gaps, missing experiments, failure modes, architectural positions, related work (Generative Agents, MemoryBank, A-MEM, practitioner tools), framing/positioning, deployment data. Each question has a drafted answer based on our actual experience building this.
Resolved in PAPER_DRAFT.md: - Added Generative Agents [8] as key related work with explicit comparison - Added MemoryBank [9], A-MEM [10], practitioner tools (mem0, Letta, Zep) - Qualified "100% capture" as "deterministic for completed sessions" - Added failure mode analysis (Section 5.6): overlapping pages, contradictions, bad queries, adversarial content, unbounded accumulation - Added privacy/deletion discussion (Section 5.7) - Added cost analysis (Section 5.8) - Added page discovery, versioning, pages vs recall sections - Softened convergence claim to "empirically observed" - Softened steerable claim to "observed in practice, formal analysis needed" - Added thesis statement to introduction - Standardized terminology: "synthesis query" (not source_query), "knowledge page" - Sharpened "is this a new paradigm" argument (Section 5.5) - Added production anecdote to Section 5.3 - Expanded limitations to 9 items with specific future work Remaining in OPEN_QUESTIONS.md (12 items, all need experiments/data): - Hard blockers: end-to-end quality benchmark, cross-session transfer numbers - Should do: formalized write reliability, extraction faithfulness, real demo data - Can acknowledge: phrasing stability, delta vs full, scaling, latency distribution
- New memory provider plugin at plugin/hermes/ implementing MemoryProvider ABC - Buffers turns via sync_turn, retains on session end (async) - Reads ~/.hindsight-agent/config.json for bank/URL/token - No tools or prefetch — skill handles reads via CLI - setup --harness hermes: installs skill to ~/.hermes/skills/, copies plugin to ~/.hermes/plugins/hindsight-agent/ - Updated README with Hermes plugin docs
- Plugin name uses underscore (hindsight_agent) to match Hermes conventions - Install to ~/.hermes/hermes-agent/plugins/memory/ (not ~/.hermes/plugins/) - Fallback agent resolution: if agent_identity doesn't match, find any hermes-harness agent in config - Auto-set memory.provider via hermes config set - Tested end-to-end: setup → chat → retain → recall works
- New 'list' command shows all configured agents - Skill tells agent how to find its own ID (baked-in → profile name → list) - Fixes shared-skill problem in Hermes multi-profile setups
… without calling on_session_end
…ances, buffer is lost
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
hindsight-agent: New Python CLI for agent scaffolding and runtime. One command (
setup) creates the Hindsight bank, knowledge pages, ingests reference docs, configures the harness agent, and installs the skill. Runtime commands (pages,recall,documents,retain) let the agent and plugin interact with Hindsight using just an agent ID — no bank/URL/KB knowledge needed.OpenClaw plugin: Lightweight retain plugin that reads
~/.hindsight-agent/config.jsonfor bank resolution, filters to user/assistant text, and retains async. Nochild_process, no complex routing logic.agent-knowledge skill: Updated with recall, documents, source_query patterns, and delta mode. Auto-loaded at session startup via AGENTS.md patch.
Control plane UI: KB selector in mental models view, KB badge on cards, KB field in create dialog,
kb_idin API responses and bank templates.Bank templates:
kb_idsupport onBankTemplateMentalModelso templates can pre-populate KB-scoped pages.Architecture
The agent decides what to track (page creation). The system handles capture (plugin) and synthesis (consolidation + refresh).
Test plan
hindsight-agent setupwith--templateand--contentcreates bank, KB, pages, ingests docshindsight-agent pages list/create/update/deleteworkshindsight-agent recallreturns relevant memorieskb_idassigns pages to KB on import