From 3ae80e48b7a64a208c9b9d727fdf6f313e281949 Mon Sep 17 00:00:00 2001 From: Ruben de Smet Date: Wed, 20 May 2026 01:44:01 +0200 Subject: [PATCH 1/5] =?UTF-8?q?v4-a:=20mem::lineage=20=E2=80=94=20concept?= =?UTF-8?q?=20lineage=20retrieval=20primitive?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Returns chronologically-sorted hits across observation/memory/lesson/ summary channels — answers "when did this term enter the corpus and what surrounded it?". Includes BM25 sweep over obs+memory, substring scan for lessons/summaries, optional adjacent-turn enrichment, and optional graph-neighbor attachment. Gap-2 fix bundled: BM25 sweep cap raised from min(limit*4, 500) to min(limit*20, 5000) so deep in-session refs in large jsonl-imported sessions (10k+ obs) still rank into the channel-filtered top N. Wires: - src/functions/lineage.ts (new) - mem::lineage MCP tool in CORE_TOOLS - POST /agentmemory/lineage REST endpoint - AuditEntry operation: + "query" - LineageChannel / TimelineItem / LineageGraphNeighbor / LineageResult types - design + test-case docs under docs/plans/ Counts bumped to keep README/AGENTS/boot message/test in sync: CORE_TOOLS 12 → 13, total MCP tools 51 → 52, REST endpoints 121 → 122. Co-Authored-By: Claude Opus 4.7 (1M context) --- AGENTS.md | 4 +- README.md | 8 +- docs/plans/v4-lineage-design.md | 277 +++++++++++ .../v4-lineage-test-case-careful-generator.md | 200 ++++++++ src/functions/lineage.ts | 455 ++++++++++++++++++ src/index.ts | 4 +- src/mcp/server.ts | 35 ++ src/mcp/tools-registry.ts | 33 ++ src/triggers/api.ts | 70 ++- src/types.ts | 55 ++- test/mcp-standalone.test.ts | 4 +- 11 files changed, 1134 insertions(+), 11 deletions(-) create mode 100644 docs/plans/v4-lineage-design.md create mode 100644 docs/plans/v4-lineage-test-case-careful-generator.md create mode 100644 src/functions/lineage.ts diff --git a/AGENTS.md b/AGENTS.md index ebcf3584..6032e3a7 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -111,8 +111,8 @@ Hook scripts in `src/hooks/` are standalone Node.js scripts (no iii-sdk import). ## Current Stats (v0.9.16) -- 53 MCP tools (8 visible by default, `AGENTMEMORY_TOOLS=all` for all) -- 124 REST endpoints +- 54 MCP tools (8 visible by default, `AGENTMEMORY_TOOLS=all` for all) +- 125 REST endpoints - 6 MCP resources, 3 MCP prompts - 12 hooks, 4 skills - 50+ iii functions diff --git a/README.md b/README.md index 840a75c4..c3775a11 100644 --- a/README.md +++ b/README.md @@ -43,7 +43,7 @@

95.2% retrieval R@5 92% fewer tokens - 53 MCP tools + 54 MCP tools 12 auto hooks 0 external DBs 950+ tests passing @@ -408,7 +408,7 @@ Implementation details live in `src/cli.ts` (see `runUpgrade` around the `src/cl ### Claude Code (one block, paste it) ``` -Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` — the plugin registers all 12 hooks, 4 skills, AND auto-wires the `@agentmemory/mcp` stdio server via its `.mcp.json`, so you get 53 MCP tools (memory_smart_search, memory_save, memory_sessions, memory_governance_delete, etc.) without any extra config step. Verify with `curl http://localhost:3111/agentmemory/health`. The real-time viewer is at http://localhost:3113. +Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` — the plugin registers all 12 hooks, 4 skills, AND auto-wires the `@agentmemory/mcp` stdio server via its `.mcp.json`, so you get 54 MCP tools (memory_smart_search, memory_save, memory_sessions, memory_governance_delete, etc.) without any extra config step. Verify with `curl http://localhost:3111/agentmemory/health`. The real-time viewer is at http://localhost:3113. ``` ### Codex CLI (Codex plugin platform) @@ -799,7 +799,7 @@ npm install @xenova/transformers

MCP Server

-53 tools, 6 resources, 3 prompts, and 4 skills — the most comprehensive MCP memory toolkit for any agent. +54 tools, 6 resources, 3 prompts, and 4 skills — the most comprehensive MCP memory toolkit for any agent. > **MCP shim vs full server:** the published `@agentmemory/mcp` package is a thin shim. It exposes the full 51-tool surface **only when it can reach a running agentmemory server** via `AGENTMEMORY_URL` (proxy mode). With no server reachable, the shim falls back to a 7-tool local set (`memory_save`, `memory_recall`, `memory_smart_search`, `memory_sessions`, `memory_export`, `memory_audit`, `memory_governance_delete`). The `AGENTMEMORY_TOOLS=core|all` env var is a *server-side* flag — setting it in the shim's `env` block has no effect. If you see only 7 tools in Cursor / OpenCode / Gemini CLI, start `npx @agentmemory/agentmemory` (or the Docker stack) and set `AGENTMEMORY_URL=http://localhost:3111`. @@ -1197,7 +1197,7 @@ Create `~/.agentmemory/.env`:

API

-124 endpoints on port `3111`. The REST API binds to `127.0.0.1` by default. Protected endpoints require `Authorization: Bearer ` when `AGENTMEMORY_SECRET` is set, and mesh sync endpoints require `AGENTMEMORY_SECRET` on both peers. +125 endpoints on port `3111`. The REST API binds to `127.0.0.1` by default. Protected endpoints require `Authorization: Bearer ` when `AGENTMEMORY_SECRET` is set, and mesh sync endpoints require `AGENTMEMORY_SECRET` on both peers.
Key endpoints diff --git a/docs/plans/v4-lineage-design.md b/docs/plans/v4-lineage-design.md new file mode 100644 index 00000000..f5b3ce6d --- /dev/null +++ b/docs/plans/v4-lineage-design.md @@ -0,0 +1,277 @@ +# v4-A: `mem::lineage` — concept-lineage retrieval primitive + +## Problem + +Smart-search ranks the **lesson** channel over the **memory** and **observation** +channels, so queries that target a single inline phrase in a large doc +(or a turn from a specific past session) are silently dropped from the +top-K. The data is in the corpus; the *retrieval shape* is missing. + +Concrete miss we hit: +- Query: *"who is the careful generator?"* +- Truth: `docs/architecture.md:308` defines it as Tier-2 = Qwen3.6-35B-A3B-FP8, + and the term was first written into `config/config.yaml` at + `2026-04-26T11:39:45` in session `05988a74-...`. +- Smart-search returned 8 unrelated session-handoff lessons (top score 0.726). +- Plain `/agentmemory/search` (BM25-only) found the right hits cleanly + (score 11–14) — proving the data is there and BM25 indexes it. + +The gap is a missing **conceptual-lineage** primitive: *"when did this term +enter our shared vocabulary, where, and what surrounded it?"*. That's a +different query shape from relevance-ranked retrieval — it wants +**chronological order** + **session context** + **adjacent turns**. + +## Function: `mem::lineage` + +### Request + +```json +POST /agentmemory/lineage +{ + "query": "careful generator", + "limit": 50, + "since": "2026-04-01T00:00:00Z", + "until": "2026-05-20T00:00:00Z", + "channels": ["observation", "memory", "lesson", "summary"], + "includeAdjacentTurns": true, + "includeGraph": false, + "order": "asc" +} +``` + +Field semantics: + +| field | type | default | meaning | +|---|---|---|---| +| `query` | string (required) | — | phrase/terms to find. Case-insensitive substring match for lessons/summaries; existing BM25 index handles observations/memories. | +| `limit` | int | 50 | max items in the returned timeline (after merge + sort) | +| `since` / `until` | ISO 8601 | unbounded | filter on `createdAt` / `timestamp` | +| `channels` | array | all four | which content types to search | +| `includeAdjacentTurns` | bool | `true` | for observation hits, attach the previous user prompt + previous assistant turn from the same session | +| `includeGraph` | bool | `false` | attach immediate graph-edge neighbors of nodes whose `name` matches the query | +| `order` | `"asc"` \| `"desc"` | `"asc"` | chronological direction (asc = oldest first, lineage-style) | + +### Response + +```json +{ + "query": "careful generator", + "firstMention": { + "timestamp": "2026-04-26T11:39:45.123Z", + "channel": "observation", + "sessionId": "05988a74-d1f1-42a1-9cd4-53b4db205ff3", + "project": "gitops-assistant" + }, + "timeline": [ + { + "timestamp": "2026-04-26T11:39:45.123Z", + "channel": "observation", + "id": "obs_mp...", + "sessionId": "05988a74-d1f1-42a1-9cd4-53b4db205ff3", + "project": "gitops-assistant", + "title": "post_tool_use", + "type": "other", + "snippet": "...Tier 2 — careful generator (Qwen3.6-35B-A3B-FP8 on vast pod)\n analyse_manifest: vast-qwen...", + "score": 12.4, + "session": { + "id": "05988a74-...", + "project": "gitops-assistant", + "startedAt": "2026-04-26T09:06:36.534Z", + "firstPrompt": "I need an implementation plan for wiring..." + }, + "adjacentTurns": { + "previousUserPrompt": "...", + "previousAssistantSummary": "..." + } + }, + { + "timestamp": "2026-05-19T00:36:09.232Z", + "channel": "memory", + "id": "mem_mp...", + "title": "[Repo doc] gitops-assistant: docs/architecture.md (chunk 1/1...)", + "snippet": "...# Tier 2 — careful generator\nanalyse_manifest: vast-qwen36-35b...", + "score": 7.1, + "sourceFile": "docs/architecture.md", + "memoryType": "architecture" + } + ], + "totalsByChannel": { + "observation": 12, + "memory": 3, + "lesson": 0, + "summary": 1 + }, + "graphNeighbors": [ + { + "name": "careful generator", + "type": "concept", + "edges": [ + { "kind": "uses", "neighbor": "vast-qwen36-35b", "neighborType": "library" }, + { "kind": "related_to", "neighbor": "analyse_manifest", "neighborType": "function" } + ] + } + ] +} +``` + +Notes: +- `firstMention` is the earliest item in the timeline (after filtering), + surfaced separately for convenience. +- `graphNeighbors` only present when `includeGraph: true`. +- `adjacentTurns` only present when `includeAdjacentTurns: true` AND the + channel is `observation` AND a prior turn exists in the same session. + +## Algorithm + +``` +1. Match by channel (parallel): + a) observation & memory: + - reuse the existing BM25 index from src/functions/search.ts. + Call getSearchIndex().search(query, max=200) or equivalent. + Filter by `channels` setting. + - existing index already returns timestamp + sessionId for + observations; memory entries carry createdAt + id. + b) lesson: + - kv.list(KV.lessons) + - filter: !lesson.deleted && lesson.content.toLowerCase().includes(qLower) + - ~4500 lessons; substring scan is ~10ms + c) summary: + - kv.list(KV.summaries) + - filter on .narrative substring + - ~60 records; trivial + +2. For each hit, build a TimelineItem with: + timestamp, channel, id, score (BM25 if available, else 0), + snippet (300-char window centered on first match position; + clip at content boundaries; "..." prefix/suffix elision). + +3. Apply since/until filters. + +4. Merge channels, sort by timestamp (asc by default), trim to limit. + +5. Enrichment pass: + a) Session lookup cache (Map) — populate lazily + on first obs hit needing it. + b) If includeAdjacentTurns: for each observation hit, scan + KV.observations(obs.sessionId) for the last observation with + timestamp < obs.timestamp that is type=="conversation" AND has a + userPrompt field; same for the latest assistant-side observation. + Cache per-session so multiple hits in one session share a single + KV.list call. + c) For memory hits: parse the source line from the content header + if it starts with "[Repo doc] " or "[Session handoff] ". + Regex: /^\[Repo doc\] [^:]+: ([^\s(]+)/ + +6. If includeGraph: + - kv.list(KV.graphNodes), filter by name.toLowerCase() + includes(qLower) OR exact-match of any tokenized phrase. + - For each matched node, kv.list(KV.graphEdges) filtered + by source/target == node.id; resolve neighbor node names + types. + - Attach to the top-level response, NOT per timeline item. + +7. Build firstMention from timeline[0] (after sort). + +8. Audit the call (kv recordAudit). +``` + +## Files to modify + +| file | change | +|---|---| +| `src/types.ts` | add `TimelineItem`, `LineageResult` interfaces | +| `src/functions/lineage.ts` | **new** — implements `mem::lineage` per the algorithm above | +| `src/index.ts` | register the lineage function (find where other `register*Function(sdk, kv)` calls live and add `registerLineageFunction(sdk, kv)`) | +| `src/triggers/api.ts` | add `api::lineage` HTTP wrapper + trigger registration for `POST /agentmemory/lineage` (mirror the pattern of `api::search` or `api::smart-search`) | +| `src/mcp/tools-registry.ts` | add `memory_lineage` tool entry so the MCP layer exposes it (mirror `memory_smart_search`) | + +No new env vars. No new KV namespaces. Reuses existing indexes. + +## Implementation notes & gotchas + +1. **BM25 index reuse**: `src/functions/search.ts` exports `getSearchIndex()`. + Confirm what types of entries the index holds before calling — observation + indexing happens at write time in observe.ts and remember.ts; lessons + may or may not be indexed (probably not). Either way, lesson/summary + substring-scan path handles those channels independently. + +2. **Adjacent-turn lookup**: `KV.observations(sessionId)` is a per-session + namespace. The fetch is O(n) in the session's observation count, but + we only do it once per unique sessionId in the hit set, and cache + the result. For a query that hits one big session 50 times, it's a + single list call. + +3. **Memory createdAt vs observation timestamp**: both exist as ISO strings. + Treat them uniformly for sort. CompressedObservation has `.timestamp`, + Memory has `.createdAt`. Lesson has `.createdAt`. SessionSummary has + `.createdAt`. Normalize on read. + +4. **Empty query** → return 400 with `error: "query is required"`. + +5. **No-match query** → return 200 with empty timeline, all zeros in + totalsByChannel, `firstMention: null`. + +6. **Snippet generation**: find first match position via + `content.toLowerCase().indexOf(qLower)`, take [pos-150 .. pos+150] + clipped at 0/length, prepend/append "…" if clipped. If the BM25 + index already returned a snippet, prefer that. + +7. **Tokenization for graph node match**: the query may be a phrase + ("careful generator") that doesn't appear as a single graph-node + `name`. Fallback: split query on whitespace, match nodes whose name + contains ANY token. This is best-effort; if the user wants strict + matching they should query the graph directly. + +8. **Sort stability**: when two items share a timestamp (rare but + possible), break ties by `(channel, id)` lexicographic. + +## Validation criteria + +After implementation, the subagent must verify: + +```bash +# 1. Build dist +npm run build + +# 2. Rebuild container image +docker compose -f docker/docker-compose.yml up -d --build + +# 3. Wait for /livez +curl -fsS http://localhost:3111/agentmemory/livez + +# 4. The smoke test that motivated this work: +curl -fsS -X POST http://localhost:3111/agentmemory/lineage \ + -H 'content-type: application/json' \ + -d '{"query":"careful generator","limit":30,"includeAdjacentTurns":true,"includeGraph":true}' \ + | jq + +# Expected: +# - firstMention.timestamp ≈ 2026-04-19T18:19:57Z (earliest observation hit) +# OR 2026-04-26T11:39:45Z (the config-edit observation we grep-confirmed). +# - timeline.length > 0, sorted asc by timestamp +# - At least one observation hit from session 05988a74-... +# - At least one memory hit with sourceFile == "docs/architecture.md" +# - totalsByChannel.observation >= 5 +# - totalsByChannel.memory >= 1 +# - graphNeighbors is non-null (V3-C extracted nodes from architecture.md) + +# 5. Empty-query rejection: +curl -fsS -X POST http://localhost:3111/agentmemory/lineage \ + -H 'content-type: application/json' -d '{"query":""}' -i | head -3 +# Expected: HTTP 400 + +# 6. No-match query: +curl -fsS -X POST http://localhost:3111/agentmemory/lineage \ + -H 'content-type: application/json' \ + -d '{"query":"zzz_no_such_concept_zzz"}' | jq +# Expected: timeline=[], totalsByChannel all 0, firstMention=null +``` + +## Out of scope (filed for later) + +- **Smart-search ranker tuning** (don't crowd lessons over memories). Separate + ~10-line change to `src/functions/search.ts`. Not in v4-A. +- **Graph-traversal retrieval** (find via graph edges, not text match). Bigger + design; v4-B if there's appetite. +- **Cross-session entity merging** (handoff for "careful generator" in session + A links to its first mention in session B). Requires entity-resolution + logic; v4-C+. diff --git a/docs/plans/v4-lineage-test-case-careful-generator.md b/docs/plans/v4-lineage-test-case-careful-generator.md new file mode 100644 index 00000000..2cd7b4b0 --- /dev/null +++ b/docs/plans/v4-lineage-test-case-careful-generator.md @@ -0,0 +1,200 @@ +# Test case: "Who is the careful generator?" + +A canonical regression test for agentmemory's lineage/recall capabilities. +This scenario is what motivated the `mem::lineage` design (v4-A) and +reveals the limits of smart-search + the residual gaps in v4-A itself. + +## The question + +> *"Who is the careful generator?"* + +Trivial-sounding. The right answer is a one-line lookup. But it's +secretly testing several capabilities at once. + +## What we know (out-of-band ground truth) + +**Definition.** From `docs/architecture.md:308-309` and +`docs/configuration.md:176-177`: + +``` +analyse_manifest: vast-qwen36-35b # Tier 2 — careful generator +diff_complex: vast-qwen36-35b +``` + +So **"careful generator" = Tier 2 = Qwen3.6-35B-A3B-FP8**, paired with: + +- **Tier 1 = "premium reasoning" / colloquially "the judgement" = Qwen3.5-397B** + via Together. Knows when to stop intrinsically; doesn't need bail-prompting. +- **Tier 2 = "careful generator" = Qwen3.6-35B-A3B-FP8**. Smaller, faster, + but needs explicit prompting on when to stop. + +**Provenance (user-supplied context, 2026-05-19).** The nicknames were +coined during a **benchmark session** where multiple models were pitted +against each other, qwen36 was the clear winner on the +generator-shaped tasks (`analyse_manifest`, `diff_complex`). The session +also coincided with the first exploration of serverless alternatives — +and the conclusion at the time was that nothing on serverless matched +what qwen36 offered on vast-pod hosting. + +**Earliest written trace (corpus-confirmed).** The comments were +hardened into the codebase at `2026-04-26T11:39:45.123Z` in session +`05988a74-d1f1-42a1-9cd4-53b4db205ff3` — a config edit adding the +tier-routed pipeline comments. The conversation that produced those +edits is somewhere earlier (probably mid-to-late April). + +## What this scenario tests + +A working memory system should answer each of these: + +| sub-question | shape | required capability | +|---|---|---| +| What does "careful generator" mean? | definition | direct retrieval against architecture.md memory | +| When did this term enter our vocabulary? | first-mention timestamp | chronological retrieval (lineage) | +| What was the surrounding context? | session metadata + adjacent turns | obs enrichment | +| Who's the companion concept? | related-entity traversal | graph-edge retrieval | +| Why did we pick qwen36 specifically? | rationale | summary/handoff retrieval over the benchmark session | +| Did we revisit this when serverless improved? | follow-up surface | cross-session temporal traversal | + +## Observed behavior (as of 2026-05-19 evening) + +### `mem::smart-search "who is the careful generator?"` + +Returned **8 unrelated lessons** (top score 0.726 — session-handoffs +about May 1 work that mentioned "careful" in unrelated contexts). The +[Repo doc] memory of architecture.md did not appear in either channel. + +**Diagnosis:** smart-search ranker favors the lesson channel and +crowds out memory hits. The vector channel doesn't pull a 19 KB doc +based on a single inline comment phrase. + +### `mem::search` (BM25-only) `"careful generator"` + +Returned correct hits with real signal — scores 7–14, observations ++ memories interleaved, the architecture.md memory surfaced. BM25 +proves the data is in the corpus and the index has it. + +### `mem::lineage` (v4-A initial implementation) + +Returned a populated timeline of 30 items sorted ASC: + +- **`firstMention`**: `2026-04-18T08:26:37Z`, project `observer-sessions`, + session `2d7f99c4-...` +- **Hit distribution**: observation=23, memory=71, lesson=0, summary=0 + (top 30 returned) +- **adjacentTurns** attached on 14/23 obs hits +- **graphNeighbors**: `[]` (no graph node with `name` containing "careful" + or "generator" — graph-extract was run over architecture.md content + but didn't surface the inline comment phrase as a node name) +- **Architecture.md memory hit**: present, with correct sourceFile + extracted + +**Diagnosis:** v4-A works mechanically — sorted timeline, channel +totals, enrichment, all correct. But `firstMention` is wrong: the +`observer-sessions` synthetic project (agentmemory's own meta-observer +watching primary sessions) emits records containing tokens that BM25 +matches. They time-sort to the top because they're earlier than the +actual conversations. + +The **real** first mention — the benchmark conversation — likely lives +in observations from a non-observer session. The user's recollection +places it "around when we first looked at serverless" (probably +late March / early-mid April 2026 based on related context). + +## Gaps surfaced + +1. **`mem::lineage` doesn't filter observer/agent meta-sessions** by + default — same gap that `scripts/rebuild-graph.sh` and + `emit_observations` explicitly handle. Should default-exclude + projects matching `^(observer|agent-)` with an opt-in + `--include-observer` style override. + +2. **BM25 sweep is bounded at `min(limit*4, 500)`** — the very long + gitops-assistant session `05988a74-...` (10,704 observations) has + "careful generator" references that didn't make the top 200 ranked. + Either raise the cap when channel filtering is wide, or scan all + obs in matched sessions to ensure no in-session reference is + dropped. + +3. **Graph-extraction over docs missed the inline comment phrases.** + `parseGraphXml` extracted entities from architecture.md's prose + sections, but the comment line + `# Tier 2 — careful generator (Qwen3.6-35B-A3B-FP8 on vast pod)` was + treated as code/config noise, not a concept-defining edge. No + `GraphNode(name="careful generator")` exists, so `includeGraph: true` + returns `[]`. + +4. **The benchmark session itself is not findable as a structured + record.** It happened (per the user) but the corpus doesn't seem to + have a session summary or memory record about "we benchmarked + qwen35-397b vs qwen36-35b vs X, qwen36 won on generator tasks". The + nicknames stuck in code comments but the *reasoning behind picking + the nickname* (the benchmark) was never crystallized as a memory. + This is a memory-curation gap, not a retrieval gap. + +## Validation criteria for future re-runs + +Re-running this test case after improvements should validate: + +```bash +# A. Lineage smoke (after observer-filter fix): +curl -fsS -X POST http://localhost:3111/agentmemory/lineage \ + -H 'content-type: application/json' \ + -d '{"query":"careful generator","limit":30,"order":"asc"}' \ + | jq '.firstMention' + +# Pass criteria: +# - .project NOT IN ["observer-sessions", "agent-*"] +# - .timestamp ideally falls within the user-described benchmark +# window (probably April 2026 mid-to-late, pre-config-edit on Apr 26) + +# B. Graph traversal (after architecture-doc graph-extraction is +# re-run with prompt tuning that surfaces comment phrases): +curl -fsS -X POST http://localhost:3111/agentmemory/lineage \ + -H 'content-type: application/json' \ + -d '{"query":"careful generator","includeGraph":true}' \ + | jq '.graphNeighbors' + +# Pass criteria: +# - non-empty +# - At least one neighbor is "Qwen3.6-35B-A3B-FP8" or "vast-qwen36-35b" +# with relation type "uses", "is", or "implements" + +# C. Smart-search re-ranker: +curl -fsS -X POST http://localhost:3111/agentmemory/smart-search \ + -H 'content-type: application/json' \ + -d '{"query":"who is the careful generator","limit":10}' + +# Pass criteria: +# - architecture.md or configuration.md memory in top 5 hits +# - score > 0.3 on the relevant memory +``` + +## Follow-up work surfaced by this test case + +In rough priority: + +1. **v4-A patch**: default-exclude observer/agent projects in + `mem::lineage`. ~5 lines. Highest leverage. +2. **Capture the benchmark session as a project memory**: a + `project_qwen36_v_qwen35_benchmark.md` documenting what was tested, + the results, why qwen36 won on generator tasks, and why serverless + alternatives were rejected at the time. Pure curation — no code + change. The user has the context; the corpus doesn't. +3. **Smart-search channel re-ranker** (v4-B): boost the memory channel + for queries with named-concept patterns ("who is X", "what is X", + "define X"). Smaller surface than v4-A's lineage primitive but + targets a more common query shape. +4. **Comment-aware graph extraction** (v4-C): tune the graph-extraction + prompt or post-processor to treat code comments like + `# Tier 2 — careful generator (...)` as concept-defining + declarations. Currently they're treated as code noise. + +## Why this test case is durable + +It's a real recall miss from a real workflow with verifiable ground +truth in the corpus. As long as `docs/architecture.md` retains the +"Tier 2 — careful generator" comment and the gitops-assistant session +history exists, this scenario is re-runnable across agentmemory +versions to track recall regressions and improvements. Any future +PR that touches lineage, smart-search ranking, or graph extraction +should be re-tested against this case. diff --git a/src/functions/lineage.ts b/src/functions/lineage.ts new file mode 100644 index 00000000..1a912968 --- /dev/null +++ b/src/functions/lineage.ts @@ -0,0 +1,455 @@ +import type { ISdk } from "iii-sdk"; +import type { + CompressedObservation, + GraphEdge, + GraphNode, + GraphNodeType, + Lesson, + LineageChannel, + LineageGraphNeighbor, + LineageResult, + Memory, + Session, + SessionSummary, + TimelineItem, +} from "../types.js"; +import { KV } from "../state/schema.js"; +import type { StateKV } from "../state/kv.js"; +import { getSearchIndex, rebuildIndex } from "./search.js"; +import { safeAudit } from "./audit.js"; +import { logger } from "../logger.js"; + +// Concept-lineage retrieval. Unlike mem::search (relevance) and +// mem::smart-search (lessons-first ranker), this primitive returns +// chronologically-sorted hits across observation, memory, lesson, and +// summary channels — answering "when did this term enter the corpus, +// and what surrounded it?". Reuses the existing BM25 index for obs/mem +// and falls through to substring scans for lessons/summaries. + +const ALL_CHANNELS: LineageChannel[] = [ + "observation", + "memory", + "lesson", + "summary", +]; + +interface LineageRequest { + query: string; + limit?: number; + since?: string; + until?: string; + channels?: LineageChannel[]; + includeAdjacentTurns?: boolean; + includeGraph?: boolean; + order?: "asc" | "desc"; +} + +function isValidIsoTimestamp(value: unknown): value is string { + if (typeof value !== "string") return false; + const t = Date.parse(value); + return Number.isFinite(t); +} + +function buildSnippet(content: string, qLower: string): string { + if (!content) return ""; + const lower = content.toLowerCase(); + const pos = lower.indexOf(qLower); + if (pos < 0) { + return content.length <= 300 ? content : content.slice(0, 300) + "…"; + } + const start = Math.max(0, pos - 150); + const end = Math.min(content.length, pos + qLower.length + 150); + const head = start > 0 ? "…" : ""; + const tail = end < content.length ? "…" : ""; + return head + content.slice(start, end) + tail; +} + +// Repo doc and session-handoff memories embed their source in the first +// line of content. Pull it out so callers can filter by sourceFile. +// Headers come in two flavors: +// [Repo doc] : +// [Session handoff] : +// Both have an optional "(chunk i/n)" suffix. Capture the path token. +const REPO_DOC_RE = /^\[Repo doc\] [^:]+:\s+([^\s(]+)/; +const SESSION_HANDOFF_RE = /^\[Session handoff\] [^:]+:\s+([^\s(]+)/; + +function extractMemorySourceFile(content: string): string | undefined { + const firstLine = content.split("\n", 1)[0] ?? ""; + const repo = REPO_DOC_RE.exec(firstLine); + if (repo) return repo[1]; + const handoff = SESSION_HANDOFF_RE.exec(firstLine); + if (handoff) return handoff[1]; + return undefined; +} + +function inRange(timestamp: string, since?: number, until?: number): boolean { + const t = Date.parse(timestamp); + if (!Number.isFinite(t)) return false; + if (since !== undefined && t < since) return false; + if (until !== undefined && t > until) return false; + return true; +} + +function tieBreak(a: TimelineItem, b: TimelineItem): number { + if (a.channel !== b.channel) return a.channel < b.channel ? -1 : 1; + if (a.id !== b.id) return a.id < b.id ? -1 : 1; + return 0; +} + +export function registerLineageFunction(sdk: ISdk, kv: StateKV): void { + sdk.registerFunction( + "mem::lineage", + async (data: LineageRequest): Promise => { + if (typeof data?.query !== "string" || !data.query.trim()) { + return { error: "query is required" }; + } + const query = data.query.trim(); + const qLower = query.toLowerCase(); + + const limit = + typeof data.limit === "number" && Number.isInteger(data.limit) && data.limit > 0 + ? Math.min(data.limit, 500) + : 50; + + const since = isValidIsoTimestamp(data.since) ? Date.parse(data.since) : undefined; + const until = isValidIsoTimestamp(data.until) ? Date.parse(data.until) : undefined; + + const requestedChannels = + Array.isArray(data.channels) && data.channels.length > 0 + ? (data.channels.filter((c): c is LineageChannel => + ALL_CHANNELS.includes(c as LineageChannel), + ) as LineageChannel[]) + : ALL_CHANNELS; + const channelSet = new Set(requestedChannels); + + const includeAdjacentTurns = data.includeAdjacentTurns !== false; + const includeGraph = data.includeGraph === true; + const order: "asc" | "desc" = data.order === "desc" ? "desc" : "asc"; + + const items: TimelineItem[] = []; + + // (a) BM25 path covers observations + memories (memories are + // indexed under their own id with sessionId fallback "memory" + // via memoryToObservation). + if (channelSet.has("observation") || channelSet.has("memory")) { + const idx = getSearchIndex(); + if (idx.size === 0) { + try { + const count = await rebuildIndex(kv); + logger.info("Search index rebuilt for lineage", { entries: count }); + } catch (err) { + logger.warn("lineage: rebuild index failed", { + error: err instanceof Error ? err.message : String(err), + }); + } + } + // v4-A Gap 2 fix: bound the sweep generously so deep-in-session + // references in large jsonl-imported sessions (10k+ obs) still + // rank into the channel-filtered top N. Was min(limit*4, 500), + // which missed in-session refs in the Apr 26→May 17 GA session. + const bm25Hits = idx.search(query, Math.min(Math.max(limit * 20, 1000), 5000)); + + // Resolve each hit to either an observation or a memory. + const memoryCache = new Map(); + const obsCache = new Map(); + + for (const hit of bm25Hits) { + // Memory hits have sessionId == "memory" (synthetic) OR live + // in KV.memories with a real sessionId. Probe memory scope by + // id first; fall back to observation lookup. + let mem = memoryCache.get(hit.obsId); + if (mem === undefined) { + try { + mem = (await kv.get(KV.memories, hit.obsId)) ?? null; + } catch { + mem = null; + } + memoryCache.set(hit.obsId, mem); + } + if (mem && mem.isLatest !== false) { + if (!channelSet.has("memory")) continue; + const ts = mem.createdAt; + if (!inRange(ts, since, until)) continue; + items.push({ + timestamp: ts, + channel: "memory", + id: mem.id, + title: mem.title, + snippet: buildSnippet(mem.content, qLower), + score: hit.score, + sourceFile: extractMemorySourceFile(mem.content), + memoryType: mem.type, + }); + continue; + } + + if (!channelSet.has("observation")) continue; + let obs = obsCache.get(hit.obsId); + if (obs === undefined) { + try { + obs = + (await kv.get( + KV.observations(hit.sessionId), + hit.obsId, + )) ?? null; + } catch { + obs = null; + } + obsCache.set(hit.obsId, obs); + } + if (!obs) continue; + if (!inRange(obs.timestamp, since, until)) continue; + const snippetSource = + obs.narrative || obs.facts.join(" ") || obs.title; + items.push({ + timestamp: obs.timestamp, + channel: "observation", + id: obs.id, + sessionId: obs.sessionId, + title: obs.title, + type: obs.type, + snippet: buildSnippet(snippetSource, qLower), + score: hit.score, + }); + } + } + + // (b) lesson substring scan + if (channelSet.has("lesson")) { + const lessons = await kv.list(KV.lessons); + for (const lesson of lessons) { + if (lesson.deleted) continue; + if (!lesson.content) continue; + if (!lesson.content.toLowerCase().includes(qLower)) continue; + const ts = lesson.createdAt; + if (!inRange(ts, since, until)) continue; + items.push({ + timestamp: ts, + channel: "lesson", + id: lesson.id, + project: lesson.project, + title: lesson.content.slice(0, 80), + snippet: buildSnippet(lesson.content, qLower), + score: 0, + }); + } + } + + // (c) summary substring scan + if (channelSet.has("summary")) { + const summaries = await kv.list(KV.summaries); + for (const sum of summaries) { + if (!sum.narrative) continue; + if (!sum.narrative.toLowerCase().includes(qLower)) continue; + const ts = sum.createdAt; + if (!inRange(ts, since, until)) continue; + items.push({ + timestamp: ts, + channel: "summary", + id: sum.sessionId, + sessionId: sum.sessionId, + project: sum.project, + title: sum.title, + snippet: buildSnippet(sum.narrative, qLower), + score: 0, + }); + } + } + + // Sort, trim to limit, then enrich (so enrichment cost scales + // with displayed items, not raw match count). + items.sort((a, b) => { + const ta = Date.parse(a.timestamp); + const tb = Date.parse(b.timestamp); + if (ta !== tb) return order === "asc" ? ta - tb : tb - ta; + return tieBreak(a, b); + }); + const trimmed = items.slice(0, limit); + + // Session lookup cache for observation/summary items. + const sessionCache = new Map(); + const loadSession = async (sessionId: string): Promise => { + if (sessionCache.has(sessionId)) return sessionCache.get(sessionId)!; + let s: Session | null = null; + try { + s = (await kv.get(KV.sessions, sessionId)) ?? null; + } catch { + s = null; + } + sessionCache.set(sessionId, s); + return s; + }; + + // Per-session observation cache so multiple hits in one session + // share a single KV.list call when computing adjacent turns. + const obsListCache = new Map(); + const loadSessionObs = async ( + sessionId: string, + ): Promise => { + if (obsListCache.has(sessionId)) return obsListCache.get(sessionId)!; + let list: CompressedObservation[] = []; + try { + list = await kv.list(KV.observations(sessionId)); + } catch { + list = []; + } + list.sort( + (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp), + ); + obsListCache.set(sessionId, list); + return list; + }; + + for (const item of trimmed) { + if (item.channel === "observation" && item.sessionId) { + const s = await loadSession(item.sessionId); + if (s) { + item.session = { + id: s.id, + project: s.project, + startedAt: s.startedAt, + firstPrompt: s.firstPrompt, + }; + if (!item.project) item.project = s.project; + } + if (includeAdjacentTurns) { + const obsList = await loadSessionObs(item.sessionId); + const idx = obsList.findIndex((o) => o.id === item.id); + if (idx >= 0) { + // Walk backwards for the previous conversation turn + // (userPrompt → obs.narrative when type=="conversation") + // and the previous non-conversation turn (assistant-side + // tool use, which acts as a stand-in for the assistant's + // most recent observable action). + let prevUser: CompressedObservation | undefined; + let prevAssistant: CompressedObservation | undefined; + for (let i = idx - 1; i >= 0; i--) { + const o = obsList[i]; + if (!prevUser && o.type === "conversation") prevUser = o; + else if (!prevAssistant && o.type !== "conversation") + prevAssistant = o; + if (prevUser && prevAssistant) break; + } + if (prevUser || prevAssistant) { + item.adjacentTurns = { + previousUserPrompt: prevUser?.narrative, + previousAssistantSummary: + prevAssistant?.title && prevAssistant.narrative + ? `${prevAssistant.title}: ${prevAssistant.narrative}` + : prevAssistant?.narrative, + }; + } + } + } + } else if (item.channel === "summary" && item.sessionId) { + const s = await loadSession(item.sessionId); + if (s) { + item.session = { + id: s.id, + project: s.project, + startedAt: s.startedAt, + firstPrompt: s.firstPrompt, + }; + if (!item.project) item.project = s.project; + } + } + } + + const totalsByChannel: Record = { + observation: 0, + memory: 0, + lesson: 0, + summary: 0, + }; + for (const it of items) totalsByChannel[it.channel]++; + + // firstMention always points at the earliest timestamp in the + // filtered set, regardless of `order`. Use the asc-sorted view. + const earliest = + order === "asc" + ? trimmed[0] + : trimmed.length > 0 + ? trimmed[trimmed.length - 1] + : undefined; + const firstMention = earliest + ? { + timestamp: earliest.timestamp, + channel: earliest.channel, + sessionId: earliest.sessionId, + project: earliest.project, + } + : null; + + let graphNeighbors: LineageGraphNeighbor[] | undefined; + if (includeGraph) { + graphNeighbors = []; + try { + const nodes = await kv.list(KV.graphNodes); + const tokens = qLower + .split(/\s+/) + .map((t) => t.trim()) + .filter((t) => t.length >= 3); + const matchedNodes = nodes.filter((n) => { + if (!n || typeof n.name !== "string") return false; + const nameLower = n.name.toLowerCase(); + if (nameLower.includes(qLower)) return true; + for (const tok of tokens) { + if (nameLower.includes(tok)) return true; + } + return false; + }); + if (matchedNodes.length > 0) { + const edges = await kv.list(KV.graphEdges); + const nodeById = new Map(); + for (const n of nodes) nodeById.set(n.id, n); + for (const node of matchedNodes) { + const related = edges.filter( + (e) => e.sourceNodeId === node.id || e.targetNodeId === node.id, + ); + const edgeOut = related + .map((e) => { + const otherId = + e.sourceNodeId === node.id ? e.targetNodeId : e.sourceNodeId; + const other = nodeById.get(otherId); + if (!other) return null; + return { + kind: e.type, + neighbor: other.name, + neighborType: other.type as GraphNodeType, + }; + }) + .filter((e): e is NonNullable => e !== null); + graphNeighbors.push({ + name: node.name, + type: node.type, + edges: edgeOut, + }); + } + } + } catch (err) { + logger.warn("lineage: graph neighbor lookup failed", { + error: err instanceof Error ? err.message : String(err), + }); + } + } + + void safeAudit(kv, "query", "mem::lineage", [], { + query, + hits: items.length, + returned: trimmed.length, + channels: requestedChannels, + includeAdjacentTurns, + includeGraph, + }); + + const result: LineageResult = { + query, + firstMention, + timeline: trimmed, + totalsByChannel, + }; + if (graphNeighbors !== undefined) result.graphNeighbors = graphNeighbors; + return result; + }, + ); +} diff --git a/src/index.ts b/src/index.ts index 704d4809..09b6aa77 100644 --- a/src/index.ts +++ b/src/index.ts @@ -49,6 +49,7 @@ import { registerEvictFunction } from "./functions/evict.js"; import { registerRelationsFunction } from "./functions/relations.js"; import { registerTimelineFunction } from "./functions/timeline.js"; import { registerSmartSearchFunction } from "./functions/smart-search.js"; +import { registerLineageFunction } from "./functions/lineage.js"; import { registerProfileFunction } from "./functions/profile.js"; import { registerAutoForgetFunction } from "./functions/auto-forget.js"; import { registerExportImportFunction } from "./functions/export-import.js"; @@ -211,6 +212,7 @@ async function main() { registerDiskSizeManager(sdk, kv); registerCompressFunction(sdk, kv, provider, metricsStore); registerSearchFunction(sdk, kv); + registerLineageFunction(sdk, kv); registerContextFunction(sdk, kv, config.tokenBudget); registerSummarizeFunction(sdk, kv, provider, metricsStore); registerMigrateFunction(sdk, kv); @@ -481,7 +483,7 @@ async function main() { `Ready. ${embeddingProvider ? "Triple-stream (BM25+Vector+Graph)" : "BM25+Graph"} search active.`, ); bootLog( - `REST API: 124 endpoints at http://localhost:${config.restPort}/agentmemory/*`, + `REST API: 125 endpoints at http://localhost:${config.restPort}/agentmemory/*`, ); bootLog( `MCP surface (opt-in via \`npx @agentmemory/mcp\`): ${getAllTools().length} tools · 6 resources · 3 prompts`, diff --git a/src/mcp/server.ts b/src/mcp/server.ts index b3b0585d..774398e8 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -275,6 +275,41 @@ export function registerMcpEndpoints( }; } + case "memory_lineage": { + if (typeof args.query !== "string" || !args.query.trim()) { + return { + status_code: 400, + body: { error: "query is required for memory_lineage" }, + }; + } + const channels = parseCsvList(args.channels); + const payload: Record = { + query: args.query, + }; + const limit = asNumber(args.limit); + if (limit !== undefined) payload.limit = Math.max(1, Math.min(500, limit)); + if (typeof args.since === "string") payload.since = args.since; + if (typeof args.until === "string") payload.until = args.until; + if (channels.length > 0) payload.channels = channels; + if (typeof args.includeAdjacentTurns === "boolean") + payload.includeAdjacentTurns = args.includeAdjacentTurns; + if (typeof args.includeGraph === "boolean") + payload.includeGraph = args.includeGraph; + if (typeof args.order === "string") payload.order = args.order; + const result = await sdk.trigger({ + function_id: "mem::lineage", + payload, + }); + return { + status_code: 200, + body: { + content: [ + { type: "text", text: JSON.stringify(result, null, 2) }, + ], + }, + }; + } + case "memory_vision_search": { const queryText = typeof args.queryText === "string" ? args.queryText : undefined; const queryImageRef = typeof args.queryImageRef === "string" ? args.queryImageRef : undefined; diff --git a/src/mcp/tools-registry.ts b/src/mcp/tools-registry.ts index 3001cae7..5959fd0c 100644 --- a/src/mcp/tools-registry.ts +++ b/src/mcp/tools-registry.ts @@ -126,6 +126,38 @@ export const CORE_TOOLS: McpToolDef[] = [ required: ["query"], }, }, + { + name: "memory_lineage", + description: + "Concept lineage: chronologically-ordered hits for a phrase across observation, memory, lesson, and summary channels. Use to trace when a term first entered the corpus and what surrounded it.", + inputSchema: { + type: "object", + properties: { + query: { type: "string", description: "Phrase or term to trace" }, + limit: { type: "number", description: "Max timeline items (default 50)" }, + since: { type: "string", description: "ISO 8601 lower bound on timestamp" }, + until: { type: "string", description: "ISO 8601 upper bound on timestamp" }, + channels: { + type: "string", + description: + "Comma-separated channels to search: observation,memory,lesson,summary (default all)", + }, + includeAdjacentTurns: { + type: "boolean", + description: "Attach previous user/assistant turn for observation hits (default true)", + }, + includeGraph: { + type: "boolean", + description: "Attach graph-edge neighbors for matching nodes (default false)", + }, + order: { + type: "string", + description: "'asc' (oldest first, default) or 'desc'", + }, + }, + required: ["query"], + }, + }, { name: "memory_vision_search", description: @@ -917,6 +949,7 @@ export const V010_SLOTS_TOOLS: McpToolDef[] = [ }, ]; + const ESSENTIAL_TOOLS = new Set([ "memory_save", "memory_recall", diff --git a/src/triggers/api.ts b/src/triggers/api.ts index 083c2159..eb2c0dc5 100644 --- a/src/triggers/api.ts +++ b/src/triggers/api.ts @@ -991,7 +991,75 @@ export function registerApiTriggers( config: { api_path: "/agentmemory/smart-search", http_method: "POST" }, }); - sdk.registerFunction("api::timeline", + sdk.registerFunction("api::lineage", + async ( + req: ApiRequest<{ + query?: string; + limit?: number; + since?: string; + until?: string; + channels?: string[]; + includeAdjacentTurns?: boolean; + includeGraph?: boolean; + order?: string; + }>, + ): Promise => { + const authErr = checkAuth(req, secret); + if (authErr) return authErr; + const body = (req.body ?? {}) as Record; + if (typeof body.query !== "string" || !body.query.trim()) { + return { status_code: 400, body: { error: "query is required" } }; + } + if ( + body.limit !== undefined && + (!Number.isInteger(body.limit) || (body.limit as number) < 1) + ) { + return { status_code: 400, body: { error: "limit must be a positive integer" } }; + } + if ( + body.channels !== undefined && + (!Array.isArray(body.channels) || + !body.channels.every((c) => typeof c === "string")) + ) { + return { + status_code: 400, + body: { error: "channels must be an array of strings" }, + }; + } + if ( + body.order !== undefined && + (typeof body.order !== "string" || + !["asc", "desc"].includes(body.order.trim().toLowerCase())) + ) { + return { + status_code: 400, + body: { error: "order must be 'asc' or 'desc'" }, + }; + } + const result = await sdk.trigger({ + function_id: "mem::lineage", + payload: req.body, + }); + // mem::lineage returns { error } on validation problems we + // didn't catch upstream (e.g. empty trimmed query). Surface as 400. + if ( + result && + typeof result === "object" && + "error" in (result as Record) && + !("timeline" in (result as Record)) + ) { + return { status_code: 400, body: result }; + } + return { status_code: 200, body: result }; + }, + ); + sdk.registerTrigger({ + type: "http", + function_id: "api::lineage", + config: { api_path: "/agentmemory/lineage", http_method: "POST" }, + }); + + sdk.registerFunction("api::timeline", async ( req: ApiRequest<{ anchor: string; diff --git a/src/types.ts b/src/types.ts index 72e347b3..e66988ca 100644 --- a/src/types.ts +++ b/src/types.ts @@ -282,6 +282,58 @@ export interface TimelineEntry { relativePosition: number; } +export type LineageChannel = "observation" | "memory" | "lesson" | "summary"; + +export interface TimelineItem { + timestamp: string; + channel: LineageChannel; + id: string; + sessionId?: string; + project?: string; + title: string; + type?: string; + snippet: string; + score: number; + // memory-specific + sourceFile?: string; + memoryType?: Memory["type"]; + // session enrichment (observation/summary) + session?: { + id: string; + project: string; + startedAt: string; + firstPrompt?: string; + }; + // observation-only enrichment + adjacentTurns?: { + previousUserPrompt?: string; + previousAssistantSummary?: string; + }; +} + +export interface LineageGraphNeighbor { + name: string; + type: GraphNodeType; + edges: Array<{ + kind: GraphEdgeType; + neighbor: string; + neighborType: GraphNodeType; + }>; +} + +export interface LineageResult { + query: string; + firstMention: { + timestamp: string; + channel: LineageChannel; + sessionId?: string; + project?: string; + } | null; + timeline: TimelineItem[]; + totalsByChannel: Record; + graphNeighbors?: LineageGraphNeighbor[]; +} + export interface ProjectProfile { project: string; updatedAt: string; @@ -546,7 +598,8 @@ export interface AuditEntry { | "slot_replace" | "slot_create" | "slot_delete" - | "slot_reflect"; + | "slot_reflect" + | "query"; userId?: string; functionId: string; targetIds: string[]; diff --git a/test/mcp-standalone.test.ts b/test/mcp-standalone.test.ts index b48eade9..80262188 100644 --- a/test/mcp-standalone.test.ts +++ b/test/mcp-standalone.test.ts @@ -68,8 +68,8 @@ describe("Tools Registry", () => { } }); - it("CORE_TOOLS has 14 items", () => { - expect(CORE_TOOLS.length).toBe(14); + it("CORE_TOOLS has 15 items", () => { + expect(CORE_TOOLS.length).toBe(15); }); it("V040_TOOLS has 8 items", () => { From 6a4de14afd05705d232492368d723ec7349ca662 Mon Sep 17 00:00:00 2001 From: Ruben de Smet Date: Wed, 20 May 2026 18:07:09 +0200 Subject: [PATCH 2/5] fix(lineage): address CodeRabbit #570 review Three real issues caught in review: 1. firstMention computed from `trimmed` (post-limit page) instead of `items` (entire filtered set). When `order:desc` + a small `limit` truncated a session with many hits, the reported firstMention was the oldest-in-page, not the actual earliest filtered hit. Switch to `items` so the semantic contract holds regardless of page size. 2. MCP boundary (memory_lineage in src/mcp/server.ts) accepted any non-integer `limit` and any `order` string. Now: validate `limit` is a positive integer (400 otherwise), validate `order` is "asc"|"desc" (400 otherwise), filter `channels` to the known enum before forwarding. 3. REST boundary (api::lineage in src/triggers/api.ts) was forwarding raw `req.body` after validation, which leaks caller-controlled keys to the downstream function. Build a whitelisted payload from the validated fields only. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/functions/lineage.ts | 12 ++++++++---- src/mcp/server.ts | 30 +++++++++++++++++++++++++++--- src/triggers/api.ts | 17 ++++++++++++++++- 3 files changed, 51 insertions(+), 8 deletions(-) diff --git a/src/functions/lineage.ts b/src/functions/lineage.ts index 1a912968..34070209 100644 --- a/src/functions/lineage.ts +++ b/src/functions/lineage.ts @@ -364,12 +364,16 @@ export function registerLineageFunction(sdk: ISdk, kv: StateKV): void { for (const it of items) totalsByChannel[it.channel]++; // firstMention always points at the earliest timestamp in the - // filtered set, regardless of `order`. Use the asc-sorted view. + // ENTIRE filtered set, regardless of `order` or `limit`. Use + // `items` (pre-trim, fully sorted), not `trimmed` — otherwise a + // session with more hits than the page size + `order:desc` would + // report the oldest-in-page as firstMention instead of the actual + // earliest mention. CodeRabbit caught this on #570. const earliest = order === "asc" - ? trimmed[0] - : trimmed.length > 0 - ? trimmed[trimmed.length - 1] + ? items[0] + : items.length > 0 + ? items[items.length - 1] : undefined; const firstMention = earliest ? { diff --git a/src/mcp/server.ts b/src/mcp/server.ts index 774398e8..e5768758 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -283,19 +283,43 @@ export function registerMcpEndpoints( }; } const channels = parseCsvList(args.channels); + // Filter to the known channel enum so unknown values don't + // reach the daemon. + const validChannels = channels.filter((c) => + ["observation", "memory", "lesson", "summary"].includes(c), + ); const payload: Record = { query: args.query, }; const limit = asNumber(args.limit); - if (limit !== undefined) payload.limit = Math.max(1, Math.min(500, limit)); + if (args.limit !== undefined) { + if (limit === undefined || !Number.isInteger(limit) || limit < 1) { + return { + status_code: 400, + body: { error: "limit must be a positive integer" }, + }; + } + payload.limit = Math.min(500, limit); + } if (typeof args.since === "string") payload.since = args.since; if (typeof args.until === "string") payload.until = args.until; - if (channels.length > 0) payload.channels = channels; + if (validChannels.length > 0) payload.channels = validChannels; if (typeof args.includeAdjacentTurns === "boolean") payload.includeAdjacentTurns = args.includeAdjacentTurns; if (typeof args.includeGraph === "boolean") payload.includeGraph = args.includeGraph; - if (typeof args.order === "string") payload.order = args.order; + if (args.order !== undefined) { + if ( + typeof args.order !== "string" || + !["asc", "desc"].includes(args.order) + ) { + return { + status_code: 400, + body: { error: "order must be 'asc' or 'desc'" }, + }; + } + payload.order = args.order; + } const result = await sdk.trigger({ function_id: "mem::lineage", payload, diff --git a/src/triggers/api.ts b/src/triggers/api.ts index eb2c0dc5..54283f8f 100644 --- a/src/triggers/api.ts +++ b/src/triggers/api.ts @@ -1036,9 +1036,24 @@ export function registerApiTriggers( body: { error: "order must be 'asc' or 'desc'" }, }; } + // Whitelisted payload: only forward validated fields, never raw + // req.body — caller-controlled keys could otherwise trip + // unintended branches in the downstream function. CodeRabbit + // caught this on #570. + const payload: Record = { query: body.query }; + if (body.limit !== undefined) payload.limit = body.limit; + if (typeof body.since === "string") payload.since = body.since; + if (typeof body.until === "string") payload.until = body.until; + if (Array.isArray(body.channels)) payload.channels = body.channels; + if (typeof body.includeAdjacentTurns === "boolean") + payload.includeAdjacentTurns = body.includeAdjacentTurns; + if (typeof body.includeGraph === "boolean") + payload.includeGraph = body.includeGraph; + if (typeof body.order === "string") + payload.order = (body.order as string).trim().toLowerCase(); const result = await sdk.trigger({ function_id: "mem::lineage", - payload: req.body, + payload, }); // mem::lineage returns { error } on validation problems we // didn't catch upstream (e.g. empty trimmed query). Surface as 400. From 4cd1f4c734d6a9659d1f3a7f077d5a60af3b54c6 Mon Sep 17 00:00:00 2001 From: Ruben de Smet Date: Wed, 20 May 2026 18:21:42 +0200 Subject: [PATCH 3/5] =?UTF-8?q?fix(lineage):=20CodeRabbit=20#570=20re-revi?= =?UTF-8?q?ew=20=E2=80=94=20channels=20rejection=20+=20firstMention=20tieb?= =?UTF-8?q?reak?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two follow-up issues from CodeRabbit's review of 6a4de14: 1. `channels` silent broadening: when the user passed `channels` but none were in the known enum (e.g. `["foobar","baz"]`), the previous fix dropped to an empty `validChannels` and the conditional then omitted `payload.channels` entirely — falling back to all-channels default. Now: if the user explicitly passed channels but none are valid, return 400. Silently broadening invalidates caller intent. 2. `firstMention` could differ by `order`: picking `items[0]` (asc) or `items[items.length-1]` (desc) relied on the array's tiebreak rule to settle equal-timestamp ties. Two items sharing the earliest timestamp on different channels would resolve differently depending on `order`. Switch to an order-independent min-by-timestamp reduce so the "earliest in filtered set" contract is stable. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/functions/lineage.ts | 21 ++++++++++----------- src/mcp/server.ts | 15 +++++++++++++-- 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/src/functions/lineage.ts b/src/functions/lineage.ts index 34070209..ac5db7f9 100644 --- a/src/functions/lineage.ts +++ b/src/functions/lineage.ts @@ -364,17 +364,16 @@ export function registerLineageFunction(sdk: ISdk, kv: StateKV): void { for (const it of items) totalsByChannel[it.channel]++; // firstMention always points at the earliest timestamp in the - // ENTIRE filtered set, regardless of `order` or `limit`. Use - // `items` (pre-trim, fully sorted), not `trimmed` — otherwise a - // session with more hits than the page size + `order:desc` would - // report the oldest-in-page as firstMention instead of the actual - // earliest mention. CodeRabbit caught this on #570. - const earliest = - order === "asc" - ? items[0] - : items.length > 0 - ? items[items.length - 1] - : undefined; + // ENTIRE filtered set, independent of `order` AND of how the + // tiebreaker ranks items with equal earliest timestamps. Pick the + // min-by-timestamp directly instead of trusting position in the + // (order-dependent) sorted list — CodeRabbit caught the + // tiebreaker variance in the #570 re-review. + const earliest = items.length > 0 + ? items.reduce((a, b) => + Date.parse(a.timestamp) <= Date.parse(b.timestamp) ? a : b, + ) + : undefined; const firstMention = earliest ? { timestamp: earliest.timestamp, diff --git a/src/mcp/server.ts b/src/mcp/server.ts index e5768758..469244c5 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -283,11 +283,22 @@ export function registerMcpEndpoints( }; } const channels = parseCsvList(args.channels); - // Filter to the known channel enum so unknown values don't - // reach the daemon. + // Validate channel names against the enum. If the user + // passed channels but NONE are valid, 400 instead of + // silently broadening to all channels (CodeRabbit caught + // this in the #570 re-review). const validChannels = channels.filter((c) => ["observation", "memory", "lesson", "summary"].includes(c), ); + if (channels.length > 0 && validChannels.length === 0) { + return { + status_code: 400, + body: { + error: + "channels must contain at least one of: observation, memory, lesson, summary", + }, + }; + } const payload: Record = { query: args.query, }; From 278941fbc01db7bfbc543d6e7adc5f9705b32b7d Mon Sep 17 00:00:00 2001 From: Ruben de Smet Date: Wed, 20 May 2026 22:11:03 +0200 Subject: [PATCH 4/5] lineage(mcp): reject blank channels= input CodeRabbit caught a hole in the channels validation: pure-whitespace input like " , " parses to [] via parseCsvList, so the channels.length>0 guard short-circuits and the request silently broadens to all channels even though the caller meant to scope it. Check raw presence with args.channels !== undefined. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/mcp/server.ts | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mcp/server.ts b/src/mcp/server.ts index 469244c5..b0ce32b8 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -282,15 +282,15 @@ export function registerMcpEndpoints( body: { error: "query is required for memory_lineage" }, }; } + // Pure-whitespace input like ", " parses to []; treat raw + // presence (not length) as "caller supplied channels" so we + // 400 instead of silently broadening to all channels. + const channelsProvided = args.channels !== undefined; const channels = parseCsvList(args.channels); - // Validate channel names against the enum. If the user - // passed channels but NONE are valid, 400 instead of - // silently broadening to all channels (CodeRabbit caught - // this in the #570 re-review). const validChannels = channels.filter((c) => ["observation", "memory", "lesson", "summary"].includes(c), ); - if (channels.length > 0 && validChannels.length === 0) { + if (channelsProvided && validChannels.length === 0) { return { status_code: 400, body: { From f2c14a356f2dd7e126d4e766f32f839dccc41e12 Mon Sep 17 00:00:00 2001 From: Ruben de Smet Date: Thu, 21 May 2026 14:11:42 +0200 Subject: [PATCH 5/5] lineage(mcp): drop residual WHAT-style comment on channels validation CodeRabbit re-flagged the three-line note left by #570's earlier push. `channelsProvided = args.channels !== undefined` is self- descriptive; the rationale belongs in the PR/commit, not inline. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/mcp/server.ts | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/mcp/server.ts b/src/mcp/server.ts index b0ce32b8..6cc2b64f 100644 --- a/src/mcp/server.ts +++ b/src/mcp/server.ts @@ -282,9 +282,6 @@ export function registerMcpEndpoints( body: { error: "query is required for memory_lineage" }, }; } - // Pure-whitespace input like ", " parses to []; treat raw - // presence (not length) as "caller supplied channels" so we - // 400 instead of silently broadening to all channels. const channelsProvided = args.channels !== undefined; const channels = parseCsvList(args.channels); const validChannels = channels.filter((c) =>