feat(lineage): mem::lineage primitive — chronological concept retrieval across all channels#570

Open
efenex wants to merge 3 commits into rohitg00:main from efenex:feat/v4-a-mem-lineage

Conversation

@efenex
Contributor

@efenex efenex commented May 20, 2026

Summary

Adds a new MCP primitive `mem::lineage` that answers "when did this term enter the corpus, and what surrounded it?" — chronologically ordered hits for a phrase across observation, memory, lesson, and summary channels, with optional adjacent-turn enrichment and graph-neighbor attachment.

Distinct from existing retrieval:

  • `mem::search` ranks by relevance (BM25 hybrid).
  • `mem::smart-search` is the lessons-first ranker.
  • `mem::lineage` is time-ordered, multi-channel, and enrichment-rich — the right tool for "trace this term" / "what was the first mention" workflows.

What's included

  • New REST endpoint `POST /agentmemory/lineage` (REST endpoint count 124 → 125)
  • New MCP tool `memory_lineage` in CORE_TOOLS (CORE 15 / total 54)
  • Channels: observation, memory, lesson, summary — opt in/out per call via `channels: ["observation","lesson"]`
  • Time bounds: `since` / `until` ISO timestamps
  • Adjacent turns: for observation hits, attach previous user prompt + previous assistant action (controlled by `includeAdjacentTurns`, default true)
  • Graph neighbors: when `includeGraph: true`, attach graph-edge neighbors for matching nodes
  • Order: `asc` (default, oldest first) or `desc`
  • firstMention: convenience field pointing at the earliest timestamp in the filtered set
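
As a rough illustration of how a caller might assemble a request from the options listed above, here is a minimal sketch. The `LineageRequest` interface and the `buildLineageRequest` helper are hypothetical names, not taken from the PR; only the field names and defaults come from the list.

```typescript
// Hypothetical request shape inferred from the option list; not the PR's actual types.
type LineageChannel = "observation" | "memory" | "lesson" | "summary";

interface LineageRequest {
  query: string;
  channels?: LineageChannel[];
  since?: string; // ISO timestamp
  until?: string; // ISO timestamp
  includeAdjacentTurns?: boolean; // described as defaulting to true
  includeGraph?: boolean;
  order?: "asc" | "desc"; // described as defaulting to "asc"
}

// Hypothetical helper: trims the query and fills in the documented defaults.
function buildLineageRequest(req: LineageRequest): LineageRequest {
  const query = req.query.trim();
  if (query.length === 0) {
    // The test plan states that an empty query is rejected with a 400.
    throw new Error("query is required");
  }
  return {
    ...req,
    query,
    order: req.order ?? "asc",
    includeAdjacentTurns: req.includeAdjacentTurns ?? true,
  };
}

const body = buildLineageRequest({
  query: "  careful generator ",
  channels: ["observation", "lesson"],
  since: "2026-01-01T00:00:00Z",
});
// `body` could then be serialized with JSON.stringify and sent to POST /agentmemory/lineage.
console.log(JSON.stringify(body));
```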

Gap-2 fix bundled

BM25 sweep cap raised from `min(limit*4, 500)` to `min(limit*20, 5000)`. Large jsonl-imported sessions (10k+ observations) have deep-in-session references that didn't rank into the channel-filtered top N at the old cap; with wide channel filtering, the sweep needs more headroom.
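
To make the headroom change concrete, the two caps can be compared side by side, reading them as `min(limit * 4, 500)` and `min(limit * 20, 5000)` (the form given in the commit message). The function names below are illustrative only.

```typescript
// Sweep-cap formulas as quoted in the commit message; function names are illustrative.
function oldSweepCap(limit: number): number {
  return Math.min(limit * 4, 500);
}

function newSweepCap(limit: number): number {
  return Math.min(limit * 20, 5000);
}

// Print how many BM25 candidates each formula would sweep for a few page sizes.
for (const limit of [10, 50, 500]) {
  console.log(`limit=${limit} old=${oldSweepCap(limit)} new=${newSweepCap(limit)}`);
}
```

For `limit = 50`, the sweep grows from 200 candidates to 1000, so a hit ranked around 800 in a large session now survives long enough to be channel-filtered.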

Test plan

  • `npm test` passes (1140+ tests)
  • Mechanical smoke: `mem::lineage` with all 4 channels, `firstMention` populated, timeline sorted ASC, channel totals correct, adjacent turns attached, sourceFile extraction works, no-match returns empty arrays, empty query returns 400
  • Docs in `docs/plans/v4-lineage-design.md` + `docs/plans/v4-lineage-test-case-careful-generator.md`
  • Tool counts in README + AGENTS.md + boot message + test/mcp-standalone.test.ts updated

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added lineage retrieval to trace a phrase/term across observation, memory, lesson, and summary channels with time bounds, channel filtering, ordering, and optional adjacency/graph enrichment; exposed via a new REST endpoint and tool.
  • Documentation

    • Updated docs and README/AGENTS stats to 54 MCP tools and 125 REST endpoints; added lineage design and test-case guidance.
  • Tests

    • Adjusted tests for the added MCP tool.

Review Change Stack

Returns chronologically-sorted hits across observation/memory/lesson/
summary channels — answers "when did this term enter the corpus and
what surrounded it?". Includes BM25 sweep over obs+memory, substring
scan for lessons/summaries, optional adjacent-turn enrichment, and
optional graph-neighbor attachment.

Gap-2 fix bundled: BM25 sweep cap raised from min(limit*4, 500) to
min(limit*20, 5000) so deep in-session refs in large jsonl-imported
sessions (10k+ obs) still rank into the channel-filtered top N.

Wires:
- src/functions/lineage.ts (new)
- mem::lineage MCP tool in CORE_TOOLS
- POST /agentmemory/lineage REST endpoint
- AuditEntry operation: + "query"
- LineageChannel / TimelineItem / LineageGraphNeighbor / LineageResult types
- design + test-case docs under docs/plans/

Counts bumped to keep README/AGENTS/boot message/test in sync:
CORE_TOOLS 12 → 13, total MCP tools 51 → 52, REST endpoints 121 → 122.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 20, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Implements mem::lineage: a chronological phrase-lineage retrieval across observation, memory, lesson, and summary channels with session enrichment, optional adjacent-turn reconstruction and graph neighbors, plus HTTP (POST /agentmemory/lineage) and MCP exposure, types, tests, and design/test docs.

Changes

mem::lineage Retrieval Feature

| Layer / File(s) | Summary |
| --- | --- |
| **Design & Test Case Documentation**<br>`docs/plans/v4-lineage-design.md`, `docs/plans/v4-lineage-test-case-careful-generator.md` | v4-A design doc specifies the `mem::lineage` request/response contract, multi-channel retrieval algorithm (BM25 for observation/memory, KV substring for lessons/summaries), enrichment strategy (session metadata, adjacent turns, graph neighbors), implementation files, and validation criteria. Companion test case doc defines the "careful generator" regression scenario with observed behavior and follow-up priorities. |
| **Type Definitions & Contracts**<br>`src/types.ts` | Adds `LineageChannel` union, `TimelineItem` interface with optional enrichment (memory, session, adjacentTurns), `LineageGraphNeighbor` interface, `LineageResult` interface, and extends `AuditEntry.operation` union to include "query". |
| **Core Lineage Retrieval Implementation**<br>`src/functions/lineage.ts` | Implements `mem::lineage` SDK function handler: validates/normalizes query and time/channel/ordering parameters, performs BM25-backed retrieval for observation/memory and KV substring scan for lessons/summaries, merges and sorts timeline items by timestamp with deterministic tie-breaking, enriches items with session metadata via per-session caching, optionally computes adjacentTurns by backward-walking observations, calculates per-channel totals and firstMention, optionally builds graph neighbors from graph KV data, audits, and returns `LineageResult`. |
| **HTTP API Endpoint Registration**<br>`src/triggers/api.ts` | Registers `POST /agentmemory/lineage` endpoint, validates request body for required query and optional limit/channels/order fields, forwards a whitelisted payload to `mem::lineage`, and maps upstream `{ error }` (without timeline) to HTTP 400; otherwise returns HTTP 200 with result. |
| **MCP Tool Handler & Registry**<br>`src/mcp/tools-registry.ts`, `src/mcp/server.ts` | Adds `memory_lineage` tool entry to CORE_TOOLS (query, time bounds, channels, limit, includeAdjacentTurns, includeGraph, order) and implements MCP handler that validates args, parses optional params (CSV channels), clamps limit, triggers `mem::lineage`, and returns result as MCP content. |
| **Worker Initialization & Documentation Updates**<br>`src/index.ts`, `test/mcp-standalone.test.ts`, `AGENTS.md`, `README.md` | Imports and registers lineage on startup, updates readiness log REST endpoint count to 125, updates test asserting CORE_TOOLS length to 15, and updates AGENTS.md/README.md stats to MCP tools = 54 and REST endpoints = 125. |
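
The "merge and sort with deterministic tie-breaking" step in the summary above could look roughly like the sketch below. The PR does not state which tie-break keys it uses, so ordering equal timestamps by channel and then id is an assumption, as are the simplified `TimelineItem` fields and the helper names.

```typescript
type LineageChannel = "observation" | "memory" | "lesson" | "summary";

// Simplified stand-in for the PR's TimelineItem; real fields may differ.
interface TimelineItem {
  id: string;
  channel: LineageChannel;
  timestamp: string; // ISO 8601
}

// Compare by timestamp first, then by channel and id (assumed tie-break keys).
function compareItems(a: TimelineItem, b: TimelineItem): number {
  const dt = Date.parse(a.timestamp) - Date.parse(b.timestamp);
  if (dt !== 0) return dt;
  if (a.channel !== b.channel) return a.channel < b.channel ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Merge per-channel hit lists into one timeline in the requested order.
function buildTimeline(groups: TimelineItem[][], order: "asc" | "desc" = "asc"): TimelineItem[] {
  const merged = ([] as TimelineItem[]).concat(...groups);
  merged.sort(compareItems);
  return order === "asc" ? merged : merged.reverse();
}

// Count hits per channel for the totals field.
function channelTotals(items: TimelineItem[]): Record<LineageChannel, number> {
  const totals: Record<LineageChannel, number> = { observation: 0, memory: 0, lesson: 0, summary: 0 };
  for (const item of items) totals[item.channel] += 1;
  return totals;
}

const bm25Hits: TimelineItem[] = [
  { id: "obs_2", channel: "observation", timestamp: "2026-05-02T10:00:00Z" },
  { id: "mem_1", channel: "memory", timestamp: "2026-05-01T09:00:00Z" },
];
const kvHits: TimelineItem[] = [
  { id: "les_1", channel: "lesson", timestamp: "2026-05-02T10:00:00Z" },
];

const timeline = buildTimeline([bm25Hits, kvHits]);
const totals = channelTotals(timeline);
console.log(timeline.map((item) => item.id), totals);
```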

Sequence Diagram

sequenceDiagram
  participant Client as HTTP/MCP Client
  participant Handler as Lineage Handler
  participant BM25 as BM25 Index
  participant KV as KV Store
  participant Session as Session Cache
  participant Graph as Graph Data
  Client->>Handler: query, limit, channels, time range
  Handler->>BM25: BM25 search (observation/memory)
  BM25-->>Handler: timeline hits
  Handler->>KV: List/scan lessons & summaries
  KV-->>Handler: additional items
  Handler->>Handler: Sort by timestamp + tie-break
  Handler->>Session: Lookup session metadata
  Session-->>Handler: Session info (cached)
  Handler->>Handler: Enrich items with session
  Handler->>Handler: (optional) Compute adjacentTurns
  Handler->>Graph: Match query tokens to node names
  Graph-->>Handler: Matched nodes + edges
  Handler-->>Client: LineageResult (timeline, totals, firstMention, graphNeighbors)
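
The optional adjacentTurns step in the diagram is described elsewhere in the walkthrough as backward-walking observations. A minimal sketch of that idea follows, assuming each observation carries a `kind` tag; the `Observation` shape, the `AdjacentTurns` field names, and `findAdjacentTurns` are hypothetical and may not match `src/functions/lineage.ts`.

```typescript
// Hypothetical observation shape; the PR's real observation type is not shown.
interface Observation {
  id: string;
  kind: "user_prompt" | "assistant_action" | "other";
  text: string;
}

// Hypothetical enrichment shape for an observation hit.
interface AdjacentTurns {
  previousUserPrompt?: string;
  previousAssistantAction?: string;
}

// Walk backward from the hit and keep the nearest earlier prompt and action.
function findAdjacentTurns(sessionObs: Observation[], hitIndex: number): AdjacentTurns {
  const turns: AdjacentTurns = {};
  for (const obs of sessionObs.slice(0, hitIndex).reverse()) {
    if (obs.kind === "user_prompt" && turns.previousUserPrompt === undefined) {
      turns.previousUserPrompt = obs.text;
    } else if (obs.kind === "assistant_action" && turns.previousAssistantAction === undefined) {
      turns.previousAssistantAction = obs.text;
    }
    // Stop as soon as both neighbors are known.
    if (turns.previousUserPrompt !== undefined && turns.previousAssistantAction !== undefined) break;
  }
  return turns;
}

const session: Observation[] = [
  { id: "o1", kind: "user_prompt", text: "Add a careful generator" },
  { id: "o2", kind: "assistant_action", text: "Edited src/gen.ts" },
  { id: "o3", kind: "other", text: "tool output" },
  { id: "o4", kind: "assistant_action", text: "Ran tests for careful generator" },
];

const adjacent = findAdjacentTurns(session, 3);
console.log(adjacent);
```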

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hop through snippets, sessions, and time,
Tracing a phrase in memory's rhyme,
Stitching turns and graph-lit trails,
A lineage found where recall prevails,
📜✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: introduction of a new mem::lineage primitive for chronological concept retrieval across multiple channels. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/lineage.ts`:
- Around line 366-373: The calculation of earliest/firstMention is using the
page-truncated array trimmed, so when order === "desc" and limit removed older
items it incorrectly omits earlier hits; update the logic that sets
earliest/firstMention to derive from the full filtered set (e.g., filteredHits
or the pre-trimmed array used before slicing) rather than from trimmed so that
firstMention always points to the true earliest timestamp in the filtered set
regardless of order or pagination (adjust references to earliest, firstMention,
trimmed, and order accordingly).

In `@src/mcp/server.ts`:
- Around line 289-299: The handler currently accepts non-integer limits and any
order string; tighten validation by ensuring limit is a finite integer within
[1,500] (use Number.isInteger on the parsed value from asNumber(args.limit)) and
return/raise a clear MCP validation error if it fails instead of silently
clamping; likewise, validate args.order against an explicit allowed set (e.g.,
allowedOrders = ['asc','desc'] or the app-specific enum) before assigning
payload.order and return/raise a validation error for unknown values. Update the
validation around the asNumber(args.limit) usage and the payload.order
assignment before calling sdk.trigger so invalid inputs are rejected at the MCP
boundary.

In `@src/triggers/api.ts`:
- Around line 1009-1042: The code currently forwards the raw req.body to
sdk.trigger for function_id "mem::lineage"; instead construct a whitelisted
payload object from the validated local variable body (not req.body) including
only the allowed fields (query, limit, channels, order) — normalize order to
lower-case and ensure limit is a number and channels is an array of strings —
then call sdk.trigger({ function_id: "mem::lineage", payload: payload }) so no
unvalidated fields are passed downstream.
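
The whitelisting asked for in the last comment can be sketched as follows. This is a minimal illustration under assumptions: `RawBody`, `LineagePayload`, and `buildWhitelistedPayload` are hypothetical names, and the real handler may validate differently before forwarding to `sdk.trigger`.

```typescript
// Hypothetical shape of the parsed request body: known keys plus arbitrary extras.
interface RawBody {
  query?: unknown;
  limit?: unknown;
  channels?: unknown;
  order?: unknown;
  [key: string]: unknown;
}

// Only these fields are ever forwarded downstream.
interface LineagePayload {
  query: string;
  limit?: number;
  channels?: string[];
  order?: "asc" | "desc";
}

function buildWhitelistedPayload(body: RawBody): LineagePayload {
  const query = typeof body.query === "string" ? body.query.trim() : "";
  if (query.length === 0) throw new Error("query is required");

  const payload: LineagePayload = { query };
  if (typeof body.limit === "number") payload.limit = body.limit;
  if (Array.isArray(body.channels)) {
    payload.channels = body.channels.filter((c): c is string => typeof c === "string");
  }
  if (typeof body.order === "string") {
    const order = body.order.toLowerCase();
    if (order === "asc" || order === "desc") payload.order = order;
  }
  return payload;
}

// The caller-controlled `admin` key never reaches the payload.
const safePayload = buildWhitelistedPayload({ query: "lineage", limit: 5, order: "DESC", admin: true });
console.log(safePayload);
```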

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc313b12-f9f6-49dd-974e-8841efbe68ad

📥 Commits

Reviewing files that changed from the base of the PR and between 93d1bdd and 3ae80e4.

📒 Files selected for processing (11)
  • AGENTS.md
  • README.md
  • docs/plans/v4-lineage-design.md
  • docs/plans/v4-lineage-test-case-careful-generator.md
  • src/functions/lineage.ts
  • src/index.ts
  • src/mcp/server.ts
  • src/mcp/tools-registry.ts
  • src/triggers/api.ts
  • src/types.ts
  • test/mcp-standalone.test.ts

Comment thread src/functions/lineage.ts Outdated
Comment thread src/mcp/server.ts
Comment thread src/triggers/api.ts
efenex added a commit to efenex/agentmemory that referenced this pull request May 20, 2026
Three real issues caught in review:

1. firstMention computed from `trimmed` (post-limit page) instead of
   `items` (entire filtered set). When `order:desc` + a small `limit`
   truncated a session with many hits, the reported firstMention was
   the oldest-in-page, not the actual earliest filtered hit. Switch to
   `items` so the semantic contract holds regardless of page size.

2. MCP boundary (memory_lineage in src/mcp/server.ts) accepted any
   non-integer `limit` and any `order` string. Now: validate `limit`
   is a positive integer (400 otherwise), validate `order` is
   "asc"|"desc" (400 otherwise), filter `channels` to the known enum
   before forwarding.

3. REST boundary (api::lineage in src/triggers/api.ts) was forwarding
   raw `req.body` after validation, which leaks caller-controlled keys
   to the downstream function. Build a whitelisted payload from the
   validated fields only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
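
A minimal sketch of the boundary validation described in item 2, assuming a small result type in place of the handler's actual 400 response: `Validation`, `parseLimit`, and `parseOrder` are hypothetical names, and no upper bound on `limit` is enforced here because the commit message only mentions "positive integer".

```typescript
type Order = "asc" | "desc";

// Hypothetical result type standing in for "return a 400 with an error body".
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Accept an omitted limit, otherwise require a positive integer.
function parseLimit(raw: unknown): Validation<number | undefined> {
  if (raw === undefined) return { ok: true, value: undefined };
  const n = typeof raw === "string" ? Number(raw) : raw;
  if (typeof n !== "number" || !Number.isInteger(n) || n < 1) {
    return { ok: false, error: "limit must be a positive integer" };
  }
  return { ok: true, value: n };
}

// Accept an omitted order, otherwise require exactly "asc" or "desc".
function parseOrder(raw: unknown): Validation<Order | undefined> {
  if (raw === undefined) return { ok: true, value: undefined };
  if (raw === "asc" || raw === "desc") return { ok: true, value: raw };
  return { ok: false, error: 'order must be "asc" or "desc"' };
}

console.log(parseLimit(10), parseLimit(2.5), parseOrder("sideways"));
```

Rejecting instead of clamping keeps a typo such as `limit: 2.5` from silently turning into a different query than the caller asked for.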
@efenex efenex force-pushed the feat/v4-a-mem-lineage branch from d8b9f30 to 6a4de14 Compare May 20, 2026 16:08

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/lineage.ts`:
- Around line 372-377: The current selection of earliest (const earliest = order
=== "asc" ? items[0] : items[items.length - 1]) causes firstMention to flip when
multiple items share the same timestamp because it just picks opposite ends;
instead compute earliest deterministically by finding the minimal timestamp
across items and then applying a stable tie-breaker (e.g., compare channelId or
messageId lexicographically) so selection does not depend on the order
parameter—update the logic around earliest/firstMention in lineage.ts
(referencing variables items, order, earliest, firstMention) to scan items for
the minimum (timestamp, channelId/messageId) tuple and pick that entry.

In `@src/mcp/server.ts`:
- Around line 286-307: The current code silently drops invalid channel tokens by
computing validChannels from channels and only setting payload.channels when
validChannels.length>0; instead, when the caller provided a channels value but
none are valid we should reject the request with a 400 error. Modify the logic
around the validChannels computation in src/mcp/server.ts (the block that
defines validChannels and sets payload.channels) to check if args.channels (or
channels) was supplied and validChannels.length === 0, and return a 400 response
(e.g., { status_code: 400, body: { error: "invalid channels" } }) rather than
omitting payload.channels; otherwise set payload.channels = validChannels as
before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4aa8dde-8d3f-44b4-8256-f210cff0cc93

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae80e4 and 6a4de14.

📒 Files selected for processing (3)
  • src/functions/lineage.ts
  • src/mcp/server.ts
  • src/triggers/api.ts

Comment thread src/functions/lineage.ts Outdated
Comment thread src/mcp/server.ts Outdated
…+ firstMention tiebreak

Two follow-up issues from CodeRabbit's review of 6a4de14:

1. `channels` silent broadening: when the user passed `channels` but
   none were in the known enum (e.g. `["foobar","baz"]`), the previous
   fix dropped to an empty `validChannels` and the conditional then
   omitted `payload.channels` entirely — falling back to all-channels
   default. Now: if the user explicitly passed channels but none are
   valid, return 400. Silently broadening invalidates caller intent.

2. `firstMention` could differ by `order`: picking `items[0]` (asc) or
   `items[items.length-1]` (desc) relied on the array's tiebreak rule
   to settle equal-timestamp ties. Two items sharing the earliest
   timestamp on different channels would resolve differently depending
   on `order`. Switch to an order-independent min-by-timestamp reduce
   so the "earliest in filtered set" contract is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
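
The order-independent selection described in item 2 can be sketched as a single reduce over the full filtered set. The tie-break on channel and then id follows the spirit of the review suggestion but is an assumption, as are the simplified `TimelineItem` fields and the `firstMention` helper name.

```typescript
// Simplified stand-in for the PR's TimelineItem.
interface TimelineItem {
  id: string;
  channel: string;
  timestamp: string; // ISO 8601
}

// Pick the minimum (timestamp, channel, id) tuple, independent of array order.
function firstMention(items: TimelineItem[]): TimelineItem | undefined {
  return items.reduce<TimelineItem | undefined>((best, item) => {
    if (best === undefined) return item;
    const dt = Date.parse(item.timestamp) - Date.parse(best.timestamp);
    if (dt !== 0) return dt < 0 ? item : best;
    // Equal timestamps: assumed tie-break on channel, then id.
    if (item.channel !== best.channel) return item.channel < best.channel ? item : best;
    return item.id < best.id ? item : best;
  }, undefined);
}

const ascItems: TimelineItem[] = [
  { id: "obs_1", channel: "observation", timestamp: "2026-05-01T00:00:00Z" },
  { id: "les_1", channel: "lesson", timestamp: "2026-05-01T00:00:00Z" },
  { id: "obs_9", channel: "observation", timestamp: "2026-05-03T00:00:00Z" },
];
const descItems = [...ascItems].reverse();

// Same answer for both orders, and it is computed before any page trimming.
const fromAsc = firstMention(ascItems);
const fromDesc = firstMention(descItems);
const fromDescPage = firstMention(descItems.slice(0, 1)); // what a trimmed page would report
console.log(fromAsc?.id, fromDesc?.id, fromDescPage?.id);
```

The last value shows the earlier bug: computing from a `desc` page of one item reports the newest hit rather than the true earliest.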

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
src/mcp/server.ts (1)

285-301: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject blank channels values too.

Line 285 turns "" / " , " into [], so Line 293 misses the “caller supplied channels but none are valid” case and the request still broadens to all channels. Check raw presence (args.channels !== undefined) instead of channels.length > 0.

Suggested fix
-            const channels = parseCsvList(args.channels);
+            const channelsProvided = args.channels !== undefined;
+            const channels = parseCsvList(args.channels);
             const validChannels = channels.filter((c) =>
               ["observation", "memory", "lesson", "summary"].includes(c),
             );
-            if (channels.length > 0 && validChannels.length === 0) {
+            if (channelsProvided && validChannels.length === 0) {
               return {
                 status_code: 400,
                 body: {

As per coding guidelines, input validation must occur at system boundaries (MCP handlers, REST endpoints).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp/server.ts` around lines 285 - 301, The handler currently checks
channels.length > 0 which misses cases where the caller provided an empty/blank
channels string (parseCsvList converts "" or " , " to []), so change the
validation to test raw presence instead: replace the condition `channels.length
> 0 && validChannels.length === 0` with `args.channels !== undefined &&
validChannels.length === 0` (keeping parseCsvList, validChannels and the
existing 400 error body) so any supplied-but-empty channels input is rejected
rather than silently broadening to all channels.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/mcp/server.ts`:
- Around line 285-301: The handler currently checks channels.length > 0 which
misses cases where the caller provided an empty/blank channels string
(parseCsvList converts "" or " , " to []), so change the validation to test raw
presence instead: replace the condition `channels.length > 0 &&
validChannels.length === 0` with `args.channels !== undefined &&
validChannels.length === 0` (keeping parseCsvList, validChannels and the
existing 400 error body) so any supplied-but-empty channels input is rejected
rather than silently broadening to all channels.
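
A self-contained sketch of the raw-presence check follows. The `parseCsvList` below is a stand-in written for this example (the PR's real helper is not shown), and `ChannelResult` and `resolveChannels` are hypothetical names.

```typescript
const KNOWN_CHANNELS = ["observation", "memory", "lesson", "summary"] as const;
type LineageChannel = (typeof KNOWN_CHANNELS)[number];

// Stand-in for the PR's parseCsvList: blank or non-string input yields [].
function parseCsvList(raw: unknown): string[] {
  if (typeof raw !== "string") return [];
  return raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

type ChannelResult =
  | { ok: true; channels?: LineageChannel[] }
  | { ok: false; status_code: 400; error: string };

function resolveChannels(rawChannels: unknown): ChannelResult {
  // Test raw presence, not parsed length, so "" and " , " count as supplied.
  const provided = rawChannels !== undefined;
  const valid = parseCsvList(rawChannels).filter((c): c is LineageChannel =>
    (KNOWN_CHANNELS as readonly string[]).includes(c),
  );
  if (provided && valid.length === 0) {
    return { ok: false, status_code: 400, error: "invalid channels" };
  }
  // Omitted channels fall through to the all-channels default downstream.
  return provided ? { ok: true, channels: valid } : { ok: true };
}

console.log(resolveChannels(undefined), resolveChannels(" , "), resolveChannels("lesson,foobar"));
```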

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f53ffb7c-9890-4808-be09-9edce07ffe3f

📥 Commits

Reviewing files that changed from the base of the PR and between 6a4de14 and 4cd1f4c.

📒 Files selected for processing (2)
  • src/functions/lineage.ts
  • src/mcp/server.ts
