feat(lineage): mem::lineage primitive — chronological concept retrieval across all channels#570
efenex wants to merge 3 commits into
Conversation
Returns chronologically sorted hits across the observation/memory/lesson/summary channels; answers "when did this term enter the corpus and what surrounded it?". Includes a BM25 sweep over obs+memory, a substring scan for lessons/summaries, optional adjacent-turn enrichment, and optional graph-neighbor attachment.

Gap-2 fix bundled: BM25 sweep cap raised from min(limit*4, 500) to min(limit*20, 5000) so deep in-session refs in large jsonl-imported sessions (10k+ obs) still rank into the channel-filtered top N.

Wires:
- src/functions/lineage.ts (new)
- mem::lineage MCP tool in CORE_TOOLS
- POST /agentmemory/lineage REST endpoint
- AuditEntry operation: + "query"
- LineageChannel / TimelineItem / LineageGraphNeighbor / LineageResult types
- design + test-case docs under docs/plans/

Counts bumped to keep README/AGENTS/boot message/test in sync: CORE_TOOLS 12 → 13, total MCP tools 51 → 52, REST endpoints 121 → 122.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it.
📝 Walkthrough
Implements mem::lineage: a chronological phrase-lineage retrieval across observation, memory, lesson, and summary channels with session enrichment, optional adjacent-turn reconstruction and graph neighbors, plus HTTP (POST /agentmemory/lineage) and MCP exposure, types, tests, and design/test docs.
Changes
mem::lineage Retrieval Feature
Sequence Diagram
sequenceDiagram
participant Client as HTTP/MCP Client
participant Handler as Lineage Handler
participant BM25 as BM25 Index
participant KV as KV Store
participant Session as Session Cache
participant Graph as Graph Data
Client->>Handler: query, limit, channels, time range
Handler->>BM25: BM25 search (observation/memory)
BM25-->>Handler: timeline hits
Handler->>KV: List/scan lessons & summaries
KV-->>Handler: additional items
Handler->>Handler: Sort by timestamp + tie-break
Handler->>Session: Lookup session metadata
Session-->>Handler: Session info (cached)
Handler->>Handler: Enrich items with session
Handler->>Handler: (optional) Compute adjacentTurns
Handler->>Graph: Match query tokens to node names
Graph-->>Handler: Matched nodes + edges
Handler-->>Client: LineageResult (timeline, totals, firstMention, graphNeighbors)
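The "Sort by timestamp + tie-break" step in the diagram can be sketched roughly as below. This is an illustration only, not the code in src/functions/lineage.ts: the `TimelineItem` fields (`id`, `channel`, `timestamp`), the tie-break order (channel, then id), and the helper names are all assumptions.

```typescript
// Hypothetical item shape; the real TimelineItem in src/types.ts may differ.
type LineageChannel = "observation" | "memory" | "lesson" | "summary";

interface TimelineItem {
  id: string;
  channel: LineageChannel;
  timestamp: string; // assumed to be an ISO-8601 string
}

// Plain code-unit comparison, so the result does not depend on locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Ascending by timestamp; equal timestamps fall back to channel, then id,
// which makes the ordering total and repeatable.
function compareAscending(a: TimelineItem, b: TimelineItem): number {
  const dt = Date.parse(a.timestamp) - Date.parse(b.timestamp);
  if (dt !== 0) return dt;
  return compareStrings(a.channel, b.channel) || compareStrings(a.id, b.id);
}

// Sort the full filtered set first, then trim to the requested page.
function sortTimeline(
  items: readonly TimelineItem[],
  order: "asc" | "desc",
  limit: number,
): TimelineItem[] {
  const sorted = [...items].sort(compareAscending);
  if (order === "desc") sorted.reverse();
  return sorted.slice(0, limit);
}
```

Sorting before trimming matters here: any value that should describe the whole filtered set has to be read before the `slice`.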
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/functions/lineage.ts`:
- Around line 366-373: The calculation of earliest/firstMention is using the
page-truncated array trimmed, so when order === "desc" and limit removed older
items it incorrectly omits earlier hits; update the logic that sets
earliest/firstMention to derive from the full filtered set (e.g., filteredHits
or the pre-trimmed array used before slicing) rather than from trimmed so that
firstMention always points to the true earliest timestamp in the filtered set
regardless of order or pagination (adjust references to earliest, firstMention,
trimmed, and order accordingly).
In `@src/mcp/server.ts`:
- Around line 289-299: The handler currently accepts non-integer limits and any
order string; tighten validation by ensuring limit is a finite integer within
[1,500] (use Number.isInteger on the parsed value from asNumber(args.limit)) and
return/raise a clear MCP validation error if it fails instead of silently
clamping; likewise, validate args.order against an explicit allowed set (e.g.,
allowedOrders = ['asc','desc'] or the app-specific enum) before assigning
payload.order and return/raise a validation error for unknown values. Update the
validation around the asNumber(args.limit) usage and the payload.order
assignment before calling sdk.trigger so invalid inputs are rejected at the MCP
boundary.
In `@src/triggers/api.ts`:
- Around line 1009-1042: The code currently forwards the raw req.body to
sdk.trigger for function_id "mem::lineage"; instead construct a whitelisted
payload object from the validated local variable body (not req.body) including
only the allowed fields (query, limit, channels, order) — normalize order to
lower-case and ensure limit is a number and channels is an array of strings —
then call sdk.trigger({ function_id: "mem::lineage", payload: payload }) so no
unvalidated fields are passed downstream.
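The boundary checks requested above (integer `limit` in [1, 500], an explicit `order` allow-list, and a whitelisted payload instead of the raw body) could look roughly like this. It is a sketch under assumptions: `buildLineagePayload`, the `Result` shape, and the error strings are hypothetical, and the real handlers call `sdk.trigger` with the resulting payload, which is not modeled here.

```typescript
// Sketch only: field names follow the review comments; everything else is assumed.
type Order = "asc" | "desc";

interface LineagePayload {
  query: string;
  limit?: number;
  channels?: string[];
  order?: Order;
}

type Result =
  | { ok: true; payload: LineagePayload }
  | { ok: false; status_code: 400; error: string };

function reject(error: string): Result {
  return { ok: false, status_code: 400, error };
}

function buildLineagePayload(body: Record<string, unknown>): Result {
  // Only these four keys are ever read; anything else in `body` is ignored.
  const { query, limit, channels, order } = body;

  if (typeof query !== "string" || query.trim() === "") {
    return reject("query must be a non-empty string");
  }
  const payload: LineagePayload = { query };

  if (limit !== undefined) {
    // Reject instead of clamping, so bad input is visible to the caller.
    if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 500) {
      return reject("limit must be an integer in [1, 500]");
    }
    payload.limit = limit;
  }

  if (channels !== undefined) {
    if (!Array.isArray(channels) || !channels.every((c) => typeof c === "string")) {
      return reject("channels must be an array of strings");
    }
    payload.channels = channels as string[];
  }

  if (order !== undefined) {
    const normalized = typeof order === "string" ? order.toLowerCase() : "";
    if (normalized !== "asc" && normalized !== "desc") {
      return reject('order must be "asc" or "desc"');
    }
    payload.order = normalized;
  }

  return { ok: true, payload };
}
```

Because the payload is built key by key, an extra caller-supplied field such as `admin: true` never reaches the downstream function.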
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc313b12-f9f6-49dd-974e-8841efbe68ad
📒 Files selected for processing (11)
- AGENTS.md
- README.md
- docs/plans/v4-lineage-design.md
- docs/plans/v4-lineage-test-case-careful-generator.md
- src/functions/lineage.ts
- src/index.ts
- src/mcp/server.ts
- src/mcp/tools-registry.ts
- src/triggers/api.ts
- src/types.ts
- test/mcp-standalone.test.ts
Three real issues caught in review:

1. firstMention computed from `trimmed` (post-limit page) instead of `items` (entire filtered set). When `order:desc` + a small `limit` truncated a session with many hits, the reported firstMention was the oldest-in-page, not the actual earliest filtered hit. Switch to `items` so the semantic contract holds regardless of page size.
2. MCP boundary (memory_lineage in src/mcp/server.ts) accepted any non-integer `limit` and any `order` string. Now: validate `limit` is a positive integer (400 otherwise), validate `order` is "asc"|"desc" (400 otherwise), filter `channels` to the known enum before forwarding.
3. REST boundary (api::lineage in src/triggers/api.ts) was forwarding raw `req.body` after validation, which leaks caller-controlled keys to the downstream function. Build a whitelisted payload from the validated fields only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d8b9f30 to 6a4de14
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/functions/lineage.ts`:
- Around line 372-377: The current selection of earliest (const earliest = order
=== "asc" ? items[0] : items[items.length - 1]) causes firstMention to flip when
multiple items share the same timestamp because it just picks opposite ends;
instead compute earliest deterministically by finding the minimal timestamp
across items and then applying a stable tie-breaker (e.g., compare channelId or
messageId lexicographically) so selection does not depend on the order
parameter—update the logic around earliest/firstMention in lineage.ts
(referencing variables items, order, earliest, firstMention) to scan items for
the minimum (timestamp, channelId/messageId) tuple and pick that entry.
In `@src/mcp/server.ts`:
- Around line 286-307: The current code silently drops invalid channel tokens by
computing validChannels from channels and only setting payload.channels when
validChannels.length>0; instead, when the caller provided a channels value but
none are valid we should reject the request with a 400 error. Modify the logic
around the validChannels computation in src/mcp/server.ts (the block that
defines validChannels and sets payload.channels) to check if args.channels (or
channels) was supplied and validChannels.length === 0, and return a 400 response
(e.g., { status_code: 400, body: { error: "invalid channels" } }) rather than
omitting payload.channels; otherwise set payload.channels = validChannels as
before.
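The order-independent `firstMention` selection described for src/functions/lineage.ts might be sketched as a single scan for the minimum (timestamp, channel, id) tuple. The item fields, the choice of channel and id as tie-breakers, and the function names are assumptions made for illustration; the actual fix may use different keys.

```typescript
// Hypothetical item shape; the real TimelineItem may carry more fields.
interface TimelineItem {
  id: string;
  channel: string;
  timestamp: string; // assumed to be an ISO-8601 string
}

// True when `a` sorts strictly before `b` on (timestamp, channel, id).
function isEarlier(a: TimelineItem, b: TimelineItem): boolean {
  const ta = Date.parse(a.timestamp);
  const tb = Date.parse(b.timestamp);
  if (ta !== tb) return ta < tb;
  if (a.channel !== b.channel) return a.channel < b.channel;
  return a.id < b.id;
}

// Scans the full filtered set, not the page, and never looks at `order`.
function pickFirstMention(items: readonly TimelineItem[]): TimelineItem | undefined {
  let earliest: TimelineItem | undefined;
  for (const item of items) {
    if (earliest === undefined || isEarlier(item, earliest)) {
      earliest = item;
    }
  }
  return earliest;
}
```

Since the scan does not depend on array position, passing the same items in ascending or descending order returns the same entry even when several items share the earliest timestamp.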
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c4aa8dde-8d3f-44b4-8256-f210cff0cc93
📒 Files selected for processing (3)
- src/functions/lineage.ts
- src/mcp/server.ts
- src/triggers/api.ts
…+ firstMention tiebreak

Two follow-up issues from CodeRabbit's review of 6a4de14:

1. `channels` silent broadening: when the user passed `channels` but none were in the known enum (e.g. `["foobar","baz"]`), the previous fix dropped to an empty `validChannels` and the conditional then omitted `payload.channels` entirely, falling back to the all-channels default. Now: if the user explicitly passed channels but none are valid, return 400. Silently broadening invalidates caller intent.
2. `firstMention` could differ by `order`: picking `items[0]` (asc) or `items[items.length-1]` (desc) relied on the array's tiebreak rule to settle equal-timestamp ties. Two items sharing the earliest timestamp on different channels would resolve differently depending on `order`. Switch to an order-independent min-by-timestamp reduce so the "earliest in filtered set" contract is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (1)
src/mcp/server.ts (1)
285-301: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Reject blank `channels` values too.
Line 285 turns `""` / `" , "` into `[]`, so Line 293 misses the “caller supplied channels but none are valid” case and the request still broadens to all channels. Check raw presence (`args.channels !== undefined`) instead of `channels.length > 0`.
Suggested fix

    -    const channels = parseCsvList(args.channels);
    +    const channelsProvided = args.channels !== undefined;
    +    const channels = parseCsvList(args.channels);
         const validChannels = channels.filter((c) =>
           ["observation", "memory", "lesson", "summary"].includes(c),
         );
    -    if (channels.length > 0 && validChannels.length === 0) {
    +    if (channelsProvided && validChannels.length === 0) {
           return {
             status_code: 400,
             body: {

As per coding guidelines, input validation must occur at system boundaries (MCP handlers, REST endpoints).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/mcp/server.ts`:
- Around line 285-301: The handler currently checks channels.length > 0 which
misses cases where the caller provided an empty/blank channels string
(parseCsvList converts "" or " , " to []), so change the validation to test raw
presence instead: replace the condition `channels.length > 0 &&
validChannels.length === 0` with `args.channels !== undefined &&
validChannels.length === 0` (keeping parseCsvList, validChannels and the
existing 400 error body) so any supplied-but-empty channels input is rejected
rather than silently broadening to all channels.
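The presence check described above can be sketched in isolation as follows. The `parseCsvList` here is a stand-in written for this example (the real helper's behavior is assumed from the review text), and `resolveChannels`, `isKnownChannel`, and the result shape are hypothetical names.

```typescript
const KNOWN_CHANNELS = ["observation", "memory", "lesson", "summary"] as const;
type LineageChannel = (typeof KNOWN_CHANNELS)[number];

type ChannelsResult =
  | { ok: true; channels?: LineageChannel[] }
  | { ok: false; status_code: 400; error: string };

// Stand-in for the helper named in the review: assumed to split on commas,
// trim whitespace, and drop empty tokens, so "" and " , " both yield [].
function parseCsvList(raw: unknown): string[] {
  if (typeof raw !== "string") return [];
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function isKnownChannel(c: string): c is LineageChannel {
  return KNOWN_CHANNELS.some((known) => known === c);
}

function resolveChannels(raw: unknown): ChannelsResult {
  // Test raw presence, not the parsed length: a blank string still counts
  // as "the caller asked for specific channels".
  const provided = raw !== undefined;
  const valid = parseCsvList(raw).filter(isKnownChannel);
  if (provided && valid.length === 0) {
    return { ok: false, status_code: 400, error: "invalid channels" };
  }
  // An absent argument keeps the all-channels default by omitting the key.
  return valid.length > 0 ? { ok: true, channels: valid } : { ok: true };
}
```

With this shape, only a genuinely absent `channels` argument falls through to the default; `""`, `" , "`, and `"foobar,baz"` are all rejected.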
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f53ffb7c-9890-4808-be09-9edce07ffe3f
📒 Files selected for processing (2)
src/functions/lineage.tssrc/mcp/server.ts
Summary
Adds a new MCP primitive `mem::lineage` that answers "when did this term enter the corpus, and what surrounded it?" — chronologically ordered hits for a phrase across observation, memory, lesson, and summary channels, with optional adjacent-turn enrichment and graph-neighbor attachment.
Distinct from existing retrieval:
What's included
Gap-2 fix bundled
BM25 sweep cap raised from `min(limit*4, 500)` to `min(limit*20, 5000)`: large jsonl-imported sessions (10k+ observations) have deep in-session references that did not rank into the channel-filtered top N at the old cap. With wide channel filtering, the sweep needs more headroom.
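To make the change in headroom concrete, the two caps can be compared directly. The function names below are made up for this comparison; only the two formulas come from the description above.

```typescript
// Old cap: at most 4x the requested limit, never more than 500 candidates.
function oldSweepCap(limit: number): number {
  return Math.min(limit * 4, 500);
}

// New cap: at most 20x the requested limit, never more than 5000 candidates.
function newSweepCap(limit: number): number {
  return Math.min(limit * 20, 5000);
}

// limit = 10  -> old 40,  new 200
// limit = 50  -> old 200, new 1000
// limit = 500 -> old 500, new 5000 (both ceilings reached)
```

For a default-sized request the sweep now pulls five times as many BM25 candidates before channel filtering, and the hard ceiling is ten times higher.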
Test plan
Related
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests