feat(smart-search): boost title/narrative matches on 'who/what is X' queries#571

Open
efenex wants to merge 2 commits into
rohitg00:mainfrom
efenex:feat/v4-b-smart-search-named-concept-boost

Conversation


@efenex efenex commented May 20, 2026

Summary

For named-concept queries ("who is the careful generator?", "what is a circuit breaker", "what does eventual consistency mean?"), the BM25 hybrid ranker scores busier observations above records that name the concept directly — question scaffolding tokens ("who", "is", "the") add noise that dilutes the true match signal. The record that defines the concept ranks below records that mention it incidentally.

What it does

  1. Detect the query as a named-concept pattern via 5 regexes (`/who is/`, `/what is/`, `/what's/`, `/what does X mean/`, `/who's/`). Skip if no match.
  2. Extract the concept phrase (e.g. "careful generator"). Reject degenerate phrases — single tokens shorter than 3 chars (`it`, `x`) and phrases longer than 6 tokens.
  3. Deepen the BM25 sweep to `limit*3` so the boost has candidates to re-rank (boost on a top-10 set has limited room to move records around).
  4. Re-rank with multiplicative boosts:
    • Title contains the phrase → 2.0×
    • Narrative contains the phrase → 1.3×
  5. Same treatment for lessons whose content contains the phrase (2.0×).
  6. Re-sort by combined score, trim to original `limit`.

Non-named-concept queries are untouched.
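
The detection and extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the exact regexes, the stripping of a leading article, the lowercasing, and the name `NAMED_CONCEPT_PATTERNS` are assumptions; only the function name `extractNamedConcept` and the rejection rules come from the description.

```typescript
// Hypothetical patterns; the PR's real regexes may differ in detail.
const NAMED_CONCEPT_PATTERNS: RegExp[] = [
  /^\s*who\s+is\s+(.+)$/i,
  /^\s*what\s+is\s+(.+)$/i,
  /^\s*what's\s+(.+)$/i,
  /^\s*what\s+does\s+(.+?)\s+mean\b.*$/i,
  /^\s*who's\s+(.+)$/i,
];

// Assumption: a leading article is dropped so "the careful generator"
// yields "careful generator", as in the example in the description.
const LEADING_ARTICLE = /^(?:the|a|an)\s+/i;

function extractNamedConcept(query: string): string | null {
  for (const pattern of NAMED_CONCEPT_PATTERNS) {
    const match = pattern.exec(query);
    if (!match) continue;

    const phrase = (match[1] ?? "")
      .replace(/[?!.,;:\s]+$/u, "") // trim trailing punctuation and spaces
      .replace(LEADING_ARTICLE, "")
      .trim()
      .toLowerCase(); // assumption: matching is case-insensitive

    const tokens = phrase.split(/\s+/).filter(Boolean);
    if (tokens.length === 0) return null;
    // Degenerate: a single token shorter than 3 characters ("it", "x").
    if (tokens.length === 1 && phrase.length < 3) return null;
    // Degenerate: phrases longer than 6 tokens.
    if (tokens.length > 6) return null;
    return phrase;
  }
  return null; // not a named-concept query; caller leaves ranking untouched
}
```

Under these assumptions, `extractNamedConcept("who is the careful generator?")` yields `"careful generator"`, while `extractNamedConcept("what is it")` and any query that matches none of the patterns yield `null`.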

Why this lives in smart-search and not lineage

`mem::lineage` is chronologically ordered and multi-channel, whereas this change is a ranking concern on the primary recall path (smart-search), which is where the `memory_recall` / `memory_smart_search` MCP tools land. Lineage benefits from upstream improvements in BM25 score, so this lift propagates.

Test plan

  • `npm test` passes
  • New unit tests for `extractNamedConcept` (7 cases) — pattern matching, degenerate-phrase rejection
  • New integration test that proves the boost re-ranks: an observation whose title contains "careful generator" but has a lower BM25 score than a busier unrelated observation gets moved to rank #1
  • Non-named-concept query preserves original ordering (regression test)

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Enhanced search ranking for conceptual queries: the system now detects "who is/what is/what does…" questions and re-ranks results so matching observations and lessons surface higher for concept-focused queries.
  • Tests
    • Added coverage validating concept extraction and that conceptual queries trigger the expected result re-ranking while non-concept queries preserve original ordering.


For "who is X" / "what is X" / "what does X mean" queries, BM25 ranks
busier observations above the records that actually name X — the
question-scaffolding tokens add noise that dilutes the true match
signal. Pre-existing regression test: docs/plans/v4-lineage-test-case-
careful-generator.md (Gap exposed there, but the fix lives in smart-
search rather than lineage since smart-search is the lessons-first
ranker used by the recall paths).

Approach: detect the question pattern at handler entry, extract the
concept phrase, deepen the BM25 sweep to limit*3 so the boost has
candidates, then post-multiply combinedScore by 2.0 for title matches
and 1.3 for narrative matches, re-sort, trim to limit. Lessons whose
content names the concept get the same 2.0 title-boost.

Single-token / 6+ token phrases are skipped (degenerate). Original
ordering is preserved on non-named-concept queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vercel Bot commented May 20, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c0e356d-703c-46e5-a685-8be654d139f2

📥 Commits

Reviewing files that changed from the base of the PR and between d1fcb71 and 997d25d.

📒 Files selected for processing (2)
  • src/functions/smart-search.ts
  • src/types.ts

📝 Walkthrough

Walkthrough

Adds extraction of named concepts from "who is / what is / what does X mean" queries and uses the concept to expand observation fetches and apply multiplicative boosts when the concept appears in observation titles/narratives or lesson content, then re-sorts and trims results.
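
A rough sketch of the observation side of this walkthrough, assuming a simplified result shape. The constant names `NAMED_CONCEPT_TITLE_BOOST` and `NAMED_CONCEPT_BODY_BOOST`, the 2.0 and 1.3 multipliers, and the `min(limit*3,100)` fetch size come from this PR; the `ScoredObservation` interface and the helpers `fetchLimitFor` and `boostAndTrim` are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified shape of a hybrid-search result.
interface ScoredObservation {
  id: string;
  title: string;
  narrative: string;
  combinedScore: number;
}

const NAMED_CONCEPT_TITLE_BOOST = 2.0;
const NAMED_CONCEPT_BODY_BOOST = 1.3;
const MAX_FETCH = 100;

// Deepen the sweep only when a concept phrase was extracted.
function fetchLimitFor(limit: number, phrase: string | null): number {
  return phrase ? Math.min(limit * 3, MAX_FETCH) : limit;
}

function boostAndTrim(
  results: ScoredObservation[],
  phrase: string | null,
  limit: number,
): ScoredObservation[] {
  // Non-named-concept queries keep their original ordering.
  if (!phrase) return results.slice(0, limit);

  const needle = phrase.toLowerCase();
  return results
    .map((r) => {
      // Boosts compound: a title-and-narrative match gets 2.0 * 1.3.
      let mult = 1;
      if (r.title.toLowerCase().includes(needle)) mult *= NAMED_CONCEPT_TITLE_BOOST;
      if (r.narrative.toLowerCase().includes(needle)) mult *= NAMED_CONCEPT_BODY_BOOST;
      return mult === 1 ? r : { ...r, combinedScore: r.combinedScore * mult };
    })
    .sort((a, b) => b.combinedScore - a.combinedScore)
    .slice(0, limit);
}
```

The sketch fetches more candidates than it returns so that a record sitting just outside the original top `limit` can still be promoted by the boost before the final trim.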

Changes

Named-Concept Query Detection and Ranking Boost

| Layer / File(s) | Summary |
| --- | --- |
| Named-concept extraction and boost constants (`src/functions/smart-search.ts`, `test/smart-search.test.ts`) | `extractNamedConcept()` parses "who is/what is/what does ... mean/who's ..." queries with regex, trims punctuation, filters degenerate short matches, and introduces title/body boost multipliers. Unit tests verify extraction and null cases. |
| Smart-search pipeline boost and re-ranking (`src/functions/smart-search.ts`, `src/types.ts`, `test/smart-search.test.ts`) | `mem::smart-search` detects named concepts, increases observation fetch size (`min(limit*3,100)`), runs hybrid observation search and lesson recall in parallel, marks `CompactLessonResult.boostMatched` in `recallLessons`, applies multiplicative boosts to observation `combinedScore` and lesson `score` when concept matches, re-sorts results, and truncates back to `limit`. Integration tests assert boosted re-ranking and stable ordering for non-matching queries. |

Sequence Diagram

sequenceDiagram
  participant Query
  participant extractNamedConcept
  participant hybridSearch
  participant lessonRecall
  participant boostProcessor
  participant returnSorted

  Query->>extractNamedConcept: parse query -> concept or null
  extractNamedConcept-->>Query: concept|null
  Query->>hybridSearch: run observation search (expanded limit if concept)
  Query->>lessonRecall: run lesson recall (pass boostPhrase)
  hybridSearch-->>boostProcessor: observations with combinedScore
  lessonRecall-->>boostProcessor: lessons with boostMatched flag
  boostProcessor->>boostProcessor: multiply observation combinedScore for title/body matches
  boostProcessor->>boostProcessor: multiply lesson score when boostMatched or content includes concept
  boostProcessor->>returnSorted: re-sort and truncate to limit
  returnSorted-->>Query: final observations and lessons

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rohitg00/agentmemory#473: Adds compact lesson inclusion and recallLessons/CompactLessonResult plumbing that this PR extends with boostMatched and named-concept ranking.

Poem

🐰 I sniff a phrase, nimble and bright,

"Who is" hops in and sets things right.
Titles gleam with a joyful boast,
Search finds the thing I needed most.
A hopping cheer — code and carrot toast!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main feature: adding a boost mechanism for 'who/what is X' named-concept queries that improves ranking by prioritizing title and narrative matches. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/smart-search.test.ts (1)

331-335: ⚡ Quick win

Tighten this assertion so dual-match regressions actually fail.

obsNamed already contains "careful generator" in both title and narrative, but the test only asserts score > 1.0. That still passes with a single applied boost, so it won't catch the bug in the new re-ranker. Either remove the narrative match from the fixture for a pure title-only case, or assert the full expected multiplier for a dual-match case.

Also applies to: 387-389
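
To illustrate why `score > 1.0` is too weak, here is a small framework-free sketch of the tighter check. The 2.0 and 1.3 multipliers come from this PR; the base score of 0.6 and the helpers `expectedDualMatchScore` and `assertClose` are hypothetical and stand in for whatever assertion style the test file actually uses.

```typescript
const NAMED_CONCEPT_TITLE_BOOST = 2.0;
const NAMED_CONCEPT_BODY_BOOST = 1.3;

// Expected score when the phrase appears in both title and narrative.
function expectedDualMatchScore(baseScore: number): number {
  return baseScore * NAMED_CONCEPT_TITLE_BOOST * NAMED_CONCEPT_BODY_BOOST;
}

// Hypothetical helper: throws unless the two scores agree within epsilon.
function assertClose(actual: number, expected: number, epsilon = 1e-9): void {
  if (Math.abs(actual - expected) > epsilon) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

const base = 0.6; // illustrative pre-boost score, not taken from the fixture

// A title-only boost already clears the weak `> 1.0` threshold...
const titleOnlyScore = base * NAMED_CONCEPT_TITLE_BOOST;
const passesWeakAssertion = titleOnlyScore > 1.0;

// ...whereas the tight check only accepts the full compounded multiplier.
const dualScore = base * NAMED_CONCEPT_TITLE_BOOST * NAMED_CONCEPT_BODY_BOOST;
assertClose(dualScore, expectedDualMatchScore(base));
```

With these numbers, a re-ranker that applies only the title boost still passes the weak assertion, but `assertClose(titleOnlyScore, expectedDualMatchScore(base))` would throw.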

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/smart-search.test.ts` around lines 331 - 335, The test fixture obsNamed
created via makeObs currently contains "careful generator" in both title and
narrative, which makes the weak assertion (score > 1.0) insufficient; either
remove the phrase from the narrative so the fixture is a title-only match and
keep the simple assertion, or tighten the assertion to check the full expected
boosted score for a dual-match (compute and assert the exact expected
multiplier/threshold instead of >1.0). Update the corresponding duplicate
assertions mentioned (around the second occurrence at lines 387-389) to use the
same fix and reference obsNamed/makeObs when locating the fixture and
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/smart-search.ts`:
- Around line 151-156: The current boost logic uses the truncated preview in
rawLessons, so named-concept matching misses occurrences beyond the 240-char
cutoff; update the scoring to operate on the full lesson text before any preview
truncation by either (A) running this phrase includes check against the
untruncated field returned by recallLessons (e.g., use the original full content
property such as fullContent or contentFull instead of the previewed content) or
(B) change recallLessons to preserve a fullContent field on each lesson and use
that field in the map that adjusts score (referencing rawLessons, lessons,
phrase, and NAMED_CONCEPT_TITLE_BOOST). Ensure the boost is applied using the
full text and only truncate for presentation after ranking is complete.
- Around line 143-145: The current logic in smart-search that sets mult using an
if/else if (checking title.includes(phrase) then else if
narrative.includes(phrase)) prevents applying both NAMED_CONCEPT_TITLE_BOOST and
NAMED_CONCEPT_BODY_BOOST when both title and narrative match; change it to
compute the multiplier by starting mult = 1 and multiplying by
NAMED_CONCEPT_TITLE_BOOST if title.includes(phrase) and by
NAMED_CONCEPT_BODY_BOOST if narrative.includes(phrase), then return r unchanged
when mult === 1 else return { ...r, combinedScore: r.combinedScore * mult } so
dual matches get the product of both boosts (use the existing symbols title,
narrative, phrase, mult, NAMED_CONCEPT_TITLE_BOOST, NAMED_CONCEPT_BODY_BOOST, r,
combinedScore).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c24364d3-8993-4417-a12c-9c0c02cd7c30

📥 Commits

Reviewing files that changed from the base of the PR and between 93d1bdd and d1fcb71.

📒 Files selected for processing (2)
  • src/functions/smart-search.ts
  • test/smart-search.test.ts

Comment thread src/functions/smart-search.ts Outdated
Comment thread src/functions/smart-search.ts
…l content

CodeRabbit caught two issues on rohitg00#571:

1. The boost branch used `if (title) ... else if (narrative) ...`,
   capping observations that contain the concept in BOTH fields at the
   title-only 2.0× multiplier. The feature is specified as
   multiplicative — title-and-narrative matches now compound to
   2.0 × 1.3 = 2.6×. Single-field matches behave as before.

2. The lesson boost path was scanning the 240-char preview emitted by
   recallLessons, not the lesson's full pre-truncation content. Any
   concept that appeared past the preview boundary silently missed
   the boost.

   Fix: thread the concept phrase into recallLessons via a new
   `boostPhrase` parameter. The function now decides match against
   `content + context` BEFORE truncation, stamps each result with
   `boostMatched: boolean`, and the smart-search caller uses that
   flag instead of re-scanning the preview.

   `boostMatched` added as an optional field on CompactLessonResult.
   Callers that don't pass `boostPhrase` get `boostMatched: false` —
   the smart-search caller falls back to scanning the (truncated)
   content for the phrase if `boostMatched` is absent, preserving the
   pre-fix behavior for any non-smart-search caller of recallLessons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
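
The lesson-side fix described in this commit can be sketched as below. It is an approximation under stated assumptions: `boostPhrase`, `boostMatched`, `CompactLessonResult`, the 240-char preview, and `NAMED_CONCEPT_TITLE_BOOST` are named in this PR, while the `Lesson` shape and the helpers `toCompactLessons` and `boostLessons` are hypothetical stand-ins for the real `recallLessons` and its smart-search caller.

```typescript
// Hypothetical full lesson record, before any preview truncation.
interface Lesson {
  id: string;
  content: string;
  context?: string;
  score: number;
}

interface CompactLessonResult {
  id: string;
  content: string; // truncated preview
  score: number;
  boostMatched?: boolean;
}

const PREVIEW_CHARS = 240;
const NAMED_CONCEPT_TITLE_BOOST = 2.0;

// Stand-in for recallLessons: decide the match on the full text,
// then truncate for presentation.
function toCompactLessons(lessons: Lesson[], boostPhrase?: string): CompactLessonResult[] {
  const needle = boostPhrase?.toLowerCase();
  return lessons.map((l) => {
    const fullText = `${l.content} ${l.context ?? ""}`.toLowerCase();
    return {
      id: l.id,
      content: l.content.slice(0, PREVIEW_CHARS),
      score: l.score,
      boostMatched: needle ? fullText.includes(needle) : false,
    };
  });
}

// Stand-in for the smart-search caller: trust the flag when present,
// fall back to scanning the preview only when it is absent.
function boostLessons(lessons: CompactLessonResult[], phrase: string): CompactLessonResult[] {
  const needle = phrase.toLowerCase();
  return lessons
    .map((l) => {
      const matched = l.boostMatched ?? l.content.toLowerCase().includes(needle);
      return matched ? { ...l, score: l.score * NAMED_CONCEPT_TITLE_BOOST } : l;
    })
    .sort((a, b) => b.score - a.score);
}
```

In this sketch a lesson that names the concept only after character 240 is still flagged, because the match is computed before `slice` produces the preview.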