
fix: improve search quality and query deduplication #102

Open
gzenz wants to merge 1 commit into tirth8205:main from gzenz:fix/search-quality-and-deduplication



@gzenz (contributor) commented Apr 5, 2026

Summary

  • FTS5 multi-word AND logic: Queries now use "graph" AND "store" instead of phrase matching, so "graph store" finds GraphStore and related nodes
  • Deduplicated query results: callers_of/callees_of/inheritors_of no longer return duplicate nodes when multiple call-site edges exist between the same pair of nodes
  • Ambiguous query auto-resolution: Bare-name queries with multiple matches auto-resolve to the production function when exactly one non-test candidate exists
  • Test function deprioritization: Search applies a 0.5x score penalty to test functions so production code ranks higher
  • search_nodes FTS5 fast path: GraphStore.search_nodes() now tries FTS5 first, falling back to LIKE only when FTS5 returns no results
  • Composite edge index: v6 migration adds idx_edges_composite on (kind, source_qualified, target_qualified, file_path, line) for faster upsert_edge
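A minimal sketch of the first and fifth bullets (AND-joined FTS5 terms and FTS5-first lookup with a LIKE fallback). The helper names `build_fts5_and_query` and `search_with_fallback` are hypothetical, and the two lookups are passed in as plain callables because the PR does not show the real `GraphStore.search_nodes()` signature. Each term is wrapped in double quotes so FTS5 reads it as a literal string rather than an operator, and embedded quotes are doubled, which is FTS5's escape rule inside strings. Whether "graph" AND "store" matches a CamelCase name like GraphStore depends on how the index tokenizes identifiers, which the PR does not show.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_fts5_and_query(text: str) -> str:
    """Turn 'graph store' into '"graph" AND "store"' (hypothetical helper)."""
    terms = text.split()
    # Quote every term so FTS5 treats it as a literal string; double any
    # embedded double quotes, which is how FTS5 escapes them in a string.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    return " AND ".join(quoted)


def search_with_fallback(
    query: str,
    fts_search: Callable[[str], Sequence[T]],
    like_search: Callable[[str], Sequence[T]],
) -> List[T]:
    """Try FTS5 first; only fall back to LIKE when FTS5 returns nothing."""
    fts_query = build_fts5_and_query(query)
    if fts_query:
        hits = list(fts_search(fts_query))
        if hits:
            return hits
    # Blank query or no FTS5 hits: use the slower LIKE scan instead.
    # The like_search callable is assumed to build its own %pattern%.
    return list(like_search(query))


if __name__ == "__main__":
    print(build_fts5_and_query("graph store"))  # "graph" AND "store"
```

Falling back only on an empty FTS5 result keeps the indexed path as the common case while still giving substring matches a chance when the tokenizer finds nothing.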

Changed files

| File | Change |
|---|---|
| code_review_graph/graph.py | search_nodes rewritten: FTS5 fast path with LIKE fallback |
| code_review_graph/search.py | FTS5 AND logic for multi-word queries, test score penalty |
| code_review_graph/tools/query.py | Deduplication via seen_qn set, ambiguous auto-resolution |
| code_review_graph/migrations.py | v6 migration with composite edge index |
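The query.py change deduplicates through a `seen_qn` set. A sketch of that idea, assuming edges carry source and target qualified names and that nodes can be looked up in a dict by qualified name; `Edge`, `callers_of_unique`, and the "CALLS" edge kind are hypothetical stand-ins, not the project's API:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Edge:
    kind: str
    source_qualified: str
    target_qualified: str
    line: int


def callers_of_unique(
    target: str, edges: Iterable[Edge], nodes: Dict[str, dict]
) -> List[dict]:
    """Return each calling node once, even if it calls `target` on several lines."""
    seen_qn = set()
    results = []
    for edge in edges:
        if edge.kind != "CALLS" or edge.target_qualified != target:
            continue
        qn = edge.source_qualified
        # Several call-site edges can share one source; keep only the first.
        if qn in seen_qn:
            continue
        seen_qn.add(qn)
        if qn in nodes:
            results.append(nodes[qn])
    return results
```

Keying on the qualified name rather than the edge keeps first-seen order and makes the result independent of how many call sites exist.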

Test plan

  • All 589 tests pass
  • Multi-word search "graph store" returns GraphStore (was empty before)
  • callers_of with multiple call sites returns unique nodes
  • Bare "build" query resolves to production build() not test_build()
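The last test-plan item (bare "build" resolving to build() rather than test_build()) can be sketched as follows. The `Candidate` type, the function names, and especially the `looks_like_test` heuristic are assumptions, since the PR does not say how it classifies test code; only the rule "auto-resolve when exactly one non-test candidate exists" comes from the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    qualified_name: str
    file_path: str


def looks_like_test(c: Candidate) -> bool:
    """Assumed heuristic: test_ prefix on the name or file, or a tests/ directory."""
    path = c.file_path.replace("\\", "/")
    filename = path.rsplit("/", 1)[-1]
    return (
        c.name.startswith("test_")
        or filename.startswith("test_")
        or "/tests/" in f"/{path}"
    )


def resolve_bare_name(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the single production match, or return None if still ambiguous."""
    if len(candidates) == 1:
        return candidates[0]
    production = [c for c in candidates if not looks_like_test(c)]
    # Only auto-resolve when exactly one non-test candidate remains.
    return production[0] if len(production) == 1 else None
```

Returning None when two production candidates remain preserves the existing "ambiguous" behaviour instead of guessing.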

@gzenz force-pushed the fix/search-quality-and-deduplication branch from 9975e72 to 164ea89 on April 5, 2026 14:04
- FTS5 multi-word queries now use AND logic instead of phrase matching,
  so "graph store" finds GraphStore and related nodes
- callers_of/callees_of/inheritors_of deduplicate results by qualified
  name (multiple call-site edges no longer produce duplicate nodes)
- Ambiguous bare-name queries auto-resolve to the production function
  when exactly one non-test candidate exists
- Test functions receive a 0.5x score penalty in hybrid search so
  production code ranks higher
- search_nodes now uses FTS5 as fast path, falling back to LIKE only
  when FTS5 returns no results
- Add v6 migration with composite edge index for upsert_edge performance
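A sketch of the 0.5x test penalty in hybrid search. Only the 0.5 factor comes from the PR; the `is_test_name` heuristic, the `(qualified_name, score)` result shape, and the higher-is-better score convention are assumptions.

```python
from typing import List, Tuple

TEST_PENALTY = 0.5  # factor stated in the PR; everything else here is assumed


def is_test_name(qualified_name: str) -> bool:
    """Assumed heuristic: the last dotted segment starts with 'test_'."""
    return qualified_name.rsplit(".", 1)[-1].startswith("test_")


def rerank(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Scale test-function scores by TEST_PENALTY, then sort best-first.

    Assumes higher scores are better matches.
    """
    adjusted = [
        (qn, score * TEST_PENALTY if is_test_name(qn) else score)
        for qn, score in results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

A multiplicative penalty only demotes results when scores are higher-is-better. SQLite's FTS5 bm25() returns lower (more negative) values for better matches, so raw bm25 values would need to be converted before a factor like this is applied.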
@gzenz force-pushed the fix/search-quality-and-deduplication branch from 164ea89 to 1978c14 on April 5, 2026 14:11
@tirth8205 (owner) left a comment


Good search quality improvements. Note: the v6 migration in this PR conflicts with the v6 migration that's already on main. You'll need to rebase and renumber to v8 (v7 is taken by PR #127).

Please rebase on latest main.
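A sketch of what the renumbered migration could look like, using SQLite's PRAGMA user_version as the stored schema version. The `MIGRATIONS` dict, the `migrate` function, and the edges table shape are assumptions, because the PR does not show how code_review_graph/migrations.py tracks versions. Only the index name and column list come from the PR.

```python
import sqlite3
from typing import Callable, Dict

Migration = Callable[[sqlite3.Connection], None]


def _v8_composite_edge_index(conn: sqlite3.Connection) -> None:
    # Index name and column order as described in the PR.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_edges_composite "
        "ON edges (kind, source_qualified, target_qualified, file_path, line)"
    )


# Hypothetical registry: migrations v1-v7 would already exist on main.
MIGRATIONS: Dict[int, Migration] = {
    8: _v8_composite_edge_index,
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every registered migration newer than the stored schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        MIGRATIONS[version](conn)
        # PRAGMA values cannot be bound as parameters, so format the int.
        conn.execute(f"PRAGMA user_version = {int(version)}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

With a version-keyed registry, renumbering is a one-key change. IF NOT EXISTS also keeps the step safe on a database that already picked up the same index under the conflicting v6 number.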

