refactor(storage): split queries.rs into 10-module per-domain tree by sdsrss · Pull Request #15 · sdsrss/code-graph-mcp

sdsrss · 2026-05-09T20:26:17Z

Summary

Mechanical split of src/storage/queries.rs (2892 lines) into a src/storage/queries/{helpers,files,nodes,edges,vectors,search,routes,imports,project_map,dead_code}.rs tree, following the existing // --- X --- section markers.
queries/mod.rs flat-re-exports every public type and function, so the 33 existing crate::storage::queries::X import sites compile unchanged.
Tests redistributed per target module (21 tests + test_db harness now pub(super) in helpers.rs).

Layout

File	Lines	Contents
`helpers.rs`	25	`MAX_IN_PARAMS`, `first_row`, `make_placeholders`, `test_db`
`mod.rs`	55	submodule declarations + flat `pub use` re-exports
`files.rs`	160	`IndexStatus`, `FileRecord`, file CRUD
`imports.rs`	161	`FileDependency`, `get_import_tree` (recursive CTE)
`vectors.rs`	175	vec ops + unembedded queries
`project_map.rs`	278	`ModuleStats`/`Dep`/`EntryPoint`/`HotFunction`, `get_project_map`
`routes.rs`	291	`RouteMatch`, `CallerWithRouteInfo`, `ModuleExport`
`edges.rs`	340	`EdgeRecord`, `IncomingReference`, `PendingCallRow`, edge CRUD + graph helpers
`dead_code.rs`	359	`DeadCodeResult`, `find_dead_code`
`search.rs`	360	`FtsResult`, `NameCandidate`, `fts5_search`, fuzzy + Levenshtein
`nodes.rs`	814	`NodeRecord`/`Result`/`WithFile`, `NameEntry`, node CRUD + filters

NODE_SELECT, NODE_SELECT_ALIASED, and map_node_row live with NodeResult in nodes.rs (avoids a helpers ↔ nodes cross-import cycle); levenshtein stays private to search.rs, dir_of to project_map.rs, map_incoming_ref to edges.rs.

Invariants preserved

feedback_ast_search_ranking — get_nodes_with_files_by_filters keeps caller_count DESC, f.path ASC, n.start_line ASC ordering; regression test stays green.
feedback_test_filter_propagation — get_module_exports keeps n.is_test = 0 filter on both phase 1 and fallback paths.
feedback_recursive_cte_parent_tracking — get_import_tree recursive CTE structure unchanged.
feedback_match_confidence_heuristics, feedback_chars_per_token_is_bytes, feedback_edge_resolution_same_language — out of scope, untouched.

Test plan

cargo +1.95.0 clippy --all-targets clean (default + --no-default-features)
cargo test --lib — 292 passed, 0 failed (default + --no-default-features)
cargo test — 484 total passed across all 8 binaries (lib + 7 integration), 0 failed
All 21 redistributed query tests pass under their new module paths
git diff --stat: 12 files, +3018 / -2892 (overhead = per-file imports + pub use block)

Out of scope

Two more giants flagged in the prior session — src/parser/relations.rs (2783 lines / 125KB) and src/indexer/pipeline.rs (2374 lines / 105KB) — get their own session + PR each (per the original plan).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added dead code detection for identifying unused functions and symbols.
- Enabled full-text search for discovering code entities.
- Introduced vector-based semantic search for enhanced code understanding.
- Added route and dependency mapping for visualizing module relationships and entry points.
Refactor
- Reorganized database query layer for improved performance and maintainability.

Mechanical split of the 2892-line src/storage/queries.rs into src/storage/queries/{helpers,files,nodes,edges,vectors,search,routes, imports,project_map,dead_code}.rs. Boundaries follow the existing `// --- X ---` section markers in the source. Tests redistributed per target module (21 tests; test_db harness moved to helpers.rs as pub(super)). queries/mod.rs flat-re-exports every public type and fn so the 33 existing `crate::storage::queries::X` import sites in indexer/, mcp/, graph/, and tests/ keep compiling unchanged. Cross-module utilities: - helpers.rs: MAX_IN_PARAMS, first_row, make_placeholders, test_db - nodes.rs: NODE_SELECT, NODE_SELECT_ALIASED, map_node_row (co-located with NodeResult to avoid cross-import cycles) - search.rs keeps levenshtein private; project_map keeps dir_of private; edges keeps map_incoming_ref private Verification: cargo +1.95.0 clippy --all-targets clean both feature sets; 292 lib tests + 192 integration tests pass (484 total, 0 failed) on no-default-features and default features. No SQL, no ordering, no public API changes — preserved invariants from feedback_ast_search_ranking (caller_count DESC), feedback_test_filter_propagation (n.is_test=0), feedback_recursive_cte_parent_tracking (import_tree depth handling). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-09T20:26:27Z

📝 Walkthrough

Walkthrough

This PR adds a complete SQLite query layer organizing graph database operations across eleven domain-specific modules. It introduces CRUD helpers for files, nodes, and edges; implements recursive file-dependency traversal; adds dead-code detection, route matching, project analytics, FTS5 search, and vector similarity operations; and re-exports all public APIs through a flat module interface.

Changes

SQLite Query Layer for Graph Database

Layer / File(s)	Summary
Utility Constants & Helpers `src/storage/queries/helpers.rs`	`MAX_IN_PARAMS=500`, `first_row()` iterator mapper, `make_placeholders()` for SQL generation, and `test_db()` factory.
File Metadata CRUD `src/storage/queries/files.rs`	`IndexStatus` aggregates file/node/edge counts; `upsert_file()` inserts-or-updates by path; `delete_files_by_paths()` batches deletions; hash/path/language lookups.
Node Data Model & Insertion `src/storage/queries/nodes.rs` (partial)	`NodeRecord` for inserts, `NodeResult` for queries, `NodeWithFile` for file joins; `insert_node()` and `insert_node_cached()` variants.
Edge Data Model & CRUD `src/storage/queries/edges.rs` (partial)	`EdgeRecord`, `PendingCallRow`, `IncomingReference` types; `insert_edge()` with ignore-duplicate semantics; pending-call buffering.
Node Queries & Context `src/storage/queries/nodes.rs` (continued)	Lookups by name/id/file; `get_inbound_cross_file_edges()`, `get_dirty_node_ids()` for incremental indexing; context-string updates (single/batch); missing-context detection.
Edge Traversal & Resolution `src/storage/queries/edges.rs` (continued)	Edge fetching by node; target/source name resolution (single/batch); `get_edges_batch()` with direction annotation; file path joins; incoming reference queries.
File-Level Dependencies `src/storage/queries/imports.rs`	Recursive CTEs for outgoing/incoming import/call trees; cycle prevention; symbol-count and depth computation.
Dead Code Detection `src/storage/queries/dead_code.rs`	Identifies nodes without incoming call/import/inherit/implement edges; excludes routes-to self-edges and patterns (module, main); supports test/path/type/min-lines filters; computes `has_export_edge`.
Routes & Module Exports `src/storage/queries/routes.rs`	HTTP route path extraction and prefix matching; caller route-info batching; module export discovery via explicit edges or fallback symbol collection.
Project Analytics `src/storage/queries/project_map.rs`	Per-directory stats (file/function/class counts, languages, key symbols); cross-directory import edges; HTTP and main entry points; hot functions (production vs test caller counts).
Full-Text & Fuzzy Search `src/storage/queries/search.rs`	FTS5 AND-first with OR fallback; BM25 scoring; query preprocessing (stop-words, tokenization, acronym expansion); fuzzy name matching via LIKE + Levenshtein distance.
Vector Embeddings `src/storage/queries/vectors.rs`	Single and batch node embedding insertion; vector similarity search; unembedded node prioritization by edge count; embedding statistics.
Module Re-exports `src/storage/queries/mod.rs`	Centralizes re-export of all public functions/types across modules; preserves flat `crate::storage::queries::X` import paths; conditional test-only re-exports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

sdsrss/code-graph-mcp#6: Adds inbound cross-file edge preservation/restoration in indexer—directly related to the get_inbound_cross_file_edges and get_inbound_calls_for_pending functions introduced here.

Poem

🐰 A rabbit hops through queries deep,
SQL dreams and data to keep,
Nodes and edges dance in rows,
Dead code found where no one goes,
Vector dreams in embedding's glow! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately describes the main change: refactoring a large monolithic queries.rs file into a well-organized 10-module per-domain tree structure while maintaining backward compatibility through flat re-exports.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/split-queries-rs

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

src/storage/queries/nodes.rs (1)
395-406: ⚡ Quick win

Parameterize the caller-count subquery instead of interpolating REL_CALLS.

The relation is baked into the SQL text here, which violates the repo's parameterized-query rule and prevents this statement from being reused as-is by SQLite. Bind it as another positional parameter next to limit.
Suggested fix
     let sql = format!(
         "SELECT {cols}, f.path, f.language \
          FROM nodes n JOIN files f ON f.id = n.file_id{where_clause} \
-         ORDER BY (SELECT COUNT(*) FROM edges e WHERE e.target_id = n.id AND e.relation = '{rel}') DESC, \
+         ORDER BY (SELECT COUNT(*) FROM edges e WHERE e.target_id = n.id AND e.relation = ?{rel_idx}) DESC, \
                   f.path ASC, n.start_line ASC \
          LIMIT ?{limit_idx}",
         cols = NODE_SELECT_ALIASED,
         where_clause = where_clause,
-        rel = REL_CALLS,
-        limit_idx = params.len() + 1,
+        rel_idx = params.len() + 1,
+        limit_idx = params.len() + 2,
     );
+    params.push(Box::new(REL_CALLS.to_string()));
     params.push(Box::new(limit as i64));
As per coding guidelines, "Database queries must be parameterized — use parameterized queries in src/storage/queries.rs instead of string concatenation".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/nodes.rs` around lines 395 - 406, The SQL currently in
variable sql interpolates REL_CALLS into the query string (violating
parameterization); change the relation literal to a positional parameter
(replace '{rel}' with '?'), update limit_idx to params.len() + 2 (since you'll
add one more param before the limit), then push the relation value
(Box::new(REL_CALLS)) into params before pushing the limit (Box::new(limit as
i64)); keep NODE_SELECT_ALIASED, where_clause and the same ordering but bind the
relation as a parameter instead of string interpolation.
src/storage/queries/search.rs (1)
159-172: 💤 Low value

Consider documenting the LIMIT 5000 tradeoff in Phase 2 fallback.

The edit-distance fallback fetches up to 5000 nodes for client-side Levenshtein filtering. This is a reasonable limit to prevent unbounded scans, but it means fuzzy matching may miss valid results in very large databases. Consider adding a comment explaining this design choice.

Note: This is likely an acceptable tradeoff for a fallback mechanism that only activates when the fast LIKE-based phase returns no results.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/search.rs` around lines 159 - 172, Add a clarifying
comment above the edit-distance fallback SQL (the sql2 variable / "Phase 2"
block) documenting the tradeoff of LIMIT 5000: state that we intentionally cap
the rows retrieved to 5000 to avoid expensive/unbounded scans and explain that
this may miss matches in very large DBs but is acceptable because this is a
fallback after the fast LIKE-based phase; mention that client-side Levenshtein
filtering is applied to these results and that the limit balances performance vs
recall.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/storage/queries/nodes.rs`:
- Around line 197-217: The query currently drops any row that produced an error
by using filter_map(Result::ok) and hardcodes the relation string; change it to
bind the relation via REL_CALLS and propagate DB errors instead of silencing
them: prepare the statement to use a parameter for relation (pass
domain::REL_CALLS as the second bind), keep the same query_map closure, then
replace the filter_map/collect pipeline with collecting the iterator into a
Result<Vec<_>, _> (e.g., rows.collect::<Result<Vec<_>, _>>()?) to propagate any
row errors, and after successful collection filter out entries with empty lang
before returning; reference symbols: stmt/query_map/rows, REL_CALLS from
src/domain.rs, and the file_id bind.

In `@src/storage/queries/routes.rs`:
- Line 17: The function find_routes_by_path currently accepts an arbitrary
relation string; change it to use the canonical relation constant from
src/domain.rs (e.g., REL_ROUTES_TO) instead of a caller-provided string: remove
the relation parameter from the signature of find_routes_by_path and reference
the REL_ROUTES_TO constant inside the function body (and update all callers to
stop passing a relation). Ensure you import REL_ROUTES_TO from src::domain where
find_routes_by_path is defined so the relation is no longer a free-form string.

In `@src/storage/queries/vectors.rs`:
- Around line 15-31: Wrap the delete+insert loop in a rusqlite
transaction/savepoint so the whole batch is atomic: begin a Transaction (e.g.,
let tx = conn.transaction()?), prepare the DELETE and INSERT statements on the
transaction (use tx.prepare_cached(...) / tx.execute(...)), run the loop against
tx so any error will cause tx.rollback() and prevent partial application, then
tx.commit() at the end; update insert_node_vectors_batch to use the
transaction-bound statements (del_stmt, ins_stmt) and return any commit error.
- Around line 80-88: In count_nodes_with_vectors, don't collapse every
node_vectors query error to 0; replace the unwrap_or(0) call with error handling
that matches the rusqlite error from querying "SELECT COUNT(*) FROM
node_vectors" and only treat the specific "no such table" (missing node_vectors)
case as a fallback to 0, while returning Err(e) for any other sqlite or schema
errors so they surface; locate the query in count_nodes_with_vectors and update
its error handling to inspect the rusqlite::Error (e.g., pattern-match
SqliteFailure or check the error message for "no such table") and propagate
non-missing-table errors.

---

Nitpick comments:
In `@src/storage/queries/nodes.rs`:
- Around line 395-406: The SQL currently in variable sql interpolates REL_CALLS
into the query string (violating parameterization); change the relation literal
to a positional parameter (replace '{rel}' with '?'), update limit_idx to
params.len() + 2 (since you'll add one more param before the limit), then push
the relation value (Box::new(REL_CALLS)) into params before pushing the limit
(Box::new(limit as i64)); keep NODE_SELECT_ALIASED, where_clause and the same
ordering but bind the relation as a parameter instead of string interpolation.

In `@src/storage/queries/search.rs`:
- Around line 159-172: Add a clarifying comment above the edit-distance fallback
SQL (the sql2 variable / "Phase 2" block) documenting the tradeoff of LIMIT
5000: state that we intentionally cap the rows retrieved to 5000 to avoid
expensive/unbounded scans and explain that this may miss matches in very large
DBs but is acceptable because this is a fallback after the fast LIKE-based
phase; mention that client-side Levenshtein filtering is applied to these
results and that the limit balances performance vs recall.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e373ed61-0707-4aba-89ea-c725ba224292

📥 Commits

Reviewing files that changed from the base of the PR and between 67fcc0a and da49abd.

📒 Files selected for processing (12)

src/storage/queries.rs
src/storage/queries/dead_code.rs
src/storage/queries/edges.rs
src/storage/queries/files.rs
src/storage/queries/helpers.rs
src/storage/queries/imports.rs
src/storage/queries/mod.rs
src/storage/queries/nodes.rs
src/storage/queries/project_map.rs
src/storage/queries/routes.rs
src/storage/queries/search.rs
src/storage/queries/vectors.rs

coderabbitai · 2026-05-09T20:32:08Z

+    let mut stmt = conn.prepare_cached(
+        "SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata
+         FROM edges e
+         JOIN nodes nt ON nt.id = e.target_id
+         JOIN nodes ns ON ns.id = e.source_id
+         JOIN files fs ON fs.id = ns.file_id
+         WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls'
+           AND fs.language IS NOT NULL"
+    )?;
+    let rows = stmt.query_map([file_id], |row| {
+        Ok((
+            row.get::<_, i64>(0)?,
+            row.get::<_, String>(1)?,
+            row.get::<_, String>(2)?,
+            row.get::<_, Option<String>>(3)?,
+        ))
+    })?;
+    rows.filter_map(Result::ok)
+        .filter(|(_, _, lang, _)| !lang.is_empty())
+        .map(Ok)
+        .collect()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate row errors instead of dropping pending calls.

Line 214's filter_map(Result::ok) turns any query_map failure into a silently missing pending-call row, so cross-file callers can lose re-resolution data after cascade deletes. While touching this query, please bind REL_CALLS instead of hardcoding 'calls'.

Suggested fix

+ use crate::domain::REL_CALLS; + let mut stmt = conn.prepare_cached( "SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata FROM edges e JOIN nodes nt ON nt.id = e.target_id JOIN nodes ns ON ns.id = e.source_id JOIN files fs ON fs.id = ns.file_id - WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls' + WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = ?2 AND fs.language IS NOT NULL" )?; - let rows = stmt.query_map([file_id], |row| { + let rows = stmt.query_map(rusqlite::params![file_id, REL_CALLS], |row| { Ok(( row.get::<_, i64>(0)?, row.get::<_, String>(1)?, row.get::<_, String>(2)?, row.get::<_, Option<String>>(3)?, )) })?; - rows.filter_map(Result::ok) - .filter(|(_, _, lang, _)| !lang.is_empty()) - .map(Ok) - .collect() + let mut pending = Vec::new(); + for row in rows { + let row = row?; + if !row.2.is_empty() { + pending.push(row); + } + } + Ok(pending)

As per coding guidelines, "Use relation type constants (e.g., REL_CALLS) from src/domain.rs instead of hardcoding relation type strings".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let mut stmt = conn.prepare_cached(

"SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata

FROM edges e

JOIN nodes nt ON nt.id = e.target_id

JOIN nodes ns ON ns.id = e.source_id

JOIN files fs ON fs.id = ns.file_id

WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls'

AND fs.language IS NOT NULL"

)?;

let rows = stmt.query_map([file_id], |row| {

Ok((

row.get::<_, i64>(0)?,

row.get::<_, String>(1)?,

row.get::<_, String>(2)?,

row.get::<_, Option<String>>(3)?,

))

})?;

rows.filter_map(Result::ok)

.filter(|(_, _, lang, _)| !lang.is_empty())

.map(Ok)

.collect()

use crate::domain::REL_CALLS;

let mut stmt = conn.prepare_cached(

"SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata

FROM edges e

JOIN nodes nt ON nt.id = e.target_id

JOIN nodes ns ON ns.id = e.source_id

JOIN files fs ON fs.id = ns.file_id

WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = ?2

AND fs.language IS NOT NULL"

)?;

let rows = stmt.query_map(rusqlite::params![file_id, REL_CALLS], |row| {

Ok((

row.get::<_, i64>(0)?,

row.get::<_, String>(1)?,

row.get::<_, String>(2)?,

row.get::<_, Option<String>>(3)?,

))

})?;

let mut pending = Vec::new();

for row in rows {

let row = row?;

if !row.2.is_empty() {

pending.push(row);

}

}

Ok(pending)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/storage/queries/nodes.rs` around lines 197 - 217, The query currently drops any row that produced an error by using filter_map(Result::ok) and hardcodes the relation string; change it to bind the relation via REL_CALLS and propagate DB errors instead of silencing them: prepare the statement to use a parameter for relation (pass domain::REL_CALLS as the second bind), keep the same query_map closure, then replace the filter_map/collect pipeline with collecting the iterator into a Result<Vec<_>, _> (e.g., rows.collect::<Result<Vec<_>, _>>()?) to propagate any row errors, and after successful collection filter out entries with empty lang before returning; reference symbols: stmt/query_map/rows, REL_CALLS from src/domain.rs, and the file_id bind.

coderabbitai · 2026-05-09T20:32:08Z

+    pub end_line: i64,
+}
+
+pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use relation type constant instead of accepting string parameter.

The relation parameter should use a constant from src/domain.rs (e.g., REL_ROUTES_TO) rather than accepting an arbitrary string. This prevents typos and ensures consistency across the codebase.

🔧 Suggested fix

Consider changing the signature to accept a specific relation constant or document which constants should be used:

-pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> { +/// Find routes by path. Typically called with `crate::domain::REL_ROUTES_TO`. +pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {

Or more strictly:

+use crate::domain::REL_ROUTES_TO; + -pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> { +pub fn find_routes_by_path(conn: &Connection, route_path: &str) -> Result<Vec<RouteMatch>> { + let relation = REL_ROUTES_TO;

As per coding guidelines: "Use relation type constants (e.g., REL_CALLS) from src/domain.rs instead of hardcoding relation type strings."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {

use crate::domain::REL_ROUTES_TO;

pub fn find_routes_by_path(conn: &Connection, route_path: &str) -> Result<Vec<RouteMatch>> {

let relation = REL_ROUTES_TO;

// ... rest of function body remains unchanged

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/storage/queries/routes.rs` at line 17, The function find_routes_by_path currently accepts an arbitrary relation string; change it to use the canonical relation constant from src/domain.rs (e.g., REL_ROUTES_TO) instead of a caller-provided string: remove the relation parameter from the signature of find_routes_by_path and reference the REL_ROUTES_TO constant inside the function body (and update all callers to stop passing a relation). Ensure you import REL_ROUTES_TO from src::domain where find_routes_by_path is defined so the relation is no longer a free-form string.

coderabbitai · 2026-05-09T20:32:08Z

+pub fn insert_node_vectors_batch(conn: &Connection, vectors: &[(i64, Vec<f32>)]) -> Result<()> {
+    if vectors.is_empty() {
+        return Ok(());
+    }
+    // vec0 virtual tables do not support INSERT OR REPLACE, so delete first.
+    let mut del_stmt = conn.prepare_cached(
+        "DELETE FROM node_vectors WHERE node_id = ?1"
+    )?;
+    let mut ins_stmt = conn.prepare_cached(
+        "INSERT INTO node_vectors(node_id, embedding) VALUES (?1, ?2)"
+    )?;
+    for (node_id, embedding) in vectors {
+        let bytes: &[u8] = bytemuck::cast_slice(embedding);
+        del_stmt.execute(rusqlite::params![node_id])?;
+        ins_stmt.execute(rusqlite::params![node_id, bytes])?;
+    }
+    Ok(())


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make the batch delete+insert sequence atomic.

Lines 19-29 delete the old vector before inserting the replacement. If any insert fails mid-loop, that node is left without an embedding, and the batch exits in a partially applied state. This helper should enforce atomicity itself with a transaction/savepoint, or accept a transaction-bound connection type instead of relying on the caller comment.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/storage/queries/vectors.rs` around lines 15 - 31, Wrap the delete+insert loop in a rusqlite transaction/savepoint so the whole batch is atomic: begin a Transaction (e.g., let tx = conn.transaction()?), prepare the DELETE and INSERT statements on the transaction (use tx.prepare_cached(...) / tx.execute(...)), run the loop against tx so any error will cause tx.rollback() and prevent partial application, then tx.commit() at the end; update insert_node_vectors_batch to use the transaction-bound statements (del_stmt, ins_stmt) and return any commit error.

coderabbitai · 2026-05-09T20:32:08Z

+pub fn count_nodes_with_vectors(conn: &Connection) -> Result<(i64, i64)> {
+    let total: i64 = conn.query_row(
+        "SELECT COUNT(*) FROM nodes WHERE context_string IS NOT NULL", [], |r| r.get(0)
+    )?;
+    // node_vectors table may not exist when embed-model feature is disabled; return 0 in that case
+    let with_vectors: i64 = conn.query_row(
+        "SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0)
+    ).unwrap_or(0);
+    Ok((with_vectors, total))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't collapse every node_vectors query failure into 0.

Line 87's unwrap_or(0) hides corrupted-schema and database errors behind a fake "no vectors yet" result. Only the missing-table case should fall back to 0; everything else should still surface as an error.

Suggested fix

- let with_vectors: i64 = conn.query_row( - "SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0) - ).unwrap_or(0); + let with_vectors: i64 = match conn.query_row( + "SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0) + ) { + Ok(count) => count, + Err(err) if err.to_string().contains("no such table: node_vectors") => 0, + Err(err) => return Err(err.into()), + };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/storage/queries/vectors.rs` around lines 80 - 88, In count_nodes_with_vectors, don't collapse every node_vectors query error to 0; replace the unwrap_or(0) call with error handling that matches the rusqlite error from querying "SELECT COUNT(*) FROM node_vectors" and only treat the specific "no such table" (missing node_vectors) case as a fallback to 0, while returning Err(e) for any other sqlite or schema errors so they surface; locate the query in count_nodes_with_vectors and update its error handling to inspect the rusqlite::Error (e.g., pattern-match SqliteFailure or check the error message for "no such table") and propagate non-missing-table errors.

…/ pipeline) Pure refactor release — three biggest source files (8049 lines as monoliths) decomposed into 26 per-concern submodules. Zero behavior change, public surface preserved across all three splits. - src/storage/queries.rs (2892 lines) → src/storage/queries/ (10 files, #15) - src/parser/relations.rs (2783 lines) → src/parser/relations/ (9 files, #16) - src/indexer/pipeline.rs (2374 lines) → src/indexer/pipeline/ (7 files, #17) External callers in cli.rs / mcp/server / tests / benches / claude-plugin keep their imports unchanged. Two orchestrator-style functions kept whole deliberately (walk_for_relations ~650 lines, index_files ~770 lines) — their arms/phases share local state that splitting would either duplicate or thread back via large arg lists. Verification: cargo check + cargo +1.95.0 clippy (no-default-features + all-targets, -D warnings) + cargo test --release (292 lib + 129 integration = 421 tests, 0 failed). Pre-merge CI green on all three PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

sdsrss merged commit 657a1f9 into main May 9, 2026
9 checks passed

sdsrss deleted the refactor/split-queries-rs branch May 9, 2026 20:35

This was referenced May 9, 2026

refactor(parser): split relations.rs (2783 lines) into 9-module per-concern tree #16

Merged

refactor(indexer): split pipeline.rs (2374 lines) into 7-module per-concern tree #17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(storage): split queries.rs into 10-module per-domain tree#15

refactor(storage): split queries.rs into 10-module per-domain tree#15
sdsrss merged 1 commit into
mainfrom
refactor/split-queries-rs

sdsrss commented May 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {
+use crate::domain::REL_ROUTES_TO;
+pub fn find_routes_by_path(conn: &Connection, route_path: &str) -> Result<Vec<RouteMatch>> {
+    let relation = REL_ROUTES_TO;
+    // ... rest of function body remains unchanged

Conversation

sdsrss commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Layout

Invariants preserved

Test plan

Out of scope

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sdsrss commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading