Skip to content

refactor(storage): split queries.rs into 10-module per-domain tree#15

Merged
sdsrss merged 1 commit into
mainfrom
refactor/split-queries-rs
May 9, 2026
Merged

refactor(storage): split queries.rs into 10-module per-domain tree#15
sdsrss merged 1 commit into
mainfrom
refactor/split-queries-rs

Conversation

@sdsrss
Copy link
Copy Markdown
Owner

@sdsrss sdsrss commented May 9, 2026

Summary

  • Mechanical split of src/storage/queries.rs (2892 lines) into a src/storage/queries/{helpers,files,nodes,edges,vectors,search,routes,imports,project_map,dead_code}.rs tree, following the existing // --- X --- section markers.
  • queries/mod.rs flat-re-exports every public type and function, so the 33 existing crate::storage::queries::X import sites compile unchanged.
  • Tests redistributed per target module (21 tests + test_db harness now pub(super) in helpers.rs).

Layout

File Lines Contents
helpers.rs 25 MAX_IN_PARAMS, first_row, make_placeholders, test_db
mod.rs 55 submodule declarations + flat pub use re-exports
files.rs 160 IndexStatus, FileRecord, file CRUD
imports.rs 161 FileDependency, get_import_tree (recursive CTE)
vectors.rs 175 vec ops + unembedded queries
project_map.rs 278 ModuleStats/Dep/EntryPoint/HotFunction, get_project_map
routes.rs 291 RouteMatch, CallerWithRouteInfo, ModuleExport
edges.rs 340 EdgeRecord, IncomingReference, PendingCallRow, edge CRUD + graph helpers
dead_code.rs 359 DeadCodeResult, find_dead_code
search.rs 360 FtsResult, NameCandidate, fts5_search, fuzzy + Levenshtein
nodes.rs 814 NodeRecord/Result/WithFile, NameEntry, node CRUD + filters

NODE_SELECT, NODE_SELECT_ALIASED, and map_node_row live with NodeResult in nodes.rs (avoids a helpers ↔ nodes cross-import cycle); levenshtein stays private to search.rs, dir_of to project_map.rs, map_incoming_ref to edges.rs.

Invariants preserved

  • feedback_ast_search_rankingget_nodes_with_files_by_filters keeps caller_count DESC, f.path ASC, n.start_line ASC ordering; regression test stays green.
  • feedback_test_filter_propagationget_module_exports keeps n.is_test = 0 filter on both phase 1 and fallback paths.
  • feedback_recursive_cte_parent_trackingget_import_tree recursive CTE structure unchanged.
  • feedback_match_confidence_heuristics, feedback_chars_per_token_is_bytes, feedback_edge_resolution_same_language — out of scope, untouched.

Test plan

  • cargo +1.95.0 clippy --all-targets clean (default + --no-default-features)
  • cargo test --lib — 292 passed, 0 failed (default + --no-default-features)
  • cargo test — 484 total passed across all 8 binaries (lib + 7 integration), 0 failed
  • All 21 redistributed query tests pass under their new module paths
  • git diff --stat: 12 files, +3018 / -2892 (overhead = per-file imports + pub use block)

Out of scope

Two more giants flagged in the prior session — src/parser/relations.rs (2783 lines / 125KB) and src/indexer/pipeline.rs (2374 lines / 105KB) — get their own session + PR each (per the original plan).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added dead code detection for identifying unused functions and symbols.
    • Enabled full-text search for discovering code entities.
    • Introduced vector-based semantic search for enhanced code understanding.
    • Added route and dependency mapping for visualizing module relationships and entry points.
  • Refactor

    • Reorganized database query layer for improved performance and maintainability.

Review Change Stack

Mechanical split of the 2892-line src/storage/queries.rs into
src/storage/queries/{helpers,files,nodes,edges,vectors,search,routes,
imports,project_map,dead_code}.rs. Boundaries follow the existing
`// --- X ---` section markers in the source. Tests redistributed per
target module (21 tests; test_db harness moved to helpers.rs as
pub(super)).

queries/mod.rs flat-re-exports every public type and fn so the 33
existing `crate::storage::queries::X` import sites in indexer/, mcp/,
graph/, and tests/ keep compiling unchanged.

Cross-module utilities:
- helpers.rs: MAX_IN_PARAMS, first_row, make_placeholders, test_db
- nodes.rs: NODE_SELECT, NODE_SELECT_ALIASED, map_node_row (co-located
  with NodeResult to avoid cross-import cycles)
- search.rs keeps levenshtein private; project_map keeps dir_of private;
  edges keeps map_incoming_ref private

Verification: cargo +1.95.0 clippy --all-targets clean both feature
sets; 292 lib tests + 192 integration tests pass (484 total, 0 failed)
on no-default-features and default features. No SQL, no ordering, no
public API changes — preserved invariants from feedback_ast_search_ranking
(caller_count DESC), feedback_test_filter_propagation (n.is_test=0),
feedback_recursive_cte_parent_tracking (import_tree depth handling).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

📝 Walkthrough

Walkthrough

This PR adds a complete SQLite query layer organizing graph database operations across eleven domain-specific modules. It introduces CRUD helpers for files, nodes, and edges; implements recursive file-dependency traversal; adds dead-code detection, route matching, project analytics, FTS5 search, and vector similarity operations; and re-exports all public APIs through a flat module interface.

Changes

SQLite Query Layer for Graph Database

Layer / File(s) Summary
Utility Constants & Helpers
src/storage/queries/helpers.rs
MAX_IN_PARAMS=500, first_row() iterator mapper, make_placeholders() for SQL generation, and test_db() factory.
File Metadata CRUD
src/storage/queries/files.rs
IndexStatus aggregates file/node/edge counts; upsert_file() inserts-or-updates by path; delete_files_by_paths() batches deletions; hash/path/language lookups.
Node Data Model & Insertion
src/storage/queries/nodes.rs (partial)
NodeRecord for inserts, NodeResult for queries, NodeWithFile for file joins; insert_node() and insert_node_cached() variants.
Edge Data Model & CRUD
src/storage/queries/edges.rs (partial)
EdgeRecord, PendingCallRow, IncomingReference types; insert_edge() with ignore-duplicate semantics; pending-call buffering.
Node Queries & Context
src/storage/queries/nodes.rs (continued)
Lookups by name/id/file; get_inbound_cross_file_edges(), get_dirty_node_ids() for incremental indexing; context-string updates (single/batch); missing-context detection.
Edge Traversal & Resolution
src/storage/queries/edges.rs (continued)
Edge fetching by node; target/source name resolution (single/batch); get_edges_batch() with direction annotation; file path joins; incoming reference queries.
File-Level Dependencies
src/storage/queries/imports.rs
Recursive CTEs for outgoing/incoming import/call trees; cycle prevention; symbol-count and depth computation.
Dead Code Detection
src/storage/queries/dead_code.rs
Identifies nodes without incoming call/import/inherit/implement edges; excludes routes-to self-edges and patterns (module, main); supports test/path/type/min-lines filters; computes has_export_edge.
Routes & Module Exports
src/storage/queries/routes.rs
HTTP route path extraction and prefix matching; caller route-info batching; module export discovery via explicit edges or fallback symbol collection.
Project Analytics
src/storage/queries/project_map.rs
Per-directory stats (file/function/class counts, languages, key symbols); cross-directory import edges; HTTP and main entry points; hot functions (production vs test caller counts).
Full-Text & Fuzzy Search
src/storage/queries/search.rs
FTS5 AND-first with OR fallback; BM25 scoring; query preprocessing (stop-words, tokenization, acronym expansion); fuzzy name matching via LIKE + Levenshtein distance.
Vector Embeddings
src/storage/queries/vectors.rs
Single and batch node embedding insertion; vector similarity search; unembedded node prioritization by edge count; embedding statistics.
Module Re-exports
src/storage/queries/mod.rs
Centralizes re-export of all public functions/types across modules; preserves flat crate::storage::queries::X import paths; conditional test-only re-exports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • sdsrss/code-graph-mcp#6: Adds inbound cross-file edge preservation/restoration in indexer—directly related to the get_inbound_cross_file_edges and get_inbound_calls_for_pending functions introduced here.

Poem

🐰 A rabbit hops through queries deep,
SQL dreams and data to keep,
Nodes and edges dance in rows,
Dead code found where no one goes,
Vector dreams in embedding's glow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: refactoring a large monolithic queries.rs file into a well-organized 10-module per-domain tree structure while maintaining backward compatibility through flat re-exports.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/split-queries-rs

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
src/storage/queries/nodes.rs (1)

395-406: ⚡ Quick win

Parameterize the caller-count subquery instead of interpolating REL_CALLS.

The relation is baked into the SQL text here, which violates the repo's parameterized-query rule and prevents this statement from being reused as-is by SQLite. Bind it as another positional parameter next to limit.

Suggested fix
     let sql = format!(
         "SELECT {cols}, f.path, f.language \
          FROM nodes n JOIN files f ON f.id = n.file_id{where_clause} \
-         ORDER BY (SELECT COUNT(*) FROM edges e WHERE e.target_id = n.id AND e.relation = '{rel}') DESC, \
+         ORDER BY (SELECT COUNT(*) FROM edges e WHERE e.target_id = n.id AND e.relation = ?{rel_idx}) DESC, \
                   f.path ASC, n.start_line ASC \
          LIMIT ?{limit_idx}",
         cols = NODE_SELECT_ALIASED,
         where_clause = where_clause,
-        rel = REL_CALLS,
-        limit_idx = params.len() + 1,
+        rel_idx = params.len() + 1,
+        limit_idx = params.len() + 2,
     );
+    params.push(Box::new(REL_CALLS.to_string()));
     params.push(Box::new(limit as i64));

As per coding guidelines, "Database queries must be parameterized — use parameterized queries in src/storage/queries.rs instead of string concatenation".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/nodes.rs` around lines 395 - 406, The SQL currently in
variable sql interpolates REL_CALLS into the query string (violating
parameterization); change the relation literal to a positional parameter
(replace '{rel}' with '?'), update limit_idx to params.len() + 2 (since you'll
add one more param before the limit), then push the relation value
(Box::new(REL_CALLS)) into params before pushing the limit (Box::new(limit as
i64)); keep NODE_SELECT_ALIASED, where_clause and the same ordering but bind the
relation as a parameter instead of string interpolation.
src/storage/queries/search.rs (1)

159-172: 💤 Low value

Consider documenting the LIMIT 5000 tradeoff in Phase 2 fallback.

The edit-distance fallback fetches up to 5000 nodes for client-side Levenshtein filtering. This is a reasonable limit to prevent unbounded scans, but it means fuzzy matching may miss valid results in very large databases. Consider adding a comment explaining this design choice.

Note: This is likely an acceptable tradeoff for a fallback mechanism that only activates when the fast LIKE-based phase returns no results.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/search.rs` around lines 159 - 172, Add a clarifying
comment above the edit-distance fallback SQL (the sql2 variable / "Phase 2"
block) documenting the tradeoff of LIMIT 5000: state that we intentionally cap
the rows retrieved to 5000 to avoid expensive/unbounded scans and explain that
this may miss matches in very large DBs but is acceptable because this is a
fallback after the fast LIKE-based phase; mention that client-side Levenshtein
filtering is applied to these results and that the limit balances performance vs
recall.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/storage/queries/nodes.rs`:
- Around line 197-217: The query currently drops any row that produced an error
by using filter_map(Result::ok) and hardcodes the relation string; change it to
bind the relation via REL_CALLS and propagate DB errors instead of silencing
them: prepare the statement to use a parameter for relation (pass
domain::REL_CALLS as the second bind), keep the same query_map closure, then
replace the filter_map/collect pipeline with collecting the iterator into a
Result<Vec<_>, _> (e.g., rows.collect::<Result<Vec<_>, _>>()?) to propagate any
row errors, and after successful collection filter out entries with empty lang
before returning; reference symbols: stmt/query_map/rows, REL_CALLS from
src/domain.rs, and the file_id bind.

In `@src/storage/queries/routes.rs`:
- Line 17: The function find_routes_by_path currently accepts an arbitrary
relation string; change it to use the canonical relation constant from
src/domain.rs (e.g., REL_ROUTES_TO) instead of a caller-provided string: remove
the relation parameter from the signature of find_routes_by_path and reference
the REL_ROUTES_TO constant inside the function body (and update all callers to
stop passing a relation). Ensure you import REL_ROUTES_TO from src::domain where
find_routes_by_path is defined so the relation is no longer a free-form string.

In `@src/storage/queries/vectors.rs`:
- Around line 15-31: Wrap the delete+insert loop in a rusqlite
transaction/savepoint so the whole batch is atomic: begin a Transaction (e.g.,
let tx = conn.transaction()?), prepare the DELETE and INSERT statements on the
transaction (use tx.prepare_cached(...) / tx.execute(...)), run the loop against
tx so any error will cause tx.rollback() and prevent partial application, then
tx.commit() at the end; update insert_node_vectors_batch to use the
transaction-bound statements (del_stmt, ins_stmt) and return any commit error.
- Around line 80-88: In count_nodes_with_vectors, don't collapse every
node_vectors query error to 0; replace the unwrap_or(0) call with error handling
that matches the rusqlite error from querying "SELECT COUNT(*) FROM
node_vectors" and only treat the specific "no such table" (missing node_vectors)
case as a fallback to 0, while returning Err(e) for any other sqlite or schema
errors so they surface; locate the query in count_nodes_with_vectors and update
its error handling to inspect the rusqlite::Error (e.g., pattern-match
SqliteFailure or check the error message for "no such table") and propagate
non-missing-table errors.

---

Nitpick comments:
In `@src/storage/queries/nodes.rs`:
- Around line 395-406: The SQL currently in variable sql interpolates REL_CALLS
into the query string (violating parameterization); change the relation literal
to a positional parameter (replace '{rel}' with '?'), update limit_idx to
params.len() + 2 (since you'll add one more param before the limit), then push
the relation value (Box::new(REL_CALLS)) into params before pushing the limit
(Box::new(limit as i64)); keep NODE_SELECT_ALIASED, where_clause and the same
ordering but bind the relation as a parameter instead of string interpolation.

In `@src/storage/queries/search.rs`:
- Around line 159-172: Add a clarifying comment above the edit-distance fallback
SQL (the sql2 variable / "Phase 2" block) documenting the tradeoff of LIMIT
5000: state that we intentionally cap the rows retrieved to 5000 to avoid
expensive/unbounded scans and explain that this may miss matches in very large
DBs but is acceptable because this is a fallback after the fast LIKE-based
phase; mention that client-side Levenshtein filtering is applied to these
results and that the limit balances performance vs recall.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e373ed61-0707-4aba-89ea-c725ba224292

📥 Commits

Reviewing files that changed from the base of the PR and between 67fcc0a and da49abd.

📒 Files selected for processing (12)
  • src/storage/queries.rs
  • src/storage/queries/dead_code.rs
  • src/storage/queries/edges.rs
  • src/storage/queries/files.rs
  • src/storage/queries/helpers.rs
  • src/storage/queries/imports.rs
  • src/storage/queries/mod.rs
  • src/storage/queries/nodes.rs
  • src/storage/queries/project_map.rs
  • src/storage/queries/routes.rs
  • src/storage/queries/search.rs
  • src/storage/queries/vectors.rs

Comment on lines +197 to +217
let mut stmt = conn.prepare_cached(
"SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata
FROM edges e
JOIN nodes nt ON nt.id = e.target_id
JOIN nodes ns ON ns.id = e.source_id
JOIN files fs ON fs.id = ns.file_id
WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls'
AND fs.language IS NOT NULL"
)?;
let rows = stmt.query_map([file_id], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, String>(1)?,
row.get::<_, String>(2)?,
row.get::<_, Option<String>>(3)?,
))
})?;
rows.filter_map(Result::ok)
.filter(|(_, _, lang, _)| !lang.is_empty())
.map(Ok)
.collect()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate row errors instead of dropping pending calls.

Line 214's filter_map(Result::ok) turns any query_map failure into a silently missing pending-call row, so cross-file callers can lose re-resolution data after cascade deletes. While touching this query, please bind REL_CALLS instead of hardcoding 'calls'.

Suggested fix
+    use crate::domain::REL_CALLS;
+
     let mut stmt = conn.prepare_cached(
         "SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata
          FROM edges e
          JOIN nodes nt ON nt.id = e.target_id
          JOIN nodes ns ON ns.id = e.source_id
          JOIN files fs ON fs.id = ns.file_id
-         WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls'
+         WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = ?2
            AND fs.language IS NOT NULL"
     )?;
-    let rows = stmt.query_map([file_id], |row| {
+    let rows = stmt.query_map(rusqlite::params![file_id, REL_CALLS], |row| {
         Ok((
             row.get::<_, i64>(0)?,
             row.get::<_, String>(1)?,
             row.get::<_, String>(2)?,
             row.get::<_, Option<String>>(3)?,
         ))
     })?;
-    rows.filter_map(Result::ok)
-        .filter(|(_, _, lang, _)| !lang.is_empty())
-        .map(Ok)
-        .collect()
+    let mut pending = Vec::new();
+    for row in rows {
+        let row = row?;
+        if !row.2.is_empty() {
+            pending.push(row);
+        }
+    }
+    Ok(pending)

As per coding guidelines, "Use relation type constants (e.g., REL_CALLS) from src/domain.rs instead of hardcoding relation type strings".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let mut stmt = conn.prepare_cached(
"SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata
FROM edges e
JOIN nodes nt ON nt.id = e.target_id
JOIN nodes ns ON ns.id = e.source_id
JOIN files fs ON fs.id = ns.file_id
WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = 'calls'
AND fs.language IS NOT NULL"
)?;
let rows = stmt.query_map([file_id], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, String>(1)?,
row.get::<_, String>(2)?,
row.get::<_, Option<String>>(3)?,
))
})?;
rows.filter_map(Result::ok)
.filter(|(_, _, lang, _)| !lang.is_empty())
.map(Ok)
.collect()
use crate::domain::REL_CALLS;
let mut stmt = conn.prepare_cached(
"SELECT e.source_id, nt.name, COALESCE(fs.language, ''), e.metadata
FROM edges e
JOIN nodes nt ON nt.id = e.target_id
JOIN nodes ns ON ns.id = e.source_id
JOIN files fs ON fs.id = ns.file_id
WHERE nt.file_id = ?1 AND ns.file_id != ?1 AND e.relation = ?2
AND fs.language IS NOT NULL"
)?;
let rows = stmt.query_map(rusqlite::params![file_id, REL_CALLS], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, String>(1)?,
row.get::<_, String>(2)?,
row.get::<_, Option<String>>(3)?,
))
})?;
let mut pending = Vec::new();
for row in rows {
let row = row?;
if !row.2.is_empty() {
pending.push(row);
}
}
Ok(pending)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/nodes.rs` around lines 197 - 217, The query currently
drops any row that produced an error by using filter_map(Result::ok) and
hardcodes the relation string; change it to bind the relation via REL_CALLS and
propagate DB errors instead of silencing them: prepare the statement to use a
parameter for relation (pass domain::REL_CALLS as the second bind), keep the
same query_map closure, then replace the filter_map/collect pipeline with
collecting the iterator into a Result<Vec<_>, _> (e.g.,
rows.collect::<Result<Vec<_>, _>>()?) to propagate any row errors, and after
successful collection filter out entries with empty lang before returning;
reference symbols: stmt/query_map/rows, REL_CALLS from src/domain.rs, and the
file_id bind.

pub end_line: i64,
}

pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use relation type constant instead of accepting string parameter.

The relation parameter should use a constant from src/domain.rs (e.g., REL_ROUTES_TO) rather than accepting an arbitrary string. This prevents typos and ensures consistency across the codebase.

🔧 Suggested fix

Consider changing the signature to accept a specific relation constant or document which constants should be used:

-pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {
+/// Find routes by path. Typically called with `crate::domain::REL_ROUTES_TO`.
+pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {

Or more strictly:

+use crate::domain::REL_ROUTES_TO;
+
-pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {
+pub fn find_routes_by_path(conn: &Connection, route_path: &str) -> Result<Vec<RouteMatch>> {
+    let relation = REL_ROUTES_TO;

As per coding guidelines: "Use relation type constants (e.g., REL_CALLS) from src/domain.rs instead of hardcoding relation type strings."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn find_routes_by_path(conn: &Connection, route_path: &str, relation: &str) -> Result<Vec<RouteMatch>> {
use crate::domain::REL_ROUTES_TO;
pub fn find_routes_by_path(conn: &Connection, route_path: &str) -> Result<Vec<RouteMatch>> {
let relation = REL_ROUTES_TO;
// ... rest of function body remains unchanged
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/routes.rs` at line 17, The function find_routes_by_path
currently accepts an arbitrary relation string; change it to use the canonical
relation constant from src/domain.rs (e.g., REL_ROUTES_TO) instead of a
caller-provided string: remove the relation parameter from the signature of
find_routes_by_path and reference the REL_ROUTES_TO constant inside the function
body (and update all callers to stop passing a relation). Ensure you import
REL_ROUTES_TO from src::domain where find_routes_by_path is defined so the
relation is no longer a free-form string.

Comment on lines +15 to +31
pub fn insert_node_vectors_batch(conn: &Connection, vectors: &[(i64, Vec<f32>)]) -> Result<()> {
if vectors.is_empty() {
return Ok(());
}
// vec0 virtual tables do not support INSERT OR REPLACE, so delete first.
let mut del_stmt = conn.prepare_cached(
"DELETE FROM node_vectors WHERE node_id = ?1"
)?;
let mut ins_stmt = conn.prepare_cached(
"INSERT INTO node_vectors(node_id, embedding) VALUES (?1, ?2)"
)?;
for (node_id, embedding) in vectors {
let bytes: &[u8] = bytemuck::cast_slice(embedding);
del_stmt.execute(rusqlite::params![node_id])?;
ins_stmt.execute(rusqlite::params![node_id, bytes])?;
}
Ok(())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make the batch delete+insert sequence atomic.

Lines 19-29 delete the old vector before inserting the replacement. If any insert fails mid-loop, that node is left without an embedding, and the batch exits in a partially applied state. This helper should enforce atomicity itself with a transaction/savepoint, or accept a transaction-bound connection type instead of relying on the caller comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/vectors.rs` around lines 15 - 31, Wrap the delete+insert
loop in a rusqlite transaction/savepoint so the whole batch is atomic: begin a
Transaction (e.g., let tx = conn.transaction()?), prepare the DELETE and INSERT
statements on the transaction (use tx.prepare_cached(...) / tx.execute(...)),
run the loop against tx so any error will cause tx.rollback() and prevent
partial application, then tx.commit() at the end; update
insert_node_vectors_batch to use the transaction-bound statements (del_stmt,
ins_stmt) and return any commit error.

Comment on lines +80 to +88
pub fn count_nodes_with_vectors(conn: &Connection) -> Result<(i64, i64)> {
let total: i64 = conn.query_row(
"SELECT COUNT(*) FROM nodes WHERE context_string IS NOT NULL", [], |r| r.get(0)
)?;
// node_vectors table may not exist when embed-model feature is disabled; return 0 in that case
let with_vectors: i64 = conn.query_row(
"SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0)
).unwrap_or(0);
Ok((with_vectors, total))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't collapse every node_vectors query failure into 0.

Line 87's unwrap_or(0) hides corrupted-schema and database errors behind a fake "no vectors yet" result. Only the missing-table case should fall back to 0; everything else should still surface as an error.

Suggested fix
-    let with_vectors: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0)
-    ).unwrap_or(0);
+    let with_vectors: i64 = match conn.query_row(
+        "SELECT COUNT(*) FROM node_vectors", [], |r| r.get(0)
+    ) {
+        Ok(count) => count,
+        Err(err) if err.to_string().contains("no such table: node_vectors") => 0,
+        Err(err) => return Err(err.into()),
+    };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/storage/queries/vectors.rs` around lines 80 - 88, In
count_nodes_with_vectors, don't collapse every node_vectors query error to 0;
replace the unwrap_or(0) call with error handling that matches the rusqlite
error from querying "SELECT COUNT(*) FROM node_vectors" and only treat the
specific "no such table" (missing node_vectors) case as a fallback to 0, while
returning Err(e) for any other sqlite or schema errors so they surface; locate
the query in count_nodes_with_vectors and update its error handling to inspect
the rusqlite::Error (e.g., pattern-match SqliteFailure or check the error
message for "no such table") and propagate non-missing-table errors.

@sdsrss sdsrss merged commit 657a1f9 into main May 9, 2026
9 checks passed
@sdsrss sdsrss deleted the refactor/split-queries-rs branch May 9, 2026 20:35
sdsrss added a commit that referenced this pull request May 9, 2026
…/ pipeline)

Pure refactor release — three biggest source files (8049 lines as monoliths)
decomposed into 26 per-concern submodules. Zero behavior change, public
surface preserved across all three splits.

- src/storage/queries.rs (2892 lines) → src/storage/queries/ (10 files, #15)
- src/parser/relations.rs (2783 lines) → src/parser/relations/ (9 files, #16)
- src/indexer/pipeline.rs (2374 lines) → src/indexer/pipeline/ (7 files, #17)

External callers in cli.rs / mcp/server / tests / benches / claude-plugin
keep their imports unchanged. Two orchestrator-style functions kept whole
deliberately (walk_for_relations ~650 lines, index_files ~770 lines) — their
arms/phases share local state that splitting would either duplicate or thread
back via large arg lists.

Verification: cargo check + cargo +1.95.0 clippy (no-default-features +
all-targets, -D warnings) + cargo test --release (292 lib + 129 integration
= 421 tests, 0 failed). Pre-merge CI green on all three PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant