-
Notifications
You must be signed in to change notification settings - Fork 12
Add SQLite-based indexing and search tools (RLM Phase 1) #87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
84c18b4 to
4329ed9
Compare
|
PR for phase 1: #87 |
4329ed9 to
6da0716
Compare
|
@cursoragent please /review |
|
Now let me create a comprehensive code review document. 🔬 Code Review - RLM Phase 1 SQLite FoundationExecutive SummaryThis PR successfully implements a solid SQLite-based indexing and search foundation with comprehensive test coverage (103 new tests, 407 total passing). The code demonstrates strong functional programming practices, good security awareness, and excellent documentation. All tests, linting, and type checking pass. Recommendation: Approve with minor improvements suggested below. ✅ Strengths1. Excellent Architecture & Code Quality
2. Comprehensive Test Coverage
3. Security - Generally Strong
4. Performance Optimization
5. Documentation
|
| Vulnerability | Status | Notes |
|---|---|---|
| A01: Broken Access Control | ✅ Pass | File system access only; OS handles permissions |
| A02: Cryptographic Failures | ✅ Pass | Uses SHA3-256 for integrity hashing |
| A03: Injection | SQL injection risk in metadata.js (see HIGH PRIORITY #1) |
|
| A04: Insecure Design | ✅ Pass | Sound architecture with proper abstractions |
| A05: Security Misconfiguration | ✅ Pass | WAL mode, foreign keys, appropriate defaults |
| A06: Vulnerable Components | Run npm audit fix |
|
| A07: Authentication Failure | N/A | Not applicable to this module |
| A08: Data Integrity Failures | No schema migration strategy documented | |
| A09: Logging Failures | Silent error swallowing in some places | |
| A10: SSRF | ✅ Pass | No network requests; file traversal uses SKIP_DIRECTORIES |
📊 Test Quality Assessment
Coverage: Excellent
- All major functions tested
- Edge cases covered (empty results, limits, filters)
- Error conditions tested (invalid syntax, missing files)
- Integration tests for CLI tools would be beneficial but not critical
Test Quality: High
- All tests follow riteway/5-questions format
- Proper setup/teardown with
onTestFinished - No shared mutable state between tests
- Descriptive assertions with clear
given/should/actual/expected
Minor Issue: Some tests could be more specific:
// ai/tools/search/fts5.test.js:92
actual: results.length >= 1,
expected: true,Better:
actual: results.length > 0,
expected: true,🎯 Adherence to Project Standards
JavaScript Style (javascript.mdc) ✅
- Functional programming throughout
- No classes used
- Arrow functions, destructuring, spread operators used appropriately
- Pure functions with clear inputs/outputs
- Good use of
map,filter,reduce - Named exports only
TDD (tdd.mdc) ✅
- All tests colocated with implementation
- Tests answer the 5 questions
- Proper isolation and cleanup
- Good use of factories for test data
Code Comments ✅
- Docblocks on public APIs
- Comments add value without being redundant
- No style guide repetition
📦 Package.json & Dependencies
New Dependencies Added:
better-sqlite3- Well-maintained, good choicechalk@4.1.2- Correct ESM-compatible versioncommander@11.1.0- Standard CLI librarygray-matter- Standard frontmatter parserjs-sha3- Good choice per security guidelines
Exports: ✅ Correctly adds "./tools" export
Scripts: ✅ New npm scripts work correctly
Issue: No types exported for ./tools
📝 Documentation Quality
README.md: ✅ Excellent
- Clear quick start guide
- Examples for all CLI commands
- Options tables well-formatted
- Architecture diagram included
ai/tools/README.md: ✅ Comprehensive
- Quick start section
- All CLI commands documented
- Programmatic API examples
- Database schema documented
- Performance metrics included
Code Comments: ✅ Good
- JSDoc on all public functions
- Complex logic explained
- TODO comment should be converted to issue
🏗️ Architecture & Design
Strengths:
- Clean layering: db → indexers/search/graph → CLI
- Single responsibility principle followed
- Good abstraction boundaries
- Extensible design (easy to add new search strategies)
- Proper use of SQLite features (FTS5, CTEs, transactions)
Considerations:
- Fan-out search aggregation could be its own module
- Consider extracting SQL builders for complex queries
- CLI tools have some duplication (formatters)
🚀 Performance Considerations
Optimizations:
- ✅ WAL mode for concurrency
- ✅ Indexes on frequently queried columns
- ✅ Incremental indexing
- ✅ FTS5 for fast full-text search
- ✅ Recursive CTEs for graph traversal
Potential Issues:
- Large codebases: No pagination on
indexDirectory - Deep graphs: maxDepth=3 is reasonable default
- FTS5 queries: No query cost limit or timeout
Metrics (from README):
- Indexing: ~68 files in ~50ms ✅
- Search: <50ms ✅
- Graph: <100ms for depth 5 ✅
🔄 Git & Commit Quality
Commit: Single logical commit ✅
Add SQLite-based indexing and search tools (RLM Phase 1)
Follows conventional commit format and is descriptive.
Branch: feature/rlm-phase-1-sqlite-foundation ✅ Clear naming
Files Changed: 35 files, +4,817/-22 lines
Git Attributes: ✅ Correctly configured for SQLite binary diffs
📋 Checklist: Requirements Met
Based on PR description:
- ✅ SQLite database foundation with FTS5
- ✅ Frontmatter indexer with hash-based updates
- ✅ Dependency indexer for imports/requires
- ✅ Fan-out search with relevance scoring
- ✅ Recursive graph traversal with CTEs
- ✅ CLI tools:
index-cli,query-cli,find-related-cli - ✅ 103 new tests (407 total)
- ✅ All tests pass
- ✅ CLI tools manually tested
- ✅ Incremental indexing detects changes
- ✅ FTS5 returns relevant results
- ✅ Graph traversal works correctly
Deferred Items (as designed):
- ProductManager tables → Phase 4
- Semantic search → Stubbed for RAG integration
- Watch mode → Manual CLI + pre-commit hook
- Schema migrations → Deferred
🎬 Final Recommendations
Before Merge (Critical):
- ✅ Fix SQL injection in
metadata.js(see HIGH PRIORITY feat: Establish AIDD metaprograms system for Cursor #1) - ✅ Run
npm audit fixand resolve vulnerabilities - ✅ Add TypeScript definitions or document why they're not needed
Post-Merge (Recommended):
- Add logging to silent error handlers
- Add input validation to CLI tools
- Convert TODO comment to GitHub issue
- Document schema migration strategy
- Consider integration tests for CLI tools
Future Enhancements:
- Add query cost limits for DoS protection
- Add pagination for large directory indexing
- Consider extracting SQL builders
- Add CLI command for schema version check
📊 Score Summary
| Category | Score | Notes |
|---|---|---|
| Code Quality | 9/10 | Excellent functional style |
| Test Coverage | 10/10 | Comprehensive, well-written |
| Security | 7/10 | Strong overall, SQL injection issue |
| Documentation | 9/10 | Clear and thorough |
| Performance | 9/10 | Well-optimized |
| Architecture | 9/10 | Clean, extensible design |
| Standards Adherence | 10/10 | Follows all project guidelines |
Overall: 9/10 - Excellent work with one critical security fix needed.
This is a high-quality implementation that demonstrates strong engineering practices. The SQL injection vulnerability in metadata search is the only blocker. Once addressed, this PR is ready to merge. Great work on the comprehensive test coverage and documentation! 🎉
|
@claude please /review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements Phase 1 of the RLM (Relevant Library Management) system by adding SQLite-based indexing and search tools for fast codebase exploration. The implementation provides full-text search using FTS5, metadata filtering, dependency graph traversal, and three CLI tools for indexing and querying.
Changes:
- Added SQLite database foundation with FTS5 full-text search, documents table, and dependencies tracking
- Implemented frontmatter and dependency indexers with hash-based incremental updates
- Added fan-out search combining FTS5, metadata, and semantic search strategies with relevance scoring
- Created CLI tools (index-cli, query-cli, find-related-cli) with npm script wrappers
- Added 103 new tests across database, indexing, search, and graph traversal modules
- Updated documentation with comprehensive usage examples and API reference
Reviewed changes
Copilot reviewed 32 out of 35 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| package.json | Added better-sqlite3 dependency, new npm scripts, and ./tools export |
| package-lock.json | Dependency lockfile updates for better-sqlite3 and related packages |
| ai/tools/db/connection.js | Database factory with WAL mode and foreign key support |
| ai/tools/db/schema.js | Schema definitions with FTS5 triggers and cascade deletes |
| ai/tools/indexers/frontmatter.js | YAML frontmatter parser with prototype pollution protection |
| ai/tools/indexers/dependencies.js | Import/require statement parser for dependency graph |
| ai/tools/search/fts5.js | Full-text search implementation with BM25 ranking |
| ai/tools/search/metadata.js | JSON field filtering for frontmatter queries ( |
| ai/tools/search/fan-out.js | Multi-strategy search orchestration with relevance scoring |
| ai/tools/graph/traverse.js | Recursive CTE-based dependency graph traversal |
| ai/tools/cli/*.js | CLI tools for indexing, querying, and finding related files |
| lib/index-generator.js | Updated to filter empty subdirectories ( |
| .gitignore | Added .aidd/ directory exclusion ( |
| .gitattributes | Added SQLite diff configuration |
| .claude/settings.local.json | Claude Code settings file ( |
| ai/tools/README.md | Comprehensive documentation for the new tools |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.claude/settings.local.json
Outdated
| "permissions": { | ||
| "allow": [ | ||
| "Bash(git reset:*)", | ||
| "Bash(gh pr create --title \"Replace ESLint and Prettier with Biome\" --body \"$\\(cat <<''EOF''\n## Summary\n- Remove ESLint, eslint-config-prettier, eslint-plugin-prettier, and Prettier\n- Add Biome v2.3.12 for linting and formatting\n- Update npm scripts \\(`lint`, `format`, `format:check`\\) to use Biome commands\n- Apply Biome formatting and lint fixes across codebase\n- Add `.claude/` to .gitignore\n\n## Configuration notes\n- Added override to allow `any` in TypeScript declaration files \\(`.d.ts`\\)\n- Disabled `useNodejsImportProtocol` rule for compatibility with existing imports\n\n## Test plan\n- [x] All unit tests pass \\(`npm run test:unit`\\)\n- [x] Lint passes \\(`npm run lint`\\)\n- [x] Type check passes \\(`npm run typecheck`\\)\n- [x] Pre-commit and pre-push hooks work correctly\n\n🤖 Generated with [Claude Code]\\(https://claude.com/claude-code\\)\nEOF\n\\)\")", | ||
| "Bash(git add:*)", | ||
| "Bash(git commit:*)", | ||
| "Bash(git push:*)", | ||
| "Bash(git checkout:*)", | ||
| "Bash(npm install)", | ||
| "Bash(npm run test:unit:*)", | ||
| "Bash(npx vitest:*)", | ||
| "Bash(node ai/tools/cli/index-cli.js:*)", | ||
| "Bash(node:*)", | ||
| "Bash(sqlite3:*)", | ||
| "Bash(gh pr create:*)", | ||
| "Bash(git rm:*)", | ||
| "Bash(ls:*)", | ||
| "Bash(grep:*)", | ||
| "Bash(npm test:*)", | ||
| "Bash(gh pr view:*)", | ||
| "Bash(gh pr edit:*)" |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file contains Claude-specific local settings including bash command permissions. This file should not be committed to the repository as it contains local development permissions and could expose security-sensitive command patterns. Add .claude/settings.local.json to .gitignore and document how developers should configure their local settings.
| "permissions": { | |
| "allow": [ | |
| "Bash(git reset:*)", | |
| "Bash(gh pr create --title \"Replace ESLint and Prettier with Biome\" --body \"$\\(cat <<''EOF''\n## Summary\n- Remove ESLint, eslint-config-prettier, eslint-plugin-prettier, and Prettier\n- Add Biome v2.3.12 for linting and formatting\n- Update npm scripts \\(`lint`, `format`, `format:check`\\) to use Biome commands\n- Apply Biome formatting and lint fixes across codebase\n- Add `.claude/` to .gitignore\n\n## Configuration notes\n- Added override to allow `any` in TypeScript declaration files \\(`.d.ts`\\)\n- Disabled `useNodejsImportProtocol` rule for compatibility with existing imports\n\n## Test plan\n- [x] All unit tests pass \\(`npm run test:unit`\\)\n- [x] Lint passes \\(`npm run lint`\\)\n- [x] Type check passes \\(`npm run typecheck`\\)\n- [x] Pre-commit and pre-push hooks work correctly\n\n🤖 Generated with [Claude Code]\\(https://claude.com/claude-code\\)\nEOF\n\\)\")", | |
| "Bash(git add:*)", | |
| "Bash(git commit:*)", | |
| "Bash(git push:*)", | |
| "Bash(git checkout:*)", | |
| "Bash(npm install)", | |
| "Bash(npm run test:unit:*)", | |
| "Bash(npx vitest:*)", | |
| "Bash(node ai/tools/cli/index-cli.js:*)", | |
| "Bash(node:*)", | |
| "Bash(sqlite3:*)", | |
| "Bash(gh pr create:*)", | |
| "Bash(git rm:*)", | |
| "Bash(ls:*)", | |
| "Bash(grep:*)", | |
| "Bash(npm test:*)", | |
| "Bash(gh pr view:*)", | |
| "Bash(gh pr edit:*)" | |
| "__comment": "This is a template for Claude local settings. Do NOT put real local bash permissions in version control.", | |
| "__instructions": "Create your own untracked .claude/settings.local.json (ignored via .gitignore) and copy this structure there, then add your actual permissions locally.", | |
| "permissions": { | |
| "allow": [ | |
| "Bash(<your-command-pattern-here>)" |
lib/index-generator.js
Outdated
| const hasIndexableContent = async (dirPath) => { | ||
| const files = await getIndexableFiles(dirPath); | ||
| const subdirs = await getSubdirectories(dirPath); | ||
| return files.length > 0 || subdirs.length > 0; | ||
| }; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hasIndexableContent function has a potential infinite recursion issue. It checks if subdirs.length > 0, but doesn't verify if those subdirectories actually have any markdown content. An empty subdirectory would make hasIndexableContent return true, causing an index to be created, which then references the empty subdirectory. This contradicts the intended behavior described in the test "excludes subdirectory references when subdirectory has no content".
Consider making hasIndexableContent recursive to check if subdirectories contain actual markdown files, not just that they exist.
| const jsonPath = key.replace("frontmatter.", ""); | ||
|
|
||
| if (typeof value === "object" && value !== null && "contains" in value) { | ||
| // Array contains check using JSON functions | ||
| conditions.push( | ||
| `EXISTS ( | ||
| SELECT 1 FROM json_each(json_extract(frontmatter, '$.${jsonPath}')) | ||
| WHERE value = ? | ||
| )`, | ||
| ); | ||
| params.push(value.contains); | ||
| } else if (typeof value === "boolean") { | ||
| // Boolean comparison | ||
| conditions.push(`json_extract(frontmatter, '$.${jsonPath}') = ?`); | ||
| params.push(value ? 1 : 0); | ||
| } else if (typeof value === "number") { | ||
| // Numeric comparison | ||
| conditions.push(`json_extract(frontmatter, '$.${jsonPath}') = ?`); | ||
| params.push(value); | ||
| } else { | ||
| // String comparison | ||
| conditions.push(`json_extract(frontmatter, '$.${jsonPath}') = ?`); | ||
| params.push(value); | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jsonPath variable is directly interpolated into SQL queries without sanitization or validation. This creates a SQL injection vulnerability. An attacker could pass malicious input like '); DROP TABLE documents; -- through the frontmatter filter key.
The getFieldValues function has the same issue where the field parameter is directly interpolated into the SQL query.
Consider using parameterized queries or validating/sanitizing the jsonPath to only allow alphanumeric characters and underscores.
| const getFieldValues = (db, field) => { | ||
| const sql = ` | ||
| SELECT DISTINCT json_extract(frontmatter, '$.${field}') as value | ||
| FROM documents | ||
| WHERE json_extract(frontmatter, '$.${field}') IS NOT NULL | ||
| ORDER BY value | ||
| `; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field parameter is directly interpolated into SQL queries without sanitization. This creates a SQL injection vulnerability when user input can influence the field name. Consider validating the field parameter to only allow safe characters (alphanumeric and underscores) or using a whitelist of allowed field names.
| "dependencies": { | ||
| "@paralleldrive/cuid2": "^3.1.0", | ||
| "@sinclair/typebox": "^0.34.41", | ||
| "better-sqlite3": "^11.8.1", |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installed version is 11.10.0 but package.json specifies ^11.8.1. While this is technically correct (the caret allows minor and patch updates), there's a discrepancy. Consider updating package.json to reflect the actual version being used, or ensuring the version constraint is intentional.
ai/tools/indexers/frontmatter.js
Outdated
| import fs from "fs-extra"; | ||
| import matter from "gray-matter"; | ||
| import sha3 from "js-sha3"; | ||
| import { errorCauses, createError } from "error-causes"; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import createError.
| import { errorCauses, createError } from "error-causes"; | |
| import { errorCauses } from "error-causes"; |
ai/tools/indexers/frontmatter.js
Outdated
|
|
||
| const { sha3_256 } = sha3; | ||
|
|
||
| const [{ IndexingError }] = errorCauses({ |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable IndexingError.
| const [{ IndexingError }] = errorCauses({ | |
| errorCauses({ |
cb497ad to
c512dcd
Compare
Implement the foundation for the Recursive Language Model (RLM) project: - SQLite database with FTS5 full-text search - Frontmatter indexer for .md/.mdc files with incremental updates - Dependency indexer for import/require statements - Fan-out search combining FTS5 and metadata strategies - Recursive graph traversal using SQLite CTEs - CLI tools: index-cli, query-cli, find-related-cli - 103 tests with comprehensive coverage New npm scripts: - npm run aidd:index - Full reindex with dependencies - npm run aidd:index:incremental - Update only changed files - npm run aidd:query "term" - Fan-out search - npm run aidd:find-related file - Dependency graph traversal Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
c512dcd to
08d6bf6
Compare
Summary
index-cli,query-cli,find-related-cliNew npm scripts
Review Fixes
Based on code review feedback, the following improvements were made:
Security (HIGH PRIORITY)
isValidJsonPath()validation inmetadata.jsto prevent injection through filter keysCode Quality (MEDIUM PRIORITY)
silentparameter tosearchFts5()andfanOutSearch()for error visibilitychar(0)(null byte) for robust path trackingai/tools/index.d.tswith full type coverageCleanup (LOW PRIORITY)
error-causesimports fromfrontmatter.jsDeferred by design
The following Phase 1 items were intentionally deferred based on implementation decisions:
--incremental) + pre-commit hook integrationTest plan
🤖 Generated with Claude Code