Conversation
- Extract hardcoded Tier markers into dynamic detector Options - Redesign walkUp to retain highest-priority roots instead of returning eagerly - Refactor rag_index_workspace tool to necessitate a solid 'workspace_root' rather than 'file_path' - Introduce dual-stage AI validation via the 'confirm' tool schema parameter - Eradicate legacy hardcoded workspace paths inside test suite (avoid 'backend' assumptions) - Update documentation and server.json resolving legacy file_path instructions - Expose FindAlternativeCandidates securely via engine wrappers
- Extract hardcoded Tier markers into dynamic detector Options - Redesign walkUp to retain highest-priority roots instead of returning eagerly - Refactor rag_index_workspace tool to necessitate a solid 'workspace_root' rather than 'file_path' - Introduce dual-stage AI validation via the 'confirm' tool schema parameter - Eradicate legacy hardcoded workspace paths inside test suite (avoid 'backend' assumptions) - Update documentation and server.json resolving legacy file_path instructions - Expose FindAlternativeCandidates securely via engine wrappers - Introduce structural clean-ups in index_status (hasParentRagcode) and Registry logic (absorbChildren) to thwart fragmentation
There was a problem hiding this comment.
Code Review
This pull request implements a tiered workspace detection system and introduces a mandatory confirmation step for the rag_index_workspace tool to prevent accidental indexing of incorrect directories. It also adds logic to automatically "absorb" and clean up nested workspace registries when a parent workspace is registered. Feedback includes a recommendation to ensure the engine's alternative candidate detection uses consistent configuration, a warning about potential deadlocks when calling audit interfaces while holding registry locks, and a suggestion to refine the workspace detection logic to prefer the highest valid root within a tier.
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent nested workspace/root fragmentation by enforcing higher-priority root detection (VCS-first), tightening the rag_index_workspace API to require an explicit workspace_root + confirmation flow, and adding safeguards/cleanup to avoid nested .ragcode state.
Changes:
- Refactors workspace detection to tier markers and selects the highest-priority candidate while walking upward.
- Updates
rag_index_workspaceto acceptworkspace_rootand introduce a confirmation gate; updates docs/server metadata accordingly. - Adds registry/index-status logic intended to suppress/clean nested
.ragcodeartifacts, plus adjusts installer IDE config path handling.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| server.json | Updates tool description to require explicit workspace_root + confirm step. |
| pkg/workspace/registry/registry.go | Adds absorbChildren to remove nested registry entries and delete nested .ragcode dirs. |
| pkg/workspace/detector/detector.go | Introduces tiered marker options and “best tier wins” upward-walk selection. |
| pkg/workspace/detector/detector_test.go | Adds test ensuring .git root (tier 1) beats nested language marker roots. |
| pkg/indexer/index_status.go | Blocks writing index status when a parent .ragcode exists (nested suppression). |
| llms-full.txt | Updates documentation examples to use workspace_root + confirm. |
| internal/service/tools/index_workspace.go | Switches tool input to workspace_root/confirm and adds suggestion messaging. |
| internal/service/engine/engine.go | Adds wrapper for detector-based alternative root suggestions. |
| docs/tools/doc_index_workspace.md | Updates tool docs to reference workspace_root + confirmation sequence. |
| cmd/rag-code-install/main.go | Updates IDE config path detection (Cursor/VS Code, adds additional clients). |
Resolved 6 out of 7 review comments from Gemini Code Assist and Copilot: - C1: Use engine's configured detector instead of DefaultOptions() in FindAlternativeCandidates for consistent AllowedRoots/ExcludePatterns - C2/C6: Move os.RemoveAll and audit.Record outside mutex lock in absorbChildren to eliminate deadlock risk (FS ops + interface calls) - C4: Remove .ragcode and root from Tier2Markers to prevent circular detection and false positives from generic filenames - C5: Enforce two-stage confirmation with nonce/token in rag_index_workspace tool (prevents bypassing preview step) - C7: Cache hasParentRagcode results via sync.Map and skip check when .ragcode dir already exists (avoids 10x os.Stat per status write) - C8: Add DetectContextAsRoot routing workspace_root through req.WorkspaceRoot (ReasonExplicitWorkspaceRoot, confidence 1.0) instead of req.FilePath (marker detection path) C3 (walkUp tier comparison) kept as-is: current < logic correctly prefers deepest candidate within same tier for monorepo subprojects. Also fixed pre-existing lint warnings in smart_search.go and engine_searchcode_test.go (unused params, unused write).
…ude tmp dir - Added `Breakdown map[string]int` to `LangStatus` to track sub-extension counts per language (e.g., TS vs JS vs Vue). - Refactored `CountAllFiles` to return a `FileCountResult` struct containing both main counts and extension breakdowns. - Updated `engine.go` to populate `Breakdown` during indexing pre-scan, allowing user interfaces to distinguish between types like TypeScript and JavaScript files. - Added `tmp` to hardcoded exclusion directories to prevent test fixture workspaces from skewing index counts. - Added comprehensive unit tests for `CountAllFiles` and JSON round-trip stability.
| absorbed = r.absorbChildren(root) | ||
| err := r.save() | ||
| r.mu.Unlock() | ||
| r.cleanupAbsorbed(absorbed) | ||
| return entry, err |
There was a problem hiding this comment.
Upsert deletes/absorbs child entries and then calls cleanupAbsorbed regardless of whether r.save() succeeds. If save fails, you can end up deleting child .ragcode dirs / emitting audit events while the on-disk registry still contains those children (and/or the parent upsert wasn’t persisted), causing state divergence after restart. Consider only running absorbChildren + cleanup when the save succeeds, or rolling back in-memory mutations on save error before unlocking/cleaning up.
| absorbed = r.absorbChildren(root) | ||
| err := r.save() | ||
| r.mu.Unlock() | ||
| r.cleanupAbsorbed(absorbed) | ||
| return entry, err |
There was a problem hiding this comment.
Same issue as the existing-entry branch: absorbChildren + cleanupAbsorbed runs even when r.save() returns an error. That can delete child .ragcode directories and mutate the in-memory registry without persisting the change, leading to inconsistency between memory/disk and across restarts. Gate the cleanup (and ideally the absorption) on a successful save, or implement rollback on save failure.
| child := r.entries[id] | ||
| absorbed = append(absorbed, absorbedChild{Root: child.Root, ParentRoot: parentRoot}) | ||
| delete(r.entries, id) | ||
| delete(r.indexRoot, strings.ToLower(child.Root)) |
There was a problem hiding this comment.
absorbChildren deletes r.indexRoot using strings.ToLower(child.Root), but indexRoot keys elsewhere are normalized with filepath.Clean(...) (e.g., LookupByRoot uses strings.ToLower(filepath.Clean(root))). This mismatch can leave stale indexRoot entries behind for paths that differ only by cleaning (trailing separators, etc.). Delete using the same normalization (strings.ToLower(filepath.Clean(child.Root))).
| delete(r.indexRoot, strings.ToLower(child.Root)) | |
| delete(r.indexRoot, strings.ToLower(filepath.Clean(child.Root))) |
| if len(opts.Tier1Markers) == 0 && len(opts.Tier2Markers) == 0 && len(opts.Tier3Markers) == 0 && len(opts.Markers) == 0 { | ||
| opts.Tier1Markers = defaults.Tier1Markers | ||
| opts.Tier2Markers = defaults.Tier2Markers | ||
| opts.Tier3Markers = defaults.Tier3Markers | ||
| } | ||
|
|
||
| if len(opts.Markers) == 0 { | ||
| opts.Markers = defaults.Markers | ||
| opts.Markers = append(opts.Markers, opts.Tier1Markers...) | ||
| opts.Markers = append(opts.Markers, opts.Tier2Markers...) | ||
| opts.Markers = append(opts.Markers, opts.Tier3Markers...) | ||
| } |
There was a problem hiding this comment.
Detector.New only populates Tier1/2/3 marker lists when both tiers and Markers are empty. If a caller supplies opts.Markers (custom marker list) but leaves Tier* empty, getTier/getCandidateTier will treat all markers as Tier 3, breaking the intended root-priority selection. Consider always defaulting Tier1/2/3 when they’re empty (independent of Markers), or deriving tiers from the provided Markers.
| ext := strings.ToLower(filepath.Ext(path)) | ||
| if result.Breakdowns[lang] == nil { | ||
| result.Breakdowns[lang] = make(map[string]int) | ||
| } | ||
| result.Breakdowns[lang][ext]++ |
There was a problem hiding this comment.
CountAllFiles uses filepath.Ext(path) for breakdown keys; for extensionless-but-supported filenames (e.g., Dockerfile, Gemfile, artisan), ext will be "" and the breakdown map will end up with an empty-string key, which is not very useful for consumers. Consider using the basename when Ext() is empty so breakdown keys remain meaningful.
| ext := strings.ToLower(filepath.Ext(path)) | |
| if result.Breakdowns[lang] == nil { | |
| result.Breakdowns[lang] = make(map[string]int) | |
| } | |
| result.Breakdowns[lang][ext]++ | |
| key := strings.ToLower(filepath.Ext(path)) | |
| if key == "" { | |
| key = strings.ToLower(filepath.Base(path)) | |
| } | |
| if result.Breakdowns[lang] == nil { | |
| result.Breakdowns[lang] = make(map[string]int) | |
| } | |
| result.Breakdowns[lang][key]++ |
| if !confirm { | ||
| // Generate a confirmation nonce for this workspace root | ||
| nonce := generateNonce() | ||
| t.pendingConfirmations.Store(wctx.Root, confirmationEntry{ | ||
| Nonce: nonce, | ||
| Expiry: time.Now().Add(5 * time.Minute), | ||
| }) |
There was a problem hiding this comment.
pendingConfirmations entries only get removed on successful LoadAndDelete during confirmation. If users validate many different roots and never confirm (or abandon), the map can grow without bound in a long-running daemon. Consider adding a lightweight expiry sweep (on Store/Execute) or bounding the cache size and evicting expired entries.
| // Auto-trigger indexing same as DetectContext | ||
| if e.config == nil || e.config.Workspace.AutoIndex { | ||
| if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered { | ||
| logger.Instance.Info("[DAEMON] [WS-DETECT] Auto-triggering incremental index for workspace: %s", wctx.Root) | ||
| e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
DetectContextAsRoot currently auto-triggers indexing the same way DetectContext does. This conflicts with the new two-stage "validate then confirm" design for rag_index_workspace (and any other validation-only call paths), because simply resolving a root can start indexing. Consider separating resolution from auto-index side effects, or adding an option to disable auto-index for explicit root validation calls.
| // Auto-trigger indexing same as DetectContext | |
| if e.config == nil || e.config.Workspace.AutoIndex { | |
| if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered { | |
| logger.Instance.Info("[DAEMON] [WS-DETECT] Auto-triggering incremental index for workspace: %s", wctx.Root) | |
| e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) | |
| } | |
| } | |
| // Explicit root resolution is validation-only; callers must confirm before indexing. |
| } | ||
|
|
||
| // FindAlternativeCandidates wraps detector logic to offer alternative root suggestions internally. | ||
| // Uses the engine's configured detector to ensure consistent AllowedRoots/ExcludePatterns. |
There was a problem hiding this comment.
FindAlternativeCandidates comment says it uses the engine’s configured detector to ensure consistent AllowedRoots/ExcludePatterns, but NewEngine currently constructs the detector with DefaultOptions() and doesn’t apply cfg.Workspace.ExcludePatterns/allowed roots. Either wire config into detector options or adjust the comment to avoid misleading future readers.
| // Uses the engine's configured detector to ensure consistent AllowedRoots/ExcludePatterns. | |
| // It delegates to the engine-held detector instance used for workspace detection. |
Description
This pull request resolves a critical issue regarding nested workspace detection and
ragcoderegistry fragmentation. It enforces a strict root-first detection policy by relying on priority markers (VCS > Project > Language markers).Key fixes include:
pkg/workspace/detectorto discover and retain only the highest-priority root workspace.rag_index_workspaceto require an explicitworkspace_rootargument, deprecating the vulnerablefile_pathreliance.index_status(hasParentRagcode) andRegistry(absorbChildren).server.jsoninaccuracies relating to thefile_pathAPI parameter.Fixes #48
Type of change
Checklist:
go fmt ./...go test ./...and they pass