Skip to content

Fix/nested workspace detection#50

Open
doITmagic wants to merge 6 commits intodevfrom
fix/nested-workspace-detection
Open

Fix/nested workspace detection#50
doITmagic wants to merge 6 commits intodevfrom
fix/nested-workspace-detection

Conversation

@doITmagic
Copy link
Copy Markdown
Owner

Description

This pull request resolves a critical issue regarding nested workspace detection and ragcode registry fragmentation. It enforces a strict root-first detection policy by relying on priority markers (VCS > Project > Language markers).

Key fixes include:

  • Overhauling pkg/workspace/detector to discover and retain only the highest-priority root workspace.
  • Refactoring the AI tool rag_index_workspace to require an explicit workspace_root argument, deprecating the vulnerable file_path reliance.
  • Adding a mandatory dual-stage AI validation via schemas, eliminating accidental indexing decisions.
  • Suppressing sub-system fragmentations in index_status (hasParentRagcode) and Registry (absorbChildren).
  • Sanitizing the test suite to remove hardcoded assertions dependent on a "backend" layer path.
  • Resolving documentation and server.json inaccuracies relating to the file_path API parameter.

Fixes #48

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

  • I have performed a self-review of my own code
  • I have formatted my code with go fmt ./...
  • I have run tests go test ./... and they pass
  • I have verified integration with Ollama/Qdrant (if applicable)
  • I have updated the documentation accordingly

- Extract hardcoded Tier markers into dynamic detector Options
- Redesign walkUp to retain highest-priority roots instead of returning eagerly
- Refactor rag_index_workspace tool to necessitate a solid 'workspace_root' rather than 'file_path'
- Introduce dual-stage AI validation via the 'confirm' tool schema parameter
- Eradicate legacy hardcoded workspace paths inside test suite (avoid 'backend' assumptions)
- Update documentation and server.json resolving legacy file_path instructions
- Expose FindAlternativeCandidates securely via engine wrappers
- Extract hardcoded Tier markers into dynamic detector Options
- Redesign walkUp to retain highest-priority roots instead of returning eagerly
- Refactor rag_index_workspace tool to necessitate a solid 'workspace_root' rather than 'file_path'
- Introduce dual-stage AI validation via the 'confirm' tool schema parameter
- Eradicate legacy hardcoded workspace paths inside test suite (avoid 'backend' assumptions)
- Update documentation and server.json resolving legacy file_path instructions
- Expose FindAlternativeCandidates securely via engine wrappers
- Introduce structural clean-ups in index_status (hasParentRagcode) and Registry logic (absorbChildren) to thwart fragmentation
Copilot AI review requested due to automatic review settings April 1, 2026 13:34
@doITmagic doITmagic self-assigned this Apr 1, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a tiered workspace detection system and introduces a mandatory confirmation step for the rag_index_workspace tool to prevent accidental indexing of incorrect directories. It also adds logic to automatically "absorb" and clean up nested workspace registries when a parent workspace is registered. Feedback includes a recommendation to ensure the engine's alternative candidate detection uses consistent configuration, a warning about potential deadlocks when calling audit interfaces while holding registry locks, and a suggestion to refine the workspace detection logic to prefer the highest valid root within a tier.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent nested workspace/root fragmentation by enforcing higher-priority root detection (VCS-first), tightening the rag_index_workspace API to require an explicit workspace_root + confirmation flow, and adding safeguards/cleanup to avoid nested .ragcode state.

Changes:

  • Refactors workspace detection to tier markers and selects the highest-priority candidate while walking upward.
  • Updates rag_index_workspace to accept workspace_root and introduce a confirmation gate; updates docs/server metadata accordingly.
  • Adds registry/index-status logic intended to suppress/clean nested .ragcode artifacts, plus adjusts installer IDE config path handling.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
server.json Updates tool description to require explicit workspace_root + confirm step.
pkg/workspace/registry/registry.go Adds absorbChildren to remove nested registry entries and delete nested .ragcode dirs.
pkg/workspace/detector/detector.go Introduces tiered marker options and “best tier wins” upward-walk selection.
pkg/workspace/detector/detector_test.go Adds test ensuring .git root (tier 1) beats nested language marker roots.
pkg/indexer/index_status.go Blocks writing index status when a parent .ragcode exists (nested suppression).
llms-full.txt Updates documentation examples to use workspace_root + confirm.
internal/service/tools/index_workspace.go Switches tool input to workspace_root/confirm and adds suggestion messaging.
internal/service/engine/engine.go Adds wrapper for detector-based alternative root suggestions.
docs/tools/doc_index_workspace.md Updates tool docs to reference workspace_root + confirmation sequence.
cmd/rag-code-install/main.go Updates IDE config path detection (Cursor/VS Code, adds additional clients).

razvan added 2 commits April 2, 2026 22:56
Resolved 6 out of 7 review comments from Gemini Code Assist and Copilot:

- C1: Use engine's configured detector instead of DefaultOptions() in
  FindAlternativeCandidates for consistent AllowedRoots/ExcludePatterns
- C2/C6: Move os.RemoveAll and audit.Record outside mutex lock in
  absorbChildren to eliminate deadlock risk (FS ops + interface calls)
- C4: Remove .ragcode and root from Tier2Markers to prevent circular
  detection and false positives from generic filenames
- C5: Enforce two-stage confirmation with nonce/token in
  rag_index_workspace tool (prevents bypassing preview step)
- C7: Cache hasParentRagcode results via sync.Map and skip check when
  .ragcode dir already exists (avoids 10x os.Stat per status write)
- C8: Add DetectContextAsRoot routing workspace_root through
  req.WorkspaceRoot (ReasonExplicitWorkspaceRoot, confidence 1.0)
  instead of req.FilePath (marker detection path)

C3 (walkUp tier comparison) kept as-is: current < logic correctly
prefers deepest candidate within same tier for monorepo subprojects.

Also fixed pre-existing lint warnings in smart_search.go and
engine_searchcode_test.go (unused params, unused write).
…ude tmp dir

- Added `Breakdown map[string]int` to `LangStatus` to track sub-extension counts per language (e.g., TS vs JS vs Vue).
- Refactored `CountAllFiles` to return a `FileCountResult` struct containing both main counts and extension breakdowns.
- Updated `engine.go` to populate `Breakdown` during indexing pre-scan, allowing user interfaces to distinguish between types like TypeScript and JavaScript files.
- Added `tmp` to hardcoded exclusion directories to prevent test fixture workspaces from skewing index counts.
- Added comprehensive unit tests for `CountAllFiles` and JSON round-trip stability.
Copilot AI review requested due to automatic review settings April 2, 2026 20:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Comment on lines +149 to +153
absorbed = r.absorbChildren(root)
err := r.save()
r.mu.Unlock()
r.cleanupAbsorbed(absorbed)
return entry, err
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upsert deletes/absorbs child entries and then calls cleanupAbsorbed regardless of whether r.save() succeeds. If save fails, you can end up deleting child .ragcode dirs / emitting audit events while the on-disk registry still contains those children (and/or the parent upsert wasn’t persisted), causing state divergence after restart. Consider only running absorbChildren + cleanup when the save succeeds, or rolling back in-memory mutations on save error before unlocking/cleaning up.

Copilot uses AI. Check for mistakes.
Comment on lines +172 to +176
absorbed = r.absorbChildren(root)
err := r.save()
r.mu.Unlock()
r.cleanupAbsorbed(absorbed)
return entry, err
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as the existing-entry branch: absorbChildren + cleanupAbsorbed runs even when r.save() returns an error. That can delete child .ragcode directories and mutate the in-memory registry without persisting the change, leading to inconsistency between memory/disk and across restarts. Gate the cleanup (and ideally the absorption) on a successful save, or implement rollback on save failure.

Copilot uses AI. Check for mistakes.
child := r.entries[id]
absorbed = append(absorbed, absorbedChild{Root: child.Root, ParentRoot: parentRoot})
delete(r.entries, id)
delete(r.indexRoot, strings.ToLower(child.Root))
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absorbChildren deletes r.indexRoot using strings.ToLower(child.Root), but indexRoot keys elsewhere are normalized with filepath.Clean(...) (e.g., LookupByRoot uses strings.ToLower(filepath.Clean(root))). This mismatch can leave stale indexRoot entries behind for paths that differ only by cleaning (trailing separators, etc.). Delete using the same normalization (strings.ToLower(filepath.Clean(child.Root))).

Suggested change
delete(r.indexRoot, strings.ToLower(child.Root))
delete(r.indexRoot, strings.ToLower(filepath.Clean(child.Root)))

Copilot uses AI. Check for mistakes.
Comment on lines +59 to 69
if len(opts.Tier1Markers) == 0 && len(opts.Tier2Markers) == 0 && len(opts.Tier3Markers) == 0 && len(opts.Markers) == 0 {
opts.Tier1Markers = defaults.Tier1Markers
opts.Tier2Markers = defaults.Tier2Markers
opts.Tier3Markers = defaults.Tier3Markers
}

if len(opts.Markers) == 0 {
opts.Markers = defaults.Markers
opts.Markers = append(opts.Markers, opts.Tier1Markers...)
opts.Markers = append(opts.Markers, opts.Tier2Markers...)
opts.Markers = append(opts.Markers, opts.Tier3Markers...)
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detector.New only populates Tier1/2/3 marker lists when both tiers and Markers are empty. If a caller supplies opts.Markers (custom marker list) but leaves Tier* empty, getTier/getCandidateTier will treat all markers as Tier 3, breaking the intended root-priority selection. Consider always defaulting Tier1/2/3 when they’re empty (independent of Markers), or deriving tiers from the provided Markers.

Copilot uses AI. Check for mistakes.
Comment on lines +655 to +659
ext := strings.ToLower(filepath.Ext(path))
if result.Breakdowns[lang] == nil {
result.Breakdowns[lang] = make(map[string]int)
}
result.Breakdowns[lang][ext]++
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CountAllFiles uses filepath.Ext(path) for breakdown keys; for extensionless-but-supported filenames (e.g., Dockerfile, Gemfile, artisan), ext will be "" and the breakdown map will end up with an empty-string key, which is not very useful for consumers. Consider using the basename when Ext() is empty so breakdown keys remain meaningful.

Suggested change
ext := strings.ToLower(filepath.Ext(path))
if result.Breakdowns[lang] == nil {
result.Breakdowns[lang] = make(map[string]int)
}
result.Breakdowns[lang][ext]++
key := strings.ToLower(filepath.Ext(path))
if key == "" {
key = strings.ToLower(filepath.Base(path))
}
if result.Breakdowns[lang] == nil {
result.Breakdowns[lang] = make(map[string]int)
}
result.Breakdowns[lang][key]++

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +128
if !confirm {
// Generate a confirmation nonce for this workspace root
nonce := generateNonce()
t.pendingConfirmations.Store(wctx.Root, confirmationEntry{
Nonce: nonce,
Expiry: time.Now().Add(5 * time.Minute),
})
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pendingConfirmations entries only get removed on successful LoadAndDelete during confirmation. If users validate many different roots and never confirm (or abandon), the map can grow without bound in a long-running daemon. Consider adding a lightweight expiry sweep (on Store/Execute) or bounding the cache size and evicting expired entries.

Copilot uses AI. Check for mistakes.
Comment on lines +366 to +373
// Auto-trigger indexing same as DetectContext
if e.config == nil || e.config.Workspace.AutoIndex {
if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered {
logger.Instance.Info("[DAEMON] [WS-DETECT] Auto-triggering incremental index for workspace: %s", wctx.Root)
e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false)
}
}

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DetectContextAsRoot currently auto-triggers indexing the same way DetectContext does. This conflicts with the new two-stage "validate then confirm" design for rag_index_workspace (and any other validation-only call paths), because simply resolving a root can start indexing. Consider separating resolution from auto-index side effects, or adding an option to disable auto-index for explicit root validation calls.

Suggested change
// Auto-trigger indexing same as DetectContext
if e.config == nil || e.config.Workspace.AutoIndex {
if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered {
logger.Instance.Info("[DAEMON] [WS-DETECT] Auto-triggering incremental index for workspace: %s", wctx.Root)
e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false)
}
}
// Explicit root resolution is validation-only; callers must confirm before indexing.

Copilot uses AI. Check for mistakes.
}

// FindAlternativeCandidates wraps detector logic to offer alternative root suggestions internally.
// Uses the engine's configured detector to ensure consistent AllowedRoots/ExcludePatterns.
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FindAlternativeCandidates comment says it uses the engine’s configured detector to ensure consistent AllowedRoots/ExcludePatterns, but NewEngine currently constructs the detector with DefaultOptions() and doesn’t apply cfg.Workspace.ExcludePatterns/allowed roots. Either wire config into detector options or adjust the comment to avoid misleading future readers.

Suggested change
// Uses the engine's configured detector to ensure consistent AllowedRoots/ExcludePatterns.
// It delegates to the engine-held detector instance used for workspace detection.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants