fix(grep_files): wrap walk in spawn_blocking + 30s timeout#2146
Conversation
grep_files runs its directory walk and per-file regex synchronously inside the async execute(). On a large tree this pins the runtime worker for minutes, so the turn loop can't observe the cancel token and the stop button stays unresponsive — the same failure file_search fixed in Hmbown#2035. Mirror that fix: move the blocking walk onto spawn_blocking, bounded by a 30s hard timeout with a biased select on the cancel token, via a run_blocking_grep helper that parallels run_blocking_file_search. Output and the existing per-file/per-line cancel checks are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request wraps the synchronous, blocking directory walk and regex matching of the grep_files tool inside a tokio::task::spawn_blocking worker bounded by a 30-second timeout. This prevents blocking the async runtime during large searches. A review comment points out that stripping the workspace prefix from file paths might fail if the workspace path is relative while the searched file paths are absolute. It suggests canonicalizing the workspace path to ensure consistent relative paths in the results.
| let search_path = context.resolve_path(path_str)?; | ||
|
|
||
| let cancel_token = context.cancel_token.as_ref(); | ||
| let workspace = context.workspace.clone(); |
There was a problem hiding this comment.
If context.workspace is a relative path (e.g., . or ./), but search_path is resolved to an absolute canonicalized path via context.resolve_path(path_str)?, then file_path (which is relative to search_path) will also be absolute and canonicalized. In this case, file_path.strip_prefix(&workspace) will fail because one is absolute and the other is relative, resulting in the grep results containing absolute paths instead of relative paths.
Canonicalizing workspace ensures both paths are absolute and canonicalized, allowing strip_prefix to succeed.
| let workspace = context.workspace.clone(); | |
| let workspace = context | |
| .workspace | |
| .canonicalize() | |
| .unwrap_or_else(|_| context.workspace.clone()); |
| async fn run_blocking_grep<F>( | ||
| timeout: Duration, | ||
| cancel_token: Option<CancellationToken>, | ||
| search: F, | ||
| ) -> Result<Value, ToolError> | ||
| where | ||
| F: FnOnce() -> Result<Value, ToolError> + Send + 'static, | ||
| { | ||
| if cancel_token | ||
| .as_ref() | ||
| .is_some_and(CancellationToken::is_cancelled) | ||
| { | ||
| return Err(grep_cancelled()); | ||
| } | ||
|
|
||
| if regex.is_match(line) { | ||
| total_matches += 1; | ||
|
|
||
| // Get context lines | ||
| let context_before: Vec<String> = (line_idx.saturating_sub(context_lines) | ||
| ..line_idx) | ||
| .filter_map(|i| lines.get(i).map(|s| (*s).to_string())) | ||
| .collect(); | ||
|
|
||
| let context_after: Vec<String> = ((line_idx + 1) | ||
| ..=(line_idx + context_lines).min(lines.len() - 1)) | ||
| .filter_map(|i| lines.get(i).map(|s| (*s).to_string())) | ||
| .collect(); | ||
|
|
||
| // Get relative path from workspace | ||
| let relative_path = file_path | ||
| .strip_prefix(&context.workspace) | ||
| .unwrap_or(&file_path) | ||
| .to_string_lossy() | ||
| .to_string(); | ||
|
|
||
| results.push(GrepMatch { | ||
| file: relative_path, | ||
| line_number: line_idx + 1, | ||
| line: (*line).to_string(), | ||
| context_before, | ||
| context_after, | ||
| }); | ||
|
|
||
| if results.len() >= max_results { | ||
| break; | ||
| } | ||
| } | ||
| let task = tokio::task::spawn_blocking(search); | ||
| let result = match cancel_token { | ||
| Some(token) => { | ||
| tokio::select! { | ||
| biased; | ||
| () = token.cancelled() => return Err(grep_cancelled()), | ||
| result = tokio::time::timeout(timeout, task) => result, | ||
| } | ||
| } | ||
| None => tokio::time::timeout(timeout, task).await, | ||
| }; | ||
|
|
||
| let matches_json: Vec<Value> = results | ||
| .iter() | ||
| .map(|item| grep_match_to_json(item, context_lines)) | ||
| .collect(); | ||
|
|
||
| // Build result. When context_lines == 1, return the single context | ||
| // line as a string instead of a one-item array. That keeps the common | ||
| // "show just the adjacent line" case easy for model callers to read. | ||
| let result = json!({ | ||
| "matches": matches_json, | ||
| "total_matches": total_matches, | ||
| "files_searched": files_searched, | ||
| "truncated": total_matches > max_results, | ||
| }); | ||
| let joined = result.map_err(|_| grep_timeout(timeout))?; | ||
| joined.map_err(|err| { | ||
| ToolError::execution_failed(format!("grep_files worker failed before completion: {err}")) | ||
| })? | ||
| } |
There was a problem hiding this comment.
Missing timeout unit test for
run_blocking_grep
file_search.rs has test_file_search_blocking_wrapper_reports_timeout which directly calls run_blocking_file_search with a 1ms timeout and a 50ms sleep, confirming that ToolError::Timeout { seconds: 1 } is returned. file.rs has the same test for list_dir. The new run_blocking_grep has no equivalent test, so the 30-second timeout path is unexercised. A slow or stuck scan that hits the cap would silently skip validation of the ToolError::Timeout shape and the correct seconds value returned to callers.
| fn grep_cancelled() -> ToolError { | ||
| ToolError::execution_failed("grep_files cancelled before completion") | ||
| } |
There was a problem hiding this comment.
Inconsistent cancellation error messages
grep_cancelled() returns "grep_files cancelled before completion", while the existing check_cancelled() helper in the same file returns "search cancelled before completion". When a cancel fires through the select! path the user sees one message; when it fires through the per-file check_cancelled checkpoint they see a different one. Aligning both to the same string avoids confusing diagnostics.
| fn grep_cancelled() -> ToolError { | |
| ToolError::execution_failed("grep_files cancelled before completion") | |
| } | |
| fn grep_cancelled() -> ToolError { | |
| ToolError::execution_failed("search cancelled before completion") | |
| } |
Problem
grep_filesruns its directory walk and per-file regex synchronously inside the asyncexecute(). On a large tree this pins the Tokio worker thread for the whole scan (can be minutes). While the per-file/per-linecheck_cancelledchecks let it bail between files, the synchronous stretch monopolizes the runtime worker, so the turn loop can't make progress and the stop button stays unresponsive until the scan reaches the next check point.This is the same failure mode
file_searchhad, fixed in #2035 by moving the blocking walk ontospawn_blockingwith a hard timeout.grep_fileshas identical structure but wasn't given the same treatment — so the two blocking searches behave inconsistently.Fix
Mirror the
file_searchfix:tokio::task::spawn_blocking.select!on the cancel token, via a newrun_blocking_grephelper that parallelsrun_blocking_file_search.check_cancelledcalls andMAX_FILE_SIZEskip — output is byte-for-byte unchanged.Net effect: the stop button stays responsive during a large grep (the runtime worker is no longer pinned), and a pathological scan can't run unbounded.
Test
cargo test -p codewhale-tui --bins tools::search— all 15 tests pass, includingtest_grep_files_respects_cancel_token(which now also exercises the newrun_blocking_grepcancel path).cargo fmt+cargo clippyclean on the changed file.🤖 Generated with Claude Code
Greptile Summary
This PR moves the synchronous directory walk and per-file regex work in
grep_filesoff the Tokio async worker thread and ontospawn_blocking, bounded by a 30-second hard timeout and a biasedselect!on the cancellation token. The change directly mirrors the identical fix previously applied tofile_searchin #2035.run_blocking_grepis added as a new async helper that faithfully replicates the structure ofrun_blocking_file_search, including the early-cancellation guard, biasedselect!, andJoinErrorunwrapping.check_cancelledcheckpoints andMAX_FILE_SIZEskips are preserved unchanged inside the closure; output is byte-for-byte identical to the previous synchronous path.file_search.rsmirror has a dedicated unit test (test_file_search_blocking_wrapper_reports_timeout) that directly exercises the timeout return path; no equivalent test was added here forrun_blocking_grep.Confidence Score: 4/5
Safe to merge; the change correctly offloads blocking work to a dedicated thread pool and the existing cancel and file-size guards remain intact.
The blocking-to-spawn_blocking refactor is structurally identical to the already-shipped file_search fix, so the pattern is battle-tested. The only concrete gaps are a missing unit test for the 30s timeout path and a minor inconsistency in the cancellation error message — neither affects correctness of the main fix.
Only crates/tui/src/tools/search.rs changed; the timeout test gap in that file is the one area worth a second look before merging.
Important Files Changed
tokio::task::spawn_blockingwith a 30-secondtokio::time::timeoutand biasedselect!on the cancel token, mirroring the existingrun_blocking_file_searchpattern. Minor: the newgrep_cancellederror message is inconsistent with the existingcheck_cancelledmessage, and there is no unit test for therun_blocking_greptimeout path (unlike the equivalent function infile_search.rs).Sequence Diagram
sequenceDiagram participant Caller as execute() participant RBG as run_blocking_grep() participant BT as spawn_blocking task participant CT as CancellationToken Caller->>RBG: run_blocking_grep(30s, cancel_token.clone(), closure) RBG->>CT: is_cancelled()? (early guard) CT-->>RBG: false continue RBG->>BT: tokio::task::spawn_blocking(closure) Note over BT: walk dirs, regex per file alt cancel fires before timeout CT-->>RBG: token.cancelled() (biased select!) RBG-->>Caller: Err(grep_cancelled) else task completes within 30s BT-->>RBG: Ok(json result) RBG-->>Caller: Ok(Value) else 30s timeout expires RBG-->>Caller: Err(ToolError::Timeout) endReviews (1): Last reviewed commit: "fix(grep_files): wrap walk in spawn_bloc..." | Re-trigger Greptile