Skip to content

fix(grep_files): wrap walk in spawn_blocking + 30s timeout#2146

Merged
Hmbown merged 1 commit into
Hmbown:mainfrom
h3c-hexin:fix/grep-files-timeout
May 26, 2026
Merged

fix(grep_files): wrap walk in spawn_blocking + 30s timeout#2146
Hmbown merged 1 commit into
Hmbown:mainfrom
h3c-hexin:fix/grep-files-timeout

Conversation

@h3c-hexin
Copy link
Copy Markdown
Contributor

@h3c-hexin h3c-hexin commented May 26, 2026

Problem

grep_files runs its directory walk and per-file regex synchronously inside the async execute(). On a large tree this pins the Tokio worker thread for the whole scan (can be minutes). While the per-file/per-line check_cancelled checks let it bail between files, the synchronous stretch monopolizes the runtime worker, so the turn loop can't make progress and the stop button stays unresponsive until the scan reaches the next check point.

This is the same failure mode file_search had, fixed in #2035 by moving the blocking walk onto spawn_blocking with a hard timeout. grep_files has identical structure but wasn't given the same treatment — so the two blocking searches behave inconsistently.

Fix

Mirror the file_search fix:

  • Move the blocking walk + per-file regex onto tokio::task::spawn_blocking.
  • Bound it with a 30s hard timeout and a biased select! on the cancel token, via a new run_blocking_grep helper that parallels run_blocking_file_search.
  • Keep the existing per-file/per-line check_cancelled calls and MAX_FILE_SIZE skip — output is byte-for-byte unchanged.

Net effect: the stop button stays responsive during a large grep (the runtime worker is no longer pinned), and a pathological scan can't run unbounded.

Test

cargo test -p codewhale-tui --bins tools::search — all 15 tests pass, including test_grep_files_respects_cancel_token (which now also exercises the new run_blocking_grep cancel path). cargo fmt + cargo clippy clean on the changed file.

🤖 Generated with Claude Code

Greptile Summary

This PR moves the synchronous directory walk and per-file regex work in grep_files off the Tokio async worker thread and onto spawn_blocking, bounded by a 30-second hard timeout and a biased select! on the cancellation token. The change directly mirrors the identical fix previously applied to file_search in #2035.

  • run_blocking_grep is added as a new async helper that faithfully replicates the structure of run_blocking_file_search, including the early-cancellation guard, biased select!, and JoinError unwrapping.
  • Existing per-file/per-line check_cancelled checkpoints and MAX_FILE_SIZE skips are preserved unchanged inside the closure; output is byte-for-byte identical to the previous synchronous path.
  • The file_search.rs mirror has a dedicated unit test (test_file_search_blocking_wrapper_reports_timeout) that directly exercises the timeout return path; no equivalent test was added here for run_blocking_grep.

Confidence Score: 4/5

Safe to merge; the change correctly offloads blocking work to a dedicated thread pool and the existing cancel and file-size guards remain intact.

The blocking-to-spawn_blocking refactor is structurally identical to the already-shipped file_search fix, so the pattern is battle-tested. The only concrete gaps are a missing unit test for the 30s timeout path and a minor inconsistency in the cancellation error message — neither affects correctness of the main fix.

Only crates/tui/src/tools/search.rs changed; the timeout test gap in that file is the one area worth a second look before merging.

Important Files Changed

Filename Overview
crates/tui/src/tools/search.rs Wraps the synchronous directory walk and per-file regex in tokio::task::spawn_blocking with a 30-second tokio::time::timeout and biased select! on the cancel token, mirroring the existing run_blocking_file_search pattern. Minor: the new grep_cancelled error message is inconsistent with the existing check_cancelled message, and there is no unit test for the run_blocking_grep timeout path (unlike the equivalent function in file_search.rs).

Sequence Diagram

sequenceDiagram
    participant Caller as execute()
    participant RBG as run_blocking_grep()
    participant BT as spawn_blocking task
    participant CT as CancellationToken

    Caller->>RBG: run_blocking_grep(30s, cancel_token.clone(), closure)
    RBG->>CT: is_cancelled()? (early guard)
    CT-->>RBG: false continue

    RBG->>BT: tokio::task::spawn_blocking(closure)
    Note over BT: walk dirs, regex per file

    alt cancel fires before timeout
        CT-->>RBG: token.cancelled() (biased select!)
        RBG-->>Caller: Err(grep_cancelled)
    else task completes within 30s
        BT-->>RBG: Ok(json result)
        RBG-->>Caller: Ok(Value)
    else 30s timeout expires
        RBG-->>Caller: Err(ToolError::Timeout)
    end
Loading

Fix All in Codex Fix All in Claude Code Fix All in Cursor

Reviews (1): Last reviewed commit: "fix(grep_files): wrap walk in spawn_bloc..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

grep_files runs its directory walk and per-file regex synchronously inside
the async execute(). On a large tree this pins the runtime worker for
minutes, so the turn loop can't observe the cancel token and the stop
button stays unresponsive — the same failure file_search fixed in Hmbown#2035.

Mirror that fix: move the blocking walk onto spawn_blocking, bounded by a
30s hard timeout with a biased select on the cancel token, via a
run_blocking_grep helper that parallels run_blocking_file_search. Output
and the existing per-file/per-line cancel checks are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request wraps the synchronous, blocking directory walk and regex matching of the grep_files tool inside a tokio::task::spawn_blocking worker bounded by a 30-second timeout. This prevents blocking the async runtime during large searches. A review comment points out that stripping the workspace prefix from file paths might fail if the workspace path is relative while the searched file paths are absolute. It suggests canonicalizing the workspace path to ensure consistent relative paths in the results.

let search_path = context.resolve_path(path_str)?;

let cancel_token = context.cancel_token.as_ref();
let workspace = context.workspace.clone();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If context.workspace is a relative path (e.g., . or ./), but search_path is resolved to an absolute canonicalized path via context.resolve_path(path_str)?, then file_path (which is relative to search_path) will also be absolute and canonicalized. In this case, file_path.strip_prefix(&workspace) will fail because one is absolute and the other is relative, resulting in the grep results containing absolute paths instead of relative paths.

Canonicalizing workspace ensures both paths are absolute and canonicalized, allowing strip_prefix to succeed.

Suggested change
let workspace = context.workspace.clone();
let workspace = context
.workspace
.canonicalize()
.unwrap_or_else(|_| context.workspace.clone());

Comment on lines +264 to +295
async fn run_blocking_grep<F>(
timeout: Duration,
cancel_token: Option<CancellationToken>,
search: F,
) -> Result<Value, ToolError>
where
F: FnOnce() -> Result<Value, ToolError> + Send + 'static,
{
if cancel_token
.as_ref()
.is_some_and(CancellationToken::is_cancelled)
{
return Err(grep_cancelled());
}

if regex.is_match(line) {
total_matches += 1;

// Get context lines
let context_before: Vec<String> = (line_idx.saturating_sub(context_lines)
..line_idx)
.filter_map(|i| lines.get(i).map(|s| (*s).to_string()))
.collect();

let context_after: Vec<String> = ((line_idx + 1)
..=(line_idx + context_lines).min(lines.len() - 1))
.filter_map(|i| lines.get(i).map(|s| (*s).to_string()))
.collect();

// Get relative path from workspace
let relative_path = file_path
.strip_prefix(&context.workspace)
.unwrap_or(&file_path)
.to_string_lossy()
.to_string();

results.push(GrepMatch {
file: relative_path,
line_number: line_idx + 1,
line: (*line).to_string(),
context_before,
context_after,
});

if results.len() >= max_results {
break;
}
}
let task = tokio::task::spawn_blocking(search);
let result = match cancel_token {
Some(token) => {
tokio::select! {
biased;
() = token.cancelled() => return Err(grep_cancelled()),
result = tokio::time::timeout(timeout, task) => result,
}
}
None => tokio::time::timeout(timeout, task).await,
};

let matches_json: Vec<Value> = results
.iter()
.map(|item| grep_match_to_json(item, context_lines))
.collect();

// Build result. When context_lines == 1, return the single context
// line as a string instead of a one-item array. That keeps the common
// "show just the adjacent line" case easy for model callers to read.
let result = json!({
"matches": matches_json,
"total_matches": total_matches,
"files_searched": files_searched,
"truncated": total_matches > max_results,
});
let joined = result.map_err(|_| grep_timeout(timeout))?;
joined.map_err(|err| {
ToolError::execution_failed(format!("grep_files worker failed before completion: {err}"))
})?
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing timeout unit test for run_blocking_grep

file_search.rs has test_file_search_blocking_wrapper_reports_timeout which directly calls run_blocking_file_search with a 1ms timeout and a 50ms sleep, confirming that ToolError::Timeout { seconds: 1 } is returned. file.rs has the same test for list_dir. The new run_blocking_grep has no equivalent test, so the 30-second timeout path is unexercised. A slow or stuck scan that hits the cap would silently skip validation of the ToolError::Timeout shape and the correct seconds value returned to callers.

Fix in Codex Fix in Claude Code Fix in Cursor

Comment on lines +297 to +299
fn grep_cancelled() -> ToolError {
ToolError::execution_failed("grep_files cancelled before completion")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent cancellation error messages

grep_cancelled() returns "grep_files cancelled before completion", while the existing check_cancelled() helper in the same file returns "search cancelled before completion". When a cancel fires through the select! path the user sees one message; when it fires through the per-file check_cancelled checkpoint they see a different one. Aligning both to the same string avoids confusing diagnostics.

Suggested change
fn grep_cancelled() -> ToolError {
ToolError::execution_failed("grep_files cancelled before completion")
}
fn grep_cancelled() -> ToolError {
ToolError::execution_failed("search cancelled before completion")
}

Fix in Codex Fix in Claude Code Fix in Cursor

@Hmbown Hmbown merged commit 795de32 into Hmbown:main May 26, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants