Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 0 additions & 19 deletions src/routers/http/router.rs
Original file line number Diff line number Diff line change
Expand Up @@ -593,13 +593,6 @@ impl Router {
false
};

// Keep a clone for potential cleanup on retry
let worker_for_cleanup = if load_incremented {
Some(worker.clone())
} else {
None
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Load counter leak on early-return error paths

Low Severity

The PR's assumption that send_typed_request always handles load decrements isn't fully accurate. When intra_node_data_parallel_size > 1 and extract_dp_rank fails, send_typed_request returns INTERNAL_SERVER_ERROR (500) at line 778 without decrementing. Since 500 is retryable, the removed retry cleanup block was the only thing that would have decremented the load in this path. Now, each such failure permanently leaks +1 into the load_counter, making the worker appear increasingly overloaded over time.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 070d34c. Configure here.

let response = self
.send_typed_request(
headers,
Expand All @@ -617,18 +610,6 @@ impl Router {
let status = response.status();
worker.record_outcome(status.is_success() || status.is_client_error());

// For retryable failures, we need to decrement load since send_typed_request
// won't have done it (it only decrements on success or non-retryable failures)
if is_retryable_status(response.status()) && load_incremented {
if let Some(cleanup_worker) = worker_for_cleanup {
cleanup_worker.decrement_load();
RouterMetrics::set_running_requests(
cleanup_worker.url(),
cleanup_worker.load(),
);
}
}

response
Comment on lines 610 to 613
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve load decrement for retryable preflight failures

Removing the retry-path cleanup here assumes send_typed_request always releases load_incremented, but send_typed_request can still return a retryable 500 before any decrement (e.g., DP rank extraction failure in send_typed_request at src/routers/http/router.rs:774-783). In cache_aware mode with retries enabled, that path now increments load on each attempt and never decrements it, which can leave a worker permanently overcounted and distort subsequent routing decisions.

Useful? React with 👍 / 👎.

},
// should_retry predicate
Expand Down
Loading