feat(apigateway): Retry proxied cell requests on transient failures#116664
feat(apigateway): Retry proxied cell requests on transient failures#116664dmajere wants to merge 5 commits into
Conversation
Add a Retryable exception wrapper and a retry decorator gated by the apigateway.proxy.retry.enabled option. Wrap Timeout, ConnectionError, and 5xx (>= 502) responses in proxy_cell_request so the decorator can retry up to three times before re-raising the original exception. Co-Authored-By: Claude <noreply@anthropic.com>
| R = TypeVar("R") | ||
|
|
||
|
|
||
| def retry( |
There was a problem hiding this comment.
If the goal is to add retries we should use requests.HTTPAdapter and Retry instead.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e2d0c4e. Configure here.
| max_retries = options.get("apigateway.proxy.retry.max_count") if retry_enabled else 0 | ||
| requester: Callable[..., ExternalResponse] = ( | ||
| _get_connection().request if use_pooling else external_request | ||
| _get_connection(max_retries).request if use_pooling else external_request |
There was a problem hiding this comment.
Retries require connection pooling
High Severity
With apigateway.proxy.retry.enabled on, retries only apply when hybridcloud.apigateway.use_pooling.rate selects pooled _get_connection(max_retries).request. The non-pooling path still uses external_request, which has no Retry adapter, so max_retries is unused and transient failures are not retried.
Reviewed by Cursor Bugbot for commit e2d0c4e. Configure here.
| total=retry_count, | ||
| backoff_factor=0.1, | ||
| status_forcelist=[502, 503, 504], | ||
| allowed_methods=["GET", "POST", "PUT", "DELETE"], |
There was a problem hiding this comment.
Bug: The Retry configuration for the API gateway proxy is missing PATCH and HEAD from allowed_methods, preventing transient failures for these request types from being retried.
Severity: MEDIUM
Suggested Fix
Add PATCH and HEAD to the allowed_methods list in the Retry configuration within src/sentry/hybridcloud/apigateway/proxy.py. This will align its behavior with other retry mechanisms in the codebase and ensure that all supported HTTP methods benefit from the retry logic on transient failures.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/hybridcloud/apigateway/proxy.py#L82
Potential issue: The `Retry` adapter configuration in
`src/sentry/hybridcloud/apigateway/proxy.py` omits `PATCH` and `HEAD` from its
`allowed_methods` list. When `retry_enabled` is true, proxied requests using these
methods will not be retried upon encountering transient failures like 502, 503, or 504
errors. This behavior is inconsistent with the feature's intent to provide retries for
all proxied requests and also differs from other retry configurations in the codebase
which explicitly include `PATCH` and `HEAD`. While other methods like `GET` and `POST`
will be retried as expected, the lack of retries for `PATCH` and `HEAD` reduces the
reliability of features that use them.


Retry proxied cell requests on transient failures (Timeout, connection errors, and 5xx responses with status >= 502) before surfacing the error to the caller.
Retry Decorator
A new
retrydecorator inproxy.pyre-invokes the wrapped function up to three times when it raisesRetryable. If retries are exhausted, the original wrapped exception is re-raised so callers see the same exception type as before.Option Gate
Retries are gated on the
apigateway.proxy.retry.enabledoption. When disabled, the decorator runs the function once and immediately re-raises the original exception, preserving today's behavior.Wrapping in proxy_cell_request
proxy_cell_requestnow raisesRetryableforTimeout(wrappingRequestTimeout),ConnectionError, and responses withstatus_code >= 502(wrappingrequests.HTTPError). Previously, 5xx responses were returned to the caller; with retry enabled, they trigger a retry, and on exhaustion theHTTPErroris raised.