Skip to content

feat(apigateway): Retry proxied cell requests on transient failures#116664

Open
dmajere wants to merge 5 commits into
masterfrom
change/proxy/add-url-attrib
Open

feat(apigateway): Retry proxied cell requests on transient failures#116664
dmajere wants to merge 5 commits into
masterfrom
change/proxy/add-url-attrib

Conversation

@dmajere
Copy link
Copy Markdown
Contributor

@dmajere dmajere commented Jun 2, 2026

Retry proxied cell requests on transient failures (Timeout, connection errors, and 5xx responses with status >= 502) before surfacing the error to the caller.

Retry Decorator

A new retry decorator in proxy.py re-invokes the wrapped function up to three times when it raises Retryable. If retries are exhausted, the original wrapped exception is re-raised so callers see the same exception type as before.

Option Gate

Retries are gated on the apigateway.proxy.retry.enabled option. When disabled, the decorator runs the function once and immediately re-raises the original exception, preserving today's behavior.

Wrapping in proxy_cell_request

proxy_cell_request now raises Retryable for Timeout (wrapping RequestTimeout), ConnectionError, and responses with status_code >= 502 (wrapping requests.HTTPError). Previously, 5xx responses were returned to the caller; with retry enabled, they trigger a retry, and on exhaustion the HTTPError is raised.

Add a Retryable exception wrapper and a retry decorator gated by the
apigateway.proxy.retry.enabled option. Wrap Timeout, ConnectionError,
and 5xx (>= 502) responses in proxy_cell_request so the decorator can
retry up to three times before re-raising the original exception.

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 2, 2026
@dmajere dmajere marked this pull request as ready for review June 2, 2026 04:24
@dmajere dmajere requested a review from a team as a code owner June 2, 2026 04:24
Comment thread src/sentry/hybridcloud/apigateway/proxy.py
Comment thread src/sentry/hybridcloud/apigateway/proxy.py
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
R = TypeVar("R")


def retry(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to add retries we should use requests.HTTPAdapter and Retry instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markstory reimplemented with adapter

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e2d0c4e. Configure here.

max_retries = options.get("apigateway.proxy.retry.max_count") if retry_enabled else 0
requester: Callable[..., ExternalResponse] = (
_get_connection().request if use_pooling else external_request
_get_connection(max_retries).request if use_pooling else external_request
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retries require connection pooling

High Severity

With apigateway.proxy.retry.enabled on, retries only apply when hybridcloud.apigateway.use_pooling.rate selects pooled _get_connection(max_retries).request. The non-pooling path still uses external_request, which has no Retry adapter, so max_retries is unused and transient failures are not retried.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e2d0c4e. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended

Comment thread src/sentry/hybridcloud/apigateway/proxy.py
Comment thread src/sentry/hybridcloud/apigateway/proxy.py
@dmajere dmajere requested a review from markstory June 2, 2026 20:23
Comment thread src/sentry/hybridcloud/apigateway/proxy.py
Comment thread src/sentry/hybridcloud/apigateway/proxy.py
total=retry_count,
backoff_factor=0.1,
status_forcelist=[502, 503, 504],
allowed_methods=["GET", "POST", "PUT", "DELETE"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The Retry configuration for the API gateway proxy is missing PATCH and HEAD from allowed_methods, preventing transient failures for these request types from being retried.
Severity: MEDIUM

Suggested Fix

Add PATCH and HEAD to the allowed_methods list in the Retry configuration within src/sentry/hybridcloud/apigateway/proxy.py. This will align its behavior with other retry mechanisms in the codebase and ensure that all supported HTTP methods benefit from the retry logic on transient failures.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/hybridcloud/apigateway/proxy.py#L82

Potential issue: The `Retry` adapter configuration in
`src/sentry/hybridcloud/apigateway/proxy.py` omits `PATCH` and `HEAD` from its
`allowed_methods` list. When `retry_enabled` is true, proxied requests using these
methods will not be retried upon encountering transient failures like 502, 503, or 504
errors. This behavior is inconsistent with the feature's intent to provide retries for
all proxied requests and also differs from other retry configurations in the codebase
which explicitly include `PATCH` and `HEAD`. While other methods like `GET` and `POST`
will be retried as expected, the lack of retries for `PATCH` and `HEAD` reduces the
reliability of features that use them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants