fix: make HTTP 429 retryable when no fallback model, respect Retry-After header#2096
fix: make HTTP 429 retryable when no fallback model, respect Retry-After header#2096simon-agent-go-expert wants to merge 2 commits intodocker:mainfrom
Conversation
…ter header When a model returns HTTP 429 (Too Many Requests): - If fallback models are configured: skip to next fallback immediately (existing behavior preserved) - If no fallbacks are configured: retry the same model using the Retry-After header duration (capped at 60s), falling back to exponential backoff if the header is absent HTTP 500 was already handled as retryable via IsRetryableStatusCode — no change needed. New additions to pkg/modelerrors: - ExtractRetryAfter(err): extracts Retry-After duration from *anthropic.Error and *openai.Error responses - parseRetryAfterHeader(value): parses integer-seconds or HTTP-date Retry-After values - ClassifyModelError(err): unified classifier returning (retryable, rateLimited bool, retryAfter time.Duration) - MaxRetryAfterWait constant (60s cap) pkg/runtime/fallback.go changes: - Replace both IsRetryableModelError call sites with ClassifyModelError - Add handleModelError helper to avoid duplicating classification logic - 429 with fallbacks → break to next model (unchanged behavior) - 429 without fallbacks → sleep(min(retryAfter, 60s) or backoff), then continue Fixes docker#2095 Assisted-By: docker-agent
There was a problem hiding this comment.
Review Summary
Assessment: 🟢 APPROVE
Summary
This PR successfully implements HTTP 429 retry logic with Retry-After header support. The code review found no bugs in the changed code.
Key changes reviewed:
- ✅
ExtractRetryAftercorrectly extracts and parses Retry-After headers from both Anthropic and OpenAI SDK errors - ✅
parseRetryAfterHeaderproperly handles both integer-seconds and HTTP-date formats per RFC 7231 - ✅
ClassifyModelErrorprovides unified error classification with proper separation of concerns (retryable vs rate-limited) - ✅
handleModelErrorin fallback.go correctly implements the retry-vs-fallback decision logic - ✅ 60-second cap on Retry-After prevents indefinite blocking
- ✅ Comprehensive test coverage for all new functions and edge cases
The implementation is well-structured, properly tested, and maintains backward compatibility with existing error handling functions.
No issues found — the code is ready to merge.
…ter to providers Addresses two inline review comments on PR docker#2096: 1. Comment 1 (line 35): Merge the standalone maxRetryAfterWait and MaxRetryAfterWait constants into the existing backoff constants block. Eliminates the two-step declaration. 2. Comment 2 (line 313): Move provider-specific Retry-After header extraction logic out of the shared modelerrors package and into each provider package: - pkg/model/provider/anthropic/adapter.go: ExtractRetryAfter + parseRetryAfterHeader - pkg/model/provider/openai/retry.go: ExtractRetryAfter + parseRetryAfterHeader - pkg/model/provider/retry.go: top-level ExtractRetryAfter dispatcher - modelerrors.ClassifyModelError now accepts retryAfter time.Duration as a parameter (caller supplies it from provider.ExtractRetryAfter) instead of calling into SDKs itself - modelerrors.ParseRetryAfterHeader exported for general use (header parsing is generic) - fallback.go: calls provider.ExtractRetryAfter(err) then passes result to ClassifyModelError Tests updated accordingly: - modelerrors_test.go: updated ClassifyModelError call sites, removed SDK helper functions - anthropic/retry_test.go: new test file covering ExtractRetryAfter and parseRetryAfterHeader - openai/retry_test.go: new test file covering ExtractRetryAfter and parseRetryAfterHeader Assisted-By: docker-agent
pkg/model/provider/openai/retry.go
Outdated
|
|
||
| // ExtractRetryAfter extracts the Retry-After wait duration from an OpenAI API error. | ||
| // Returns 0 if the error is not an *openai.Error, has no response, or has no Retry-After header. | ||
| func ExtractRetryAfter(err error) time.Duration { |
There was a problem hiding this comment.
This seems to be the same exact same as the anthropic implementation.
I like the idea of having each provider customize the retry after extraction, but I'd also like to avoid code duplication
pkg/model/provider/retry.go
Outdated
| // It delegates to each provider package to keep provider-specific SDK knowledge | ||
| // local to each provider. | ||
| // Returns 0 if no Retry-After header is found or the error type is unsupported. | ||
| func ExtractRetryAfter(err error) time.Duration { |
There was a problem hiding this comment.
We shouldn't need this. Providers could return a typed error that contains everything we need so the check would become (in fallbac) a errors.As or whatever
…rors at Recv() Addresses review feedback from rumpl and simonferquel on PR docker#2096. The previous approach extracted Retry-After metadata in a dispatcher function that type-switched on provider SDK error types (`*anthropic.Error`, `*openai.Error`). This kept all providers coupled to the modelerrors package and duplicated parsing logic. New design: 1. `modelerrors.StatusError` — a thin typed wrapper carrying: - StatusCode int - RetryAfter time.Duration (parsed from Retry-After header) - Err error (original SDK error, accessible via Unwrap) 2. `modelerrors.WrapHTTPError(statusCode, resp, err)` — wraps any provider error in StatusError; parses Retry-After from resp.Header if non-nil. 3. `modelerrors.ParseRetryAfterHeader(value)` — shared RFC 7231 §7.1.3 parser (integer seconds + HTTP-date). No SDK imports. 4. `modelerrors.ClassifyModelError(err)` — single-arg; uses errors.As to find *StatusError in the chain; falls back to message/pattern matching for providers not yet wrapped (e.g. Bedrock). 5. Three new wrap.go files (one per provider family), each a single function: - pkg/model/provider/anthropic/wrap.go — wrapAnthropicError(*anthropic.Error) - pkg/model/provider/oaistream/wrap.go — WrapOpenAIError(*openai.Error) (exported) - pkg/model/provider/gemini/wrap.go — wrapGeminiError(*genai.APIError) 6. Wrap sites — one-line change at Recv() in each adapter: - anthropic/adapter.go, beta_adapter.go - oaistream/adapter.go - openai/response_stream.go (reuses oaistream.WrapOpenAIError) - gemini/adapter.go No changes to fallback.go logic — ClassifyModelError signature stays `(err error)` and retry/rate-limit decisions are unchanged. Assisted-By: docker-agent
96806ae to
3d44b7c
Compare
|
I like the new approach, letting @rumpl to take the final decision |
|
/review triggering docker agent for good measure |
There was a problem hiding this comment.
Review Summary
Assessment: 🟢 APPROVE
Summary
This PR implements HTTP 429 retry logic with Retry-After header support when no fallback models are configured. The implementation is well-structured and comprehensive:
Strengths:
- ✅ Proper error classification with
ClassifyModelErrorthat distinguishes retryable errors, rate limits, and extracts retry delays - ✅ Correct
Retry-Afterheader parsing supporting both integer-seconds and HTTP-date formats per RFC 7231 - ✅ Sensible 60-second cap on
Retry-Afterto prevent indefinite blocking - ✅ Clean refactoring of
fallback.gowith extractedhandleModelErrorhelper to avoid code duplication - ✅ Comprehensive test coverage including edge cases (429 with/without fallbacks, 500 retries, header parsing)
- ✅ Backward compatibility preserved for existing
IsRetryableModelErrorandIsRetryableStatusCodefunctions - ✅ Proper error wrapping throughout the provider adapters
No bugs found in the changed code. The implementation correctly handles the distinction between rate-limited errors (429) with and without fallback models, and properly respects the Retry-After header when present.
Findings
No issues to report. The code is production-ready.
Automated review by docker-agent PR reviewer
Summary
Fixes #2095
When the model provider returns HTTP 429 (Too Many Requests):
Retry-Afterresponse header (capped at 60 s), falling back to exponential backoff when the header is absentHTTP 500 was already retryable via
IsRetryableStatusCode— no change needed.What changed
pkg/modelerrors/modelerrors.goExtractRetryAfter(err error) time.Duration— extractsRetry-Afterfrom*anthropic.Errorand*openai.Error(both expose*http.Response); supports integer-seconds and HTTP-date formatsparseRetryAfterHeader(value string) time.Duration— parses the header per RFC 7231 §7.1.3ClassifyModelError(err error) (retryable, rateLimited bool, retryAfter time.Duration)— unified classifier used by the retry loop; existingIsRetryableModelError/IsRetryableStatusCodeunchanged for backward compatibilityMaxRetryAfterWaitconstant (60 s cap) to prevent a misbehaving server from blocking the agent indefinitelypkg/runtime/fallback.goIsRetryableModelErrorcall sites withClassifyModelErrorhandleModelErrorhelper method to avoid duplicating classification logic across stream-creation and stream-handling error pathsretryDecisiontype (Continue/Break/Return) for clarityTests
pkg/modelerrors/modelerrors_test.go:TestClassifyModelError,TestExtractRetryAfter,TestParseRetryAfterHeaderpkg/runtime/fallback_test.go:TestFallback429WithFallbacksSkipsToNextModel,TestFallback429WithoutFallbacksRetriesSameModel,TestFallback429WithoutFallbacksExhaustsRetries,TestFallback500RetryableWithBackoffNote on HTTP 500
HTTP 500 was already handled as retryable by
IsRetryableStatusCode(500) == truebefore this PR. The requirement is satisfied by the existing code; this PR adds an explicit test (TestFallback500RetryableWithBackoff) to document and protect that behaviour.