fix(bootstrap): retry transient manifest fetch failures at boot #12
Merged
jh-lee-cryptolab merged 1 commit on Jun 9, 2026
FetchManifest did a single HTTP GET, so a transient GitHub CDN failure (e.g. a 504) hard-failed the daemon at boot even though artifact downloads already retry via downloadWithRetry. This wraps the manifest fetch in the same bounded exponential-backoff retry (3 attempts, backoff 5s -> 15s -> 45s, ctx-cancel aware), reusing the existing download retry constants. Only the network fetch is retried; deterministic parse/version errors fail fast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
couragehong approved these changes on Jun 9, 2026
Summary
FetchManifest now retries the manifest network GET with bounded exponential backoff (3 attempts, backoff 5s → 15s → 45s, aborting immediately on ctx cancellation), reusing the existing downloadWithRetry constants. The GET is split into fetchManifestBody; manifest parsing and version validation stay outside the retry, so deterministic errors fail fast.

Artifact downloads already retry via downloadWithRetry, but the manifest fetch was a single http.DefaultClient.Do, so a transient GitHub CDN failure (a 504 was observed in production) hard-failed the daemon at boot. This closes the last unguarded fetch in the boot path.

Validation
go build ./..., go vet ./internal/bootstrap/, and go test ./internal/bootstrap/ all pass.
TestFetchManifest_RetriesTransient covers two 504s followed by success; the existing 500-error test still surfaces the error, now after bounded retries, with the backoff compressed in tests.

Notes for Reviewers
downloadRetryBackoff is a package var so tests can compress it; parse/version errors are deliberately not retried. The FetchManifest(ctx) signature is unchanged; retry logging uses slog.Warn.