sandbox: cap API and SSH timeouts so unenabled regions fail fast#5565
Open
akshaysingla-db wants to merge 1 commit into
…ns fail fast

In regions where the sandbox manager service isn't deployed, the gateway holds the connection without returning a structured error, so `databricks sandbox <anything>` would hang indefinitely behind the SDK's default 60s inactivity timeout (and ssh's ~75s connect default). Tighten both to 10s and translate the API timeout into a user-facing message that hints at the most likely cause.

Co-authored-by: Isaac
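To make the ssh half of the change concrete, here is a minimal sketch of an argv builder that always carries the connect cap. The helper name `buildSSHArgs`, its signature, and the example host are assumptions for illustration; only the `-o ConnectTimeout=10` option itself comes from this PR, and the real builder in `cmd/sandbox` may be shaped differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// sshConnectTimeoutSeconds mirrors the 10s cap from the commit message.
const sshConnectTimeoutSeconds = 10

// buildSSHArgs is a hypothetical helper, not the PR's actual function.
// It places the connect-timeout option first, then any caller-supplied
// options, then the destination, matching ssh's "options before host" order.
func buildSSHArgs(host string, extra ...string) []string {
	args := []string{
		"-o", "ConnectTimeout=" + strconv.Itoa(sshConnectTimeoutSeconds),
	}
	args = append(args, extra...)
	return append(args, host)
}

func main() {
	// Example destination is made up for the demo.
	fmt.Println(buildSSHArgs("user@sandbox.example", "-p", "2222"))
}
```

Keeping the option in the base argv, rather than leaving it to each call site, means every ssh invocation fails within the same budget when the endpoint never answers.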
samhuan-db
approved these changes
Jun 12, 2026
Summary
- In regions where the sandbox manager isn't deployed, `databricks sandbox <anything>` would hang for ~60s on the API call and another ~75s in `ssh` before giving up.
- Cap each API call at 10s (`context.WithTimeout` inside a new `do` wrapper on `*sandboxAPI`), and add `-o ConnectTimeout=10` to the `ssh` argv.
- Translate `context.DeadlineExceeded` from the API path into `"sandbox API timed out after 10s — this region may not have sandbox enabled, or the manager is unreachable"`, so the user gets an actionable hint instead of a raw deadline error.

Long flows (cold start, etc.) are made of many short calls polled in a loop, so each call still has its own 10s budget while the outer wait loop's own deadline (`startWaitTimeout = 10m`) keeps governing the overall flow.

Considered routing through the SDK's `Config.HTTPTimeoutSeconds` instead, but that path rewrites a timeout into a freshly-allocated `url.Error` whose inner error no longer satisfies `errors.Is(err, context.DeadlineExceeded)`, so clean error detection would require string matching, which the repo style guide forbids. The `context.WithTimeout` approach composes fine with the SDK's 60s default backstop.

Test plan
- `go test ./cmd/sandbox/...`: new `TestSandboxAPIDoTranslatesTimeout` exercises the wrapper against an `httptest` server that hangs and asserts both the "timed out" string and the region hint; existing tests pass unchanged. `TestBuildSSHArgsBaseFlags` now asserts `ConnectTimeout=10` is in the argv.
- Run `databricks sandbox list` against a profile in a region without the manager and confirm the clear timeout message arrives in ~10s instead of hanging.

This pull request and its description were written by Isaac.