tools: sse-timeout-probe for UI-01 / UI-02 empirical trace #292
Draft
jamesbroadhead wants to merge 1 commit into databricks:main from
Adds a tiny TS reproducer for the SSE idle-timeout gap reported in ES-1742245
(field-facing "AI Value Roadmap" app dropping ~75% of SSE connections
through the Apps reverse proxy).
Three files:
- probe.ts: opens one SSE connection per duration in a configurable
  ladder; records lifetime, bytes, and how the connection
  ended (completed / server-close / network-error).
- server.ts: companion server that responds on /sse-probe, holding the
  connection open for the requested duration with an optional
  heartbeat comment. Deploy as an app entrypoint to measure
  the Databricks-hosted ceiling vs an EKS / localhost control.
- README.md: usage, what to look for (a sharp cliff at 60s/90s/120s/180s
  maps back to apps/gateway vs oauth2-proxy vs DP ApiProxy
  envoy), and how heartbeat behavior distinguishes idle
  timeouts from absolute request timeouts.
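As a rough illustration of what the probe side described above might look like, here is a minimal sketch. It is not the code in this PR: the names `classifyEnding`, `probeOnce`, and `runLadder`, the `duration` query parameter, the one-second tolerance, and running the ladder sequentially are all assumptions made for the example. It relies on the global `fetch` available in Node 18+.

```typescript
// Hypothetical sketch of the probe loop; names and the query parameter
// are illustrative assumptions, not the PR's actual implementation.
type Ending = "completed" | "server-close" | "network-error";

interface ProbeResult {
  requestedMs: number; // how long the server was asked to hold the stream
  lifetimeMs: number;  // how long the connection actually lasted
  bytes: number;       // total body bytes received
  ending: Ending;
}

// A clean end-of-stream that arrives well before the requested duration is
// treated as the server (or a proxy in front of it) closing early.
function classifyEnding(
  requestedMs: number,
  lifetimeMs: number,
  errored: boolean,
  toleranceMs = 1000, // assumed slack for scheduling jitter
): Ending {
  if (errored) return "network-error";
  return lifetimeMs + toleranceMs >= requestedMs ? "completed" : "server-close";
}

// Open one SSE connection and read it until it ends or fails.
async function probeOnce(baseUrl: string, requestedMs: number): Promise<ProbeResult> {
  const started = Date.now();
  let bytes = 0;
  let errored = false;
  try {
    // "duration" is an assumed parameter name for the requested hold time.
    const res = await fetch(`${baseUrl}/sse-probe?duration=${requestedMs}`, {
      headers: { Accept: "text/event-stream" },
    });
    const reader = res.body?.getReader();
    if (!reader) throw new Error("response has no body");
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
    }
  } catch {
    // Resets, aborted streams, and connection failures all land here.
    errored = true;
  }
  const lifetimeMs = Date.now() - started;
  return { requestedMs, lifetimeMs, bytes, ending: classifyEnding(requestedMs, lifetimeMs, errored) };
}

// Walk the ladder one rung at a time so connections do not interfere.
async function runLadder(baseUrl: string, ladderMs: number[]): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const ms of ladderMs) results.push(await probeOnce(baseUrl, ms));
  return results;
}
```

Running `runLadder` against both the hosted app and a localhost control with the same ladder would give the side-by-side rows the README interprets; a rung whose lifetime stops tracking the requested duration marks the cliff.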
Why this is a separate PR: UI-01's source doc and ES-1742245 disagree on
whether the drop is timeout-driven or buffering-driven. Running this probe
against a dogfood app answers that question empirically and tells us which
fix to pursue (per-route request_timeout raise, heartbeat middleware, or
buffering / HTTP/2 hardening). Draft because the fix itself depends on
those results.
Co-authored-by: Isaac
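The commit message above says heartbeat behavior separates idle timeouts from absolute request timeouts. One way that comparison could be expressed is sketched below, under stated assumptions: the types, the function names `cliffMs` and `diagnose`, and the one-second tolerance are hypothetical and not taken from the PR. The idea is that a heartbeat resets an idle timer but does not extend an absolute deadline, so a cliff that disappears once heartbeats are enabled points at an idle timeout.

```typescript
// Hypothetical interpretation helper; all names here are assumptions.
type Diagnosis = "idle-timeout" | "absolute-timeout" | "no-cliff-observed";

interface LadderRow {
  requestedMs: number; // duration the server was asked to hold the stream
  lifetimeMs: number;  // duration the connection actually survived
}

// Shortest observed lifetime among connections that were cut short,
// or null when every connection lasted as long as requested.
function cliffMs(rows: LadderRow[], toleranceMs = 1000): number | null {
  const cut = rows
    .filter((r) => r.lifetimeMs + toleranceMs < r.requestedMs)
    .map((r) => r.lifetimeMs);
  return cut.length > 0 ? Math.min(...cut) : null;
}

// Compare a run without heartbeats to a run with heartbeats.
function diagnose(silent: LadderRow[], withHeartbeat: LadderRow[]): Diagnosis {
  if (cliffMs(silent) === null) return "no-cliff-observed";
  // Heartbeats keep an idle timer from firing, so the cliff should vanish.
  if (cliffMs(withHeartbeat) === null) return "idle-timeout";
  // The cliff persists despite traffic on the stream.
  return "absolute-timeout";
}
```

Two caveats apply to this sketch. The heartbeat interval has to be shorter than the suspected idle limit for the comparison to mean anything. A persisting cliff is also what a proxy that buffers the heartbeats would produce, so "absolute-timeout" here is a hint to check buffering as well, not a verdict.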
Summary
Adds a tiny TS reproducer (tools/sse-timeout-probe/) for the Databricks Apps SSE idle-timeout gap reported in ES-1742245 and captured as UI-01 / UI-02 in the internal EMEA Apps "gaps that matter" doc.
Why this is a separate PR
UI-01's source doc and the ES ticket disagree on whether the drop is timeout-driven or buffering-driven:
We can't pick the fix until we know which diagnosis is correct. Running this probe against a dogfood app answers the question deterministically and tells us whether to:
- raise request_timeout per-route on apps/gateway, or
- harden FlushInterval / HTTP/2 end-to-end across apps-gateway + oauth2-proxy + apps/runtime.
What's in the PR
- tools/sse-timeout-probe/probe.ts: opens one SSE connection per duration in a configurable ladder; records lifetime, bytes, and how the connection ended (completed / server-close / network-error).
- tools/sse-timeout-probe/server.ts: companion server on /sse-probe that holds the connection open for the requested duration with optional heartbeat. Deploy as an app entrypoint to measure the Databricks-hosted ceiling vs an EKS / localhost control.
- tools/sse-timeout-probe/README.md: usage, what to look for (a sharp cliff at 60s/90s/120s/180s maps back to apps/gateway vs oauth2-proxy vs DP ApiProxy envoy), and how heartbeat behavior distinguishes idle timeouts from absolute request timeouts.
Test plan
- Deploy server.ts as an app in dogfood; run probe.ts against it and against an EKS / localhost control; capture the result ladders side-by-side.
- Re-run with --heartbeat 30000 to isolate idle-timeout behavior from absolute request-timeout.
Follow-ups explicitly out of scope for this PR
- apps/dev-playground so probing is one pnpm deploy away.
- apps/gateway PR will reference this probe as the regression test.
This pull request and its description were written by Claude (claude.ai).