Decouple SSE prefill keepalive from progress callback#224
Open
njbrake wants to merge 1 commit into
Open
Conversation
The existing keepalive only fired from inside the prefill_chunk progress callback, so a stall inside a single chunk could leave the socket silent for many minutes (observed: 962s spent on the last 107 tokens of a 1046 token prefill). Clients with body-idle timeouts drop the connection, and the final stream write fails with "client stream write failed". Add a small wall-clock keepalive thread that runs alongside prefill, ticks roughly once a second, and writes ": prefill\n\n" whenever 5s have elapsed since the last keepalive event, regardless of whether progress callbacks are firing. The thread also sends the SSE headers if the callback has not done so yet. The thread and the callback share a mutex so fd writes and shared state (headers_sent, stream_failed, last_keepalive) never interleave. Attached and detached around every ds4_session_sync call site: the main prefill, the cold-prefix prefill, and the tool checkpoint rebuild. Fixes antirez#222
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This would unblock the issues that I was seeing but I'm also curious if you want this to be a configurable parameter to enable/disable, or if you would like me to keep this in my own fork instead of merging into your repo.
Note: this PR description was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.
Fixes #222.
Summary
The current keepalive only fires from inside the
prefill_chunkprogress callback. When a single prefill chunk stalls internally for many minutes (observed: 962s spent on the last 107 tokens ofa 1046 token prefill), no progress callback fires, no
: prefill\n\ncomment goes out, and clients with body-idle timeouts drop the socket. The server then logsclient stream write failedwhen it tries to emit the first gen token.
This change adds a small wall-clock keepalive thread that runs alongside prefill, ticks roughly once a second, and writes
: prefill\n\nwhenever 5 seconds have elapsed since the last keepaliveevent, regardless of whether progress callbacks are firing. The thread also handles the initial SSE headers if the callback has not done so yet. The thread and the callback share a mutex so fd
writes and the shared
headers_sent/stream_failed/last_keepalivefields never interleave.Attached and detached around every
ds4_session_synccall site: main prefill, cold-prefix prefill, and tool-checkpoint rebuild.