StreamableHTTP server accumulates CLOSE_WAIT sockets behind reverse proxy due to missing disconnect cleanup #2958

Description

@simpson1045


When running a StreamableHTTP MCP server behind a reverse proxy (nginx, Nginx Proxy Manager, etc.), TCP sockets accumulate in CLOSE_WAIT state after each tool call. After 10-20 calls, the server becomes unresponsive and stops accepting new connections, requiring a full process restart.

The root cause is in StreamableHTTPServerTransport._handle_post_request() — when the reverse proxy closes its side of the connection after a completed request, the sse_writer coroutine remains blocked on an in-memory stream (request_stream_reader) and never exits. Because the ASGI callable never fully returns, uvicorn never closes the server side of the socket, leaving it in CLOSE_WAIT indefinitely.

Environment

  • MCP Python SDK: 1.28.0
  • Python: 3.13
  • Server: uvicorn (via mcp.run(transport="streamable-http"))
  • Reverse proxy: Nginx Proxy Manager (nginx-based)
  • OS: Windows 11 (but the bug is platform-independent)

To Reproduce

  1. Set up a FastMCP server with StreamableHTTP transport behind any reverse proxy (nginx, NPM, Caddy, etc.)
  2. Connect a client and make several tool calls
  3. Monitor sockets: netstat -ano | findstr <port>
  4. Observe CLOSE_WAIT connections accumulating after each call
  5. After ~10-20 calls, the server stops responding to new connections

TCP    0.0.0.0:8849           0.0.0.0:0              LISTENING       43056
TCP    192.168.0.150:8849     192.168.0.248:41992    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42224    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42264    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42274    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42284    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42292    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42394    CLOSE_WAIT      43056
TCP    192.168.0.150:8849     192.168.0.248:42402    CLOSE_WAIT      43056
...
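
To track the leak across calls without counting lines by eye, the netstat output can be tallied with a small helper. This is a minimal sketch, assuming the Windows `netstat -ano` column layout shown above (protocol, local address, foreign address, state, PID); `count_close_wait` is a hypothetical name, not part of the SDK.

```python
def count_close_wait(netstat_output: str, port: int) -> int:
    """Count CLOSE_WAIT sockets whose local address uses the given port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, local address, foreign address, state, PID.
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        local_address, state = fields[1], fields[3]
        if state == "CLOSE_WAIT" and local_address.endswith(f":{port}"):
            count += 1
    return count
```

Running it on the captured output after each tool call should show the count growing by roughly one per call if the leak is present.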

Root Cause Analysis

In streamable_http.py, the SSE response path in _handle_post_request() (~line 380) looks like this:

async with anyio.create_task_group() as tg:
    tg.start_soon(response, scope, receive, send)
    session_message = self._create_session_message(message, request, request_id, protocol_version)
    await writer.send(session_message)

When the reverse proxy closes the TCP connection:

  1. EventSourceResponse (from sse-starlette) detects the disconnect via ASGI receive() and its task completes
  2. However, sse_writer() is still alive inside the response, blocked on async for event_message in request_stream_reader — this is an in-memory stream, not a socket, so it has no awareness of the TCP disconnect
  3. Nobody closes request_stream_reader, so sse_writer hangs indefinitely
  4. The ASGI callable never fully returns because the streams are never cleaned up
  5. uvicorn never sends FIN on the server side → socket remains in CLOSE_WAIT
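
The hang described in the steps above can be illustrated with a standard-library analogue. This is only a sketch: it uses `asyncio.Queue` as a stand-in for the anyio in-memory stream, and `response`, `sse_writer`, `handle_request`, and `handler_hangs` are simplified stand-ins rather than the SDK's actual code.

```python
import asyncio


async def sse_writer(queue: asyncio.Queue) -> None:
    # Stand-in for sse_writer: blocks on an in-memory queue that has
    # no knowledge of the TCP connection state.
    while True:
        await queue.get()


async def response() -> None:
    # Stand-in for the response task: returns shortly after start,
    # as if the proxy had closed its side of the connection.
    await asyncio.sleep(0.01)


async def handle_request() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    # The handler can only return once BOTH tasks have finished.
    await asyncio.gather(response(), sse_writer(queue))


async def handler_hangs(timeout: float = 0.2) -> bool:
    """Return True if the handler is still running after `timeout` seconds."""
    try:
        await asyncio.wait_for(handle_request(), timeout)
    except asyncio.TimeoutError:
        return True
    return False
```

Here `asyncio.run(handler_hangs())` reports that the handler is still running after the timeout: `response()` has long since returned, but nothing ever wakes the writer.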

Additionally, session_idle_timeout in StreamableHTTPSessionManager defaults to None (no timeout), meaning orphaned sessions are never reaped. The docstring itself recommends 1800 seconds for most deployments, but the default doesn't reflect this.

Proposed Fix

1. Transport layer — disconnect-aware cleanup in _handle_post_request()

Wrap the response() call so that when it returns (whether from normal completion or client disconnect), the request streams are immediately cleaned up, unblocking sse_writer:

async with anyio.create_task_group() as tg:
    async def run_response_with_cleanup():
        try:
            await response(scope, receive, send)
        finally:
            # Response finished — client disconnected or normal completion.
            # Close request streams to unblock sse_writer if it's still
            # waiting on the in-memory stream.
            await self._clean_up_memory_streams(request_id)
            writer_ref = self._sse_stream_writers.pop(request_id, None)
            if writer_ref:
                writer_ref.close()

    tg.start_soon(run_response_with_cleanup)
    session_message = self._create_session_message(
        message, request, request_id, protocol_version
    )
    await writer.send(session_message)

When the client disconnects:

  1. response() returns (EventSourceResponse detects disconnect)
  2. finally block fires, closes request streams
  3. sse_writer unblocks with ClosedResourceError (which it already catches gracefully)
  4. sse_writer exits cleanly
  5. ASGI callable returns → uvicorn sends FIN → socket closes properly
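
The same cleanup-in-`finally` idea can be tried in isolation with a standard-library analogue. This is a sketch under stated assumptions, not the SDK implementation: `asyncio.Queue` stands in for the anyio memory stream, and a hypothetical `_CLOSED` sentinel stands in for closing the stream.

```python
import asyncio

_CLOSED = object()  # hypothetical sentinel standing in for "stream closed"


async def sse_writer(queue: asyncio.Queue, sent: list) -> None:
    # Stand-in for sse_writer: forwards items until the stream is closed.
    while True:
        item = await queue.get()
        if item is _CLOSED:
            return
        sent.append(item)


async def response_with_cleanup(queue: asyncio.Queue) -> None:
    try:
        # Stand-in for response(): returns when the peer disconnects.
        await asyncio.sleep(0.01)
    finally:
        # Cleanup runs on normal completion and on disconnect alike,
        # waking the writer so it can exit.
        queue.put_nowait(_CLOSED)


async def handle_request() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    sent: list = []
    queue.put_nowait("result")
    await asyncio.gather(response_with_cleanup(queue), sse_writer(queue, sent))
    return sent
```

With the cleanup in place, `asyncio.run(handle_request())` returns promptly once the simulated disconnect occurs, instead of blocking on the queue.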

2. Session manager — sensible default for session_idle_timeout

In streamable_http_manager.py, change the default from None to 1800 (30 minutes), consistent with the existing docstring recommendation:

session_idle_timeout: float | None = 1800,

This provides a safety net: even if disconnect detection misses an edge case, orphaned sessions will eventually be cleaned up rather than accumulating indefinitely.
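
To make the effect of the default concrete, here is a minimal sketch of idle-timeout selection. It is an illustration only and does not reflect how StreamableHTTPSessionManager tracks sessions internally; `find_idle_sessions` and the `last_activity` mapping are hypothetical.

```python
from __future__ import annotations


def find_idle_sessions(
    last_activity: dict[str, float],
    now: float,
    idle_timeout: float | None,
) -> list[str]:
    """Return IDs of sessions idle for longer than `idle_timeout` seconds."""
    if idle_timeout is None:
        # No timeout configured: nothing is ever considered idle,
        # so orphaned sessions are never selected for cleanup.
        return []
    return [
        session_id
        for session_id, seen in last_activity.items()
        if now - seen > idle_timeout
    ]
```

With `idle_timeout=None` the function never selects anything, which mirrors the current default; with `1800`, a session untouched for more than 30 minutes becomes eligible for cleanup.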

Related Issues

Impact

This should affect any StreamableHTTP MCP server deployed behind a reverse proxy. Direct connections (localhost) appear less affected, presumably because the OS tears down TCP connections more aggressively, but the underlying resource leak (orphaned in-memory streams and sessions) still exists.
