
@deansheather
Member

Prevents thundering herd issues with SSH connections by adding health tracking, exponential backoff, and singleflighting to the connection pool.

Changes

  • SSHConnectionPool class with:

    • Health tracking (healthy/unhealthy/unknown states)
    • Exponential backoff: 1s → 5s → 10s → 20s → 40s → 60s (cap)
    • Singleflighting: concurrent probes to the same host share one attempt
    • Fast-path for known-healthy connections (no re-probe)
  • Integration points:

    • SSHRuntime.exec() and execSSHCommand() call acquireConnection()
    • PTYService calls acquireConnection() before spawning SSH terminals
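The health states and backoff schedule listed above can be sketched as follows. This is a minimal illustration, not the PR's code: `BACKOFF_SCHEDULE_MS`, `backoffDelayMs`, `HostHealth`, `recordFailure`, and `recordSuccess` are hypothetical names, and resetting the failure streak on a successful probe is an assumption.

```typescript
// Hypothetical sketch; names and structure are not taken from the PR.
type HealthState = "healthy" | "unhealthy" | "unknown";

// Delay applied after the 1st, 2nd, ... consecutive failure (ms).
const BACKOFF_SCHEDULE_MS = [1_000, 5_000, 10_000, 20_000, 40_000, 60_000];
const MAX_BACKOFF_MS = 60_000;

function backoffDelayMs(consecutiveFailures: number): number {
  if (consecutiveFailures <= 0) return 0;
  // Clamp to the last entry so repeated failures stay capped at 60s.
  const index = Math.min(consecutiveFailures, BACKOFF_SCHEDULE_MS.length) - 1;
  return BACKOFF_SCHEDULE_MS[index] ?? MAX_BACKOFF_MS;
}

interface HostHealth {
  state: HealthState;
  consecutiveFailures: number;
  backoffUntil: number; // epoch ms; callers fail fast while now < backoffUntil
}

// New hosts start in the "unknown" state with no backoff.
const initial: HostHealth = { state: "unknown", consecutiveFailures: 0, backoffUntil: 0 };

function recordFailure(prev: HostHealth, now: number): HostHealth {
  const failures = prev.consecutiveFailures + 1;
  return {
    state: "unhealthy",
    consecutiveFailures: failures,
    backoffUntil: now + backoffDelayMs(failures),
  };
}

// Assumption: a successful probe clears the failure streak entirely.
function recordSuccess(): HostHealth {
  return { state: "healthy", consecutiveFailures: 0, backoffUntil: 0 };
}
```

With this schedule, failures 1 through 7 map to delays of 1000, 5000, 10000, 20000, 40000, 60000, and 60000 ms. Note that only the steps from 5s onward double; the first step is a fixed 1s.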

Flow

acquireConnection() → in backoff? → throw immediately
                    → known healthy? → return immediately
                    → inflight probe? → wait on existing promise
                    → start probe → success? → mark healthy, return
                                  → failure? → mark failed + backoff, throw
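The flow above can be sketched as a small class. This is a minimal sketch under stated assumptions, not the PR's `SSHConnectionPool`: `ConnectionPoolSketch`, the injected `probe` and `now` functions, and `failureCount` are hypothetical, and the probe is modeled as any async function that rejects when the host is unreachable.

```typescript
// Hypothetical sketch of the acquireConnection() flow; not the PR's code.
type Probe = (key: string) => Promise<void>;

const BACKOFF_MS = [1_000, 5_000, 10_000, 20_000, 40_000, 60_000];

interface Entry {
  healthy: boolean;
  failures: number;
  backoffUntil: number; // epoch ms
}

class ConnectionPoolSketch {
  private readonly entries = new Map<string, Entry>();
  private readonly inflight = new Map<string, Promise<void>>();
  private readonly probe: Probe;
  private readonly now: () => number;

  constructor(probe: Probe, now: () => number = () => Date.now()) {
    this.probe = probe;
    this.now = now;
  }

  acquireConnection(key: string): Promise<void> {
    const entry = this.entries.get(key);
    // In backoff? Fail immediately without touching the network.
    if (entry && this.now() < entry.backoffUntil) {
      return Promise.reject(new Error(`${key}: in backoff for ${entry.backoffUntil - this.now()}ms`));
    }
    // Known healthy? Return immediately (no re-probe).
    if (entry?.healthy) return Promise.resolve();
    // Probe already in flight? Share it (singleflight).
    const existing = this.inflight.get(key);
    if (existing) return existing;
    // Otherwise start one probe; its outcome is recorded exactly once.
    const attempt = this.probe(key)
      .then(
        () => {
          this.entries.set(key, { healthy: true, failures: 0, backoffUntil: 0 });
        },
        (err: unknown) => {
          const failures = (this.entries.get(key)?.failures ?? 0) + 1;
          const delay = BACKOFF_MS[Math.min(failures, BACKOFF_MS.length) - 1] ?? 60_000;
          this.entries.set(key, { healthy: false, failures, backoffUntil: this.now() + delay });
          throw err; // every waiting caller sees the same failure
        },
      )
      .finally(() => this.inflight.delete(key));
    this.inflight.set(key, attempt);
    return attempt;
  }

  failureCount(key: string): number {
    return this.entries.get(key)?.failures ?? 0;
  }
}
```

Because a new probe starts only when none is in flight, three concurrent callers against an unreachable host produce one probe and one recorded failure rather than three, and later callers fail fast until the backoff window expires.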

Generated with mux

@deansheather force-pushed the ssh-mux-connection-backoff branch from cfa5fa8 to f611745 on December 5, 2025 03:19

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@deansheather force-pushed the ssh-mux-connection-backoff branch from f611745 to 53de00a on December 5, 2025 03:23
@deansheather
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@deansheather force-pushed the ssh-mux-connection-backoff branch from 53de00a to 997109f on December 8, 2025 03:54
Prevents thundering herd issues with SSH connections by:

- Adding SSHConnectionPool class with health tracking
- Implementing exponential backoff (1s → 5s → 10s → 20s → 40s → 60s cap)
- Singleflighting concurrent connection attempts to the same host
- Probing unknown connections before first use
- Skipping probes for known-healthy connections

Integration points:
- SSHRuntime.exec() and execSSHCommand() call acquireConnection()
- PTYService calls acquireConnection() before spawning SSH terminals

_Generated with mux_
- Remove srcBaseDir from the connection key: workspaces on the same host now
  share health tracking and control socket multiplexing
- Fix double markFailedByKey on timeout: add a timedOut flag so that the
  timeout callback and the on('close') handler do not both increment the
  failure count
- Add HEALTHY_TTL_MS (5 min): healthy connections not verified within the
  TTL are re-probed, since the network may have silently degraded
- Fix singleflighting test: verify that concurrent probes actually share
  one failure count, instead of pre-marking the connection healthy
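Dropping srcBaseDir from the key can be illustrated with a key function that uses only the SSH endpoint. This is a sketch assuming a target identified by host, user, and port; the `SSHTarget` shape and `connectionKey` name are hypothetical, and the real runtime config may use different fields. Two workspaces with different source directories on one host map to the same pool entry, which is what lets them share a health record.

```typescript
// Hypothetical target shape; the real runtime config may use different fields.
interface SSHTarget {
  host: string;
  user?: string;
  port?: number;
  srcBaseDir: string; // per-workspace directory, intentionally excluded from the key
}

// Key only on what identifies the SSH endpoint, so workspaces on one host share an entry.
function connectionKey(target: SSHTarget): string {
  const user = target.user ?? "";
  const port = target.port ?? 22;
  return `${user}@${target.host}:${port}`;
}
```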
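The timedOut guard can be sketched as follows. Killing a timed-out process still emits 'close', so without the flag both paths would record a failure. This is a sketch, not the PR's code: `watchProbe`, `ProbeProcess`, and the `onFailure`/`onSuccess` callbacks (standing in for markFailedByKey and the healthy path) are hypothetical, and an `EventEmitter` stands in for the spawned `ssh` child process.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for the spawned probe process; real code would use a ChildProcess.
interface ProbeProcess extends EventEmitter {
  kill(): void;
}

// Hypothetical helper: report exactly one outcome per probe, even on timeout.
function watchProbe(
  proc: ProbeProcess,
  timeoutMs: number,
  onFailure: () => void,
  onSuccess: () => void,
): void {
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    onFailure(); // the timeout counts as the single failure...
    proc.kill(); // ...and killing the process will still emit 'close'
  }, timeoutMs);

  proc.on("close", (code: number | null) => {
    clearTimeout(timer);
    if (timedOut) return; // already counted above; don't count it twice
    if (code === 0) onSuccess();
    else onFailure();
  });
}
```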
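The HEALTHY_TTL_MS check narrows the known-healthy fast path to recent results. A minimal sketch, assuming the TTL is measured from the last successful probe and that an entry exactly at the TTL is treated as stale; `HealthRecord`, `lastHealthyAt`, and `canSkipProbe` are hypothetical names.

```typescript
// 5 minutes, per the commit message.
const HEALTHY_TTL_MS = 5 * 60 * 1000;

// Hypothetical record shape: when the host was last confirmed healthy.
interface HealthRecord {
  state: "healthy" | "unhealthy" | "unknown";
  lastHealthyAt: number; // epoch ms of the last successful probe
}

// Fast path only for healthy records younger than the TTL; otherwise re-probe.
function canSkipProbe(record: HealthRecord | undefined, now: number): boolean {
  if (record === undefined || record.state !== "healthy") return false;
  return now - record.lastHealthyAt < HEALTHY_TTL_MS;
}
```

Under these assumptions, a record marked healthy at t = 0 still takes the fast path at 299,999 ms and is re-probed from 300,000 ms on.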
@deansheather force-pushed the ssh-mux-connection-backoff branch from 32fdb79 to 7ae97cb on December 9, 2025 07:53