Skip to content

fix(publisher): treat GitHub device-flow slow_down as retriable#1290

Open
developer-ishan wants to merge 4 commits into
modelcontextprotocol:mainfrom
developer-ishan:fix/publisher-slow-down
Open

fix(publisher): treat GitHub device-flow slow_down as retriable#1290
developer-ishan wants to merge 4 commits into
modelcontextprotocol:mainfrom
developer-ishan:fix/publisher-slow-down

Conversation

@developer-ishan
Copy link
Copy Markdown

Summary

  • Handle GitHub's OAuth device-flow slow_down response per RFC 8628 §3.5 — treat it as retriable and bump the polling interval by 5 seconds, instead of bailing with a fatal error.

Fixes #1289.

Why

pollForToken in cmd/publisher/auth/github-at.go only treats authorization_pending as retriable. Every other response from GitHub's token endpoint — including the non-terminal slow_down signal — falls through to:

return "", fmt.Errorf("token request failed: %s", tokenResp.Error)

So users hitting GitHub's polling rate-limit get:

Error: login failed: error polling for token: token request failed: slow_down

…and have to re-run mcp-publisher login github from scratch, which is especially painful because slow_down is more likely when the user takes more than a few seconds to enter and authorize the device code.

Per RFC 8628 §3.5, slow_down is defined as:

A variant of "authorization_pending", the authorization request is still pending and polling should continue, but the interval MUST be increased by 5 seconds for this and all subsequent requests.

Change

Treat slow_down the same as authorization_pending but increase interval by 5 seconds before sleeping, matching the RFC exactly.

A possible follow-up (not in this PR, to keep the diff minimal): seed the initial interval from DeviceCodeResponse.Interval (already parsed but currently unused) instead of hardcoding 5. Happy to do that here or as a separate PR if preferred.

Test plan

  • go build ./cmd/publisher/... — clean
  • go test ./cmd/publisher/auth/... — passes
  • gofmt -d and go vet ./cmd/publisher/auth/... — clean
  • CI make check (lint + integration tests)

I could not exercise pollForToken from a unit test without refactoring GitHubAccessTokenURL to be injectable, which felt out of scope for a bug-fix PR. Glad to add that refactor + a table-driven test for authorization_pending / slow_down / fatal errors in a follow-up if maintainers want it.

GitHub's device-flow token endpoint returns slow_down when the client
polls too frequently. Per RFC 8628 §3.5 this is a non-terminal signal:
the client must increase its polling interval by 5 seconds and keep
polling. The current pollForToken implementation only handles
authorization_pending as retriable and bails on every other error,
turning a routine rate-limit response into an unrecoverable
"login failed: error polling for token: token request failed: slow_down".

Treat slow_down the same as authorization_pending, bumping the
interval per the RFC.

Fixes modelcontextprotocol#1289
@P4ST4S
Copy link
Copy Markdown

P4ST4S commented May 30, 2026

Hi @developer-ishan, nice catch on this one, the fix is textually correct and strictly RFC 8628 §3.5 compliant. Two small suggestions before a maintainer reviews:

1. Add a regression test

github_at_test.go already has httptest-based mocks (the pattern used in TestLogin_* tests), and pollForToken has no test coverage today. Two table-driven cases would close the loop:

  • slow_down then success: confirms the poll continues and the success path is reached
  • slow_down then authorization_pending then success: confirms the interval increment compounds correctly across mixed retryable errors (RFC 8628 §3.5 doesn't reset the interval between slow_down instances)

Both are deterministic with a fake clock or by passing a tiny initial interval (the value isn't capped today, so even interval = 0 works in a test).

2. Optional: cap the interval

RFC 8628 §3.5 says only "the interval MUST be increased by 5 seconds", but it leaves the maximum to the implementation. If a misbehaving auth server returns slow_down repeatedly, the current code grows the interval unboundedly. Not a regression in this PR (the original code couldn't slow down at all), but worth considering a sanity cap (say, 60s) before merging. Equally valid to leave it for a follow-up.

Other than that, the patch is minimal, correct, and reads exactly like what the spec asks for. 👍

Thanks for the careful RFC reference in the issue, it made this PR very easy to review.

@developer-ishan
Copy link
Copy Markdown
Author

@P4ST4S good suggestions, let me check and test both these cases.

developer-ishan and others added 2 commits May 31, 2026 20:01
Address review feedback on the device-flow slow_down fix:

- Add regression tests for pollForToken (previously untested):
  - slow_down then success: confirms polling continues to success
  - slow_down / authorization_pending / slow_down then success: confirms
    the RFC 8628 §3.5 interval increment compounds across mixed retryable
    errors and is not reset between slow_down instances
  - repeated slow_down: confirms the interval is capped
  - non-retryable error: confirms polling stops immediately
- Cap the polling interval at 60s so a misbehaving auth server returning
  slow_down repeatedly cannot grow it unboundedly (RFC 8628 §3.5 mandates
  the +5s increase but leaves the maximum to the implementation).

To make pollForToken testable, the access-token URL, initial interval, and
sleep function are now provider fields wired to defaults in the constructor;
tests inject a mock server and capture the back-off sequence deterministically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@developer-ishan
Copy link
Copy Markdown
Author

Thanks for the thorough review, @P4ST4S! Both points are addressed in 9c5e9fc.

1. Regression testspollForToken was untested, so I made the access-token URL, initial interval, and sleep function provider fields (wired to the existing defaults in the constructor) so a test can point at an httptest mock and capture the back-off sequence without real delays. Four table-style cases:

  • slow_down → success: polling continues and the success path is reached
  • slow_downauthorization_pendingslow_down → success: confirms the interval compounds (5 → 5 → 10) and isn't reset between slow_down instances per RFC 8628 §3.5
  • repeated slow_down: confirms the interval cap holds
  • non-retryable error: confirms polling stops immediately with no back-off

2. Interval cap — added a 60s ceiling so a misbehaving auth server returning slow_down repeatedly can't grow the interval unboundedly. The RFC mandates the +5s increase but leaves the max to the implementation, so this is just a sanity bound.

gofmt, go vet, go build, the full auth package tests, and golangci-lint (v2.11.4, matching CI) all pass. Let me know if you'd prefer the cap split into a follow-up instead.

@P4ST4S
Copy link
Copy Markdown

P4ST4S commented May 31, 2026

This is an excellent turnaround. A few specific things that stood out:

Capturing the sleep sequence rather than just zeroing it out is the right call, the tests now verify the correctness of the back-off (5 → 5 → 10) rather than just that the loop terminates. That's the test that would catch a future regression where someone "simplifies" the logic by resetting the interval between slow_down instances.

TestPollForToken_MixedRetryableErrorsCompoundInterval is exactly the case I had in mind, and the inline comments mapping the response sequence to the expected interval evolution (0 → 5 → stays 5 → 5 → 10) make the RFC semantics readable from the test alone. Nice.

The 60s cap test pins both of the last two back-offs at the ceiling, which is tighter than just assert.LessOrEqual would be. That catches an off-by-one in the cap clamp.

TestPollForToken_FatalErrorStopsPolling wasn't in my suggestions but is the right reflex, verifying that a fatal error doesn't get masked behind a retry is exactly the kind of safety test a security-adjacent codebase needs.

Minor, non-blocking observation: RFC 8628 §3.5 lists five terminal error codes (access_denied, expired_token, unsupported_grant_type, invalid_client, invalid_grant). The current code correctly treats all of them as fatal by default, your FatalErrorStopsPolling test covers the branch logic with one representative case, which is fine. If you wanted to be extra-thorough, a table-driven variant exercising each code would lock in that none of them ever slip into the retry path, but the existing single-case test already protects against the regression that would matter most (a stray case "expired_token": continue etc.).

LGTM from an external reviewer's perspective. Thanks for the careful work and the validation against golangci-lint matching CI, that level of pre-flight detail makes maintainer review much faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mcp-publisher: device-flow login treats GitHub slow_down as fatal instead of backing off

3 participants