Skip to content

fix(mcp): refresh tools on list_changed notification#5604

Open
shizhigu wants to merge 4 commits intolivekit:mainfrom
shizhigu:codex/mcp-tool-list-changed
Open

fix(mcp): refresh tools on list_changed notification#5604
shizhigu wants to merge 4 commits intolivekit:mainfrom
shizhigu:codex/mcp-tool-list-changed

Conversation

@shizhigu
Copy link
Copy Markdown

@shizhigu shizhigu commented Apr 30, 2026

Fixes #5378.

Why

MCP servers can advertise new or removed tools at runtime via notifications/tools/list_changed. Today MCPServer ignores that notification, so an agent keeps using the stale tool list for the rest of the session — calls to removed tools fail and newly added tools are invisible until restart.

What changed

  • MCPServer listens for notifications/tools/list_changed, invalidates its cached list_tools() result, and fans out to subscribed toolsets.
  • MCPServer.list_tools() keeps the cache dirty if a notification arrives while a fetch is in flight, so direct callers refetch instead of treating an older snapshot as clean.
  • MCPToolset reloads tools in a background task: coalesces duplicate notifications, retries up to 4 attempts with 1s/2s/4s exponential backoff, preserves user filter_tools predicates across reloads, and unregisters on close.
  • On terminal reload failure, the toolset fires a callback that AgentActivity bridges to AgentSession.emit("error", ...), so operators can detect that the agent's tool view has gone stale.
  • AgentActivity subscribes to Toolset changes, refreshes the realtime session and chat context under _refresh_lock, and inserts an AgentConfigUpdate describing the diff. The public update_tools() path still records a no-op AgentConfigUpdate when the requested tool set is unchanged.
  • _close_session cancels refresh tasks and takes _refresh_lock before closing realtime resources, so an in-flight tool publish cannot race with realtime close.
  • The publish path is extracted into _publish_tools_change, used by both update_tools() and background refresh.
  • _handle_message passes Exception-typed messages through anyio.lowlevel.checkpoint(), matching the MCP SDK default handler so cooperative cancellation propagates.

No public API changes; new callbacks are internal plumbing.

Test plan

Automated

  • uv run --extra mcp pytest tests/test_mcp.py
    • 19 passed
  • uv run ruff check livekit-agents/livekit/agents/voice/agent_activity.py livekit-agents/livekit/agents/llm/mcp.py tests/test_mcp.py
  • make check
  • Earlier branch run: uv run --extra mcp pytest tests/test_tools.py
    • 37 passed
  • Earlier branch run: make unit-tests
    • 626 passed, 2 skipped; tests/test_room.py setup failed locally because this machine does not have the livekit-server binary on PATH.

New coverage in tests/test_mcp.py includes:

  • notification during initial list_tools() does not get missed
  • notification during base MCPServer.list_tools() keeps the cache dirty
  • toolset reload retries transient failures and surfaces terminal failures
  • realtime/chat-context refresh serializes concurrent publishers
  • close waits for an in-flight tool publish before closing realtime resources
  • public no-op update_tools() still records an audit AgentConfigUpdate
  • post-close update_tools() is a no-op
  • subscriptions are removed on close
  • Exception-typed MCP messages preserve cancellation behavior

End-to-end smoke against a real subprocess MCP server

Wrote a minimal MCP stdio server (using mcp.server.lowlevel.Server + ServerSession.send_tool_list_changed) that mutates its tool list 1s after start, then drove MCPServerStdio + MCPToolset against it in three modes (_RELOAD_RETRY_DELAYS patched to 0/0.1/0.1 for speed):

=== scenario: happy ===
  initial: ['echo']
  after_change_observed: True
  final: ['ping', 'shout']
  reload_failures: []

=== scenario: retry ===
  initial: ['echo']
  after_change_observed: True
  final: ['ping', 'shout']
  reload_failures: []

=== scenario: terminal ===
  initial: ['echo']
  after_change_observed: True
  final: ['echo']
  reload_failures: ['McpError']

Notes

  • tests/test_mcp.py is added to the unit-tests make target so it runs in CI.
  • Reload uses bounded retry (4 attempts, 1s/2s/4s backoff). On exhaustion the previous tool set is preserved and the failure surfaces via AgentSession's error event.
  • _refresh_lock is separate from self._lock because _close_session holds self._lock while cancelling refresh tasks; using one lock for both would deadlock.
  • The reload-failure ErrorEvent is tagged with source=<MCPServer> (not the agent) so listeners with multiple MCP servers can identify which one went stale.
  • The error event is one-shot per terminal failure; there is no paired "recovered" event when the next reload succeeds. Operators who need recovery confirmation should poll the agent's tool list. Tracked as a follow-up if it becomes painful in practice.
  • _publish_tools_change re-checks _closed after acquiring _refresh_lock so a publish queued behind the lock cannot wake into a closing realtime session. _request_tools_reload bails if the toolset has already unsubscribed, closing a snapshot-iteration race in MCPServer._handle_tool_list_changed.
  • anyio.lowlevel.checkpoint() runs at the top of _handle_message for every message (not just Exception-typed), matching the MCP SDK default handler so notification bursts don't starve the receive loop of cancellation points.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

shizhigu and others added 2 commits April 29, 2026 19:58
Robustness pass on the tool list_changed handling added in 93315ac:

- MCPToolset reload now retries with exponential backoff (1s/2s/4s, 4
  attempts total) before giving up, instead of failing on the first
  transient error and stranding the agent on a stale tool list.
- On terminal failure, MCPServer fires a new _reload_failed callback that
  AgentActivity bridges to AgentSession's "error" event so operators can
  react. Previous behavior only logged.
- _handle_message now passes Exception-typed messages through
  anyio.lowlevel.checkpoint() (matching the MCP SDK default handler) so
  cooperative cancellation propagates through bursts of notifications.
- Extract _publish_tools_change helper shared between the public
  update_tools API and the background refresh task. Eliminates ~80%
  duplication and standardizes the fail-safe ordering (rt_session first,
  chat_ctx insert last).
- Add a dedicated _refresh_lock so concurrent calls to update_tools and
  the toolset refresh task serialize their writes to _chat_ctx and
  rt_session.update_tools. Cannot reuse the main _lock because it is held
  across long operations like _close_session and would deadlock the
  cancel_and_wait of the refresh task.
- Drop the per-toolset closure in _sync_toolset_change_subscriptions; the
  bound method self._request_toolset_refresh has the right signature.

Tests: 5 new cases in tests/test_mcp.py cover retry success, retry
exhaustion + callback fire, AgentSession error emission, refresh-lock
serialization, and the cancellation pass-through. Existing test for
"keeps existing tools when reload fails" updated for the new 4-attempt
retry budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shizhigu shizhigu force-pushed the codex/mcp-tool-list-changed branch from 10e0236 to 233c214 Compare April 30, 2026 04:07
Second-pass review hardening on top of e4024b1:

- ErrorEvent emitted for a terminal MCP reload now carries
  source=<failing MCPServer>, not source=self._agent. Operators with
  multiple MCP servers per agent can now identify which server went
  stale. Bridge uses functools.partial to bind the server identity at
  registration time, so MCPServer's callback signature is unchanged.
- _publish_tools_change re-checks _closed after acquiring _refresh_lock.
  Without this, a publish queued behind the lock could be awoken by a
  cancelled refresh task and write to an rt_session that _close_session
  is about to tear down.
- _handle_message hoists anyio.lowlevel.checkpoint() to fire for every
  message (matches the MCP SDK default handler). Previously only the
  Exception branch yielded, starving the receive loop of cancellation
  points during notification bursts.
- _request_tools_reload bails early if the toolset has already
  unsubscribed from the server's callback set. Closes a snapshot-
  iteration race in MCPServer._handle_tool_list_changed where a
  callback fires after aclose() removed it.

Tests: 4 new cases (source-is-server, skip-emit-after-close, skip-
publish-after-close, checkpoint-on-notification, request-reload-
after-unsubscribe) plus a snapshot-equality assertion on the existing
lock-serialization test. 23/23 in tests/test_mcp.py pass; no
regressions in tests/test_tools.py (37) or tests/test_agent_session.py
(20).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for MCP tool list changed notification

1 participant