fix(mcp): refresh tools on list_changed notification by shizhigu · Pull Request #5604 · livekit/agents

shizhigu · 2026-04-30T00:23:33Z

Why

MCP servers can advertise new or removed tools at runtime via notifications/tools/list_changed. Today MCPServer ignores that notification, so an agent keeps using the stale tool list for the rest of the session — calls to removed tools fail and newly added tools are invisible until restart.

What changed

MCPServer listens for notifications/tools/list_changed, invalidates its cached list_tools() result, and fans out to subscribed toolsets.
MCPServer.list_tools() keeps the cache dirty if a notification arrives while a fetch is in flight, so direct callers refetch instead of treating an older snapshot as clean.
MCPToolset reloads tools in a background task: coalesces duplicate notifications, retries up to 4 attempts with 1s/2s/4s exponential backoff, preserves user filter_tools predicates across reloads, and unregisters on close.
On terminal reload failure, the toolset fires a callback that AgentActivity bridges to AgentSession.emit("error", ...), so operators can detect that the agent's tool view has gone stale.
AgentActivity subscribes to Toolset changes, refreshes the realtime session and chat context under _refresh_lock, and inserts an AgentConfigUpdate describing the diff. The public update_tools() path still records a no-op AgentConfigUpdate when the requested tool set is unchanged.
_close_session cancels refresh tasks and takes _refresh_lock before closing realtime resources, so an in-flight tool publish cannot race with realtime close.
The publish path is extracted into _publish_tools_change, used by both update_tools() and background refresh.
_handle_message passes Exception-typed messages through anyio.lowlevel.checkpoint(), matching the MCP SDK default handler so cooperative cancellation propagates.

No public API changes; new callbacks are internal plumbing.

Test plan

Automated

uv run --extra mcp pytest tests/test_mcp.py
- 19 passed
uv run ruff check livekit-agents/livekit/agents/voice/agent_activity.py livekit-agents/livekit/agents/llm/mcp.py tests/test_mcp.py
make check
Earlier branch run: uv run --extra mcp pytest tests/test_tools.py
- 37 passed
Earlier branch run: make unit-tests
- 626 passed, 2 skipped; tests/test_room.py setup failed locally because this machine does not have the livekit-server binary on PATH.

New coverage in tests/test_mcp.py includes:

notification during initial list_tools() does not get missed
notification during base MCPServer.list_tools() keeps the cache dirty
toolset reload retries transient failures and surfaces terminal failures
realtime/chat-context refresh serializes concurrent publishers
close waits for an in-flight tool publish before closing realtime resources
public no-op update_tools() still records an audit AgentConfigUpdate
post-close update_tools() is a no-op
subscriptions are removed on close
Exception-typed MCP messages preserve cancellation behavior

End-to-end smoke against a real subprocess MCP server

Wrote a minimal MCP stdio server (using mcp.server.lowlevel.Server + ServerSession.send_tool_list_changed) that mutates its tool list 1s after start, then drove MCPServerStdio + MCPToolset against it in three modes (_RELOAD_RETRY_DELAYS patched to 0/0.1/0.1 for speed):

=== scenario: happy ===
  initial: ['echo']
  after_change_observed: True
  final: ['ping', 'shout']
  reload_failures: []

=== scenario: retry ===
  initial: ['echo']
  after_change_observed: True
  final: ['ping', 'shout']
  reload_failures: []

=== scenario: terminal ===
  initial: ['echo']
  after_change_observed: True
  final: ['echo']
  reload_failures: ['McpError']

Notes

tests/test_mcp.py is added to the unit-tests make target so it runs in CI.
Reload uses bounded retry (4 attempts, 1s/2s/4s backoff). On exhaustion the previous tool set is preserved and the failure surfaces via AgentSession's error event.
_refresh_lock is separate from self._lock because _close_session holds self._lock while cancelling refresh tasks; using one lock for both would deadlock.
The reload-failure ErrorEvent is tagged with source=<MCPServer> (not the agent) so listeners with multiple MCP servers can identify which one went stale.
The error event is one-shot per terminal failure; there is no paired "recovered" event when the next reload succeeds. Operators who need recovery confirmation should poll the agent's tool list. Tracked as a follow-up if it becomes painful in practice.
_publish_tools_change re-checks _closed after acquiring _refresh_lock so a publish queued behind the lock cannot wake into a closing realtime session. _request_tools_reload bails if the toolset has already unsubscribed, closing a snapshot-iteration race in MCPServer._handle_tool_list_changed.
anyio.lowlevel.checkpoint() runs at the top of _handle_message for every message (not just Exception-typed), matching the MCP SDK default handler so notification bursts don't starve the receive loop of cancellation points.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Robustness pass on the tool list_changed handling added in 93315ac: - MCPToolset reload now retries with exponential backoff (1s/2s/4s, 4 attempts total) before giving up, instead of failing on the first transient error and stranding the agent on a stale tool list. - On terminal failure, MCPServer fires a new _reload_failed callback that AgentActivity bridges to AgentSession's "error" event so operators can react. Previous behavior only logged. - _handle_message now passes Exception-typed messages through anyio.lowlevel.checkpoint() (matching the MCP SDK default handler) so cooperative cancellation propagates through bursts of notifications. - Extract _publish_tools_change helper shared between the public update_tools API and the background refresh task. Eliminates ~80% duplication and standardizes the fail-safe ordering (rt_session first, chat_ctx insert last). - Add a dedicated _refresh_lock so concurrent calls to update_tools and the toolset refresh task serialize their writes to _chat_ctx and rt_session.update_tools. Cannot reuse the main _lock because it is held across long operations like _close_session and would deadlock the cancel_and_wait of the refresh task. - Drop the per-toolset closure in _sync_toolset_change_subscriptions; the bound method self._request_toolset_refresh has the right signature. Tests: 5 new cases in tests/test_mcp.py cover retry success, retry exhaustion + callback fire, AgentSession error emission, refresh-lock serialization, and the cancellation pass-through. Existing test for "keeps existing tools when reload fails" updated for the new 4-attempt retry budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Second-pass review hardening on top of e4024b1: - ErrorEvent emitted for a terminal MCP reload now carries source=<failing MCPServer>, not source=self._agent. Operators with multiple MCP servers per agent can now identify which server went stale. Bridge uses functools.partial to bind the server identity at registration time, so MCPServer's callback signature is unchanged. - _publish_tools_change re-checks _closed after acquiring _refresh_lock. Without this, a publish queued behind the lock could be awoken by a cancelled refresh task and write to an rt_session that _close_session is about to tear down. - _handle_message hoists anyio.lowlevel.checkpoint() to fire for every message (matches the MCP SDK default handler). Previously only the Exception branch yielded, starving the receive loop of cancellation points during notification bursts. - _request_tools_reload bails early if the toolset has already unsubscribed from the server's callback set. Closes a snapshot- iteration race in MCPServer._handle_tool_list_changed where a callback fires after aclose() removed it. Tests: 4 new cases (source-is-server, skip-emit-after-close, skip- publish-after-close, checkpoint-on-notification, request-reload- after-unsubscribe) plus a snapshot-equality assertion on the existing lock-serialization test. 23/23 in tests/test_mcp.py pass; no regressions in tests/test_tools.py (37) or tests/test_agent_session.py (20). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(mcp): refresh tools on list_changed notification

93315ac

devin-ai-integration Bot reviewed Apr 30, 2026

View reviewed changes

shizhigu and others added 2 commits April 29, 2026 19:58

fix(mcp): close notification race gaps

233c214

shizhigu force-pushed the codex/mcp-tool-list-changed branch from 10e0236 to 233c214 Compare April 30, 2026 04:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mcp): refresh tools on list_changed notification#5604

fix(mcp): refresh tools on list_changed notification#5604
shizhigu wants to merge 4 commits intolivekit:mainfrom
shizhigu:codex/mcp-tool-list-changed

shizhigu commented Apr 30, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shizhigu commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Test plan

Automated

End-to-end smoke against a real subprocess MCP server

Notes

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shizhigu commented Apr 30, 2026 •

edited

Loading