Skip to content

FastMCP alignment: new tools, prompts, and middleware#15

Open
tony wants to merge 103 commits intomainfrom
2026-04-follow-ups
Open

FastMCP alignment: new tools, prompts, and middleware#15
tony wants to merge 103 commits intomainfrom
2026-04-follow-ups

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented Apr 13, 2026

Summary

FastMCP alignment for libtmux-mcp: new tool families, prompt recipes, middleware stack, bounded outputs, and correctness fixes.

Breaking changes

  • search_panes returns SearchPanesResult (was list[PaneContentMatch]). Matches move to .matches; new pagination fields. Migration: for m in search_panes(...).matches.
  • Minimum fastmcp>=3.2.4.

New tools

  • Discoverylist_servers.
  • Waitswait_for_text, wait_for_content_change, wait_for_channel, signal_channel. Bounded, cancellable, emit ctx.report_progress / ctx.warning.
  • Buffersload_buffer, paste_buffer, show_buffer, delete_buffer. UUID-namespaced; leaked buffers GC'd on shutdown.
  • Hooks (read-only) — show_hook, show_hooks.
  • Panes / windowssnapshot_pane, pipe_pane, display_message, paste_text, select_pane, swap_pane, select_window, move_window, enter_copy_mode, exit_copy_mode.

New prompts

Four recipes: run_and_wait, diagnose_failing_pane, build_dev_workspace, interrupt_gracefully. Expose as tools with LIBTMUX_MCP_PROMPTS_AS_TOOLS=1.

Middleware

TimingMiddleware, ErrorHandlingMiddleware, AuditMiddleware, SafetyMiddleware, ReadonlyRetryMiddleware, TailPreservingResponseLimitingMiddleware.

Bounded outputs

capture_pane, snapshot_pane, show_buffer take max_lines (default 500) with tail-preserving truncation. Pass max_lines=None to opt out.

Fixes

  • search_panes — neutralize tmux format-string injection.
  • macOS TMUX_TMPDIR self-kill guard — resolve socket via display-message before env fallback.
  • build_dev_workspace prompt — real parameter names, drop post-launch prompt waits, OS-neutral log_command.

Test plan

  • uv run ruff check . && uv run ruff format --check .
  • uv run mypy
  • uv run py.test --reruns 0 — 276 tests pass
  • just build-docs
  • Manual: start with tmux renamed on PATH → clean RuntimeError from lifespan probe
  • Manual: capture_pane on a >50 KB scrollback pane with max_lines=None → head trimmed, tail preserved
  • Manual: search_panes pagination via offset/limit
  • Manual: wait_for_channel + signal_channel round-trip
  • Manual: LIBTMUX_MCP_PROMPTS_AS_TOOLS=1 → prompts in tool list

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 88.77551% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.18%. Comparing base (88fcf6a) to head (3f8f281).

Files with missing lines Patch % Lines
src/libtmux_mcp/_utils.py 84.34% 11 Missing and 7 partials ⚠️
src/libtmux_mcp/tools/server_tools.py 75.38% 15 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/hook_tools.py 81.35% 8 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/wait_for_tools.py 72.50% 11 Missing ⚠️
src/libtmux_mcp/tools/buffer_tools.py 89.01% 8 Missing and 2 partials ⚠️
src/libtmux_mcp/tools/pane_tools/io.py 86.20% 7 Missing and 1 partial ⚠️
src/libtmux_mcp/tools/pane_tools/layout.py 85.71% 4 Missing and 4 partials ⚠️
src/libtmux_mcp/middleware.py 90.47% 3 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/pane_tools/lifecycle.py 86.36% 2 Missing and 1 partial ⚠️
src/libtmux_mcp/server.py 92.59% 2 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   82.90%   86.18%   +3.27%     
==========================================
  Files          17       30      +13     
  Lines         983     1513     +530     
  Branches      110      182      +72     
==========================================
+ Hits          815     1304     +489     
- Misses        122      155      +33     
- Partials       46       54       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony tony force-pushed the 2026-04-follow-ups branch from 76bec04 to f9d5388 Compare April 14, 2026 00:00
@tony tony marked this pull request as ready for review April 14, 2026 00:20
@tony
Copy link
Copy Markdown
Member Author

tony commented Apr 14, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony changed the title Follow-ups from code review: caller identity, observability, pane split, docs Follow-ups from code review + FastMCP alignment (Phase 2) Apr 14, 2026
tony added a commit that referenced this pull request Apr 15, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
tony added a commit that referenced this pull request Apr 15, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
tony added a commit that referenced this pull request Apr 15, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
     result: ``"too many arguments"`` (correctly — that's how tmux
     reports an unset hook), but also ``"unknown hook"`` and
     ``"invalid option"``. The latter two fire for *typos* and
     *wrong-scope* mistakes, not "hook is unset". Agents sending
     ``show_hook("pane-esited")`` (typo) or
     ``show_hook("pane-exited", scope="server")`` (wrong scope) got
     back an empty list indistinguishable from a correctly-named but
     unset hook — masking the real input error.

     Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
     in the Loom review of PR #15. This commit also lands the upstream
     TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).

what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
  src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
  ``"too many arguments"`` — the single substring all tmux builds
  supported by this project use for an unset hook. Any other
  OptionError (``"unknown hook"``, ``"invalid option"``, new future
  substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
  maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
  explaining why the scope-nulling workaround exists. The fix
  belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
  * Remove ``test_show_hook_missing_returns_empty`` (its use of
    ``"after-nonexistent-hook-cxyz"`` was actually exercising the
    now-removed broad catch).
  * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
    and then unsets a real hook (``pane-exited``) before calling
    ``show_hook`` — exercises the narrow "too many arguments" path.
  * Add ``test_show_hook_unknown_name_raises`` asserting that a
    bogus hook name now surfaces as ``ToolError`` matching
    ``r"invalid option|unknown hook"`` (regression guard against
    re-broadening the swallow).
tony added a commit that referenced this pull request Apr 15, 2026
…tmcp

why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
     ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
     get rendered in the generated schema docs. After PR #15 added
     seven new models, the tuple still listed only the original ten —
     so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
     ``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
     ``BufferRef``, ``BufferContent``) were invisible in the built
     API docs, even though they're part of every tool surface that
     was documented.

     Flagged as Important by Gemini (Skeptic) in the Loom review.

what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
  in docs/conf.py. Keeps the existing order for the old entries;
  new entries are inserted at semantically adjacent positions
  (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
  ``ContentChangeResult`` next to ``WaitForTextResult``,
  ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
  ``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
  reference/api/models/index.html.
tony added a commit that referenced this pull request Apr 15, 2026
…ult shape

why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult``
     model (matches + pagination fields) but the docs/tools/panes.md
     example still showed the pre-wrapper bare-array response shape.
     Three reviewers converged on this (Claude + Gemini + GPT in the
     Loom review, 3-way consensus).

what:
- Rewrite the response block in docs/tools/panes.md from:
      [ {pane_id: ..., matched_lines: [...]} ]
  to the actual wrapper:
      { matches: [...], truncated, truncated_panes,
        total_panes_matched, offset, limit }
- Add a one-paragraph pagination hint above the example explaining
  how to iterate larger result sets via ``offset += len(matches)``
  until ``truncated == false`` and ``truncated_panes == []``.
tony added a commit that referenced this pull request Apr 15, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
     the pure helpers introduced by PR #15 have them
     (``_validate_channel_name``, ``_validate_logical_name``,
     ``_validate_buffer_name``, ``_truncate_lines_tail``,
     ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
     ``_allocate_buffer_name`` as the lone remaining pure helper
     without a doctest — an oversight when commit 272396d introduced
     the module.

     Loom review Suggestion #16 (finalized).

what:
- Expand the docstring on
  src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
  a NumPy-style ``Examples`` block covering:
  * prefix contract (``"libtmux_mcp_"``),
  * logical-name suffix preservation,
  * 32-hex-character uuid nonce length,
  * empty-string and ``None`` both collapsing to the ``"buf"``
    fallback.
- The docstring also expands the rationale for why the name has the
  exact shape it does (privacy prefix + collision-resistant nonce
  + logical suffix), so future maintainers don't reduce the
  structure and accidentally re-introduce either the OS-clipboard
  read risk or the cross-agent collision risk that commit 272396d
  fixed.
@tony tony force-pushed the 2026-04-follow-ups branch 2 times, most recently from 65971d8 to c714044 Compare April 16, 2026 15:34
tony added a commit that referenced this pull request Apr 16, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
tony added a commit that referenced this pull request Apr 16, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
@tony tony force-pushed the 2026-04-follow-ups branch from 94c4044 to bc4c2ed Compare April 16, 2026 22:01
tony added a commit that referenced this pull request Apr 16, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
     result: ``"too many arguments"`` (correctly — that's how tmux
     reports an unset hook), but also ``"unknown hook"`` and
     ``"invalid option"``. The latter two fire for *typos* and
     *wrong-scope* mistakes, not "hook is unset". Agents sending
     ``show_hook("pane-esited")`` (typo) or
     ``show_hook("pane-exited", scope="server")`` (wrong scope) got
     back an empty list indistinguishable from a correctly-named but
     unset hook — masking the real input error.

     Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
     in the Loom review of PR #15. This commit also lands the upstream
     TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).

what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
  src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
  ``"too many arguments"`` — the single substring all tmux builds
  supported by this project use for an unset hook. Any other
  OptionError (``"unknown hook"``, ``"invalid option"``, new future
  substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
  maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
  explaining why the scope-nulling workaround exists. The fix
  belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
  * Remove ``test_show_hook_missing_returns_empty`` (its use of
    ``"after-nonexistent-hook-cxyz"`` was actually exercising the
    now-removed broad catch).
  * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
    and then unsets a real hook (``pane-exited``) before calling
    ``show_hook`` — exercises the narrow "too many arguments" path.
  * Add ``test_show_hook_unknown_name_raises`` asserting that a
    bogus hook name now surfaces as ``ToolError`` matching
    ``r"invalid option|unknown hook"`` (regression guard against
    re-broadening the swallow).
tony added a commit that referenced this pull request Apr 16, 2026
…tmcp

why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
     ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
     get rendered in the generated schema docs. After PR #15 added
     seven new models, the tuple still listed only the original ten —
     so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
     ``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
     ``BufferRef``, ``BufferContent``) were invisible in the built
     API docs, even though they're part of every tool surface that
     was documented.

     Flagged as Important by Gemini (Skeptic) in the Loom review.

what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
  in docs/conf.py. Keeps the existing order for the old entries;
  new entries are inserted at semantically adjacent positions
  (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
  ``ContentChangeResult`` next to ``WaitForTextResult``,
  ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
  ``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
  reference/api/models/index.html.
tony added a commit that referenced this pull request Apr 16, 2026
…ult shape

why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult``
     model (matches + pagination fields) but the docs/tools/panes.md
     example still showed the pre-wrapper bare-array response shape.
     Three reviewers converged on this (Claude + Gemini + GPT in the
     Loom review, 3-way consensus).

what:
- Rewrite the response block in docs/tools/panes.md from:
      [ {pane_id: ..., matched_lines: [...]} ]
  to the actual wrapper:
      { matches: [...], truncated, truncated_panes,
        total_panes_matched, offset, limit }
- Add a one-paragraph pagination hint above the example explaining
  how to iterate larger result sets via ``offset += len(matches)``
  until ``truncated == false`` and ``truncated_panes == []``.
tony added a commit that referenced this pull request Apr 16, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
     the pure helpers introduced by PR #15 have them
     (``_validate_channel_name``, ``_validate_logical_name``,
     ``_validate_buffer_name``, ``_truncate_lines_tail``,
     ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
     ``_allocate_buffer_name`` as the lone remaining pure helper
     without a doctest — an oversight when commit 272396d introduced
     the module.

     Loom review Suggestion #16 (finalized).

what:
- Expand the docstring on
  src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
  a NumPy-style ``Examples`` block covering:
  * prefix contract (``"libtmux_mcp_"``),
  * logical-name suffix preservation,
  * 32-hex-character uuid nonce length,
  * empty-string and ``None`` both collapsing to the ``"buf"``
    fallback.
- The docstring also expands the rationale for why the name has the
  exact shape it does (privacy prefix + collision-resistant nonce
  + logical suffix), so future maintainers don't reduce the
  structure and accidentally re-introduce either the OS-clipboard
  read risk or the cross-agent collision risk that commit 272396d
  fixed.
tony added a commit that referenced this pull request Apr 17, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
@tony tony force-pushed the 2026-04-follow-ups branch from bc4c2ed to 8c76b87 Compare April 17, 2026 00:07
tony added a commit that referenced this pull request Apr 17, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
tony added a commit that referenced this pull request Apr 17, 2026
why: ``show_hook`` swallowed three tmux error substrings into an empty
     result: ``"too many arguments"`` (correctly — that's how tmux
     reports an unset hook), but also ``"unknown hook"`` and
     ``"invalid option"``. The latter two fire for *typos* and
     *wrong-scope* mistakes, not "hook is unset". Agents sending
     ``show_hook("pane-esited")`` (typo) or
     ``show_hook("pane-exited", scope="server")`` (wrong scope) got
     back an empty list indistinguishable from a correctly-named but
     unset hook — masking the real input error.

     Flagged as Important by both Gemini (Skeptic) and GPT (Builder)
     in the Loom review of PR #15. This commit also lands the upstream
     TODO comment for Suggestion #15 (libtmux's scope-kwarg argv bug).

what:
- Narrow the ``except libtmux_exc.OptionError`` branch in
  src/libtmux_mcp/tools/hook_tools.py::show_hook to only match
  ``"too many arguments"`` — the single substring all tmux builds
  supported by this project use for an unset hook. Any other
  OptionError (``"unknown hook"``, ``"invalid option"``, new future
  substrings) re-raises so ``handle_tool_errors`` surfaces it.
- Replace the block's comment with the reasoning so a future
  maintainer knows not to re-broaden the catch.
- Add a TODO(libtmux upstream) comment in ``_resolve_hook_target``
  explaining why the scope-nulling workaround exists. The fix
  belongs in libtmux's argv-assembly path, not here.
- tests/test_hook_tools.py:
  * Remove ``test_show_hook_missing_returns_empty`` (its use of
    ``"after-nonexistent-hook-cxyz"`` was actually exercising the
    now-removed broad catch).
  * Add ``test_show_hook_unset_known_hook_returns_empty`` that sets
    and then unsets a real hook (``pane-exited``) before calling
    ``show_hook`` — exercises the narrow "too many arguments" path.
  * Add ``test_show_hook_unknown_name_raises`` asserting that a
    bogus hook name now surfaces as ``ToolError`` matching
    ``r"invalid option|unknown hook"`` (regression guard against
    re-broadening the swallow).
tony added a commit that referenced this pull request Apr 17, 2026
…tmcp

why: ``fastmcp_model_classes`` in docs/conf.py is the allow-list that
     ``sphinx-autodoc-fastmcp`` uses to decide which Pydantic models
     get rendered in the generated schema docs. After PR #15 added
     seven new models, the tuple still listed only the original ten —
     so the new schemas (``SearchPanesResult``, ``PaneSnapshot``,
     ``ContentChangeResult``, ``HookEntry``, ``HookListResult``,
     ``BufferRef``, ``BufferContent``) were invisible in the built
     API docs, even though they're part of every tool surface that
     was documented.

     Flagged as Important by Gemini (Skeptic) in the Loom review.

what:
- Add the seven new model names to ``conf["fastmcp_model_classes"]``
  in docs/conf.py. Keeps the existing order for the old entries;
  new entries are inserted at semantically adjacent positions
  (e.g. ``SearchPanesResult`` next to ``PaneContentMatch``,
  ``ContentChangeResult`` next to ``WaitForTextResult``,
  ``HookEntry`` / ``HookListResult`` grouped, ``BufferRef`` /
  ``BufferContent`` grouped).
- Verified post-build: all seven names appear in the generated
  reference/api/models/index.html.
tony added a commit that referenced this pull request Apr 17, 2026
…ult shape

why: PR #15 wrapped ``search_panes`` output in a ``SearchPanesResult``
     model (matches + pagination fields) but the docs/tools/panes.md
     example still showed the pre-wrapper bare-array response shape.
     Three reviewers converged on this (Claude + Gemini + GPT in the
     Loom review, 3-way consensus).

what:
- Rewrite the response block in docs/tools/panes.md from:
      [ {pane_id: ..., matched_lines: [...]} ]
  to the actual wrapper:
      { matches: [...], truncated, truncated_panes,
        total_panes_matched, offset, limit }
- Add a one-paragraph pagination hint above the example explaining
  how to iterate larger result sets via ``offset += len(matches)``
  until ``truncated == false`` and ``truncated_panes == []``.
tony added a commit that referenced this pull request Apr 17, 2026
why: AGENTS.md §255-287 requires doctests on pure helpers. Most of
     the pure helpers introduced by PR #15 have them
     (``_validate_channel_name``, ``_validate_logical_name``,
     ``_validate_buffer_name``, ``_truncate_lines_tail``,
     ``_tmux_argv``, ``_pane_id_sort_key``). The critic pass surfaced
     ``_allocate_buffer_name`` as the lone remaining pure helper
     without a doctest — an oversight when commit 272396d introduced
     the module.

     Loom review Suggestion #16 (finalized).

what:
- Expand the docstring on
  src/libtmux_mcp/tools/buffer_tools.py::_allocate_buffer_name with
  a NumPy-style ``Examples`` block covering:
  * prefix contract (``"libtmux_mcp_"``),
  * logical-name suffix preservation,
  * 32-hex-character uuid nonce length,
  * empty-string and ``None`` both collapsing to the ``"buf"``
    fallback.
- The docstring also expands the rationale for why the name has the
  exact shape it does (privacy prefix + collision-resistant nonce
  + logical suffix), so future maintainers don't reduce the
  structure and accidentally re-introduce either the OS-clipboard
  read risk or the cross-agent collision risk that commit 272396d
  fixed.
@tony tony changed the title Follow-ups from code review + FastMCP alignment (Phase 2) FastMCP alignment: new tools, prompts, and middleware Apr 17, 2026
tony added a commit that referenced this pull request Apr 17, 2026
…thread around capture_pane

why: In commit 67c3fc8 the wait tools became `async def` so that
     ctx.report_progress could be awaited during long polls. FastMCP
     direct-awaits async tools on the main event loop
     (`fastmcp/tools/function_tool.py:277-278`) — it only uses a
     threadpool for *sync* tools via `call_sync_fn_in_threadpool`.
     That means every blocking `pane.capture_pane()` subprocess.run
     inside the poll loop starves every other concurrent MCP request
     for the duration of the tmux roundtrip (~10s of ms per tick), for
     the full configured timeout.

     A 60-second `wait_for_text` call would pin the entire server for
     60 seconds, delaying list_sessions / capture_pane / anything else.
     Loom code review surfaced this with 100% confidence (Gemini
     Skeptic pass on PR #15); verified against FastMCP source.

what:
- Wrap the three blocking `pane.capture_pane(...)` calls in
  src/libtmux_mcp/tools/pane_tools/wait.py with `asyncio.to_thread`:
  * `wait_for_text` polling call.
  * `wait_for_content_change` initial snapshot.
  * `wait_for_content_change` polling call.
- Inline comment at the first site explains why (future maintainers
  will otherwise see "sync call inside async fn" as a bug and try to
  "fix" it by removing `async def`, reintroducing the blocking path).
- No docstring changes — the `async def` signature already promises
  non-blocking semantics; this commit actually delivers on it.
- Add `test_wait_tools_do_not_block_event_loop` in
  tests/test_pane_tools.py that runs `wait_for_text` against a
  never-matching pattern inside an asyncio.gather with a ticker
  coroutine. Asserts the ticker counter >= 5 over the 300 ms wait
  window — a blocking capture would leave it at 0 or 1. Deliberately
  a generous lower bound to stay robust on slow CI.
tony added a commit that referenced this pull request Apr 17, 2026
…_wait

why: The ``run_and_wait`` prompt hardcoded ``mcp_done`` as the tmux
     wait-for channel name. tmux channels are server-global, so two
     concurrent agents (or parallel prompt renderings from one agent)
     would race: one agent's ``tmux wait-for -S mcp_done`` would
     unblock another's pending ``wait_for_channel("mcp_done")``,
     producing a false positive completion signal.

     Flagged as Critical by GPT Builder in the Loom review of PR #15.

what:
- Import ``uuid`` in src/libtmux_mcp/prompts/recipes.py.
- Generate a fresh channel name at each call: ``libtmux_mcp_wait_<uuid4hex[:8]>``.
  * Prefix aligns with the buffer-tools namespace (libtmux_mcp_) so a
    future ``list_buffers(prefix="libtmux_mcp_")``-style operator sees
    every MCP-owned tmux artifact uniformly.
  * 8 hex characters (~32 bits) is safe against collision inside a
    single tmux server's concurrent-agent population.
- Interpolate the same channel name into both the send_keys payload
  and the wait_for_channel call so one prompt rendering remains
  internally consistent.
- Update the prompt docstring to note the per-invocation scope.
- tests/test_prompts.py:
  * Update ``test_run_and_wait_returns_string_template`` to pin the
    new prefix.
  * Add ``test_run_and_wait_channel_is_uuid_scoped`` asserting
    (a) channel matches ``libtmux_mcp_wait_[0-9a-f]{8}``,
    (b) send_keys and wait_for_channel use the same name within one
        rendering,
    (c) two separate renderings emit different channel names.
tony added 28 commits April 18, 2026 08:43
…mpt waits

Regression test asserts the rendered recipe:

* does NOT contain the stale "wait for the prompt … between each
  step" guidance that would deadlock agents.
* DOES reference ``wait_for_content_change`` for post-launch UI
  confirmation.
* still references ``wait_for_text`` (for the pre-launch shell-
  readiness check, where prompt matching is meaningful).

Complements the existing prompt-alignment test, which only covers
parameter names, by validating runtime guidance semantics.
The docstring claimed ``list_servers`` discovers "every live tmux
server on this machine", but the implementation only scans
``${TMUX_TMPDIR}/tmux-<uid>/``. Custom ``tmux -S /arbitrary/path``
daemons (systemd user units, CI runners, per-project ``.envrc``
paths) live outside that directory and were invisible — the
docstring overclaimed.

Two changes:

* Correct the docstring to say "under the current user's
  ``$TMUX_TMPDIR``" and spell out the custom-path caveat so agents
  do not assume canonical-only sockets exist.
* Add an optional ``extra_socket_paths: list[str] | None`` parameter
  so callers who know about arbitrary paths can supplement the scan.
  Each path runs through the same ``_is_tmux_socket_live`` probe and
  socket-metadata query as canonical entries. Nonexistent or
  non-socket paths are silently skipped so stale user-supplied lists
  don't crash the call.

Factors the server-info construction into a private
``_probe_server_by_path`` helper that mirrors ``get_server_info`` but
keys the Server by socket_path (-S) instead of socket_name (-L), so
the extras path can use an arbitrary filesystem location.
…afety

Two regression guards for the new ``extra_socket_paths`` surface:

* Supplying the fixture socket as an extra path makes it appear
  in the result even when the canonical ``$TMUX_TMPDIR`` scan is
  pointed at an empty directory — proves the extras code path is
  exercised in isolation from the canonical scan.
* A list containing a nonexistent path and a regular file (not a
  socket) returns an empty result without raising — proves stale
  user-supplied entries degrade gracefully.
The L1 fix narrowed the macOS ``$TMUX_TMPDIR`` gap by preferring
``tmux display-message -p '#{socket_path}'`` over env-reconstruction,
but ``server.cmd`` invokes tmux with the MCP process's own
environment. When the MCP process's ``$TMUX_TMPDIR`` diverges from
the real tmux server's (macOS launchd case), the query fails:

* MCP process env: ``TMUX_TMPDIR=/wrong/path``
* Real tmux lives at ``/correct/path/tmux-1000/mysocket``
* Caller's ``$TMUX=/correct/path/tmux-1000/mysocket,…``
* Query invokes ``tmux -L mysocket display-message``; tmux uses
  ``/wrong/path`` and can't find the server; query raises.
* Reconstruction fills in ``/wrong/path/tmux-1000/mysocket``.
* ``_caller_is_on_server`` realpath compares bogus vs. real path:
  different. Guard degrades to no-op.

Adds a conservative basename fallback after the realpath check: if
the basename of the caller's socket path equals the target's declared
``socket_name`` (or ``"default"``), the guard still fires. Both names
are authoritative on their respective sides and neither is subject to
``$TMUX_TMPDIR`` divergence, so this is a safety-preserving
last-chance match.

Trade-off acknowledged in the docstring: exotic false positive when
two daemons share a socket_name under different tmpdirs. Prefer that
over silently permitting self-kill under env mismatch.
…gence

Regression guard for the macOS launchd scenario: MCP process has a
wrong ``$TMUX_TMPDIR``, display-message query fails, reconstruction
produces a path that mismatches the caller's ``$TMUX`` realpath, but
the basename (socket name) still matches. The guard must still fire.

The test forces:

* ``$TMUX_TMPDIR`` on the MCP process points at a nonexistent dir.
* Monkeypatches ``server.cmd`` so the display-message query raises.
* Sets ``$TMUX`` to a path in a "real" tmpdir whose basename is the
  target's socket_name.

Asserts ``_caller_is_on_server`` returns True — proving the
basename fallback closes the env-mismatch gap without requiring the
realpath comparison to succeed.
…r from build_dev_workspace

Two residual ergonomic problems in the recipe that F1's rewrite did
not catch, both confirmed by reading the underlying source:

**Fancy-shell hang.** Step 4 emitted a literal
``wait_for_text(pattern=r"\$ |\# |\% ", regex=True, timeout=5.0)``.
The regex only matches default bash/zsh prompts; zsh+oh-my-zsh
(``➜``), starship (``❯``), pure / p10k (Unicode glyphs), and fish
(``>``) all miss. Agents on these setups burn 5s × 3 panes = 15s
of ceremony every time the recipe runs.

Reading tmux source confirms the wait was unnecessary defensive
ceremony: ``cmd-split-window.c:151`` → ``spawn.c:382,494`` allocate
the PTY and wire the bufferevent before split-window returns.
``input-keys.c:418`` writes send-keys bytes straight into the
kernel PTY buffer via ``bufferevent_write`` with no readiness
check. The kernel holds bytes until the slave reads. A
``send_keys("vim")`` right after ``split_window`` is safe whether
or not the shell has finished drawing its prompt — the shell will
pick up the queued bytes on its first stdin read.

**Stray Enter in pane B.** Step 5 emitted
``send_keys(pane_id="%B", keys="")`` with a comment
"(leave the shell idle)". libtmux's ``Pane.send_keys``
(``pane.py:423-475``) runs ``send-keys ""`` *then* ``self.enter()``
which is ``send-keys Enter``. libtmux's own test
(``tests/test_pane.py:128``) confirms an Enter is sent. The shell
was already idle after the split; the line did nothing useful
while actively misleading anyone reading it.

Net change: ~15 LOC removed, ~3 LOC added. Recipe body is shorter,
shell-agnostic, and honest about what it does. The step-6
``wait_for_content_change`` block is kept (renumbered to step 5) —
it is still the right primitive for post-launch UI confirmation.

Also updates the existing
``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers``
test to drop the now-obsolete ``wait_for_text`` assertion; a
targeted regression guard for the two removed lines follows in a
separate test commit.
…regex or stray Enter

Targeted regression guard for the two residual ergonomic issues
fixed in the preceding commit:

* ``\$ |\# |\% `` regex must be absent — hangs starship / oh-my-zsh
  / pure / p10k / fish users for 5s × 3 panes = 15s of ceremony
  with no benefit.
* ``send_keys(pane_id="%B", keys="")`` must be absent — libtmux
  emits an Enter keystroke when ``enter=True`` (the default), so
  the line lied about being a no-op.

Positive pin on ``wait_for_content_change`` so the post-launch UI
confirmation guidance stays in the recipe; it's a shell-agnostic
"did the screen change since we captured it?" check that works
across every shell, editor, and tailing program combination.

Follows the established ``in``/``not in`` assertion style from
the neighbouring
``test_build_dev_workspace_does_not_deadlock_on_screen_grabbers``
test.
``get_server_info`` swallowed any ``display-message #{version}``
failure with a bare ``except Exception: pass`` — pre-existing from
the initial release, not from this branch. The sibling helper
``_probe_server_by_path`` in the same file now logs the analogous
failure at debug level, so the two were drifting in style.

Adds a single ``logger.debug`` so operators debugging "why does my
tmux server report version=None?" get a uniform signal across both
code paths. Semantics are unchanged (version still falls back to
None on failure, callers never see the exception).
…_session

The ``del mcp_session`` pattern (12 sites across four test files)
signalled "I need this fixture's side effect but not its value."
That is the textbook use case for ``@pytest.mark.usefixtures``,
which pytest's own test suite uses 16+ times and recommends as the
canonical form — cross-checked in ~/study/python/pytest/testing/
and in upstream pytest docs. libtmux and fastmcp also follow the
decorator form; libtmux-mcp was an outlier.

Each converted test now:

* Declares the required fixture via a decorator (explicit intent
  at function definition).
* Drops the unused ``mcp_session: Session`` parameter.
* Drops the ``del mcp_session`` ceremony from the body.

Three files also drop the now-unused ``Session`` import from their
``TYPE_CHECKING`` block (``test_buffer_tools.py``,
``test_server.py``, ``test_wait_for_tools.py``).
``test_server_tools.py`` keeps the import — ``test_list_sessions``
genuinely uses ``mcp_session``.

Behaviour unchanged; this is pure readability + idiom cleanup.
…t loops

Regression guard for MCP cancellation semantics. The two wait tools
share a ``while True:`` poll-and-sleep pattern wrapped by
``handle_tool_errors_async`` (``_utils.py:827-850``), which catches
``Exception`` (not ``BaseException``). ``asyncio.CancelledError``
inherits from ``BaseException`` (Python 3.8+), so it propagates
through the decorator today.

This test locks that in: if a future change broadens the decorator
to ``BaseException`` it would silently swallow MCP cancellation
requests, agents would burn tokens waiting for cancelled tools to
return, and operators would have no signal that anything went wrong.

Uses ``task.cancel()`` rather than ``asyncio.wait_for`` so the
raised exception is the inner ``CancelledError`` directly, not
``wait_for``'s ``TimeoutError`` wrapper. ``wait_for_text`` and
``wait_for_content_change`` get sibling tests since both go through
the same decorator path.
…elled

MCP cancellation works correctly today — ``CancelledError`` is a
``BaseException`` (Python 3.8+) and ``handle_tool_errors_async``
catches ``Exception``, so cancellation propagates untouched. But
operators have no signal that a wait was cancelled vs. timed out —
the two outcomes look identical at the log layer.

Wraps each ``while True:`` poll loop in ``wait_for_text`` and
``wait_for_content_change`` with an explicit
``except asyncio.CancelledError: logger.debug(...); raise``. Three
benefits:

* Documents intent — readers see that cancellation is handled
  intentionally, not by accident of inheritance.
* Records the cancel-time elapsed seconds and pane id at debug
  level so post-mortem log analysis can distinguish "cancelled at
  150 ms" from "timed out at 8 s".
* Re-raise preserves MCP semantics — never returns a partial
  ``WaitForTextResult`` / ``ContentChangeResult`` that would mask
  the cancellation as a timed-out wait.

The per-pair regression test added in the preceding commit pins
``CancelledError`` propagation; this commit only adds diagnostic
visibility.
…t timeouts

Adopts FastMCP's structured client logging
(``ctx.info/warning/error`` at ``fastmcp/server/context.py:716-778``)
so MCP clients receive ``notifications/message`` events for the two
diagnostic moments operators currently can't see:

* Invalid regex pattern compiled in ``wait_for_text`` — previously
  the agent saw only a generic ``ToolError``; now the client also
  receives a structured warning, which surfaces in client log
  panels independent of the tool result.
* Wait loop reaches its timeout in either ``wait_for_text`` or
  ``wait_for_content_change`` — previously identical to a "found"
  result at the log layer (only the boolean ``timed_out`` field
  distinguished them); now the client gets an explicit warning
  with pane id and timeout duration.

Adds a ``_maybe_log`` helper sibling to ``_maybe_report_progress``
sharing the same ``_TRANSPORT_CLOSED_EXCEPTIONS`` suppression
contract so a hung-up client doesn't take down the tool, while
programming errors stay loud per the regression guard
``test_wait_for_text_propagates_unexpected_progress_error``.

Updates the existing ``_StubContext`` and ``_BrokenContext`` test
stubs in ``tests/test_pane_tools.py`` to declare ``warning``
methods — the stubs need to mock the full Context surface the wait
loop now touches. ``_BrokenContext.warning`` raises the same
``anyio.BrokenResourceError`` as its ``report_progress`` so the
existing transport-closed regression guard also covers the new log
path.

Per-pair regression test in the next commit asserts the warnings
fire at both sites with the right level.
… and timeout

Three regression guards for the new ``_maybe_log`` call sites added
in the preceding commit. Each builds a tiny ``_RecordingContext``
that captures ``(level, message)`` tuples — the same recording-stub
style used by the existing progress tests, kept inline per test so
each assertion stays self-contained:

* ``test_wait_for_text_warns_on_invalid_regex`` — ``regex=True`` with
  an unclosed bracket raises ``ToolError`` AND records a warning
  containing "Invalid regex".
* ``test_wait_for_text_warns_on_timeout`` — never-matching pattern
  with a 200 ms timeout records a warning containing "timeout".
* ``test_wait_for_content_change_warns_on_timeout`` — settled pane
  with a 500 ms timeout records the equivalent warning. Reuses the
  active settle-loop pattern from
  ``test_wait_for_content_change_timeout`` so the test stays
  deterministic on slow CI rather than relying on a fixed sleep.

The tests assert on substring presence rather than exact message
text so the warning copy can evolve without forcing a test update.
…AG_READONLY

Adopts FastMCP's ``RetryMiddleware``
(``fastmcp/server/middleware/error_handling.py:136``) but bounded
by the existing safety-tier annotation. The ``_server_cache``
already evicts dead Server instances on ``is_alive() == False``,
but the *first* call after a tmux daemon crash still fails before
the cache learns about it. A single retry on the freshly-cached
Server closes that window without changing tool semantics.

Critical safety constraint: only readonly tools are retried.
Mutating tools like ``send_keys``, ``create_session``,
``split_window`` are NOT idempotent — re-running them on a
transient socket error would silently double side effects, which
is unacceptable. The wrapper ``ReadonlyRetryMiddleware`` reads
``tool.tags`` via ``context.fastmcp_context.fastmcp.get_tool(name)``
(same pattern ``SafetyMiddleware`` uses) and only delegates to
the upstream retry when ``TAG_READONLY`` is present.

Defaults are deliberately small: ``max_retries=1`` keeps audit
log noise minimal (one extra attempt, not three); ``base_delay
0.1s`` / ``max_delay 1.0s`` matches the duration window of a
transient libtmux socket hiccup. ``retry_exceptions=
(libtmux.exc.LibTmuxException,)`` is narrow on purpose — the
upstream default ``(ConnectionError, TimeoutError)`` does NOT
match libtmux's wrapped subprocess errors and would be a silent
no-op.

Wired into the middleware stack between ``AuditMiddleware`` and
``SafetyMiddleware`` so retries are audited once each and
tier-denied tools never reach retry. Updates the existing
``test_server_middleware_stack_order`` to lock in the new
position.

Per-pair regression test in the next commit asserts: readonly
tool retries on ``LibTmuxException``, mutating tool does not
retry, non-libtmux exception does not retry.
…y tools only

Three regression guards for the new safety-scoped retry wrapper:

* ``test_readonly_retry_recovers_from_libtmux_exception`` — readonly
  tool raises ``LibTmuxException`` once then succeeds; assert the
  middleware retried (call count 2) and returned the success.
  Models the production scenario: a transient socket error on the
  first call after a tmux daemon restart, recovered by the retry
  on the freshly-cached Server.
* ``test_readonly_retry_skips_mutating_tool`` — mutating tool with
  the same exception; assert NO retry (call count 1) and the
  exception propagates. This is the critical safety property —
  re-running ``send_keys`` or ``create_session`` on a transient
  error would silently double the side effect.
* ``test_readonly_retry_skips_non_libtmux_exception`` — readonly
  tool raises ``ValueError``; assert NO retry. Pins the narrow
  ``retry_exceptions=(LibTmuxException,)`` default so a future
  refactor that broadens the trigger set would fail loudly.

Tests use minimal ``_StubTool`` / ``_StubFastMCP`` /
``_StubFastMCPContext`` stand-ins to satisfy the
``context.fastmcp_context.fastmcp.get_tool(name)`` lookup the
middleware uses for tag inspection, plus a ``_FlakyCallNext``
helper that raises N times before succeeding.
… fix CI race

CI failure on `8dc7c81`:
``tests/test_pane_tools.py::test_wait_for_content_change_propagates_cancellation``
``Failed: DID NOT RAISE <class 'asyncio.exceptions.CancelledError'>``
across all six tmux versions (3.2a, 3.3a, 3.4, 3.5, 3.6, master).

Root cause: ``wait_for_content_change`` exits the poll loop when
``current != initial_content``. On the GitHub Actions Ubuntu runner
the shell prompt redraws (cursor blink, zsh async hooks like
vcs_info) within the test's 100 ms ``asyncio.sleep`` window, so
``changed=True`` fires and the tool returns a normal
``ContentChangeResult`` before the ``task.cancel()`` arrives. Local
runs on an idle pane never tripped this — classic CI-only race.

Fix: monkeypatch ``Pane.capture_pane`` to always return the same
line list. The diff branch can never trigger, so the loop stays in
``await asyncio.sleep(interval)`` indefinitely and the cancel
unambiguously interrupts it. The test still exercises the production
code path that matters — that ``CancelledError`` propagates through
``handle_tool_errors_async`` without being mapped to ``ToolError``.
The shipped ``ReadonlyRetryMiddleware`` (commit ``bd37f29``) is a
production no-op under fastmcp 3.2.3:

- ``RetryMiddleware._should_retry`` only checks ``isinstance(error,
  retry_exceptions)`` — does NOT walk ``__cause__``.
- ``handle_tool_errors_async`` (``_utils.py:842-850``) wraps every
  ``LibTmuxException`` as ``ToolError(...) from LibTmuxException``.
- At the middleware layer the exception type is ``ToolError``, not
  ``LibTmuxException``, so ``_should_retry`` returns ``False`` and
  the retry never fires in production.

The shipped tests in ``tests/test_middleware.py:475+`` only pass
because ``_FlakyCallNext`` raises ``LibTmuxException`` directly,
bypassing the production decorator wrap path.

fastmcp v3.2.4 commit ``031c7e03`` (PR #3858 "Fix RetryMiddleware
not retrying tool errors") adds the ``__cause__`` walk to
``_should_retry``. Reproduced the production-flow bug under 3.2.3
with a 25-line harness (calls=1) and verified the fix on 3.2.4
(calls=2 with max_retries=1).

Free win that ships in the same bump: PR #3872 (commit
``f3c00ba1`` "Extract parameter descriptions from docstrings")
adds an automatic NumPy/Google/Sphinx docstring parser that
populates per-parameter ``inputSchema.description`` fields. The
project's tools already use NumPy ``Parameters\n----`` style
throughout (verified ``wait.py``, ``_utils.py``, ``middleware.py``),
so per-parameter descriptions become visible to LLM clients with
zero code changes.

Next commit adds an integration-style regression test that calls a
real ``@handle_tool_errors``-decorated tool through the middleware
stack — catches the production no-op AND any future regression
where someone changes the decorator's exception wrapping.
… decorator

Integration-style regression guard for the production code path the
existing unit tests miss. The three sibling tests above use
``_FlakyCallNext`` to raise ``LibTmuxException`` directly; in
production every tool is wrapped by ``handle_tool_errors`` /
``handle_tool_errors_async`` which converts ``LibTmuxException`` to
``ToolError(...) from LibTmuxException``. At the middleware layer
the exception type is ``ToolError``, not ``LibTmuxException`` — so
the retry decision must walk ``__cause__`` to see the real failure
type.

The new test invokes the real ``list_sessions`` tool through
``ReadonlyRetryMiddleware.on_call_tool``, exercising the decorator
wrap path that broke under fastmcp 3.2.3. Monkeypatches
``Server.sessions`` to raise ``LibTmuxException`` once then return
``[]`` so the test runs without a live tmux. Asserts ``calls == 2``
with a friendly error message naming the likely cause (fastmcp
older than 3.2.4) so an accidental dep downgrade fails loudly.

Catches both: (a) the production no-op the previous fastmcp pin
allowed, and (b) any future regression where someone changes the
decorator's exception wrapping in a way that defeats ``__cause__``
walking.
…tream consistency

All three round-2 weave:review reviewers flagged that
``ReadonlyRetryMiddleware``'s internal logger defaulted to
``fastmcp.retry`` (the upstream stock at ``error_handling.py:181``)
instead of the project's ``libtmux_mcp.*`` namespace. Operators
routing the audit stream by namespace prefix would silently miss
retry events.

Set the default explicitly: when ``logger_`` is ``None``, construct
``logging.getLogger("libtmux_mcp.retry")`` before delegating to
fastmcp's ``RetryMiddleware``. The ``if logger_ is None:`` pattern
avoids evaluating ``getLogger`` in the default expression.

Pinned in tests via ``test_readonly_retry_logger_uses_project_namespace``
asserting ``middleware._retry.logger.name == "libtmux_mcp.retry"``,
so a future refactor that drops the default fails loudly.
The new ``list_servers`` tool (added earlier in the branch) wasn't
documented anywhere under ``docs/`` — agents and human readers
discovering the tool from a client's tool list had no docs page to
land on. The conf.py area-map already routes ``server_tools`` →
``sessions``, so the natural home is ``docs/tools/sessions.md``
between ``get_server_info`` and ``create_session``.

Adds the standard four-part section: ``{fastmcp-tool}`` directive,
Use-when / Avoid-when / Side-effects narrative, an example JSON
request + response (including a second example with
``extra_socket_paths`` for the custom-path case), and the
``{fastmcp-tool-input}`` schema directive. Calls out the scope
caveat (custom ``tmux -S /path/...`` daemons are not in the
canonical scan) up front so readers don't reach for the tool in
the wrong situation.

Also adds a grid-item-card to ``docs/tools/index.md`` so the
discovery overview links to the new section.
The four prompts shipped this branch (``run_and_wait``,
``diagnose_failing_pane``, ``build_dev_workspace``,
``interrupt_gracefully``) had zero docs coverage — no
``{fastmcp-prompt}`` autodoc directive (the
sphinx-autodoc-fastmcp extension only ships ``{fastmcp-tool}``,
``{fastmcp-tool-input}``, ``{fastmcp-tool-summary}``; verified
by reading the installed extension's ``__init__.py``), and
``docs/topics/prompting.md`` covers general agent-prompting
heuristics rather than the project's MCP-registered prompts.

Adds ``docs/tools/prompts.md`` — a hand-written page (since
autodoc isn't available for prompts) with one section per recipe
covering: when to use it, why it exists vs. the obvious
alternative tool calls, parameters, and a sample render at one
representative argument set. The samples use stable
representative output (``<uuid>`` placeholder for the wait-for
channel) so the page doesn't churn on every regeneration.

Wires the new page into the docs toctree under "Use it" between
``tools/index`` and ``recipes`` (the latter is the longer
narrative agent-reasoning page, conceptually distinct from these
short MCP-protocol prompts). Adds a one-paragraph cross-reference
from ``docs/tools/index.md``'s decision tree so readers
discovering the tool surface find the prompts naturally.
…he new pane id

The ``create_session`` response gained an ``active_pane_id`` field
earlier in this branch (commit ``3f092bb``), but the docs still
showed the pre-change response shape — readers landing on
``sessions.md`` had no signal that this field exists or what
they'd do with it.

Updates the example response to include the field, then adds a
brief tip explaining the workflow win: callers can target
subsequent ``send_keys`` / ``split_window`` / ``capture_pane``
calls against ``active_pane_id`` directly without a follow-up
``list_panes`` round-trip. Saves one MCP call in the most common
"create a session, then act on it" workflow that prompts like
``build_dev_workspace`` already exploit.

The full field surface is auto-rendered in
``docs/reference/api/models.md`` via the ``fastmcp_model_classes``
registration — this commit just adds the narrative discoverability
that explains why the field exists.
…nx types

Streamline the 0.1.x unreleased block: lead with grouped tool families,
collapse narrative prose to scannable bullets, drop duplicate context
already carried by the Breaking-changes section.

Convert tool names to {tooliconl} roles so the /history/ page links each
tool to its reference anchor, and route LibTmuxException / CancelledError
/ RuntimeError through {exc} for intersphinx resolution into libtmux and
Python docs.
SearchPanesResult, PaneContentMatch, SessionInfo.active_pane_id, and the
four project-local middleware classes (AuditMiddleware, SafetyMiddleware,
ReadonlyRetryMiddleware, TailPreservingResponseLimitingMiddleware) now
resolve to the API reference pages instead of rendering as bare code.
TimingMiddleware and ErrorHandlingMiddleware stay bare — they come from
fastmcp and the project has no intersphinx inventory for it.
…es it now

The shim was a project-local workaround for two gp-sphinx bugs in
``spa-nav.js::addCopyButtons``: clone-based template failed on pages
without code blocks, and only ``div.highlight pre`` was iterated
(not the full configured ``copybutton_selector``, so prompt blocks
went buttonless after SPA swap).

Both are fixed upstream in gp-sphinx (#20): inline HTML template and
``window.GP_SPHINX_COPYBUTTON_SELECTOR`` bridge. The shim's
responsibilities collapse into the theme's own code, so remove it
from conf.py's ``app.add_js_file`` list and delete the file.

``prompt-copy.js`` stays — it's unrelated: it implements Markdown-
preserving copy (backtick-wrapping ``<code>`` children) for prompt
admonitions, which sphinx-copybutton's ``innerText``-based copy
flattens. That behavior has nothing to do with SPA navigation and
uses document-level event delegation to survive DOM swaps.
…d resources

Switch ``docs/tools/prompts.md`` and ``docs/reference/api/resources.md``
from hand-written signatures / ``automodule`` to the new
``{fastmcp-prompt}`` / ``{fastmcp-prompt-input}`` and
``{fastmcp-resource-template}`` directives from gp-sphinx. Each recipe
now has a live-introspected card (name, description, tags, type
badge) and an auto-generated arguments table, while the editorial
prose (``**Use when**``, ``**Why use this**``, ``**Sample render**``)
stays hand-written around the directive blocks.

``docs/conf.py`` now sets ``fastmcp_server_module = "libtmux_mcp.server:mcp"``
so the collector can enumerate the live FastMCP instance. Because
``server.py`` defers registration to ``run_server()``, the collector
invokes ``_register_all()`` internally to populate the component
registry before reading.

Also add three short ``docs/topics/`` pages for the MCP protocol
utilities that map conceptually rather than as first-class autodoc:

- ``completion.md`` — what FastMCP derives automatically from prompt
  args and resource-template parameters; notes that live tmux state
  is not yet wired in as a completion source.
- ``logging.md`` — the ``libtmux_mcp.*`` logger hierarchy and how
  FastMCP forwards records to clients via the MCP logging capability.
- ``pagination.md`` — contrasts MCP protocol-level cursors (automatic,
  server-owned) with tool-level ``offset`` / ``limit`` on
  ``search_panes`` (agent-controlled bounded cost).

Topics index grows a new "MCP protocol utilities" grid row linking
to them.
The fastmcp-prompt directive now creates its own section with a hidden
title (for TOC / {ref} resolution) and a visible card header. The outer
## `run_and_wait` / ## `diagnose_failing_pane` etc. headings caused the
prompt name to appear twice — once as the section heading immediately
above the card, and once in the card's own signature line.

With the headings removed, each prompt name appears exactly once: in the
card header, with the new ¶ permalink from the directive-created section.
The intro bullet {ref} cross-refs continue to resolve correctly.
@tony tony force-pushed the 2026-04-follow-ups branch from 5ebc555 to 10f6e1e Compare April 18, 2026 13:43
why: Surface the GitHub repo as a first-class nav item inline with
     the normal toctree entries.
what:
- Add GitHub <https://github.com/tmux-python/libtmux-mcp> to toctree
  in index.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants