Phase L: MCP tool-use + capability chips + CI #5

Merged
dexwritescode merged 12 commits into main from phase-l-mcp
Apr 22, 2026
@dexwritescode
Owner

Summary

  • L.1: Multi-turn tool-use loop in the inference engine (up to 5 turns), plus the Qwen3 QK norm fix
  • L.2: MCP client runtime with stdio transport, McpManager with permission gates, and neurons mcp CLI commands (add/remove/list/test)
  • L.7: Tool capability chips, inferred heuristically from model name/type and overridden after load by supports_tool_use() from C++; wired through proto → service → AppState → Flutter model picker
  • CI: GitHub Actions workflow on macos-26 arm64 that installs gRPC, caches build/_deps (MLX etc.), builds all targets, and runs unit tests

Test plan

  • CI workflow triggers on this PR and the Build step passes
  • neurons mcp add/list/test commands work against a local MCP server
  • Tool capability chip shows correctly for loaded vs unloaded models in the Models tab
  • Unit tests pass in CI (integration tests skipped — need model files)

🤖 Generated with Claude Code

dexwritescode and others added 12 commits April 21, 2026 11:47
Adds tool-use detection and multi-turn execution to the compute/service
layers. No model weights change; behavior is identical when tool_cb is
not provided.

compute/:
- LanguageModel: add ToolCall struct and four virtual methods
  (supports_tool_use, format_tool_system_prompt, detect_tool_call,
  format_tool_result) with safe no-op defaults
- LlamaModel: implement all four for Qwen2.5/Qwen3, Llama-3.1+, and
  Mistral-tool families; family detected at load time via vocab probe
- model_config: add Qwen3ForCausalLM architecture; model_type "qwen3"
  dispatches to LlamaModel
- language_model factory: add "qwen3" to LlamaModel dispatch
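
As a rough illustration of the "safe no-op defaults" described above, the interface could look something like the sketch below. The method names come from this commit message; the ToolCall field names, the parameter types, and the exact signatures are assumptions and may not match the real headers.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical shape of a parsed tool call; the field names are assumptions.
struct ToolCall {
    std::string name;            // tool the model wants to invoke
    std::string arguments_json;  // raw JSON arguments as emitted by the model
};

class LanguageModel {
public:
    virtual ~LanguageModel() = default;

    // Default: the model does not advertise tool use.
    virtual bool supports_tool_use() const { return false; }

    // Default: nothing is added to the system prompt.
    virtual std::string format_tool_system_prompt(
        const std::vector<std::string>& tool_schemas) const {
        (void)tool_schemas;
        return {};
    }

    // Default: generated text is never interpreted as a tool call.
    virtual std::optional<ToolCall> detect_tool_call(
        const std::string& generated_text) const {
        (void)generated_text;
        return std::nullopt;
    }

    // Default: tool results produce no continuation text.
    virtual std::string format_tool_result(const ToolCall& call,
                                           const std::string& result) const {
        (void)call;
        (void)result;
        return {};
    }
};
```

With defaults like these, a model family that does not override anything behaves exactly as it did before the commit, which is consistent with the "no behavior change" claim above.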

service/:
- Add ToolCallCb = (ToolCall) → optional<string> typedef
- generate_internal: replace single mdl->generate() call with a
  multi-turn tool loop (up to 5 turns); detects tool call mid-stream,
  stops generation, invokes callback, injects result via
  format_tool_result(), re-encodes context (no BOS on continuations)
- build_prompt: handle qwen3 with the same ChatML template as qwen2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Qwen3 applies a learned RMSNorm to the Q and K tensors, per head across
the head dimension, before RoPE. Without it, attention dot products are
computed on unnormalized Q and K and the model produces garbage output.
Qwen2/Llama/Mistral weights have no q_norm/k_norm keys, so the probe is
a no-op for those families.
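
For readers unfamiliar with QK norm, here is a minimal CPU sketch of RMSNorm applied per head. It is not the MLX code from this commit: the function name, the flat std::vector layout, and the eps default are assumptions, and the weight vector is assumed to have head_dim entries shared across heads.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalizes each consecutive head_dim-sized slice of `x` in place:
//   x[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
// `x` is assumed to hold heads back to back; `weight` has head_dim entries.
void rms_norm_per_head(std::vector<float>& x, const std::vector<float>& weight,
                       std::size_t head_dim, float eps = 1e-6f) {
    if (head_dim == 0 || weight.size() < head_dim) return;  // nothing sensible to do

    for (std::size_t start = 0; start + head_dim <= x.size(); start += head_dim) {
        float sum_sq = 0.0f;
        for (std::size_t i = 0; i < head_dim; ++i) {
            sum_sq += x[start + i] * x[start + i];
        }
        const float inv_rms =
            1.0f / std::sqrt(sum_sq / static_cast<float>(head_dim) + eps);
        for (std::size_t i = 0; i < head_dim; ++i) {
            x[start + i] = x[start + i] * inv_rms * weight[i];
        }
    }
}
```

In the flow described above, the same operation would run on Q with the q_norm weights and on K with the k_norm weights, and RoPE would be applied afterwards.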

Tested: Qwen3-8B-4bit now produces coherent reasoning output (thinking
mode). Qwen2.5-3B, Llama-3.1-8B, and all 115 compute_tests pass
without regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the Model Context Protocol (MCP) client layer across service
and CLI.

service/src/mcp/:
- mcp_types.h: McpServerConfig, McpPermission, ToolDef
- mcp_client.h/cpp: JSON-RPC 2.0 over stdio subprocess (fork+exec+pipes).
  Handles initialize handshake, tools/list, tools/call. SSE stubbed.
- mcp_manager.h/cpp: aggregates tools across connected servers; routes
  tool calls with permission checks (AlwaysAsk/AllowSession/AlwaysAllow/
  AlwaysDeny); persists server configs to ~/.neurons/mcp_servers.json
  and permissions to ~/.neurons/mcp_permissions.json;
  make_tool_call_cb() returns a ToolCallCb that slots directly into
  generate_internal()'s tool loop from L.1
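
One way the permission check could be structured is sketched below. Only the four permission names come from this commit; the PermissionGate class, the ask callback, and in particular the semantics chosen here (AllowSession asks once and then remembers approval for the session, unknown tools fall back to AlwaysAsk) are assumptions for illustration.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

enum class McpPermission { AlwaysAsk, AllowSession, AlwaysAllow, AlwaysDeny };

// Hypothetical gate; the real McpManager API is not shown in the commit.
class PermissionGate {
public:
    using AskFn = std::function<bool(const std::string& tool)>;

    explicit PermissionGate(AskFn ask) : ask_(std::move(ask)) {}

    void set(const std::string& tool, McpPermission p) { perms_[tool] = p; }

    bool allowed(const std::string& tool) {
        const auto it = perms_.find(tool);
        // Assumption: tools without a stored permission are treated as AlwaysAsk.
        const McpPermission p =
            it == perms_.end() ? McpPermission::AlwaysAsk : it->second;

        switch (p) {
            case McpPermission::AlwaysAllow:
                return true;
            case McpPermission::AlwaysDeny:
                return false;
            case McpPermission::AllowSession:
                if (session_ok_.count(tool) > 0) return true;
                if (ask_ && ask_(tool)) {
                    session_ok_.insert(tool);  // remembered until the process exits
                    return true;
                }
                return false;
            case McpPermission::AlwaysAsk:
                return ask_ && ask_(tool);
        }
        return false;
    }

private:
    AskFn ask_;
    std::unordered_map<std::string, McpPermission> perms_;
    std::unordered_set<std::string> session_ok_;
};
```

A callback returned by something like make_tool_call_cb() could consult such a gate first and return std::nullopt when the call is denied, which would end the tool loop for that turn.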

service/:
- NeuronsServiceImpl gains an McpManager member
- generate_internal() auto-activates McpManager tools when servers are
  connected and the loaded model supports tool use (no explicit caller
  change needed)
- Generate gRPC handler now delegates to generate_internal so it also
  benefits from tool use and avoids duplicate prompt-building

cli/:
- mcp add/remove/list/test subcommands
- mcp sources shared with service via direct include (no new library)
- Tested: add→list→remove round-trip + full protocol test against a
  Python MCP server (initialize handshake + tools/list listing 2 tools)
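
For context on what the protocol test exchanges, the sketch below frames a JSON-RPC 2.0 request as a single newline-terminated line, which is how MCP's stdio transport delimits messages. make_jsonrpc_request is a hypothetical helper, not a function from mcp_client.cpp; it does no JSON escaping and expects params to be passed as already-serialized JSON.

```cpp
#include <string>

// Builds one newline-delimited JSON-RPC 2.0 request.
// Assumptions: `method` needs no escaping and `params_json` is valid JSON.
std::string make_jsonrpc_request(int id, const std::string& method,
                                 const std::string& params_json = "{}") {
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"" + method + "\",\"params\":" + params_json + "}\n";
}
```

For example, make_jsonrpc_request(2, "tools/list") yields the request that lists a server's tools, and a tools/call request would carry the tool name and arguments in params. The client would write such lines to the subprocess's stdin and read responses line by line from its stdout.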

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OUTPUT_NAME in cli/CMakeLists.txt updated to "neurons" so the installed
binary matches the name used in README examples and the project name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ports_tool_use

- Add supports_tool_use to LoadModelResponse (field 7) and StatusResponse (field 10) in proto
- Set resp->set_supports_tool_use(model_->supports_tool_use()) in LoadModel + GetStatus handlers
- Add AppState.supportsToolUse populated from LoadModel response and _applyStatus; cleared on unload
- _ModelRow accepts modelType + supportsToolUse?; heuristic uses modelType when available; C++ value overrides chip after load
- Fix leading-underscore lint on local RegExp vars in inferCapabilities()
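
The precedence rule for the chip (heuristic before load, C++ value after load) can be illustrated as below. The real logic lives in the Flutter code (inferCapabilities() and _ModelRow); this C++ version is only a sketch of the rule, and the function names and substring markers are assumptions loosely based on the families named in L.1.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical name/type heuristic; the marker list is an assumption.
bool heuristic_tool_use(std::string name_or_type) {
    std::transform(name_or_type.begin(), name_or_type.end(), name_or_type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const char* markers[] = {"qwen2.5", "qwen3", "llama-3.1"};
    for (const char* marker : markers) {
        if (name_or_type.find(marker) != std::string::npos) return true;
    }
    return false;
}

// `reported` is empty until a model is loaded; once the service reports
// supports_tool_use, that value overrides the heuristic.
bool tool_use_chip(const std::string& name_or_type, std::optional<bool> reported) {
    return reported.value_or(heuristic_tool_use(name_or_type));
}
```

Clearing the reported value on unload, as the AppState change above does, makes the chip fall back to the heuristic again.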

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builds all C++ targets (compute + CLI + service) on macos-26 arm64.
Caches build/_deps so MLX only compiles from source on first run.
Runs unit tests with integration tests excluded (those need model files).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… builds

model_tests is only added in Debug builds (models/CMakeLists.txt line 65).
The all-tests target unconditionally depended on it, causing a missing-target
error in Release CI. Fix: make all-tests conditional on build type in the
top-level CMakeLists, and switch CI to Debug so both compute_tests and
model_tests are built.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The HttpInterface pure virtual method was added after the mock was written,
breaking Debug builds. The mock now overrides it by delegating to
requestSync; progress is irrelevant for unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builds all-tests in Debug mode (build-debug/) and runs ctest with the
same flags as CI (integration excluded, 120s timeout). Run this before
pushing to catch test gaps that Release builds skip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ASSERT_TRUE causes a hard test failure when model files are absent.
GTEST_SKIP marks the test as skipped, which is correct for CI where
~/.neurons/models/ doesn't exist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GTEST_SKIP guards to the 4 SimpleBpeTokenizer tests that load from
  tinyllama_model_dir without checking existence first
- Add LABELS integration to model_integration_tests so --label-exclude
  integration in ctest actually filters it out in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dexwritescode merged commit 7c35d62 into main on Apr 22, 2026
1 check passed
dexwritescode deleted the phase-l-mcp branch on April 22, 2026 00:16