@YixinZ-NUS

Implements a Rust-based Model Context Protocol (MCP) server for the SeekDB vector database, located at examples/mcp/seekdb/. It is built with rmcp 0.14.0 and generates embeddings client-side via fastembed (all-MiniLM-L6-v2 model), which reduces the need for API keys and keeps embeddings consistent with pyseekdb.

MCP Tools Provided:
- create_collection: Create tables with 384-dim HNSW vector index
- add_documents: Insert documents with auto-generated embeddings
- search_collection: Perform vector similarity search
- list_collections: List all tables with vector indexes
- collection_info: Get schema, row count, and embedding info

Includes a usage guide at examples/mcp/seekdb/README.md and unit tests for validation.

Closes #34

Implements Issue second-state#34: SeekDB Rust MCP Server with:
- rmcp 0.14.0 from crates.io for MCP server
- fastembed for client-side embeddings (all-MiniLM-L6-v2, 384 dims)
- mysql_async for SeekDB connectivity
- Three tools: search_collection, list_collections, collection_info
- Local .gitignore for build artifacts
- README.md: concise usage guide, test instructions (--test-threads=1)
- config.toml: example EchoKit MCP integration
- .env.template: environment variable reference
- Cargo.toml: add repository URL, rust-version, updated description
- Documents alternative AI_EMBED() for DB-side embeddings
- Add Docker lifecycle commands (start, status, logs, stop)
- Document MCP session management with notifications/initialized
- Add step-by-step curl examples with actual responses
- Clarify EchoKit config.toml location
- Explain --test-threads=1 requirement for unit tests
- Update all example outputs from live testing
Environment variables are already documented in README.md.
The .env.template file is redundant since this implementation
uses client-side embeddings without requiring API keys.

Copilot AI left a comment

Pull request overview

Implements a Rust-based MCP server for SeekDB that provides vector-search tools backed by client-side embeddings via fastembed, and wires it into the existing EchoKit MCP integration. The PR adds the SeekDB MCP binary/library, embedding service, DB/config helpers, example EchoKit config, and documentation plus lockfile.

Changes:

  • Add seekdb-mcp-server crate with MCP tool implementations for create_collection, add_documents, search_collection, list_collections, and collection_info, served over rmcp’s streamable HTTP transport.
  • Introduce a client-side EmbeddingService using fastembed (all-MiniLM-L6-v2, 384-dim) with unit tests validating embedding shape, normalization, and semantic similarity.
  • Provide SeekDB MCP-specific configuration (config.toml), README usage guide (including curl-based MCP interaction walkthrough), and standard Rust project scaffolding (Cargo.toml, Cargo.lock, .gitignore).
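The embedding checks mentioned in the second bullet (shape, normalization, semantic similarity) can be sketched with plain slices. This is only an illustration under assumptions: the helper names `l2_norm`, `cosine_similarity`, and `looks_like_valid_embedding` are hypothetical and are not taken from the PR, and the tolerance of 1e-3 is an arbitrary choice.

```rust
/// Euclidean (L2) length of a vector.
pub fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine similarity of two equal-length vectors.
/// Returns NaN if either vector has zero length.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    dot / (l2_norm(a) * l2_norm(b))
}

/// Hypothetical sanity check: expected dimension and (approximately) unit length.
pub fn looks_like_valid_embedding(v: &[f32], dims: usize) -> bool {
    v.len() == dims && (l2_norm(v) - 1.0).abs() < 1e-3
}
```

A semantic-similarity test would then assert that `cosine_similarity` is higher for two related sentences than for two unrelated ones, using whatever vectors the embedding service returns.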

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| examples/mcp/seekdb/src/main.rs | Defines the SeekDbServer MCP handler, all five tools (search, add, create, list, info), error mapping, and the axum-based HTTP server bootstrapping with rmcp’s StreamableHttpService. |
| examples/mcp/seekdb/src/lib.rs | Exposes the config, db, and embeddings modules for reuse by the binary and tests. |
| examples/mcp/seekdb/src/embeddings.rs | Implements EmbeddingService around fastembed for single/batch embedding, formatting for SQL, and includes unit tests confirming embedding dimension, normalization, semantic similarity, and SQL formatting. |
| examples/mcp/seekdb/src/db.rs | Adds a small helper to construct a mysql_async::Pool from ServerConfig for connecting to SeekDB via the MySQL protocol. |
| examples/mcp/seekdb/src/config.rs | Introduces ServerConfig loaded from environment variables (SEEKDB_*), including sensible defaults and required SEEKDB_DATABASE validation. |
| examples/mcp/seekdb/config.toml | Example EchoKit server configuration showing how to register the SeekDB MCP server and wire it into TTS/ASR/LLM, plus a system prompt for using the search tools (currently references some non-existent tool names). |
| examples/mcp/seekdb/README.md | Documents SeekDB MCP server setup (Dockerized SeekDB, building & running the server), MCP usage via curl (session initialization, tool calls), testing strategy, environment variables, architecture, and notes on client-side vs DB-side embeddings. |
| examples/mcp/seekdb/Cargo.toml | Declares the new seekdb-mcp-server crate (edition 2024), its binary target, and dependencies on rmcp, fastembed, mysql_async, axum, tokio, tracing, dotenvy, etc. |
| examples/mcp/seekdb/Cargo.lock | Lockfile capturing the full dependency graph for the new crate (including rmcp, fastembed/ORT, mysql_async, reqwest/ureq, rustls, etc.). |
| examples/mcp/seekdb/.gitignore | Ignores standard Rust build artifacts and the local .fastembed_cache/ directory used for model downloads. |
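As a rough illustration of the config.rs behaviour described above (defaults plus a required SEEKDB_DATABASE), the sketch below loads a config through a lookup function so it can be exercised without touching the process environment. Only `SEEKDB_DATABASE` and the `ServerConfig` name come from the review; the fields, the `SEEKDB_HOST` and `SEEKDB_PORT` variable names, and the default values are assumptions made for the example.

```rust
#[derive(Debug, PartialEq)]
pub struct ServerConfig {
    pub host: String,     // hypothetical field
    pub port: u16,        // hypothetical field
    pub database: String, // required, from SEEKDB_DATABASE
}

impl ServerConfig {
    /// Builds a config from any key -> value lookup (env, map, test closure).
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // The database name has no default: fail early if it is missing or empty.
        let database = get("SEEKDB_DATABASE")
            .filter(|v| !v.is_empty())
            .ok_or_else(|| "SEEKDB_DATABASE must be set".to_string())?;
        // Assumed variable name and default port, for illustration only.
        let port = match get("SEEKDB_PORT") {
            Some(raw) => raw
                .parse::<u16>()
                .map_err(|e| format!("invalid SEEKDB_PORT: {e}"))?,
            None => 2881,
        };
        let host = get("SEEKDB_HOST").unwrap_or_else(|| "127.0.0.1".to_string());
        Ok(Self { host, port, database })
    }

    /// Convenience wrapper that reads from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking the lookup as a parameter keeps tests independent of global environment state, which matters when tests run in parallel.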

Comment on lines +145 to +149
let sql = format!(
    r#"SELECT id, document, metadata,
       COSINE_DISTANCE(embedding, '{}') as distance
       FROM {}
       ORDER BY distance ASC
Copilot AI Jan 31, 2026

collection_name from tool input is interpolated directly into the SQL string in the FROM {} clause (and into the log), which allows a malicious or malformed collection name to break the query or perform SQL injection. Please either validate collection names against a strict identifier pattern and/or quote+escape them as identifiers instead of concatenating raw user input into the query string.
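One way to apply the "strict identifier pattern" suggested here is a small validator that runs before any `format!` call. This is a minimal sketch, not the PR's code: the function names are hypothetical, and the rule (ASCII letters, digits, underscore, no leading digit, at most 64 bytes, mirroring MySQL's identifier length limit) is an assumption about what SeekDB accepts.

```rust
/// Returns true if `name` is a conservative SQL identifier:
/// non-empty, at most 64 bytes, starts with an ASCII letter or underscore,
/// and otherwise contains only ASCII letters, digits, or underscores.
pub fn is_valid_collection_name(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    first_ok
        && name.len() <= 64
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical gate to call at the top of each tool handler.
pub fn checked_collection_name(name: &str) -> Result<&str, String> {
    if is_valid_collection_name(name) {
        Ok(name)
    } else {
        Err(format!("invalid collection name: {name:?}"))
    }
}
```

Because a validated name contains no quotes, spaces, or punctuation, it is also safe to write to the log.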

Comment on lines +232 to +236
let sql = format!(
    r#"INSERT INTO {} (id, document, embedding, metadata)
       VALUES (?, ?, '{}', ?)
       ON DUPLICATE KEY UPDATE
       document = VALUES(document),
Copilot AI Jan 31, 2026

collection_name is concatenated directly into the INSERT INTO {} statement, so a crafted collection name could inject additional SQL or reference unintended tables. This should be hardened by validating the collection name and/or properly quoting and escaping it as an identifier rather than embedding raw user input in the SQL template.
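The "quoting and escaping" alternative can be sketched as follows, assuming SeekDB follows the MySQL convention of backtick-quoted identifiers in which an embedded backtick is escaped by doubling it. `quote_identifier` and `insert_head` are hypothetical helpers, and only the head of the INSERT statement from the excerpt is reproduced.

```rust
/// Wraps `name` in backticks, doubling any backtick it contains
/// (MySQL-style identifier quoting; assumed to apply to SeekDB too).
pub fn quote_identifier(name: &str) -> String {
    format!("`{}`", name.replace('`', "``"))
}

/// Builds the start of the INSERT statement with a quoted table name.
pub fn insert_head(collection_name: &str) -> String {
    format!(
        "INSERT INTO {} (id, document, embedding, metadata)",
        quote_identifier(collection_name)
    )
}
```

Quoting stops a name from terminating the identifier early, but it does not reject empty or otherwise unwanted names, so it is best combined with validation rather than used instead of it.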

Comment on lines +280 to +284
let sql = format!(
    r#"CREATE TABLE IF NOT EXISTS {} (
       id VARCHAR(255) PRIMARY KEY,
       document TEXT NOT NULL,
       embedding VECTOR({}) NOT NULL,
Copilot AI Jan 31, 2026

collection_name is directly interpolated into the CREATE TABLE IF NOT EXISTS {} DDL, which makes it possible to inject arbitrary SQL via the tool argument or to fail on unusual table names. Consider enforcing an allowed-character whitelist for collection names and/or quoting+escaping them as identifiers instead of concatenating the raw string.
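A type-level variant of the whitelist idea is to make the DDL builder accept only a validated newtype, so a raw `String` from the tool arguments cannot reach `format!` by accident. The sketch below is an assumption-laden illustration: `CollectionName` and `create_table_sql` are hypothetical names, the character rule is a guess at a safe subset, and the DDL contains only the three columns visible in the excerpt (the rest of the statement is not shown in the review and is omitted).

```rust
use std::fmt;

/// A collection name that has already passed the character whitelist.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CollectionName(String);

impl CollectionName {
    /// Accepts 1..=64 ASCII letters, digits, or underscores, not starting with a digit.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let ok = !raw.is_empty()
            && raw.len() <= 64
            && raw.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')
            && !raw.as_bytes()[0].is_ascii_digit();
        if ok {
            Ok(Self(raw.to_owned()))
        } else {
            Err(format!("invalid collection name: {raw:?}"))
        }
    }
}

impl fmt::Display for CollectionName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Abbreviated DDL: only a `CollectionName` can be interpolated here.
pub fn create_table_sql(name: &CollectionName, dims: u32) -> String {
    format!("CREATE TABLE IF NOT EXISTS {name} (id VARCHAR(255) PRIMARY KEY, document TEXT NOT NULL, embedding VECTOR({dims}) NOT NULL)")
}
```

With this shape, each tool handler calls `CollectionName::parse` once on its input and returns an invalid-parameters error on failure; every later SQL builder takes `&CollectionName`.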

Comment on lines +362 to +372
// Get row count
let count_sql = format!("SELECT COUNT(*) FROM {}", collection_name);
let rows: Vec<Row> = conn.query(&count_sql).await.map_err(|e| {
    error!("Failed to get row count: {}", e);
    McpError::internal_error(format!("Query failed: {}", e), None)
})?;
let count: i64 = rows.first().and_then(|r| r.get(0)).unwrap_or(0);

// Get column info
let schema_sql = format!(
    "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{}' AND TABLE_SCHEMA = DATABASE()",
Copilot AI Jan 31, 2026

In collection_info, collection_name is interpolated directly into both SELECT COUNT(*) FROM {} and the TABLE_NAME = '{}' filter, which allows SQL injection or malformed queries if a caller passes a crafted name. As with the other tools, this should instead validate and safely quote/escape the identifier, or otherwise ensure only valid collection names can reach these format strings.
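The two interpolations in this excerpt differ in kind. In the `TABLE_NAME = '{}'` filter the name is a string value, so it can be sent as a bound parameter instead of being formatted into the SQL. The table name in `SELECT COUNT(*) FROM {}` is an identifier, which placeholders cannot bind, so it still needs validation or quoting. A minimal sketch of the first case, with the hypothetical helper `schema_query` returning the statement and its parameters separately:

```rust
/// The schema lookup with a `?` placeholder instead of an interpolated literal.
pub const SCHEMA_SQL: &str = "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS \
     WHERE TABLE_NAME = ? AND TABLE_SCHEMA = DATABASE()";

/// Returns the fixed SQL text plus the values to bind to its placeholders.
/// The caller would hand both to the driver's prepared-statement API.
pub fn schema_query(collection_name: &str) -> (&'static str, Vec<String>) {
    (SCHEMA_SQL, vec![collection_name.to_owned()])
}
```

Since the SQL text is a constant, nothing the caller passes can change the shape of the query; a hostile name simply matches no rows.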

Comment on lines +41 to +45
You are a helpful voice assistant with access to a knowledge base through the SeekDB search tools.

When users ask questions that might be answered by the knowledge base:
1. Use the `query_collection` tool to search for relevant information
2. Use the `hybrid_search` tool for complex queries that need both keyword and semantic matching
Copilot AI Jan 31, 2026

The system prompt mentions query_collection and hybrid_search tools, but this MCP server only exposes create_collection, add_documents, search_collection, list_collections, and collection_info. This mismatch can confuse clients and LLM behavior; please update the prompt text to reference the actual tool names and behaviors provided by this server.
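A mismatch like this could be caught mechanically, for example by a unit test that scans the prompt for backtick-quoted names and compares them with the tool list. The sketch below assumes tool names in the prompt are always wrapped in backticks, as in the excerpt; `EXPOSED_TOOLS` and `unknown_tools_in_prompt` are hypothetical names, with the tool list copied from the comment above.

```rust
/// Tool names this server exposes, per the PR description.
pub const EXPOSED_TOOLS: [&str; 5] = [
    "create_collection",
    "add_documents",
    "search_collection",
    "list_collections",
    "collection_info",
];

/// Returns every backtick-quoted token in `prompt` that is not an exposed tool.
pub fn unknown_tools_in_prompt(prompt: &str) -> Vec<String> {
    prompt
        .split('`')
        // After splitting on backticks, odd-indexed pieces are the quoted spans.
        .skip(1)
        .step_by(2)
        .filter(|name| !EXPOSED_TOOLS.contains(name))
        .map(str::to_owned)
        .collect()
}
```

Run against the two prompt lines quoted in this comment, the function reports `query_collection` and `hybrid_search`; a prompt that only mentions `search_collection` yields an empty list.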

Successfully merging this pull request may close these issues.

[Feature Request] Implement SeekDB Search MCP Server