feat: add SeekDB MCP server with vector search capabilities (issue #34) #41
base: main
Implements issue second-state#34: SeekDB Rust MCP server with:
- rmcp 0.14.0 from crates.io for the MCP server
- fastembed for client-side embeddings (all-MiniLM-L6-v2, 384 dims)
- mysql_async for SeekDB connectivity
- Three tools: search_collection, list_collections, collection_info
- Local .gitignore for build artifacts

- README.md: concise usage guide, test instructions (--test-threads=1)
- config.toml: example EchoKit MCP integration
- .env.template: environment variable reference
- Cargo.toml: add repository URL, rust-version, updated description
- Documents alternative AI_EMBED() for DB-side embeddings

- Add Docker lifecycle commands (start, status, logs, stop)
- Document MCP session management with notifications/initialized
- Add step-by-step curl examples with actual responses
- Clarify EchoKit config.toml location
- Explain --test-threads=1 requirement for unit tests
- Update all example outputs from live testing
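The session management documented above follows the MCP lifecycle: an `initialize` request, then a `notifications/initialized` notification, then tool calls. Under the streamable HTTP transport each body is POSTed with `Accept: application/json, text/event-stream`, and later requests echo the `Mcp-Session-Id` header returned by `initialize`. The sketch below only builds the three JSON-RPC bodies; the protocol version string, the client name, and the `query` and `limit` argument names for `search_collection` are assumptions, not taken from the README.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Step 1: `initialize` request (protocol version and client info are assumed values).
fn initialize_body(id: u64) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"initialize","params":{{"protocolVersion":"2025-03-26","capabilities":{{}},"clientInfo":{{"name":"curl-client","version":"0.1.0"}}}}}}"#
    )
}

/// Step 2: `notifications/initialized` has no `id`, so the server sends no result for it.
fn initialized_notification() -> &'static str {
    r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#
}

/// Step 3: `tools/call` for `search_collection`.
/// The `query` and `limit` argument names are hypothetical.
fn search_call_body(id: u64, collection: &str, query: &str, limit: u32) -> String {
    let coll = json_escape(collection);
    let q = json_escape(query);
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"search_collection","arguments":{{"collection_name":"{coll}","query":"{q}","limit":{limit}}}}}}}"#
    )
}
```

Escaping the user-supplied strings matters here because a query containing a quote would otherwise produce invalid JSON.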
Environment variables are already documented in README.md. The .env.template file is redundant since this implementation uses client-side embeddings without requiring API keys.
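As a rough illustration of the environment-driven configuration, the sketch below loads a `ServerConfig` with defaults and a required `SEEKDB_DATABASE`. Only `SEEKDB_DATABASE` is named in the PR; `SEEKDB_HOST`, `SEEKDB_PORT`, `SEEKDB_USER`, `SEEKDB_PASSWORD`, and their default values (including port 2881) are hypothetical. The variable lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
use std::env;

/// Hypothetical shape of the server settings; only SEEKDB_DATABASE comes from the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub host: String,
    pub port: u16,
    pub user: String,
    pub password: String,
    pub database: String,
}

impl ServerConfig {
    /// Builds the config from any key -> value lookup.
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(get: F) -> Result<Self, String> {
        // Required: fail fast when the database name is missing or empty.
        let database = get("SEEKDB_DATABASE")
            .filter(|s| !s.is_empty())
            .ok_or_else(|| "SEEKDB_DATABASE must be set".to_string())?;
        // Optional with an assumed default; reject values that are not valid ports.
        let port = match get("SEEKDB_PORT") {
            Some(p) => p
                .parse::<u16>()
                .map_err(|e| format!("invalid SEEKDB_PORT {p:?}: {e}"))?,
            None => 2881,
        };
        Ok(Self {
            host: get("SEEKDB_HOST").unwrap_or_else(|| "127.0.0.1".to_string()),
            port,
            user: get("SEEKDB_USER").unwrap_or_else(|| "root".to_string()),
            password: get("SEEKDB_PASSWORD").unwrap_or_default(),
            database,
        })
    }

    /// Reads the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k| env::var(k).ok())
    }

    /// MySQL-protocol URL for a connection pool; a real implementation would
    /// percent-encode the user and password.
    pub fn mysql_url(&self) -> String {
        format!(
            "mysql://{}:{}@{}:{}/{}",
            self.user, self.password, self.host, self.port, self.database
        )
    }
}
```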
Pull request overview
Implements a Rust-based MCP server for SeekDB that provides vector-search tools backed by client-side embeddings via fastembed, and wires it into the existing EchoKit MCP integration. The PR adds the SeekDB MCP binary/library, embedding service, DB/config helpers, example EchoKit config, and documentation plus lockfile.
Changes:
- Add the `seekdb-mcp-server` crate with MCP tool implementations for `create_collection`, `add_documents`, `search_collection`, `list_collections`, and `collection_info`, served over rmcp’s streamable HTTP transport.
- Introduce a client-side `EmbeddingService` using `fastembed` (all-MiniLM-L6-v2, 384-dim) with unit tests validating embedding shape, normalization, and semantic similarity.
- Provide SeekDB MCP-specific configuration (`config.toml`), a README usage guide (including a curl-based MCP interaction walkthrough), and standard Rust project scaffolding (`Cargo.toml`, `Cargo.lock`, `.gitignore`).
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `examples/mcp/seekdb/src/main.rs` | Defines the `SeekDbServer` MCP handler, all five tools (search, add, create, list, info), error mapping, and the axum-based HTTP server bootstrapping with rmcp’s `StreamableHttpService`. |
| `examples/mcp/seekdb/src/lib.rs` | Exposes the `config`, `db`, and `embeddings` modules for reuse by the binary and tests. |
| `examples/mcp/seekdb/src/embeddings.rs` | Implements `EmbeddingService` around fastembed for single/batch embedding and formatting for SQL, with unit tests confirming embedding dimension, normalization, semantic similarity, and SQL formatting. |
| `examples/mcp/seekdb/src/db.rs` | Adds a small helper to construct a `mysql_async::Pool` from `ServerConfig` for connecting to SeekDB via the MySQL protocol. |
| `examples/mcp/seekdb/src/config.rs` | Introduces `ServerConfig` loaded from environment variables (`SEEKDB_*`), including sensible defaults and required `SEEKDB_DATABASE` validation. |
| `examples/mcp/seekdb/config.toml` | Example EchoKit server configuration showing how to register the SeekDB MCP server and wire it into TTS/ASR/LLM, plus a system prompt for using the search tools (currently references some non-existent tool names). |
| `examples/mcp/seekdb/README.md` | Documents SeekDB MCP server setup (Dockerized SeekDB, building and running the server), MCP usage via curl (session initialization, tool calls), testing strategy, environment variables, architecture, and notes on client-side vs DB-side embeddings. |
| `examples/mcp/seekdb/Cargo.toml` | Declares the new `seekdb-mcp-server` crate (edition 2024), its binary target, and dependencies on rmcp, fastembed, mysql_async, axum, tokio, tracing, dotenvy, etc. |
| `examples/mcp/seekdb/Cargo.lock` | Lockfile capturing the full dependency graph for the new crate (including rmcp, fastembed/ORT, mysql_async, reqwest/ureq, rustls, etc.). |
| `examples/mcp/seekdb/.gitignore` | Ignores standard Rust build artifacts and the local `.fastembed_cache/` directory used for model downloads. |
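The table mentions that `EmbeddingService` formats embeddings for SQL and that its tests check normalization and semantic similarity. A minimal sketch of those helpers follows, assuming SeekDB accepts a bracketed, comma-separated vector literal such as `[0.5,-0.25,1]`; the function names are hypothetical and the fastembed call itself is left out.

```rust
/// Formats an embedding as a bracketed, comma-separated literal, e.g. "[0.5,-0.25,1]".
/// Returns None if any component is NaN or infinite, since that cannot form a valid vector.
fn to_vector_literal(v: &[f32]) -> Option<String> {
    if v.iter().any(|x| !x.is_finite()) {
        return None;
    }
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    Some(format!("[{}]", parts.join(",")))
}

/// Euclidean (L2) norm; close to 1.0 for a normalized embedding.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine similarity of two equal-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    dot / (l2_norm(a) * l2_norm(b))
}
```

A normalization test can then assert that `l2_norm` of each 384-dim embedding is within a small tolerance of 1.0.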
```rust
let sql = format!(
    r#"SELECT id, document, metadata,
    COSINE_DISTANCE(embedding, '{}') as distance
    FROM {}
    ORDER BY distance ASC
```
Copilot (AI) · Jan 31, 2026
`collection_name` from tool input is interpolated directly into the SQL string in the `FROM {}` clause (and into the log), which allows a malicious or malformed collection name to break the query or perform SQL injection. Please either validate collection names against a strict identifier pattern and/or quote+escape them as identifiers instead of concatenating raw user input into the query string.
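One way to follow this suggestion is to whitelist the name and then quote it with backticks before it reaches `format!`. The sketch below assumes a conservative rule (ASCII letter or underscore first, then letters, digits, or underscores, at most 64 bytes, which matches MySQL's table-name length limit); the function names and the `LIMIT` clause are illustrative, not taken from the PR.

```rust
/// Validates a collection name against a conservative identifier rule and
/// returns it quoted with backticks (MySQL-style identifier quoting).
fn quote_collection_name(name: &str) -> Result<String, String> {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !first_ok || !rest_ok || name.len() > 64 {
        return Err(format!("invalid collection name: {name:?}"));
    }
    // The whitelist already excludes backticks; doubling them keeps the quoting
    // correct if the rule is ever relaxed.
    Ok(format!("`{}`", name.replace('`', "``")))
}

/// Builds the search query with a validated table name. The vector literal should come
/// from the embedding formatter, but it is still restricted to number/bracket characters.
fn build_search_sql(collection_name: &str, vector_literal: &str, limit: u32) -> Result<String, String> {
    let table = quote_collection_name(collection_name)?;
    if !vector_literal.chars().all(|c| c.is_ascii_digit() || "[],.-+eE".contains(c)) {
        return Err("unexpected characters in vector literal".to_string());
    }
    Ok(format!(
        "SELECT id, document, metadata, COSINE_DISTANCE(embedding, '{vector_literal}') AS distance \
         FROM {table} ORDER BY distance ASC LIMIT {limit}"
    ))
}
```

Rejecting invalid names up front also keeps them out of the log line the comment mentions.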
```rust
let sql = format!(
    r#"INSERT INTO {} (id, document, embedding, metadata)
    VALUES (?, ?, '{}', ?)
    ON DUPLICATE KEY UPDATE
    document = VALUES(document),
```
Copilot (AI) · Jan 31, 2026
`collection_name` is concatenated directly into the `INSERT INTO {}` statement, so a crafted collection name could inject additional SQL or reference unintended tables. This should be hardened by validating the collection name and/or properly quoting and escaping it as an identifier rather than embedding raw user input in the SQL template.
```rust
let sql = format!(
    r#"CREATE TABLE IF NOT EXISTS {} (
    id VARCHAR(255) PRIMARY KEY,
    document TEXT NOT NULL,
    embedding VECTOR({}) NOT NULL,
```
Copilot (AI) · Jan 31, 2026
`collection_name` is directly interpolated into the `CREATE TABLE IF NOT EXISTS {}` DDL, which makes it possible to inject arbitrary SQL via the tool argument or to fail on unusual table names. Consider enforcing an allowed-character whitelist for collection names and/or quoting+escaping them as identifiers instead of concatenating the raw string.
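Applied to `create_collection`, the whitelist can run before the DDL is assembled. The sketch keeps the column definitions visible in the excerpt (`id`, `document`, `embedding VECTOR(dim)`) and adds assumptions that are not confirmed by the PR: a `metadata JSON` column, a fixed index name `vidx_embedding`, and an OceanBase-style `VECTOR INDEX ... WITH (distance=cosine, type=hnsw, lib=vsag)` clause for the HNSW index; check SeekDB's documentation for the exact syntax.

```rust
/// Builds the CREATE TABLE statement for a collection after validating its name.
/// Assumed (not from the PR): `metadata JSON`, index name `vidx_embedding`, and
/// the OceanBase-style HNSW vector index clause.
fn create_collection_sql(name: &str, dim: usize) -> Result<String, String> {
    // Whitelist: non-empty, at most 64 bytes, no leading digit, ASCII word characters only.
    let valid = !name.is_empty()
        && name.len() <= 64
        && !name.starts_with(|c: char| c.is_ascii_digit())
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !valid {
        return Err(format!("invalid collection name: {name:?}"));
    }
    if dim == 0 {
        return Err("embedding dimension must be positive".to_string());
    }
    Ok([
        format!("CREATE TABLE IF NOT EXISTS `{name}` ("),
        "  id VARCHAR(255) PRIMARY KEY,".to_string(),
        "  document TEXT NOT NULL,".to_string(),
        format!("  embedding VECTOR({dim}) NOT NULL,"),
        "  metadata JSON,".to_string(),
        "  VECTOR INDEX vidx_embedding (embedding) WITH (distance=cosine, type=hnsw, lib=vsag)".to_string(),
        ")".to_string(),
    ]
    .join("\n"))
}
```

Passing the embedding model's dimension (384 for all-MiniLM-L6-v2) as `dim` keeps the table and the embedding service in agreement.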
```rust
// Get row count
let count_sql = format!("SELECT COUNT(*) FROM {}", collection_name);
let rows: Vec<Row> = conn.query(&count_sql).await.map_err(|e| {
    error!("Failed to get row count: {}", e);
    McpError::internal_error(format!("Query failed: {}", e), None)
})?;
let count: i64 = rows.first().and_then(|r| r.get(0)).unwrap_or(0);

// Get column info
let schema_sql = format!(
    "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{}' AND TABLE_SCHEMA = DATABASE()",
```
Copilot (AI) · Jan 31, 2026
In `collection_info`, `collection_name` is interpolated directly into both `SELECT COUNT(*) FROM {}` and the `TABLE_NAME = '{}'` filter, which allows SQL injection or malformed queries if a caller passes a crafted name. As with the other tools, this should instead validate and safely quote/escape the identifier, or otherwise ensure only valid collection names can reach these format strings.
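The two queries in `collection_info` need different treatment: in `COUNT(*)` the name is an identifier and must be validated and quoted, while in the `INFORMATION_SCHEMA` filter it is an ordinary value and can be bound as a `?` parameter. The sketch below only prepares the SQL and parameters; the `InfoQueries` struct and function name are hypothetical.

```rust
/// Queries for a hypothetical `collection_info` implementation.
struct InfoQueries {
    /// Row count: the table name is an identifier here, so it is validated and quoted.
    count_sql: String,
    /// Column lookup: the table name is a plain value here, so it is bound as `?`.
    schema_sql: &'static str,
    /// Positional parameters for `schema_sql`.
    schema_params: Vec<String>,
}

fn collection_info_queries(collection_name: &str) -> Result<InfoQueries, String> {
    let valid = !collection_name.is_empty()
        && collection_name.len() <= 64
        && !collection_name.starts_with(|c: char| c.is_ascii_digit())
        && collection_name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !valid {
        return Err(format!("invalid collection name: {collection_name:?}"));
    }
    Ok(InfoQueries {
        count_sql: format!("SELECT COUNT(*) FROM `{collection_name}`"),
        schema_sql: "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS \
                     WHERE TABLE_NAME = ? AND TABLE_SCHEMA = DATABASE()",
        schema_params: vec![collection_name.to_string()],
    })
}
```

With mysql_async, the schema query would then go through a prepared-statement call such as `Queryable::exec` with `schema_params` instead of `query`, so the driver handles the value escaping.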
```text
You are a helpful voice assistant with access to a knowledge base through the SeekDB search tools.

When users ask questions that might be answered by the knowledge base:
1. Use the `query_collection` tool to search for relevant information
2. Use the `hybrid_search` tool for complex queries that need both keyword and semantic matching
```
Copilot (AI) · Jan 31, 2026
The system prompt mentions `query_collection` and `hybrid_search` tools, but this MCP server only exposes `create_collection`, `add_documents`, `search_collection`, `list_collections`, and `collection_info`. This mismatch can confuse clients and LLM behavior; please update the prompt text to reference the actual tool names and behaviors provided by this server.
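To keep the prompt and the server from drifting apart again, a small check could flag tool names in the prompt that the server does not register. This sketch assumes tool names in the prompt are wrapped in single backticks, as in the config.toml excerpt; the constant and function names are hypothetical.

```rust
/// Tool names this server registers, as listed in the PR description.
const EXPOSED_TOOLS: [&str; 5] = [
    "create_collection",
    "add_documents",
    "search_collection",
    "list_collections",
    "collection_info",
];

/// Returns backtick-quoted, tool-like names in `prompt` that the server does not expose.
fn unknown_tool_references(prompt: &str) -> Vec<String> {
    prompt
        .split('`')
        .skip(1)
        .step_by(2) // every other piece lies between an opening and a closing backtick
        .filter(|s| !s.is_empty() && s.chars().all(|c| c.is_ascii_lowercase() || c == '_'))
        .filter(|s| !EXPOSED_TOOLS.contains(s))
        .map(str::to_string)
        .collect()
}
```

Run against the prompt text from config.toml in a unit test, an empty result would confirm that every referenced tool exists.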
Implements a Rust-based Model Context Protocol (MCP) server for the SeekDB vector database, located at `examples/mcp/seekdb/`. It is built with `rmcp 0.14.0` and uses client-side embeddings via `fastembed` (`all-MiniLM-L6-v2` model) to reduce the need for API keys and ensure consistency with `pyseekdb`.

MCP Tools Provided:
- `create_collection`: Create tables with a 384-dim HNSW vector index
- `add_documents`: Insert documents with auto-generated embeddings
- `search_collection`: Perform vector similarity search
- `list_collections`: List all tables with vector indexes
- `collection_info`: Get schema, row count, and embedding info

Includes a usage guide under `examples/mcp/seekdb/README.md` and unit tests for validation.
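For testing `search_collection`, an in-memory reference of what `ORDER BY COSINE_DISTANCE(...) ASC` with a limit should return can be handy. This sketch assumes cosine distance is `1 - cosine similarity` (SeekDB's exact definition is not stated in the PR) and, as its own choice, maps NaN distances from zero vectors to infinity so they sort last.

```rust
/// Euclidean norm of a vector.
fn l2(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Brute-force top-k by ascending cosine distance over in-memory rows.
fn top_k_by_cosine_distance<'a>(
    query: &[f32],
    rows: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let qn = l2(query);
    let mut scored: Vec<(&'a str, f32)> = rows
        .iter()
        .map(|(id, v)| {
            let dot: f32 = query.iter().zip(v).map(|(a, b)| a * b).sum();
            let d = 1.0 - dot / (qn * l2(v));
            // Zero vectors give 0/0 = NaN; push them to the end instead.
            (*id, if d.is_nan() { f32::INFINITY } else { d })
        })
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Because HNSW is an approximate index, comparing the server's results to this exact ranking is best done on small collections or with some tolerance for reordering.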
Closes #34