A high-performance MCP server that indexes codebases via tree-sitter AST parsing and provides token-efficient symbol retrieval. Reduces agent token costs by up to 99% by letting agents retrieve individual symbols instead of whole files.
Inspiration and thanks to jcodemunch-mcp — awesome work that inspired me to build my version of his original tool!
Go — single binary, true parallelism, SQLite storage, 13 languages.
AI coding agents (like Claude Code) are token-hungry when navigating codebases. Every time an agent needs to understand a function, it reads the entire file — even if it only needs 10 lines out of 500. This means:
- Massive token waste: A single function might be ~50 tokens, but reading the whole file costs ~2,000 tokens. Across a debugging session touching dozens of files, this adds up fast.
- Context window pollution: Agents fill their limited context window with irrelevant code, leaving less room for actual reasoning.
- Higher cost: More tokens = more API spend. For enterprise teams running agents at scale, this becomes a real budget problem.
code-scale-mcp is an MCP server that indexes codebases at the symbol level using tree-sitter AST parsing. Instead of reading whole files, agents can:
- Index once: parse the entire codebase into a SQLite database of individual symbols (functions, classes, methods, constants)
- Search precisely: find symbols by name, type, or full-text content with 3-layer search (BM25 → substring → fuzzy)
- Retrieve surgically: fetch just the one function they need, with byte-level precision
The result: up to 99% token reduction per retrieval, without losing any code comprehension quality. The agent gets exactly the code it needs, nothing more.
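To make the byte-level retrieval idea concrete, here is a minimal sketch of slicing one symbol out of a file by its byte range. It is not the project's actual code: the `Symbol` struct, `extract`, and `estimateTokens` are hypothetical names, the offsets are found with a plain substring search where the real tool would take them from the tree-sitter AST, and the 4-bytes-per-token ratio is only a rough assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// Symbol is a hypothetical index record: the byte range of one symbol in a file.
type Symbol struct {
	Name      string
	StartByte int // inclusive
	EndByte   int // exclusive
}

// extract returns only the bytes that belong to sym, or an error if the
// stored range does not fit inside the file contents.
func extract(src []byte, sym Symbol) ([]byte, error) {
	if sym.StartByte < 0 || sym.EndByte > len(src) || sym.StartByte > sym.EndByte {
		return nil, fmt.Errorf("symbol %q: byte range [%d,%d) out of bounds", sym.Name, sym.StartByte, sym.EndByte)
	}
	return src[sym.StartByte:sym.EndByte], nil
}

// estimateTokens applies a rough 4-bytes-per-token heuristic (an assumption,
// not a real tokenizer), rounding up.
func estimateTokens(b []byte) int {
	return (len(b) + 3) / 4
}

func main() {
	src := []byte("package demo\n\nfunc Add(a, b int) int {\n\treturn a + b\n}\n\nfunc Sub(a, b int) int {\n\treturn a - b\n}\n")

	// Stand-in for AST offsets: locate the function with a substring search.
	start := bytes.Index(src, []byte("func Add"))
	end := start + bytes.Index(src[start:], []byte("\n}\n")) + 2

	body, err := extract(src, Symbol{Name: "Add", StartByte: start, EndByte: end})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", body)
	fmt.Printf("file ~%d tokens, symbol ~%d tokens\n", estimateTokens(src), estimateTokens(body))
}
```

The saving grows with file size, because the cost of the slice depends only on the symbol's length and not on the file around it.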
- Go single binary: no Python/pip dependency chain, millisecond startup, true multi-core parallelism
- 13 languages supported via tree-sitter grammars
- File watching: auto-reindexes as you edit, keeping the index fresh
- Token savings tracking: every response reports how many tokens were saved and cost avoided
In short: it turns "read the whole file to find one function" into "fetch exactly that function", making AI coding agents faster, cheaper, and more effective at scale.
- 13 languages: Python, JavaScript, TypeScript, Go, Rust, Java, PHP, C, C++, Ruby, Kotlin, Swift, Lua
- 15 MCP tools: Index repos/folders, search symbols, retrieve source code, file watching, batch operations
- SQLite + FTS5: Structured queries and full-text search, no file limits
- Parallel parsing: True multi-core via goroutines (3-5x faster than Python)
- Single binary: No Python/pip dependency chain, millisecond startup
- Dual transport: stdio (default) for CLI tools, SSE/HTTP for web clients
- File watching: Auto-reindex on file changes via fsnotify
- 9-layer security: Path traversal prevention, secret detection, binary filtering, gitignore respect
- 3-tier summarization: Docstring extraction → AI (Claude Haiku / Gemini Flash) → signature fallback
- 3-layer search: FTS5 BM25 ranking → substring scoring → fuzzy Levenshtein matching
- Smart truncation: 60/40 head/tail truncation for large symbols, preserving errors at output end
- Batch operations: Execute multiple tool calls in a single MCP request
- Progressive throttling: Per-tool rate limiting nudges agents toward efficient batch usage
- Stale cleanup: Auto-removes orphaned indexes on startup
- Token savings tracking: Reports tokens saved and cost avoided per request
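The 60/40 head/tail truncation listed above can be sketched as follows. This is an illustration of the general technique, not the project's implementation: `truncateHeadTail` and `marker` are hypothetical names, the split is byte-based (so it can cut a multi-byte UTF-8 character), and the marker text is added on top of the budget.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is a hypothetical placeholder inserted where content was dropped.
const marker = "\n... [truncated] ...\n"

// truncateHeadTail keeps roughly the first 60% and last 40% of the byte
// budget max, so that trailing content (such as an error at the end of
// the output) survives truncation.
func truncateHeadTail(s string, max int) string {
	if len(s) <= max {
		return s
	}
	if max <= 0 {
		return ""
	}
	head := max * 60 / 100
	tail := max - head
	return s[:head] + marker + s[len(s)-tail:]
}

func main() {
	long := strings.Repeat("line of code\n", 50) + "panic: something failed at the end\n"
	fmt.Println(truncateHeadTail(long, 100))
}
```

Keeping a tail at all is the point of the design: a plain prefix cut would discard whatever sits at the end of a large symbol or log.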
- Go 1.24+ (for building from source)
- CGo-capable toolchain (tree-sitter requires CGo)
- Optional: `ANTHROPIC_API_KEY` or `GEMINI_API_KEY` for AI-powered summaries
- Optional: `GITHUB_TOKEN` for indexing private GitHub repos
| Variable | Default | Description |
|---|---|---|
| `GITHUB_TOKEN` | - | GitHub personal access token for private repos and higher rate limits |
| `ANTHROPIC_API_KEY` | - | Anthropic API key for AI-powered symbol summaries |
| `GEMINI_API_KEY` | - | Google Gemini API key for AI-powered symbol summaries |
| `CODE_INDEX_PATH` | `~/.code-index` | Directory for SQLite database and cached content files |
| `CODE_SCALE_ALLOWED_ROOTS` | - | Colon-separated (`;` on Windows) list of allowed root directories for indexing/watching. When set, only paths under these roots can be indexed. When unset, system directories (`/etc`, `/usr`, `/var`, `/root`, etc.) are denied by default. |
| `CODE_SCALE_AUTH_TOKEN` | - | Bearer token for SSE transport authentication. When set, all SSE requests must include `Authorization: Bearer <token>`. When unset, SSE mode runs without authentication (a warning is logged). |
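As an illustration of how an allow-list like `CODE_SCALE_ALLOWED_ROOTS` can be enforced, the sketch below splits a path list with the platform's separator and checks whether a candidate path sits under one of the roots. The function name `underAnyRoot` and the example directories are hypothetical, and this is only one plausible approach; in particular it does not resolve symlinks, which a hardened check would also need to do.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// underAnyRoot reports whether path is equal to, or nested inside, one of roots.
// Paths are made absolute and cleaned first, so "root/../x" cannot escape.
func underAnyRoot(path string, roots []string) bool {
	absPath, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	for _, root := range roots {
		absRoot, err := filepath.Abs(root)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(absRoot, absPath)
		if err != nil {
			continue
		}
		// A relative path that starts with ".." points outside the root.
		if rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))) {
			return true
		}
	}
	return false
}

func main() {
	// Build an example value using the OS list separator (":" on Unix, ";" on Windows).
	value := strings.Join([]string{"/home/dev/projects", "/srv/code"}, string(filepath.ListSeparator))
	roots := filepath.SplitList(value)

	for _, p := range []string{
		"/home/dev/projects/app",
		"/home/dev/projects/../secrets",
		"/etc/passwd",
	} {
		fmt.Printf("%-35s allowed=%v\n", p, underAnyRoot(p, roots))
	}
}
```

Comparing with `filepath.Rel` instead of a raw string prefix avoids treating `/home/dev/projects-evil` as if it were inside `/home/dev/projects`.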
Install with go install:
go install github.com/syphon1c/code-scale-mcp/cmd/code-scale-mcp@latest
Or build from source:
git clone https://github.com/syphon1c/code-scale-mcp
cd code-scale-mcp
make build
Or download a pre-built binary from Releases for your platform.
1. Build and place the binary
make build
mkdir -p ~/Desktop/mcp_servers/code-scale-mcp
cp bin/code-scale-mcp ~/Desktop/mcp_servers/code-scale-mcp/
2. Add .mcp.json to your project root
cat > /path/to/your/project/.mcp.json << 'EOF'
{
  "mcpServers": {
    "code-scale": {
      "command": "~/Desktop/mcp_servers/code-scale-mcp/code-scale-mcp",
      "args": []
    }
  }
}
EOF
3. Start a new Claude Code session and index
> Use index_folder to index this project
That's it. The MCP server starts automatically when Claude Code detects .mcp.json. You'll be prompted to approve it on first use.
Tip: After indexing, enable file watching so the index stays up to date as you edit code:
> Use watch_folder to watch this project for changes

Without this, you'll need to re-run `index_folder` after every code change to update the index.
Note: You must restart your Claude Code session after adding or modifying `.mcp.json`. The `.mcp.json` file is per-project, so add it to each project you want to use code-scale with. Alternatively, copy the binary to `/usr/local/bin/` and use just `"command": "code-scale-mcp"`.
Try it out: See PROMPTS.md for ready-to-use test prompts covering every tool — great for verifying your setup or exploring what code-scale-mcp can do.
Once indexed, use symbol-level retrieval instead of reading entire files:
get_repo_outline → high-level structure (dirs, languages, symbol counts)
get_file_outline → all symbols in a file without reading the whole file
search_symbols → find functions/classes by name (FTS5 BM25 → substring → fuzzy)
get_symbol → fetch just one function's source code (~50 tokens vs ~2,000)
search_text → full-text search with contextual snippet windows
batch_execute → combine multiple operations into a single call
Every response includes _meta with tokens_saved, total_tokens_saved, and cost_avoided so you can track the savings.
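A client that wants to log these savings could decode the `_meta` object along the lines below. The three field names come from the text above; everything else is an assumption made for illustration, including the numeric types, the `Meta` and `parseMeta` names, and the sample payload, which is made up and not real server output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Meta mirrors the three _meta fields named in the README.
// The Go types are assumptions; the server may encode them differently.
type Meta struct {
	TokensSaved      int     `json:"tokens_saved"`
	TotalTokensSaved int     `json:"total_tokens_saved"`
	CostAvoided      float64 `json:"cost_avoided"`
}

// response captures only the part of a tool result this sketch cares about.
type response struct {
	Meta Meta `json:"_meta"`
}

// parseMeta extracts the _meta block from a JSON payload.
func parseMeta(payload []byte) (Meta, error) {
	var r response
	if err := json.Unmarshal(payload, &r); err != nil {
		return Meta{}, fmt.Errorf("decode response: %w", err)
	}
	return r.Meta, nil
}

func main() {
	// Hypothetical payload, for illustration only.
	payload := []byte(`{"_meta":{"tokens_saved":1950,"total_tokens_saved":4000,"cost_avoided":0.5}}`)

	meta, err := parseMeta(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("saved %d tokens this call, %d in total\n", meta.TokensSaved, meta.TotalTokensSaved)
}
```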
The SKILL.md file teaches your AI agent when and how to use code-scale-mcp's tools effectively. Without it, the agent has the tools available but won't know the optimal workflow (index → explore → search → retrieve).
Claude Code (Cowork)
Copy the skill into your project's .claude/skills/ directory:
mkdir -p .claude/skills/code-scale-mcp
cp /path/to/code-scale-mcp/SKILL.md .claude/skills/code-scale-mcp/SKILL.md
Or use the pre-packaged zip:
unzip code-scale-mcp_cowork.zip -d .claude/skills/
The skill will be automatically loaded by Claude Code on the next session.
Cursor / Windsurf
Add the SKILL.md contents to your project's rules file:
# Cursor
cat /path/to/code-scale-mcp/SKILL.md >> .cursor/rules/code-scale-mcp.md
# Windsurf
cat /path/to/code-scale-mcp/SKILL.md >> .windsurfrules
Other agents
For any agent that supports system prompts or project-level instructions, include the contents of SKILL.md in your agent's configuration. The key information it provides:
- When to trigger symbol retrieval vs. reading whole files
- The correct workflow order (index → explore → search → retrieve)
- Tool reference with required/optional arguments
- Symbol ID format for `get_symbol` calls
# Without authentication (warning logged)
code-scale-mcp --transport=sse --port=8080
# With authentication (recommended)
CODE_SCALE_AUTH_TOKEN=your-secret-token code-scale-mcp --transport=sse --port=8080

| Flag | Default | Description |
|---|---|---|
| `--transport` | `stdio` | Transport type: `stdio` or `sse` |
| `--port` | `8080` | Port for SSE transport |
| `--version` | - | Show version and exit |
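For readers curious what bearer-token protection on an HTTP transport typically looks like, here is a generic sketch of the pattern. It is not taken from code-scale-mcp's source: `validBearer` and `requireBearer` are hypothetical names, and the sketch assumes the middleware is only installed when a token is configured, matching the documented behaviour that an unset `CODE_SCALE_AUTH_TOKEN` means no authentication.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

// validBearer reports whether header has the form "Bearer <token>" with the
// expected token. The comparison is constant-time to avoid timing leaks.
func validBearer(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// requireBearer wraps next and rejects requests without a valid token.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validBearer(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Wrapping a handler; a real server would pass this to http.ListenAndServe.
	_ = requireBearer("your-secret-token", http.NotFoundHandler())

	fmt.Println(validBearer("Bearer your-secret-token", "your-secret-token"))
	fmt.Println(validBearer("Bearer wrong", "your-secret-token"))
}
```

On the client side, the only requirement stated by the README is that each SSE request carries the `Authorization: Bearer <token>` header.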
| Tool | Description |
|---|---|
| `index_repo` | Index a GitHub repository for symbol retrieval |
| `index_folder` | Index a local folder for symbol retrieval |
| `list_repos` | List all indexed repositories |
| `get_file_tree` | Get file structure of an indexed repo |
| `get_file_outline` | Get hierarchical symbol outline for a file (supports flat mode) |
| `get_symbol` | Retrieve full source code of a symbol by ID |
| `get_symbols` | Batch retrieve source code for multiple symbols |
| `search_symbols` | Search symbols with 3-layer fallback (FTS5 BM25, substring, fuzzy) |
| `search_text` | Full-text search with optional snippet context windows |
| `batch_execute` | Execute multiple operations in a single call (max 10) |
| `get_repo_outline` | High-level overview of an indexed repo |
| `invalidate_cache` | Delete index and cached files for a repo |
| `watch_folder` | Start watching a folder for auto-reindex |
| `unwatch_folder` | Stop watching a folder |
| `list_watches` | List active folder watches |
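The layered fallback behind `search_symbols` can be pictured with the simplified sketch below. It is an assumption-laden stand-in and not the project's implementation: the first tier here is a case-insensitive exact match, used in place of FTS5 BM25 ranking (which needs SQLite), and `search`, `levenshtein`, and the `maxDist` threshold are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b using two rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// search tries each tier in order and returns the first tier that produces
// hits: exact (stand-in for BM25), then substring, then fuzzy.
func search(names []string, query string, maxDist int) (string, []string) {
	q := strings.ToLower(query)
	tiers := []struct {
		name  string
		match func(name string) bool
	}{
		{"exact", func(n string) bool { return n == q }},
		{"substring", func(n string) bool { return strings.Contains(n, q) }},
		{"fuzzy", func(n string) bool { return levenshtein(n, q) <= maxDist }},
	}
	for _, t := range tiers {
		var hits []string
		for _, n := range names {
			if t.match(strings.ToLower(n)) {
				hits = append(hits, n)
			}
		}
		if len(hits) > 0 {
			return t.name, hits
		}
	}
	return "none", nil
}

func main() {
	names := []string{"ParseFile", "parseConfig", "WriteIndex"}
	for _, q := range []string{"parsefile", "config", "WriteIndx"} {
		tier, hits := search(names, q, 2)
		fmt.Printf("%-10s tier=%-9s hits=%v\n", q, tier, hits)
	}
}
```

Ordering the tiers from strictest to loosest means a typo-tolerant match is only attempted when the cheaper, more precise tiers find nothing.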
| Language | Extensions | Symbols Extracted |
|---|---|---|
| Python | `.py` | functions, classes, methods, constants, decorators |
| JavaScript | `.js`, `.jsx` | functions, classes, methods, constants |
| TypeScript | `.ts`, `.tsx` | functions, classes, methods, interfaces, enums, types |
| Go | `.go` | functions, methods, types, constants |
| Rust | `.rs` | functions, structs, enums, traits, impls, types |
| Java | `.java` | methods, constructors, classes, interfaces, enums |
| PHP | `.php` | functions, classes, methods, interfaces, traits, enums |
| C | `.c`, `.h` | functions, structs, enums, typedefs |
| C++ | `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh` | functions, classes, structs, enums, namespaces |
| Ruby | `.rb` | methods, classes, modules |
| Kotlin | `.kt`, `.kts` | functions, classes, objects, interfaces |
| Swift | `.swift` | functions, classes, structs, protocols, enums |
| Lua | `.lua` | functions |
code-scale-mcp/
├── cmd/code-scale-mcp/main.go # Entry point, CLI flags, transport selection
├── internal/
│ ├── parser/ # Tree-sitter AST parsing (13 languages)
│ ├── security/ # 9-layer security filtering
│ ├── storage/ # SQLite + FTS5 index store
│ ├── summarizer/ # 3-tier symbol summarization
│ ├── github/ # GitHub API client
│ ├── watcher/ # fsnotify file watcher
│ ├── server/ # MCP server setup + tool registration
│ ├── tools/ # 15 MCP tool handlers
│ ├── snippet/ # Context window extraction for search results
│ ├── truncate/ # Smart 60/40 head/tail output truncation
│ ├── search/ # Fuzzy Levenshtein matching
│ └── ratelimit/ # Progressive per-tool throttling
└── testdata/ # Test fixtures for all 13 languages
make build # Build binary
make fmt # Run fmt
make test # Run tests
make lint # Run linter
make clean # Clean build artifacts
MIT