code-scale-mcp

A high-performance MCP server that indexes codebases via tree-sitter AST parsing and provides token-efficient symbol retrieval. Reduces agent token costs by up to 99% by letting agents retrieve individual symbols instead of whole files.

Inspired by jcodemunch-mcp, with thanks to its author: awesome work that prompted me to build my own version of the original tool!

Go — single binary, true parallelism, SQLite storage, 13 languages.

The Problem

AI coding agents (like Claude Code) are token-hungry when navigating codebases. Every time an agent needs to understand a function, it reads the entire file — even if it only needs 10 lines out of 500. This means:

Massive token waste: A single function might be ~50 tokens, but reading the whole file costs ~2,000 tokens. Across a debugging session touching dozens of files, this adds up fast.
Context window pollution: Agents fill their limited context window with irrelevant code, leaving less room for actual reasoning.
Higher cost: More tokens = more API spend. For enterprise teams running agents at scale, this becomes a real budget problem.
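
To make the arithmetic concrete, here is a small sketch that turns the rough figures above into a savings percentage. The function name `savedPercent` is made up for illustration and is not part of code-scale-mcp.

```go
package main

import "fmt"

// savedPercent returns the share of tokens avoided when an agent fetches
// symbolTokens instead of reading all fileTokens. Hypothetical helper.
func savedPercent(symbolTokens, fileTokens int) float64 {
	if fileTokens <= 0 {
		return 0
	}
	return 100 * float64(fileTokens-symbolTokens) / float64(fileTokens)
}

func main() {
	// Rough figures from the text: one function (~50 tokens)
	// versus the whole file (~2,000 tokens).
	fmt.Printf("%.1f%% saved\n", savedPercent(50, 2000))
}
```

With those two numbers the saving works out to 97.5% for a single retrieval; larger files with small target symbols push the figure higher.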

The Solution

code-scale-mcp is an MCP server that indexes codebases at the symbol level using tree-sitter AST parsing. Instead of reading whole files, agents can:

Index once: parse the entire codebase into a SQLite database of individual symbols (functions, classes, methods, constants)
Search precisely: find symbols by name, type, or full-text content with 3-layer search (BM25 → substring → fuzzy)
Retrieve surgically: fetch just the one function they need, with byte-level precision

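The result: up to 99% token reduction per retrieval, without losing code comprehension quality. Using the rough figures above (~50 tokens for one function versus ~2,000 for its file), a single retrieval saves about 97.5%. The agent gets exactly the code it needs, nothing more.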
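
The "byte-level precision" idea can be sketched as follows. This is a minimal illustration, assuming the index stores a start and end byte offset per symbol; the `span` type and `extract` function are hypothetical names, not the project's actual storage schema or API.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// span is a hypothetical index record: the half-open byte range
// [Start, End) that one symbol occupies in its source file.
type span struct {
	Start, End int
}

// extract returns only the bytes of src covered by s.
func extract(src []byte, s span) ([]byte, error) {
	if s.Start < 0 || s.End < s.Start || s.End > len(src) {
		return nil, errors.New("span out of range")
	}
	return src[s.Start:s.End], nil
}

func main() {
	src := []byte("package demo\n\nfunc A() {}\n\nfunc B() int { return 1 }\n")

	// In a real index the offsets would come from the parser.
	// Here they are located by searching for the target text.
	target := []byte("func B() int { return 1 }")
	start := bytes.Index(src, target)

	out, err := extract(src, span{Start: start, End: start + len(target)})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

Only the bytes of `func B` are returned to the caller; the rest of the file never enters the agent's context.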

Key Design Choices

Go single binary: no Python/pip dependency chain, millisecond startup, true multi-core parallelism
13 languages supported via tree-sitter grammars
File watching: auto-reindexes as you edit, keeping the index fresh
Token savings tracking: every response reports how many tokens were saved and cost avoided

In short: it turns "read the whole file to find one function" into "fetch exactly that function", making AI coding agents faster, cheaper, and more effective at scale.

Features

13 languages: Python, JavaScript, TypeScript, Go, Rust, Java, PHP, C, C++, Ruby, Kotlin, Swift, Lua
15 MCP tools: Index repos/folders, search symbols, retrieve source code, file watching, batch operations
SQLite + FTS5: Structured queries and full-text search, no file limits
Parallel parsing: True multi-core via goroutines (3-5x faster than Python)
Single binary: No Python/pip dependency chain, millisecond startup
Dual transport: stdio (default) for CLI tools, SSE/HTTP for web clients
File watching: Auto-reindex on file changes via fsnotify
9-layer security: Path traversal prevention, secret detection, binary filtering, gitignore respect
3-tier summarization: Docstring extraction → AI (Claude Haiku / Gemini Flash) → signature fallback
3-layer search: FTS5 BM25 ranking → substring scoring → fuzzy Levenshtein matching
Smart truncation: 60/40 head/tail truncation for large symbols, preserving errors that appear at the end of the output
Batch operations: Execute multiple tool calls in a single MCP request
Progressive throttling: Per-tool rate limiting nudges agents toward efficient batch usage
Stale cleanup: Auto-removes orphaned indexes on startup
Token savings tracking: Reports tokens saved and cost avoided per request
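
As an illustration of the 60/40 head/tail truncation listed above, here is one way such a split could work. It is a sketch under assumptions: the limit is counted in runes, the marker text is invented, and the real implementation may split on lines or tokens instead.

```go
package main

import "fmt"

// marker is a made-up placeholder for the elided middle section.
const marker = "\n... [truncated] ...\n"

// truncate keeps roughly 60% of the budget from the start of s and the
// remaining 40% from the end, so trailing output (often errors) survives.
// A limit of zero or less means "no limit".
func truncate(s string, limit int) string {
	r := []rune(s)
	if limit <= 0 || len(r) <= limit {
		return s
	}
	head := limit * 60 / 100
	tail := limit - head
	return string(r[:head]) + marker + string(r[len(r)-tail:])
}

func main() {
	long := "line 1\nline 2\nline 3\nline 4\nline 5\nerror: something failed"
	fmt.Println(truncate(long, 30))
}
```

Keeping the tail matters because compiler and test output tends to put the most useful information last.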

Requirements

Go 1.24+ (for building from source)
CGo-capable toolchain (tree-sitter requires CGo)
Optional: ANTHROPIC_API_KEY or GEMINI_API_KEY for AI-powered summaries
Optional: GITHUB_TOKEN for indexing private GitHub repos

Environment Variables

Variable	Default	Description
`GITHUB_TOKEN`	—	GitHub personal access token for private repos and higher rate limits
`ANTHROPIC_API_KEY`	—	Anthropic API key for AI-powered symbol summaries
`GEMINI_API_KEY`	—	Google Gemini API key for AI-powered symbol summaries
`CODE_INDEX_PATH`	`~/.code-index`	Directory for SQLite database and cached content files
`CODE_SCALE_ALLOWED_ROOTS`	—	Colon-separated (`;` on Windows) list of allowed root directories for indexing/watching. When set, only paths under these roots can be indexed. When unset, system directories (`/etc`, `/usr`, `/var`, `/root`, etc.) are denied by default.
`CODE_SCALE_AUTH_TOKEN`	—	Bearer token for SSE transport authentication. When set, all SSE requests must include `Authorization: Bearer <token>`. When unset, SSE mode runs without authentication (a warning is logged).
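
The allowed-roots rule can be pictured with a short sketch. It assumes the check is a plain "is this path inside one of the listed directories" test; `underAnyRoot` is a hypothetical name, and the project's real check has more layers (and may resolve symlinks, which this sketch does not).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// underAnyRoot reports whether path lies inside one of the directories in
// list. list uses the platform path-list separator (":" on Unix, ";" on
// Windows), which is what filepath.SplitList expects.
func underAnyRoot(path, list string) bool {
	abs, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	for _, root := range filepath.SplitList(list) {
		rootAbs, err := filepath.Abs(root)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(rootAbs, abs)
		if err != nil {
			continue
		}
		// A relative path that climbs out of the root starts with "..".
		outside := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if !outside {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical value for CODE_SCALE_ALLOWED_ROOTS.
	roots := strings.Join(
		[]string{"/srv/code", "/home/me/projects"},
		string(filepath.ListSeparator),
	)
	fmt.Println(underAnyRoot("/srv/code/app/main.go", roots))
	fmt.Println(underAnyRoot("/srv/code/../../etc/passwd", roots))
}
```

Cleaning the path before comparing it is what stops `..` segments from escaping an allowed root.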

Installation

From source

go install github.com/syphon1c/code-scale-mcp/cmd/code-scale-mcp@latest

Build from repo

git clone https://github.com/syphon1c/code-scale-mcp
cd code-scale-mcp
make build

Pre-built binaries

Download from Releases for your platform.

Getting Started (3 steps)

1. Build and place the binary

make build
mkdir -p ~/Desktop/mcp_servers/code-scale-mcp
cp bin/code-scale-mcp ~/Desktop/mcp_servers/code-scale-mcp/

2. Add .mcp.json to your project root

cat > /path/to/your/project/.mcp.json << 'EOF'
{
  "mcpServers": {
    "code-scale": {
      "command": "~/Desktop/mcp_servers/code-scale-mcp/code-scale-mcp",
      "args": []
    }
  }
}
EOF

3. Start a new Claude Code session and index

> Use index_folder to index this project

That's it. The MCP server starts automatically when Claude Code detects .mcp.json. You'll be prompted to approve it on first use.

Tip: After indexing, enable file watching so the index stays up-to-date as you edit code:
> Use watch_folder to watch this project for changes
Without this, you'll need to re-run index_folder after every code change to update the index.

Note: You must restart your Claude Code session after adding or modifying .mcp.json. The .mcp.json is per-project — add it to each project you want to use code-scale with. Alternatively, copy the binary to /usr/local/bin/ and use just "command": "code-scale-mcp".

Usage

Try it out: See PROMPTS.md for ready-to-use test prompts covering every tool — great for verifying your setup or exploring what code-scale-mcp can do.

Once indexed, use symbol-level retrieval instead of reading entire files:

get_repo_outline  → high-level structure (dirs, languages, symbol counts)
get_file_outline  → all symbols in a file without reading the whole file
search_symbols    → find functions/classes by name (FTS5 BM25 → substring → fuzzy)
get_symbol        → fetch just one function's source code (~50 tokens vs ~2,000)
search_text       → full-text search with contextual snippet windows
batch_execute     → combine multiple operations into a single call
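
The last search layer, fuzzy matching, relies on Levenshtein edit distance. The sketch below is a textbook two-row implementation for illustration only; it is not the project's code, and how distances are turned into a ranking is not shown.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-rune insertions, deletions,
// and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	// A misspelled query is still one edit away from the real symbol name.
	fmt.Println(levenshtein("parseConfg", "parseConfig"))
}
```

A small distance lets a search recover from typos that exact and substring matching would miss.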

Every response includes _meta with tokens_saved, total_tokens_saved, and cost_avoided so you can track the savings.
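
If you want to log these figures from your own client, a sketch like the following may help. The three field names come from the sentence above; everything else is an assumption: that `_meta` is a top-level JSON object, that the values are numeric, and the sample payload itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// meta mirrors the three documented fields. The Go types are guesses.
type meta struct {
	TokensSaved      int     `json:"tokens_saved"`
	TotalTokensSaved int     `json:"total_tokens_saved"`
	CostAvoided      float64 `json:"cost_avoided"`
}

// response assumes _meta sits at the top level of the tool result.
type response struct {
	Meta meta `json:"_meta"`
}

// parseMeta extracts the savings block from a raw JSON tool response.
func parseMeta(raw []byte) (meta, error) {
	var r response
	if err := json.Unmarshal(raw, &r); err != nil {
		return meta{}, err
	}
	return r.Meta, nil
}

func main() {
	// Made-up payload for illustration only.
	raw := []byte(`{"_meta":{"tokens_saved":1950,"total_tokens_saved":3900,"cost_avoided":0.012}}`)
	m, err := parseMeta(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("saved %d tokens this call, %d in total\n", m.TokensSaved, m.TotalTokensSaved)
}
```

Check a real response before relying on these types; adjust the struct if the actual shape differs.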

Skill Installation (Recommended)

The SKILL.md file teaches your AI agent when and how to use code-scale-mcp's tools effectively. Without it, the agent has the tools available but won't know the optimal workflow (index → explore → search → retrieve).

Claude Code (Cowork)

Copy the skill into your project's .claude/skills/ directory:

mkdir -p .claude/skills/code-scale-mcp
cp /path/to/code-scale-mcp/SKILL.md .claude/skills/code-scale-mcp/SKILL.md

Or use the pre-packaged zip:

unzip code-scale-mcp_cowork.zip -d .claude/skills/

Claude Code loads the skill automatically at the start of the next session.

Cursor / Windsurf

Add the SKILL.md contents to your project's rules file:

# Cursor (create the rules directory first if it does not exist)
mkdir -p .cursor/rules
cat /path/to/code-scale-mcp/SKILL.md >> .cursor/rules/code-scale-mcp.md

# Windsurf
cat /path/to/code-scale-mcp/SKILL.md >> .windsurfrules

Other agents

For any agent that supports system prompts or project-level instructions, include the contents of SKILL.md in your agent's configuration. The key information it provides:

When to trigger symbol retrieval vs. reading whole files
The correct workflow order (index → explore → search → retrieve)
Tool reference with required/optional arguments
Symbol ID format for get_symbol calls

SSE/HTTP mode

# Without authentication (warning logged)
code-scale-mcp --transport=sse --port=8080

# With authentication (recommended)
CODE_SCALE_AUTH_TOKEN=your-secret-token code-scale-mcp --transport=sse --port=8080
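
A client connecting to an authenticated SSE server has to send the bearer header described under Environment Variables. The sketch below only builds such a request. The helper name `newSSERequest` and the `/sse` path are assumptions for illustration; use the endpoint your MCP client is actually configured for.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newSSERequest builds a GET request for an event stream and attaches the
// bearer token when one is provided. Hypothetical helper.
func newSSERequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "text/event-stream")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// The /sse path is an assumption, not taken from the project docs.
	req, err := newSSERequest("http://localhost:8080/sse", os.Getenv("CODE_SCALE_AUTH_TOKEN"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL, "auth header set:", req.Header.Get("Authorization") != "")
}
```

Reading the token from the same environment variable on both sides keeps it out of shell history and source control.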

CLI flags

Flag	Default	Description
`--transport`	`stdio`	Transport type: `stdio` or `sse`
`--port`	`8080`	Port for SSE transport
`--version`		Show version and exit

MCP Tools

Tool	Description
`index_repo`	Index a GitHub repository for symbol retrieval
`index_folder`	Index a local folder for symbol retrieval
`list_repos`	List all indexed repositories
`get_file_tree`	Get file structure of an indexed repo
`get_file_outline`	Get hierarchical symbol outline for a file (supports flat mode)
`get_symbol`	Retrieve full source code of a symbol by ID
`get_symbols`	Batch retrieve source code for multiple symbols
`search_symbols`	Search symbols with 3-layer fallback (FTS5 BM25, substring, fuzzy)
`search_text`	Full-text search with optional snippet context windows
`batch_execute`	Execute multiple operations in a single call (max 10)
`get_repo_outline`	High-level overview of an indexed repo
`invalidate_cache`	Delete index and cached files for a repo
`watch_folder`	Start watching a folder for auto-reindex
`unwatch_folder`	Stop watching a folder
`list_watches`	List active folder watches
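
Because `batch_execute` accepts at most 10 operations per call, a client with a longer work list has to split it. Here is a minimal sketch of that splitting. Operations are represented as plain strings for brevity; the real request shape is not shown here, and `chunk` is a hypothetical helper.

```go
package main

import "fmt"

// maxBatch is the per-call limit stated in the tool table.
const maxBatch = 10

// chunk splits ops into consecutive groups of at most size items.
func chunk(ops []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for len(ops) > size {
		out = append(out, ops[:size])
		ops = ops[size:]
	}
	if len(ops) > 0 {
		out = append(out, ops)
	}
	return out
}

func main() {
	// 23 placeholder operations become batches of 10, 10, and 3.
	ops := make([]string, 23)
	for i := range ops {
		ops[i] = fmt.Sprintf("get_symbol #%d", i+1)
	}
	for i, batch := range chunk(ops, maxBatch) {
		fmt.Printf("batch %d: %d operations\n", i+1, len(batch))
	}
}
```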

Supported Languages

Language	Extensions	Symbols Extracted
Python	`.py`	functions, classes, methods, constants, decorators
JavaScript	`.js`, `.jsx`	functions, classes, methods, constants
TypeScript	`.ts`, `.tsx`	functions, classes, methods, interfaces, enums, types
Go	`.go`	functions, methods, types, constants
Rust	`.rs`	functions, structs, enums, traits, impls, types
Java	`.java`	methods, constructors, classes, interfaces, enums
PHP	`.php`	functions, classes, methods, interfaces, traits, enums
C	`.c`, `.h`	functions, structs, enums, typedefs
C++	`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh`	functions, classes, structs, enums, namespaces
Ruby	`.rb`	methods, classes, modules
Kotlin	`.kt`, `.kts`	functions, classes, objects, interfaces
Swift	`.swift`	functions, classes, structs, protocols, enums
Lua	`.lua`	functions
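
The extension column above amounts to a lookup table. The sketch below restates it in Go so you can check which files would be picked up. The names `languageByExt` and `detectLanguage` are hypothetical, and the real parser may apply extra rules (for example, case handling) not captured here.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// languageByExt restates the extension column of the table above.
var languageByExt = map[string]string{
	".py": "Python",
	".js": "JavaScript", ".jsx": "JavaScript",
	".ts": "TypeScript", ".tsx": "TypeScript",
	".go":   "Go",
	".rs":   "Rust",
	".java": "Java",
	".php":  "PHP",
	".c":    "C", ".h": "C",
	".cpp": "C++", ".cc": "C++", ".cxx": "C++", ".hpp": "C++", ".hh": "C++",
	".rb": "Ruby",
	".kt": "Kotlin", ".kts": "Kotlin",
	".swift": "Swift",
	".lua":   "Lua",
}

// detectLanguage maps a file path to a language by exact extension match.
func detectLanguage(path string) (string, bool) {
	lang, ok := languageByExt[filepath.Ext(path)]
	return lang, ok
}

func main() {
	for _, p := range []string{"internal/parser/parser.go", "web/view.tsx", "README.md"} {
		if lang, ok := detectLanguage(p); ok {
			fmt.Printf("%s: %s\n", p, lang)
		} else {
			fmt.Printf("%s: not indexed\n", p)
		}
	}
}
```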

Architecture

code-scale-mcp/
├── cmd/code-scale-mcp/main.go     # Entry point, CLI flags, transport selection
├── internal/
│   ├── parser/                     # Tree-sitter AST parsing (13 languages)
│   ├── security/                   # 9-layer security filtering
│   ├── storage/                    # SQLite + FTS5 index store
│   ├── summarizer/                 # 3-tier symbol summarization
│   ├── github/                     # GitHub API client
│   ├── watcher/                    # fsnotify file watcher
│   ├── server/                     # MCP server setup + tool registration
│   ├── tools/                      # 15 MCP tool handlers
│   ├── snippet/                    # Context window extraction for search results
│   ├── truncate/                   # Smart 60/40 head/tail output truncation
│   ├── search/                     # Fuzzy Levenshtein matching
│   └── ratelimit/                  # Progressive per-tool throttling
└── testdata/                       # Test fixtures for all 13 languages

Development

make build     # Build binary
make fmt       # Format source code
make test      # Run tests
make lint      # Run linter
make clean     # Clean build artifacts

License

MIT
