Codebaxing

MCP server for semantic code search. Index your codebase once, then search using natural language.

How It Works

Your Code → Tree-sitter Parser → Symbols → Embedding Model → Vectors → ChromaDB
                                                                           ↓
"find auth logic" → Embedding → Query Vector → Similarity Search → Results

Traditional search matches exact text. Codebaxing understands meaning:

Query	Finds (even without exact match)
"authentication"	login(), validateCredentials(), authMiddleware()
"database connection"	connectDB(), prismaClient, repository.query()

Quick Start

1. Start ChromaDB

docker run -d -p 8000:8000 --name chromadb chromadb/chroma

2. Index Your Codebase (CLI)

npx codebaxing@latest index /path/to/your/project

This creates a .codebaxing/ folder with the index. Only needs to be done once per project.

Performance note: Local embedding is slow (~4 min for ~4,000 files). For faster indexing, use Gemini embedding (free) — see Cloud Embedding below.

3. Install MCP Server for AI Editors

npx codebaxing install              # Claude Desktop
npx codebaxing install --cursor     # Cursor
npx codebaxing install --windsurf   # Windsurf
npx codebaxing install --all        # All editors

Restart your editor. Now you can ask: "Find the authentication logic"

CLI Commands

Command	Description
`npx codebaxing@latest index <path>`	Index a codebase (required first)
`npx codebaxing search <query>`	Search indexed code
`npx codebaxing stats [path]`	Show index statistics
`npx codebaxing clean [path]`	Remove index (reset)
`npx codebaxing install [--editor]`	Install MCP server
`npx codebaxing uninstall [--editor]`	Uninstall MCP server

Tip: Use @latest for index to ensure you have the newest version.

Search Options

npx codebaxing search "auth middleware" --path ./src --limit 10

--path, -p - Codebase path (default: current directory)
--limit, -n - Number of results (default: 5)

MCP Tools (for AI Agents)

After installing, AI agents can use these tools:

Tool	Description
`search`	Semantic code search
`stats`	Index statistics
`languages`	Supported file extensions
`remember`	Store project memory
`recall`	Retrieve memories
`forget`	Delete memories

Note: The index tool is disabled for AI agents. Use CLI: npx codebaxing@latest index <path>

Configuration

Cloud Embedding (Fastest)

Local embedding runs on CPU and can be slow for large codebases (~4 min for ~4,000 files). Cloud embedding is ~25x faster and recommended for any project with 1,000+ files.

# Gemini (FREE - recommended, 1500 RPM free tier)
CODEBAXING_EMBEDDING_PROVIDER=gemini GEMINI_API_KEY=... npx codebaxing@latest index /path

# OpenAI (text-embedding-3-small, 384 dims)
CODEBAXING_EMBEDDING_PROVIDER=openai OPENAI_API_KEY=sk-... npx codebaxing@latest index /path

# Voyage (voyage-code-3, 1024 dims, code-optimized)
CODEBAXING_EMBEDDING_PROVIDER=voyage VOYAGE_API_KEY=va-... npx codebaxing@latest index /path

Provider	Model	Speed	Cost
Gemini	`text-embedding-004` (768 dims)	~10,000 texts/sec	Free (1500 RPM)
OpenAI	`text-embedding-3-small` (384 dims)	~10,000 texts/sec	~$0.02 / 1M tokens
Voyage	`voyage-code-3` (1024 dims)	~10,000 texts/sec	~$0.06 / 1M tokens
Local	`all-MiniLM-L6-v2` (384 dims)	~200 texts/sec	Free (CPU)

Note: Switching between providers requires full re-index (npx codebaxing@latest index <path>) due to dimension differences.

Environment Variables

Variable	Description	Default
`CHROMADB_URL`	ChromaDB server URL	`http://localhost:8000`
`CODEBAXING_EMBEDDING_PROVIDER`	Embedding backend: `local`, `gemini`, `openai`, `voyage`	`local`
`CODEBAXING_DEVICE`	Compute device (local only): `cpu`, `cuda`	`cpu`
`CODEBAXING_DTYPE`	Model quantization (local only): `fp32`, `fp16`, `q8`, `q4`	`q8`
`CODEBAXING_WORKERS`	Worker threads for parallel embedding (local only, 0=off)	`2`
`CODEBAXING_MAX_FILE_SIZE`	Max file size in MB	`1`
`CODEBAXING_MAX_CHUNKS`	Max chunks to index	`500000`
`CODEBAXING_FILES_PER_BATCH`	Files per batch (lower = less RAM)	`100`
`CODEBAXING_PARALLEL_BATCHES`	Concurrent batches	`3`
`CODEBAXING_METADATA_SAVE_INTERVAL`	Save progress every N batches	`10`
`CODEBAXING_MODEL_CACHE`	Model cache directory (local only)	`~/.cache/codebaxing/models`
`CODEBAXING_OPENAI_API_KEY`	OpenAI API key (or use `OPENAI_API_KEY`)	-
`CODEBAXING_VOYAGE_API_KEY`	Voyage API key (or use `VOYAGE_API_KEY`)	-
`CODEBAXING_GEMINI_API_KEY`	Gemini API key (or use `GEMINI_API_KEY`)	-
`CODEBAXING_EMBEDDING_MODEL`	Override embedding model name	per-provider default
`CODEBAXING_EMBEDDING_DIMENSIONS`	Override embedding dimensions	per-provider default
`CODEBAXING_EMBEDDING_BASE_URL`	Custom API endpoint for cloud providers	provider default

Manual Editor Config

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "codebaxing": {
      "command": "npx",
      "args": ["-y", "codebaxing"],
      "env": { "CHROMADB_URL": "http://localhost:8000" }
    }
  }
}

Cursor

~/.cursor/mcp.json

{
  "mcpServers": {
    "codebaxing": {
      "command": "npx",
      "args": ["-y", "codebaxing"],
      "env": { "CHROMADB_URL": "http://localhost:8000" }
    }
  }
}

Other Editors

Windsurf: ~/.codeium/windsurf/mcp_config.json Zed: ~/.config/zed/settings.json (use context_servers key) VS Code + Continue: ~/.continue/config.json

Supported Languages

Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, C#, Ruby, PHP, Kotlin, Swift, Scala, Lua, Dart, Elixir, Haskell, OCaml, Zig, Perl, Bash, HTML, CSS, Vue, JSON, YAML, TOML, Makefile

Requirements

Node.js >= 20.0.0
Docker (for ChromaDB)
~500MB disk space (embedding model)

Technical Details

Component	Technology
Local Embedding	`all-MiniLM-L6-v2` (384 dims, ONNX, q8 quantized)
Cloud Embedding	Gemini `text-embedding-004` (free), OpenAI, or Voyage
Model Cache	`~/.cache/codebaxing/models/` (local only, downloaded once)
Vector Database	ChromaDB
Code Parser	Tree-sitter (28 languages)
MCP SDK	`@modelcontextprotocol/sdk`

Local mode: The embedding model is downloaded from HuggingFace on first run and cached at ~/.cache/codebaxing/models/. Uses q8 quantization (~3x faster than fp32). No network access after initial download.

Cloud mode: Sends code chunks to OpenAI/Voyage API for embedding. ~25x faster than local CPU. Requires API key.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 116 Commits
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
README.vi.md		README.vi.md
package-lock.json		package-lock.json
package.json		package.json
test-indexing.ts		test-indexing.ts
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Codebaxing

How It Works

Quick Start

1. Start ChromaDB

2. Index Your Codebase (CLI)

3. Install MCP Server for AI Editors

CLI Commands

Search Options

MCP Tools (for AI Agents)

Configuration

Cloud Embedding (Fastest)

Environment Variables

Manual Editor Config

Supported Languages

Requirements

Technical Details

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Codebaxing

How It Works

Quick Start

1. Start ChromaDB

2. Index Your Codebase (CLI)

3. Install MCP Server for AI Editors

CLI Commands

Search Options

MCP Tools (for AI Agents)

Configuration

Cloud Embedding (Fastest)

Environment Variables

Manual Editor Config

Supported Languages

Requirements

Technical Details

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages