code-scale-mcp indexes source code from local folders and GitHub repositories via tree-sitter AST parsing. This document describes the security controls that protect against common risks when handling arbitrary codebases.
All user-supplied paths are validated before any file is read or written.
- `security.ValidatePath(root, target)` resolves both paths to absolute form via `filepath.Abs` + `filepath.Clean` and verifies the target is a descendant of `root` using `filepath.Rel`.
- Applied during file discovery and again before each file read (defense in depth).
- Paths such as `../../etc/passwd` or absolute paths outside the repository root are rejected.

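
The containment check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the lowercase name `validatePath`, its `error` return type, and the `/srv/repo` paths are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validatePath is a hypothetical stand-in for security.ValidatePath.
// It returns nil only when target resolves to root itself or a descendant.
func validatePath(root, target string) error {
	// filepath.Abs also calls filepath.Clean on its result.
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return err
	}
	absTarget, err := filepath.Abs(target)
	if err != nil {
		return err
	}
	rel, err := filepath.Rel(absRoot, absTarget)
	if err != nil {
		return err // e.g. different volumes on Windows
	}
	// A relative path that is ".." or starts with "../" climbs out of root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("path %q escapes root %q", target, root)
	}
	return nil
}

func main() {
	root := "/srv/repo"
	for _, p := range []string{
		"/srv/repo/src/main.go",
		"/srv/repo/../../etc/passwd",
		"/etc/passwd",
	} {
		fmt.Printf("%-30s allowed=%v\n", p, validatePath(root, p) == nil)
	}
}
```

Comparing on the `..` path element rather than a plain string prefix avoids rejecting legitimate names such as `..hidden` and avoids accepting sibling directories like `/srv/repo-old`.
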
Symlinks can be used to escape the repository root and read arbitrary files.
- Default: symlinks are checked during file discovery via `os.Lstat`.
- `security.IsSymlinkEscape(root, path)` resolves the symlink target with `filepath.EvalSymlinks` and validates it against the repository root. Escaping symlinks are rejected.
- If a symlink cannot be resolved, it is treated as an escape (fail-closed).

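
A fail-closed symlink check along these lines might look like the sketch below. The lowercase `isSymlinkEscape`, the `demo` helper, and the choice to also resolve the root are assumptions for illustration; only `os.Lstat` and `filepath.EvalSymlinks` come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isSymlinkEscape reports whether path is a symlink whose target lies outside
// root. Every error is treated as an escape (fail-closed).
// Hypothetical stand-in for security.IsSymlinkEscape.
func isSymlinkEscape(root, path string) bool {
	info, err := os.Lstat(path) // Lstat inspects the link itself
	if err != nil {
		return true
	}
	if info.Mode()&os.ModeSymlink == 0 {
		return false // not a symlink, nothing to resolve
	}
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return true // dangling or unreadable link
	}
	// Assumption: resolve the root as well, so a root reached through a
	// symlinked directory still compares correctly.
	realRoot, err := filepath.EvalSymlinks(root)
	if err != nil {
		return true
	}
	rel, err := filepath.Rel(realRoot, resolved)
	if err != nil {
		return true
	}
	return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// demo builds a throwaway tree with three symlinks and reports which escape.
func demo() (inEsc, outEsc, danglingEsc bool, err error) {
	root, err := os.MkdirTemp("", "repo")
	if err != nil {
		return
	}
	defer os.RemoveAll(root)
	other, err := os.MkdirTemp("", "other")
	if err != nil {
		return
	}
	defer os.RemoveAll(other)

	file := filepath.Join(root, "a.txt")
	if err = os.WriteFile(file, []byte("x"), 0o600); err != nil {
		return
	}
	links := map[string]string{
		"in":       file,
		"out":      other,
		"dangling": filepath.Join(root, "missing"),
	}
	for name, target := range links {
		if err = os.Symlink(target, filepath.Join(root, name)); err != nil {
			return
		}
	}
	inEsc = isSymlinkEscape(root, filepath.Join(root, "in"))
	outEsc = isSymlinkEscape(root, filepath.Join(root, "out"))
	danglingEsc = isSymlinkEscape(root, filepath.Join(root, "dangling"))
	return
}

func main() {
	fmt.Println(demo())
}
```

Both `root` and `path` should be given in the same form (both absolute, for instance); mixed forms make `filepath.Rel` fail, which this sketch also treats as an escape.
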
Indexing is restricted to safe directory trees.
- `CODE_SCALE_ALLOWED_ROOTS`: when set, only paths under the specified colon-separated roots (`;` on Windows) can be indexed. All other paths are denied.
- Default deny list: when `CODE_SCALE_ALLOWED_ROOTS` is not set, known system directories are denied, namely `/etc`, `/usr`, `/var`, `/root`, `/bin`, `/sbin`, `/boot`, `/dev`, `/proc`, `/sys`, `/lib`, `/lib64`, `/System`, `/Library`, `/private`, `C:\Windows`, `C:\Program Files`, `C:\Program Files (x86)`.
- The filesystem root (`/` or `C:\`) is always denied.
- `security.IsAllowedRootPath(absPath)` enforces these checks before any indexing or watching operation begins.

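
The allow-list and deny-list rules above could be combined roughly as in this sketch. It assumes that a denied directory also denies its descendants, lists only the Unix-style entries for brevity, and passes the allowed roots as a parameter so the logic stays testable; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Abbreviated copy of the default deny list (Windows entries omitted).
var deniedDirs = []string{
	"/etc", "/usr", "/var", "/root", "/bin", "/sbin", "/boot", "/dev",
	"/proc", "/sys", "/lib", "/lib64", "/System", "/Library", "/private",
}

// isUnder reports whether path equals dir or is a descendant of it.
func isUnder(dir, path string) bool {
	rel, err := filepath.Rel(dir, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// isAllowedRootPath is a hypothetical stand-in for security.IsAllowedRootPath.
func isAllowedRootPath(absPath string, allowedRoots []string) bool {
	p := filepath.Clean(absPath)
	if filepath.Dir(p) == p {
		return false // filesystem root is always denied
	}
	if len(allowedRoots) > 0 {
		for _, r := range allowedRoots {
			if isUnder(filepath.Clean(r), p) {
				return true
			}
		}
		return false // allow-list set: everything else is denied
	}
	for _, d := range deniedDirs {
		if isUnder(d, p) {
			return false
		}
	}
	return true
}

// allowedRootsFromEnv splits the variable on the OS path-list separator
// (":" on Unix, ";" on Windows).
func allowedRootsFromEnv() []string {
	v := os.Getenv("CODE_SCALE_ALLOWED_ROOTS")
	if v == "" {
		return nil
	}
	return filepath.SplitList(v)
}

func main() {
	roots := allowedRootsFromEnv()
	for _, p := range []string{"/", "/etc/ssh", "/home/dev/project"} {
		fmt.Printf("%-20s allowed=%v\n", p, isAllowedRootPath(p, roots))
	}
}
```
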
Files are filtered through multiple layers during discovery:
- SkipPatterns: directories always excluded, namely `node_modules`, `vendor`, `.git`, `build`, `dist`, `__pycache__`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.venv`, `venv`, `.eggs`, `target`.
- SkipFiles: files always excluded, namely `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `poetry.lock`, `Cargo.lock`, `go.sum`, `composer.lock`, and minified files (`*.min.js`, `*.min.css`).
- `.gitignore`: respected for GitHub repositories via the GitHub API.
- `extra_ignore_patterns`: user-configurable additional ignore patterns passed to the `index_folder` tool.

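
As a rough illustration of the first two layers, the skip lists can be modelled as name lookups plus a suffix test for minified files. The variable and function names here are hypothetical, and exact-name matching is an assumption about how the lists are applied.

```go
package main

import (
	"fmt"
	"strings"
)

// Entries copied from the lists above; names of the variables are illustrative.
var skipDirs = map[string]bool{
	"node_modules": true, "vendor": true, ".git": true, "build": true,
	"dist": true, "__pycache__": true, ".tox": true, ".mypy_cache": true,
	".pytest_cache": true, ".venv": true, "venv": true, ".eggs": true,
	"target": true,
}

var skipFiles = map[string]bool{
	"package-lock.json": true, "yarn.lock": true, "pnpm-lock.yaml": true,
	"poetry.lock": true, "Cargo.lock": true, "go.sum": true,
	"composer.lock": true,
}

// shouldSkipDir reports whether a directory name is on the skip list.
func shouldSkipDir(name string) bool {
	return skipDirs[name]
}

// shouldSkipFile reports whether a file name is a lock file or a minified asset.
func shouldSkipFile(name string) bool {
	if skipFiles[name] {
		return true
	}
	return strings.HasSuffix(name, ".min.js") || strings.HasSuffix(name, ".min.css")
}

func main() {
	for _, d := range []string{"src", "node_modules"} {
		fmt.Printf("dir  %-14s skip=%v\n", d, shouldSkipDir(d))
	}
	for _, f := range []string{"app.js", "app.min.js", "go.sum"} {
		fmt.Printf("file %-14s skip=%v\n", f, shouldSkipFile(f))
	}
}
```
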
Files matching known secret patterns are excluded during indexing.
Excluded patterns include:
- Environment files: `.env`, `.env.*`, `*.env`
- Certificates / keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.keystore`, `*.jks`
- SSH keys: `id_rsa`, `id_rsa.*`, `id_ed25519`, `id_ed25519.*`, `id_dsa`, `id_ecdsa`
- Credentials: `credentials.json`, `service-account*.json`, `*.credentials`
- Auth files: `.htpasswd`, `.netrc`, `.npmrc`, `.pypirc`
- Generic secret indicators: `*secret*`, `*.secrets`, `*.token`

When a secret file is detected, it is excluded with reason `secret_file`. Secret files are never stored in the index or cached content.
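
One way to apply these patterns is to glob-match them against the file's base name, as in the sketch below. Matching with `filepath.Match` and lowercasing the name first are assumptions; the actual `security.IsSecretFile` may work differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Patterns copied from the list above.
var secretPatterns = []string{
	".env", ".env.*", "*.env",
	"*.pem", "*.key", "*.p12", "*.pfx", "*.keystore", "*.jks",
	"id_rsa", "id_rsa.*", "id_ed25519", "id_ed25519.*", "id_dsa", "id_ecdsa",
	"credentials.json", "service-account*.json", "*.credentials",
	".htpasswd", ".netrc", ".npmrc", ".pypirc",
	"*secret*", "*.secrets", "*.token",
}

// isSecretFile is a hypothetical stand-in for security.IsSecretFile.
// It matches each pattern against the lowercased base name of path.
func isSecretFile(path string) bool {
	name := strings.ToLower(filepath.Base(path))
	for _, pat := range secretPatterns {
		if ok, err := filepath.Match(pat, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"config/.env.production",
		"keys/server.pem",
		"deploy/service-account-prod.json",
		"src/main.go",
	} {
		fmt.Printf("%-34s secret=%v\n", p, isSecretFile(p))
	}
}
```

Broad patterns such as `*secret*` will also exclude ordinary source files like `secret_manager.go`; that trade-off favors not leaking credentials over index completeness.
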
- Default maximum: 500 KB per file (`DefaultMaxFileSize`).
- Files exceeding the limit are skipped during discovery with reason `file_too_large`.

Binary files are excluded using a two-stage check:
- Extension-based detection: common binary extensions including executables (`.exe`, `.dll`, `.so`, `.dylib`), archives (`.zip`, `.tar`, `.gz`), images (`.png`, `.jpg`, `.gif`), media (`.mp3`, `.mp4`), documents (`.pdf`, `.docx`), compiled bytecode (`.pyc`, `.class`, `.wasm`), databases (`.db`, `.sqlite`), fonts (`.ttf`, `.woff`), and more.
- Content-based detection: files containing null bytes within the first 8 KB are treated as binary and skipped, even if the extension suggests source code.

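
The two-stage check can be sketched as an extension lookup followed by a null-byte scan of the leading bytes. The extension set below is only the subset named above, and the lowercase `isBinaryFile` signature (name plus already-read content) is an assumption for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// sniffLen is the number of leading bytes scanned for null bytes (8 KB).
const sniffLen = 8 * 1024

// Subset of binary extensions taken from the list above.
var binaryExts = map[string]bool{
	".exe": true, ".dll": true, ".so": true, ".dylib": true,
	".zip": true, ".tar": true, ".gz": true,
	".png": true, ".jpg": true, ".gif": true,
	".mp3": true, ".mp4": true, ".pdf": true, ".docx": true,
	".pyc": true, ".class": true, ".wasm": true,
	".db": true, ".sqlite": true, ".ttf": true, ".woff": true,
}

// isBinaryFile is a hypothetical stand-in for security.IsBinaryFile.
func isBinaryFile(name string, content []byte) bool {
	// Stage 1: extension, compared case-insensitively (an assumption).
	if binaryExts[strings.ToLower(filepath.Ext(name))] {
		return true
	}
	// Stage 2: a null byte in the first 8 KB marks the file as binary.
	head := content
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	return bytes.IndexByte(head, 0) >= 0
}

func main() {
	fmt.Println(isBinaryFile("logo.png", nil))
	fmt.Println(isBinaryFile("main.go", []byte("package main\n")))
	fmt.Println(isBinaryFile("blob.go", []byte{'a', 0, 'b'}))
}
```
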
- `security.SafeDecode(data)` decodes bytes to string, replacing invalid UTF-8 sequences with the Unicode replacement character (U+FFFD) instead of panicking or producing garbled output.
- All file content reads go through safe decoding to ensure robust handling of mixed-encoding codebases.

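
In Go, this kind of decoding can be approximated with the standard library's `strings.ToValidUTF8`. The sketch below is one possible approach rather than the project's implementation; note that `ToValidUTF8` collapses each run of consecutive invalid bytes into a single replacement character, which may differ from what `security.SafeDecode` does.

```go
package main

import (
	"fmt"
	"strings"
)

// safeDecode is a hypothetical stand-in for security.SafeDecode.
// Each run of invalid UTF-8 bytes becomes one U+FFFD.
func safeDecode(data []byte) string {
	return strings.ToValidUTF8(string(data), "\uFFFD")
}

func main() {
	// 0xff can never appear in well-formed UTF-8.
	mixed := []byte{'c', 'a', 'f', 0xff, '!'}
	fmt.Printf("%q\n", safeDecode(mixed))
	fmt.Printf("%q\n", safeDecode([]byte("héllo")))
}
```
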
- Index storage defaults to `~/.code-index/`.
- The storage path can be overridden using the `CODE_INDEX_PATH` environment variable.
- Repository identifiers are validated via `security.SafeRepoComponent()`: only alphanumeric characters, dots, hyphens, and underscores are allowed. Path separators, empty strings, `.`, and `..` are rejected, preventing path injection in storage locations.
- Index data is stored in SQLite with FTS5 for full-text search.

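
The repository-identifier rule can be expressed as a small allow-list check. This sketch assumes ASCII-only alphanumerics and a boolean result; the real `security.SafeRepoComponent()` signature is not shown in this document, so `isSafeRepoComponent` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// One or more ASCII letters, digits, dots, underscores, or hyphens.
var repoComponentRe = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

// isSafeRepoComponent reports whether s may be used as a single path
// component in the index storage directory.
func isSafeRepoComponent(s string) bool {
	if s == "." || s == ".." {
		return false // would refer to the current or parent directory
	}
	// The pattern also rejects empty strings and path separators.
	return repoComponentRe.MatchString(s)
}

func main() {
	for _, s := range []string{"my-repo_1.0", "", ".", "..", "a/b", `a\b`} {
		fmt.Printf("%-14q safe=%v\n", s, isSafeRepoComponent(s))
	}
}
```

Validating each component before joining it onto the storage path means an identifier such as `../../etc` can never influence where index data is written.
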
When running in SSE/HTTP mode, authentication can be enforced:
- `CODE_SCALE_AUTH_TOKEN`: when set, all SSE requests must include an `Authorization: Bearer <token>` header. Requests without a valid token receive `401 Unauthorized`.
- When unset, SSE mode runs without authentication and a warning is logged at startup.
- stdio transport (default) does not require authentication as it communicates directly with the parent process.

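
A bearer-token check of this shape can be written as ordinary `net/http` middleware. The sketch below is an assumption-laden illustration: the signature of `authMiddleware`, the `/sse` route, the `statusFor` helper, and the constant-time comparison are choices made for the example and are not confirmed by this document.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// authMiddleware wraps next with a bearer-token check. An empty token
// disables authentication, mirroring the "unset" behavior described above.
func authMiddleware(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const prefix = "Bearer "
		h := r.Header.Get("Authorization")
		got := strings.TrimPrefix(h, prefix)
		// Constant-time comparison avoids leaking the token through timing.
		if !strings.HasPrefix(h, prefix) ||
			subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the middleware and returns
// the HTTP status code. The /sse route is a placeholder.
func statusFor(token, authHeader string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/sse", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	authMiddleware(token, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret", "Bearer s3cret"))
	fmt.Println(statusFor("s3cret", "Bearer wrong"))
	fmt.Println(statusFor("s3cret", ""))
}
```
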
| Control | Location | Default |
|---|---|---|
| Path traversal validation | `security.ValidatePath()` | Always enabled |
| Symlink escape protection | `security.IsSymlinkEscape()` | Always enabled (fail-closed) |
| Allowed root enforcement | `security.IsAllowedRootPath()` | System dirs denied by default |
| Secret file exclusion | `security.IsSecretFile()` | Always enabled |
| Binary file detection | `security.IsBinaryFile()` | Always enabled |
| File size limit | `security.ShouldExcludeFile()` | 500 KB |
| Directory skip patterns | `security.ShouldSkipDir()` | Always enabled |
| File skip patterns | `security.ShouldSkipFile()` | Always enabled |
| UTF-8 safe decode | `security.SafeDecode()` | Always enabled |
| Repo component validation | `security.SafeRepoComponent()` | Always enabled |
| SSE bearer authentication | `authMiddleware()` | Opt-in via `CODE_SCALE_AUTH_TOKEN` |
| `.gitignore` respect | GitHub indexing pipeline | Enabled for GitHub repos |