35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -18,8 +18,37 @@ All notable changes to this project are documented here. The format is based on
- **`docs/RELEASE_CHECKLIST.md`**: a repeatable release checklist (version sync,
tests, benchmarks, doctor, install/plugin/MCP smoke, changelog) with signed
checksums + SBOM tracked as future hardening.
- **MCP contract hardening (M11.5)**: every MCP tool payload — success *and* the
no-index/error path — is now wrapped in a stable envelope (`schema_version`: 1,
`tool`: <name>). Golden snapshots lock every tool's output
(`tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`), and the contract
values are asserted explicitly so a golden can't freeze a wrong version. Closes
the long-standing `docs/MCP.md` follow-ups and makes the `schema_version` claim
in `docs/ARCHITECTURE.md` §8 true.
- **Config / IaC language labeling**: Dockerfile, Containerfile, `*.tf`/`*.tfvars`
(terraform), `*.hcl`, `*.ini`/`*.cfg`/`*.conf`/`*.properties` (ini), and
Makefiles now get a real language label. These files were already FTS-indexed as
unknown text; labeling surfaces infra files in `stats` and lets agents scope
searches to config. They stay on the line/FTS floor (no tree-sitter spec).
- **Typed framework edges — design doc**
(`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`): the
documented-first deliverable for the M13 code-intelligence graph
(route→handler→service→model, test→impl, config→consumer, …) with a schema,
confidence/provenance model, resolver architecture, and a benchmark gate.
- **"Trust model in 60 seconds"** callout, identical in `README.md` and
`docs/SECURITY.md`.

### Changed
- **Reranker: dampened the god-class `in_degree` tiebreak** (`retrieval/rerank.py`).
The graph-centrality bonus is now logarithmic with a lower cap instead of linear
(which saturated by in_degree 10, giving 100-caller "god classes" the full bonus
and floating them above genuinely relevant low-degree matches on stray-term ties).
Validated as no-regression on the public benchmark (Recall@k / MRR / nDCG
unchanged) with a targeted regression test; the real-repo gain on the honest Java
misses is tracked under M12.5. CLI/MCP `search` goldens regenerated accordingly.
- **`docs/ROADMAP.md`**: M10 MCP bridge marked shipped (was "planned"); reconciled
the technical-vs-product milestone numbering instead of claiming one is canonical.

- **README**: added "Who Is It For?" and a "How Is This Different?" section that
answers why-not-grep / Cursor / Aider repo-map / Sourcegraph / Codebase-Memory
MCP on the first screen, plus a proven-today-vs-roadmap table.
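
The reranker entry above replaces a linear `in_degree` bonus with a logarithmic one under a lower cap. A minimal sketch of that shape follows, assuming illustrative constants: the function names and the values of `OLD_CAP`, `NEW_CAP`, and `LOG_SCALE` are hypothetical and are not taken from `retrieval/rerank.py`; only the saturation point of 10 for the old linear rule comes from the changelog text.

```python
import math

OLD_CAP = 0.10    # hypothetical: maximum bonus under the old linear rule
NEW_CAP = 0.05    # hypothetical: the lower cap used by the dampened rule
SATURATION = 10   # from the changelog: the linear bonus saturated by in_degree 10
LOG_SCALE = 100   # hypothetical: in_degree at which the log curve reaches its cap


def linear_bonus(in_degree: int) -> float:
    """Old shape: grows linearly and is fully saturated by in_degree 10."""
    return OLD_CAP * min(max(in_degree, 0) / SATURATION, 1.0)


def log_bonus(in_degree: int) -> float:
    """Dampened shape: logarithmic growth under a lower cap."""
    scaled = math.log1p(max(in_degree, 0)) / math.log1p(LOG_SCALE)
    return NEW_CAP * min(scaled, 1.0)


if __name__ == "__main__":
    # Compare both curves at a few caller counts.
    for degree in (0, 1, 10, 100):
        print(degree, round(linear_bonus(degree), 4), round(log_bonus(degree), 4))
```

Under the linear rule a symbol with 10 callers and one with 100 callers receive the same full bonus; under the logarithmic rule the bonus keeps growing slowly and never exceeds the lower cap, so it acts as a tiebreak rather than a ranking signal.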
@@ -30,6 +59,12 @@ All notable changes to this project are documented here. The format is based on
TODO-friendly benchmark task checklist with a no-overclaim procedure.

### Fixed
- **MCP server failed to import on `mcp>=1.27` + `pydantic>=2.10`**: newer FastMCP
auto-built a structured-output schema from each tool's `-> str` return annotation
and raised `PydanticUserError` at import time, breaking the server and its test
suite. Tools now register as unstructured (`structured_output=False` where the
kwarg exists; older `mcp` is detected and unaffected), preserving the existing
text-content wire contract.
- `docs/FAQ.md`: removed a dangling/duplicated sentence in "Is it
production-ready?" and documented the real `clean` / `clean --all` behavior.
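
The import fix above passes `structured_output=False` only "where the kwarg exists". One way such detection could work is signature inspection, sketched below. This is an assumption about the approach, not the project's actual code: `tool_registration_kwargs`, `old_tool`, and `new_tool` are hypothetical names, and the two decorators are stand-ins so the example runs without the `mcp` package installed.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def tool_registration_kwargs(tool_decorator: Callable[..., Any]) -> Dict[str, Any]:
    """Return {'structured_output': False} only if the decorator accepts that kwarg."""
    try:
        params = inspect.signature(tool_decorator).parameters
    except (TypeError, ValueError):
        # Signature not introspectable: behave like an older API and pass nothing.
        return {}
    if "structured_output" in params:
        return {"structured_output": False}
    return {}


# Stand-in for an older registration API without the kwarg (hypothetical).
def old_tool(name: Optional[str] = None) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    return lambda fn: fn


# Stand-in for a newer registration API that accepts the kwarg (hypothetical).
def new_tool(
    name: Optional[str] = None, structured_output: Optional[bool] = None
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    return lambda fn: fn


@new_tool(name="search_code", **tool_registration_kwargs(new_tool))
def search_code(query: str) -> str:
    # The tool still returns a JSON string, keeping the text-content contract.
    return "{}"
```

Detecting the parameter instead of pinning a version keeps one code path for both old and new releases, which matches the changelog's note that older `mcp` is "detected and unaffected".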

8 changes: 8 additions & 0 deletions README.md
@@ -429,6 +429,14 @@ Answer with precise file:line citations

## Safety and Privacy

> **Trust model in 60 seconds**
> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine.
> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused.
> 3. **Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (multi-gate ignore pipeline).
> 4. **Secrets never get out** — every snippet is redacted (AWS keys, private keys, JWTs, bearer tokens, connection strings) before it reaches the agent.
> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data.
> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and exits non-zero in CI on any high-severity finding.
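
Point 2 of the callout describes three gates that must all pass before external embeddings are used. A minimal sketch of that fail-closed check is shown here. Only the `allow_external` setting is named in the source; the environment variable name `EMBEDDINGS_API_KEY`, the exception class, and the function name are hypothetical.

```python
import os
import sys
from typing import Callable, Mapping, Optional

API_KEY_VAR = "EMBEDDINGS_API_KEY"  # hypothetical variable name, for illustration only


class ExternalEmbeddingsRefused(RuntimeError):
    """Raised when any gate is closed, so callers fail closed."""


def _warn_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def open_external_gate(
    allow_external: bool,
    endpoint: str,
    env: Optional[Mapping[str, str]] = None,
    warn: Callable[[str], None] = _warn_stderr,
) -> str:
    """Return the API key only if every gate passes; otherwise refuse."""
    env = os.environ if env is None else env
    # Gate 1: explicit opt-in in configuration.
    if not allow_external:
        raise ExternalEmbeddingsRefused("allow_external is not enabled")
    # Gate 2: the key must come from the environment, never from a config file.
    api_key = env.get(API_KEY_VAR, "")
    if not api_key:
        raise ExternalEmbeddingsRefused(f"{API_KEY_VAR} is not set")
    # Gate 3: the endpoint is printed before anything is sent.
    warn(f"WARNING: snippets will be sent to external endpoint {endpoint}")
    return api_key
```

Raising instead of returning `None` means a caller cannot accidentally proceed with a half-open gate.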

`codebase-index` is designed with privacy as a first principle:

- **No telemetry** — No usage data, analytics, or crash reports are collected or transmitted.
3 changes: 2 additions & 1 deletion docs/ARCHITECTURE.md
@@ -199,7 +199,8 @@ Current implementation:
- `src/codebase_index/mcp/server.py` is a thin adapter over `retrieval/`, `storage/`, and
`indexer/freshness.py`.
- `codebase-index mcp --root <repo>` runs the stdio server.
- JSON payloads include `schema_version`.
- Every JSON payload (including the error path) carries a `schema_version` + `tool` envelope,
locked by golden snapshots (`tests/golden/mcp_*.json`).
- [MCP.md](MCP.md) provides config templates for Claude Desktop, Claude Code, Cursor, VS Code,
Zed, and Windsurf.
- `healthcheck` lets MCP clients distinguish "server running", "index missing",
8 changes: 6 additions & 2 deletions docs/LANGUAGES.md
@@ -6,7 +6,7 @@
|---|---|---|
| Tier A | Language-specific Tree-sitter `LangSpec` with definition, call, and import/inheritance patterns | Python, JavaScript, TypeScript, Java, Go, Rust, C, C++, C#, Ruby, PHP, Kotlin |
| Tier B | Generic Tree-sitter path when a loadable grammar exists, without language-specific graph semantics | Lua |
| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL and other text/config files |
| Tier C | Line chunks + FTS5 lexical search only | Markdown, JSON, YAML, TOML, SQL; config/IaC: Dockerfile, Terraform (`.tf`/`.tfvars`), HCL, INI (`.ini`/`.cfg`/`.conf`/`.properties`), Makefiles; and other text/config files |
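
The Tier C config/IaC labels in the row above amount to a lookup by file name or extension. The sketch below shows one way to express that mapping. The `terraform` and `ini` labels and the extension lists come from the changelog; the `hcl`, `dockerfile`, and `makefile` label strings, the Makefile name variants, and the function name are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension-based labels; extensions are compared case-insensitively.
EXTENSION_LABELS = {
    ".tf": "terraform",
    ".tfvars": "terraform",
    ".hcl": "hcl",            # assumed label string
    ".ini": "ini",
    ".cfg": "ini",
    ".conf": "ini",
    ".properties": "ini",
}

# Files identified by their exact name rather than an extension (assumed labels).
FILENAME_LABELS = {
    "Dockerfile": "dockerfile",
    "Containerfile": "dockerfile",
    "Makefile": "makefile",
    "makefile": "makefile",
    "GNUmakefile": "makefile",
}


def label_config_file(path: str) -> Optional[str]:
    """Return a language label for a config/IaC file, or None if unrecognized."""
    name = PurePosixPath(path).name
    if name in FILENAME_LABELS:
        return FILENAME_LABELS[name]
    return EXTENSION_LABELS.get(PurePosixPath(name).suffix.lower())
```

A label only changes how the file is reported and filtered; the file itself stays on the line-chunk and FTS path, as the tier description states.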

Tier A is the only tier that should be advertised as symbol-aware. Tier B can
surface useful definitions, but it is intentionally weaker and should be called
@@ -45,7 +45,11 @@ High-priority code languages:
- Objective-C
- Vue and Svelte component structure

High-priority non-code and framework-aware extraction:
High-priority non-code and framework-aware extraction (config/IaC files are now
**Tier-C labeled** — indexed, language-tagged, and FTS-searchable; the items below
are the deeper *structured* extraction still on the roadmap, and the framework
graph part is designed in
`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`):

- SQL schema-aware parsing: tables, columns, migrations, model/query consumers
- Terraform/HCL: resources, modules, variables, outputs
35 changes: 27 additions & 8 deletions docs/MCP.md
@@ -41,30 +41,45 @@ The MCP server exposes the same retrieval contract as the CLI.

## Output contract

Tool responses are JSON strings returned through MCP content blocks. The
intended stable shape for retrieval responses is:
Tool responses are JSON strings returned through MCP content blocks. **Every**
payload — success or error — is wrapped in a stable envelope so clients can
branch on the contract without sniffing the shape:

```json
{
"schema_version": 1,
"tool": "search_code",
"index": {
"exists": true,
"stale": false,
"built_at": "2026-05-29T12:00:00Z",
"files_changed_since_build": 0
},
"results": [],
"recommended_reads": [],
"warnings": []
"recommended_reads": []
}
```

- `schema_version` (int) — the payload contract version. Bumped only on a
breaking change (field removal or type change); additive fields keep the same
version. The current version is **1**.
- `tool` (string) — the emitting tool name (`search_code`, `find_symbol`,
`find_refs`, `impact_of`, `explain_code`, `index_stats`, `healthcheck`).
- The no-index / error path carries the same envelope plus an `"error"` field.
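
The envelope described in the list above can be sketched as a small wrapper that every tool passes its payload through. The field names `schema_version`, `tool`, and `error` come from this document; the helper name `envelope` and its signature are hypothetical and may not match `mcp/server.py`.

```python
import json
from typing import Any, Dict, Mapping, Optional

SCHEMA_VERSION = 1  # current contract version stated in this document


def envelope(
    tool: str,
    payload: Optional[Mapping[str, Any]] = None,
    error: Optional[str] = None,
) -> str:
    """Wrap a tool payload (success or error) in the stable envelope."""
    body: Dict[str, Any] = {"schema_version": SCHEMA_VERSION, "tool": tool}
    for key, value in (payload or {}).items():
        # A payload must never overwrite the contract fields.
        if key in body:
            raise ValueError(f"payload may not override envelope field {key!r}")
        body[key] = value
    if error is not None:
        body["error"] = error
    # Tools return JSON strings through MCP content blocks.
    return json.dumps(body)
```

For example, `envelope("search_code", {"results": []})` and `envelope("search_code", error="index missing")` both produce objects a client can branch on by reading `schema_version` and `tool` first.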

Rules:

- Additive fields are allowed within a tool output version.
- Field removal or type changes should be treated as a protocol change.
- Additive fields are allowed within a `schema_version`.
- Field removal or type changes bump `schema_version`.
- Tool descriptions should include examples and expected failure modes.
- Errors should fail closed: no partial unsafe result when config or index state is unsafe.
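
The versioning rules above (additive fields keep the version, removals and type changes bump it) can be checked mechanically. The sketch below compares only top-level fields of two example payloads and is an illustration of the rule, not the project's test code; both function names are hypothetical.

```python
from typing import Any, List, Mapping


def breaking_changes(old: Mapping[str, Any], new: Mapping[str, Any]) -> List[str]:
    """List top-level changes that the rules treat as breaking."""
    problems: List[str] = []
    for key, old_value in old.items():
        if key not in new:
            problems.append(f"removed: {key}")
        elif type(new[key]) is not type(old_value):
            problems.append(f"type changed: {key}")
    # Keys present only in `new` are additive and therefore allowed.
    return problems


def next_schema_version(current: int, old: Mapping[str, Any], new: Mapping[str, Any]) -> int:
    """Bump the version only when a breaking change is detected."""
    return current + 1 if breaking_changes(old, new) else current
```

Note that this simple comparison treats a field going from a value to `null` as a type change, and it does not descend into nested objects; a real check would need to decide both cases explicitly.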

Every tool's enveloped output is locked by golden snapshots in
`tests/golden/mcp_*.json` (regenerate intentionally with
`UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py`), and the `schema_version` /
`tool` values are asserted explicitly so a golden can never silently freeze a
wrong contract version.
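
A golden check in the spirit of the paragraph above might look like the following sketch. The `UPDATE_GOLDEN=1` switch and the explicit `schema_version` / `tool` assertions come from this document; the helper name `check_golden` and its signature are hypothetical, and the real `tests/test_mcp_golden.py` may be organized differently.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

EXPECTED_SCHEMA_VERSION = 1


def check_golden(
    path: Path,
    actual: Dict[str, Any],
    tool: str,
    update: Optional[bool] = None,
) -> None:
    """Compare a tool payload with its golden file, or rewrite the file on request."""
    # Contract values are asserted before anything is written or compared,
    # so regenerating cannot freeze a wrong version or tool name.
    assert actual.get("schema_version") == EXPECTED_SCHEMA_VERSION
    assert actual.get("tool") == tool

    if update is None:
        update = os.environ.get("UPDATE_GOLDEN") == "1"
    if update:
        text = json.dumps(actual, indent=2, sort_keys=True) + "\n"
        path.write_text(text, encoding="utf-8")
        return

    expected = json.loads(path.read_text(encoding="utf-8"))
    assert actual == expected, f"{path.name} drifted from the golden snapshot"
```

Placing the contract assertions ahead of the update branch is the design choice that matters: an intentional regeneration still fails if the payload carries the wrong version.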

## Client config templates

### Claude Desktop
@@ -143,8 +158,12 @@ same trust boundaries:
- Done: `healthcheck`, `search_code`, `find_symbol`, `find_refs`, `impact_of`, `explain_code`,
and `index_stats` tools.
- Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
- Follow-up: explicit schema/version field in every structured tool payload.
- Follow-up: golden snapshots for every tool output.
- Done: explicit `schema_version` + `tool` envelope on every structured tool payload (including the
error path), asserted by `tests/test_mcp_server.py` and `tests/test_mcp_golden.py`.
- Done: golden snapshots for every tool output (`tests/golden/mcp_*.json`).
- Done: unstructured-output registration (`structured_output=False` where supported) so the server
loads on `mcp>=1.27` + `pydantic>=2.10`, where auto-detecting a structured schema from the `-> str`
return annotation otherwise raises at import time.
- Follow-up: verified client-specific docs for Claude Desktop, Claude Code, Cursor, VS Code, Zed,
and Windsurf.
- Follow-up: paging or progressive result support.
32 changes: 20 additions & 12 deletions docs/PRODUCT_UPGRADE_PLAN.md
@@ -89,7 +89,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
| Weakness | Impact | Plan |
|---|---|---|
| No large-scale real-repo benchmark | Can't claim 100k/1M LOC quality | Benchmark tasks §8; recruit public repos |
| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 typed-edge roadmap |
| Graph is import/call/ref only | `impact` misses framework wiring | ARCHITECTURE §9 + design doc `specs/2026-06-14-typed-framework-edges-design.md`; implementation behind §8 benchmark |
| GitHub-only distribution | No `pip install codebase-index` / `uvx` | Distribution tasks §9 |
| MCP client docs unverified | Templates may be wrong per client version | Verify against each client, add per-client docs |
| Single-repo only | No monorepo/fleet context | Out of scope near-term; documented as non-goal |
@@ -101,12 +101,15 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
logs. Highest credibility lever.
2. **Typed framework edges** (route→handler→service→model, test→impl, config→consumer)
with source spans + confidence. Biggest product-quality lever for `impact`.
*Design approved this pass* (`specs/2026-06-14-typed-framework-edges-design.md`);
implementation gated on the §8 graph benchmark.
3. **Distribution hardening**: PyPI publish, `uvx`/`pipx` story, signed checksums,
SBOM. Lowers adoption friction and raises supply-chain trust.
4. **MCP contract hardening**: `schema_version` on every payload, golden
snapshots per tool, verified client docs, paging/progressive results.
5. **Retrieval tuning**: dampen the god-class `in_degree` tiebreak (the 3 honest
misses in the Java run), per-intent weights review.
4. **MCP contract hardening**: ✅ `schema_version` on every payload + golden
snapshots per tool (this pass). Remaining: verified client docs, paging/progressive results.
5. **Retrieval tuning**: ✅ dampened the god-class `in_degree` tiebreak this pass
(log curve + lower cap, validated no-regression on the public suite). Remaining:
confirm the real-repo gain on the 3 honest Java misses (needs M12.5), per-intent weights review.
6. **Language reach**: config/IaC awareness (Dockerfile, Terraform, migrations,
CI), plus Swift/Dart/Scala/Vue/Svelte gaps called out in FAQ.
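
Item 2 in the list above describes typed edges that carry a source span and a confidence. As a rough illustration of what such a record could hold, here is a hypothetical data shape. It is not the schema from the design doc: every class, field, and function name below is an assumption, and only the edge kinds, the span, and the confidence are taken from the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class EdgeKind(Enum):
    # Edge kinds named in the plan; the enum itself is hypothetical.
    ROUTE_TO_HANDLER = "route->handler"
    HANDLER_TO_SERVICE = "handler->service"
    SERVICE_TO_MODEL = "service->model"
    TEST_TO_IMPL = "test->impl"
    CONFIG_TO_CONSUMER = "config->consumer"


@dataclass(frozen=True)
class TypedEdge:
    kind: EdgeKind
    src: str          # symbol or file the edge starts from
    dst: str          # symbol or file the edge points to
    file: str         # file containing the evidence for the edge
    start_line: int   # first line of the source span
    end_line: int     # last line of the source span
    confidence: float # 0.0 (guess) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if self.start_line > self.end_line:
            raise ValueError("start_line must not exceed end_line")


def confident_edges(edges: Iterable[TypedEdge], min_confidence: float = 0.8) -> List[TypedEdge]:
    """Keep only edges at or above a confidence threshold."""
    return [edge for edge in edges if edge.confidence >= min_confidence]
```

Carrying the span and confidence on each edge lets a consumer such as `impact` cite evidence and drop low-confidence wiring instead of treating every inferred edge as fact.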

@@ -119,7 +122,7 @@ transparent Python implementation, a strict privacy model, and honest benchmarks
- [x] `docs/BENCHMARKS.md` "claims not to make yet" + TODO benchmark checklist.
- [x] `docs/RELEASE_CHECKLIST.md`.
- [ ] Verified per-client MCP setup docs (after testing each client version).
- [ ] A short "trust model in 60 seconds" callout reused across README/SECURITY.
- [x] A short "trust model in 60 seconds" callout reused across README/SECURITY.

## 8. Benchmark tasks

@@ -150,14 +153,19 @@ Track in [BENCHMARKS.md](BENCHMARKS.md); none may be reported until run with log

| # | Improvement | Impact | Risk | Status |
|---|---|---|---|---|
| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped this pass** |
| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | Planned |
| 3 | `schema_version` on every MCP payload | Stable contract | Low | Partly (architecture claims it) — verify+test |
| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | Planned |
| 5 | Typed framework edges in the graph | Better `impact` | High | Roadmap (ARCHITECTURE §9) |
| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | Roadmap |
| 1 | Implement `clean` (documented but was a stub) | Fixes doc/reality gap | Low | **Shipped (1.3.0 line)** |
| 2 | Dampen god-class `in_degree` tiebreak in rerank | +recall on real repos | Medium (retune) | **Shipped this pass** — log dampening + lower cap; no-regression on the public suite + a targeted regression test. Real-repo gain still needs M12.5. |
| 3 | `schema_version` on every MCP payload | Stable contract | Low | **Shipped this pass** — `schema_version` + `tool` envelope on every payload (incl. errors), asserted + golden-locked. |
| 4 | Golden snapshots for each MCP tool output | Regression safety | Low | **Shipped this pass** — `tests/golden/mcp_*.json` via `tests/test_mcp_golden.py`. |
| 5 | Typed framework edges in the graph | Better `impact` | High | Design doc shipped this pass (`docs/superpowers/specs/2026-06-14-typed-framework-edges-design.md`); implementation behind the §8 benchmark. |
| 6 | Config/IaC parsers (Dockerfile, Terraform, migrations) | Coverage | Medium | **Partly shipped this pass** — Tier-C labeling for Dockerfile/Terraform/HCL/INI/Make (already FTS-indexed, now language-labeled); tree-sitter parsing of these still roadmap. |
| 7 | Paging/progressive MCP results | Big-repo UX | Medium | Roadmap (MCP.md) |

Also fixed this pass (not previously tracked): the MCP server failed to import on
`mcp>=1.27` + `pydantic>=2.10` (FastMCP auto-built a structured-output schema from
the `-> str` return annotation and raised). Tools now register as unstructured
(`structured_output=False` where supported), so the server loads on current `mcp`.

Rule for this repo: small, safe, tested changes land directly; anything that
risks destabilizing retrieval quality or the security model is documented here
first and lands behind a benchmark.
23 changes: 16 additions & 7 deletions docs/ROADMAP.md
@@ -1,8 +1,13 @@
# Roadmap & First Implementation Tasks

Milestones are vertical-ish slices: each ends with something runnable and testable.
This numbering is canonical — the product-level [ROADMAP.md](../ROADMAP.md) and the
`(Mx)` tags in [CHANGELOG.md](../CHANGELOG.md) follow it.
This is the **technical-milestone** view (M0–M10). The product-level
[ROADMAP.md](../ROADMAP.md) tells the same story at a finer grain and carries it
further (it splits the MCP server into M11 and adds M11.5/M12/M12.5/M13 for MCP
hardening, benchmarks, and the typed-edge graph). Where the two disagree on a
number, the product roadmap is the current product view; this file tracks the
original implementation slices. The `(Mx)` tags in
[CHANGELOG.md](../CHANGELOG.md) follow this technical numbering.

## M0 — Architecture & scaffold ✅ (this repo)
- Repo tree, docs (ARCHITECTURE/RETRIEVAL/SCHEMA/SECURITY/INSTALLATION), SKILL.md draft.
@@ -77,11 +82,15 @@ release with the built artifacts (GitHub-only distribution — no PyPI publish).
"git+https://github.com/denfry/codebase-index.git@v1.2.0"` -> `init` -> `index` -> ask a question is
verified end-to-end by `scripts/release_smoke.py`.*

## M10 — Optional MCP bridge (planned)
- Model Context Protocol server exposing `search`, `symbol`, `refs`, `impact` as tools for
MCP-compatible clients (Claude Desktop, Cursor, etc.). An optional addition, not a replacement
for the Skill/CLI interface.
- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client.
## M10 — MCP bridge ✅ (product roadmap M11)
- Shipped: a stdio Model Context Protocol server (`codebase-index mcp --root <repo>`, or the
`codebase-index-mcp` entry point) exposing `healthcheck`, `search_code`, `find_symbol`,
`find_refs`, `impact_of`, `explain_code`, and `index_stats` over the same `service.py` layer the
CLI uses — an optional addition, not a replacement for the Skill/CLI interface. Every payload
carries a `schema_version` + `tool` envelope, locked by golden snapshots (`tests/golden/mcp_*.json`).
- **Exit:** `codebase-index` can be used as an MCP tool by any MCP-compatible client. See
[MCP.md](MCP.md).
- Follow-up (product roadmap M11.5): verified per-client setup docs and paging/progressive results.

---

10 changes: 10 additions & 0 deletions docs/SECURITY.md
@@ -3,6 +3,16 @@
`codebase-index` is **local-first and offline by default**. Its threat model assumes the indexed
repository may contain secrets and that a skill must not exfiltrate code or run dangerous commands.

> **Trust model in 60 seconds**
> 1. **Offline by default** — the base install has zero network dependencies; nothing leaves your machine (§1, §4).
> 2. **One opt-in exit, triple-gated** — external embeddings require `allow_external` **and** an env API key **and** a printed endpoint warning, or they are refused (§4).
> 3. **Secrets never get in** — `.env`, keys, certs, and credential files are excluded before parsing (§2).
> 4. **Secrets never get out** — every snippet is redacted before it reaches the agent (§3).
> 5. **No telemetry, ever** — no analytics, no phone-home, no usage data.
> 6. **Verify it yourself** — `codebase-index doctor --strict` audits all of the above and gates CI (§6).
>
> The same callout appears in the README so the trust story is identical wherever a reader lands.
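
Point 4 of the callout says every snippet is redacted before it reaches the agent. The sketch below illustrates pattern-based redaction for a few of the secret classes the README version of the callout lists. The regular expressions and the `[REDACTED:...]` placeholder format are illustrative assumptions, not the project's actual redaction rules, and they cover only a subset of real-world secret formats.

```python
import re

# Order matters: multi-line private keys first, then more specific tokens.
_PATTERNS = [
    (
        "private-key",
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
    ),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    ("aws-key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    # HTTP bearer tokens, including the scheme word.
    ("bearer", re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE)),
    # JWTs: three base64url segments, the first starting with "eyJ".
    ("jwt", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
]


def redact(snippet: str) -> str:
    """Replace anything that looks like a secret with a labeled placeholder."""
    for label, pattern in _PATTERNS:
        snippet = pattern.sub(f"[REDACTED:{label}]", snippet)
    return snippet
```

Redaction of this kind is a second line of defense: the ignore pipeline in point 3 keeps credential files out of the index, and the redactor catches secrets that were pasted into ordinary source files.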

## 1. Principles

1. **Local-first** — index, query, and storage all happen on the user's machine.
Expand Down