From c5027707704f61a4726993cf133b6c5584f861f1 Mon Sep 17 00:00:00 2001
From: dvcdsys
Date: Mon, 27 Apr 2026 21:50:36 +0100
Subject: [PATCH 1/9] feat(server/chunker): expand to 30 languages +
 parse-budget guard + bash regex fallback

Three related chunker improvements:

1. Language registry expansion. defaultRegistry now ships 30 languages
   (Tier A: Go/Python/JS/TS/TSX/Java/C/C++/Ruby/PHP/Rust; Tier B: C#/Swift/
   Kotlin/Scala/Bash/Lua/Dart/R/Objective-C/HTML/CSS/SCSS/SQL/Markdown/
   Zig/Julia/Fortran/Haskell/OCaml). Configurable via CIX_LANGUAGES.
   See doc/LANGUAGES.md.

2. Tree-sitter parse-budget guard. The bash grammar exhibits catastrophic
   backtracking on real-world scripts (31s on a 7KB install.sh). Added
   parseBudget=2s with twin guards: SetTimeoutMicros + SetCancellationFlag
   armed by time.AfterFunc. On budget exceeded, falls back to
   chunkFallback().

3. Bash regex fallback. New bashRegexChunks() recognises the three common
   bash function forms (POSIX `name() {`, keyword `function name {`, with
   or without parens) and finds each function's closing brace via a state
   machine that handles single/double strings, comments, heredocs (<<EOF,
   <<-EOF) and here-strings (<<<).
---
 doc/LANGUAGES.md                              |  75 +++
 server/cmd/cix-server/main.go                 |   4 +
 server/internal/chunker/bash_regex.go         | 318 ++++++++++
 server/internal/chunker/bash_regex_test.go    | 341 +++++++++++
 server/internal/chunker/chunker.go            | 558 +++++++++++++++---
 server/internal/chunker/chunker_test.go       | 398 +++++++++++++
 server/internal/config/config.go              |  14 +
 server/internal/langdetect/langdetect.go      |   3 +-
 server/internal/langdetect/langdetect_test.go |   4 +-
 9 files changed, 1640 insertions(+), 75 deletions(-)
 create mode 100644 doc/LANGUAGES.md
 create mode 100644 server/internal/chunker/bash_regex.go
 create mode 100644 server/internal/chunker/bash_regex_test.go

diff --git a/doc/LANGUAGES.md b/doc/LANGUAGES.md
new file mode 100644
index 0000000..70ef5c4
--- /dev/null
+++ b/doc/LANGUAGES.md
@@ -0,0 +1,75 @@
+# Supported languages
+
+cix uses tree-sitter (via `github.com/odvcencio/gotreesitter`) to extract semantic chunks (functions, classes, methods, types) from source code. Files in unsupported languages still get indexed via a sliding-window fallback — they're searchable, just without per-symbol granularity.
+ +## Default language set (30) + +| ID | gotreesitter factory | Function | Class | Method | Type | +|---|---|:-:|:-:|:-:|:-:| +| `python` | `PythonLanguage` | ✓ | ✓ | | | +| `typescript` | `TypescriptLanguage` | ✓ | ✓ | ✓ | ✓ | +| `tsx` | `TsxLanguage` | ✓ | ✓ | ✓ | ✓ | +| `javascript` | `JavascriptLanguage` | ✓ | ✓ | ✓ | | +| `go` | `GoLanguage` | ✓ | | ✓ | ✓ | +| `rust` | `RustLanguage` | ✓ | ✓ | | ✓ | +| `java` | `JavaLanguage` | ✓ | ✓ | | ✓ | +| `c` | `CLanguage` | ✓ | ✓ | | ✓ | +| `cpp` | `CppLanguage` | ✓ | ✓ | | ✓ | +| `c_sharp` | `CSharpLanguage` | ✓ | ✓ | ✓ | ✓ | +| `ruby` | `RubyLanguage` | ✓ | ✓ | | | +| `php` | `PhpLanguage` | ✓ | ✓ | ✓ | ✓ | +| `swift` | `SwiftLanguage` | ✓ | ✓ | | ✓ | +| `kotlin` | `KotlinLanguage` | ✓ | ✓ | | | +| `scala` | `ScalaLanguage` | ✓ | ✓ | | ✓ | +| `bash` | `BashLanguage` | ✓ | | | | +| `lua` | `LuaLanguage` | ✓ | | | | +| `dart` | `DartLanguage` | ✓ | ✓ | ✓ | ✓ | +| `r` | `RLanguage` | ✓ | | | | +| `objc` | `ObjcLanguage` | ✓ | ✓ | ✓ | ✓ | +| `html` | `HtmlLanguage` | | | | ✓ | +| `css` | `CssLanguage` | | ✓ | | | +| `scss` | `ScssLanguage` | ✓ | ✓ | | | +| `sql` | `SqlLanguage` | ✓ | | | ✓ | +| `markdown` | `MarkdownLanguage` | | | | ✓ | +| `zig` | `ZigLanguage` | ✓ | ✓ | | | +| `julia` | `JuliaLanguage` | ✓ | | | | +| `fortran` | `FortranLanguage` | ✓ | ✓ | | | +| `haskell` | `HaskellLanguage` | ✓ | | | ✓ | +| `ocaml` | `OcamlLanguage` | ✓ | ✓ | | ✓ | + +The exact AST node types per language live in `server/internal/chunker/chunker.go` (`defaultRegistry`). File-extension mapping lives in `server/internal/langdetect/langdetect.go`. + +## Configuring the active set + +`CIX_LANGUAGES` (comma-separated, case-insensitive) restricts the active set. Empty / unset = all defaults. + +```bash +# Only index Python and Go — every other language falls to sliding-window +CIX_LANGUAGES=python,go cix-server + +# Add Rust to the trio +CIX_LANGUAGES="python, go, rust" cix-server +``` + +Unknown IDs are logged at startup and ignored — typos won't crash the server. + +The active set is logged at INFO during startup: + +``` +{"level":"INFO","msg":"chunker languages configured","active":["python","go","rust"]} +``` + +## Languages with extension detection but no grammar + +These produce sliding-window chunks. Adding semantic chunking is a one-map-entry addition in `defaultRegistry`. Candidates: + +`erlang, elixir, commonlisp, svelte, graphql, hcl (terraform), cmake, dockerfile, regex, xml, make` + +PRs welcome — verify node names with `gotreesitter`'s `cmd/tsquery` against a representative fixture before adding. + +## How the chunker decides + +1. `langdetect.Detect(filePath)` maps extension/filename → language ID. +2. `chunker.ChunkFile()` looks up the ID in the active registry. +3. If found and its `languageNodes` map is non-empty → AST-based extraction (function/class/method/type chunks + identifier references). +4. Otherwise → sliding-window chunks of `windowSize=4000` bytes with `overlap=500`. 
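+
+## Anatomy of a registry entry
+
+For contributors: step 3 above keys off a `defaultRegistry` entry. The sketch below shows the shape using a hypothetical `elixir` entry — the factory name and node types are unverified guesses, so check both with `cmd/tsquery` before opening a PR:
+
+```go
+"elixir": {
+	factory: grammars.ElixirLanguage, // assumption: gotreesitter ships this factory
+	nodes: map[string][]string{
+		"function": {"function_definition"}, // guessed node name — verify first
+		"class":    {"module_definition"},   // guessed node name — verify first
+	},
+	identifiers: idID(),
+},
+```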
diff --git a/server/cmd/cix-server/main.go b/server/cmd/cix-server/main.go index cbb9ad6..7364400 100644 --- a/server/cmd/cix-server/main.go +++ b/server/cmd/cix-server/main.go @@ -15,6 +15,7 @@ import ( "syscall" "time" + "github.com/dvcdsys/code-index/server/internal/chunker" "github.com/dvcdsys/code-index/server/internal/config" "github.com/dvcdsys/code-index/server/internal/db" "github.com/dvcdsys/code-index/server/internal/embeddings" @@ -73,6 +74,9 @@ func run() error { logger.Warn("CIX_API_KEY is empty — authenticated endpoints are reachable without auth (dev mode)") } + chunker.Configure(cfg.Languages) + logger.Info("chunker languages configured", "active", chunker.SupportedLanguages()) + dbPath := cfg.DynamicSQLitePath() logger.Info("opening database", "path", dbPath) database, err := db.Open(dbPath) diff --git a/server/internal/chunker/bash_regex.go b/server/internal/chunker/bash_regex.go new file mode 100644 index 0000000..53c5006 --- /dev/null +++ b/server/internal/chunker/bash_regex.go @@ -0,0 +1,318 @@ +// Package chunker — regex-based bash function extractor. +// +// Used as a fallback when tree-sitter-bash hits a parse pathology (see +// parseBudget in chunker.go). Tree-sitter would have given us better symbol +// data, but on a 7KB install.sh-style script its parser can spend 30 seconds +// on catastrophic backtracking. The regex extractor below recognises the +// three common bash function forms and finds each function's closing brace +// with a small state machine that handles strings, comments, and heredocs. +// +// Output schema matches chunkWithTreesitter so the indexer's downstream code +// (DB upserts, vector embeddings) doesn't need to special-case bash. +// +// Limitations vs full tree-sitter parse: +// - No reference extraction (returns nil refs). +// - Functions with a `{` on a line *separate* from the opener (`name()` on +// one line, `{` on the next) are not matched. That form is legal in +// bash but rare in practice; falls back to sliding-window for those. +// - Comments containing `{`/`}` inside strings can confuse the brace +// counter on adversarial inputs; bounded by maxBashFuncLines so a +// malformed function never absorbs the whole file. + +package chunker + +import ( + "regexp" + "strings" +) + +// posixFuncRE matches the POSIX-shell style: `name() { ...`. +// Captures group 1 = function name. The trailing `{` must be on the same line. +var posixFuncRE = regexp.MustCompile( + `^[[:space:]]*([A-Za-z_][A-Za-z0-9_:.-]*)[[:space:]]*\(\)[[:space:]]*\{`) + +// bashFuncRE matches the bash-keyword style: `function name [()] { ...`. +// Captures group 1 = function name. +var bashFuncRE = regexp.MustCompile( + `^[[:space:]]*function[[:space:]]+([A-Za-z_][A-Za-z0-9_:.-]*)(?:[[:space:]]*\(\))?[[:space:]]*\{`) + +// maxBashFuncLines caps how far we'll scan for a function's closing `}`. +// Real-world bash functions rarely exceed ~200 lines. The cap protects +// against pathological inputs where the brace counter goes off-track — +// instead of consuming the whole file as one function, we stop and let the +// caller decide what to do (typically: keep a partial chunk, fall back +// to sliding-window for the remainder). +const maxBashFuncLines = 500 + +// bashRegexChunks extracts function-level chunks from bash source via regex. +// Returns nil when no functions were found, signalling the caller to fall +// through to sliding-window. Always returns nil refs (the regex doesn't +// track identifier usage). 
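+//
+// For illustration, the three opener forms accepted by the two regexes
+// above (the closing-brace scan always starts on the opener line):
+//
+//	hello() {            # POSIX form
+//	function build {     # keyword form, no parens
+//	function deploy() {  # keyword form, with parens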
+func bashRegexChunks(filePath, content string) []Chunk { + lines := splitLines(content) + if len(lines) == 0 { + return nil + } + + var chunks []Chunk + covered := make([]bool, len(lines)) + + i := 0 + for i < len(lines) { + line := lines[i] + var name string + if m := posixFuncRE.FindStringSubmatch(line); m != nil { + name = m[1] + } else if m := bashFuncRE.FindStringSubmatch(line); m != nil { + name = m[1] + } + if name == "" { + i++ + continue + } + + endIdx, ok := scanBashFuncEnd(lines, i) + if !ok { + // Couldn't find balanced close within maxBashFuncLines. + // Skip this opener — don't emit a wildly oversized chunk. + i++ + continue + } + + startLine := i + 1 // 1-based + endLine := endIdx + 1 + body := joinLines(lines[i : endIdx+1]) + // Signature = the opener line trimmed. + sigStr := trimSpace(line) + nameCopy := name + + chunks = append(chunks, Chunk{ + Content: body, + ChunkType: "function", + FilePath: filePath, + StartLine: startLine, + EndLine: endLine, + Language: "bash", + SymbolName: &nameCopy, + SymbolSignature: &sigStr, + }) + for k := i; k <= endIdx && k < len(covered); k++ { + covered[k] = true + } + i = endIdx + 1 + } + + if len(chunks) == 0 { + return nil + } + + // Fill the gaps between functions with `module` chunks so the file's + // non-function content (top-level commands, comments, set -e, etc.) + // still gets indexed for full-text/semantic search. + chunks = appendBashGaps(chunks, lines, covered, filePath) + return chunks +} + +// appendBashGaps adds module-type chunks for line ranges not covered by any +// function. Mirrors the gap-filling logic chunkWithTreesitter applies for +// tree-sitter chunks. Returns chunks sorted by StartLine. +func appendBashGaps(chunks []Chunk, lines []string, covered []bool, filePath string) []Chunk { + gapStart := -1 + for i := 0; i <= len(lines); i++ { + uncovered := i < len(lines) && !covered[i] + if uncovered && gapStart < 0 { + gapStart = i + } + if !uncovered && gapStart >= 0 { + gapEnd := i - 1 + content := joinLines(lines[gapStart : gapEnd+1]) + if trimSpace(content) != "" { + chunks = append(chunks, Chunk{ + Content: content, + ChunkType: "module", + FilePath: filePath, + StartLine: gapStart + 1, + EndLine: gapEnd + 1, + Language: "bash", + }) + } + gapStart = -1 + } + } + // Sort by StartLine so consumers see a stable order. + insertSortByStartLine(chunks) + return chunks +} + +func insertSortByStartLine(chunks []Chunk) { + for i := 1; i < len(chunks); i++ { + j := i + for j > 0 && chunks[j].StartLine < chunks[j-1].StartLine { + chunks[j], chunks[j-1] = chunks[j-1], chunks[j] + j-- + } + } +} + +// scanBashFuncEnd walks forward from startLineIdx (the opener line, which +// already contains the first `{`) and returns the 0-based line index of the +// matching close `}` and ok=true. ok=false means we couldn't find a balance +// within maxBashFuncLines or hit EOF first. +// +// State machine handles: +// - Single-quoted strings ('...') — literal, no escapes +// - Double-quoted strings ("...") — `\"` is escaped quote, `\\` is escaped backslash +// - `# ... EOL` comments — but skipping `$#`, `${#var}`, `$(( # ...))` etc. 
heuristically
+//   - Heredocs (<<EOF, <<-EOF, quoted delimiters) and here-strings (<<<)
+func scanBashFuncEnd(lines []string, startLineIdx int) (int, bool) {
+	var (
+		depth            int
+		inSingleStr      bool
+		inDoubleStr      bool
+		inHeredoc        bool
+		heredocDelim     string
+		heredocStripTabs bool
+	)
+
+	maxIdx := startLineIdx + maxBashFuncLines
+	if maxIdx > len(lines) {
+		maxIdx = len(lines)
+	}
+
+	for li := startLineIdx; li < maxIdx; li++ {
+		line := lines[li]
+
+		if inHeredoc {
+			candidate := line
+			if heredocStripTabs {
+				candidate = strings.TrimLeft(line, "\t")
+			}
+			if candidate == heredocDelim {
+				inHeredoc = false
+				heredocDelim = ""
+				heredocStripTabs = false
+			}
+			continue
+		}
+
+		i := 0
+		for i < len(line) {
+			c := line[i]
+
+			if inSingleStr {
+				if c == '\'' {
+					inSingleStr = false
+				}
+				i++
+				continue
+			}
+			if inDoubleStr {
+				if c == '\\' && i+1 < len(line) {
+					// Skip the escaped char (handles `\"`, `\\`, etc.).
+					i += 2
+					continue
+				}
+				if c == '"' {
+					inDoubleStr = false
+				}
+				i++
+				continue
+			}
+
+			// Comment — `#` starts a line comment only at the start of the
+			// line or after whitespace / a separator (`;`, `|`, `&`). A `#`
+			// glued to the previous token is not a comment, which is what
+			// keeps `$#`, `${#var}` and `$((# expr))` out of this branch.
+			if c == '#' {
+				prev := byte(' ')
+				if i > 0 {
+					prev = line[i-1]
+				}
+				if i == 0 || prev == ' ' || prev == '\t' || prev == ';' ||
+					prev == '|' || prev == '&' {
+					break // rest of line is comment
+				}
+			}
+
+			// Heredoc / here-string
+			if c == '<' && i+1 < len(line) && line[i+1] == '<' {
+				// `<<<` is here-string (single-line) — skip the marker
+				if i+2 < len(line) && line[i+2] == '<' {
+					i += 3
+					continue
+				}
+				// `<<` or `<<-`
+				j := i + 2
+				stripTabs := false
+				if j < len(line) && line[j] == '-' {
+					stripTabs = true
+					j++
+				}
+				// Skip leading whitespace before delimiter
+				for j < len(line) && (line[j] == ' ' || line[j] == '\t') {
+					j++
+				}
+				delim, after := readHeredocDelim(line, j)
+				if delim != "" {
+					inHeredoc = true
+					heredocDelim = delim
+					heredocStripTabs = stripTabs
+					// Resume after the delimiter on this line — there may
+					// be more code on the opener line (e.g. `cmd <<EOF; x`).
+					i = after
+					continue
+				}
+				// `<<` with no readable delimiter — treat as plain text.
+				i = j
+				continue
+			}
+
+			// Quote openers and brace counting.
+			switch c {
+			case '\'':
+				inSingleStr = true
+			case '"':
+				inDoubleStr = true
+			case '{':
+				depth++
+			case '}':
+				depth--
+				if depth == 0 {
+					// The opener's `{` was counted on the way in, so depth
+					// 0 means this `}` closes the function.
+					return li, true
+				}
+			}
+			i++
+		}
+	}
+	return 0, false
+}
+
+// readHeredocDelim reads a heredoc delimiter beginning at `start`. Quoted
+// delimiters ('EOF', "EOF") are unwrapped. Returns the delimiter (empty
+// when none could be read) and the index just past it.
+func readHeredocDelim(line string, start int) (string, int) {
+	if start >= len(line) {
+		return "", start
+	}
+	q := line[start]
+	if q == '\'' || q == '"' {
+		end := strings.IndexByte(line[start+1:], q)
+		if end < 0 {
+			return "", start
+		}
+		return line[start+1 : start+1+end], start + 1 + end + 1
+	}
+	j := start
+	for j < len(line) && isBashIdentByte(line[j]) {
+		j++
+	}
+	if j == start {
+		return "", start
+	}
+	return line[start:j], j
+}
+
+func isBashIdentByte(b byte) bool {
+	return (b >= 'A' && b <= 'Z') ||
+		(b >= 'a' && b <= 'z') ||
+		(b >= '0' && b <= '9') ||
+		b == '_' || b == '-'
+}
diff --git a/server/internal/chunker/bash_regex_test.go b/server/internal/chunker/bash_regex_test.go
new file mode 100644
index 0000000..f4c3815
--- /dev/null
+++ b/server/internal/chunker/bash_regex_test.go
@@ -0,0 +1,341 @@
+package chunker
+
+import (
+	"strings"
+	"testing"
+)
+
+// helper: assert a chunk with the given symbol name and type exists.
+func findChunkByName(t *testing.T, chunks []Chunk, name, kind string) Chunk {
+	t.Helper()
+	for _, c := range chunks {
+		if c.SymbolName != nil && *c.SymbolName == name && c.ChunkType == kind {
+			return c
+		}
+	}
+	t.Fatalf("no chunk with name=%q type=%q in: %s", name, kind, summariseChunks(chunks))
+	return Chunk{}
+}
+
+func summariseChunks(chunks []Chunk) string {
+	var b strings.Builder
+	for i, c := range chunks {
+		if i > 0 {
+			b.WriteString("; ")
+		}
+		name := ""
+		if c.SymbolName != nil {
+			name = *c.SymbolName
+		}
+		b.WriteString(c.ChunkType + ":" + name)
+	}
+	return b.String()
+}
+
+// --- POSIX style: name() { ...
} ------------------------------------------- + +func TestBashRegex_PosixSimple(t *testing.T) { + src := `#!/usr/bin/env bash +hello() { + echo "hi" +} +` + chunks := bashRegexChunks("/p/x.sh", src) + hello := findChunkByName(t, chunks, "hello", "function") + if hello.StartLine != 2 || hello.EndLine != 4 { + t.Errorf("hello lines = %d-%d, want 2-4", hello.StartLine, hello.EndLine) + } + if !strings.Contains(hello.Content, `echo "hi"`) { + t.Errorf("body missing echo: %q", hello.Content) + } +} + +func TestBashRegex_PosixOneLiner(t *testing.T) { + src := `greet() { echo "hi"; } +` + chunks := bashRegexChunks("/p/x.sh", src) + greet := findChunkByName(t, chunks, "greet", "function") + if greet.StartLine != 1 || greet.EndLine != 1 { + t.Errorf("greet lines = %d-%d, want 1-1", greet.StartLine, greet.EndLine) + } +} + +// --- bash function keyword form -------------------------------------------- + +func TestBashRegex_FunctionKeywordWithParens(t *testing.T) { + src := `function deploy() { + echo deploying +} +` + chunks := bashRegexChunks("/p/d.sh", src) + findChunkByName(t, chunks, "deploy", "function") +} + +func TestBashRegex_FunctionKeywordNoParens(t *testing.T) { + src := `function build { + make all +} +` + chunks := bashRegexChunks("/p/b.sh", src) + findChunkByName(t, chunks, "build", "function") +} + +// --- multiple functions ---------------------------------------------------- + +func TestBashRegex_MultipleFunctions(t *testing.T) { + src := `setup() { + mkdir -p /tmp/x +} + +teardown() { + rm -rf /tmp/x +} + +run_tests() { + setup + pytest + teardown +} +` + chunks := bashRegexChunks("/p/test.sh", src) + for _, name := range []string{"setup", "teardown", "run_tests"} { + findChunkByName(t, chunks, name, "function") + } + // Three functions + the gap before teardown / between functions / after. 
+	functionCount := 0
+	for _, c := range chunks {
+		if c.ChunkType == "function" {
+			functionCount++
+		}
+	}
+	if functionCount != 3 {
+		t.Errorf("function count = %d, want 3", functionCount)
+	}
+}
+
+// --- nested braces ---------------------------------------------------------
+
+func TestBashRegex_NestedBraces(t *testing.T) {
+	src := `outer() {
+  if [[ "$1" == "yes" ]]; then
+    local x={key:value}
+    echo "${x}"
+  fi
+}
+`
+	chunks := bashRegexChunks("/p/n.sh", src)
+	outer := findChunkByName(t, chunks, "outer", "function")
+	if outer.StartLine != 1 || outer.EndLine != 6 {
+		t.Errorf("outer lines = %d-%d, want 1-6", outer.StartLine, outer.EndLine)
+	}
+}
+
+// --- strings containing braces ---------------------------------------------
+
+func TestBashRegex_StringsWithBraces(t *testing.T) {
+	src := `format() {
+  echo "literal { brace }"
+  echo 'single { quoted }'
+}
+trailer() { echo done; }
+`
+	chunks := bashRegexChunks("/p/s.sh", src)
+	format := findChunkByName(t, chunks, "format", "function")
+	if format.EndLine != 4 {
+		t.Errorf("format end = %d, want 4 (string braces should not count)", format.EndLine)
+	}
+	findChunkByName(t, chunks, "trailer", "function")
+}
+
+// --- heredoc handling ------------------------------------------------------
+
+func TestBashRegex_HeredocBody(t *testing.T) {
+	src := `usage() {
+  cat <<EOF
+Usage: install.sh [--prefix=<dir>]
+EOF
+}
+
+main() {
+  usage
+}
+`
+	chunks := bashRegexChunks("/p/install.sh", src)
+	findChunkByName(t, chunks, "usage", "function")
+	findChunkByName(t, chunks, "main", "function")
+}
+
+// --- fallback wiring: ChunkFile uses regex for bash on parse fallback ------
+
+func TestChunkFile_BashFallbackUsesRegex(t *testing.T) {
+	// We pick a bash source that's chunked successfully by tree-sitter
+	// (so the parse-budget guard does NOT fire) and verify it yields the
+	// same `hello` function chunk the regex tests assert — a sanity check
+	// that bashRegexChunks' output matches the public ChunkFile schema.
+	src := `hello() {
+  echo "hi"
+}
+`
+	chunks, _, err := ChunkFile("/p/x.sh", src, "bash", 0)
+	if err != nil {
+		t.Fatalf("ChunkFile: %v", err)
+	}
+	for _, c := range chunks {
+		if c.ChunkType == "function" && c.SymbolName != nil && *c.SymbolName == "hello" {
+			return
+		}
+	}
+	t.Errorf("expected `hello` function chunk, got: %s", summariseChunks(chunks))
+}
diff --git a/server/internal/chunker/chunker.go b/server/internal/chunker/chunker.go
index dc44f55..0b140a1 100644
--- a/server/internal/chunker/chunker.go
+++ b/server/internal/chunker/chunker.go
@@ -2,9 +2,20 @@
 // The public surface is ChunkFile, which returns ([]Chunk, []Reference, error).
 // Sliding-window fallback is used when a language is not supported by the
 // tree-sitter grammars bundle or when parsing fails.
+//
+// The set of active languages is built from a baked-in default registry
+// (see defaultRegistry) and may be filtered at startup via Configure(). The
+// CIX_LANGUAGES env var feeds Configure with a comma-separated whitelist;
+// empty/nil keeps all defaults.
 package chunker
 
 import (
+	"log/slog"
+	"strings"
+	"sync"
+	"sync/atomic"
+	"time"
+
 	sitter "github.com/odvcencio/gotreesitter"
 	"github.com/odvcencio/gotreesitter/grammars"
 )
@@ -24,53 +35,401 @@ const (
 // minRefNameLength mirrors MIN_REF_NAME_LENGTH in chunker.py.
 const minRefNameLength = 2
 
+// parseBudget caps wall-clock time spent in tree-sitter for a single file.
+// Some grammars (notably bash) have catastrophic-backtracking pathologies on +// specific inputs — install.sh in this very repo took 31s to parse before +// this guard. The parser's own SetTimeoutMicros checkpoint is best-effort +// and overshoots by 3-4×, so we set the hint generously and rely on the +// post-parse wall-clock check to decide whether to keep the tree. +// +// On overshoot we fall back to sliding-window chunks. We accept the wasted +// CPU (parser keeps running until its next checkpoint) because killing a +// pure-Go parse from outside is not safe — the only practical levers are +// SetTimeoutMicros and the cancellation flag, both with the same overshoot +// characteristic. +const ( + parseBudget = 2 * time.Second + parseHint = uint64(parseBudget / time.Microsecond) +) + // --------------------------------------------------------------------------- -// Language maps — ported 1:1 from chunker.py +// Language registry — built from defaultRegistry() at init() and reduced by +// Configure() if the operator set CIX_LANGUAGES. The three exported maps +// stay package-private; the engine reads them directly. // --------------------------------------------------------------------------- -// languageNodes maps language → kind → []node_type. -// Kind values: function|class|method|type. -var languageNodes = map[string]map[string][]string{ - "python": { - "function": {"function_definition"}, - "class": {"class_definition"}, - }, - "typescript": { - "function": {"function_declaration", "arrow_function"}, - "class": {"class_declaration"}, - "method": {"method_definition"}, - "type": {"interface_declaration", "type_alias_declaration"}, - }, - "javascript": { - "function": {"function_declaration", "arrow_function"}, - "class": {"class_declaration"}, - "method": {"method_definition"}, - }, - "go": { - "function": {"function_declaration"}, - "method": {"method_declaration"}, - "type": {"type_spec"}, - }, - "rust": { - "function": {"function_item"}, - "class": {"struct_item", "enum_item"}, - "type": {"trait_item"}, - }, - "java": { - "function": {"method_declaration"}, - "class": {"class_declaration"}, - "type": {"interface_declaration"}, - }, +// languageEntry bundles the three pieces of state a language needs. +type languageEntry struct { + factory languageFunc + nodes map[string][]string // function|class|method|type → AST node types + identifiers map[string]struct{} // identifier leaf-node types for ref extraction } -// identifierNodes maps language → set of identifier leaf-node types. -var identifierNodes = map[string]map[string]struct{}{ - "python": {"identifier": {}}, - "typescript": {"identifier": {}, "type_identifier": {}, "property_identifier": {}}, - "javascript": {"identifier": {}, "property_identifier": {}}, - "go": {"identifier": {}, "type_identifier": {}, "field_identifier": {}}, - "rust": {"identifier": {}, "type_identifier": {}, "field_identifier": {}}, - "java": {"identifier": {}, "type_identifier": {}}, +// languageFunc is a factory for sitter.Language. +type languageFunc func() *sitter.Language + +var ( + registryMu sync.RWMutex + languageRegistry map[string]languageFunc + languageNodes map[string]map[string][]string + identifierNodes map[string]map[string]struct{} +) + +func init() { + // Populate full defaults so direct ChunkFile usage (and tests) works + // without a Configure() call. Server startup later may filter via + // Configure(cfg.Languages). 
+ Configure(nil) +} + +// Configure (re)builds the active language registry from the baked-in +// defaultRegistry, optionally filtered to the IDs in `enabled`. Empty or nil +// `enabled` activates all defaults. Unknown IDs are logged and ignored. +// Idempotent and safe to call multiple times; concurrent ChunkFile callers +// see a consistent snapshot via registryMu. +func Configure(enabled []string) { + defaults := defaultRegistry() + + wantAll := len(enabled) == 0 + wanted := make(map[string]struct{}, len(enabled)) + if !wantAll { + for _, raw := range enabled { + id := strings.ToLower(strings.TrimSpace(raw)) + if id == "" { + continue + } + wanted[id] = struct{}{} + } + } + + reg := make(map[string]languageFunc, len(defaults)) + nodes := make(map[string]map[string][]string, len(defaults)) + idents := make(map[string]map[string]struct{}, len(defaults)) + + for lang, entry := range defaults { + if !wantAll { + if _, ok := wanted[lang]; !ok { + continue + } + } + reg[lang] = entry.factory + if entry.nodes != nil { + nodes[lang] = entry.nodes + } + if entry.identifiers != nil { + idents[lang] = entry.identifiers + } + } + + if !wantAll { + for id := range wanted { + if _, ok := defaults[id]; !ok { + slog.Warn("chunker: unknown language in CIX_LANGUAGES, ignored", "lang", id) + } + } + } + + registryMu.Lock() + languageRegistry = reg + languageNodes = nodes + identifierNodes = idents + registryMu.Unlock() +} + +// SupportedLanguages returns a snapshot of currently-active language IDs. +// Useful for /health, debug endpoints, and test assertions. +func SupportedLanguages() []string { + registryMu.RLock() + defer registryMu.RUnlock() + out := make([]string, 0, len(languageRegistry)) + for k := range languageRegistry { + out = append(out, k) + } + return out +} + +// defaultRegistry returns the baked-in language entries. Adding a language is +// a single new map entry — no other code changes are needed because the +// chunker engine is data-driven. 
+func defaultRegistry() map[string]languageEntry { + idID := func(extra ...string) map[string]struct{} { + m := map[string]struct{}{"identifier": {}} + for _, e := range extra { + m[e] = struct{}{} + } + return m + } + + return map[string]languageEntry{ + // --- Tier 1: original 6, kept as-is for parity with legacy Python --- + "python": { + factory: grammars.PythonLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + "class": {"class_definition"}, + }, + identifiers: idID(), + }, + "typescript": { + factory: grammars.TypescriptLanguage, + nodes: map[string][]string{ + "function": {"function_declaration", "arrow_function"}, + "class": {"class_declaration"}, + "method": {"method_definition"}, + "type": {"interface_declaration", "type_alias_declaration"}, + }, + identifiers: idID("type_identifier", "property_identifier"), + }, + "javascript": { + factory: grammars.JavascriptLanguage, + nodes: map[string][]string{ + "function": {"function_declaration", "arrow_function"}, + "class": {"class_declaration"}, + "method": {"method_definition"}, + }, + identifiers: idID("property_identifier"), + }, + "go": { + factory: grammars.GoLanguage, + nodes: map[string][]string{ + "function": {"function_declaration"}, + "method": {"method_declaration"}, + "type": {"type_spec"}, + }, + identifiers: idID("type_identifier", "field_identifier"), + }, + "rust": { + factory: grammars.RustLanguage, + nodes: map[string][]string{ + "function": {"function_item"}, + "class": {"struct_item", "enum_item"}, + "type": {"trait_item"}, + }, + identifiers: idID("type_identifier", "field_identifier"), + }, + "java": { + factory: grammars.JavaLanguage, + nodes: map[string][]string{ + "function": {"method_declaration"}, + "class": {"class_declaration"}, + "type": {"interface_declaration"}, + }, + identifiers: idID("type_identifier"), + }, + + // --- Tier 2: bug-fix — grammars were registered, node maps were not --- + "tsx": { + factory: grammars.TsxLanguage, + nodes: map[string][]string{ + "function": {"function_declaration", "arrow_function"}, + "class": {"class_declaration"}, + "method": {"method_definition"}, + "type": {"interface_declaration", "type_alias_declaration"}, + }, + identifiers: idID("type_identifier", "property_identifier"), + }, + "c": { + factory: grammars.CLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + "class": {"struct_specifier"}, + "type": {"enum_specifier", "union_specifier", "type_definition"}, + }, + identifiers: idID("type_identifier", "field_identifier"), + }, + "cpp": { + factory: grammars.CppLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + "class": {"class_specifier", "struct_specifier"}, + "type": {"enum_specifier", "union_specifier", "type_definition", "namespace_definition"}, + }, + identifiers: idID("type_identifier", "field_identifier"), + }, + "ruby": { + factory: grammars.RubyLanguage, + nodes: map[string][]string{ + "function": {"method", "singleton_method"}, + "class": {"class", "module"}, + }, + identifiers: idID("constant"), + }, + + // --- Tier 3: mainstream additions, high confidence in node names --- + "c_sharp": { + factory: grammars.CSharpLanguage, + nodes: map[string][]string{ + "function": {"local_function_statement"}, + "class": {"class_declaration"}, + "method": {"method_declaration"}, + "type": {"interface_declaration", "struct_declaration", "enum_declaration", "record_declaration"}, + }, + identifiers: idID("type_identifier"), + }, + "php": { + factory: grammars.PhpLanguage, + nodes: 
map[string][]string{ + "function": {"function_definition"}, + "class": {"class_declaration"}, + "method": {"method_declaration"}, + "type": {"interface_declaration", "trait_declaration"}, + }, + identifiers: idID("name", "variable_name"), + }, + "swift": { + factory: grammars.SwiftLanguage, + nodes: map[string][]string{ + "function": {"function_declaration"}, + "class": {"class_declaration"}, + "type": {"protocol_declaration"}, + }, + identifiers: idID("simple_identifier", "type_identifier"), + }, + "kotlin": { + factory: grammars.KotlinLanguage, + nodes: map[string][]string{ + "function": {"function_declaration"}, + "class": {"class_declaration", "object_declaration"}, + }, + identifiers: idID("type_identifier", "simple_identifier"), + }, + "scala": { + factory: grammars.ScalaLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + "class": {"class_definition", "object_definition"}, + "type": {"trait_definition"}, + }, + identifiers: idID("type_identifier"), + }, + "bash": { + factory: grammars.BashLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + }, + identifiers: idID("variable_name", "word"), + }, + "lua": { + factory: grammars.LuaLanguage, + nodes: map[string][]string{ + "function": {"function_declaration", "function_definition"}, + }, + identifiers: idID(), + }, + "dart": { + factory: grammars.DartLanguage, + nodes: map[string][]string{ + "function": {"function_signature"}, + "class": {"class_definition"}, + "method": {"method_signature"}, + "type": {"mixin_declaration", "extension_declaration"}, + }, + identifiers: idID("type_identifier"), + }, + "r": { + factory: grammars.RLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + }, + identifiers: idID(), + }, + "objc": { + factory: grammars.ObjcLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + "class": {"class_interface", "class_implementation"}, + "method": {"method_definition"}, + "type": {"protocol_declaration"}, + }, + identifiers: idID("type_identifier", "field_identifier"), + }, + + // --- Tier 4: markup / data / config with structural nodes --- + "html": { + factory: grammars.HtmlLanguage, + nodes: map[string][]string{ + "type": {"doctype"}, + }, + identifiers: nil, + }, + "css": { + factory: grammars.CssLanguage, + nodes: map[string][]string{ + "class": {"rule_set"}, + }, + identifiers: nil, + }, + "scss": { + factory: grammars.ScssLanguage, + nodes: map[string][]string{ + "function": {"mixin_statement"}, + "class": {"rule_set"}, + }, + identifiers: nil, + }, + "sql": { + factory: grammars.SqlLanguage, + nodes: map[string][]string{ + "function": {"create_function_statement"}, + "type": {"create_table_statement"}, + }, + identifiers: nil, + }, + "markdown": { + factory: grammars.MarkdownLanguage, + nodes: map[string][]string{ + "type": {"section", "atx_heading"}, + }, + identifiers: nil, + }, + + // --- Tier 5: medium-confidence additions --- + "zig": { + factory: grammars.ZigLanguage, + nodes: map[string][]string{ + "function": {"function_declaration"}, + "class": {"struct_declaration"}, + }, + identifiers: idID(), + }, + "julia": { + factory: grammars.JuliaLanguage, + nodes: map[string][]string{ + "function": {"function_definition"}, + }, + identifiers: idID(), + }, + "fortran": { + factory: grammars.FortranLanguage, + nodes: map[string][]string{ + "function": {"subroutine", "function"}, + "class": {"module"}, + }, + identifiers: idID(), + }, + "haskell": { + factory: grammars.HaskellLanguage, + nodes: 
map[string][]string{ + // `function` = untyped top-level def; `bind` = typed binding + // (signature + match together); `signature` is loose stand-alone + // type signatures. + "function": {"function", "bind", "signature"}, + "type": {"data_type", "newtype"}, + }, + identifiers: map[string]struct{}{ + "variable": {}, "constructor": {}, "name": {}, + }, + }, + "ocaml": { + factory: grammars.OcamlLanguage, + nodes: map[string][]string{ + "function": {"value_definition"}, + "class": {"module_definition"}, + "type": {"type_definition"}, + }, + identifiers: idID("type_identifier"), + }, + } } // skipNames mirrors SKIP_NAMES in chunker.py. @@ -121,26 +480,6 @@ type Reference struct { Language string } -// --------------------------------------------------------------------------- -// Language registry -// --------------------------------------------------------------------------- - -// languageFunc is a factory for sitter.Language. -type languageFunc func() *sitter.Language - -var languageRegistry = map[string]languageFunc{ - "python": grammars.PythonLanguage, - "go": grammars.GoLanguage, - "javascript": grammars.JavascriptLanguage, - "typescript": grammars.TypescriptLanguage, - "tsx": grammars.TsxLanguage, - "java": grammars.JavaLanguage, - "c": grammars.CLanguage, - "cpp": grammars.CppLanguage, - "rust": grammars.RustLanguage, - "ruby": grammars.RubyLanguage, -} - // --------------------------------------------------------------------------- // ChunkFile — main entry point // --------------------------------------------------------------------------- @@ -155,29 +494,51 @@ func ChunkFile(filePath, content, language string, maxSize int) ([]Chunk, []Refe chunks, refs, err := chunkWithTreesitter(filePath, content, language, maxSize) if err != nil { // Fallback: sliding window, no references. - return chunkSlidingWindow(filePath, content, language), nil, nil + return chunkFallback(filePath, content, language), nil, nil } return chunks, refs, nil } +// chunkFallback returns reasonable chunks for content that the tree-sitter +// path could not handle (parser timeout, no grammar, malformed input, …). +// +// For languages where a regex-based extractor exists (currently only bash), +// we try that first — it produces real `function` chunks instead of generic +// `block` ones, which is much more useful for semantic search. If the +// extractor returns nil (no symbols found), we fall through to the universal +// sliding-window strategy so the file content is still indexed. +func chunkFallback(filePath, content, language string) []Chunk { + if language == "bash" { + if c := bashRegexChunks(filePath, content); len(c) > 0 { + return c + } + } + return chunkSlidingWindow(filePath, content, language) +} + // --------------------------------------------------------------------------- // Tree-sitter path // --------------------------------------------------------------------------- func chunkWithTreesitter(filePath, content, language string, maxSize int) ([]Chunk, []Reference, error) { + // Snapshot under RLock so a concurrent Configure() call does not race the read. 
+ registryMu.RLock() langFn, ok := languageRegistry[language] + nodeKinds := languageNodes[language] + idTypes := identifierNodes[language] + registryMu.RUnlock() + if !ok { - return chunkSlidingWindow(filePath, content, language), nil, nil + return chunkFallback(filePath, content, language), nil, nil } lang := langFn() if lang == nil { - return chunkSlidingWindow(filePath, content, language), nil, nil + return chunkFallback(filePath, content, language), nil, nil } - nodeKinds, ok := languageNodes[language] - if !ok { + if nodeKinds == nil { // Grammar exists but we don't have node definitions → sliding window. - return chunkSlidingWindow(filePath, content, language), nil, nil + return chunkFallback(filePath, content, language), nil, nil } // Build flat target → kind map. @@ -190,7 +551,41 @@ func chunkWithTreesitter(filePath, content, language string, maxSize int) ([]Chu src := []byte(content) parser := sitter.NewParser(lang) + + // Twin guards: SetTimeoutMicros is the parser's own checkpoint-based + // budget; the cancellation flag is set by an external timer when the + // wall-clock deadline expires. The parser checks both at the same + // granularity, so they overshoot together — we still rely on the + // post-parse wall-clock check below to decide whether the tree is + // trustworthy. + parser.SetTimeoutMicros(parseHint) + var cancelFlag uint32 + parser.SetCancellationFlag(&cancelFlag) + deadline := time.AfterFunc(parseBudget, func() { + atomic.StoreUint32(&cancelFlag, 1) + }) + + parseStart := time.Now() tree, err := parser.Parse(src) + parseElapsed := time.Since(parseStart) + deadline.Stop() + + // Hard wall-clock check — even if parser claims success, a tree that + // took >2× the budget is the result of a backtracking pathology and + // the structure is not trustworthy enough to chunk on. Falling back to + // sliding window keeps the indexer responsive. + if parseElapsed > 2*parseBudget { + slog.Warn("chunker: parse exceeded budget, falling back to sliding window", + "path", filePath, "language", language, "elapsed", parseElapsed, + "budget", parseBudget) + return chunkFallback(filePath, content, language), nil, nil + } + if atomic.LoadUint32(&cancelFlag) == 1 { + slog.Warn("chunker: parse cancelled by deadline, falling back to sliding window", + "path", filePath, "language", language, "elapsed", parseElapsed) + return chunkFallback(filePath, content, language), nil, nil + } + if err != nil { return nil, nil, err } @@ -205,8 +600,8 @@ func chunkWithTreesitter(filePath, content, language string, maxSize int) ([]Chu extractNodes(root, lang, src, targetTypes, lines, filePath, language, &chunks, &coveredRanges, nil) - // Extract references. - refs := extractReferences(root, lang, src, targetTypes, filePath, language) + // Extract references using the snapshotted identifier set. + refs := extractReferences(root, lang, src, targetTypes, idTypes, filePath, language) // Fill gaps between extracted symbol nodes with "module" chunks. sortRanges(coveredRanges) @@ -237,7 +632,7 @@ func chunkWithTreesitter(filePath, content, language string, maxSize int) ([]Chu } if len(finalChunks) == 0 { - return chunkSlidingWindow(filePath, content, language), nil, nil + return chunkFallback(filePath, content, language), nil, nil } return finalChunks, refs, nil } @@ -312,15 +707,17 @@ func extractNodes( } // extractReferences walks AST collecting identifier usages (not definitions). 
+// idNodeTypes is passed in (rather than read from the global map) so callers
+// can snapshot once and stay consistent if Configure() is called concurrently.
 func extractReferences(
 	root *sitter.Node,
 	lang *sitter.Language,
 	src []byte,
 	targetTypes map[string]string,
+	idNodeTypes map[string]struct{},
 	filePath, language string,
 ) []Reference {
-	idNodeTypes, ok := identifierNodes[language]
-	if !ok {
+	if len(idNodeTypes) == 0 {
 		return nil
 	}
 
@@ -388,12 +785,27 @@
 }
 
 // extractName returns the first identifier-like child's text, or nil.
+//
+// The set of "identifier-like" node types covers the main grammars in the
+// default registry. Notable additions beyond the obvious `identifier`:
+//   - `field_identifier` — Go method names (`func (b *Bar) Foo()` → "Foo")
+//   - `word` — bash function names (`hello() { ... }` → "hello")
+//   - `simple_identifier` — Swift / Kotlin function names
+//   - `constant` — Ruby class/module names (which start with uppercase)
+//
+// Without these, the symbol_name field on the resulting chunk was nil and
+// the CLI's `cix summary` rendered weird placeholders (`[method] bool`,
+// `[function] `).
 func extractName(node *sitter.Node, lang *sitter.Language, src []byte) *string {
 	nameTypes := map[string]struct{}{
 		"identifier":          {},
 		"name":                {},
 		"property_identifier": {},
 		"type_identifier":     {},
+		"field_identifier":    {},
+		"word":                {},
+		"simple_identifier":   {},
+		"constant":            {},
 	}
 	cnt := node.ChildCount()
 	for i := 0; i < cnt; i++ {
diff --git a/server/internal/chunker/chunker_test.go b/server/internal/chunker/chunker_test.go
index b64eb74..8244b9e 100644
--- a/server/internal/chunker/chunker_test.go
+++ b/server/internal/chunker/chunker_test.go
@@ -3,6 +3,9 @@ package chunker
 import (
 	"strings"
 	"testing"
+	"time"
+
+	sitter "github.com/odvcencio/gotreesitter"
 )
 
 func TestChunkFile_Python(t *testing.T) {
@@ -217,6 +220,60 @@ func TestSkipNames_ContainsExpected(t *testing.T) {
 	}
 }
 
+// TestChunkFile_ParseBudgetFallback exercises the parser-budget guard with
+// a real-world pathology: the install.sh in this repo triggers ~31s of
+// catastrophic backtracking in tree-sitter-bash. After the guard kicks in
+// the chunker must return sliding-window chunks within ~parseBudget rather
+// than blocking the entire indexer for half a minute.
+//
+// Skipped under -short because it deliberately runs until the deadline fires.
+func TestChunkFile_ParseBudgetFallback(t *testing.T) {
+	if testing.Short() {
+		t.Skip("parse-budget test waits up to ~2s for the deadline to fire")
+	}
+
+	// Construct bash content that deterministically tickles the bash
+	// grammar's slow path without depending on a specific repo file.
+	// Heredocs + nested $(...) inside a deeply nested case statement is a
+	// known trigger; we lean on the repo-known install.sh structure.
+	src := strings.Repeat(`
+case "$x" in
+  pattern1)
+    cat <<EOF
+$(outer "$(inner "$(innermost)")")
+EOF
+    ;;
+esac
+`, 200)
+
+	start := time.Now()
+	chunks, refs, err := ChunkFile("/p/install.sh", src, "bash", 0)
+	elapsed := time.Since(start)
+	if err != nil {
+		t.Fatalf("ChunkFile: %v", err)
+	}
+	if elapsed > 2*parseBudget+500*time.Millisecond {
+		t.Errorf("ChunkFile elapsed %s, expected < ~2× parseBudget (%s)",
+			elapsed, parseBudget)
+	}
+	if len(chunks) == 0 {
+		t.Error("expected at least one chunk (block or function), got 0")
+	}
+
+	// Refs are nil when sliding-window fallback fires.
+ _ = refs +} + func TestSplitLines_Roundtrip(t *testing.T) { original := "line one\nline two\nline three" lines := splitLines(original) @@ -246,3 +303,344 @@ type ID = string | number; t.Fatal("expected chunks from TypeScript source") } } + +// --- Tier 2 bug-fix tests: grammars were registered without languageNodes +// in earlier versions, so .tsx/.c/.cpp/.rb files silently fell to sliding +// window. These assert true semantic chunks now come back. --- + +func TestChunkFile_TSX(t *testing.T) { + src := `import React from "react"; + +interface Props { + name: string; +} + +export function Greeting(props: Props) { + return
<div>Hello, {props.name}</div>
; +} + +type Id = string | number; +` + chunks, _, err := ChunkFile("sample.tsx", src, "tsx", 0) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(chunks) == 0 { + t.Fatal("expected chunks from TSX source") + } + hasFunction := false + hasType := false + for _, c := range chunks { + if c.ChunkType == "function" { + hasFunction = true + } + if c.ChunkType == "type" { + hasType = true + } + } + if !hasFunction { + t.Errorf("expected function chunk for Greeting, got types: %v", chunkTypeCounts(chunks)) + } + if !hasType { + t.Errorf("expected type chunk for Id, got types: %v", chunkTypeCounts(chunks)) + } +} + +func TestChunkFile_C(t *testing.T) { + src := `#include + +struct Point { + double x; + double y; +}; + +typedef enum { RED, GREEN, BLUE } Color; + +int add(int a, int b) { + return a + b; +} + +int main(void) { + return add(1, 2); +} +` + chunks, _, err := ChunkFile("sample.c", src, "c", 0) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(chunks) == 0 { + t.Fatal("expected chunks from C source") + } + counts := chunkTypeCounts(chunks) + if counts["function"] == 0 { + t.Errorf("expected function chunks, got: %v", counts) + } + if counts["class"] == 0 { + t.Errorf("expected struct (class) chunk for Point, got: %v", counts) + } +} + +func TestChunkFile_Cpp(t *testing.T) { + src := `#include + +class Animal { +public: + Animal(std::string name) : name_(name) {} + std::string name() const { return name_; } +private: + std::string name_; +}; + +namespace zoo { + int count() { return 42; } +} +` + chunks, _, err := ChunkFile("sample.cpp", src, "cpp", 0) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(chunks) == 0 { + t.Fatal("expected chunks from C++ source") + } + counts := chunkTypeCounts(chunks) + if counts["class"] == 0 { + t.Errorf("expected class chunk for Animal, got: %v", counts) + } +} + +func TestChunkFile_Ruby(t *testing.T) { + src := `module Greetings + class Greeter + def initialize(name) + @name = name + end + + def greet + puts "Hello, #{@name}" + end + end +end +` + chunks, _, err := ChunkFile("sample.rb", src, "ruby", 0) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if len(chunks) == 0 { + t.Fatal("expected chunks from Ruby source") + } + counts := chunkTypeCounts(chunks) + if counts["class"] == 0 { + t.Errorf("expected class/module chunks, got: %v", counts) + } +} + +// --- Configure() filtering --- + +func TestConfigure_FilterToSubset(t *testing.T) { + defer Configure(nil) // restore defaults for other tests + + Configure([]string{"python", "go"}) + active := SupportedLanguages() + if len(active) != 2 { + t.Errorf("expected 2 active languages, got %d (%v)", len(active), active) + } + got := map[string]bool{} + for _, l := range active { + got[l] = true + } + if !got["python"] || !got["go"] { + t.Errorf("expected python+go, got %v", active) + } + if got["rust"] { + t.Error("rust should be filtered out") + } +} + +func TestConfigure_DefaultsAfterEmpty(t *testing.T) { + Configure([]string{"python"}) + Configure(nil) // should restore full defaults + active := SupportedLanguages() + if len(active) < 20 { + t.Errorf("expected ≥20 default languages, got %d", len(active)) + } +} + +func TestConfigure_UnknownIDIgnored(t *testing.T) { + defer Configure(nil) + + Configure([]string{"python", "imaginary-lang", "go"}) + active := SupportedLanguages() + got := map[string]bool{} + for _, l := range active { + got[l] = true + } + if !got["python"] || !got["go"] { + t.Errorf("expected python+go to survive, 
got %v", active) + } + if got["imaginary-lang"] { + t.Error("unknown language should not be added") + } +} + +func TestConfigure_CaseInsensitive(t *testing.T) { + defer Configure(nil) + + Configure([]string{" Python ", "GO"}) + active := SupportedLanguages() + if len(active) != 2 { + t.Errorf("expected 2 active languages, got %d (%v)", len(active), active) + } +} + +// chunkTypeCounts is a small helper for table-driven assertions on chunk types. +func chunkTypeCounts(chunks []Chunk) map[string]int { + out := map[string]int{} + for _, c := range chunks { + out[c.ChunkType]++ + } + return out +} + +// TestRegistry_AllFactoriesNonNil ensures every default-registered language +// resolves to a usable *sitter.Language. A nil factory return would mean +// gotreesitter renamed/removed a grammar between updates and we silently lost +// support — better to fail loud here than at runtime in production. +func TestRegistry_AllFactoriesNonNil(t *testing.T) { + defer Configure(nil) + Configure(nil) + + for _, lang := range SupportedLanguages() { + t.Run(lang, func(t *testing.T) { + registryMu.RLock() + fn := languageRegistry[lang] + registryMu.RUnlock() + if fn == nil { + t.Fatalf("nil factory for %q", lang) + } + if g := fn(); g == nil { + t.Fatalf("factory returned nil grammar for %q", lang) + } + }) + } +} + +// TestRegistry_NodeNamesMatchAST parses a tiny per-language fixture and +// asserts at least one configured node-type appears in its AST. This catches +// node-name typos in defaultRegistry without needing a fixture file per lang. +// Languages absent from the fixture map are skipped (registered but not +// covered — acceptable, but the per-language tests above cover the criticals). +func TestRegistry_NodeNamesMatchAST(t *testing.T) { + defer Configure(nil) + Configure(nil) + + fixtures := map[string]string{ + "python": "def f():\n pass\n", + "go": "package p\nfunc F() {}\n", + "javascript": "function f() {}\n", + "typescript": "function f(): void {}\n", + "tsx": "function F() { return
<div/>; }\n",
+		"java":       "class C { void m() {} }\n",
+		"c":          "int f(void) { return 0; }\n",
+		"cpp":        "class C {}; int f(){return 0;}\n",
+		"rust":       "fn f() {}\n",
+		"ruby":       "class C\n  def m; end\nend\n",
+		"c_sharp":    "class C { void M() {} }\n",
+		"php":        "<?php function f() {} ?>\n",
+		"swift":      "func f() {}\n",
+		"kotlin":     "fun f() {}\n",
+		"scala":      "object O { def f() = 1 }\n",
+		"bash":       "f() { echo hi; }\n",
+		"lua":        "function f() end\n",
+		"dart":       "void f() {}\n",
+		"r":          "f <- function() 1\n",
+		"objc":       "@interface C\n@end\n",
+		"html":       "<!doctype html>\n",
+		"css":        ".x { color: red; }\n",
+		"scss":       ".x { color: red; }\n",
+		"sql":        "CREATE TABLE t (id INT);\n",
+		"markdown":   "# Heading\n\nbody\n",
+		"zig":        "fn f() void {}\n",
+		"julia":      "function f() end\n",
+		"fortran":    "subroutine s\nend subroutine\n",
+		"haskell":    "module M where\n\nf :: Int -> Int\nf x = x\n",
+		"ocaml":      "let f x = x\n",
+	}
+
+	for lang, src := range fixtures {
+		t.Run(lang, func(t *testing.T) {
+			registryMu.RLock()
+			fn, regOK := languageRegistry[lang]
+			nodes := languageNodes[lang]
+			registryMu.RUnlock()
+
+			if !regOK {
+				t.Skipf("%q not in registry (deliberately filtered out)", lang)
+			}
+			if nodes == nil {
+				t.Skipf("%q has no node map (sliding-window only — by design)", lang)
+			}
+
+			grammar := fn()
+			if grammar == nil {
+				t.Fatalf("nil grammar for %q", lang)
+			}
+
+			parser := sitter.NewParser(grammar)
+			tree, err := parser.Parse([]byte(src))
+			if err != nil {
+				t.Fatalf("parse error for %q: %v", lang, err)
+			}
+			root := tree.RootNode()
+			if root == nil {
+				t.Fatalf("nil root for %q", lang)
+			}
+
+			want := map[string]struct{}{}
+			for _, types := range nodes {
+				for _, ty := range types {
+					want[ty] = struct{}{}
+				}
+			}
+
+			seen := map[string]struct{}{}
+			collectNodeTypes(root, grammar, seen)
+
+			matched := false
+			for ty := range want {
+				if _, ok := seen[ty]; ok {
+					matched = true
+					break
+				}
+			}
+			if !matched {
+				keys := make([]string, 0, len(want))
+				for k := range want {
+					keys = append(keys, k)
+				}
+				t.Errorf("none of configured node types %v found in AST for %q. Sample AST node types seen: %v",
+					keys, lang, sampleKeys(seen, 12))
+			}
+		})
+	}
+}
+
+func collectNodeTypes(n *sitter.Node, lang *sitter.Language, out map[string]struct{}) {
+	if n == nil {
+		return
+	}
+	out[n.Type(lang)] = struct{}{}
+	for i := 0; i < int(n.ChildCount()); i++ {
+		collectNodeTypes(n.Child(i), lang, out)
+	}
+}
+
+func sampleKeys(m map[string]struct{}, n int) []string {
+	out := make([]string, 0, n)
+	for k := range m {
+		if len(out) >= n {
+			break
+		}
+		out = append(out, k)
+	}
+	return out
+}
diff --git a/server/internal/config/config.go b/server/internal/config/config.go
index 3fa3c1e..c7a42ef 100644
--- a/server/internal/config/config.go
+++ b/server/internal/config/config.go
@@ -37,6 +37,12 @@ type Config struct {
 	LlamaNGpuLayers   int  // CIX_N_GPU_LAYERS; -1 on darwin (Metal all layers), 0 elsewhere.
 	LlamaStartupSec   int  // CIX_LLAMA_STARTUP_TIMEOUT; readiness probe ceiling in seconds.
 	EmbeddingsEnabled bool // CIX_EMBEDDINGS_ENABLED; test hook to bypass sidecar entirely.
+
+	// Languages narrows the chunker's active language set. Empty / unset
+	// activates all baked-in defaults (see chunker.defaultRegistry). Values
+	// not present in the registry are warned-and-ignored at startup.
+	// Source: CIX_LANGUAGES (comma-separated, case-insensitive).
+ Languages []string } // ModelSafeName returns the embedding model name normalised for use inside @@ -146,6 +152,14 @@ func Load() (*Config, error) { } c.EmbeddingsEnabled = enabled + if langs := getenv("CIX_LANGUAGES", ""); langs != "" { + for _, l := range strings.Split(langs, ",") { + if s := strings.TrimSpace(l); s != "" { + c.Languages = append(c.Languages, s) + } + } + } + return c, nil } diff --git a/server/internal/langdetect/langdetect.go b/server/internal/langdetect/langdetect.go index cc79fc8..4568fdf 100644 --- a/server/internal/langdetect/langdetect.go +++ b/server/internal/langdetect/langdetect.go @@ -25,6 +25,7 @@ var extensionMap = map[string]string{ ".cs": "c_sharp", ".swift": "swift", ".kt": "kotlin", + ".kts": "kotlin", ".scala": "scala", ".zig": "zig", ".jl": "julia", @@ -36,7 +37,7 @@ var extensionMap = map[string]string{ ".mm": "objc", // Web / scripting ".ts": "typescript", - ".tsx": "typescript", + ".tsx": "tsx", ".js": "javascript", ".jsx": "javascript", ".rb": "ruby", diff --git a/server/internal/langdetect/langdetect_test.go b/server/internal/langdetect/langdetect_test.go index 62f0dde..27e94f7 100644 --- a/server/internal/langdetect/langdetect_test.go +++ b/server/internal/langdetect/langdetect_test.go @@ -10,7 +10,7 @@ func TestDetect(t *testing.T) { {"main.go", "go"}, {"app.py", "python"}, {"index.ts", "typescript"}, - {"index.tsx", "typescript"}, + {"index.tsx", "tsx"}, {"app.js", "javascript"}, {"lib.rs", "rust"}, {"Hello.java", "java"}, @@ -35,6 +35,8 @@ func TestDetect(t *testing.T) { {"/some/path/to/main.go", "go"}, {"script.R", "r"}, // uppercase .R {"script.sh", "bash"}, + {"build.gradle.kts", "kotlin"}, + {"app.kts", "kotlin"}, } for _, c := range cases { got := Detect(c.path) From 8e46c97fb61626c2b5b764ab577cd21bc60fec2e Mon Sep 17 00:00:00 2001 From: dvcdsys Date: Mon, 27 Apr 2026 21:50:59 +0100 Subject: [PATCH 2/9] feat(streaming): NDJSON progress events + ctx-disconnect cancel for /index/files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the single-JSON response of POST /index/files with an NDJSON event stream when the client sends Accept: application/x-ndjson. Solves three real-world pain points seen on heavy batches (e.g. 20 files = 347 chunks = 165s on a single GPU slot): * CLI no longer hits its 600s http.Client deadline on long batches — per-event keepalive plus a server-side 10s heartbeat ticker keep the connection alive arbitrarily long. New streamingClient on the CLI uses Timeout: 0 with a 60s idle-watchdog instead. * Server now detects client disconnect mid-batch via r.Context().Done() and immediately calls CancelIndexing() to release the per-project session lock. Previously the lock survived until the 1h TTL. * Per-file progress is visible during the batch (file_started, file_chunked, file_embedded, file_done events). Three render modes on the CLI: Interactive (TTY status line with CR), LineByLine (CI / non-TTY), Quiet (watcher — only summaries + file_error). Backwards compatibility is asymmetric and intentional: server still serves old clients (no Accept header → existing single-JSON path). New CLI hard- fails with ErrLegacyServer if it gets back Content-Type: application/json, because the operator's deploy workflow is server-first. Wire format and event schema documented in server/internal/indexer/progress.go and mirrored in cli/internal/client/progress.go. 
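
For a sense of the wire format, a two-file batch streams roughly like this
(field names here are illustrative — the authoritative schema lives in
progress.go):

  {"event":"file_started","path":"cli/cmd/init.go"}
  {"event":"file_chunked","path":"cli/cmd/init.go","chunks":12}
  {"event":"file_embedded","path":"cli/cmd/init.go"}
  {"event":"file_done","path":"cli/cmd/init.go"}
  {"event":"heartbeat"}
  {"event":"file_started","path":"cli/cmd/root.go"}
  ...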
SIGINT/SIGTERM in `cix reindex` now propagates via signal.NotifyContext — HTTP request context cancels, server frees lock automatically. Belt-and- braces deferred CancelIndex on error paths in indexer.Run() and watcher Stop(). Tests: * server/internal/httpapi/indexing_streaming_test.go — streaming happy path, disconnect-frees-lock (direct handler invocation with custom flushRecorder), legacy compat negotiation * cli/internal/client/index_streaming_test.go — NDJSON parse, callback ordering, ErrLegacyServer hard-fail, idle timeout, retry on 503/429, back-compat SendFiles wrapper * watcher tests updated to mock NDJSON responses Co-Authored-By: Claude Opus 4.7 --- cli/cmd/init.go | 2 +- cli/cmd/reindex.go | 17 +- cli/cmd/root.go | 7 +- cli/internal/client/client.go | 23 + cli/internal/client/index.go | 216 ++++++++- cli/internal/client/index_streaming_test.go | 331 +++++++++++++ cli/internal/client/progress.go | 60 +++ cli/internal/config/config.go | 9 +- cli/internal/indexer/indexer.go | 60 ++- cli/internal/indexer/indexer_test.go | 40 +- cli/internal/indexer/progress.go | 205 ++++++++ cli/internal/watcher/watcher.go | 19 +- cli/internal/watcher/watcher_test.go | 33 +- server/internal/httpapi/indexing.go | 147 ++++++ .../httpapi/indexing_streaming_test.go | 444 ++++++++++++++++++ server/internal/indexer/indexer.go | 112 ++++- server/internal/indexer/progress.go | 95 ++++ 17 files changed, 1780 insertions(+), 40 deletions(-) create mode 100644 cli/internal/client/index_streaming_test.go create mode 100644 cli/internal/client/progress.go create mode 100644 cli/internal/indexer/progress.go create mode 100644 server/internal/httpapi/indexing_streaming_test.go create mode 100644 server/internal/indexer/progress.go diff --git a/cli/cmd/init.go b/cli/cmd/init.go index 0076beb..8de73bd 100644 --- a/cli/cmd/init.go +++ b/cli/cmd/init.go @@ -75,7 +75,7 @@ func runInit(cmd *cobra.Command, args []string) error { cfg, _ := config.Load() batchSize := cfg.Indexing.BatchSize fmt.Printf("Starting indexing (batch size: %d)...\n", batchSize) - result, err := indexer.Run(client, absPath, false, batchSize) + result, err := indexer.Run(cmd.Context(), client, absPath, false, batchSize, indexer.AutoProgressMode()) if err != nil { return fmt.Errorf("indexing failed: %w", err) } diff --git a/cli/cmd/reindex.go b/cli/cmd/reindex.go index eb0e223..d96f79b 100644 --- a/cli/cmd/reindex.go +++ b/cli/cmd/reindex.go @@ -1,9 +1,12 @@ package cmd import ( + "context" "fmt" "os" + "os/signal" "path/filepath" + "syscall" "time" "github.com/anthropics/code-index/cli/internal/config" @@ -68,8 +71,20 @@ func runReindex(cmd *cobra.Command, args []string) error { fmt.Printf("%s reindexing: %s (batch size: %d)\n", indexType, absPath, batchSize) - result, err := indexer.Run(apiClient, absPath, reindexFull, batchSize) + // SIGINT/SIGTERM → ctx cancellation. The indexer propagates ctx through + // SendFilesStreaming, which closes the HTTP connection; the server's + // streaming handler sees the disconnect and calls CancelIndexing, + // freeing the project lock immediately rather than at the 1-hour TTL. + ctx, stop := signal.NotifyContext(cmd.Context(), syscall.SIGINT, syscall.SIGTERM) + defer stop() + + result, err := indexer.Run(ctx, apiClient, absPath, reindexFull, batchSize, indexer.AutoProgressMode()) if err != nil { + // If the user hit Ctrl+C, surface a friendlier message — the deferred + // CancelIndex inside indexer.Run already freed the server lock. 
+ if ctx.Err() == context.Canceled { + return fmt.Errorf("indexing cancelled by user") + } return fmt.Errorf("indexing failed: %w", err) } diff --git a/cli/cmd/root.go b/cli/cmd/root.go index 07e109d..8e36c89 100644 --- a/cli/cmd/root.go +++ b/cli/cmd/root.go @@ -4,6 +4,7 @@ import ( "fmt" "os" "strings" + "time" "github.com/anthropics/code-index/cli/internal/client" "github.com/anthropics/code-index/cli/internal/config" @@ -126,5 +127,9 @@ func getClient() (*client.Client, error) { } } - return client.New(url, key), nil + c := client.New(url, key) + if cfg.Indexing.StreamingIdleTimeoutSec > 0 { + c.SetStreamingIdleTimeout(time.Duration(cfg.Indexing.StreamingIdleTimeoutSec) * time.Second) + } + return c, nil } diff --git a/cli/internal/client/client.go b/cli/internal/client/client.go index ceccdc5..6723f12 100644 --- a/cli/internal/client/client.go +++ b/cli/internal/client/client.go @@ -14,8 +14,23 @@ type Client struct { baseURL string apiKey string httpClient *http.Client + + // streamingClient is used for endpoints that return chunked NDJSON + // (currently only POST /index/files when Accept advertises x-ndjson). + // Timeout is 0 because the natural duration of an indexing batch is + // dominated by GPU embed time and there is no useful overall ceiling. + // Idle silence is bounded by streamingIdleTimeout instead. + streamingClient *http.Client + streamingIdleTimeout time.Duration } +// defaultStreamingIdleTimeout is the maximum allowed gap between events on a +// streaming response. Server emits a heartbeat every 10s, so 60s gives a 6× +// margin — enough to absorb a one-shot llama-supervisor restart (which can +// pause embedding for ~5s several times in a row before the queue catches up) +// or a network hiccup, without giving up on a still-progressing batch. +const defaultStreamingIdleTimeout = 60 * time.Second + // New creates a new API client func New(baseURL, apiKey string) *Client { return &Client{ @@ -24,9 +39,17 @@ func New(baseURL, apiKey string) *Client { httpClient: &http.Client{ Timeout: 600 * time.Second, }, + streamingClient: &http.Client{Timeout: 0}, + streamingIdleTimeout: defaultStreamingIdleTimeout, } } +// SetStreamingIdleTimeout overrides the silence threshold for streaming +// endpoints. Pass 0 to disable the watchdog entirely (not recommended). +func (c *Client) SetStreamingIdleTimeout(d time.Duration) { + c.streamingIdleTimeout = d +} + // BaseURL returns the base URL this client is configured to use. func (c *Client) BaseURL() string { return c.baseURL diff --git a/cli/internal/client/index.go b/cli/internal/client/index.go index cb7ba17..69ded20 100644 --- a/cli/internal/client/index.go +++ b/cli/internal/client/index.go @@ -1,10 +1,16 @@ package client import ( + "bufio" + "bytes" + "context" + "encoding/json" "fmt" + "io" "math/rand" "net/http" "strconv" + "strings" "time" ) @@ -94,43 +100,233 @@ func (c *Client) BeginIndex(path string, full bool) (*BeginIndexResponse, error) return &result, nil } -// SendFiles sends a batch of files to be indexed in the given run. -// On HTTP 503 (GPU busy) or 429 (rate limited) it retries with exponential -// backoff up to maxSendRetries times before giving up. +// SendFiles sends a batch of files to be indexed. It is now a thin wrapper +// over SendFilesStreaming with a no-op event callback and a background +// context — kept for tests and for callers that don't want progress events. 
+// +// Note: even though the response is streamed under the hood, this wrapper +// blocks until the server closes the stream and returns only the final +// summary, matching the pre-streaming public surface. func (c *Client) SendFiles(path string, runID string, files []FilePayload) (*SendFilesResponse, error) { + return c.SendFilesStreaming(context.Background(), path, runID, files, nil) +} + +// SendFilesStreaming sends a batch of files and streams NDJSON progress +// events from the server. The onEvent callback is invoked for every event; +// pass nil if you only want the final summary. +// +// On HTTP 503 (GPU busy) or 429 (rate limited) the request is retried with +// exponential backoff up to maxSendRetries times BEFORE the stream begins. +// Once the stream has started (i.e. the server responded with NDJSON), the +// caller is in a long-lived single attempt — failures during the stream +// surface to the caller without a retry. +// +// Returns ErrLegacyServer if the server doesn't speak NDJSON (Content-Type +// negotiation failed). The CLI surfaces this as "upgrade your server". +// +// Returns ErrIdleTimeout if no data arrives for streamingIdleTimeout — the +// connection is forcibly closed and the caller should treat the run as +// failed (the server will see ctx cancellation and free the session lock). +func (c *Client) SendFilesStreaming( + ctx context.Context, + path string, + runID string, + files []FilePayload, + onEvent func(ProgressEvent), +) (*SendFilesResponse, error) { encodedPath := encodeProjectPath(path) + url := c.baseURL + fmt.Sprintf("/api/v1/projects/%s/index/files", encodedPath) + body := map[string]interface{}{ "run_id": runID, "files": files, } + bodyBytes, err := json.Marshal(body) + if err != nil { + return nil, fmt.Errorf("marshal body: %w", err) + } for attempt := 0; attempt <= maxSendRetries; attempt++ { - resp, err := c.do("POST", fmt.Sprintf("/api/v1/projects/%s/index/files", encodedPath), body) + // Wrap caller ctx so the idle watchdog can cancel without touching + // the original. callerErr() distinguishes "caller cancelled us" + // from "watchdog cancelled us" when reporting errors. + streamCtx, streamCancel := context.WithCancel(ctx) + + req, err := http.NewRequestWithContext(streamCtx, http.MethodPost, url, bytes.NewReader(bodyBytes)) + if err != nil { + streamCancel() + return nil, fmt.Errorf("create request: %w", err) + } + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Accept", "application/x-ndjson") + if c.apiKey != "" { + req.Header.Set("Authorization", "Bearer "+c.apiKey) + } + + resp, err := c.streamingClient.Do(req) if err != nil { - return nil, err + streamCancel() + return nil, fmt.Errorf("do request: %w", err) } + // Retryable backpressure responses — short body, no streaming begun. if resp.StatusCode == http.StatusServiceUnavailable || resp.StatusCode == http.StatusTooManyRequests { header := resp.Header.Get("Retry-After") resp.Body.Close() + streamCancel() delay := retryAfterDelay(header, sendRetryDelay(attempt)) fmt.Printf(" GPU busy — retrying in %s (attempt %d/%d)...\n", delay.Round(time.Second), attempt+1, maxSendRetries) - time.Sleep(delay) + select { + case <-ctx.Done(): + return nil, ctx.Err() + case <-time.After(delay): + } continue } - var result SendFilesResponse - if err := parseResponse(resp, &result); err != nil { - return nil, err + // Any non-200 here is a hard error (bad run_id, project missing, …). 
+ if resp.StatusCode != http.StatusOK { + defer resp.Body.Close() + defer streamCancel() + respBody, _ := io.ReadAll(resp.Body) + var errResp struct { + Detail string `json:"detail"` + } + if json.Unmarshal(respBody, &errResp) == nil && errResp.Detail != "" { + return nil, fmt.Errorf("API error (%d): %s", resp.StatusCode, errResp.Detail) + } + return nil, fmt.Errorf("API error (%d): %s", resp.StatusCode, string(respBody)) } - return &result, nil + + // Hard fail if the server returned plain JSON (legacy build) instead + // of NDJSON. We deliberately do not attempt a fallback parse — the + // operator is expected to upgrade the server first. + ct := resp.Header.Get("Content-Type") + if !strings.HasPrefix(ct, "application/x-ndjson") { + resp.Body.Close() + streamCancel() + return nil, ErrLegacyServer + } + + // At this point we have an open NDJSON stream. The retry loop ends. + result, err := readStream(streamCtx, streamCancel, resp.Body, onEvent, c.streamingIdleTimeout, ctx) + streamCancel() + return result, err } return nil, fmt.Errorf("GPU still busy after %d retries — try again later", maxSendRetries) } +// readStream consumes NDJSON lines from body, invokes onEvent for each, and +// returns the SendFilesResponse harvested from the terminal batch_done event. +// streamCancel is called whenever readStream wants to abort the connection +// (idle timeout, decode error, fatal server event). +func readStream( + streamCtx context.Context, + streamCancel context.CancelFunc, + body io.ReadCloser, + onEvent func(ProgressEvent), + idleTimeout time.Duration, + callerCtx context.Context, +) (*SendFilesResponse, error) { + defer body.Close() + + // Idle watchdog — fires streamCancel if no line arrives for idleTimeout. + // idleTimeout=0 disables the watchdog (used by tests when convenient). + lineRead := make(chan struct{}, 1) + if idleTimeout > 0 { + go func() { + timer := time.NewTimer(idleTimeout) + defer timer.Stop() + for { + select { + case <-lineRead: + if !timer.Stop() { + select { + case <-timer.C: + default: + } + } + timer.Reset(idleTimeout) + case <-timer.C: + streamCancel() + return + case <-streamCtx.Done(): + return + } + } + }() + } + + scanner := bufio.NewScanner(body) + // Some chunks may be very large (long file paths or error messages); + // give the scanner room. 1 MiB max-line should cover anything realistic. + scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) + + var final *SendFilesResponse + var fatalErr error + + for scanner.Scan() { + // Notify watchdog: line arrived, reset idle timer. + select { + case lineRead <- struct{}{}: + default: + } + + line := bytes.TrimSpace(scanner.Bytes()) + if len(line) == 0 { + continue + } + + var ev ProgressEvent + if err := json.Unmarshal(line, &ev); err != nil { + return nil, fmt.Errorf("decode ndjson line: %w (line=%q)", err, line) + } + + if onEvent != nil { + onEvent(ev) + } + + switch ev.Event { + case EventBatchDone: + // Don't return yet — there may be a trailing newline. The + // scanner.Scan() loop will exit naturally on EOF. + final = &SendFilesResponse{ + FilesAccepted: ev.FilesAccepted, + ChunksCreated: ev.ChunksCreated, + FilesProcessedTotal: ev.FilesProcessedTotal, + } + case EventError: + if ev.Fatal { + fatalErr = fmt.Errorf("server error: %s", ev.Message) + } + } + } + + if err := scanner.Err(); err != nil { + // Distinguish caller cancel vs idle timeout vs network error. 
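+		// Precedence, matching the order of the checks below:
+		//   callerCtx.Err() != nil          → return the caller's ctx error
+		//   streamCtx cancelled + watchdog  → ErrIdleTimeout
+		//   anything else                   → wrapped scanner error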
+ if callerCtx.Err() != nil { + return nil, callerCtx.Err() + } + if streamCtx.Err() == context.Canceled && idleTimeout > 0 { + return nil, ErrIdleTimeout + } + return nil, fmt.Errorf("scan ndjson: %w", err) + } + + if fatalErr != nil { + return nil, fatalErr + } + if final == nil { + // Stream ended cleanly but no batch_done — server bug or partial + // write. Surface it so the caller can retry the batch. + return nil, fmt.Errorf("ndjson stream ended without batch_done event") + } + return final, nil +} + // FinishIndex completes the indexing session, removing deleted files. func (c *Client) FinishIndex(path string, runID string, deletedPaths []string, totalFiles int) (*FinishIndexResponse, error) { encodedPath := encodeProjectPath(path) diff --git a/cli/internal/client/index_streaming_test.go b/cli/internal/client/index_streaming_test.go new file mode 100644 index 0000000..712134a --- /dev/null +++ b/cli/internal/client/index_streaming_test.go @@ -0,0 +1,331 @@ +package client + +import ( + "bufio" + "context" + "encoding/json" + "errors" + "fmt" + "io" + "net/http" + "net/http/httptest" + "strings" + "sync" + "testing" + "time" +) + +// streamWriter is a tiny convenience for tests that need to push NDJSON +// lines from the server side. It writes one JSON object per call followed +// by a newline, then flushes so the client sees it immediately. +type streamWriter struct { + w http.ResponseWriter + f http.Flusher +} + +func newStreamWriter(t *testing.T, w http.ResponseWriter) *streamWriter { + t.Helper() + f, ok := w.(http.Flusher) + if !ok { + t.Fatal("response writer does not implement Flusher") + } + w.Header().Set("Content-Type", "application/x-ndjson") + w.WriteHeader(http.StatusOK) + f.Flush() + return &streamWriter{w: w, f: f} +} + +func (s *streamWriter) write(t *testing.T, ev ProgressEvent) { + t.Helper() + b, err := json.Marshal(ev) + if err != nil { + t.Fatalf("marshal event: %v", err) + } + if _, err := s.w.Write(append(b, '\n')); err != nil { + t.Logf("write: %v", err) // not fatal — client may have disconnected + } + s.f.Flush() +} + +// TestSendFilesStreaming_BatchDone — happy path: events delivered in order, +// final SendFilesResponse pulled from batch_done event. 
+func TestSendFilesStreaming_BatchDone(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + if r.Header.Get("Accept") != "application/x-ndjson" { + t.Errorf("Accept header = %q, want application/x-ndjson", r.Header.Get("Accept")) + } + s := newStreamWriter(t, w) + s.write(t, ProgressEvent{Event: EventFileStarted, Path: "/p/a.go", FileIndex: 1, BatchSize: 2}) + s.write(t, ProgressEvent{Event: EventFileEmbedded, Path: "/p/a.go", Chunks: 3, EmbedMS: 50}) + s.write(t, ProgressEvent{Event: EventFileDone, Path: "/p/a.go", Chunks: 3}) + s.write(t, ProgressEvent{Event: EventFileStarted, Path: "/p/b.go", FileIndex: 2, BatchSize: 2}) + s.write(t, ProgressEvent{Event: EventFileDone, Path: "/p/b.go", Chunks: 2}) + s.write(t, ProgressEvent{ + Event: EventBatchDone, FilesAccepted: 2, ChunksCreated: 5, FilesProcessedTotal: 2, + }) + })) + defer srv.Close() + + c := New(srv.URL, "key") + var events []ProgressEvent + resp, err := c.SendFilesStreaming(context.Background(), "/p", "run-1", []FilePayload{ + {Path: "/p/a.go", Content: "x"}, + {Path: "/p/b.go", Content: "y"}, + }, func(ev ProgressEvent) { + events = append(events, ev) + }) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if resp.FilesAccepted != 2 || resp.ChunksCreated != 5 { + t.Errorf("resp = %+v, want files=2 chunks=5", resp) + } + if len(events) != 6 { + t.Errorf("events count = %d, want 6", len(events)) + } + if events[0].Event != EventFileStarted { + t.Errorf("events[0] = %q, want %q", events[0].Event, EventFileStarted) + } + if events[len(events)-1].Event != EventBatchDone { + t.Errorf("last event = %q, want %q", events[len(events)-1].Event, EventBatchDone) + } +} + +// TestSendFilesStreaming_Heartbeat verifies heartbeat events make it to the +// callback (not just dropped) and final result still reflects only batch_done. +func TestSendFilesStreaming_Heartbeat(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + s := newStreamWriter(t, w) + s.write(t, ProgressEvent{Event: EventHeartbeat, TS: "2026-04-27T17:00:00Z"}) + s.write(t, ProgressEvent{Event: EventHeartbeat, TS: "2026-04-27T17:00:10Z"}) + s.write(t, ProgressEvent{Event: EventBatchDone, FilesAccepted: 0}) + })) + defer srv.Close() + + c := New(srv.URL, "") + heartbeatCount := 0 + resp, err := c.SendFilesStreaming(context.Background(), "/p", "r", nil, func(ev ProgressEvent) { + if ev.Event == EventHeartbeat { + heartbeatCount++ + } + }) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if heartbeatCount != 2 { + t.Errorf("heartbeat count = %d, want 2", heartbeatCount) + } + if resp.FilesAccepted != 0 { + t.Errorf("resp.FilesAccepted = %d, want 0", resp.FilesAccepted) + } +} + +// TestSendFilesStreaming_LegacyServer ensures we hard-fail when the server +// returns single-JSON instead of NDJSON. No silent fallback — caller learns +// they need to upgrade. 
+func TestSendFilesStreaming_LegacyServer(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusOK) + _, _ = w.Write([]byte(`{"files_accepted":1,"chunks_created":3,"files_processed_total":1}`)) + })) + defer srv.Close() + + c := New(srv.URL, "") + calledBack := false + _, err := c.SendFilesStreaming(context.Background(), "/p", "r", nil, func(ev ProgressEvent) { + calledBack = true + }) + if !errors.Is(err, ErrLegacyServer) { + t.Errorf("err = %v, want ErrLegacyServer", err) + } + if calledBack { + t.Error("onEvent should not have been called against a legacy server") + } +} + +// TestSendFilesStreaming_IdleTimeout — stall the response indefinitely and +// confirm the watchdog cancels the request. +func TestSendFilesStreaming_IdleTimeout(t *testing.T) { + stall := make(chan struct{}) // never closed + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + s := newStreamWriter(t, w) + // Send one event then sit silent until the client times out. + s.write(t, ProgressEvent{Event: EventFileStarted, Path: "/p/x.go"}) + select { + case <-stall: + case <-r.Context().Done(): + } + })) + defer srv.Close() + defer close(stall) + + c := New(srv.URL, "") + c.SetStreamingIdleTimeout(150 * time.Millisecond) + _, err := c.SendFilesStreaming(context.Background(), "/p", "r", nil, nil) + if !errors.Is(err, ErrIdleTimeout) { + t.Errorf("err = %v, want ErrIdleTimeout", err) + } +} + +// TestSendFilesStreaming_FatalError — server emits a fatal error event, +// caller gets a non-nil error containing the message. +func TestSendFilesStreaming_FatalError(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + s := newStreamWriter(t, w) + s.write(t, ProgressEvent{Event: EventFileStarted, Path: "/p/x.go"}) + s.write(t, ProgressEvent{Event: EventError, Message: "embedder unavailable", Fatal: true}) + })) + defer srv.Close() + + c := New(srv.URL, "") + _, err := c.SendFilesStreaming(context.Background(), "/p", "r", nil, nil) + if err == nil { + t.Fatal("expected error from fatal event, got nil") + } + if !strings.Contains(err.Error(), "embedder unavailable") { + t.Errorf("error %q does not contain server message", err) + } +} + +// TestSendFilesStreaming_NonStreamingErrorBodyDecoded — when the server +// returns non-200 (e.g. 404 bad run_id), the JSON detail is surfaced. +func TestSendFilesStreaming_NonStreamingError(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusNotFound) + _, _ = w.Write([]byte(`{"detail":"unknown run_id"}`)) + })) + defer srv.Close() + + c := New(srv.URL, "") + _, err := c.SendFilesStreaming(context.Background(), "/p", "r", nil, nil) + if err == nil { + t.Fatal("expected error, got nil") + } + if !strings.Contains(err.Error(), "unknown run_id") { + t.Errorf("error %q does not surface server detail", err) + } +} + +// TestSendFiles_BackwardCompat — existing public surface still works, +// invoking SendFilesStreaming under the hood. 
+func TestSendFiles_BackwardCompat(t *testing.T) {
+	var requestSeen sync.WaitGroup
+	requestSeen.Add(1)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		defer requestSeen.Done()
+		if r.Header.Get("Accept") != "application/x-ndjson" {
+			t.Errorf("SendFiles wrapper must request NDJSON, got Accept=%q", r.Header.Get("Accept"))
+		}
+		s := newStreamWriter(t, w)
+		s.write(t, ProgressEvent{
+			Event: EventBatchDone, FilesAccepted: 1, ChunksCreated: 4, FilesProcessedTotal: 1,
+		})
+	}))
+	defer srv.Close()
+
+	c := New(srv.URL, "")
+	resp, err := c.SendFiles("/p", "r", []FilePayload{{Path: "/p/x.go"}})
+	if err != nil {
+		t.Fatalf("err: %v", err)
+	}
+	if resp.FilesAccepted != 1 || resp.ChunksCreated != 4 {
+		t.Errorf("resp = %+v", resp)
+	}
+	requestSeen.Wait()
+}
+
+// TestSendFilesStreaming_RetryOn503 — server returns 503 with Retry-After,
+// then succeeds; client should follow the retry and ultimately get the
+// batch_done event without surfacing the temporary failure.
+func TestSendFilesStreaming_RetryOn503(t *testing.T) {
+	var calls int
+	var mu sync.Mutex
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		mu.Lock()
+		calls++
+		current := calls
+		mu.Unlock()
+		if current == 1 {
+			w.Header().Set("Retry-After", "1")
+			w.Header().Set("Content-Type", "application/json")
+			w.WriteHeader(http.StatusServiceUnavailable)
+			_, _ = w.Write([]byte(`{"detail":"GPU busy"}`))
+			return
+		}
+		s := newStreamWriter(t, w)
+		s.write(t, ProgressEvent{
+			Event: EventBatchDone, FilesAccepted: 1, ChunksCreated: 2, FilesProcessedTotal: 1,
+		})
+	}))
+	defer srv.Close()
+
+	c := New(srv.URL, "")
+	// SendFilesStreaming prints its retry notice to stdout; harmless in a test.
+	resp, err := c.SendFilesStreaming(context.Background(), "/p", "r", []FilePayload{{Path: "x"}}, nil)
+	if err != nil {
+		t.Fatalf("expected success after retry, got err: %v", err)
+	}
+	if resp.FilesAccepted != 1 {
+		t.Errorf("resp.FilesAccepted = %d, want 1", resp.FilesAccepted)
+	}
+	if calls != 2 {
+		t.Errorf("expected 2 server calls, got %d", calls)
+	}
+}
+
+// TestSendFilesStreaming_CallerCancel — caller cancels ctx mid-stream, the
+// streaming call returns the context error promptly.
+func TestSendFilesStreaming_CallerCancel(t *testing.T) {
+	hold := make(chan struct{})
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		s := newStreamWriter(t, w)
+		s.write(t, ProgressEvent{Event: EventFileStarted, Path: "/p/x.go"})
+		select {
+		case <-hold:
+		case <-r.Context().Done():
+		}
+	}))
+	defer srv.Close()
+	defer close(hold)
+
+	c := New(srv.URL, "")
+	c.SetStreamingIdleTimeout(0) // disable watchdog so we test caller cancel only
+
+	ctx, cancel := context.WithCancel(context.Background())
+
+	// Cancel after the first event is observed.
+	gotEvent := make(chan struct{})
+	errCh := make(chan error, 1)
+	go func() {
+		_, err := c.SendFilesStreaming(ctx, "/p", "r", nil, func(ev ProgressEvent) {
+			select {
+			case gotEvent <- struct{}{}:
+			default:
+			}
+		})
+		errCh <- err
+	}()
+
+	select {
+	case <-gotEvent:
+	case <-time.After(2 * time.Second):
+		t.Fatal("never received first event")
+	}
+	cancel()
+
+	select {
+	case err := <-errCh:
+		if !errors.Is(err, context.Canceled) {
+			t.Errorf("err = %v, want context.Canceled", err)
+		}
+	case <-time.After(2 * time.Second):
+		t.Fatal("SendFilesStreaming did not return after cancel")
+	}
+}
+
+// No-op references keep the currently-unused bufio/io/fmt imports compiling.
+var _ = bufio.NewScanner
+var _ = io.EOF
+var _ = fmt.Sprintf
diff --git a/cli/internal/client/progress.go b/cli/internal/client/progress.go
new file mode 100644
index 0000000..c79d5a5
--- /dev/null
+++ b/cli/internal/client/progress.go
@@ -0,0 +1,60 @@
+package client
+
+import "errors"
+
+// ProgressEvent mirrors server/internal/indexer/progress.go:ProgressEvent.
+// Both sides ship in the same PR; the duplication is the cost of keeping
+// CLI and server as separate Go modules.
+//
+// Event values: file_started, file_chunked, file_embedded, file_done,
+// file_error, heartbeat, batch_done, error.
+type ProgressEvent struct {
+	Event string `json:"event"`
+
+	// Per-file fields.
+	Path      string `json:"path,omitempty"`
+	FileIndex int    `json:"file_index,omitempty"`
+	BatchSize int    `json:"batch_size,omitempty"`
+	Chunks    int    `json:"chunks,omitempty"`
+	EmbedMS   int64  `json:"embed_ms,omitempty"`
+
+	// Heartbeat.
+	TS string `json:"ts,omitempty"`
+
+	// Errors.
+	Message string `json:"message,omitempty"`
+	Fatal   bool   `json:"fatal,omitempty"`
+
+	// batch_done summary.
+	FilesAccepted       int `json:"files_accepted,omitempty"`
+	ChunksCreated       int `json:"chunks_created,omitempty"`
+	FilesProcessedTotal int `json:"files_processed_total,omitempty"`
+
+	RunID string `json:"run_id,omitempty"`
+}
+
+// Event kinds — keep in sync with server/internal/indexer/progress.go.
+const (
+	EventFileStarted  = "file_started"
+	EventFileChunked  = "file_chunked"
+	EventFileEmbedded = "file_embedded"
+	EventFileDone     = "file_done"
+	EventFileError    = "file_error"
+	EventHeartbeat    = "heartbeat"
+	EventBatchDone    = "batch_done"
+	EventError        = "error"
+)
+
+// ErrLegacyServer is returned by SendFilesStreaming when the server responds
+// with a non-NDJSON Content-Type — meaning the server predates the streaming
+// protocol. Callers should surface this as "upgrade your server" rather than
+// silently retrying or falling back.
+var ErrLegacyServer = errors.New(
+	"server does not support streaming protocol — upgrade server to a version that supports NDJSON on /index/files",
+)
+
+// ErrIdleTimeout is returned when the streaming response has been silent for
+// longer than the configured idle timeout. The server should be sending at
+// least a heartbeat every 10s; silence past the idle threshold (60s library
+// default, 30s via the CLI config) implies a hung server or stalled network.
+var ErrIdleTimeout = errors.New("streaming response idle timeout — no data from server")
diff --git a/cli/internal/config/config.go b/cli/internal/config/config.go
index 61cb06b..1456819 100644
--- a/cli/internal/config/config.go
+++ b/cli/internal/config/config.go
@@ -36,6 +36,12 @@ type ServerConfig struct {
 
 type IndexingConfig struct {
 	BatchSize int `yaml:"batchsize"`
+
+	// StreamingIdleTimeoutSec is the maximum allowed silence on the streaming
+	// /index/files response before the CLI gives up and closes the conn. The
+	// server emits a heartbeat every 10s, so 30s tolerates three missed
+	// heartbeats. Set to 0 to disable the watchdog (not recommended).
+	StreamingIdleTimeoutSec int `yaml:"streaming_idle_timeout_sec"`
 }
 
 type ProjectEntry struct {
@@ -68,7 +74,8 @@ func defaults() Config {
 			CacheTTL: 300,
 		},
 		Indexing: IndexingConfig{
-			BatchSize: 20,
+			BatchSize:               20,
+			StreamingIdleTimeoutSec: 30,
 		},
 	}
 }
diff --git a/cli/internal/indexer/indexer.go b/cli/internal/indexer/indexer.go
index 4b69bd6..96ec57e 100644
--- a/cli/internal/indexer/indexer.go
+++ b/cli/internal/indexer/indexer.go
@@ -1,6 +1,7 @@
 package indexer
 
 import (
+	"context"
 	"fmt"
 	"os"
 	"time"
@@ -21,7 +22,25 @@ type Result struct {
 }
 
 // Run performs a complete index cycle: begin → discover → diff → send batches → finish.
-func Run(apiClient *client.Client, projectPath string, full bool, batchSize int) (*Result, error) {
+//
+// ctx is honoured for cancellation: a SIGINT-derived ctx (or a watcher's stop
+// signal) propagates through to the streaming SendFilesStreaming call, which
+// closes the HTTP connection. The server-side streaming handler sees the
+// disconnect and frees the project's session lock immediately, so the next
+// reindex doesn't hit 409. As a belt-and-braces measure, this function
+// defers an explicit CancelIndex call for the active run on early exit.
+//
+// mode controls how per-file progress events are rendered. Pass
+// AutoProgressMode() for `cix reindex` (TTY-aware), ProgressQuiet for the
+// watcher (only summary + errors hit the log).
+func Run(
+	ctx context.Context,
+	apiClient *client.Client,
+	projectPath string,
+	full bool,
+	batchSize int,
+	mode ProgressMode,
+) (*Result, error) {
 	if batchSize <= 0 {
 		batchSize = defaultBatchSize
 	}
@@ -36,6 +55,17 @@ func Run(apiClient *client.Client, projectPath string, full bool, batchSize int)
 	}
 	fmt.Printf(" Session: %s\n", beginResp.RunID)
 
+	// Belt-and-braces: if we exit early (ctx cancellation, network error,
+	// SendFilesStreaming failure), tell the server to release the project
+	// lock instead of leaving it for the 1-hour TTL. CancelIndex is
+	// idempotent and fast.
+	cancelDone := false
+	defer func() {
+		if !cancelDone {
+			_, _ = apiClient.CancelIndex(projectPath)
+		}
+	}()
+
 	// Phase 2: Discover files on disk
 	fmt.Println("Discovering files...")
 	discovered, err := discovery.Discover(projectPath, discovery.Options{})
@@ -80,8 +110,16 @@ func Run(apiClient *client.Client, projectPath string, full bool, batchSize int)
 		fmt.Printf(" %d file(s) to process\n", len(toProcess))
 	}
 
-	// Phase 3: Send files in batches
+	// Phase 3: Send files in batches via streaming. Each batch gets its own
+	// progressRenderer so per-file indices restart from 1 in the renderer's
+	// context but display globally as (batchOffset+i).
 	for i := 0; i < len(toProcess); i += batchSize {
+		// Honour ctx cancellation between batches; mid-batch cancellation
+		// is handled inside SendFilesStreaming.
+ if err := ctx.Err(); err != nil { + return nil, err + } + end := i + batchSize if end > len(toProcess) { end = len(toProcess) @@ -110,20 +148,28 @@ func Run(apiClient *client.Client, projectPath string, full bool, batchSize int) continue } - resp, err := apiClient.SendFiles(projectPath, beginResp.RunID, payloads) + // batchOffset is 1-based offset of the first payload in this batch + // within the overall toProcess slice. Renderer adds ev.FileIndex + // (which is also 1-based per batch) and prints `[N/total]`. + renderer := newProgressRenderer(mode, len(toProcess), i) + _, err := apiClient.SendFilesStreaming( + ctx, projectPath, beginResp.RunID, payloads, renderer.onEvent, + ) if err != nil { return nil, fmt.Errorf("send files (batch %d-%d): %w", i+1, end, err) } - - fmt.Printf(" Processed %d/%d files (%d chunks)\n", - resp.FilesProcessedTotal, len(toProcess), resp.ChunksCreated) } - // Phase 4: Finish — server cleans up deleted files and finalizes the run + // Phase 4: Finish — server cleans up deleted files and finalizes the run. + // We mark cancelDone before this point so the deferred CancelIndex doesn't + // fire on the happy path. + cancelDone = true finishResp, err := apiClient.FinishIndex( projectPath, beginResp.RunID, deletedPaths, len(discovered), ) if err != nil { + // Restore the deferred cancel — finish failed, lock should be released. + _, _ = apiClient.CancelIndex(projectPath) return nil, fmt.Errorf("finish index: %w", err) } diff --git a/cli/internal/indexer/indexer_test.go b/cli/internal/indexer/indexer_test.go index 7609289..fb490d6 100644 --- a/cli/internal/indexer/indexer_test.go +++ b/cli/internal/indexer/indexer_test.go @@ -1,6 +1,7 @@ package indexer import ( + "context" "crypto/sha1" "crypto/sha256" "encoding/hex" @@ -50,11 +51,11 @@ type indexHandler struct { } func (h *indexHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { - w.Header().Set("Content-Type", "application/json") p := r.URL.Path switch { case strings.Contains(p, h.hash+"/index/begin"): + w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(map[string]any{ "run_id": "run-test", "stored_hashes": h.beginHashes, @@ -67,7 +68,17 @@ func (h *indexHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { } _ = json.Unmarshal(body, &payload) h.FilesReceived = append(h.FilesReceived, payload.Files...) - json.NewEncoder(w).Encode(map[string]any{ + + // Speak NDJSON — the new client requires it. We emit a single + // batch_done event matching the legacy summary semantics so existing + // assertions on FilesReceived continue to hold. 
+ w.Header().Set("Content-Type", "application/x-ndjson") + w.WriteHeader(http.StatusOK) + if f, ok := w.(http.Flusher); ok { + f.Flush() + } + _ = json.NewEncoder(w).Encode(map[string]any{ + "event": "batch_done", "files_accepted": len(payload.Files), "chunks_created": len(payload.Files), "files_processed_total": len(payload.Files), @@ -80,12 +91,17 @@ func (h *indexHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { } _ = json.Unmarshal(body, &finish) h.DeletedPaths = finish.DeletedPaths + w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(map[string]any{ "status": "ok", "files_processed": len(h.FilesReceived), "chunks_created": len(h.FilesReceived), }) + case strings.Contains(p, h.hash+"/index/cancel"): + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(map[string]any{"cancelled": false}) + default: http.NotFound(w, r) } @@ -111,7 +127,7 @@ func TestRun_AddNewFile(t *testing.T) { srv, h := newServer(t, dir, map[string]string{}) c := client.New(srv.URL, "test-key") - result, err := Run(c, dir, false, 0) + result, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -137,7 +153,7 @@ func TestRun_UpdatedFile(t *testing.T) { }) c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, false, 0) + _, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -165,7 +181,7 @@ func TestRun_DeletedFile(t *testing.T) { }) c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, false, 0) + _, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -219,7 +235,7 @@ func TestRun_NoChanges(t *testing.T) { t.Cleanup(srv.Close) c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, false, 0) + _, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -243,7 +259,7 @@ func TestRun_FullReindex(t *testing.T) { srv, h := newServer(t, dir, map[string]string{path: storedHash}) c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, true /* full */, 0) + _, err := Run(context.Background(), c, dir, true /* full */, 0, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } @@ -270,7 +286,7 @@ func TestRun_ServerUnavailable(t *testing.T) { srv.Close() c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, false, 0) + _, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err == nil { t.Fatal("expected error when server is unavailable, got nil") } @@ -294,7 +310,7 @@ func TestRun_ServerError5xx(t *testing.T) { t.Cleanup(srv.Close) c := client.New(srv.URL, "test-key") - _, err := Run(c, dir, false, 0) + _, err := Run(context.Background(), c, dir, false, 0, ProgressQuiet) if err == nil { t.Fatal("expected error on 503, got nil") } @@ -317,7 +333,7 @@ func TestRun_RecoveryAfterFailure(t *testing.T) { downSrv.Close() c1 := client.New(downSrv.URL, "test-key") - if _, err := Run(c1, dir, false, 0); err == nil { + if _, err := Run(context.Background(), c1, dir, false, 0, ProgressQuiet); err == nil { t.Fatal("expected error on first run") } @@ -326,7 +342,7 @@ func TestRun_RecoveryAfterFailure(t *testing.T) { srv, h := newServer(t, dir, map[string]string{}) c2 := client.New(srv.URL, "test-key") - _, err := Run(c2, dir, false, 0) + _, err := Run(context.Background(), c2, dir, false, 0, ProgressQuiet) if err != nil { 
t.Fatalf("expected recovery run to succeed: %v", err) } @@ -352,7 +368,7 @@ func TestRun_MultipleFiles(t *testing.T) { srv, h := newServer(t, dir, map[string]string{}) c := client.New(srv.URL, "test-key") - result, err := Run(c, dir, false, 1 /* batchSize=1 to exercise batching */) + result, err := Run(context.Background(), c, dir, false, 1 /* batchSize=1 to exercise batching */, ProgressQuiet) if err != nil { t.Fatalf("unexpected error: %v", err) } diff --git a/cli/internal/indexer/progress.go b/cli/internal/indexer/progress.go new file mode 100644 index 0000000..0acfa6c --- /dev/null +++ b/cli/internal/indexer/progress.go @@ -0,0 +1,205 @@ +package indexer + +import ( + "fmt" + "io" + "os" + "time" + "unicode/utf8" + + "github.com/anthropics/code-index/cli/internal/client" +) + +// ProgressMode controls how SendFilesStreaming events are rendered to the +// user. Reindex on a TTY uses Interactive (in-place status line); reindex +// in CI / non-TTY context uses LineByLine; the watcher uses Quiet (only +// summary + errors land in the log). +type ProgressMode int + +const ( + // ProgressInteractive updates a single status line with carriage returns. + ProgressInteractive ProgressMode = iota + // ProgressLineByLine prints one log line per file_started/file_done. + ProgressLineByLine + // ProgressQuiet only prints file_error and the final batch summary. + ProgressQuiet +) + +// AutoProgressMode returns Interactive when stdout is a terminal and +// LineByLine otherwise. Tests and watchers should pass an explicit mode. +func AutoProgressMode() ProgressMode { + if isTerminal(os.Stdout) { + return ProgressInteractive + } + return ProgressLineByLine +} + +// isTerminal reports whether f is a character device (a TTY). Avoids the +// golang.org/x/term dependency to keep the CLI module's go directive at the +// existing minimum (no toolchain bump for a single-line check). +func isTerminal(f *os.File) bool { + stat, err := f.Stat() + if err != nil { + return false + } + return (stat.Mode() & os.ModeCharDevice) != 0 +} + +// progressRenderer is a stateful event handler. It is created once per batch +// (so per-batch counters reset) and called for every NDJSON event the +// streaming client receives. +type progressRenderer struct { + mode ProgressMode + out io.Writer + totalFiles int // total files to process across all batches + batchOffset int // index of the first file in this batch (1-based) + fileStart time.Time // when the current file's file_started arrived + + // lastLineRunes tracks the visible width of the last status line we + // drew so we can erase it before redrawing. We count runes (not bytes) + // because UTF-8 multi-byte chars like `…` would otherwise inflate the + // padding and the cursor would land past the visible end, leaving the + // tail of the previous line on screen. + lastLineRunes int + + // activeFile is the path we're currently rendering progress for. + activeFile string + + // activeFileIdx caches the global file index from file_started so it + // can be reused on file_chunked / file_embedded / file_done — which + // the server emits without a FileIndex field. Without this cache, the + // renderer fell back to ev.FileIndex == 0 and printed `[0/N]` on every + // file_embedded line. 
+ activeFileIdx int +} + +func newProgressRenderer(mode ProgressMode, totalFiles, batchOffset int) *progressRenderer { + return &progressRenderer{ + mode: mode, + out: os.Stdout, + totalFiles: totalFiles, + batchOffset: batchOffset, + } +} + +// onEvent is the callback fed to client.SendFilesStreaming. +func (r *progressRenderer) onEvent(ev client.ProgressEvent) { + switch r.mode { + case ProgressInteractive: + r.renderInteractive(ev) + case ProgressLineByLine: + r.renderLineByLine(ev) + case ProgressQuiet: + r.renderQuiet(ev) + } +} + +func (r *progressRenderer) renderInteractive(ev client.ProgressEvent) { + switch ev.Event { + case client.EventFileStarted: + r.activeFile = ev.Path + r.activeFileIdx = r.batchOffset + ev.FileIndex + r.fileStart = time.Now() + r.statusLine(fmt.Sprintf("[%d/%d] %s (chunking…)", + r.activeFileIdx, r.totalFiles, ev.Path)) + + case client.EventFileEmbedded: + // FileIndex is not populated on file_embedded; reuse the cached + // value from the matching file_started. + r.statusLine(fmt.Sprintf("[%d/%d] %s (embedded %d chunks, %dms)", + r.activeFileIdx, r.totalFiles, ev.Path, ev.Chunks, ev.EmbedMS)) + + case client.EventFileDone: + // Leave the embedded line — file_done arrives so quickly that + // rewriting it again is just flicker. + + case client.EventHeartbeat: + if r.activeFile != "" { + elapsed := time.Since(r.fileStart).Round(time.Second) + r.statusLine(fmt.Sprintf("[%d/%d] %s · %s elapsed", + r.activeFileIdx, r.totalFiles, r.activeFile, elapsed)) + } + + case client.EventFileError: + r.endStatusLine() + fmt.Fprintf(r.out, " ! %s: %s\n", ev.Path, ev.Message) + + case client.EventBatchDone: + r.endStatusLine() + fmt.Fprintf(r.out, " Processed %d/%d files (%d chunks)\n", + ev.FilesProcessedTotal, r.totalFiles, ev.ChunksCreated) + + case client.EventError: + if ev.Fatal { + r.endStatusLine() + fmt.Fprintf(r.out, " ! server error: %s\n", ev.Message) + } + } +} + +func (r *progressRenderer) renderLineByLine(ev client.ProgressEvent) { + switch ev.Event { + case client.EventFileStarted: + idx := r.batchOffset + ev.FileIndex + fmt.Fprintf(r.out, " [%d/%d] %s\n", idx, r.totalFiles, ev.Path) + case client.EventFileError: + fmt.Fprintf(r.out, " ! %s: %s\n", ev.Path, ev.Message) + case client.EventBatchDone: + fmt.Fprintf(r.out, " Processed %d/%d files (%d chunks)\n", + ev.FilesProcessedTotal, r.totalFiles, ev.ChunksCreated) + case client.EventError: + if ev.Fatal { + fmt.Fprintf(r.out, " ! server error: %s\n", ev.Message) + } + } +} + +func (r *progressRenderer) renderQuiet(ev client.ProgressEvent) { + switch ev.Event { + case client.EventFileError: + fmt.Fprintf(r.out, " ! %s: %s\n", ev.Path, ev.Message) + case client.EventBatchDone: + fmt.Fprintf(r.out, " Processed %d/%d files (%d chunks)\n", + ev.FilesProcessedTotal, r.totalFiles, ev.ChunksCreated) + case client.EventError: + if ev.Fatal { + fmt.Fprintf(r.out, " ! server error: %s\n", ev.Message) + } + } +} + +// statusLine clears the previous line (overwriting with spaces, then \r) and +// writes the new line without a trailing newline. Avoids ANSI escapes so it +// works in any terminal. Width is measured in runes — len() on a string with +// `…` (U+2026, 3 bytes) would over-pad and leave residue from the previous +// line at the right edge. +func (r *progressRenderer) statusLine(s string) { + runes := utf8.RuneCountInString(s) + if runes < r.lastLineRunes { + // Pad with spaces to erase the longer previous text, then \r back + // so the next write overwrites again. 
+ fmt.Fprintf(r.out, "\r%s", s+spaces(r.lastLineRunes-runes)) + } else { + fmt.Fprintf(r.out, "\r%s", s) + } + r.lastLineRunes = runes +} + +// endStatusLine writes a newline so subsequent output starts on a fresh line. +func (r *progressRenderer) endStatusLine() { + if r.lastLineRunes > 0 { + fmt.Fprintln(r.out) + r.lastLineRunes = 0 + } +} + +func spaces(n int) string { + if n <= 0 { + return "" + } + b := make([]byte, n) + for i := range b { + b[i] = ' ' + } + return string(b) +} diff --git a/cli/internal/watcher/watcher.go b/cli/internal/watcher/watcher.go index 3c21371..000f216 100644 --- a/cli/internal/watcher/watcher.go +++ b/cli/internal/watcher/watcher.go @@ -1,6 +1,7 @@ package watcher import ( + "context" "fmt" "log" "os" @@ -463,11 +464,27 @@ func (w *Watcher) runIndexer(full bool) { } }() + // Bridge stopCh → ctx for the duration of this indexing run so + // SendFilesStreaming bails out fast when the watcher is stopped. + ctx, cancelRun := context.WithCancel(context.Background()) + stopBridge := make(chan struct{}) + go func() { + select { + case <-w.stopCh: + cancelRun() + case <-stopBridge: + } + }() + defer func() { + cancelRun() + close(stopBridge) + }() + // Run with transient failure retries var err error for attempt := 0; attempt < 3; attempt++ { var result *indexer.Result - result, err = indexer.Run(w.apiClient, w.projectPath, full, 0) + result, err = indexer.Run(ctx, w.apiClient, w.projectPath, full, 0, indexer.ProgressQuiet) if err == nil { if full { w.logger.Printf("Full reindex complete: %d files, %d chunks (run ID: %s)", diff --git a/cli/internal/watcher/watcher_test.go b/cli/internal/watcher/watcher_test.go index d4fa4a9..d5705e8 100644 --- a/cli/internal/watcher/watcher_test.go +++ b/cli/internal/watcher/watcher_test.go @@ -55,12 +55,15 @@ func newTestWatcher(t *testing.T, projectPath, apiURL string) *Watcher { } // newIndexServer sets up a minimal mock that handles the three-phase index -// protocol and counts how many times each phase was called. +// protocol and counts how many times each phase was called. /index/files +// emits an NDJSON stream when the client sends Accept: application/x-ndjson +// (matching the streaming protocol the production server speaks). type serverCalls struct { mu sync.Mutex Begin int Files int Finish int + Cancel int } func newIndexServer(t *testing.T, dir string) (*httptest.Server, *serverCalls) { @@ -69,30 +72,50 @@ func newIndexServer(t *testing.T, dir string) (*httptest.Server, *serverCalls) { hash := projectHash(dir) srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - w.Header().Set("Content-Type", "application/json") p := r.URL.Path calls.mu.Lock() - defer calls.mu.Unlock() switch { case strings.Contains(p, hash+"/index/begin"): calls.Begin++ + calls.mu.Unlock() + w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(map[string]any{ "run_id": "run-watch", "stored_hashes": map[string]string{}, }) case strings.Contains(p, hash+"/index/files"): calls.Files++ + calls.mu.Unlock() io.ReadAll(r.Body) //nolint - json.NewEncoder(w).Encode(map[string]any{ - "files_accepted": 0, "chunks_created": 0, "files_processed_total": 0, + // Always speak NDJSON — the production server does the negotiation + // based on the Accept header; for tests we emit the new protocol + // unconditionally because the new client always opts in. 
+ w.Header().Set("Content-Type", "application/x-ndjson") + w.WriteHeader(http.StatusOK) + if f, ok := w.(http.Flusher); ok { + f.Flush() + } + _ = json.NewEncoder(w).Encode(map[string]any{ + "event": "batch_done", + "files_accepted": 0, + "chunks_created": 0, + "files_processed_total": 0, }) case strings.Contains(p, hash+"/index/finish"): calls.Finish++ + calls.mu.Unlock() + w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(map[string]any{ "status": "ok", "files_processed": 0, "chunks_created": 0, }) + case strings.Contains(p, hash+"/index/cancel"): + calls.Cancel++ + calls.mu.Unlock() + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(map[string]any{"cancelled": false}) default: + calls.mu.Unlock() http.NotFound(w, r) } })) diff --git a/server/internal/httpapi/indexing.go b/server/internal/httpapi/indexing.go index bce9d76..4065a1e 100644 --- a/server/internal/httpapi/indexing.go +++ b/server/internal/httpapi/indexing.go @@ -1,13 +1,17 @@ package httpapi import ( + "context" "encoding/json" "errors" "net/http" "strconv" + "strings" + "time" "github.com/dvcdsys/code-index/server/internal/embeddings" "github.com/dvcdsys/code-index/server/internal/indexer" + "github.com/dvcdsys/code-index/server/internal/projects" ) // --------------------------------------------------------------------------- @@ -143,6 +147,15 @@ func indexFilesHandler(d Deps) http.HandlerFunc { } } + // Negotiate streaming vs single-JSON via Accept header. Old CLIs do + // not advertise application/x-ndjson, so they keep getting the legacy + // blocking response. New CLIs explicitly request the stream and get + // per-file progress + heartbeats. + if acceptsNDJSON(r.Header.Get("Accept")) { + indexFilesStreamingHandler(d, p, body.RunID, files, w, r) + return + } + accepted, chunks, total, err := d.Indexer.ProcessFiles(r.Context(), p.HostPath, body.RunID, files) if err != nil { if retry, busy := embeddings.IsBusy(err); busy { @@ -167,6 +180,140 @@ func indexFilesHandler(d Deps) http.HandlerFunc { } } +// acceptsNDJSON returns true when the Accept header advertises +// application/x-ndjson. Comma-separated values are inspected; q-values are +// ignored (presence is sufficient — the client opted in). +func acceptsNDJSON(accept string) bool { + for _, part := range strings.Split(accept, ",") { + // Strip parameters (q=…) and surrounding whitespace. + mediaType := strings.TrimSpace(part) + if i := strings.IndexByte(mediaType, ';'); i >= 0 { + mediaType = strings.TrimSpace(mediaType[:i]) + } + if strings.EqualFold(mediaType, "application/x-ndjson") { + return true + } + } + return false +} + +// streamingHeartbeatInterval is how often we emit a heartbeat event when no +// file-level progress has been sent. Idle on the wire ≤ heartbeatInterval + +// embedder slack, well under the client's default 30s read deadline. Var +// (not const) so tests can shrink it to keep the suite fast. +var streamingHeartbeatInterval = 10 * time.Second + +// streamingDisconnectCancelTimeout bounds how long we spend cleaning up a +// session after the client disconnects. +const streamingDisconnectCancelTimeout = 5 * time.Second + +// indexFilesStreamingHandler writes one NDJSON event per line with per-file +// progress and 10-second heartbeats. When the client disconnects mid-batch +// we call CancelIndexing so the session lock is released immediately rather +// than lingering until the 1-hour TTL. 
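+//
+// Illustrative manual invocation (base URL, project hash and payload file
+// are placeholders):
+//
+//	curl -N -X POST \
+//	  -H 'Content-Type: application/json' \
+//	  -H 'Accept: application/x-ndjson' \
+//	  -d @batch.json \
+//	  "$CIX_URL/api/v1/projects/<hash>/index/files"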
+func indexFilesStreamingHandler( + d Deps, + p *projects.Project, + runID string, + files []indexer.FilePayload, + w http.ResponseWriter, + r *http.Request, +) { + flusher, ok := w.(http.Flusher) + if !ok { + // httptest.ResponseRecorder and a few mock servers don't implement + // Flusher. Falling back to writeError keeps tests readable while + // still pointing at the misuse. + writeError(w, http.StatusInternalServerError, "streaming not supported by HTTP transport") + return + } + + w.Header().Set("Content-Type", "application/x-ndjson") + w.Header().Set("Cache-Control", "no-cache") + // X-Accel-Buffering disables proxy buffering on nginx; harmless elsewhere. + w.Header().Set("X-Accel-Buffering", "no") + w.WriteHeader(http.StatusOK) + flusher.Flush() + + progress := make(chan indexer.ProgressEvent, 32) + + // streamCtx is a child of r.Context() so client-disconnect propagation + // works automatically, but we keep our own cancel handle so a *write* + // failure (broken pipe before Go's read goroutine notices the FIN) can + // also unblock the indexer goroutine immediately. Otherwise the embedder + // would keep computing wasted GPU work until r.Context() eventually fires. + streamCtx, cancelStream := context.WithCancel(r.Context()) + defer cancelStream() + + go func() { + defer close(progress) + _, _, _, _ = d.Indexer.ProcessFilesStreaming(streamCtx, p.HostPath, runID, files, progress) + }() + + ticker := time.NewTicker(streamingHeartbeatInterval) + defer ticker.Stop() + + encoder := json.NewEncoder(w) + clientGone := false + + // markClientGone is the single place where we transition into the "drain + // progress until the indexer exits" mode. Cancelling streamCtx makes the + // embedder's ctx.Done() select fire so the indexer returns within ms + // rather than completing wasted work. + markClientGone := func() { + if clientGone { + return + } + clientGone = true + cancelStream() + } + + for { + select { + case ev, open := <-progress: + if !open { + // ProcessFilesStreaming has returned and closed the channel. + // If the client disconnected mid-flight, free the session + // lock immediately so a follow-up reindex doesn't hit 409. + if clientGone { + d.Logger.Warn("streaming: client disconnected mid-batch, cancelling session", + "run_id", runID, "project", p.HostPath) + cancelCtx, cancel := context.WithTimeout( + context.Background(), streamingDisconnectCancelTimeout) + _, _ = d.Indexer.CancelIndexing(cancelCtx, p.HostPath) + cancel() + } + return + } + if clientGone { + continue // drain to let ProcessFilesStreaming finish + } + if err := encoder.Encode(ev); err != nil { + markClientGone() + continue + } + flusher.Flush() + case <-ticker.C: + if clientGone { + continue + } + if err := encoder.Encode(indexer.ProgressEvent{ + Event: indexer.EventHeartbeat, + TS: indexer.NowTS(), + }); err != nil { + markClientGone() + continue + } + flusher.Flush() + case <-r.Context().Done(): + // Client disconnected (or request context cancelled by router). + // Set clientGone and cancel the indexer's ctx so it returns now. 
+ d.Logger.Debug("streaming: r.Context() done", "run_id", runID, "err", r.Context().Err()) + markClientGone() + } + } +} + // --------------------------------------------------------------------------- // POST /api/v1/projects/{path}/index/finish // --------------------------------------------------------------------------- diff --git a/server/internal/httpapi/indexing_streaming_test.go b/server/internal/httpapi/indexing_streaming_test.go new file mode 100644 index 0000000..cbb1a68 --- /dev/null +++ b/server/internal/httpapi/indexing_streaming_test.go @@ -0,0 +1,444 @@ +package httpapi + +import ( + "bufio" + "bytes" + "context" + "encoding/json" + "fmt" + "io" + "log/slog" + "net/http" + "net/http/httptest" + "strings" + "sync" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/indexer" +) + +// slogDiscard returns a logger that discards all output — used to keep test +// stdout quiet while still satisfying the non-nil Logger contract some code +// paths rely on (Warn/Debug/Info on a nil *slog.Logger panics). +func slogDiscard() *slog.Logger { + return slog.New(slog.NewTextHandler(io.Discard, nil)) +} + +// flushRecorder is an http.ResponseWriter that supports http.Flusher and +// records every write so tests can observe streamed output. It is safe to +// access from a single test goroutine plus the handler goroutine — the +// shared buffer is mutex-protected. +type flushRecorder struct { + mu sync.Mutex + buf bytes.Buffer + header http.Header + status int + written chan struct{} +} + +func newFlushRecorder() *flushRecorder { + return &flushRecorder{ + header: make(http.Header), + written: make(chan struct{}, 1), + } +} + +func (r *flushRecorder) Header() http.Header { return r.header } + +func (r *flushRecorder) Write(p []byte) (int, error) { + r.mu.Lock() + n, err := r.buf.Write(p) + r.mu.Unlock() + select { + case r.written <- struct{}{}: + default: + } + return n, err +} + +func (r *flushRecorder) WriteHeader(s int) { r.status = s } + +func (r *flushRecorder) Flush() {} // no-op — buf is already coherent + +// waitForBytes blocks until the recorder accumulates at least min bytes or +// the timeout elapses. Returns true on success. +func (r *flushRecorder) waitForBytes(timeout time.Duration, min int) bool { + deadline := time.After(timeout) + for { + r.mu.Lock() + got := r.buf.Len() + r.mu.Unlock() + if got >= min { + return true + } + select { + case <-r.written: + // loop + case <-deadline: + return false + } + } +} + +// streamingTestServer spins up a real httptest.Server so the streaming +// handler gets an http.ResponseWriter that implements Flusher (which +// httptest.ResponseRecorder does not). +func streamingTestServer(t *testing.T, projectPath string) (*httptest.Server, string) { + t.Helper() + d, hash := newIndexerTestDeps(t, projectPath) + srv := httptest.NewServer(NewRouter(d)) + t.Cleanup(srv.Close) + return srv, hash +} + +// blockingEmbedder is a fakeEmbedder that waits on a channel before returning. +// Used to simulate a slow embedder so the disconnect test can interrupt mid-batch +// before ProcessFilesStreaming completes naturally. 
+type blockingEmbedder struct { + fakeEmbedder + release chan struct{} // close to allow EmbedTexts to proceed +} + +func (b *blockingEmbedder) EmbedTexts(ctx context.Context, texts []string) ([][]float32, error) { + select { + case <-b.release: + case <-ctx.Done(): + return nil, ctx.Err() + } + return b.fakeEmbedder.EmbedTexts(ctx, texts) +} + + +// readNDJSONLines reads NDJSON until either io.EOF or until limit lines have +// been collected. Returns the parsed events. +func readNDJSONLines(t *testing.T, body io.Reader, limit int) []indexer.ProgressEvent { + t.Helper() + scanner := bufio.NewScanner(body) + scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) + var events []indexer.ProgressEvent + for scanner.Scan() { + line := bytes.TrimSpace(scanner.Bytes()) + if len(line) == 0 { + continue + } + var ev indexer.ProgressEvent + if err := json.Unmarshal(line, &ev); err != nil { + t.Fatalf("decode ndjson line %q: %v", line, err) + } + events = append(events, ev) + if limit > 0 && len(events) >= limit { + return events + } + } + if err := scanner.Err(); err != nil && err != io.EOF { + t.Fatalf("scan: %v", err) + } + return events +} + +// beginSession is a small helper: starts a session, returns run_id. +func beginSession(t *testing.T, baseURL, hash string) string { + t.Helper() + resp, err := http.Post( + baseURL+"/api/v1/projects/"+hash+"/index/begin", + "application/json", + strings.NewReader(`{"full":true}`), + ) + if err != nil { + t.Fatalf("begin: %v", err) + } + defer resp.Body.Close() + if resp.StatusCode != 200 { + body, _ := io.ReadAll(resp.Body) + t.Fatalf("begin status=%d body=%s", resp.StatusCode, body) + } + var br indexBeginResponse + if err := json.NewDecoder(resp.Body).Decode(&br); err != nil { + t.Fatalf("decode begin: %v", err) + } + return br.RunID +} + +func newFilesRequestBody(t *testing.T, runID string, files map[string]string) []byte { + t.Helper() + payload := map[string]any{ + "run_id": runID, + "files": []map[string]any{}, + } + for path, content := range files { + payload["files"] = append(payload["files"].([]map[string]any), map[string]any{ + "path": path, + "content": content, + "content_hash": shaHex(content), + "language": "go", + "size": len(content), + }) + } + b, err := json.Marshal(payload) + if err != nil { + t.Fatalf("marshal: %v", err) + } + return b +} + +// TestIndexFilesStreaming_BatchDone exercises the happy path: NDJSON event +// stream contains file_started + batch_done with the expected counts. 
+func TestIndexFilesStreaming_BatchDone(t *testing.T) {
+	srv, hash := streamingTestServer(t, "/proj")
+	runID := beginSession(t, srv.URL, hash)
+
+	body := newFilesRequestBody(t, runID, map[string]string{
+		"/proj/a.go": "package main\nfunc A() int { return 1 }\n",
+		"/proj/b.go": "package main\nfunc B() int { return 2 }\n",
+	})
+
+	req, _ := http.NewRequest(
+		http.MethodPost,
+		srv.URL+"/api/v1/projects/"+hash+"/index/files",
+		bytes.NewReader(body),
+	)
+	req.Header.Set("Content-Type", "application/json")
+	req.Header.Set("Accept", "application/x-ndjson")
+
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		t.Fatalf("post: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != 200 {
+		out, _ := io.ReadAll(resp.Body)
+		t.Fatalf("status=%d body=%s", resp.StatusCode, out)
+	}
+	if ct := resp.Header.Get("Content-Type"); ct != "application/x-ndjson" {
+		t.Errorf("Content-Type=%q, want application/x-ndjson", ct)
+	}
+
+	events := readNDJSONLines(t, resp.Body, 0)
+	if len(events) == 0 {
+		t.Fatal("no events in stream")
+	}
+
+	last := events[len(events)-1]
+	if last.Event != indexer.EventBatchDone {
+		t.Fatalf("last event = %q, want %q (events: %v)", last.Event, indexer.EventBatchDone, summarizeEvents(events))
+	}
+	if last.FilesAccepted != 2 {
+		t.Errorf("files_accepted=%d, want 2", last.FilesAccepted)
+	}
+	if last.ChunksCreated == 0 {
+		t.Errorf("chunks_created=0")
+	}
+
+	// Both files must emit a file_started event.
+	startedCount := 0
+	for _, e := range events {
+		if e.Event == indexer.EventFileStarted {
+			startedCount++
+		}
+	}
+	if startedCount != 2 {
+		t.Errorf("file_started count=%d, want 2", startedCount)
+	}
+}
+
+// TestIndexFilesStreaming_LegacyCompat verifies that requests without an
+// Accept: application/x-ndjson header keep getting the existing single-JSON
+// response. This is the regression guard for old CLIs against a new server.
+func TestIndexFilesStreaming_LegacyCompat(t *testing.T) {
+	srv, hash := streamingTestServer(t, "/proj")
+	runID := beginSession(t, srv.URL, hash)
+
+	body := newFilesRequestBody(t, runID, map[string]string{
+		"/proj/x.go": "package main\nfunc X() {}\n",
+	})
+
+	req, _ := http.NewRequest(
+		http.MethodPost,
+		srv.URL+"/api/v1/projects/"+hash+"/index/files",
+		bytes.NewReader(body),
+	)
+	req.Header.Set("Content-Type", "application/json")
+	// No Accept header → legacy path.
+
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		t.Fatalf("post: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != 200 {
+		out, _ := io.ReadAll(resp.Body)
+		t.Fatalf("status=%d body=%s", resp.StatusCode, out)
+	}
+	ct := resp.Header.Get("Content-Type")
+	if !strings.HasPrefix(ct, "application/json") {
+		t.Errorf("Content-Type=%q, want application/json (legacy)", ct)
+	}
+
+	var legacy indexFilesResponse
+	if err := json.NewDecoder(resp.Body).Decode(&legacy); err != nil {
+		t.Fatalf("decode: %v", err)
+	}
+	if legacy.FilesAccepted != 1 {
+		t.Errorf("files_accepted=%d, want 1", legacy.FilesAccepted)
+	}
+}
+
+// TestIndexFilesStreaming_AcceptOnly verifies that Accept headers without
+// application/x-ndjson (e.g. */*) take the legacy path — only an explicit
+// streaming opt-in upgrades the protocol.
+func TestIndexFilesStreaming_AcceptOnly(t *testing.T) { + srv, hash := streamingTestServer(t, "/proj") + runID := beginSession(t, srv.URL, hash) + + body := newFilesRequestBody(t, runID, map[string]string{ + "/proj/y.go": "package main\nfunc Y() {}\n", + }) + + req, _ := http.NewRequest( + http.MethodPost, + srv.URL+"/api/v1/projects/"+hash+"/index/files", + bytes.NewReader(body), + ) + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Accept", "*/*") + + resp, err := http.DefaultClient.Do(req) + if err != nil { + t.Fatalf("post: %v", err) + } + defer resp.Body.Close() + + ct := resp.Header.Get("Content-Type") + if !strings.HasPrefix(ct, "application/json") { + t.Errorf("Accept=*/* should still get legacy JSON, got Content-Type=%q", ct) + } +} + +// TestIndexFilesStreaming_ClientDisconnect verifies that cancelling the +// request context mid-batch frees the session lock. We call the handler +// directly with a context we control rather than relying on Go's net/http +// to detect a client TCP disconnect — that detection is best-effort and +// unreliable in unit-test timeframes (it depends on the OS noticing FIN +// during a write or read goroutine, which can take seconds with chunked +// encoding even when the client has already closed). Cancelling the +// request's context is the same signal the server reacts to in production +// (chi propagates it from the underlying http.Request). +func TestIndexFilesStreaming_ClientDisconnect(t *testing.T) { + // Heartbeat shrunk so the inner ticker case fires reliably during the test. + prevHB := streamingHeartbeatInterval + streamingHeartbeatInterval = 50 * time.Millisecond + t.Cleanup(func() { streamingHeartbeatInterval = prevHB }) + + emb := &blockingEmbedder{ + fakeEmbedder: fakeEmbedder{dim: 16}, + release: make(chan struct{}), + } + d, hash := newIndexerTestDeps(t, "/proj") + d.EmbeddingSvc = emb + d.Indexer = indexer.New(d.DB, d.VectorStore, emb, slogDiscard()) + d.Logger = slogDiscard() + router := NewRouter(d) + + // Begin a session so we have a valid run_id. + beginW := httptest.NewRecorder() + beginReq := httptest.NewRequest(http.MethodPost, + "/api/v1/projects/"+hash+"/index/begin", + strings.NewReader(`{"full":true}`)) + beginReq.Header.Set("Content-Type", "application/json") + router.ServeHTTP(beginW, beginReq) + if beginW.Code != 200 { + t.Fatalf("begin: status=%d body=%s", beginW.Code, beginW.Body) + } + var br indexBeginResponse + _ = json.Unmarshal(beginW.Body.Bytes(), &br) + + files := map[string]string{} + for i := 0; i < 5; i++ { + files[fmt.Sprintf("/proj/file_%d.go", i)] = + "package main\nfunc F() int { return 1 }\n" + } + body := newFilesRequestBody(t, br.RunID, files) + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + req := httptest.NewRequest(http.MethodPost, + "/api/v1/projects/"+hash+"/index/files", + bytes.NewReader(body)).WithContext(ctx) + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Accept", "application/x-ndjson") + + rw := newFlushRecorder() + + // Serve in a goroutine because the handler will block on the embedder + // until we cancel ctx. + done := make(chan struct{}) + go func() { + router.ServeHTTP(rw, req) + close(done) + }() + + // Wait for the first NDJSON line to reach our recorder — proves the + // handler is running and ProcessFilesStreaming is engaged. + if !rw.waitForBytes(2*time.Second, 10) { + t.Fatal("no bytes written before disconnect deadline") + } + + // Disconnect: the request ctx is what chi passes through to r.Context(). 
+ cancel() + + // Handler must return promptly after ctx cancel. + select { + case <-done: + case <-time.After(2 * time.Second): + t.Fatal("handler did not return within 2s after ctx cancel") + } + + // A new /index/begin must succeed: proves CancelIndexing was called. + begin2W := httptest.NewRecorder() + begin2Req := httptest.NewRequest(http.MethodPost, + "/api/v1/projects/"+hash+"/index/begin", + strings.NewReader(`{"full":true}`)) + begin2Req.Header.Set("Content-Type", "application/json") + router.ServeHTTP(begin2W, begin2Req) + if begin2W.Code != 200 { + t.Fatalf("begin after disconnect: status=%d body=%s — session lock not released", + begin2W.Code, begin2W.Body) + } +} + +// TestAcceptsNDJSON unit-tests the Accept header parser. +func TestAcceptsNDJSON(t *testing.T) { + cases := []struct { + header string + want bool + }{ + {"application/x-ndjson", true}, + {"application/x-ndjson; q=1.0", true}, + {"application/json, application/x-ndjson", true}, + {" application/x-ndjson ", true}, + {"application/X-NDJSON", true}, // case-insensitive + {"*/*", false}, + {"application/json", false}, + {"", false}, + } + for _, c := range cases { + if got := acceptsNDJSON(c.header); got != c.want { + t.Errorf("acceptsNDJSON(%q) = %v, want %v", c.header, got, c.want) + } + } +} + +func summarizeEvents(events []indexer.ProgressEvent) string { + var b strings.Builder + for i, e := range events { + if i > 0 { + b.WriteString(", ") + } + b.WriteString(e.Event) + } + return b.String() +} diff --git a/server/internal/indexer/indexer.go b/server/internal/indexer/indexer.go index 9839ef6..be76d1f 100644 --- a/server/internal/indexer/indexer.go +++ b/server/internal/indexer/indexer.go @@ -269,9 +269,36 @@ func (s *Service) ProcessFiles( ctx context.Context, projectPath, runID string, files []FilePayload, +) (int, int, int, error) { + return s.ProcessFilesStreaming(ctx, projectPath, runID, files, nil) +} + +// ProcessFilesStreaming is ProcessFiles with an optional progress channel. The +// streaming HTTP handler passes a channel that forwards each event as an +// NDJSON line; non-streaming callers use ProcessFiles which passes nil. +// +// The terminal event (batch_done on success, error fatal=true on failure) is +// sent with a guaranteed-blocking send so the consumer always sees it. +// Per-file progress events use a non-blocking send and may be dropped if the +// consumer is slower than the embed loop — that is acceptable because the +// final summary is what callers depend on. +// +// When progress is non-nil, the channel is left open on return; the caller +// is expected to close it after collecting the terminal event. +func (s *Service) ProcessFilesStreaming( + ctx context.Context, + projectPath, runID string, + files []FilePayload, + progress chan<- ProgressEvent, ) (int, int, int, error) { sess, err := s.requireSession(runID, projectPath) if err != nil { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: err.Error(), + Fatal: true, + RunID: runID, + }) return 0, 0, 0, err } @@ -303,12 +330,28 @@ func (s *Service) ProcessFiles( } }() - for _, fp := range files { + for fi, fp := range files { + // file_started — emit even for files we'll skip below, so the client + // counter advances monotonically and rendering stays aligned with N. 
+ progressSend(progress, ProgressEvent{ + Event: EventFileStarted, + Path: fp.Path, + FileIndex: fi + 1, + BatchSize: len(files), + RunID: runID, + }) + if strings.TrimSpace(fp.Content) == "" { continue } if len(fp.Content) > maxContentBytes { s.logger.Warn("indexer: file too large, skipping", "path", fp.Path, "size_bytes", len(fp.Content)) + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: fmt.Sprintf("file too large (%d bytes)", len(fp.Content)), + Fatal: false, + }) continue } @@ -320,11 +363,22 @@ func (s *Service) ProcessFiles( chunks, refs, err := chunker.ChunkFile(fp.Path, fp.Content, language, 0) if err != nil { s.logger.Warn("indexer: chunk file failed", "path", fp.Path, "err", err) + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: "chunk: " + err.Error(), + Fatal: false, + }) continue } if len(chunks) == 0 { continue } + progressSend(progress, ProgressEvent{ + Event: EventFileChunked, + Path: fp.Path, + Chunks: len(chunks), + }) // Symbol extraction — mirrors Python: function|class|method|type with a name. fileSymbols := make([]symbolindex.Symbol, 0, len(chunks)) @@ -366,6 +420,7 @@ func (s *Service) ProcessFiles( texts[i] = c.ChunkType + ": " + c.Content } var embs [][]float32 + embedStart := time.Now() if tae, ok := s.emb.(TokenAwareEmbedder); ok { embs, err = tae.TokenizeAndEmbed(ctx, texts) } else { @@ -374,16 +429,38 @@ func (s *Service) ProcessFiles( if err != nil { // Propagate ErrBusy so handler can map to 503 + Retry-After. if _, busy := embeddings.IsBusy(err); busy { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, err } if errors.Is(err, embeddings.ErrDisabled) || errors.Is(err, embeddings.ErrSupervisor) || errors.Is(err, embeddings.ErrNotReady) { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, err } s.logger.Error("indexer: embed texts failed", "path", fp.Path, "err", err) + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: "embed: " + err.Error(), + Fatal: false, + }) continue } + progressSend(progress, ProgressEvent{ + Event: EventFileEmbedded, + Path: fp.Path, + Chunks: len(chunks), + EmbedMS: time.Since(embedStart).Milliseconds(), + }) // Per-file SAVEPOINT so a partial failure rolls back only this file. // savepointName is derived from filesAccepted (monotonically increasing @@ -458,6 +535,11 @@ func (s *Service) ProcessFiles( } if _, err := tx.ExecContext(ctx, "RELEASE SAVEPOINT "+savepointName); err != nil { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: "release savepoint: " + err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("release savepoint: %w", err) } @@ -469,6 +551,12 @@ func (s *Service) ProcessFiles( sess.languagesSeen[language] = struct{}{} s.mu.Unlock() filesAccepted++ + + progressSend(progress, ProgressEvent{ + Event: EventFileDone, + Path: fp.Path, + Chunks: len(chunks), + }) } // M2 — these upserts are part of the outer tx. Any failure returns the @@ -476,16 +564,31 @@ func (s *Service) ProcessFiles( // below only advance on a successful commit. 
if len(batchSymbols) > 0 { if err := symbolindex.UpsertSymbolsTx(ctx, tx, projectPath, batchSymbols); err != nil { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: "upsert symbols: " + err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("upsert symbols: %w", err) } } if len(batchRefs) > 0 { if err := symbolindex.UpsertReferencesTx(ctx, tx, projectPath, batchRefs); err != nil { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: "upsert refs: " + err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("upsert refs: %w", err) } } if err := tx.Commit(); err != nil { + emitTerminal(progress, ProgressEvent{ + Event: EventError, + Message: "commit batch: " + err.Error(), + Fatal: true, + }) return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("commit batch: %w", err) } txCommitted = true @@ -503,6 +606,13 @@ func (s *Service) ProcessFiles( "total_files", total, ) + emitTerminal(progress, ProgressEvent{ + Event: EventBatchDone, + FilesAccepted: filesAccepted, + ChunksCreated: batchChunks, + FilesProcessedTotal: total, + }) + return filesAccepted, batchChunks, total, nil } diff --git a/server/internal/indexer/progress.go b/server/internal/indexer/progress.go new file mode 100644 index 0000000..851a0ba --- /dev/null +++ b/server/internal/indexer/progress.go @@ -0,0 +1,95 @@ +package indexer + +import "time" + +// ProgressEvent is emitted by ProcessFiles when a non-nil progress channel is +// supplied, so the streaming HTTP handler can forward each one as a JSON line. +// +// One struct with all possible fields + omitempty is intentional: it keeps the +// wire format easy to evolve and the consumer code a single switch on Event. +// +// Wire format example (newline-delimited JSON, one struct per line): +// +// {"event":"file_started","run_id":"...","path":"main.go","file_index":1,"batch_size":20} +// {"event":"file_chunked","path":"main.go","chunks":12} +// {"event":"file_embedded","path":"main.go","chunks":12,"embed_ms":540} +// {"event":"file_done","path":"main.go","chunks":12} +// {"event":"heartbeat","ts":"2026-04-27T17:25:00Z"} +// {"event":"file_error","path":"big.bin","message":"...","fatal":false} +// {"event":"batch_done","files_accepted":20,"chunks_created":347,"files_processed_total":300} +// {"event":"error","message":"...","fatal":true} +type ProgressEvent struct { + Event string `json:"event"` + + // Per-file fields. + Path string `json:"path,omitempty"` + FileIndex int `json:"file_index,omitempty"` + BatchSize int `json:"batch_size,omitempty"` + Chunks int `json:"chunks,omitempty"` + EmbedMS int64 `json:"embed_ms,omitempty"` + + // Heartbeat. + TS string `json:"ts,omitempty"` + + // Errors. + Message string `json:"message,omitempty"` + Fatal bool `json:"fatal,omitempty"` + + // Batch-done summary (mirrors indexFilesResponse). + FilesAccepted int `json:"files_accepted,omitempty"` + ChunksCreated int `json:"chunks_created,omitempty"` + FilesProcessedTotal int `json:"files_processed_total,omitempty"` + + // Run identifier — populated on the first event the handler emits. + RunID string `json:"run_id,omitempty"` +} + +// Event kinds. Using string constants both for documentation and for +// comparisons in tests / consumer switches. 
+const ( + EventFileStarted = "file_started" + EventFileChunked = "file_chunked" + EventFileEmbedded = "file_embedded" + EventFileDone = "file_done" + EventFileError = "file_error" + EventHeartbeat = "heartbeat" + EventBatchDone = "batch_done" + EventError = "error" +) + +// progressSend is a nil-safe non-blocking send. ProcessFiles uses it instead +// of `progress <- e` so that: +// +// 1. callers that do not care about progress pass nil and pay no cost +// 2. a slow consumer cannot stall the indexer (the channel has a small +// buffer in the streaming handler; if it fills we drop the event rather +// than blocking the embed pipeline) +// +// Drops are acceptable because the only events that *must* land are +// batch_done / error, and those are sent on the unbuffered close path +// using a guaranteed-blocking send (see emitTerminal). +func progressSend(ch chan<- ProgressEvent, e ProgressEvent) { + if ch == nil { + return + } + select { + case ch <- e: + default: + // channel full — drop. Keeps embed loop unblocked. + } +} + +// emitTerminal is for batch_done / fatal error: must reach the consumer. +// Always blocks until accepted (or ctx cancellation closes things upstream). +func emitTerminal(ch chan<- ProgressEvent, e ProgressEvent) { + if ch == nil { + return + } + ch <- e +} + +// NowTS returns an RFC3339 timestamp for heartbeat events. Exported so the +// streaming HTTP handler in package httpapi can stamp its own heartbeats. +func NowTS() string { + return time.Now().UTC().Format(time.RFC3339) +} From 94ed0ff5c209a2dd7ffdeabc725049e5d6677d1f Mon Sep 17 00:00:00 2001 From: dvcdsys Date: Mon, 27 Apr 2026 22:10:54 +0100 Subject: [PATCH 3/9] feat(cli): cix cancel command + summary grouped by language + dev makefile target MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * New `cix cancel` command frees a stuck per-project session lock without having to wait for the 1h TTL. Pairs with the streaming-handler ctx- disconnect path: in normal use the lock auto-clears, this is the manual escape hatch. * `cix summary` now groups "Top symbols" by language and shows up to N per language instead of one mixed list. Earlier output mixed Go/Python/JS symbols with no indication of which file they came from, which made the summary nearly useless on multi-language repos. * server/Makefile: docker-build-cuda-dev target builds + pushes :cu128-dev for manual prod testing before tagging a release. Floating tag, no pinned variant — rollback isn't a concern for a dev tag. * root Makefile: small build-target plumbing. * doc/benchmark-cix-vs-grep.md: numbers from the search-vs-grep comparison done while debugging the install.sh hang. Tracked locally — not user documentation, more of an internal reference. 
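For illustration, here is a minimal consumer of this channel contract — a sketch only, NOT the shipped httpapi handler. It assumes the `streamingHeartbeatInterval` package variable seen in the handler tests, a `http.ResponseWriter` that implements `http.Flusher`, the usual `encoding/json`, `net/http`, and `time` imports, and an illustrative helper name `drainProgress`:

```go
// drainProgress forwards events as NDJSON until the terminal event arrives.
// json.Encoder.Encode appends a trailing newline, which is exactly one
// NDJSON line per event; flushing after every line keeps slow clients live.
func drainProgress(w http.ResponseWriter, progress <-chan indexer.ProgressEvent) {
	flusher, _ := w.(http.Flusher)
	enc := json.NewEncoder(w)
	ticker := time.NewTicker(streamingHeartbeatInterval)
	defer ticker.Stop()
	for {
		select {
		case ev := <-progress:
			_ = enc.Encode(ev)
			if flusher != nil {
				flusher.Flush()
			}
			// batch_done and fatal errors are terminal — emitTerminal
			// guarantees they arrive, so returning here is safe.
			if ev.Event == indexer.EventBatchDone ||
				(ev.Event == indexer.EventError && ev.Fatal) {
				return
			}
		case <-ticker.C:
			// Keep the connection alive while the embedder is busy.
			_ = enc.Encode(indexer.ProgressEvent{
				Event: indexer.EventHeartbeat,
				TS:    indexer.NowTS(),
			})
			if flusher != nil {
				flusher.Flush()
			}
		}
	}
}
```

The real handler additionally watches r.Context(), so a client disconnect cancels the batch (see TestIndexFilesStreaming_ClientDisconnect above).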
Co-Authored-By: Claude Opus 4.7 --- Makefile | 4 +- cli/cmd/cancel.go | 68 +++++++ cli/cmd/cancel_test.go | 100 +++++++++++ cli/cmd/summary.go | 45 ++++- cli/cmd/summary_test.go | 5 +- doc/benchmark-cix-vs-grep.md | 331 +++++++++++++++++++++++++++++++++++ server/Makefile | 36 +++- 7 files changed, 578 insertions(+), 11 deletions(-) create mode 100644 cli/cmd/cancel.go create mode 100644 cli/cmd/cancel_test.go create mode 100644 doc/benchmark-cix-vs-grep.md diff --git a/Makefile b/Makefile index 92add01..54ad72b 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -.PHONY: help build test bundle test-gate docker-build-cuda clean +.PHONY: help build test bundle test-gate docker-build-cuda docker-build-cuda-dev clean -help build test bundle test-gate docker-build-cuda clean: +help build test bundle test-gate docker-build-cuda docker-build-cuda-dev clean: @$(MAKE) -C server $@ diff --git a/cli/cmd/cancel.go b/cli/cmd/cancel.go new file mode 100644 index 0000000..e616a9e --- /dev/null +++ b/cli/cmd/cancel.go @@ -0,0 +1,68 @@ +package cmd + +import ( + "fmt" + "os" + "path/filepath" + + "github.com/spf13/cobra" +) + +var cancelProject string + +var cancelCmd = &cobra.Command{ + Use: "cancel", + Short: "Cancel an active indexing session", + Long: `Cancel any in-flight indexing session for a project. + +Useful when a previous 'cix reindex' was interrupted by a network issue or +client-side timeout but the server is still holding a session lock and +returning 409 Conflict on subsequent /index/begin attempts. + +Idempotent: succeeds (no-op) when no session is active. + +Examples: + cix cancel + cix cancel -p /path/to/project`, + RunE: runCancel, +} + +func init() { + rootCmd.AddCommand(cancelCmd) + cancelCmd.Flags().StringVarP(&cancelProject, "project", "p", "", "Project path (default: current directory)") +} + +func runCancel(cmd *cobra.Command, args []string) error { + projectPath := cancelProject + if projectPath == "" { + cwd, err := os.Getwd() + if err != nil { + return fmt.Errorf("get working directory: %w", err) + } + projectPath = cwd + } + + absPath, err := filepath.Abs(projectPath) + if err != nil { + return fmt.Errorf("resolve path: %w", err) + } + + apiClient, err := getClient() + if err != nil { + return err + } + + absPath = findProjectRoot(absPath, apiClient) + + resp, err := apiClient.CancelIndex(absPath) + if err != nil { + return fmt.Errorf("cancel: %w", err) + } + + if resp.Cancelled { + fmt.Printf("✓ Cancelled active indexing session for %s\n", absPath) + } else { + fmt.Printf("No active session for %s (nothing to cancel)\n", absPath) + } + return nil +} diff --git a/cli/cmd/cancel_test.go b/cli/cmd/cancel_test.go new file mode 100644 index 0000000..6eec5ad --- /dev/null +++ b/cli/cmd/cancel_test.go @@ -0,0 +1,100 @@ +package cmd + +import ( + "net/http" + "strings" + "testing" +) + +func TestRunCancel_ActiveSession(t *testing.T) { + proj := t.TempDir() + hash := projectHash(proj) + + srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) { + switch { + case strings.HasSuffix(r.URL.Path, "/api/v1/projects"): + writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0}) + case strings.Contains(r.URL.Path, hash+"/index/cancel") && r.Method == http.MethodPost: + writeJSON(w, 200, map[string]any{"cancelled": true}) + default: + http.NotFound(w, r) + } + }) + useAPI(t, srv) + + old := cancelProject + defer func() { cancelProject = old }() + cancelProject = proj + + out, err := captureOutput(func() error { + return runCancel(nil, nil) + }) + if err != nil { + 
t.Fatalf("unexpected error: %v", err) + } + if !strings.Contains(out, "Cancelled active indexing session") { + t.Errorf("expected success message, got:\n%s", out) + } +} + +func TestRunCancel_NoActiveSession(t *testing.T) { + proj := t.TempDir() + hash := projectHash(proj) + + srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) { + switch { + case strings.HasSuffix(r.URL.Path, "/api/v1/projects"): + writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0}) + case strings.Contains(r.URL.Path, hash+"/index/cancel"): + writeJSON(w, 200, map[string]any{"cancelled": false}) + default: + http.NotFound(w, r) + } + }) + useAPI(t, srv) + + old := cancelProject + defer func() { cancelProject = old }() + cancelProject = proj + + out, err := captureOutput(func() error { + return runCancel(nil, nil) + }) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if !strings.Contains(out, "No active session") { + t.Errorf("expected idempotent message, got:\n%s", out) + } +} + +func TestRunCancel_APIError(t *testing.T) { + proj := t.TempDir() + hash := projectHash(proj) + + srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) { + switch { + case strings.HasSuffix(r.URL.Path, "/api/v1/projects"): + writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0}) + case strings.Contains(r.URL.Path, hash+"/index/cancel"): + apiError(w, 500, "internal error") + default: + http.NotFound(w, r) + } + }) + useAPI(t, srv) + + old := cancelProject + defer func() { cancelProject = old }() + cancelProject = proj + + _, err := captureOutput(func() error { + return runCancel(nil, nil) + }) + if err == nil { + t.Fatal("expected error, got nil") + } + if !strings.Contains(err.Error(), "cancel") { + t.Errorf("expected 'cancel' in error, got: %v", err) + } +} diff --git a/cli/cmd/summary.go b/cli/cmd/summary.go index 521c0b9..6be2d55 100644 --- a/cli/cmd/summary.go +++ b/cli/cmd/summary.go @@ -4,8 +4,10 @@ import ( "fmt" "os" "path/filepath" + "sort" "strings" + "github.com/anthropics/code-index/cli/internal/client" "github.com/spf13/cobra" ) @@ -91,16 +93,45 @@ func runSummary(cmd *cobra.Command, args []string) error { fmt.Println() } - // Recent symbols + // Top symbols — grouped by language so it's obvious which symbols come + // from which file type. Mixed lists used to be hard to scan ("why is + // `s` showing up as a function?" — turns out: minified JS bundle). if len(summary.RecentSymbols) > 0 { fmt.Println("Top symbols:") - for _, sym := range summary.RecentSymbols { - if sym.Name == "" { - continue - } - fmt.Printf(" [%s] %s\n", sym.Kind, sym.Name) - } + printSymbolsByLanguage(summary.RecentSymbols) } return nil } + +// printSymbolsByLanguage groups symbols by their language and renders each +// group under a ` (N):` header. Languages are sorted alphabetically; +// within each group, original order is preserved (the server already returns +// them ranked). Symbols with empty Language are bucketed under "(unknown)". 
+func printSymbolsByLanguage(syms []client.RecentSymbolEntry) { + groups := map[string][]client.RecentSymbolEntry{} + for _, sym := range syms { + if sym.Name == "" { + continue + } + lang := sym.Language + if lang == "" { + lang = "(unknown)" + } + groups[lang] = append(groups[lang], sym) + } + + langs := make([]string, 0, len(groups)) + for l := range groups { + langs = append(langs, l) + } + sort.Strings(langs) + + for _, lang := range langs { + entries := groups[lang] + fmt.Printf(" %s (%d):\n", lang, len(entries)) + for _, sym := range entries { + fmt.Printf(" [%s] %s\n", sym.Kind, sym.Name) + } + } +} diff --git a/cli/cmd/summary_test.go b/cli/cmd/summary_test.go index 8879d89..69731f8 100644 --- a/cli/cmd/summary_test.go +++ b/cli/cmd/summary_test.go @@ -27,7 +27,10 @@ func TestRunSummary(t *testing.T) { {"path": proj + "/cli", "file_count": 20.0}, }, "recent_symbols": []map[string]any{ - {"name": "IndexerService", "kind": "class"}, + {"name": "IndexerService", "kind": "class", "language": "go"}, + {"name": "User", "kind": "class", "language": "python"}, + {"name": "ParseJSON", "kind": "function", "language": "go"}, + {"name": "noLangSymbol", "kind": "function"}, }, }) default: diff --git a/doc/benchmark-cix-vs-grep.md b/doc/benchmark-cix-vs-grep.md new file mode 100644 index 0000000..2f08b8d --- /dev/null +++ b/doc/benchmark-cix-vs-grep.md @@ -0,0 +1,331 @@ +# Benchmark — CIX-first vs grep-only navigation + +Single-machine, single-model (`claude-sonnet-4-6`) head-to-head: 32 hint-free tasks +across 4 task types × 4 variants × 2 navigation strategies (Worker A: grep-only, +Worker B: cix-first). Operator: `claude-opus-4-7`. Run on 2026-04-27. + +The fixture is a frozen snapshot of this same `claude-code-index` project. +All raw transcripts and metric JSON live in `/tmp/cix-bench/results/runs/`; +this report does not include them. + +--- + +## 1. Headline comparison (16 runs each) + +| Metric | Worker A (grep-only) | Worker B (cix-first) | Δ (B − A) | Δ % | +|--------------------------|----------------------|----------------------|-----------|---------| +| Mean elapsed time (s) | **62.2** | 69.9 | +7.7 | +12.4 % | +| Median elapsed time (s) | **58.5** | **58.5** | 0.0 | 0.0 % | +| Mean tool calls | **14.5** | 19.2 | +4.7 | +32.4 % | +| Mean tokens_in | **33** | 38 | +5 | +15.2 % | +| Mean tokens_out | **2447** | 2754 | +307 | +12.5 % | +| Pass rate | 14 / 16 | **16 / 16** | +2 | +12.5 % | + +Δ is `B − A`. Negative on time/tokens means B was faster/cheaper. Bold = better cell per row. + +Token counts are uncached `input_tokens` / `output_tokens` summed across the +worker's assistant messages (per runbook §6). Cache-creation tokens, which +dominate real cost on Sonnet, are reported in §6 as a caveat but not in the +headline because the runbook fixed the metric definition before the run. + +**One-glance read:** B is *more reliable* (16 / 16 pass vs 14 / 16) but *not* +faster or cheaper on average. Median elapsed is identical; the mean gap comes +from a few long B-runs in `tests` and `summary` (see §2). The only clean B +win on time is `bugfix`. + +--- + +## 2. 
Per-task comparison + +| Task type | Metric | Worker A | Worker B | Δ (B − A) | Δ % | +|-----------|---------------------|----------|----------|-----------|---------| +| bugfix | mean elapsed s | 61.5 | **55.2** | −6.3 | −10.2 % | +| bugfix | mean tool calls | **13.5** | 14.0 | +0.5 | +3.7 % | +| bugfix | mean tokens_in | 21 | **20** | −1 | −4.8 % | +| bugfix | mean tokens_out | 1837 | **1745** | −92 | −5.0 % | +| bugfix | pass rate | 4 / 4 | 4 / 4 | 0 | 0 % | +| refactor | mean elapsed s | **62.0** | 64.5 | +2.5 | +4.0 % | +| refactor | mean tool calls | **13.0** | 13.8 | +0.8 | +6.2 % | +| refactor | mean tokens_in | **21** | 22 | +1 | +4.8 % | +| refactor | mean tokens_out | 2195 | **2018** | −177 | −8.1 % | +| refactor | pass rate | 2 / 4 | **4 / 4**| +2 | +50 % | +| tests | mean elapsed s | **78.2** | 107.0 | +28.8 | +36.8 % | +| tests | mean tool calls | **15.0** | 30.5 | +15.5 | +103 % | +| tests | mean tokens_in | **24** | 43 | +19 | +79 % | +| tests | mean tokens_out | **3865** | 4906 | +1041 | +26.9 % | +| tests | pass rate | 4 / 4 | 4 / 4 | 0 | 0 % | +| summary | mean elapsed s | **47.2** | 52.8 | +5.5 | +11.7 % | +| summary | mean tool calls | **16.5** | 18.5 | +2.0 | +12.1 % | +| summary | mean tokens_in | **64** | 66 | +2 | +3.1 % | +| summary | mean tokens_out | **1892** | 2347 | +454 | +24.0 % | +| summary | pass rate | 4 / 4 | 4 / 4 | 0 | 0 % | + +Where the strategy mattered most — and what it actually changed: + +- **`refactor` is the only place B's pass rate dominates.** A picked the same + non-seeded inefficiency (`chunkSlidingWindow`) twice — for variants 01 and 04 + — and was scored `partial`. B used `cix symbols` / `cix references` to + enumerate candidates more broadly and hit the seeded function in all 4 runs. +- **`bugfix` favors A on wall-clock**, ~10 % faster. With a failing test + pointing at the call site, neither navigator needs much exploration; the + extra round-trip through `cix` is pure overhead. +- **`tests` is where B paid the biggest tax** — +29 s, +1041 output tokens. + B consistently selected real exported functions (`DynamicChromaPersistDir`, + `DeleteByProject`, `DefaultSettings`) which require harness setup + (DB / temp dir) and write longer test bodies. A took the literal cheap path + 4 / 4 times: it picked an *unexported* helper (`splitChunk` ×3, + `sortRanges` ×1) every variant, which the prompt forbade ("public function"). + Verification per §7.3 still scored both as `pass` — §7.3 doesn't gate on + exportedness. See §6. +- **`summary` is a draw on quality** (rubric: A=6,6,6,7 / B=6,5,6,6) but B used + ~24 % more output tokens. Both configs read enough of the tree to ground the + paragraph; neither shape of navigation seems to help here. + +--- + +## 3. 
Per-run table (all 32 rows) + +| run_id | elapsed_s | tools | toks_total | toks_in | toks_out | cix_ops | grep_ops | files_read | outcome | +|-----------------|-----------|-------|------------|---------|----------|---------|----------|------------|---------| +| bugfix_01_A | 92 | 18 | 3129 | 25 | 3104 | 0 | 3 | 3 | pass | +| bugfix_01_B | 66 | 17 | 2129 | 25 | 2104 | 0 | 0 | 4 | pass | +| bugfix_02_A | 45 | 11 | 1104 | 20 | 1084 | 0 | 0 | 3 | pass | +| bugfix_02_B | 42 | 12 | 1465 | 17 | 1448 | 1 | 0 | 2 | pass | +| bugfix_03_A | 35 | 9 | 1401 | 14 | 1387 | 0 | 0 | 1 | pass | +| bugfix_03_B | 47 | 12 | 1636 | 18 | 1618 | 0 | 1 | 2 | pass | +| bugfix_04_A | 74 | 16 | 1798 | 26 | 1772 | 0 | 1 | 2 | pass | +| bugfix_04_B | 66 | 15 | 1831 | 22 | 1809 | 3 | 0 | 1 | pass | +| refactor_01_A | 55 | 13 | 2127 | 20 | 2107 | 0 | 1 | 3 | partial | +| refactor_01_B | 88 | 15 | 2646 | 25 | 2621 | 2 | 1 | 2 | pass | +| refactor_02_A | 76 | 15 | 2708 | 25 | 2683 | 0 | 3 | 2 | pass | +| refactor_02_B | 62 | 15 | 2229 | 22 | 2207 | 4 | 2 | 1 | pass | +| refactor_03_A | 59 | 11 | 1574 | 19 | 1555 | 0 | 0 | 2 | pass | +| refactor_03_B | 55 | 14 | 1835 | 23 | 1812 | 1 | 0 | 1 | pass | +| refactor_04_A | 58 | 13 | 2455 | 21 | 2434 | 0 | 3 | 3 | partial | +| refactor_04_B | 53 | 11 | 1452 | 19 | 1433 | 1 | 1 | 1 | pass | +| tests_01_A | 88 | 15 | 3600 | 24 | 3576 | 0 | 4 | 2 | pass | +| tests_01_B | 87 | 26 | 4054 | 35 | 4019 | 1 | 0 | 13 | pass | +| tests_02_A | 82 | 14 | 4857 | 23 | 4834 | 0 | 3 | 2 | pass | +| tests_02_B | 122 | 38 | 6779 | 58 | 6721 | 0 | 10 | 15 | pass | +| tests_03_A | 75 | 19 | 3227 | 29 | 3198 | 0 | 4 | 2 | pass | +| tests_03_B | 110 | 26 | 4367 | 37 | 4330 | 1 | 3 | 11 | pass | +| tests_04_A | 68 | 12 | 3873 | 20 | 3853 | 0 | 0 | 2 | pass | +| tests_04_B | 109 | 32 | 4598 | 43 | 4555 | 3 | 2 | 12 | pass | +| summary_01_A | 54 | 20 | 2557 | 199 | 2358 | 0 | 0 | 13 | pass | +| summary_01_B | 47 | 17 | 1845 | 20 | 1825 | 2 | 0 | 0 | pass | +| summary_02_A | 55 | 16 | 1714 | 19 | 1695 | 0 | 0 | 0 | pass | +| summary_02_B | 54 | 24 | 3296 | 27 | 3269 | 3 | 0 | 4 | pass | +| summary_03_A | 37 | 15 | 1836 | 18 | 1818 | 0 | 1 | 10 | pass | +| summary_03_B | 55 | 14 | 2163 | 17 | 2146 | 8 | 0 | 0 | pass | +| summary_04_A | 43 | 15 | 1719 | 21 | 1698 | 0 | 0 | 10 | pass | +| summary_04_B | 55 | 19 | 2345 | 198 | 2147 | 8 | 0 | 5 | pass | + +`cix_ops > 0` for an A row would mean the worker violated the prompt-level +restriction; **no A row has `cix_ops > 0`**, so no `(violation)` flag is needed. + +--- + +## 4. Methodology (abridged from §§0–7 of the runbook) + +**Subjects.** Two prompt-level configurations of `claude-sonnet-4-6` running as +sub-agents (operator is `claude-opus-4-7`): +- **Worker A — grep-only**: PREAMBLE_A restricts tools to Bash / Read / Edit / + Glob / Grep and forbids `cix`. The prompt also tells A that `CIX_API_KEY` + is set to an invalid value, so any cix call would 401. +- **Worker B — cix-first**: PREAMBLE_B advertises `cix search`, `cix + definitions`, `cix references`, `cix symbols`, `cix files` against + `http://192.168.1.168:21847` and notes the project has already been indexed. + Falling back to grep is permitted only when cix returns nothing relevant. + +Verbatim preambles and task prompts are in §7 below. + +**Fixture.** A frozen snapshot of `claude-code-index` at HEAD (`/tmp/cix-bench/baseline/`, +`.venv/` and built bench binaries removed). 16 variants under +`/tmp/cix-bench/variants/{bugfix,refactor,tests,summary}/{01..04}/`. 
SHA-256
+manifest of every variant file written to `/tmp/cix-bench/fixture-manifest.txt`
+before any run; not modified afterwards.
+
+**Mutations (one per `bugfix` and `refactor` variant).**
+- `bugfix/01`: drop the `!` in `IsBinary` (`cli/internal/fileutil/binary.go`).
+- `bugfix/02`: change `".go": "go"` to `".go": "golang"` in `extensionMap`.
+- `bugfix/03`: in `splitLines`, change `start = i + 1` to `start = i`.
+- `bugfix/04`: legacy-key target `auto_watch:` becomes `auto-watch:`.
+- `refactor/01`: replace map-based `dedupByLocation` with O(n²) nested loop.
+- `refactor/02`: replace `sortRanges` (already insertion sort in baseline) with bubble sort.
+- `refactor/03`: replace `joinLines`'s `strings.Join` with `+=` loop.
+- `refactor/04`: fall-back per runbook — replace `repeatComma` byte-slice
+  build with a `+=` loop in `server/internal/symbolindex/symbolindex.go`. Recorded in manifest.
+
+`tests/01..04` and `summary/01..04` are identical to baseline.
+
+**Per-run procedure (serial, A before B per variant).**
+1. `cix watch stop --all` to clear daemons; `rm -rf /tmp/cix-bench-run`.
+2. `cp -R variants/<task>/<variant>/. /tmp/cix-bench-run/`.
+3. **B only:** `cix init --watch=false` against the server and wait for
+   `Status: ✓ Indexed` (192 files / 1669 chunks). Indexing is not counted in
+   `elapsed_s` — the worker prints its first `date +%s` only after the index
+   is ready.
+4. Launch `Agent` with `subagent_type:"general-purpose"`, `model:"sonnet"`, the
+   assembled prompt, and a unique `description` (the run_id).
+5. Locate transcript at `~/.claude/projects/.../subagents/agent-<id>.jsonl`,
+   copy to `results/runs/<run_id>.log`.
+6. Compute metrics via `metrics.sh` (jq over JSONL); append CSV row.
+7. Verify outcome per §7 of the runbook.
+
+**Outcome rules used in this run.**
+- `bugfix`: `pass` iff `go test ./...` is green in **both** Go modules
+  (`cli/`, `server/`) — there is no top-level `go.mod`, so the runbook's
+  literal `go test ./...` from project root would test nothing. This is the
+  only verification deviation from §7.1; it applies identically to A and B.
+- `refactor`: `pass` iff tests green AND a seeded function from §2.3 was
+  modified; `partial` iff tests green but a different function was "improved".
+- `tests`: `pass` iff package builds, package tests pass, and ≥4 `func Test`
+  declarations exist in the new/modified test file. (Section 7.3 does not
+  gate on the function being exported, even though the prompt asks for one.)
+- `summary`: paragraph scored 0–7 by a fresh Sonnet rubric agent (§7.4).
+  `pass` iff total ≥ 5; `partial` 3–4; `fail` ≤ 2.
+
+---
+
+## 5. Executive summary (3 sentences)
+
+Worker A (grep-only) was faster on average (62.2 s vs 69.9 s), used fewer tool
+calls, and produced fewer output tokens, but Worker B (cix-first) was strictly
+more reliable — 16 / 16 pass vs 14 / 16. The two `partial` outcomes were both
+on `refactor` runs where Worker A converged on the same non-seeded inefficiency
+(`chunkSlidingWindow`) instead of the seeded target, while Worker B used
+`cix symbols` / `cix references` to enumerate the codebase more broadly and hit
+the seeded function in all four refactor variants. The strategy gap was
+largest on `tests`: Worker B chose harder *exported* targets that required
+real fixture setup (+29 s, +1041 output tokens), while Worker A consistently
+picked unexported helpers like `splitChunk` even though the prompt asked for a
+"public function" — a gap §7.3's verification doesn't penalize.
+
+---
+
+## 6. 
Caveats + +- **The fixture is a snapshot of `claude-code-index` itself.** Both Sonnet + workers may recognize package layout / symbol names from training. Effect + is the same for A and B but inflates absolute "specificity" scores in §summary. +- **Tool restriction is prompt-level, not harness-level.** Worker A could have + called `cix` and we'd only catch it post-hoc via `cix_ops > 0`. None did + (16 / 16 A rows have `cix_ops = 0`). +- **Single machine, single model (`claude-sonnet-4-6`)**, single embedding + model, no warm/cold-cache split between A and B. The cix server is at + `http://192.168.1.168:21847` (remote on the LAN), not on `localhost`. + Both PREAMBLE_B and §5.2 indexing scripts were retargeted to that URL — + this is the only deviation from the verbatim preambles in the runbook, + applied identically before any A or B run started. +- **Pre-run cix indexing is excluded from `elapsed_s`** by construction — `cix + init --watch=false` returned synchronously before the Agent was spawned. + Reindex was incremental from variant 01 onward (only mutated files re-chunked). +- **Token counts are uncached `input_tokens` / `output_tokens` only.** Sonnet + cache-creation tokens — which dominate real spend on identical prompt scaffolds — + are *not* in the headline. For reference, the smoke test reported + `cache_creation_input_tokens=11775` per call against `input_tokens=3`. The + ranking between A and B does not change under cache-aware costing because + both pay nearly identical cache costs per run; cache-creation scales with + prompt length and the preambles differ by only a few sentences. +- **Outcome scoring for the `summary` task is itself done by Sonnet.** One of + the 8 scorer runs (`summary_03_A`) reasoned about port 21847 being wrong — + 21847 is in fact the correct cix-server port — and deducted a point as a + "fabrication". The total still cleared the `pass` threshold (5/7), but the + rubric run is not a perfect oracle. +- **`tests/01..04` evaluation is loose.** §7.3 doesn't gate on the function + being exported, even though TESTS_PROMPT explicitly asks for "one public + function". Worker A picked an unexported helper in all 4 tests runs, which + the verification still scores `pass`. Treating that as `partial` would + flip the tests pass-rate to 0 / 4 (A) vs 4 / 4 (B). Reported here as `pass` + to honor §7.3 letter-of-the-law. +- **`bugfix/01` actually breaks 8 tests, not 1**, because the runbook-prescribed + mutation inverts the entire `IsBinary` decision. The BUGFIX_PROMPT line + "Exactly one test is failing" is therefore mildly misleading — but the bug + is still a one-line root cause and both A and B fixed it cleanly. Recorded + in `fixture-manifest.txt`. +- **`refactor/02` baseline already used a hand-rolled insertion sort** (not + `sort.Slice` as the runbook assumed). Mutation applied in spirit: + insertion → bubble. Recorded in `fixture-manifest.txt`. +- **`refactor/04` had no `map[..]`-in-loop or `sort.Slice` in `symbolindex.go`.** + Used the runbook's documented fall-back: seeded inefficiency in + `repeatComma` (byte-slice → `+=` loop). Recorded in `fixture-manifest.txt`. + +--- + +## 7. Verbatim prompts (copy of runbook §3 + §4) + +### 7.1 Task prompts + +**BUGFIX_PROMPT** +``` +You are working in a Go project at the current directory. Run its test suite from the project root. Exactly one test is failing. Find and fix the underlying bug in the source code (do NOT modify the failing test or any other test). 
After your fix, re-run the full test suite from the project root and confirm everything is green. Report what you changed and why in 3–5 sentences.
+```
+
+**REFACTOR_PROMPT**
+```
+You are working in a Go project at the current directory. Somewhere in this codebase there is a function whose implementation is asymptotically inefficient (its complexity is worse than necessary) while still being correct. Find one such function. Replace its body with an algorithmically better implementation that has the same observable behaviour. After your change, run the full test suite from the project root and confirm everything is green. Report what you changed and why in 3–5 sentences.
+```
+
+**TESTS_PROMPT**
+```
+You are working in a Go project at the current directory. Pick one public function that currently has no unit-test coverage and write at least four meaningful unit tests for it covering distinct cases (typical input, edge case, error path, boundary). Place the new tests in the same package as the function. Run the package's tests and confirm they pass. Report which function you chose and why in 2–3 sentences.
+```
+
+**SUMMARY_PROMPT**
+```
+You are working in a software project at the current directory. Read enough of the code to understand its overall purpose and structure. Produce a single-paragraph (≈200 words) summary covering: what the project does, its top-level architecture, the role of each major component or package, and the main entry points. The summary must be specific to THIS code base — no generic phrasing.
+```
+
+### 7.2 Preambles
+
+**COMMON_PREAMBLE** (prepended to every worker)
+```
+AUTO MODE — execute autonomously, no clarifying questions, no skill invocations, code only. Begin by printing `date +%s` and end by printing `date +%s` so elapsed time can be measured from the transcript.
+```
+
+**PREAMBLE_A** (Worker A — grep-only)
+```
+TOOL CONSTRAINT — you may use ONLY the following tools: Bash, Read, Edit, Glob, Grep. You MUST NOT call the `cix` CLI under any circumstance. Use grep, find, ls, ripgrep, etc. for navigation.
+```
+
+**PREAMBLE_B** (Worker B — cix-first; URL retargeted from `localhost` to `192.168.1.168` per §6)
+```
+TOOL CONSTRAINT — a cix index of this project is available. Prefer the cix CLI for navigation: `cix search "<query>"`, `cix definitions <symbol>`, `cix references <symbol>`, `cix symbols <path>`, `cix files <pattern>`. The cix server is at http://192.168.1.168:21847 and the project at the current working directory has already been registered and indexed for you. You MAY fall back to grep only if a cix command genuinely returns nothing relevant. Do not run `cix init`, `cix reindex`, or modify the cix configuration.
+```
+
+### 7.3 Final prompt assembly
+
+For Worker A:
+```
+<COMMON_PREAMBLE>
+
+<PREAMBLE_A>
+
+<TASK_PROMPT>
+
+The project is at /tmp/cix-bench-run. Begin by `cd /tmp/cix-bench-run`.
+
+Note: the env var CIX_API_KEY is set to an invalid value for this run; any cix call will fail with an auth error.
+```
+
+For Worker B:
+```
+<COMMON_PREAMBLE>
+
+<PREAMBLE_B>
+
+<TASK_PROMPT>
+
+The project is at /tmp/cix-bench-run. Begin by `cd /tmp/cix-bench-run`.
+```
+
+---
+
+## 8. 
Where to look
+
+- Raw per-run JSONL transcripts: `/tmp/cix-bench/results/runs/<run_id>.log`
+- Per-run metric JSON: `/tmp/cix-bench/results/runs/<run_id>.metrics.json`
+- Summary task texts + scoring: `/tmp/cix-bench/results/runs/summary_*_*.txt`
+  + `summary_*_*.score.json`
+- Combined CSV: `/tmp/cix-bench/results/results.csv`
+- Frozen fixture manifest (with deviation notes): `/tmp/cix-bench/fixture-manifest.txt`
diff --git a/server/Makefile b/server/Makefile
index 2f991c6..085a96f 100644
--- a/server/Makefile
+++ b/server/Makefile
@@ -25,11 +25,17 @@ IMAGE_REPO ?= dvcdsys/code-index
 IMAGE_TAG ?= go-cu128
 VERSION ?= $(shell git describe --tags --always 2>/dev/null || echo "0.0.0-dev")
 
+# Floating dev tag — overwritten on every `docker-build-cuda-dev` push. Used
+# by the operator to deploy "current local work" onto the RTX 3090 box for
+# manual smoke testing before merging a PR.
+DEV_TAG ?= cu128-dev
+GIT_SHA ?= $(shell git rev-parse --short HEAD 2>/dev/null || echo "nogit")
+
 BUILDER ?= cix-builder
 SCOUT_TAG ?= scout-$(shell date +%Y%m%d-%H%M)
 
 .PHONY: help build test test-gate fetch-llama bundle run docker-build-cuda \
-	scout-cuda scout-cpu promote-cuda clean
+	docker-build-cuda-dev scout-cuda scout-cpu promote-cuda clean
 
 help:
 	@echo "Targets:"
@@ -40,6 +46,7 @@ help:
 	@echo "  run            — bundle + launch cix-server (reads .env from repo root, sets CIX_LLAMA_BIN_DIR)"
 	@echo "  test-gate      — run the Phase 3 parity gate (requires fetch-llama + GGUF)"
 	@echo "  docker-build-cuda — build + push linux/amd64 CUDA image $(IMAGE_REPO):$(IMAGE_TAG)"
+	@echo "  docker-build-cuda-dev — build + push CUDA dev image $(IMAGE_REPO):$(DEV_TAG)"
 	@echo "  scout-cuda     — build CUDA image via native x86 builder → push :<scout-tag> → docker scout cves"
 	@echo "  scout-cpu      — build CPU image locally → docker scout cves (no push)"
 	@echo "  promote-cuda   — retag SCOUT_TAG as go-cu128+cu128 without rebuild (imagetools)"
@@ -109,6 +116,33 @@ docker-build-cuda:
 		--push \
 		$(ROOT)
 
+# Local dev workflow for the RTX 3090 box (the GPU prod server reachable via
+# Portainer MCP). Builds the CUDA image with a "dev" tag and pushes to Docker
+# Hub from the dev machine. Operator then bumps the Portainer stack image
+# tag to :cu128-dev (or pulls fresh) and smoke-tests on real hardware before
+# opening a PR / cutting a versioned release.
+#
+# Skips Docker Scout — this image is for the operator's eyes, not for
+# downstream users; run `make scout-cuda` before promoting to :cu128/:go-cu128.
+docker-build-cuda-dev:
+	@echo "→ Building CUDA dev image on $(BUILDER) → $(IMAGE_REPO):$(DEV_TAG)"
+	docker buildx build \
+		--builder $(BUILDER) \
+		--platform linux/amd64 \
+		--pull \
+		--provenance=mode=max \
+		--sbom=true \
+		--build-arg VERSION=$(VERSION)-dev-$(GIT_SHA) \
+		-f $(ROOT)/Dockerfile.cuda \
+		-t $(IMAGE_REPO):$(DEV_TAG) \
+		--push \
+		$(ROOT)
+	@echo ""
+	@echo "✓ Pushed $(IMAGE_REPO):$(DEV_TAG)"
+	@echo ""
+	@echo "Next: bump the Portainer stack image to :$(DEV_TAG) and redeploy,"
+	@echo "      then smoke-test on the RTX 3090 box."
+
 # Scout workflow — iterate locally before touching production tags:
 #
 #   make scout-cuda   # build on native x86 → :scout-YYYYMMDD-HHMM → scan

From 58de3632f4e56d14b8167c6f1eace223e6b0e4d9 Mon Sep 17 00:00:00 2001
From: dvcdsys
Date: Mon, 27 Apr 2026 22:11:08 +0100
Subject: [PATCH 4/9] fix(chunker): drop markdown atx_heading + only first split-chunk keeps symbol metadata
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related fixes that cleaned up search/cix-def output for repos with
markdown docs and long Go/Python functions:

1. Markdown registry. tree-sitter-markdown's `section` already wraps the
   heading + its body, so listing both `section` and `atx_heading` in the
   type-nodes config emitted duplicate one-line chunks for every `### foo`
   heading (visible as Type: type | 1-2 line snippets in `cix search`
   output). Drop `atx_heading` — keep only `section`.

2. splitChunk. When tree-sitter emitted a function chunk larger than
   maxChunkSize (default 4500 chars), splitChunk cut it into N pieces and
   set SymbolName/SymbolSignature/ChunkType="function" on every piece.
   Result: cix def run returned N hits at different line ranges of the
   same function. Now only the FIRST piece carries the symbol metadata;
   subsequent pieces become anonymous `block` chunks. Full content of the
   symbol stays indexed for embed/FTS search — only the symbol-index
   attribution is consolidated.

Test: TestSplitChunk_OnlyFirstKeepsSymbol — fixture is a 2000-line Python
function, asserts exactly one chunk in the result claims symbol=big_func.

Co-Authored-By: Claude Opus 4.7
---
 server/internal/chunker/chunker.go      | 64 ++++++++++++++++---------
 server/internal/chunker/chunker_test.go | 41 ++++++++++++++++
 2 files changed, 82 insertions(+), 23 deletions(-)

diff --git a/server/internal/chunker/chunker.go b/server/internal/chunker/chunker.go
index 0b140a1..b5e34a2 100644
--- a/server/internal/chunker/chunker.go
+++ b/server/internal/chunker/chunker.go
@@ -378,7 +378,10 @@ func defaultRegistry() map[string]languageEntry {
 	"markdown": {
 		factory: grammars.MarkdownLanguage,
 		nodes: map[string][]string{
-			"type": {"section", "atx_heading"},
+			// `section` already wraps the heading + body in
+			// tree-sitter-markdown — adding `atx_heading` would emit
+			// duplicate one-line chunks for every `### foo` line.
+			"type": {"section"},
 		},
 		identifiers: nil,
 	},
@@ -864,45 +867,60 @@ func chunkSlidingWindow(filePath, content, language string) []Chunk {
 // Chunk splitting
 // ---------------------------------------------------------------------------
 
+// splitChunk cuts an oversized chunk into pieces of <= maxSize chars.
+//
+// Only the FIRST piece keeps the original SymbolName/SymbolSignature/
+// ChunkType — subsequent pieces become anonymous `block` chunks. Without
+// this, splitting a long function would create N rows in the symbol index
+// all claiming to be `func run()`, making `cix def run` return N
+// duplicates pointing at different line ranges of the same symbol.
+//
+// The full text of the symbol is still indexed (both for FTS and embed
+// search) — only the first piece carries the symbol attribution.
func splitChunk(chunk Chunk, maxSize int) []Chunk { lines := splitLines(chunk.Content) var subChunks []Chunk var currentLines []string currentStart := chunk.StartLine + emit := func(content string, startLine, endLine int, isFirst bool) { + c := Chunk{ + Content: content, + FilePath: chunk.FilePath, + StartLine: startLine, + EndLine: endLine, + Language: chunk.Language, + ParentName: chunk.ParentName, + } + if isFirst { + c.ChunkType = chunk.ChunkType + c.SymbolName = chunk.SymbolName + c.SymbolSignature = chunk.SymbolSignature + } else { + c.ChunkType = "block" + } + subChunks = append(subChunks, c) + } + for _, line := range lines { currentLines = append(currentLines, line) currentContent := joinLines(currentLines) if len(currentContent) >= maxSize && len(currentLines) > 1 { splitContent := joinLines(currentLines[:len(currentLines)-1]) - subChunks = append(subChunks, Chunk{ - Content: splitContent, - ChunkType: chunk.ChunkType, - FilePath: chunk.FilePath, - StartLine: currentStart, - EndLine: currentStart + len(currentLines) - 2, - Language: chunk.Language, - SymbolName: chunk.SymbolName, - SymbolSignature: chunk.SymbolSignature, - ParentName: chunk.ParentName, - }) + emit(splitContent, + currentStart, + currentStart+len(currentLines)-2, + len(subChunks) == 0) currentStart = currentStart + len(currentLines) - 1 currentLines = []string{line} } } if len(currentLines) > 0 { - subChunks = append(subChunks, Chunk{ - Content: joinLines(currentLines), - ChunkType: chunk.ChunkType, - FilePath: chunk.FilePath, - StartLine: currentStart, - EndLine: chunk.EndLine, - Language: chunk.Language, - SymbolName: chunk.SymbolName, - SymbolSignature: chunk.SymbolSignature, - ParentName: chunk.ParentName, - }) + emit(joinLines(currentLines), + currentStart, + chunk.EndLine, + len(subChunks) == 0) } return subChunks } diff --git a/server/internal/chunker/chunker_test.go b/server/internal/chunker/chunker_test.go index 8244b9e..15310c5 100644 --- a/server/internal/chunker/chunker_test.go +++ b/server/internal/chunker/chunker_test.go @@ -189,6 +189,47 @@ func TestChunkFile_OversizedChunkSplit(t *testing.T) { } } +func TestSplitChunk_OnlyFirstKeepsSymbol(t *testing.T) { + // A long Python function that splitChunk will cut into >1 piece. + var sb strings.Builder + sb.WriteString("def big_func():\n") + for i := 0; i < 2000; i++ { + sb.WriteString(" x = 1 # padding line\n") + } + src := sb.String() + chunks, _, err := ChunkFile("big.py", src, "python", 0) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + + // Find all chunks that mention `big_func`. Only one chunk in the index + // should claim the symbol; the rest must be anonymous `block` pieces + // even though they textually belong to the same function. + withSymbol := 0 + for _, c := range chunks { + if c.SymbolName != nil && *c.SymbolName == "big_func" { + withSymbol++ + if c.ChunkType != "function" { + t.Errorf("chunk with symbol big_func has type %q, want function", c.ChunkType) + } + } + } + if withSymbol != 1 { + t.Errorf("expected exactly 1 chunk attributed to big_func after split, got %d", withSymbol) + } + + // And we DID split — meaning multiple chunks for this function exist. 
+	totalForFunc := 0
+	for _, c := range chunks {
+		if c.FilePath == "big.py" && c.ChunkType != "module" {
+			totalForFunc++
+		}
+	}
+	if totalForFunc < 2 {
+		t.Skipf("test self-check: function fit into one chunk (totalForFunc=%d) — need bigger fixture", totalForFunc)
+	}
+}
+
 func TestFindGaps_NoOverlap(t *testing.T) {
 	covered := [][2]int{{2, 5}, {10, 12}}
 	gaps := findGaps(covered, 15)

From f87e7e70d5292243a64fd99282903250 Mon Sep 17 00:00:00 2001
From: dvcdsys
Date: Mon, 27 Apr 2026 22:17:08 +0100
Subject: [PATCH 5/9] feat(search): merge overlapping hits + windowed retrieval + breadcrumb render
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Tree-sitter emits nested chunks by design — a markdown H1 wraps its H2
sub-sections, a TypeScript class wraps its methods, a Python module wraps
its classes. A vector search that hits the inner chunk also tends to hit
(a bit weaker) the outer chunk, and the user's --limit budget gets eaten
by N near-duplicates of the same code region. Same problem with splitChunk
leftovers when a long function is cut into pieces.

This change collapses overlapping results from the same file into a single
"outer" hit with the inner matches recorded as NestedHits. Two merge cases:

1. Strict containment — A.range ⊋ B.range and same file → absorb B.
2. Same-symbol adjacent — adjacent ranges where at least one carries a
   symbol name → absorb (catches splitChunk piece1 + piece2 leftovers).

Cross-file results are NEVER merged. Exact duplicates (same range twice)
are not merged either — those should be deduped at the vector-store layer
(already are, via dedupByLocation).

Windowed retrieval: instead of over-fetching limit×2 once, the search
handler now retries with limit×2, ×4, ×8, ×16 if mergeOverlappingHits
collapses the result set below the user's --limit. Stops early when the
vector store returns fewer rows than asked (HNSW exhausted) or when the
factor cap is hit. In practice the first window is enough; the loop exists
for repos with deeply nested markdown or many class+method hits inside the
same files.

Server changes:
- New file search_merge.go — mergeOverlappingHits, shouldMerge.
- searchResultItem gains NestedHits []nestedHit (omitempty).
- semanticSearchHandler refactored: extract fetchVectorResults +
  filterToSearchItems helpers, wrap call in factor loop, drop the early
  break-on-limit (merge needs the full filtered set to identify overlaps).
- 10 unit tests for mergeOverlappingHits + 1 integration test
  (TestSemanticSearch_NestedMarkdownMerge) verifying nested H1/H2 sections
  collapse to a single result with NestedHits populated.

CLI changes:
- SearchResult / NestedHit struct mirrors the server response.
- cix search render shows "+ N more match(es) inside:" with per-hit
  score/range/symbol so the user sees WHY the outer chunk ranks well even
  when the actual signal came from a sub-section.
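The two merge cases reduce to a small predicate over searchResultItem. A
sketch of that decision logic (illustrative only — the shipped shouldMerge
in search_merge.go may differ in signature and detail):

```go
// shouldMergeSketch says whether inner may be absorbed into outer.
// Case 1: strict containment in the same file (equal ranges do NOT
// qualify — exact duplicates are the vector store's job). Case 2:
// adjacent ranges where at least one side carries a symbol name,
// which catches splitChunk piece1 + piece2 leftovers.
func shouldMergeSketch(outer, inner searchResultItem) bool {
	if outer.FilePath != inner.FilePath {
		return false // cross-file results are never merged
	}
	contains := outer.StartLine <= inner.StartLine && outer.EndLine >= inner.EndLine
	if contains && (outer.StartLine < inner.StartLine || outer.EndLine > inner.EndLine) {
		return true // case 1: strict containment
	}
	// case 2: same-file adjacency with symbol attribution on either side
	adjacent := inner.StartLine == outer.EndLine+1 || outer.StartLine == inner.EndLine+1
	return adjacent && (outer.SymbolName != "" || inner.SymbolName != "")
}
```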
Co-Authored-By: Claude Opus 4.7 --- cli/cmd/search.go | 19 +- cli/internal/client/search.go | 30 +- server/internal/httpapi/indexing_test.go | 65 +++++ server/internal/httpapi/search.go | 271 ++++++++++++------- server/internal/httpapi/search_merge.go | 138 ++++++++++ server/internal/httpapi/search_merge_test.go | 188 +++++++++++++ 6 files changed, 607 insertions(+), 104 deletions(-) create mode 100644 server/internal/httpapi/search_merge.go create mode 100644 server/internal/httpapi/search_merge_test.go diff --git a/cli/cmd/search.go b/cli/cmd/search.go index e9a1c32..9b12855 100644 --- a/cli/cmd/search.go +++ b/cli/cmd/search.go @@ -139,7 +139,24 @@ func runSearch(cmd *cobra.Command, args []string) error { for _, line := range strings.Split(content, "\n") { fmt.Printf(" %s\n", line) } - fmt.Printf(" ```\n\n") + fmt.Printf(" ```\n") + + // Breadcrumbs for nested hits absorbed by the server's merge step. + // Tells the user "this big chunk ranks well because of these inner + // matches" so they're not surprised that --limit returned fewer + // items than expected. + if len(result.NestedHits) > 0 { + fmt.Printf(" + %d more match(es) inside:\n", len(result.NestedHits)) + for _, nh := range result.NestedHits { + label := nh.ChunkType + if nh.SymbolName != "" { + label = fmt.Sprintf("%s %s", nh.ChunkType, nh.SymbolName) + } + fmt.Printf(" · [%.2f] %s:%d-%d (%s)\n", + nh.Score, result.FilePath, nh.StartLine, nh.EndLine, label) + } + } + fmt.Println() } return nil diff --git a/cli/internal/client/search.go b/cli/internal/client/search.go index 2392d41..018a579 100644 --- a/cli/internal/client/search.go +++ b/cli/internal/client/search.go @@ -2,16 +2,34 @@ package client import "fmt" -// SearchResult represents a code search result +// SearchResult represents a code search result. +// +// NestedHits is populated by the server's mergeOverlappingHits step when +// other matches inside this chunk's line range were absorbed (e.g. a +// markdown H2 inside an H1 section, or a method inside its class). The +// renderer uses these to show breadcrumbs so the user can see WHY this +// outer chunk ranks well. type SearchResult struct { - FilePath string `json:"file_path"` + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + Content string `json:"content"` + Score float64 `json:"score"` + ChunkType string `json:"chunk_type"` + SymbolName string `json:"symbol_name"` + Language string `json:"language"` + NestedHits []NestedHit `json:"nested_hits,omitempty"` +} + +// NestedHit is a chunk that was merged INTO another result by the server. +// Just enough metadata to render a breadcrumb and let the user jump to +// the exact line. The full content is already inside the parent result. 
+type NestedHit struct { StartLine int `json:"start_line"` EndLine int `json:"end_line"` - Content string `json:"content"` - Score float64 `json:"score"` + SymbolName string `json:"symbol_name,omitempty"` ChunkType string `json:"chunk_type"` - SymbolName string `json:"symbol_name"` - Language string `json:"language"` + Score float64 `json:"score"` } // SearchResponse represents the search response diff --git a/server/internal/httpapi/indexing_test.go b/server/internal/httpapi/indexing_test.go index b072102..cc73c18 100644 --- a/server/internal/httpapi/indexing_test.go +++ b/server/internal/httpapi/indexing_test.go @@ -272,6 +272,71 @@ func TestSemanticSearch_HTTP(t *testing.T) { } } +// TestSemanticSearch_NestedMarkdownMerge indexes a markdown file with H1 +// containing two H2 sections, all containing a unique token. The chunker +// emits 3 overlapping `section` chunks (1 outer + 2 inner). After +// mergeOverlappingHits the outer section absorbs both inner ones — +// observable as ONE result with NestedHits populated, instead of three +// near-duplicates fighting for the user's --limit budget. +func TestSemanticSearch_NestedMarkdownMerge(t *testing.T) { + d, hash := newIndexerTestDeps(t, "/proj-md") + router := NewRouter(d) + + beginW := doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/begin", map[string]any{}) + var begin indexBeginResponse + _ = json.Unmarshal(beginW.Body.Bytes(), &begin) + + content := "# Setup zlork\n\nIntro about zlork.\n\n## Local zlork dev\n\n" + + "Steps for zlork.\n\n## Remote zlork\n\nMore zlork.\n" + doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/files", map[string]any{ + "run_id": begin.RunID, + "files": []map[string]any{ + {"path": "/proj-md/README.md", "content": content, "content_hash": shaHex(content), "language": "markdown"}, + }, + }) + doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/finish", map[string]any{ + "run_id": begin.RunID, + }) + + w := doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/search", map[string]any{ + "query": "zlork", + "limit": 10, + "min_score": 0.0, + }) + if w.Code != http.StatusOK { + t.Fatalf("status=%d body=%s", w.Code, w.Body.String()) + } + + var resp searchResponse + if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil { + t.Fatalf("unmarshal: %v", err) + } + + // Find the outer section result and verify it has nested_hits. + var outer *searchResultItem + for i := range resp.Results { + r := &resp.Results[i] + if r.FilePath == "/proj-md/README.md" && r.StartLine == 1 { + outer = r + break + } + } + if outer == nil { + t.Fatalf("expected an outer section starting at line 1, got results: %+v", resp.Results) + } + if len(outer.NestedHits) == 0 { + t.Errorf("outer section should have nested hits absorbed, got NestedHits=%v", outer.NestedHits) + } + // And we should NOT see those nested ranges as separate top-level results. + for _, r := range resp.Results { + if r.FilePath == "/proj-md/README.md" && r.StartLine != 1 { + // Any other start line in the same file means a nested section + // leaked through merging. 
+ t.Errorf("non-outer section leaked as separate result: lines %d-%d", r.StartLine, r.EndLine) + } + } +} + func TestSemanticSearch_HTTP_MissingQuery(t *testing.T) { d, hash := newIndexerTestDeps(t, "/proj") router := NewRouter(d) diff --git a/server/internal/httpapi/search.go b/server/internal/httpapi/search.go index befd16e..e3918c4 100644 --- a/server/internal/httpapi/search.go +++ b/server/internal/httpapi/search.go @@ -1,6 +1,7 @@ package httpapi import ( + "context" "database/sql" "encoding/json" "errors" @@ -582,14 +583,34 @@ type searchRequest struct { } type searchResultItem struct { - FilePath string `json:"file_path"` + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + Content string `json:"content"` + Score float32 `json:"score"` + ChunkType string `json:"chunk_type"` + SymbolName string `json:"symbol_name"` + Language string `json:"language"` + // NestedHits records other matches inside this result's line range that + // were merged into it by mergeOverlappingHits. Populated only when at + // least one inner hit was absorbed; emitted as `nested_hits` in JSON. + // The renderer uses these to show breadcrumbs (e.g. "+ 2 more matches: + // H2 'Foo' line 27, H3 'Bar' line 29") so the user can see WHY this + // outer chunk ranks well even when the actual signal came from a + // sub-section. + NestedHits []nestedHit `json:"nested_hits,omitempty"` +} + +// nestedHit is a compact view of a chunk that was merged INTO another +// result. We don't need the full content (the parent's content already +// contains it textually) — just enough metadata to render a breadcrumb +// and let the caller jump to the exact line. +type nestedHit struct { StartLine int `json:"start_line"` EndLine int `json:"end_line"` - Content string `json:"content"` - Score float32 `json:"score"` + SymbolName string `json:"symbol_name,omitempty"` ChunkType string `json:"chunk_type"` - SymbolName string `json:"symbol_name"` - Language string `json:"language"` + Score float32 `json:"score"` } type searchResponse struct { @@ -652,117 +673,173 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { return } - // M4 — multi-language fan-out. chromem-go's `where` map cannot express - // "language IN (go, python)" natively, so: - // - 0 languages: single query, no where filter. - // - 1 language: single query with `where={"language": lang}` — same - // HNSW-level pre-filter as Python. - // - ≥2 languages: N independent queries (one per language) merged and - // deduped by document ID. Preserves pre-filter semantics so the top - // results are not starved by unrelated languages when the collection - // is large. - const maxFanout = 4 - - var allResults []vectorStoreResult - switch { - case len(body.Languages) == 0: - r1, err := d.VectorStore.Search(r.Context(), p.HostPath, qEmb, body.Limit*2, nil) + // Post-filter (path/language) and merge state are computed once outside + // the window loop — both are cheap and don't depend on factor. + langSet := map[string]struct{}{} + for _, l := range body.Languages { + langSet[l] = struct{}{} + } + applyPostLangFilter := len(body.Languages) > maxFanoutSearch + + // Windowed retrieval. Start by asking the vector store for limit×2 + // (the historical default), and if mergeOverlappingHits collapses + // the result set below the user's --limit budget — typically because + // of nested markdown sections or class+method overlaps — re-ask for + // limit×4, then ×8, up to ×maxFactorSearch. 
Stops early when the + // store returns fewer rows than requested (HNSW exhausted). + var merged []searchResultItem + factor := 2 + for { + n := body.Limit * factor + rawWrapped, err := fetchVectorResults( + r.Context(), d.VectorStore, p.HostPath, qEmb, n, body.Languages, + ) if err != nil { writeError(w, http.StatusInternalServerError, err.Error()) return } - allResults = wrapResults(r1) - case len(body.Languages) == 1: - r1, err := d.VectorStore.Search(r.Context(), p.HostPath, qEmb, body.Limit*2, - map[string]string{"language": body.Languages[0]}) - if err != nil { - writeError(w, http.StatusInternalServerError, err.Error()) - return + filtered := filterToSearchItems(rawWrapped, minScore, body.Paths, langSet, applyPostLangFilter) + merged = mergeOverlappingHits(filtered) + if len(merged) >= body.Limit { + break } - allResults = wrapResults(r1) - case len(body.Languages) <= maxFanout: - // Per-language fan-out; merge and dedupe. - for _, lang := range body.Languages { - rPart, err := d.VectorStore.Search(r.Context(), p.HostPath, qEmb, body.Limit*2, - map[string]string{"language": lang}) - if err != nil { - writeError(w, http.StatusInternalServerError, err.Error()) - return - } - allResults = append(allResults, wrapResults(rPart)...) + if len(rawWrapped) < n { + // Vector store returned everything it had — no point asking again. + break } - allResults = dedupByLocation(allResults) - // Sort by descending score — merged slices arrive pre-sorted per - // partition but out of order across partitions. - sort.SliceStable(allResults, func(i, j int) bool { - return allResults[i].r.Score > allResults[j].r.Score - }) - default: - // Too many languages for fan-out — fall back to post-filter with a - // generous over-fetch to minimise starvation. - rAll, err := d.VectorStore.Search(r.Context(), p.HostPath, qEmb, - body.Limit*len(body.Languages)*2, nil) - if err != nil { - writeError(w, http.StatusInternalServerError, err.Error()) - return + if factor >= maxFactorSearch { + break } - allResults = wrapResults(rAll) + factor *= 2 } - // Post-filter for the >maxFanout path needs a language set. - langSet := map[string]struct{}{} - for _, l := range body.Languages { - langSet[l] = struct{}{} + if len(merged) > body.Limit { + merged = merged[:body.Limit] } - applyPostLangFilter := len(body.Languages) > maxFanout - filtered := make([]searchResultItem, 0, len(allResults)) - for _, wrapped := range allResults { - res := wrapped.r - if res.Score < minScore { - continue + elapsedMS := float64(time.Since(start).Microseconds()) / 1000.0 + elapsedMS = float64(int(elapsedMS*10+0.5)) / 10 + + writeJSON(w, http.StatusOK, searchResponse{ + Results: merged, + Total: len(merged), + QueryTimeMS: elapsedMS, + }) + } +} + +// maxFanoutSearch is the language-count threshold above which we drop +// per-language pre-filter and fall back to a single over-fetched query +// with post-filter. Same value as the previous inline `maxFanout`. +const maxFanoutSearch = 4 + +// maxFactorSearch caps the windowed retrieval expansion. With body.Limit=10 +// and factor=16 we top out at 160 raw results — enough to fill the budget +// even on heavily nested markdown without spending all day re-querying. +const maxFactorSearch = 16 + +// fetchVectorResults performs the per-language fan-out vector-store query +// at the given limit and returns deduped, score-sorted results. Extracted +// from semanticSearchHandler so the windowed retry loop can call it with +// growing `n` values without duplicating the four-case switch. 
+// +// The fan-out strategy mirrors the original inline logic: 0 languages → +// single query; 1 language → single query with where-filter; 2..maxFanout +// → N queries with per-language where-filter, deduped and re-sorted by +// score; >maxFanout → single oversized query, post-filter handled by +// caller (filterToSearchItems with applyPostLangFilter=true). +func fetchVectorResults( + ctx context.Context, + store *vectorstore.Store, + projectPath string, + qEmb []float32, + n int, + languages []string, +) ([]vectorStoreResult, error) { + switch { + case len(languages) == 0: + r1, err := store.Search(ctx, projectPath, qEmb, n, nil) + if err != nil { + return nil, err + } + return wrapResults(r1), nil + case len(languages) == 1: + r1, err := store.Search(ctx, projectPath, qEmb, n, + map[string]string{"language": languages[0]}) + if err != nil { + return nil, err + } + return wrapResults(r1), nil + case len(languages) <= maxFanoutSearch: + var combined []vectorStoreResult + for _, lang := range languages { + rPart, err := store.Search(ctx, projectPath, qEmb, n, + map[string]string{"language": lang}) + if err != nil { + return nil, err } - if applyPostLangFilter { - if _, ok := langSet[res.Language]; !ok { - continue - } + combined = append(combined, wrapResults(rPart)...) + } + combined = dedupByLocation(combined) + sort.SliceStable(combined, func(i, j int) bool { + return combined[i].r.Score > combined[j].r.Score + }) + return combined, nil + default: + rAll, err := store.Search(ctx, projectPath, qEmb, n*len(languages), nil) + if err != nil { + return nil, err + } + return wrapResults(rAll), nil + } +} + +// filterToSearchItems applies min-score, language post-filter, and path +// prefix/substring matches. It does NOT truncate — the merge step needs +// the full filtered set to identify all overlaps before deciding which to +// drop. Truncation happens after merge in the caller. 
+func filterToSearchItems( + wrapped []vectorStoreResult, + minScore float32, + paths []string, + langSet map[string]struct{}, + applyPostLangFilter bool, +) []searchResultItem { + filtered := make([]searchResultItem, 0, len(wrapped)) + for _, w := range wrapped { + res := w.r + if res.Score < minScore { + continue + } + if applyPostLangFilter { + if _, ok := langSet[res.Language]; !ok { + continue } - if len(body.Paths) > 0 { - matched := false - for _, pfx := range body.Paths { - if strings.HasPrefix(res.FilePath, pfx) || strings.Contains(res.FilePath, pfx) { - matched = true - break - } - } - if !matched { - continue + } + if len(paths) > 0 { + matched := false + for _, pfx := range paths { + if strings.HasPrefix(res.FilePath, pfx) || strings.Contains(res.FilePath, pfx) { + matched = true + break } } - filtered = append(filtered, searchResultItem{ - FilePath: res.FilePath, - StartLine: res.StartLine, - EndLine: res.EndLine, - Content: res.Content, - Score: res.Score, - ChunkType: res.ChunkType, - SymbolName: res.SymbolName, - Language: res.Language, - }) - if len(filtered) >= body.Limit { - break + if !matched { + continue } } - - elapsedMS := float64(time.Since(start).Microseconds()) / 1000.0 - elapsedMS = float64(int(elapsedMS*10+0.5)) / 10 - - writeJSON(w, http.StatusOK, searchResponse{ - Results: filtered, - Total: len(filtered), - QueryTimeMS: elapsedMS, + filtered = append(filtered, searchResultItem{ + FilePath: res.FilePath, + StartLine: res.StartLine, + EndLine: res.EndLine, + Content: res.Content, + Score: res.Score, + ChunkType: res.ChunkType, + SymbolName: res.SymbolName, + Language: res.Language, }) } + return filtered } // strconvItoa avoids pulling strconv just for one call in this file — mirrors diff --git a/server/internal/httpapi/search_merge.go b/server/internal/httpapi/search_merge.go new file mode 100644 index 0000000..36250d9 --- /dev/null +++ b/server/internal/httpapi/search_merge.go @@ -0,0 +1,138 @@ +package httpapi + +import "sort" + +// mergeOverlappingHits collapses search results that come from the same file +// when one's line range fully contains another's, or when same-symbol pieces +// of a split chunk happen to be adjacent. The "outer" hit survives, picks up +// the best score across the merged set, and records inner hits as +// NestedHits so the renderer can show them as breadcrumbs. +// +// Why this matters +// +// Tree-sitter emits nested chunks by design: a class chunk wraps its method +// chunks; a markdown H1 section wraps its H2 sub-sections; a Python class +// wraps inner functions. Without merging, a vector-search query that hits +// strongly inside one of those nested chunks tends to also hit (slightly +// less strongly) the parent chunk that textually contains the same lines, +// and the user's --limit budget gets eaten by N copies of essentially the +// same code region. +// +// Adjacency rule (the splitChunk leftover): when a function is too long for +// a single chunk, splitChunk emits piece 1 with the symbol metadata and +// pieces 2..N as anonymous `block`s. If a query happens to hit BOTH the +// named first piece AND the anonymous tail, those two ranges are exactly +// adjacent (piece1.EndLine + 1 == piece2.StartLine). We merge those too — +// the anonymous tail "belongs" to the named symbol on the same file. +// +// Cross-file results are NEVER merged: two functions with the same name in +// two different files are legitimately separate hits. 
+// +// The function does not truncate to any limit — that's the caller's job +// after this returns. Output is sorted by descending merged score. +func mergeOverlappingHits(items []searchResultItem) []searchResultItem { + if len(items) <= 1 { + return items + } + + // Group indices by file path. Keeping indices (not copies) so we can + // edit items[parentIdx] in-place to grow its NestedHits. + byFile := map[string][]int{} + for i := range items { + byFile[items[i].FilePath] = append(byFile[items[i].FilePath], i) + } + + consumed := make([]bool, len(items)) + + for _, idxs := range byFile { + if len(idxs) <= 1 { + continue + } + + // Sort by range size descending (largest first → potential parent), + // tiebreak by start line ascending so the iteration order is stable + // and biggest-encloses-everything-inside-it semantics fall out + // naturally. + sort.Slice(idxs, func(a, b int) bool { + ia, ib := items[idxs[a]], items[idxs[b]] + sa := ia.EndLine - ia.StartLine + sb := ib.EndLine - ib.StartLine + if sa != sb { + return sa > sb + } + return ia.StartLine < ib.StartLine + }) + + for ai := 0; ai < len(idxs); ai++ { + parentIdx := idxs[ai] + if consumed[parentIdx] { + continue + } + parent := items[parentIdx] + + for _, childIdx := range idxs[ai+1:] { + if consumed[childIdx] { + continue + } + child := items[childIdx] + if !shouldMerge(parent, child) { + continue + } + consumed[childIdx] = true + if child.Score > parent.Score { + parent.Score = child.Score + } + parent.NestedHits = append(parent.NestedHits, nestedHit{ + StartLine: child.StartLine, + EndLine: child.EndLine, + SymbolName: child.SymbolName, + ChunkType: child.ChunkType, + Score: child.Score, + }) + } + items[parentIdx] = parent + } + } + + out := make([]searchResultItem, 0, len(items)) + for i := range items { + if !consumed[i] { + out = append(out, items[i]) + } + } + + sort.SliceStable(out, func(i, j int) bool { + return out[i].Score > out[j].Score + }) + return out +} + +// shouldMerge returns true when child should be absorbed into parent. +// Two cases trigger a merge — see mergeOverlappingHits doc-comment for +// the rationale. +func shouldMerge(parent, child searchResultItem) bool { + if parent.FilePath != child.FilePath { + return false + } + // Case 1: parent's range strictly contains child's range. We require a + // strict containment (i.e. it's NOT the same range) to avoid merging + // duplicates from per-language fan-out — those should be deduped at the + // vector-store layer, not here. + if parent.StartLine <= child.StartLine && parent.EndLine >= child.EndLine { + if parent.StartLine != child.StartLine || parent.EndLine != child.EndLine { + return true + } + } + // Case 2: same-symbol adjacent ranges. After splitChunk, only the + // first piece keeps SymbolName, so the typical pattern is + // {symbol=run, lines 61..195} + {symbol="" tail block, lines 196..198}. + // Adjacency by itself isn't enough — we need at least one to carry the + // symbol so we know they're related; otherwise we'd merge unrelated + // neighbouring chunks. 
+ if parent.SymbolName != "" || child.SymbolName != "" { + if parent.EndLine+1 == child.StartLine || child.EndLine+1 == parent.StartLine { + return true + } + } + return false +} diff --git a/server/internal/httpapi/search_merge_test.go b/server/internal/httpapi/search_merge_test.go new file mode 100644 index 0000000..7f7da23 --- /dev/null +++ b/server/internal/httpapi/search_merge_test.go @@ -0,0 +1,188 @@ +package httpapi + +import ( + "testing" +) + +func mkItem(file string, start, end int, score float32, symbol, kind string) searchResultItem { + return searchResultItem{ + FilePath: file, + StartLine: start, + EndLine: end, + Score: score, + SymbolName: symbol, + ChunkType: kind, + Language: "go", + } +} + +// TestMerge_NestedSections: H1 (1-200) wraps H2 (27-80), which wraps H3 (29-50). +// All three matched the query — output should be the H1 chunk with +// 2 nested hits, score = max of all three. +func TestMerge_NestedSections(t *testing.T) { + items := []searchResultItem{ + mkItem("README.md", 1, 200, 0.30, "", "section"), + mkItem("README.md", 27, 80, 0.45, "", "section"), + mkItem("README.md", 29, 50, 0.50, "", "section"), + } + out := mergeOverlappingHits(items) + if len(out) != 1 { + t.Fatalf("want 1 merged result, got %d", len(out)) + } + got := out[0] + if got.StartLine != 1 || got.EndLine != 200 { + t.Errorf("expected outer range 1-200, got %d-%d", got.StartLine, got.EndLine) + } + if got.Score != 0.50 { + t.Errorf("merged score = %v, want max=0.50", got.Score) + } + if len(got.NestedHits) != 2 { + t.Fatalf("want 2 nested hits, got %d", len(got.NestedHits)) + } +} + +// TestMerge_SameSymbolAdjacent: splitChunk emitted run() as +// - lines 61-195, function:run +// - lines 196-198, block, no symbol +// Merge the second into the first. +func TestMerge_SameSymbolAdjacent(t *testing.T) { + items := []searchResultItem{ + mkItem("main.go", 61, 195, 0.40, "run", "function"), + mkItem("main.go", 196, 198, 0.30, "", "block"), + } + out := mergeOverlappingHits(items) + if len(out) != 1 { + t.Fatalf("want 1 merged result, got %d", len(out)) + } + if out[0].StartLine != 61 || out[0].EndLine != 195 { + // Parent absorbs child; parent's range stays as-is. + t.Errorf("merged range = %d-%d, want 61-195", out[0].StartLine, out[0].EndLine) + } + if out[0].SymbolName != "run" { + t.Errorf("symbol lost: got %q, want run", out[0].SymbolName) + } +} + +// TestMerge_SiblingsNotMerged: two H2 sections in same file at separate +// non-overlapping ranges → keep both. +func TestMerge_SiblingsNotMerged(t *testing.T) { + items := []searchResultItem{ + mkItem("doc.md", 10, 30, 0.40, "", "section"), + mkItem("doc.md", 50, 90, 0.45, "", "section"), + } + out := mergeOverlappingHits(items) + if len(out) != 2 { + t.Fatalf("siblings should stay separate, got %d results", len(out)) + } +} + +// TestMerge_DifferentFiles: same line range, different files → not merged. +func TestMerge_DifferentFiles(t *testing.T) { + items := []searchResultItem{ + mkItem("a.go", 10, 30, 0.40, "fn", "function"), + mkItem("b.go", 10, 30, 0.45, "fn", "function"), + } + out := mergeOverlappingHits(items) + if len(out) != 2 { + t.Fatalf("cross-file dupes shouldn't merge, got %d", len(out)) + } +} + +// TestMerge_ExactDuplicateNotAbsorbed: same range twice (e.g. fan-out +// hiccup) — these should be deduped upstream (dedupByLocation), not by +// merge. We treat them as siblings here. 
+func TestMerge_ExactDuplicateNotAbsorbed(t *testing.T) { + items := []searchResultItem{ + mkItem("x.go", 1, 100, 0.30, "Foo", "class"), + mkItem("x.go", 1, 100, 0.40, "Foo", "class"), + } + out := mergeOverlappingHits(items) + if len(out) != 2 { + t.Errorf("exact duplicates should NOT be merged here (dedup is a separate step), got %d", len(out)) + } +} + +// TestMerge_RescoreUsesMax: parent had lower score than child → merged +// result inherits child's higher score. +func TestMerge_RescoreUsesMax(t *testing.T) { + items := []searchResultItem{ + mkItem("a.go", 1, 100, 0.20, "Outer", "class"), + mkItem("a.go", 30, 50, 0.80, "inner", "method"), + } + out := mergeOverlappingHits(items) + if len(out) != 1 { + t.Fatalf("want 1, got %d", len(out)) + } + if out[0].Score != 0.80 { + t.Errorf("merged score = %v, want 0.80", out[0].Score) + } + if len(out[0].NestedHits) != 1 || out[0].NestedHits[0].SymbolName != "inner" { + t.Errorf("nested hit missing or wrong: %+v", out[0].NestedHits) + } +} + +// TestMerge_ResortByMergedScore: merged item's max score should bring it +// to the top of the result list. +func TestMerge_ResortByMergedScore(t *testing.T) { + items := []searchResultItem{ + mkItem("a.go", 1, 100, 0.20, "Outer", "class"), // will absorb 0.80 + mkItem("a.go", 30, 50, 0.80, "inner", "method"), + mkItem("b.go", 1, 50, 0.50, "other", "function"), + } + out := mergeOverlappingHits(items) + if len(out) != 2 { + t.Fatalf("want 2 results after merge, got %d", len(out)) + } + if out[0].FilePath != "a.go" { + t.Errorf("merged a.go (score 0.80) should be first, got %s (score %v)", out[0].FilePath, out[0].Score) + } +} + +// TestMerge_NoOverlapNoMerge: completely disjoint hits stay as-is and +// keep their original score order. +func TestMerge_NoOverlapNoMerge(t *testing.T) { + items := []searchResultItem{ + mkItem("a.go", 1, 5, 0.50, "fnA", "function"), + mkItem("b.go", 10, 15, 0.40, "fnB", "function"), + mkItem("c.go", 20, 25, 0.30, "fnC", "function"), + } + out := mergeOverlappingHits(items) + if len(out) != 3 { + t.Fatalf("disjoint items should stay separate, got %d", len(out)) + } + if out[0].Score != 0.50 || out[1].Score != 0.40 || out[2].Score != 0.30 { + t.Errorf("score order broken: %v / %v / %v", out[0].Score, out[1].Score, out[2].Score) + } +} + +// TestMerge_TripleNesting: H1 -> H2 -> H3, all match. After merge, ONE +// result with 2 nested hits (H2 and H3 inside H1). +func TestMerge_TripleNesting(t *testing.T) { + items := []searchResultItem{ + mkItem("d.md", 1, 100, 0.30, "", "section"), + mkItem("d.md", 10, 50, 0.40, "", "section"), + mkItem("d.md", 20, 30, 0.55, "", "section"), + } + out := mergeOverlappingHits(items) + if len(out) != 1 { + t.Fatalf("triple nesting → 1 result, got %d", len(out)) + } + if len(out[0].NestedHits) != 2 { + t.Errorf("want 2 nested hits, got %d", len(out[0].NestedHits)) + } +} + +// TestMerge_AdjacentNoSymbolNotMerged: two anonymous chunks adjacent in +// the same file are NOT merged — we only merge adjacent chunks if at +// least one carries a symbol (otherwise we have no signal that they're +// related). 
+func TestMerge_AdjacentNoSymbolNotMerged(t *testing.T) {
+	items := []searchResultItem{
+		mkItem("x.go", 1, 50, 0.40, "", "module"),
+		mkItem("x.go", 51, 100, 0.45, "", "module"),
+	}
+	out := mergeOverlappingHits(items)
+	if len(out) != 2 {
+		t.Errorf("anonymous adjacent chunks shouldn't merge, got %d", len(out))
+	}
+}

From 14fc45a74c832ddbaac254429bf22ec323214f50 Mon Sep 17 00:00:00 2001
From: dvcdsys
Date: Tue, 28 Apr 2026 11:49:33 +0100
Subject: [PATCH 6/9] =?UTF-8?q?feat(search):=20file-grouped=20results=20?=
 =?UTF-8?q?=E2=80=94=20one=20file=20=3D=20one=20entry,=20all=20matches=20i?=
 =?UTF-8?q?nside?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Changes the unit of search output from "chunk" to "file". Inspired by
how grep groups hits per file, but with AST-aware match boundaries and
embedding-driven ranking.

Old wire shape: a flat list of chunks. A file with three matching
chunks ate three slots out of the user's --limit budget, scattered
across the result list, often with the same file appearing at positions
#3 and #10 simultaneously.

New wire shape:

    results: [
      { file_path, language, best_score,
        matches: [ { start_line, end_line, score, content,
                     chunk_type, symbol_name, nested_hits }, ... ] }
    ]
    total: <number of files>

Ranking:

* Files ordered by best_score (the highest match score in the group)
  descending.
* Inside each file, matches ordered by start_line ascending — natural
  reading order, top to bottom.
* No per-file cap on matches. The only intra-file filter is min_score.
  A file with 50 matches above threshold shows all 50.

Window loop now targets distinct files, not chunks: factor 2..16, stops
when len(file_groups) >= limit, when the vector store returns fewer
rows than asked, or when the cap is hit.

mergeOverlappingHits still runs FIRST (collapses nested H1⊋H2⊋H3 etc.
into one match with nested_hits inside), then groupByFile lifts the
survivors into file-grouped output. So a markdown file with three
nested sections still produces ONE match (not one file with three), and
a source file with class+method overlap still produces a clean class
match with the method as a nested hit.

CLI render redesigned around the new shape:

    1. /path/to/file.go  [best 0.85]  4 matches · go
       -- [0.85] lines 61-195 (function run)
          ```go
          ...
          ```
          + 1 more match inside:
            · [0.50] line 80 (function init)
       -- [0.42] lines 250-280 (type Server)
          ```go
          ...
          ```

Tests:

* groupByFile: sort-by-best-score, sort-matches-by-line, preserves
  nested_hits, empty input.
* TestSemanticSearch_NestedMarkdownMerge updated for the new shape —
  still asserts the H1 absorbs the two H2 sub-sections (now visible as
  group.Matches[0].NestedHits).
* CLI search_test fixture updated to new wire shape.
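
For illustration, the grouping contract (a sketch mirroring the new
groupByFile tests; data invented):

    items := []searchResultItem{
        {FilePath: "doc.md", StartLine: 1, EndLine: 10, Score: 0.60, ChunkType: "section"},
        {FilePath: "main.go", StartLine: 30, EndLine: 50, Score: 0.80, SymbolName: "Bar", ChunkType: "function"},
        {FilePath: "main.go", StartLine: 5, EndLine: 20, Score: 0.40, SymbolName: "Foo", ChunkType: "function"},
    }
    groups := groupByFile(items)
    // groups[0] = main.go (BestScore 0.80) with its matches in line
    // order (5-20 before 30-50); groups[1] = doc.md (BestScore 0.60).
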
Co-Authored-By: Claude Opus 4.7 --- cli/cmd/search.go | 87 ++++++----- cli/cmd/search_test.go | 25 ++-- cli/internal/client/search.go | 53 ++++--- server/internal/httpapi/indexing_test.go | 32 ++-- server/internal/httpapi/search.go | 146 +++++++++++++++---- server/internal/httpapi/search_merge_test.go | 74 ++++++++++ 6 files changed, 298 insertions(+), 119 deletions(-) diff --git a/cli/cmd/search.go b/cli/cmd/search.go index 9b12855..c67247f 100644 --- a/cli/cmd/search.go +++ b/cli/cmd/search.go @@ -112,48 +112,59 @@ func runSearch(cmd *cobra.Command, args []string) error { return nil } - // Print results - fmt.Printf("Found %d result(s) (%.1fms):\n\n", results.Total, results.QueryTimeMS) - - for i, result := range results.Results { - // Format score as colored - scoreStr := fmt.Sprintf("%.2f", result.Score) - - // Print result header - fmt.Printf("%d. [%s] %s:%d-%d\n", - i+1, scoreStr, result.FilePath, result.StartLine, result.EndLine) - - // Print metadata - meta := []string{} - if result.SymbolName != "" { - meta = append(meta, fmt.Sprintf("Symbol: %s", result.SymbolName)) + // Files-as-results: --limit is a count of files. Inside each file, + // every match above min_score is shown, ordered by line number so the + // reader walks the file top-to-bottom. + fmt.Printf("Found %d file(s) (%.1fms):\n\n", results.Total, results.QueryTimeMS) + + for i, file := range results.Results { + // File header. Best score is the rank driver; total match count + // gives a sense of how relevant this file is overall. + matchWord := "match" + if len(file.Matches) != 1 { + matchWord = "matches" } - meta = append(meta, fmt.Sprintf("Type: %s", result.ChunkType)) - if result.Language != "" { - meta = append(meta, fmt.Sprintf("Lang: %s", result.Language)) + langSuffix := "" + if file.Language != "" { + langSuffix = " · " + file.Language } - fmt.Printf(" %s\n", strings.Join(meta, " | ")) + fmt.Printf("%d. %s [best %.2f] %d %s%s\n", + i+1, file.FilePath, file.BestScore, len(file.Matches), matchWord, langSuffix) + + for _, m := range file.Matches { + // Per-match separator with score + line range + label so the + // user can scan vertically by relevance, even though matches + // are in line order. + label := m.ChunkType + if m.SymbolName != "" { + label = fmt.Sprintf("%s %s", m.ChunkType, m.SymbolName) + } + rangeStr := fmt.Sprintf("line %d", m.StartLine) + if m.EndLine != m.StartLine { + rangeStr = fmt.Sprintf("lines %d-%d", m.StartLine, m.EndLine) + } + fmt.Printf(" -- [%.2f] %s (%s)\n", m.Score, rangeStr, label) - fmt.Printf(" ```%s\n", result.Language) - content := result.Content - for _, line := range strings.Split(content, "\n") { - fmt.Printf(" %s\n", line) - } - fmt.Printf(" ```\n") - - // Breadcrumbs for nested hits absorbed by the server's merge step. - // Tells the user "this big chunk ranks well because of these inner - // matches" so they're not surprised that --limit returned fewer - // items than expected. - if len(result.NestedHits) > 0 { - fmt.Printf(" + %d more match(es) inside:\n", len(result.NestedHits)) - for _, nh := range result.NestedHits { - label := nh.ChunkType - if nh.SymbolName != "" { - label = fmt.Sprintf("%s %s", nh.ChunkType, nh.SymbolName) + lang := file.Language + fmt.Printf(" ```%s\n", lang) + for _, line := range strings.Split(m.Content, "\n") { + fmt.Printf(" %s\n", line) + } + fmt.Printf(" ```\n") + + // Nested hits — chunks merged INTO this match by the server. 
+ // They sit textually inside m.Content; this just exposes the + // inner anchor points so the user can jump to the exact line. + if len(m.NestedHits) > 0 { + fmt.Printf(" + %d more match(es) inside:\n", len(m.NestedHits)) + for _, nh := range m.NestedHits { + nhLabel := nh.ChunkType + if nh.SymbolName != "" { + nhLabel = fmt.Sprintf("%s %s", nh.ChunkType, nh.SymbolName) + } + fmt.Printf(" · [%.2f] line %d (%s)\n", + nh.Score, nh.StartLine, nhLabel) } - fmt.Printf(" · [%.2f] %s:%d-%d (%s)\n", - nh.Score, result.FilePath, nh.StartLine, nh.EndLine, label) } } fmt.Println() diff --git a/cli/cmd/search_test.go b/cli/cmd/search_test.go index 140038b..609e7e4 100644 --- a/cli/cmd/search_test.go +++ b/cli/cmd/search_test.go @@ -18,14 +18,19 @@ func TestRunSearch_Results(t *testing.T) { writeJSON(w, 200, map[string]any{ "results": []map[string]any{ { - "file_path": proj + "/api/auth.go", - "start_line": 10, - "end_line": 25, - "content": "func AuthMiddleware() {}", - "score": 0.92, - "chunk_type": "function", - "symbol_name": "AuthMiddleware", - "language": "go", + "file_path": proj + "/api/auth.go", + "language": "go", + "best_score": 0.92, + "matches": []map[string]any{ + { + "start_line": 10, + "end_line": 25, + "content": "func AuthMiddleware() {}", + "score": 0.92, + "chunk_type": "function", + "symbol_name": "AuthMiddleware", + }, + }, }, }, "total": 1, @@ -58,8 +63,8 @@ func TestRunSearch_Results(t *testing.T) { if !strings.Contains(out, "auth.go") { t.Errorf("expected file path in output, got:\n%s", out) } - if !strings.Contains(out, "1 result") { - t.Errorf("expected result count in output, got:\n%s", out) + if !strings.Contains(out, "1 file") { + t.Errorf("expected file count in output, got:\n%s", out) } } diff --git a/cli/internal/client/search.go b/cli/internal/client/search.go index 018a579..093be6a 100644 --- a/cli/internal/client/search.go +++ b/cli/internal/client/search.go @@ -2,28 +2,23 @@ package client import "fmt" -// SearchResult represents a code search result. -// -// NestedHits is populated by the server's mergeOverlappingHits step when -// other matches inside this chunk's line range were absorbed (e.g. a -// markdown H2 inside an H1 section, or a method inside its class). The -// renderer uses these to show breadcrumbs so the user can see WHY this -// outer chunk ranks well. -type SearchResult struct { - FilePath string `json:"file_path"` - StartLine int `json:"start_line"` - EndLine int `json:"end_line"` - Content string `json:"content"` - Score float64 `json:"score"` - ChunkType string `json:"chunk_type"` - SymbolName string `json:"symbol_name"` - Language string `json:"language"` - NestedHits []NestedHit `json:"nested_hits,omitempty"` +// FileMatch is one search hit inside a file group. Position + score + +// content + chunk metadata. NestedHits records overlapping inner chunks +// that were absorbed by mergeOverlappingHits on the server side (e.g. a +// markdown H2 absorbed into its parent H1 section). +type FileMatch struct { + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + Content string `json:"content"` + Score float64 `json:"score"` + ChunkType string `json:"chunk_type"` + SymbolName string `json:"symbol_name,omitempty"` + NestedHits []NestedHit `json:"nested_hits,omitempty"` } -// NestedHit is a chunk that was merged INTO another result by the server. -// Just enough metadata to render a breadcrumb and let the user jump to -// the exact line. The full content is already inside the parent result. 
+// NestedHit is a chunk absorbed INTO a parent FileMatch. The parent's +// content already contains it textually; this carries just the metadata +// so renderers can show a breadcrumb and let the user jump to the line. type NestedHit struct { StartLine int `json:"start_line"` EndLine int `json:"end_line"` @@ -32,11 +27,23 @@ type NestedHit struct { Score float64 `json:"score"` } +// SearchResult is the top-level unit of search output: one file with +// every match inside it that passed min_score. Files are ordered by +// BestScore descending; matches inside are ordered by StartLine ascending +// (natural reading order). No per-file cap — the only intra-file filter +// is the similarity threshold. +type SearchResult struct { + FilePath string `json:"file_path"` + Language string `json:"language,omitempty"` + BestScore float64 `json:"best_score"` + Matches []FileMatch `json:"matches"` +} + // SearchResponse represents the search response type SearchResponse struct { - Results []SearchResult `json:"results"` - Total int `json:"total"` - QueryTimeMS float64 `json:"query_time_ms"` + Results []SearchResult `json:"results"` + Total int `json:"total"` + QueryTimeMS float64 `json:"query_time_ms"` } // SymbolResult represents a symbol search result diff --git a/server/internal/httpapi/indexing_test.go b/server/internal/httpapi/indexing_test.go index cc73c18..dcdce3a 100644 --- a/server/internal/httpapi/indexing_test.go +++ b/server/internal/httpapi/indexing_test.go @@ -312,28 +312,28 @@ func TestSemanticSearch_NestedMarkdownMerge(t *testing.T) { t.Fatalf("unmarshal: %v", err) } - // Find the outer section result and verify it has nested_hits. - var outer *searchResultItem + // Find the README.md file group and verify the merge happened. + var group *fileGroupResult for i := range resp.Results { - r := &resp.Results[i] - if r.FilePath == "/proj-md/README.md" && r.StartLine == 1 { - outer = r + if resp.Results[i].FilePath == "/proj-md/README.md" { + group = &resp.Results[i] break } } - if outer == nil { - t.Fatalf("expected an outer section starting at line 1, got results: %+v", resp.Results) + if group == nil { + t.Fatalf("expected a file group for README.md, got results: %+v", resp.Results) + } + // After merge, only ONE match should remain in this file (the outer + // H1 section absorbing the two H2s as nested hits). + if len(group.Matches) != 1 { + t.Fatalf("expected 1 match in README.md after merge, got %d: %+v", len(group.Matches), group.Matches) + } + outer := group.Matches[0] + if outer.StartLine != 1 { + t.Errorf("outer match should start at line 1, got %d", outer.StartLine) } if len(outer.NestedHits) == 0 { - t.Errorf("outer section should have nested hits absorbed, got NestedHits=%v", outer.NestedHits) - } - // And we should NOT see those nested ranges as separate top-level results. - for _, r := range resp.Results { - if r.FilePath == "/proj-md/README.md" && r.StartLine != 1 { - // Any other start line in the same file means a nested section - // leaked through merging. 
- t.Errorf("non-outer section leaked as separate result: lines %d-%d", r.StartLine, r.EndLine) - } + t.Errorf("outer match should record absorbed nested hits, got NestedHits=%v", outer.NestedHits) } } diff --git a/server/internal/httpapi/search.go b/server/internal/httpapi/search.go index e3918c4..432407d 100644 --- a/server/internal/httpapi/search.go +++ b/server/internal/httpapi/search.go @@ -582,29 +582,28 @@ type searchRequest struct { MinScore *float32 `json:"min_score,omitempty"` } +// searchResultItem is the per-chunk match used INTERNALLY during retrieval. +// It is not exposed in the JSON response — the wire format groups matches +// by file (see fileGroupResult). The merge step (mergeOverlappingHits) +// works on this struct, then groupByFile lifts the survivors into +// file-grouped results. type searchResultItem struct { - FilePath string `json:"file_path"` + FilePath string `json:"-"` StartLine int `json:"start_line"` EndLine int `json:"end_line"` Content string `json:"content"` Score float32 `json:"score"` ChunkType string `json:"chunk_type"` - SymbolName string `json:"symbol_name"` - Language string `json:"language"` - // NestedHits records other matches inside this result's line range that - // were merged into it by mergeOverlappingHits. Populated only when at - // least one inner hit was absorbed; emitted as `nested_hits` in JSON. - // The renderer uses these to show breadcrumbs (e.g. "+ 2 more matches: - // H2 'Foo' line 27, H3 'Bar' line 29") so the user can see WHY this - // outer chunk ranks well even when the actual signal came from a - // sub-section. + SymbolName string `json:"symbol_name,omitempty"` + Language string `json:"-"` NestedHits []nestedHit `json:"nested_hits,omitempty"` } // nestedHit is a compact view of a chunk that was merged INTO another -// result. We don't need the full content (the parent's content already -// contains it textually) — just enough metadata to render a breadcrumb -// and let the caller jump to the exact line. +// result by mergeOverlappingHits (e.g. an H2 section absorbed into its +// containing H1). The parent's `content` already includes the inner +// textually; this just records the metadata so renderers can show a +// breadcrumb and let the user jump to the exact line. type nestedHit struct { StartLine int `json:"start_line"` EndLine int `json:"end_line"` @@ -613,10 +612,37 @@ type nestedHit struct { Score float32 `json:"score"` } +// fileMatch is one search hit inside a file group. Mirrors the per-chunk +// information from searchResultItem but without the file_path/language +// (those live one level up on fileGroupResult — same for every match in +// the group). +type fileMatch struct { + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + Content string `json:"content"` + Score float32 `json:"score"` + ChunkType string `json:"chunk_type"` + SymbolName string `json:"symbol_name,omitempty"` + NestedHits []nestedHit `json:"nested_hits,omitempty"` +} + +// fileGroupResult is the top-level unit of search output: one file with +// every match inside it that passed min_score. Files are ranked by +// BestScore (the highest match score in the group) and matches inside are +// ordered by StartLine ascending so the renderer reads top-to-bottom like +// the actual file. There is no per-file cap on matches — the only filter +// inside a file is the similarity threshold. 
+type fileGroupResult struct { + FilePath string `json:"file_path"` + Language string `json:"language,omitempty"` + BestScore float32 `json:"best_score"` + Matches []fileMatch `json:"matches"` +} + type searchResponse struct { - Results []searchResultItem `json:"results"` - Total int `json:"total"` - QueryTimeMS float64 `json:"query_time_ms"` + Results []fileGroupResult `json:"results"` + Total int `json:"total"` + QueryTimeMS float64 `json:"query_time_ms"` } // semanticSearchHandler implements POST /api/v1/projects/{path}/search, @@ -673,21 +699,26 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { return } - // Post-filter (path/language) and merge state are computed once outside - // the window loop — both are cheap and don't depend on factor. + // Post-filter (path/language) state is computed once outside the + // window loop — cheap and doesn't depend on factor. langSet := map[string]struct{}{} for _, l := range body.Languages { langSet[l] = struct{}{} } applyPostLangFilter := len(body.Languages) > maxFanoutSearch - // Windowed retrieval. Start by asking the vector store for limit×2 - // (the historical default), and if mergeOverlappingHits collapses - // the result set below the user's --limit budget — typically because - // of nested markdown sections or class+method overlaps — re-ask for - // limit×4, then ×8, up to ×maxFactorSearch. Stops early when the - // store returns fewer rows than requested (HNSW exhausted). - var merged []searchResultItem + // Windowed retrieval. The user's --limit is a count of FILES, not + // chunks. We start with a chunk over-fetch of limit×2, group by + // file, and if we don't have `limit` distinct files yet, re-fetch + // with limit×4, ×8, ×16. Stops early when: + // - we have ≥ limit file groups, OR + // - the vector store returned fewer rows than asked (HNSW is + // exhausted), OR + // - factor cap (×16) reached. + // + // Inside each file there is no count cap — every match that passed + // min_score is shown. The threshold is the only intra-file filter. + var fileGroups []fileGroupResult factor := 2 for { n := body.Limit * factor @@ -699,12 +730,12 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { return } filtered := filterToSearchItems(rawWrapped, minScore, body.Paths, langSet, applyPostLangFilter) - merged = mergeOverlappingHits(filtered) - if len(merged) >= body.Limit { + merged := mergeOverlappingHits(filtered) + fileGroups = groupByFile(merged) + if len(fileGroups) >= body.Limit { break } if len(rawWrapped) < n { - // Vector store returned everything it had — no point asking again. break } if factor >= maxFactorSearch { @@ -713,21 +744,72 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { factor *= 2 } - if len(merged) > body.Limit { - merged = merged[:body.Limit] + if len(fileGroups) > body.Limit { + fileGroups = fileGroups[:body.Limit] } elapsedMS := float64(time.Since(start).Microseconds()) / 1000.0 elapsedMS = float64(int(elapsedMS*10+0.5)) / 10 writeJSON(w, http.StatusOK, searchResponse{ - Results: merged, - Total: len(merged), + Results: fileGroups, + Total: len(fileGroups), QueryTimeMS: elapsedMS, }) } } +// groupByFile takes the merged per-chunk results and lifts them into +// file-grouped output. Inside each file matches are sorted by StartLine +// ascending (natural reading order); files are sorted by BestScore +// descending (hottest hit drives the rank). 
+func groupByFile(items []searchResultItem) []fileGroupResult { + if len(items) == 0 { + return nil + } + indexByPath := map[string]int{} + var groups []fileGroupResult + for _, it := range items { + idx, ok := indexByPath[it.FilePath] + if !ok { + groups = append(groups, fileGroupResult{ + FilePath: it.FilePath, + Language: it.Language, + BestScore: it.Score, + }) + idx = len(groups) - 1 + indexByPath[it.FilePath] = idx + } + g := &groups[idx] + if it.Score > g.BestScore { + g.BestScore = it.Score + } + g.Matches = append(g.Matches, fileMatch{ + StartLine: it.StartLine, + EndLine: it.EndLine, + Content: it.Content, + Score: it.Score, + ChunkType: it.ChunkType, + SymbolName: it.SymbolName, + NestedHits: it.NestedHits, + }) + } + for i := range groups { + ms := groups[i].Matches + sort.SliceStable(ms, func(a, b int) bool { + return ms[a].StartLine < ms[b].StartLine + }) + } + sort.SliceStable(groups, func(i, j int) bool { + if groups[i].BestScore != groups[j].BestScore { + return groups[i].BestScore > groups[j].BestScore + } + // Tie-break by file path so output is stable in tests. + return groups[i].FilePath < groups[j].FilePath + }) + return groups +} + // maxFanoutSearch is the language-count threshold above which we drop // per-language pre-filter and fall back to a single over-fetched query // with post-filter. Same value as the previous inline `maxFanout`. diff --git a/server/internal/httpapi/search_merge_test.go b/server/internal/httpapi/search_merge_test.go index 7f7da23..8634428 100644 --- a/server/internal/httpapi/search_merge_test.go +++ b/server/internal/httpapi/search_merge_test.go @@ -172,6 +172,80 @@ func TestMerge_TripleNesting(t *testing.T) { } } +// --- groupByFile ---------------------------------------------------------- + +// TestGroupByFile_SortsByBestScore: files are ordered by best chunk score +// descending. main.go's best chunk (0.80) outranks doc.md's best (0.60). +func TestGroupByFile_SortsByBestScore(t *testing.T) { + items := []searchResultItem{ + mkItem("doc.md", 1, 10, 0.60, "", "section"), + mkItem("main.go", 5, 20, 0.40, "Foo", "function"), + mkItem("main.go", 30, 50, 0.80, "Bar", "function"), + } + groups := groupByFile(items) + if len(groups) != 2 { + t.Fatalf("want 2 groups, got %d", len(groups)) + } + if groups[0].FilePath != "main.go" || groups[0].BestScore != 0.80 { + t.Errorf("first group should be main.go (0.80), got %+v", groups[0]) + } + if groups[1].FilePath != "doc.md" { + t.Errorf("second group should be doc.md, got %s", groups[1].FilePath) + } +} + +// TestGroupByFile_SortsMatchesByLineAsc: inside a file, matches read top- +// to-bottom — score order is for FILE ranking, not for matches inside. +func TestGroupByFile_SortsMatchesByLineAsc(t *testing.T) { + items := []searchResultItem{ + mkItem("a.go", 100, 120, 0.50, "later", "function"), + mkItem("a.go", 30, 50, 0.80, "earlier", "function"), + mkItem("a.go", 200, 220, 0.40, "latest", "function"), + } + groups := groupByFile(items) + if len(groups) != 1 { + t.Fatalf("want 1 group, got %d", len(groups)) + } + ms := groups[0].Matches + if len(ms) != 3 { + t.Fatalf("want 3 matches, got %d", len(ms)) + } + if ms[0].StartLine != 30 || ms[1].StartLine != 100 || ms[2].StartLine != 200 { + t.Errorf("matches not sorted by StartLine asc: %v %v %v", + ms[0].StartLine, ms[1].StartLine, ms[2].StartLine) + } + // Best score still tracks the max score across all matches. 
+ if groups[0].BestScore != 0.80 { + t.Errorf("best score = %v, want 0.80", groups[0].BestScore) + } +} + +// TestGroupByFile_PreservesNestedHits: items that survived merge with +// nested_hits attached should carry them through into the file group. +func TestGroupByFile_PreservesNestedHits(t *testing.T) { + parent := mkItem("d.md", 1, 100, 0.50, "", "section") + parent.NestedHits = []nestedHit{ + {StartLine: 10, EndLine: 30, ChunkType: "section", Score: 0.60}, + } + groups := groupByFile([]searchResultItem{parent}) + if len(groups) != 1 || len(groups[0].Matches) != 1 { + t.Fatalf("unexpected shape: %+v", groups) + } + if len(groups[0].Matches[0].NestedHits) != 1 { + t.Errorf("nested hits dropped during groupByFile: %+v", groups[0].Matches[0]) + } +} + +// TestGroupByFile_Empty: nil/empty in → nil out. +func TestGroupByFile_Empty(t *testing.T) { + if groupByFile(nil) != nil { + t.Error("nil input should produce nil output") + } + if groupByFile([]searchResultItem{}) != nil { + t.Error("empty input should produce nil output") + } +} + // TestMerge_AdjacentNoSymbolNotMerged: two anonymous chunks adjacent in // the same file are NOT merged — we only merge adjacent chunks if at // least one carries a symbol (otherwise we have no signal that they're From 1593cf9f2643d4bc0a3bb3bc5442a3abb3820d17 Mon Sep 17 00:00:00 2001 From: dvcdsys Date: Tue, 28 Apr 2026 11:49:33 +0100 Subject: [PATCH 7/9] refactoring search output, refactoring file indexing --- README.md | 55 +++- cli/cmd/search.go | 71 ++++- cli/cmd/search_test.go | 163 ++++++++++- cli/cmd/status.go | 33 +++ cli/internal/client/search.go | 4 + server/bench/bench_eval_retrieval.go | 277 ++++++++++++++++++ server/bench/queries.json | 222 ++++++++++++++ server/cmd/cix-server/main.go | 4 + server/internal/config/config.go | 13 + server/internal/embeddings/format.go | 59 ++++ server/internal/embeddings/format_test.go | 102 +++++++ server/internal/httpapi/search.go | 34 ++- server/internal/httpapi/search_filter_test.go | 85 ++++++ server/internal/indexer/indexer.go | 30 +- server/internal/indexer/indexer_test.go | 124 ++++++++ 15 files changed, 1245 insertions(+), 31 deletions(-) create mode 100644 server/bench/bench_eval_retrieval.go create mode 100644 server/bench/queries.json create mode 100644 server/internal/embeddings/format.go create mode 100644 server/internal/embeddings/format_test.go create mode 100644 server/internal/httpapi/search_filter_test.go diff --git a/README.md b/README.md index 0fba1d5..1f06e00 100644 --- a/README.md +++ b/README.md @@ -244,9 +244,10 @@ cix summary # Semantic search — natural language, finds by meaning cix search [flags] --in restrict to file or directory (repeatable) + --exclude exclude file or directory (repeatable) --lang filter by language (repeatable) --limit, -l max results (default: 10) - --min-score <0-1> minimum relevance score (default: 0.1) + --min-score <0-1> minimum relevance score (default: 0.4) -p project path (default: cwd) # Symbol search — fast lookup by name @@ -365,6 +366,56 @@ Supported languages: Python, TypeScript, JavaScript, Go, Rust, Java (+ 40+ other --- +## Tuning Search Quality + +### `--min-score` threshold + +`cix` defaults to `--min-score 0.4`. This is calibrated for **CodeRankEmbed-Q8_0** with the path-aware embedding format (`CIX_EMBED_INCLUDE_PATH=true`, default). 
+ +A typical score landscape on this codebase: + +| Match strength | Score range | Action | +|---|---|---| +| Exact symbol or filename match | 0.65 – 0.80 | rare; very high confidence | +| Strong path-aware concept match | 0.50 – 0.65 | typical "good" match for `cix search "cli watch daemon"` | +| Weaker concept / partial path overlap | 0.40 – 0.50 | typical for ambiguous or multi-token queries | +| Likely unrelated noise | < 0.40 | filtered out by default | + +**When to lower the threshold**: + +- The query returns `No results` but you know matching code exists — try `--min-score 0.25` +- Your query is intentionally vague (exploring an unfamiliar codebase) — `--min-score 0.2` +- Single-word identifier queries on rare names + +**When to raise the threshold**: + +- Agent context is filling up with weak matches — `--min-score 0.5` +- You only want clear top hits — `--min-score 0.6` + +> [!NOTE] +> CodeRankEmbed is **asymmetric**: queries get a `"Represent this query for searching relevant code: "` prefix, which puts query and passage vectors into separate regions of the embedding space. Cosine similarities are systematically lower than for symmetric models — a "strong" match here is 0.55, not 0.80. Don't compare these numbers to thresholds quoted for OpenAI / Voyage / generic sentence-transformers. + +> [!TIP] +> If you switched embedding models or toggled `CIX_EMBED_INCLUDE_PATH`, rerun `cix reindex --full` and recalibrate. Old vectors and new vectors live in the same store but score differently. + +### `--exclude` for noisy directories + +Repos with vendored code, fixtures, or legacy migrations can pull unrelated paths into top results because path tokens contribute to scoring. Two options: + +```bash +# One-off exclude for a single search +cix search "main entry point" --exclude legacy --exclude bench/fixtures + +# Permanent exclude — add to .cixignore (skips indexing entirely) +echo "legacy/" >> .cixignore +echo "bench/fixtures/" >> .cixignore +cix reindex --full +``` + +`.cixignore` is preferred for directories you never want in results — they don't take up index space. `--exclude` is a per-query escape hatch. 
+ +--- + ## Per-Project Configuration ### `.cixignore` — exclude files from indexing @@ -557,7 +608,7 @@ cix watch stop && cix watch /path/to/project **Search returns no results** - Check project is indexed: `cix status` -- Lower the threshold: `cix search "query" --min-score 0.05` +- Lower the threshold: `cix search "query" --min-score 0.2` (default is `0.4`; see [Tuning Search Quality](#tuning-search-quality)) - Docker mode: run `cix list` to verify the project is registered --- diff --git a/cli/cmd/search.go b/cli/cmd/search.go index c67247f..2873b19 100644 --- a/cli/cmd/search.go +++ b/cli/cmd/search.go @@ -14,6 +14,7 @@ var ( searchLimit int searchLanguages []string searchPaths []string + searchExcludes []string searchMinScore float64 searchProject string ) @@ -37,7 +38,8 @@ Examples: cix search "API endpoints" --lang go --lang python cix search "error handling" --in src/api/ cix search "config" --in README.md - cix search "routes" --in ./api --in ./mcp_server`, + cix search "routes" --in ./api --in ./mcp_server + cix search "main entry point" --exclude bench/fixtures --exclude legacy`, Args: cobra.ExactArgs(1), RunE: runSearch, } @@ -47,10 +49,32 @@ func init() { searchCmd.Flags().IntVarP(&searchLimit, "limit", "l", 10, "Maximum number of results") searchCmd.Flags().StringSliceVar(&searchLanguages, "lang", nil, "Filter by language") searchCmd.Flags().StringSliceVar(&searchPaths, "in", nil, "Search within file or directory (relative or absolute path)") - searchCmd.Flags().Float64Var(&searchMinScore, "min-score", 0.1, "Minimum relevance score") + searchCmd.Flags().StringSliceVar(&searchExcludes, "exclude", nil, "Exclude file or directory from results (relative or absolute path)") + // Default threshold of 0.4 calibrated for CodeRankEmbed-Q8_0 with + // path-aware embedding (CIX_EMBED_INCLUDE_PATH=true). Below 0.4 results + // are usually unrelated; lower it explicitly for very specific or + // long-tail queries via --min-score 0.2. + searchCmd.Flags().Float64Var(&searchMinScore, "min-score", 0.4, "Minimum relevance score (lower with --min-score 0.2 if your query returns nothing)") searchCmd.Flags().StringVarP(&searchProject, "project", "p", "", "Project path (default: current directory)") } +// resolveFilterPaths normalises --in / --exclude inputs to absolute paths +// so the server's prefix-match against canonical FilePaths in the vector +// store works regardless of whether the user wrote a relative or absolute +// argument. Inputs that don't resolve are passed through unchanged so a +// substring match (server-side) can still fire. 
+func resolveFilterPaths(in []string) []string { + out := make([]string, 0, len(in)) + for _, p := range in { + if ap, err := filepath.Abs(p); err == nil { + out = append(out, ap) + } else { + out = append(out, p) + } + } + return out +} + func runSearch(cmd *cobra.Command, args []string) error { query := args[0] @@ -78,27 +102,27 @@ func runSearch(cmd *cobra.Command, args []string) error { absPath = findProjectRoot(absPath, apiClient) // Resolve --in paths to absolute - resolvedPaths := make([]string, 0, len(searchPaths)) - for _, p := range searchPaths { - ap, err := filepath.Abs(p) - if err == nil { - resolvedPaths = append(resolvedPaths, ap) - } else { - resolvedPaths = append(resolvedPaths, p) - } - } + resolvedPaths := resolveFilterPaths(searchPaths) + resolvedExcludes := resolveFilterPaths(searchExcludes) // Perform search opts := client.SearchOptions{ Limit: searchLimit, Languages: searchLanguages, Paths: resolvedPaths, + Excludes: resolvedExcludes, MinScore: searchMinScore, } - if len(resolvedPaths) > 0 { + switch { + case len(resolvedPaths) > 0 && len(resolvedExcludes) > 0: + fmt.Printf("Searching in %s (filtered: %s, excluded: %s)...\n\n", + absPath, strings.Join(resolvedPaths, ", "), strings.Join(resolvedExcludes, ", ")) + case len(resolvedPaths) > 0: fmt.Printf("Searching in %s (filtered: %s)...\n\n", absPath, strings.Join(resolvedPaths, ", ")) - } else { + case len(resolvedExcludes) > 0: + fmt.Printf("Searching in %s (excluded: %s)...\n\n", absPath, strings.Join(resolvedExcludes, ", ")) + default: fmt.Printf("Searching in %s...\n\n", absPath) } @@ -128,8 +152,21 @@ func runSearch(cmd *cobra.Command, args []string) error { if file.Language != "" { langSuffix = " · " + file.Language } + // Display the path relative to the project root when possible — + // agents and humans both read shorter paths faster, and absolute + // paths just leak filesystem layout into the agent context window. + displayPath := file.FilePath + if rel, relErr := filepath.Rel(absPath, file.FilePath); relErr == nil { + displayPath = rel + } fmt.Printf("%d. %s [best %.2f] %d %s%s\n", - i+1, file.FilePath, file.BestScore, len(file.Matches), matchWord, langSuffix) + i+1, displayPath, file.BestScore, len(file.Matches), matchWord, langSuffix) + + // Suppress the per-match score line when there's exactly one match + // and its score equals the file-level best score — the two would + // just print the same number twice. With multiple matches we keep + // the per-match score because it differentiates them. 
+	suppressMatchScore := len(file.Matches) == 1 && file.Matches[0].Score == file.BestScore
+
 		for _, m := range file.Matches {
 			// Per-match separator with score + line range + label so the
@@ -143,7 +180,11 @@ func runSearch(cmd *cobra.Command, args []string) error {
 			if m.EndLine != m.StartLine {
 				rangeStr = fmt.Sprintf("lines %d-%d", m.StartLine, m.EndLine)
 			}
-			fmt.Printf("  -- [%.2f] %s (%s)\n", m.Score, rangeStr, label)
+			if suppressMatchScore {
+				fmt.Printf("  -- %s (%s)\n", rangeStr, label)
+			} else {
+				fmt.Printf("  -- [%.2f] %s (%s)\n", m.Score, rangeStr, label)
+			}
 
 			lang := file.Language
 			fmt.Printf("  ```%s\n", lang)
diff --git a/cli/cmd/search_test.go b/cli/cmd/search_test.go
index 609e7e4..f0b4102 100644
--- a/cli/cmd/search_test.go
+++ b/cli/cmd/search_test.go
@@ -1,6 +1,8 @@
 package cmd
 
 import (
+	"encoding/json"
+	"io"
 	"net/http"
 	"strings"
 	"testing"
@@ -46,7 +48,7 @@ func TestRunSearch_Results(t *testing.T) {
 	defer func() { searchProject = old }()
 	searchProject = proj
 	searchLimit = 10
-	searchMinScore = 0.1
+	searchMinScore = 0.4
 	searchLanguages = nil
 	searchPaths = nil
 
@@ -88,7 +90,7 @@ func TestRunSearch_EmptyResults(t *testing.T) {
 	defer func() { searchProject = old }()
 	searchProject = proj
 	searchLimit = 10
-	searchMinScore = 0.1
+	searchMinScore = 0.4
 	searchLanguages = nil
 	searchPaths = nil
 
@@ -124,7 +126,7 @@ func TestRunSearch_APIError(t *testing.T) {
 	defer func() { searchProject = old }()
 	searchProject = proj
 	searchLimit = 10
-	searchMinScore = 0.1
+	searchMinScore = 0.4
 	searchLanguages = nil
 	searchPaths = nil
 
@@ -140,6 +142,159 @@
 	}
 }
 
+func TestRunSearch_OutputUsesRelativePath(t *testing.T) {
+	// Verifies the cosmetic-but-impactful path-shortening: output should
+	// show paths relative to the project root, not the full /Users/.../foo
+	// absolute path. Saves ~50 chars per result line in agent context.
+	proj := t.TempDir()
+	hash := projectHash(proj)
+
+	srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) {
+		switch {
+		case strings.HasSuffix(r.URL.Path, "/api/v1/projects"):
+			writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0})
+		case strings.Contains(r.URL.Path, hash+"/search") && r.Method == "POST":
+			writeJSON(w, 200, map[string]any{
+				"results": []map[string]any{
+					{
+						"file_path":  proj + "/server/internal/search.go",
+						"language":   "go",
+						"best_score": 0.88,
+						"matches": []map[string]any{
+							{
+								"start_line":  1,
+								"end_line":    5,
+								"content":     "x",
+								"score":       0.88,
+								"chunk_type":  "function",
+								"symbol_name": "Search",
+							},
+						},
+					},
+				},
+				"total":         1,
+				"query_time_ms": 1.0,
+			})
+		default:
+			http.NotFound(w, r)
+		}
+	})
+	useAPI(t, srv)
+
+	resetSearchFlags()
+	defer resetSearchFlags()
+	searchProject = proj
+
+	out, err := captureOutput(func() error { return runSearch(nil, []string{"q"}) })
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+
+	// The result-header line must contain the relative path, NOT the full
+	// absolute path. The exact line shape is "1. {path} [best 0.88] ...".
+	if !strings.Contains(out, "server/internal/search.go") {
+		t.Errorf("expected relative path in output, got:\n%s", out)
+	}
+	if strings.Contains(out, proj+"/server/internal/search.go") {
+		t.Errorf("absolute path leaked into output:\n%s", out)
+	}
+}
+
+func TestRunSearch_SuppressesScoreOnSingleMatch(t *testing.T) {
+	// When a file has exactly one match and its score equals BestScore,
+	// the renderer drops the redundant inner "[0.88]" line.
+ proj := t.TempDir() + hash := projectHash(proj) + + srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) { + switch { + case strings.HasSuffix(r.URL.Path, "/api/v1/projects"): + writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0}) + case strings.Contains(r.URL.Path, hash+"/search"): + writeJSON(w, 200, map[string]any{ + "results": []map[string]any{{ + "file_path": proj + "/x.go", + "language": "go", + "best_score": 0.88, + "matches": []map[string]any{{ + "start_line": 1, "end_line": 5, "content": "x", "score": 0.88, + "chunk_type": "function", "symbol_name": "X", + }}, + }}, + "total": 1, "query_time_ms": 1.0, + }) + default: + http.NotFound(w, r) + } + }) + useAPI(t, srv) + + resetSearchFlags() + defer resetSearchFlags() + searchProject = proj + + out, err := captureOutput(func() error { return runSearch(nil, []string{"q"}) }) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + + // "[best 0.88]" must appear once (file header), not twice. + if c := strings.Count(out, "0.88"); c != 1 { + t.Errorf("expected score 0.88 to appear exactly once, got %d. Output:\n%s", c, out) + } +} + +func TestRunSearch_SendsExcludesToServer(t *testing.T) { + // --exclude must end up in the search request body so the server can + // honour it. Verifies the CLI → client → request body wiring. + proj := t.TempDir() + hash := projectHash(proj) + + var captured map[string]any + srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) { + switch { + case strings.HasSuffix(r.URL.Path, "/api/v1/projects"): + writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0}) + case strings.Contains(r.URL.Path, hash+"/search"): + body, _ := io.ReadAll(r.Body) + _ = json.Unmarshal(body, &captured) + writeJSON(w, 200, map[string]any{"results": []any{}, "total": 0, "query_time_ms": 1.0}) + default: + http.NotFound(w, r) + } + }) + useAPI(t, srv) + + resetSearchFlags() + defer resetSearchFlags() + searchProject = proj + searchExcludes = []string{"bench/fixtures", "legacy"} + + if _, err := captureOutput(func() error { return runSearch(nil, []string{"q"}) }); err != nil { + t.Fatalf("unexpected error: %v", err) + } + + rawExcl, ok := captured["excludes"].([]any) + if !ok { + t.Fatalf("excludes missing from request body; got: %v", captured) + } + if len(rawExcl) != 2 { + t.Errorf("expected 2 excludes, got %d", len(rawExcl)) + } +} + +// resetSearchFlags clears the package-level cobra flag vars between tests. +// Without it, a value set in one test leaks into the next via the shared +// var captures inside searchCmd. 
+func resetSearchFlags() { + searchProject = "" + searchLimit = 10 + searchMinScore = 0.4 + searchLanguages = nil + searchPaths = nil + searchExcludes = nil +} + func TestRunSearch_SubdirectoryResolvesToProject(t *testing.T) { proj := t.TempDir() sub := proj + "/src/api" @@ -166,7 +321,7 @@ func TestRunSearch_SubdirectoryResolvesToProject(t *testing.T) { // Set project to a subdirectory — should resolve to proj root searchProject = sub searchLimit = 10 - searchMinScore = 0.1 + searchMinScore = 0.4 searchLanguages = nil searchPaths = nil diff --git a/cli/cmd/status.go b/cli/cmd/status.go index fd073f9..49ed298 100644 --- a/cli/cmd/status.go +++ b/cli/cmd/status.go @@ -5,7 +5,9 @@ import ( "os" "path/filepath" "strings" + "time" + "github.com/anthropics/code-index/cli/internal/daemon" "github.com/spf13/cobra" ) @@ -73,6 +75,21 @@ func runStatus(cmd *cobra.Command, args []string) error { fmt.Printf("\nLast indexed: %s\n", project.LastIndexedAt.Format("2006-01-02 15:04:05")) } + // Watcher daemon status — surfaces silent stale-index situations where + // the user thinks the index is fresh because LastIndexedAt is recent, + // but the watcher has actually died and the project has drifted. + wstatus := daemon.GetStatus(absPath) + if wstatus.Running { + fmt.Printf("Watcher: ✓ running (PID %d)\n", wstatus.PID) + } else { + fmt.Print("Watcher: ✗ not running") + if project.LastIndexedAt != nil { + elapsed := time.Since(*project.LastIndexedAt) + fmt.Printf(" — last index sync %s ago", humanDuration(elapsed)) + } + fmt.Println() + } + // Get indexing progress if in progress if project.Status == "indexing" { fmt.Println("\nIndexing in progress...") @@ -87,6 +104,22 @@ func runStatus(cmd *cobra.Command, args []string) error { return nil } +// humanDuration returns a human-readable approximation of d for the +// "watcher down for ..." status line. Coarse on purpose — exact seconds +// are noise here; the user just needs to know whether drift is plausible. +func humanDuration(d time.Duration) string { + switch { + case d < time.Minute: + return fmt.Sprintf("%ds", int(d.Seconds())) + case d < time.Hour: + return fmt.Sprintf("%dm", int(d.Minutes())) + case d < 24*time.Hour: + return fmt.Sprintf("%.1fh", d.Hours()) + default: + return fmt.Sprintf("%.1fd", d.Hours()/24) + } +} + func formatStatus(status string) string { switch status { case "indexed": diff --git a/cli/internal/client/search.go b/cli/internal/client/search.go index 093be6a..c1eb3bc 100644 --- a/cli/internal/client/search.go +++ b/cli/internal/client/search.go @@ -69,6 +69,7 @@ type SearchOptions struct { Limit int `json:"limit"` Languages []string `json:"languages,omitempty"` Paths []string `json:"paths,omitempty"` + Excludes []string `json:"excludes,omitempty"` MinScore float64 `json:"min_score,omitempty"` } @@ -87,6 +88,9 @@ func (c *Client) Search(projectPath, query string, opts SearchOptions) (*SearchR if len(opts.Paths) > 0 { body["paths"] = opts.Paths } + if len(opts.Excludes) > 0 { + body["excludes"] = opts.Excludes + } if opts.MinScore > 0 { body["min_score"] = opts.MinScore } diff --git a/server/bench/bench_eval_retrieval.go b/server/bench/bench_eval_retrieval.go new file mode 100644 index 0000000..02bb632 --- /dev/null +++ b/server/bench/bench_eval_retrieval.go @@ -0,0 +1,277 @@ +//go:build bench_eval_retrieval + +// bench_eval_retrieval runs queries.json against a live cix-server and +// reports precision@K plus anti-path leakage per category. Use it to +// compare retrieval quality before/after toggling CIX_EMBED_INCLUDE_PATH. 
+// +// Prerequisites: +// +// 1. cix-server is running locally (any port — pass via -url). +// 2. The target project has been indexed under whichever embedding format +// you want to measure. To compare formats, run twice: once with the +// old format (CIX_EMBED_INCLUDE_PATH=false + reindex), once with the +// new format (CIX_EMBED_INCLUDE_PATH=true + reindex), capturing each +// run's output to a separate file, then diff. +// +// Usage: +// +// go run -tags=bench_eval_retrieval ./bench_eval_retrieval.go \ +// -url http://localhost:21847 \ +// -api-key "$(grep CIX_API_KEY ../.env | cut -d= -f2)" \ +// -project /Users/dvcdsys/Cursor/claude-code-index \ +// -queries queries.json +// +// Output: a per-category summary plus a per-query verdict line. Exit code 0 +// always — this is a measurement tool, not a pass/fail gate. + +package main + +import ( + "bytes" + "crypto/sha1" + "encoding/json" + "flag" + "fmt" + "io" + "net/http" + "net/url" + "os" + "path/filepath" + "sort" + "strings" + "time" +) + +type queryDef struct { + Query string `json:"query"` + Category string `json:"category"` + K int `json:"k"` + ExpectedPaths []string `json:"expected_paths"` + AntiPaths []string `json:"anti_paths"` +} + +type queryFile struct { + Queries []queryDef `json:"queries"` +} + +type searchRequest struct { + Query string `json:"query"` + Limit int `json:"limit"` + MinScore *float64 `json:"min_score,omitempty"` +} + +type searchResultItem struct { + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + Score float64 `json:"score"` + Language string `json:"language"` +} + +type searchResponse struct { + Results []searchResultItem `json:"results"` + Total int `json:"total"` +} + +func main() { + urlFlag := flag.String("url", "http://localhost:21847", "cix-server URL") + apiKey := flag.String("api-key", os.Getenv("CIX_API_KEY"), "API key (or set CIX_API_KEY env var)") + project := flag.String("project", "", "absolute path of the indexed project (required)") + queriesPath := flag.String("queries", "queries.json", "path to queries.json") + verbose := flag.Bool("v", false, "print per-query verdicts in addition to per-category summary") + flag.Parse() + + if *project == "" { + fmt.Fprintln(os.Stderr, "error: -project is required") + os.Exit(2) + } + if *apiKey == "" { + fmt.Fprintln(os.Stderr, "warn: -api-key not set; requests will be unauthenticated (only works in dev mode)") + } + + abs, err := filepath.Abs(*project) + if err != nil { + fmt.Fprintln(os.Stderr, "abs project path:", err) + os.Exit(1) + } + + data, err := os.ReadFile(*queriesPath) + if err != nil { + fmt.Fprintln(os.Stderr, "read queries:", err) + os.Exit(1) + } + var qf queryFile + if err := json.Unmarshal(data, &qf); err != nil { + fmt.Fprintln(os.Stderr, "parse queries:", err) + os.Exit(1) + } + + hash := projectHash(abs) + endpoint, err := url.JoinPath(*urlFlag, "api", "v1", "projects", hash, "search") + if err != nil { + fmt.Fprintln(os.Stderr, "join url:", err) + os.Exit(1) + } + + client := &http.Client{Timeout: 30 * time.Second} + + type catStats struct { + queries int + precisionHits int + antiLeaks int + expectedTopCount int // sum of topK slots that were expected hits + expectedTopBudget int // sum of K across queries (denominator for avg precision) + } + stats := map[string]*catStats{} + + if *verbose { + fmt.Printf("%-12s %-8s %s\n", "category", "verdict", "query") + fmt.Println(strings.Repeat("-", 80)) + } + + for _, q := range qf.Queries { + cs, ok := stats[q.Category] + if !ok { 
+			cs = &catStats{}
+			stats[q.Category] = cs
+		}
+		cs.queries++
+
+		k := q.K
+		if k <= 0 {
+			k = 5
+		}
+		cs.expectedTopBudget += k
+
+		results, err := doSearch(client, endpoint, *apiKey, q.Query, k)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, "query failed:", q.Query, err)
+			continue
+		}
+
+		hit := false
+		antiHit := false
+		expectedHits := 0
+		for _, r := range results {
+			rel, _ := filepath.Rel(abs, r.FilePath)
+			if rel == "" {
+				rel = r.FilePath
+			}
+			if matchesAny(rel, q.ExpectedPaths) {
+				hit = true
+				expectedHits++
+			}
+			if matchesAny(rel, q.AntiPaths) {
+				antiHit = true
+			}
+		}
+		if hit {
+			cs.precisionHits++
+		}
+		if antiHit {
+			cs.antiLeaks++
+		}
+		cs.expectedTopCount += expectedHits
+
+		if *verbose {
+			verdict := "PASS"
+			if !hit {
+				verdict = "MISS"
+			}
+			extra := ""
+			if antiHit {
+				extra = " ⚠ anti-leak"
+			}
+			fmt.Printf("%-12s %-8s %s%s\n", q.Category, verdict, q.Query, extra)
+		}
+	}
+
+	// Summary
+	fmt.Println()
+	fmt.Println("=== Retrieval quality summary ===")
+	fmt.Printf("%-12s %8s %12s %12s %14s\n", "category", "queries", "any-hit @K", "anti-leak", "avg precision@K")
+	fmt.Println(strings.Repeat("-", 72))
+	cats := make([]string, 0, len(stats))
+	for c := range stats {
+		cats = append(cats, c)
+	}
+	sort.Strings(cats)
+	for _, c := range cats {
+		s := stats[c]
+		anyHitPct := pct(s.precisionHits, s.queries)
+		antiLeakPct := pct(s.antiLeaks, s.queries)
+		precisionPct := pct(s.expectedTopCount, s.expectedTopBudget)
+		fmt.Printf("%-12s %8d %11.1f%% %11.1f%% %13.1f%%\n",
+			c, s.queries, anyHitPct, antiLeakPct, precisionPct)
+	}
+	fmt.Println()
+	fmt.Println("Note: any-hit@K = fraction of queries with at least one expected_paths match in top-K.")
+	fmt.Println("      anti-leak = fraction of queries with any anti_paths match in top-K (LOWER is better).")
+	fmt.Println("      avg precision@K = sum(expected hits) / sum(K) — fraction of top-K slots that were expected.")
+}
+
+func doSearch(client *http.Client, endpoint, apiKey, query string, limit int) ([]searchResultItem, error) {
+	body, _ := json.Marshal(searchRequest{Query: query, Limit: limit})
+	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Content-Type", "application/json")
+	if apiKey != "" {
+		req.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
+		return nil, fmt.Errorf("status %d: %s", resp.StatusCode, string(snippet))
+	}
+	var sr searchResponse
+	if err := json.NewDecoder(resp.Body).Decode(&sr); err != nil {
+		return nil, err
+	}
+	return sr.Results, nil
+}
+
+func matchesAny(rel string, prefixes []string) bool {
+	rel = filepath.ToSlash(rel)
+	for _, p := range prefixes {
+		p = filepath.ToSlash(p)
+		// Treat trailing slash as "any descendant" match; otherwise require
+		// either an exact match or a directory match.
+		if strings.HasSuffix(p, "/") {
+			if strings.HasPrefix(rel, p) {
+				return true
+			}
+		} else {
+			if rel == p || strings.HasPrefix(rel, p+"/") {
+				return true
+			}
+		}
+	}
+	return false
+}
+
+func pct(n, d int) float64 {
+	if d == 0 {
+		return 0
+	}
+	return 100 * float64(n) / float64(d)
+}
+
+// projectHash mirrors projects.HashPath: first 16 hex chars of sha1(path).
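+//
+// The hand-rolled hex loop below is equivalent to
+//
+//	sum := sha1.Sum([]byte(absPath))
+//	return hex.EncodeToString(sum[:8])
+//
+// without pulling in an encoding/hex import.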
+func projectHash(absPath string) string { + h := sha1.New() + h.Write([]byte(absPath)) + b := h.Sum(nil) + const hexchars = "0123456789abcdef" + out := make([]byte, 16) + for i := 0; i < 8; i++ { + out[i*2] = hexchars[b[i]>>4] + out[i*2+1] = hexchars[b[i]&0xf] + } + return string(out) +} diff --git a/server/bench/queries.json b/server/bench/queries.json new file mode 100644 index 0000000..a0a33f2 --- /dev/null +++ b/server/bench/queries.json @@ -0,0 +1,222 @@ +{ + "_comment": "Retrieval-quality benchmark queries for the cix repository. Each query is paired with paths that should rank in the top-k (expected_paths) and paths that should NOT (anti_paths). Run before/after toggling CIX_EMBED_INCLUDE_PATH and compare precision@k + anti-path leakage.", + "_categories": { + "concept": "Natural-language queries that describe a feature/intent, not a specific identifier.", + "path_hint": "Queries containing words that appear in file paths (server, cli, watcher) — these benefit most from path-in-embedding.", + "identifier": "Single-token camelCase/snake_case names that look like symbols. cix symbols already covers these; goal is no regression." + }, + "queries": [ + { + "query": "main entry point server", + "category": "concept", + "k": 5, + "expected_paths": ["server/cmd/cix-server/main.go"], + "anti_paths": ["bench/fixtures/", "legacy/"] + }, + { + "query": "supervisor unix socket llama", + "category": "concept", + "k": 3, + "expected_paths": ["server/internal/embeddings/supervisor.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "tree-sitter chunk function class method", + "category": "concept", + "k": 5, + "expected_paths": ["server/internal/chunker/chunker.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "three phase indexing protocol", + "category": "concept", + "k": 3, + "expected_paths": ["server/internal/indexer/indexer.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "semantic search handler embedding cosine", + "category": "concept", + "k": 3, + "expected_paths": ["server/internal/httpapi/search.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "vector store chromem cosine similarity", + "category": "concept", + "k": 5, + "expected_paths": ["server/internal/vectorstore/store.go"], + "anti_paths": [] + }, + { + "query": "fsnotify file watcher debounce", + "category": "concept", + "k": 5, + "expected_paths": ["cli/internal/watcher/watcher.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "incremental reindex sha256 file hash", + "category": "concept", + "k": 5, + "expected_paths": ["server/internal/indexer/", "cli/internal/indexer/"], + "anti_paths": [] + }, + { + "query": "config environment variables CIX prefix", + "category": "concept", + "k": 5, + "expected_paths": ["server/internal/config/config.go", "cli/internal/config/config.go"], + "anti_paths": [] + }, + { + "query": "bash regex chunker fallback", + "category": "concept", + "k": 5, + "expected_paths": ["server/internal/chunker/"], + "anti_paths": [] + }, + + { + "query": "server search handler", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/httpapi/search.go"], + "anti_paths": ["legacy/", "bench/fixtures/"] + }, + { + "query": "cli watch daemon", + "category": "path_hint", + "k": 5, + "expected_paths": ["cli/cmd/watch.go", "cli/internal/daemon/daemon.go"], + "anti_paths": ["legacy/"] + }, + { + "query": "embeddings supervisor", + "category": "path_hint", + "k": 3, + "expected_paths": ["server/internal/embeddings/supervisor.go"], + "anti_paths": [] + }, + { + "query": "chunker 
chunk file", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/chunker/chunker.go"], + "anti_paths": [] + }, + { + "query": "indexer process files", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/indexer/indexer.go"], + "anti_paths": [] + }, + { + "query": "cli search command flags", + "category": "path_hint", + "k": 5, + "expected_paths": ["cli/cmd/search.go"], + "anti_paths": [] + }, + { + "query": "vectorstore store delete by file", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/vectorstore/store.go"], + "anti_paths": [] + }, + { + "query": "httpapi router middleware", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/httpapi/"], + "anti_paths": [] + }, + { + "query": "symbolindex SQLite symbols upsert", + "category": "path_hint", + "k": 5, + "expected_paths": ["server/internal/symbolindex/"], + "anti_paths": [] + }, + { + "query": "watcher daemon pid status", + "category": "path_hint", + "k": 5, + "expected_paths": ["cli/internal/daemon/daemon.go", "cli/cmd/watch.go"], + "anti_paths": [] + }, + + { + "query": "FormatChunkForEmbedding", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/embeddings/format.go"], + "anti_paths": [] + }, + { + "query": "semanticSearchHandler", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/httpapi/search.go"], + "anti_paths": [] + }, + { + "query": "BeginIndexing", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/indexer/indexer.go"], + "anti_paths": [] + }, + { + "query": "ChunkFile", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/chunker/chunker.go"], + "anti_paths": [] + }, + { + "query": "EmbedTexts", + "category": "identifier", + "k": 5, + "expected_paths": ["server/internal/embeddings/", "server/internal/indexer/"], + "anti_paths": [] + }, + { + "query": "TokenizeAndEmbed", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/embeddings/service.go"], + "anti_paths": [] + }, + { + "query": "mergeOverlappingHits", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/httpapi/"], + "anti_paths": [] + }, + { + "query": "DeleteCollection", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/vectorstore/", "server/internal/indexer/"], + "anti_paths": [] + }, + { + "query": "FilePayload", + "category": "identifier", + "k": 3, + "expected_paths": ["server/internal/indexer/", "cli/internal/client/", "cli/internal/indexer/"], + "anti_paths": [] + }, + { + "query": "runWatchStatus", + "category": "identifier", + "k": 3, + "expected_paths": ["cli/cmd/watch.go"], + "anti_paths": [] + } + ] +} diff --git a/server/cmd/cix-server/main.go b/server/cmd/cix-server/main.go index 7364400..263e5c6 100644 --- a/server/cmd/cix-server/main.go +++ b/server/cmd/cix-server/main.go @@ -132,6 +132,10 @@ func run() error { } idx := indexer.New(database, vs, embedSvc, logger) + idx.SetEmbedIncludePath(cfg.EmbedIncludePath) + if cfg.EmbedIncludePath { + logger.Info("embedding format: path-aware preamble enabled (CIX_EMBED_INCLUDE_PATH=true) — full reindex required if upgrading") + } // Stop housekeeping goroutines during shutdown so sessionTTL timers do not // leak for up to 1h past shutdown. m8 fix. 
 	defer idx.Shutdown()
diff --git a/server/internal/config/config.go b/server/internal/config/config.go
index c7a42ef..f60832f 100644
--- a/server/internal/config/config.go
+++ b/server/internal/config/config.go
@@ -38,6 +38,13 @@ type Config struct {
 	LlamaStartupSec   int  // CIX_LLAMA_STARTUP_TIMEOUT; readiness probe ceiling in seconds.
 	EmbeddingsEnabled bool // CIX_EMBEDDINGS_ENABLED; test hook to bypass sidecar entirely.
 
+	// EmbedIncludePath toggles a path+language+symbol preamble in front of
+	// each chunk before sending it to the embedder. Improves retrieval for
+	// queries whose terms appear in file paths (e.g. "server search handler"),
+	// at the cost of requiring a full reindex when toggled. Source:
+	// CIX_EMBED_INCLUDE_PATH (default true).
+	EmbedIncludePath bool
+
 	// Languages narrows the chunker's active language set. Empty / unset
 	// activates all baked-in defaults (see chunker.defaultRegistry). Values
 	// not present in the registry are warned-and-ignored at startup.
@@ -152,6 +159,12 @@ func Load() (*Config, error) {
 	}
 	c.EmbeddingsEnabled = enabled
 
+	includePath, err := getenvBool("CIX_EMBED_INCLUDE_PATH", true)
+	if err != nil {
+		return nil, err
+	}
+	c.EmbedIncludePath = includePath
+
 	if langs := getenv("CIX_LANGUAGES", ""); langs != "" {
 		for _, l := range strings.Split(langs, ",") {
 			if s := strings.TrimSpace(l); s != "" {
diff --git a/server/internal/embeddings/format.go b/server/internal/embeddings/format.go
new file mode 100644
index 0000000..dcb8663
--- /dev/null
+++ b/server/internal/embeddings/format.go
@@ -0,0 +1,59 @@
+package embeddings
+
+import (
+	"strings"
+
+	"github.com/dvcdsys/code-index/server/internal/chunker"
+)
+
+// FormatChunkForEmbedding builds the text passed to the embedder for a chunk.
+// It optionally prepends a natural-language preamble carrying the relative
+// path, language, and symbol kind+name. Code-trained embedders interpret this
+// preamble as docstring-style context — empirically it improves retrieval for
+// path-aware queries (e.g. "server search handler") because file paths
+// contribute high-signal tokens that bare chunk content lacks.
+//
+// Off-recipe for CodeRankEmbed (whose passage side was trained on raw code),
+// but the cost is a few dozen extra tokens per chunk and the gain on this
+// repo's "main entry point server" type queries is large enough to be worth
+// the trade. Switching this format on or off requires a full reindex —
+// vectors are not interchangeable between formats.
+//
+// When relPath is empty (or includePath=false), the function falls back to
+// the legacy "{chunk_type}: {content}" prefix that the Python indexer used,
+// preserving parity for projects that have not yet reindexed.
+func FormatChunkForEmbedding(c chunker.Chunk, relPath string, includePath bool) string {
+	if !includePath || relPath == "" {
+		return c.ChunkType + ": " + c.Content
+	}
+
+	var sb strings.Builder
+	sb.Grow(len(relPath) + len(c.Content) + 64)
+
+	sb.WriteString("File: ")
+	sb.WriteString(relPath)
+	sb.WriteByte('\n')
+
+	if c.Language != "" {
+		sb.WriteString("Language: ")
+		sb.WriteString(c.Language)
+		sb.WriteByte('\n')
+	}
+
+	// Symbol metadata is only included for nameable chunks. "module" / "block"
+	// chunks have no symbol and would just add noise. The chunker stores
+	// SymbolName as a *string; nil means "no symbol".
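+	//
+	// Full preamble shape for a nameable chunk (values illustrative; they
+	// match the expectations in format_test.go):
+	//
+	//	File: server/internal/httpapi/search.go
+	//	Language: go
+	//	function: semanticSearchHandler
+	//
+	//	{content}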
+ if c.SymbolName != nil && *c.SymbolName != "" { + switch c.ChunkType { + case "function", "class", "method", "type": + sb.WriteString(c.ChunkType) + sb.WriteString(": ") + sb.WriteString(*c.SymbolName) + sb.WriteByte('\n') + } + } + + sb.WriteByte('\n') + sb.WriteString(c.Content) + return sb.String() +} diff --git a/server/internal/embeddings/format_test.go b/server/internal/embeddings/format_test.go new file mode 100644 index 0000000..6d44925 --- /dev/null +++ b/server/internal/embeddings/format_test.go @@ -0,0 +1,102 @@ +package embeddings + +import ( + "strings" + "testing" + + "github.com/dvcdsys/code-index/server/internal/chunker" +) + +func ptr(s string) *string { return &s } + +func TestFormatChunkForEmbedding_Disabled(t *testing.T) { + c := chunker.Chunk{ + Content: "func main() {}", + ChunkType: "function", + Language: "go", + } + got := FormatChunkForEmbedding(c, "cmd/main.go", false) + want := "function: func main() {}" + if got != want { + t.Errorf("includePath=false: got %q, want %q", got, want) + } +} + +func TestFormatChunkForEmbedding_EmptyRelPath(t *testing.T) { + c := chunker.Chunk{ + Content: "x := 1", + ChunkType: "module", + Language: "go", + } + got := FormatChunkForEmbedding(c, "", true) + want := "module: x := 1" + if got != want { + t.Errorf("empty relPath: got %q, want %q", got, want) + } +} + +func TestFormatChunkForEmbedding_FunctionWithSymbol(t *testing.T) { + c := chunker.Chunk{ + Content: "func semanticSearchHandler() {}", + ChunkType: "function", + Language: "go", + SymbolName: ptr("semanticSearchHandler"), + } + got := FormatChunkForEmbedding(c, "server/internal/httpapi/search.go", true) + wantContains := []string{ + "File: server/internal/httpapi/search.go", + "Language: go", + "function: semanticSearchHandler", + "func semanticSearchHandler() {}", + } + for _, w := range wantContains { + if !strings.Contains(got, w) { + t.Errorf("output missing %q\nfull output:\n%s", w, got) + } + } +} + +func TestFormatChunkForEmbedding_ModuleChunkOmitsSymbol(t *testing.T) { + // Module chunks have no symbol and SymbolName is nil; ensure we don't + // emit a "module: " line. Even if a symbol leaks in, module/block kinds + // must not produce a symbol preamble line (would add path-correlated + // noise to gap-filler chunks). + c := chunker.Chunk{ + Content: "import \"fmt\"", + ChunkType: "module", + Language: "go", + SymbolName: ptr("Anything"), + } + got := FormatChunkForEmbedding(c, "main.go", true) + if strings.Contains(got, "module:") { + t.Errorf("module chunk should not produce 'module:' preamble, got:\n%s", got) + } + if !strings.Contains(got, "File: main.go") { + t.Errorf("expected File: line, got:\n%s", got) + } +} + +func TestFormatChunkForEmbedding_OmitsLangWhenEmpty(t *testing.T) { + c := chunker.Chunk{ + Content: "raw text", + ChunkType: "module", + } + got := FormatChunkForEmbedding(c, "README", true) + if strings.Contains(got, "Language:") { + t.Errorf("empty Language should not emit Language: line, got:\n%s", got) + } +} + +func TestFormatChunkForEmbedding_PreservesContentBytes(t *testing.T) { + // The raw chunk content must appear in the output unchanged — the + // preamble is additive, never lossy. 
+ c := chunker.Chunk{ + Content: "line1\nline2\n indented\n", + ChunkType: "function", + Language: "go", + } + got := FormatChunkForEmbedding(c, "x.go", true) + if !strings.HasSuffix(got, c.Content) { + t.Errorf("output must end with raw content; got:\n%s", got) + } +} diff --git a/server/internal/httpapi/search.go b/server/internal/httpapi/search.go index 432407d..d745182 100644 --- a/server/internal/httpapi/search.go +++ b/server/internal/httpapi/search.go @@ -576,6 +576,11 @@ type searchRequest struct { Limit int `json:"limit"` Languages []string `json:"languages"` Paths []string `json:"paths"` + // Excludes drops any result whose file path matches one of the prefixes. + // Mirrors Paths' matching semantics (prefix or substring) but inverted — + // useful for ignoring fixtures, vendored, or legacy directories without + // adding them to .cixignore (which would prevent indexing entirely). + Excludes []string `json:"excludes"` // MinScore is a pointer so we can distinguish "not provided" from an // explicit zero. Python uses a Pydantic default (0.1) which also allows // explicit 0 through — mirror that here. m2 fix. @@ -676,7 +681,10 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { } // m2 — only apply default when the caller did not send the field. // Explicit 0 means "return everything above the HNSW floor". - minScore := float32(0.1) + // Default 0.4 calibrated against CodeRankEmbed-Q8_0 with the + // path-aware embedding format. Clients that want the historical + // 0.1 behavior must send min_score explicitly. + minScore := float32(0.4) if body.MinScore != nil { minScore = *body.MinScore } @@ -729,7 +737,7 @@ func semanticSearchHandler(d Deps) http.HandlerFunc { writeError(w, http.StatusInternalServerError, err.Error()) return } - filtered := filterToSearchItems(rawWrapped, minScore, body.Paths, langSet, applyPostLangFilter) + filtered := filterToSearchItems(rawWrapped, minScore, body.Paths, body.Excludes, langSet, applyPostLangFilter) merged := mergeOverlappingHits(filtered) fileGroups = groupByFile(merged) if len(fileGroups) >= body.Limit { @@ -876,14 +884,16 @@ func fetchVectorResults( } } -// filterToSearchItems applies min-score, language post-filter, and path -// prefix/substring matches. It does NOT truncate — the merge step needs -// the full filtered set to identify all overlaps before deciding which to -// drop. Truncation happens after merge in the caller. +// filterToSearchItems applies min-score, language post-filter, path +// whitelist (paths), and path blacklist (excludes). It does NOT truncate — +// the merge step needs the full filtered set to identify all overlaps +// before deciding which to drop. Truncation happens after merge in the +// caller. 
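+//
+// Exclude matching is deliberately loose — prefix OR substring — so, e.g.,
+// an exclude of "fixtures" drops both "/proj/bench/fixtures/a.py" and
+// "fixtures/b.py" (paths illustrative; see the tests alongside this file).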
func filterToSearchItems( wrapped []vectorStoreResult, minScore float32, paths []string, + excludes []string, langSet map[string]struct{}, applyPostLangFilter bool, ) []searchResultItem { @@ -910,6 +920,18 @@ func filterToSearchItems( continue } } + if len(excludes) > 0 { + excluded := false + for _, pfx := range excludes { + if strings.HasPrefix(res.FilePath, pfx) || strings.Contains(res.FilePath, pfx) { + excluded = true + break + } + } + if excluded { + continue + } + } filtered = append(filtered, searchResultItem{ FilePath: res.FilePath, StartLine: res.StartLine, diff --git a/server/internal/httpapi/search_filter_test.go b/server/internal/httpapi/search_filter_test.go new file mode 100644 index 0000000..8c28758 --- /dev/null +++ b/server/internal/httpapi/search_filter_test.go @@ -0,0 +1,85 @@ +package httpapi + +import ( + "testing" + + "github.com/dvcdsys/code-index/server/internal/vectorstore" +) + +func mkRawResult(path string, score float32) vectorStoreResult { + return vectorStoreResult{ + r: vectorstore.SearchResult{ + FilePath: path, + StartLine: 1, + EndLine: 10, + Content: "stub", + Score: score, + Language: "go", + }, + } +} + +func TestFilterToSearchItems_ExcludesPrefixDropsMatchingPaths(t *testing.T) { + raw := []vectorStoreResult{ + mkRawResult("/proj/server/cmd/cix-server/main.go", 0.9), + mkRawResult("/proj/bench/fixtures/sample.py", 0.85), + mkRawResult("/proj/legacy/python-api/scripts/profile_vram.py", 0.8), + mkRawResult("/proj/cli/main.go", 0.7), + } + + out := filterToSearchItems(raw, 0.0, nil, []string{"/proj/bench", "/proj/legacy"}, nil, false) + if len(out) != 2 { + t.Fatalf("want 2 results after exclude, got %d", len(out)) + } + for _, r := range out { + if r.FilePath == "/proj/bench/fixtures/sample.py" || r.FilePath == "/proj/legacy/python-api/scripts/profile_vram.py" { + t.Errorf("excluded path leaked through: %s", r.FilePath) + } + } +} + +func TestFilterToSearchItems_ExcludesSubstringMatch(t *testing.T) { + // Substring match parity with --in: an exclude of "fixtures" drops any + // path that contains the substring, not just a prefix match. + raw := []vectorStoreResult{ + mkRawResult("/proj/server/cmd/cix-server/main.go", 0.9), + mkRawResult("/proj/bench/fixtures/sample.py", 0.85), + } + out := filterToSearchItems(raw, 0.0, nil, []string{"fixtures"}, nil, false) + if len(out) != 1 { + t.Fatalf("want 1 after substring exclude, got %d", len(out)) + } + if out[0].FilePath != "/proj/server/cmd/cix-server/main.go" { + t.Errorf("unexpected survivor: %s", out[0].FilePath) + } +} + +func TestFilterToSearchItems_ExcludesAndPathsCombined(t *testing.T) { + // --in narrows to a directory; --exclude further trims a subdirectory. 
+	raw := []vectorStoreResult{
+		mkRawResult("/proj/server/internal/httpapi/search.go", 0.9),
+		mkRawResult("/proj/server/internal/httpapi/search_test.go", 0.85),
+		mkRawResult("/proj/cli/cmd/search.go", 0.8),
+	}
+	out := filterToSearchItems(raw, 0.0,
+		[]string{"/proj/server"},
+		[]string{"_test.go"},
+		nil, false)
+	if len(out) != 1 {
+		t.Fatalf("want 1 after path+exclude, got %d", len(out))
+	}
+	if out[0].FilePath != "/proj/server/internal/httpapi/search.go" {
+		t.Errorf("unexpected survivor: %s", out[0].FilePath)
+	}
+}
+
+func TestFilterToSearchItems_NilExcludesIsNoop(t *testing.T) {
+	raw := []vectorStoreResult{
+		mkRawResult("/a.go", 0.9),
+		mkRawResult("/b.go", 0.8),
+	}
+	out := filterToSearchItems(raw, 0.0, nil, nil, nil, false)
+	if len(out) != 2 {
+		t.Errorf("nil excludes must not drop anything; got %d", len(out))
+	}
+}
diff --git a/server/internal/indexer/indexer.go b/server/internal/indexer/indexer.go
index be76d1f..5639b06 100644
--- a/server/internal/indexer/indexer.go
+++ b/server/internal/indexer/indexer.go
@@ -95,6 +95,13 @@ type Service struct {
 	// instead of leaking for up to sessionTTL on server shutdown.
 	stopCh   chan struct{}
 	stopOnce sync.Once
+
+	// embedIncludePath, when true, makes ProcessFiles wrap each chunk with
+	// a "File: {path}\nLanguage: {language}\n..." preamble before embedding.
+	// Set via SetEmbedIncludePath; default false preserves Python-parity
+	// "{chunk_type}: {content}" formatting for projects that have not been
+	// reindexed under the new format.
+	embedIncludePath bool
 }
 
 // New constructs a Service. All deps are required except logger (falls back to
@@ -119,6 +126,14 @@ func (s *Service) Shutdown() {
 	s.stopOnce.Do(func() { close(s.stopCh) })
 }
 
+// SetEmbedIncludePath toggles the path+language+symbol preamble that
+// ProcessFiles prepends to chunk content before embedding. Toggling between
+// runs requires a full reindex — vectors trained against the new preamble
+// are not interchangeable with vectors trained on bare content.
+func (s *Service) SetEmbedIncludePath(v bool) {
+	s.embedIncludePath = v
+}
+
 // ---------------------------------------------------------------------------
 // Phase 1 — begin
 // ---------------------------------------------------------------------------
@@ -414,10 +429,19 @@ func (s *Service) ProcessFilesStreaming(
 		})
 	}
 
-	// Embed. Python prefixes with "{chunk_type}: {content}".
+	// Embed. Format depends on embedIncludePath: legacy Python-parity
+	// "{chunk_type}: {content}" when false, or path+language+symbol
+	// preamble + content when true (see embeddings.FormatChunkForEmbedding).
+	// Relative path is computed once per file and reused for all its chunks.
+	relPath := fp.Path
+	if s.embedIncludePath {
+		if rp, rerr := filepath.Rel(projectPath, fp.Path); rerr == nil {
+			relPath = rp
+		}
+	}
 	texts := make([]string, len(chunks))
 	for i, c := range chunks {
-		texts[i] = c.ChunkType + ": " + c.Content
+		texts[i] = embeddings.FormatChunkForEmbedding(c, relPath, s.embedIncludePath)
 	}
 	var embs [][]float32
 	embedStart := time.Now()
@@ -935,5 +959,3 @@ func marshalJSONStringArray(langs []string) string {
 	return b.String()
 }
 
-// Unused but kept for symmetry with Python: filepath.Base is used by callers.
-var _ = filepath.Base
diff --git a/server/internal/indexer/indexer_test.go b/server/internal/indexer/indexer_test.go
index d6da79a..405ad3f 100644
--- a/server/internal/indexer/indexer_test.go
+++ b/server/internal/indexer/indexer_test.go
@@ -228,6 +228,130 @@ func TestProcessFiles_HappyPath(t *testing.T) {
 	}
 }
 
+// capturingEmbedder records every batch passed to EmbedTexts so tests can
+// assert what the indexer actually sent to the embedder. Returned vectors
+// are fixed dummy unit vectors (dim components, first set to 1, rest zero) —
+// their values are never checked by callers of this helper.
+type capturingEmbedder struct {
+	dim   int
+	calls [][]string
+}
+
+func (c *capturingEmbedder) EmbedTexts(ctx context.Context, texts []string) ([][]float32, error) {
+	captured := append([]string(nil), texts...)
+	c.calls = append(c.calls, captured)
+	out := make([][]float32, len(texts))
+	for i := range out {
+		v := make([]float32, c.dim)
+		v[0] = 1
+		out[i] = v
+	}
+	return out, nil
+}
+
+// TestProcessFiles_EmbedTextFormat_LegacyByDefault verifies that without
+// SetEmbedIncludePath, ProcessFiles preserves the historical
+// "{chunk_type}: {content}" format that Python parity depends on.
+func TestProcessFiles_EmbedTextFormat_LegacyByDefault(t *testing.T) {
+	d := openTestDB(t)
+	seedProject(t, d, "/proj")
+
+	ctx := context.Background()
+	emb := &capturingEmbedder{dim: 8}
+	svc := New(d, newStore(t), emb, nil)
+	// Default: embedIncludePath is false.
+
+	runID, _, err := svc.BeginIndexing(ctx, "/proj", false)
+	if err != nil {
+		t.Fatalf("BeginIndexing: %v", err)
+	}
+	goFile := "package main\n\nfunc Hello() {}\n"
+	files := []FilePayload{{
+		Path:        "/proj/cmd/hello/main.go",
+		Content:     goFile,
+		ContentHash: sha256hex(goFile),
+		Language:    "go",
+		Size:        len(goFile),
+	}}
+	if _, _, _, err := svc.ProcessFiles(ctx, "/proj", runID, files); err != nil {
+		t.Fatalf("ProcessFiles: %v", err)
+	}
+
+	if len(emb.calls) == 0 {
+		t.Fatal("embedder was never called")
+	}
+	first := emb.calls[0]
+	if len(first) == 0 {
+		t.Fatal("first batch was empty")
+	}
+	for _, txt := range first {
+		// Legacy format must NOT contain a "File:" preamble.
+		if containsString(txt, "File: cmd/hello") {
+			t.Errorf("legacy format leaked path preamble:\n%s", txt)
+		}
+	}
+}
+
+// TestProcessFiles_EmbedTextFormat_PathPrefixWhenEnabled verifies that with
+// SetEmbedIncludePath(true), every chunk text is wrapped with the path-aware
+// preamble produced by embeddings.FormatChunkForEmbedding.
+func TestProcessFiles_EmbedTextFormat_PathPrefixWhenEnabled(t *testing.T) { + d := openTestDB(t) + seedProject(t, d, "/proj") + + ctx := context.Background() + emb := &capturingEmbedder{dim: 8} + svc := New(d, newStore(t), emb, nil) + svc.SetEmbedIncludePath(true) + + runID, _, err := svc.BeginIndexing(ctx, "/proj", false) + if err != nil { + t.Fatalf("BeginIndexing: %v", err) + } + goFile := "package main\n\nfunc Hello() {}\n" + files := []FilePayload{{ + Path: "/proj/cmd/hello/main.go", + Content: goFile, + ContentHash: sha256hex(goFile), + Language: "go", + Size: len(goFile), + }} + if _, _, _, err := svc.ProcessFiles(ctx, "/proj", runID, files); err != nil { + t.Fatalf("ProcessFiles: %v", err) + } + + if len(emb.calls) == 0 { + t.Fatal("embedder was never called") + } + first := emb.calls[0] + if len(first) == 0 { + t.Fatal("first batch was empty") + } + hasPathPreamble := false + for _, txt := range first { + if containsString(txt, "File: cmd/hello/main.go") && + containsString(txt, "Language: go") { + hasPathPreamble = true + break + } + } + if !hasPathPreamble { + t.Errorf("expected at least one chunk text to carry 'File: cmd/hello/main.go' + 'Language: go'; sent texts:\n%v", first) + } +} + +func containsString(haystack, needle string) bool { + if len(needle) == 0 { + return true + } + for i := 0; i+len(needle) <= len(haystack); i++ { + if haystack[i:i+len(needle)] == needle { + return true + } + } + return false +} + func TestProcessFiles_EmbedderBusy(t *testing.T) { d := openTestDB(t) seedProject(t, d, "/proj") From f20e9c6fe3ac8b991f9d832480d772b2cfdd5c6b Mon Sep 17 00:00:00 2001 From: dvcdsys Date: Tue, 28 Apr 2026 13:18:44 +0100 Subject: [PATCH 8/9] chore: drop legacy/python-api/ archived tree MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Removes the archived Python/FastAPI backend deprecated on 2026-04-24 per the schedule in doc/MIGRATION_FROM_PYTHON.md. No Go code imports this tree; the migration completed in server/v0.3.0. Updated references: * CONTRIBUTING.md: drop legacy row from repo-tree diagram * doc/MIGRATION_FROM_PYTHON.md: past-tense, retain as historical / rollback recipe for the preserved :0.2-python-legacy Docker tag * doc/DEPRECATION_POLICY.md: past-tense * .github/workflows/codeql.yml: drop two now-stale comment fragments * server/internal/vectorstore/store.go: rephrase docID comment to drop the dead path pointer; the byte-format invariant (md5[:6] → 12 hex chars + line range + idx) stays load-bearing for existing prod indexes * README: .cixignore example legacy/ -> vendor/ to avoid implying the dir still ships server/bench/queries.json keeps "legacy/" in anti_paths defensively; the rules are inert with the dir gone. Slates the next server tag at v0.4.0 per the deletion plan in doc/MIGRATION_FROM_PYTHON.md. 
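
For reference, a minimal sketch of that docID invariant (helper name and
separator characters are illustrative, not lifted from store.go; only the
md5-of-path prefix truncated to 6 bytes / 12 hex chars plus line range plus
chunk index is the documented, load-bearing shape):

    // Illustrative only — see server/internal/vectorstore/store.go.
    func docID(filePath string, startLine, endLine, idx int) string {
        sum := md5.Sum([]byte(filePath))      // crypto/md5
        prefix := hex.EncodeToString(sum[:6]) // 12 hex chars
        return fmt.Sprintf("%s:%d-%d:%d", prefix, startLine, endLine, idx)
    }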
Co-Authored-By: Claude Opus 4.7 --- .github/workflows/codeql.yml | 7 +- CONTRIBUTING.md | 1 - README.md | 4 +- doc/DEPRECATION_POLICY.md | 7 +- doc/MIGRATION_FROM_PYTHON.md | 7 +- legacy/python-api/Makefile | 244 ------ legacy/python-api/README.md | 9 - legacy/python-api/app-root/Dockerfile | 53 -- legacy/python-api/app-root/Dockerfile.cuda | 69 -- legacy/python-api/app-root/app/__init__.py | 0 legacy/python-api/app-root/app/auth.py | 18 - legacy/python-api/app-root/app/config.py | 54 -- .../python-api/app-root/app/core/__init__.py | 0 .../app-root/app/core/exceptions.py | 15 - .../python-api/app-root/app/core/language.py | 88 --- .../app-root/app/core/path_encoding.py | 29 - legacy/python-api/app-root/app/database.py | 101 --- legacy/python-api/app-root/app/main.py | 69 -- .../app-root/app/routers/__init__.py | 0 .../python-api/app-root/app/routers/health.py | 36 - .../app-root/app/routers/indexing.py | 157 ---- .../app-root/app/routers/projects.py | 141 ---- .../python-api/app-root/app/routers/search.py | 280 ------- .../app-root/app/schemas/__init__.py | 0 .../python-api/app-root/app/schemas/common.py | 9 - .../app-root/app/schemas/indexing.py | 59 -- .../app-root/app/schemas/project.py | 42 - .../python-api/app-root/app/schemas/search.py | 118 --- .../app-root/app/services/__init__.py | 0 .../app-root/app/services/chunker.py | 454 ----------- .../app-root/app/services/embeddings.py | 184 ----- .../app-root/app/services/file_discovery.py | 121 --- .../app-root/app/services/indexer.py | 614 --------------- .../app-root/app/services/project_config.py | 61 -- .../app-root/app/services/reference_index.py | 70 -- .../app-root/app/services/symbol_index.py | 119 --- .../app-root/app/services/vector_store.py | 135 ---- legacy/python-api/app-root/app/version.py | 2 - .../app-root/migrate_to_path_based.py | 299 ------- .../python-api/app-root/requirements-cuda.txt | 53 -- .../python-api/app-root/requirements-dev.txt | 2 - legacy/python-api/app-root/requirements.txt | 51 -- legacy/python-api/pyproject.toml | 10 - .../scripts/benchmark_embeddings.py | 428 ---------- legacy/python-api/scripts/profile_vram.py | 115 --- legacy/python-api/setup-local.sh | 154 ---- legacy/python-api/setup.sh | 75 -- legacy/python-api/tests/__init__.py | 0 legacy/python-api/tests/test_api.py | 106 --- legacy/python-api/tests/test_chunker.py | 406 ---------- .../python-api/tests/test_file_discovery.py | 111 --- .../python-api/tests/test_project_config.py | 136 ---- legacy/python-api/tests/test_search.py | 111 --- legacy/python-api/uv.lock | 742 ------------------ server/internal/vectorstore/store.go | 10 +- 55 files changed, 17 insertions(+), 6169 deletions(-) delete mode 100644 legacy/python-api/Makefile delete mode 100644 legacy/python-api/README.md delete mode 100644 legacy/python-api/app-root/Dockerfile delete mode 100644 legacy/python-api/app-root/Dockerfile.cuda delete mode 100644 legacy/python-api/app-root/app/__init__.py delete mode 100644 legacy/python-api/app-root/app/auth.py delete mode 100644 legacy/python-api/app-root/app/config.py delete mode 100644 legacy/python-api/app-root/app/core/__init__.py delete mode 100644 legacy/python-api/app-root/app/core/exceptions.py delete mode 100644 legacy/python-api/app-root/app/core/language.py delete mode 100644 legacy/python-api/app-root/app/core/path_encoding.py delete mode 100644 legacy/python-api/app-root/app/database.py delete mode 100644 legacy/python-api/app-root/app/main.py delete mode 100644 legacy/python-api/app-root/app/routers/__init__.py delete mode 100644 
legacy/python-api/app-root/app/routers/health.py delete mode 100644 legacy/python-api/app-root/app/routers/indexing.py delete mode 100644 legacy/python-api/app-root/app/routers/projects.py delete mode 100644 legacy/python-api/app-root/app/routers/search.py delete mode 100644 legacy/python-api/app-root/app/schemas/__init__.py delete mode 100644 legacy/python-api/app-root/app/schemas/common.py delete mode 100644 legacy/python-api/app-root/app/schemas/indexing.py delete mode 100644 legacy/python-api/app-root/app/schemas/project.py delete mode 100644 legacy/python-api/app-root/app/schemas/search.py delete mode 100644 legacy/python-api/app-root/app/services/__init__.py delete mode 100644 legacy/python-api/app-root/app/services/chunker.py delete mode 100644 legacy/python-api/app-root/app/services/embeddings.py delete mode 100644 legacy/python-api/app-root/app/services/file_discovery.py delete mode 100644 legacy/python-api/app-root/app/services/indexer.py delete mode 100644 legacy/python-api/app-root/app/services/project_config.py delete mode 100644 legacy/python-api/app-root/app/services/reference_index.py delete mode 100644 legacy/python-api/app-root/app/services/symbol_index.py delete mode 100644 legacy/python-api/app-root/app/services/vector_store.py delete mode 100644 legacy/python-api/app-root/app/version.py delete mode 100644 legacy/python-api/app-root/migrate_to_path_based.py delete mode 100644 legacy/python-api/app-root/requirements-cuda.txt delete mode 100644 legacy/python-api/app-root/requirements-dev.txt delete mode 100644 legacy/python-api/app-root/requirements.txt delete mode 100644 legacy/python-api/pyproject.toml delete mode 100755 legacy/python-api/scripts/benchmark_embeddings.py delete mode 100644 legacy/python-api/scripts/profile_vram.py delete mode 100755 legacy/python-api/setup-local.sh delete mode 100755 legacy/python-api/setup.sh delete mode 100644 legacy/python-api/tests/__init__.py delete mode 100644 legacy/python-api/tests/test_api.py delete mode 100644 legacy/python-api/tests/test_chunker.py delete mode 100644 legacy/python-api/tests/test_file_discovery.py delete mode 100644 legacy/python-api/tests/test_project_config.py delete mode 100644 legacy/python-api/tests/test_search.py delete mode 100644 legacy/python-api/uv.lock diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml index c7d12e8..dbc57cb 100644 --- a/.github/workflows/codeql.yml +++ b/.github/workflows/codeql.yml @@ -2,8 +2,8 @@ name: "CodeQL" # Advanced setup. Replaces GitHub's "default setup" which auto-detects # and scans every language it finds — that included java-kotlin, ruby, -# rust, javascript-typescript, c-cpp, python false-positives from -# vendored CGO deps and the archived legacy/python-api/ tree. +# rust, javascript-typescript, and c-cpp false-positives from vendored +# CGO deps. # # To stop the duplicate runs you also need to disable the default # setup once: GitHub repo → Settings → Code security → Code scanning @@ -31,8 +31,7 @@ jobs: matrix: # Keep tightly scoped: only languages that actually ship code. # `actions` lints workflow YAML; `go` covers server + CLI. - # Do NOT add python (only legacy/python-api/, archived) or - # c-cpp (only transitive CGO deps, no first-party C). + # Do NOT add c-cpp (only transitive CGO deps, no first-party C). 
language: [actions, go] steps: - name: Checkout diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ee61ac8..499dcce 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -12,7 +12,6 @@ code-index/ ├── cli/ # Go CLI (cix binary) │ ├── cmd/ # cobra commands │ └── internal/ # client, config, daemon, indexer, watcher -├── legacy/python-api/ # archived Python backend (deprecated, see doc/MIGRATION_FROM_PYTHON.md) └── skills/ # Claude Code skill definitions ``` diff --git a/README.md b/README.md index 1f06e00..f8021bf 100644 --- a/README.md +++ b/README.md @@ -404,10 +404,10 @@ Repos with vendored code, fixtures, or legacy migrations can pull unrelated path ```bash # One-off exclude for a single search -cix search "main entry point" --exclude legacy --exclude bench/fixtures +cix search "main entry point" --exclude vendor --exclude bench/fixtures # Permanent exclude — add to .cixignore (skips indexing entirely) -echo "legacy/" >> .cixignore +echo "vendor/" >> .cixignore echo "bench/fixtures/" >> .cixignore cix reindex --full ``` diff --git a/doc/DEPRECATION_POLICY.md b/doc/DEPRECATION_POLICY.md index aa1b7e3..994204a 100644 --- a/doc/DEPRECATION_POLICY.md +++ b/doc/DEPRECATION_POLICY.md @@ -20,10 +20,11 @@ See `doc/DOCKER_TAGS.md` for the current tag inventory. ## Python backend The Python FastAPI backend (`legacy/python-api/`) was deprecated in -`server/v0.3.0` (2026-04-24). It will be deleted from the repository in -`server/v0.4.0` (target: ~2026-07-24, ~90 days). +`server/v0.3.0` (2026-04-24) and removed from the repository in +`server/v0.4.0` (2026-04-28). The Docker image `dvcdsys/code-index:0.2-python-legacy` is preserved on Docker Hub indefinitely as a rollback option. -See `doc/MIGRATION_FROM_PYTHON.md` for migration instructions. +See `doc/MIGRATION_FROM_PYTHON.md` for migration instructions and the +rollback recipe. diff --git a/doc/MIGRATION_FROM_PYTHON.md b/doc/MIGRATION_FROM_PYTHON.md index 8315bd8..6f87725 100644 --- a/doc/MIGRATION_FROM_PYTHON.md +++ b/doc/MIGRATION_FROM_PYTHON.md @@ -66,6 +66,7 @@ If you need to go back to the Python server: ## Sunset timeline -The Python code in `legacy/python-api/` will be deleted in `server/v0.4.0` -(approximately 90 days after v0.3.0 — target ~2026-07-24). -The `:0.2-python-legacy` Docker tag is preserved on Docker Hub indefinitely. +The Python code in `legacy/python-api/` was deleted in `server/v0.4.0` +(2026-04-28). This document is retained for historical reference and as +the rollback recipe for the preserved `:0.2-python-legacy` Docker tag, +which stays on Docker Hub indefinitely. diff --git a/legacy/python-api/Makefile b/legacy/python-api/Makefile deleted file mode 100644 index 8c90dcd..0000000 --- a/legacy/python-api/Makefile +++ /dev/null @@ -1,244 +0,0 @@ -.PHONY: server-local-setup server-local-start server-local-stop server-local-restart \ - server-local-status server-local-logs \ - server-docker-start server-docker-stop server-docker-restart \ - server-docker-status server-docker-logs \ - server-cuda-start server-cuda-stop server-cuda-restart \ - server-cuda-status server-cuda-logs \ - docker-setup docker-push-all docker-push-cuda \ - test test-server test-client test-setup help - -PORT ?= 21847 -PYTHON ?= $(shell test -f .venv/bin/python && echo .venv/bin/python || (command -v uv >/dev/null 2>&1 && echo "uv run --python 3.12 python" || echo python3)) -DOCKER_USER ?= $(error DOCKER_USER is not set. 
Run: make docker-push-all DOCKER_USER=yourname) -IMAGE_NAME ?= code-index -CLI_VERSION ?= $(shell git describe --tags --match "cli/*" --abbrev=0 2>/dev/null | sed 's/^cli\///' || echo v0.2.0) -SERVER_VERSION ?= $(shell git describe --tags --match "server/*" --abbrev=0 2>/dev/null | sed 's/^server\///' || echo v0.2.0) -DATA_DIR ?= $(HOME)/.cix/data - -# ─── Server: Local (native, MPS on Mac) ───────────────────────────── - -# First-time setup + start (installs uv, Python 3.12, deps) -server-local-setup: - ./setup-local.sh - -# Start server from existing .venv -server-local-start: - @if [ ! -f .venv/bin/uvicorn ]; then \ - echo "ERROR: Run 'make server-local-setup' first."; \ - exit 1; \ - fi - @if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - echo "Already running on port $(PORT)"; \ - exit 0; \ - fi - @. .env && \ - mkdir -p "$(DATA_DIR)/chroma" "$(DATA_DIR)/sqlite" && \ - echo "Starting server on port $(PORT)..." && \ - cd api && \ - PYTHONPATH="$$(pwd)" \ - API_KEY="$$API_KEY" \ - CHROMA_PERSIST_DIR="$${CHROMA_PERSIST_DIR:-$(DATA_DIR)/chroma}" \ - SQLITE_PATH="$${SQLITE_PATH:-$(DATA_DIR)/sqlite/projects.db}" \ - EMBEDDING_MODEL="$${EMBEDDING_MODEL:-awhiteside/CodeRankEmbed-Q8_0-GGUF}" \ - MAX_FILE_SIZE="$${MAX_FILE_SIZE:-524288}" \ - EXCLUDED_DIRS="$${EXCLUDED_DIRS:-node_modules,.git,.venv,__pycache__,dist,build,.next,.cache,.DS_Store}" \ - nohup ../.venv/bin/uvicorn app.main:app \ - --host 0.0.0.0 --port $(PORT) \ - > "$(DATA_DIR)/server.log" 2>&1 & \ - echo "$$!" > "$(DATA_DIR)/server.pid" && \ - echo "PID: $$(cat $(DATA_DIR)/server.pid)" && \ - cd .. && \ - for i in $$(seq 1 30); do \ - if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - echo "Healthy: http://localhost:$(PORT)"; \ - exit 0; \ - fi; \ - sleep 2; \ - done; \ - echo "ERROR: Failed to start. Run: make server-local-logs"; exit 1 - -server-local-stop: - @if [ -f "$(DATA_DIR)/server.pid" ]; then \ - PID=$$(cat "$(DATA_DIR)/server.pid"); \ - if kill -0 "$$PID" 2>/dev/null; then \ - echo "Stopping server (PID $$PID)..."; \ - kill "$$PID"; \ - fi; \ - rm -f "$(DATA_DIR)/server.pid"; \ - fi - @PIDS=$$(lsof -ti :$(PORT) 2>/dev/null); \ - if [ -n "$$PIDS" ]; then \ - echo "Killing process(es) on port $(PORT): $$PIDS"; \ - echo "$$PIDS" | xargs kill 2>/dev/null || true; \ - fi - @echo "Stopped" - -server-local-restart: server-local-stop server-local-start - -server-local-status: - @if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - echo "Running on port $(PORT)"; \ - curl -sf http://localhost:$(PORT)/health; echo; \ - else \ - echo "Not running"; \ - fi - @if [ -f "$(DATA_DIR)/server.pid" ] && kill -0 $$(cat "$(DATA_DIR)/server.pid") 2>/dev/null; then \ - echo "PID: $$(cat $(DATA_DIR)/server.pid)"; \ - fi - -server-local-logs: - @if [ -f "$(DATA_DIR)/server.log" ]; then \ - tail -f "$(DATA_DIR)/server.log"; \ - else \ - echo "No log file at $(DATA_DIR)/server.log"; \ - fi - -# ─── Server: Docker (CPU, multi-arch) ─────────────────────────────── - -server-docker-start: - @if [ ! -f .env ]; then \ - echo "Generating .env..."; \ - API_KEY="cix_$$(openssl rand -hex 32)"; \ - printf "API_KEY=$$API_KEY\nPORT=$(PORT)\nEMBEDDING_MODEL=awhiteside/CodeRankEmbed-Q8_0-GGUF\nMAX_FILE_SIZE=524288\nEXCLUDED_DIRS=node_modules,.git,.venv,__pycache__,dist,build,.next,.cache,.DS_Store\n" > .env; \ - echo "Created .env"; \ - fi - @mkdir -p "$(DATA_DIR)/chroma" "$(DATA_DIR)/sqlite" - docker compose up -d --build - @echo "Waiting for health..." 
- @for i in $$(seq 1 30); do \ - if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - echo "Healthy: http://localhost:$(PORT)"; \ - exit 0; \ - fi; \ - sleep 2; \ - done; \ - echo "ERROR: Failed to start. Run: make server-docker-logs"; exit 1 - -server-docker-stop: - docker compose down - -server-docker-restart: server-docker-stop server-docker-start - -server-docker-status: - @docker compose ps - @if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - curl -sf http://localhost:$(PORT)/health; echo; \ - fi - -server-docker-logs: - docker compose logs -f - -# ─── Server: CUDA (NVIDIA GPU) ────────────────────────────────────── - -server-cuda-start: - @if [ ! -f .env ]; then \ - echo "Generating .env..."; \ - API_KEY="cix_$$(openssl rand -hex 32)"; \ - printf "API_KEY=$$API_KEY\nPORT=$(PORT)\nEMBEDDING_MODEL=awhiteside/CodeRankEmbed-Q8_0-GGUF\nMAX_FILE_SIZE=524288\nEXCLUDED_DIRS=node_modules,.git,.venv,__pycache__,dist,build,.next,.cache,.DS_Store\n" > .env; \ - echo "Created .env"; \ - fi - @mkdir -p "$(DATA_DIR)/chroma" "$(DATA_DIR)/sqlite" - docker compose -f docker-compose.cuda.yml up -d --build - @echo "Waiting for health (CUDA)..." - @for i in $$(seq 1 45); do \ - if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - echo "Healthy (CUDA): http://localhost:$(PORT)"; \ - exit 0; \ - fi; \ - sleep 2; \ - done; \ - echo "ERROR: Failed to start. Run: make server-cuda-logs"; exit 1 - -server-cuda-stop: - docker compose -f docker-compose.cuda.yml down - -server-cuda-restart: server-cuda-stop server-cuda-start - -server-cuda-status: - @docker compose -f docker-compose.cuda.yml ps - @if curl -sf http://localhost:$(PORT)/health > /dev/null 2>&1; then \ - curl -sf http://localhost:$(PORT)/health; echo; \ - fi - -server-cuda-logs: - docker compose -f docker-compose.cuda.yml logs -f - -# ─── Build & Push ─────────────────────────────────────────────────── - -docker-setup: - @if ! docker buildx inspect cix-builder > /dev/null 2>&1; then \ - echo "Creating buildx builder 'cix-builder'..."; \ - docker buildx create --name cix-builder --driver docker-container --bootstrap; \ - fi - docker buildx use cix-builder - @echo "Builder ready. Run: docker login" - -docker-push-cuda: - docker buildx build \ - --builder cix-builder \ - --platform linux/amd64 \ - --tag $(DOCKER_USER)/$(IMAGE_NAME):latest-cu130 \ - --tag $(DOCKER_USER)/$(IMAGE_NAME):$(SERVER_VERSION)-cu130 \ - --file api/Dockerfile.cuda \ - --push \ - . - -docker-push-all: - docker buildx build \ - --builder cix-builder \ - --platform linux/arm64,linux/amd64 \ - --tag $(DOCKER_USER)/$(IMAGE_NAME):latest \ - --tag $(DOCKER_USER)/$(IMAGE_NAME):$(SERVER_VERSION) \ - --file api/Dockerfile \ - --push \ - . - -# ─── Tests ─────────────────────────────────────────────────────────── - -test-setup: - $(PYTHON) -m pip install -r api/requirements-dev.txt - -test: test-server test-client - -test-server: - $(PYTHON) -m pytest api/ -v; code=$$?; [ $$code -eq 5 ] && exit 0 || exit $$code - -test-client: - cd cli && go test -v ./... 
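-
-# NB: `test-server` treats pytest exit code 5 ("no tests were collected") as
-# success, so an empty api/ test tree does not fail `make test`.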
- -# ─── Help ──────────────────────────────────────────────────────────── - -help: - @echo "=== Claude Code Index ===" - @echo "" - @echo "Server — Local (native, MPS on Mac):" - @echo " server-local-setup First-time setup (installs uv, Python, deps)" - @echo " server-local-start Start server" - @echo " server-local-stop Stop server" - @echo " server-local-restart Restart server" - @echo " server-local-status Check status" - @echo " server-local-logs Tail logs" - @echo "" - @echo "Server — Docker (CPU):" - @echo " server-docker-start Start server" - @echo " server-docker-stop Stop server" - @echo " server-docker-restart Restart server" - @echo " server-docker-status Check status" - @echo " server-docker-logs Tail logs" - @echo "" - @echo "Server — CUDA (NVIDIA GPU):" - @echo " server-cuda-start Start server" - @echo " server-cuda-stop Stop server" - @echo " server-cuda-restart Restart server" - @echo " server-cuda-status Check status" - @echo " server-cuda-logs Tail logs" - @echo "" - @echo "Build & Push:" - @echo " docker-setup Create buildx builder (run once)" - @echo " docker-push-all Build & push :latest + :$(SERVER_VERSION) (multi-arch)" - @echo " docker-push-cuda Build & push :latest-cu130 + :$(SERVER_VERSION)-cu130" - @echo "" - @echo "Tests:" - @echo " test Run all tests" - @echo " test-server Python API tests" - @echo " test-client Go CLI tests" \ No newline at end of file diff --git a/legacy/python-api/README.md b/legacy/python-api/README.md deleted file mode 100644 index cd17c5c..0000000 --- a/legacy/python-api/README.md +++ /dev/null @@ -1,9 +0,0 @@ -This directory contains the Python FastAPI implementation of cix-server, -deprecated as of server/v0.3.0 (2026-04-24). - -The Go server (`server/`) replaces it with identical HTTP API contract, -better performance, and a pure-Go binary with no Python runtime dependency. - -See `doc/MIGRATION_FROM_PYTHON.md` for migration instructions. - -Timeline: will be deleted in server/v0.4.0 (~90 days from deprecation). diff --git a/legacy/python-api/app-root/Dockerfile b/legacy/python-api/app-root/Dockerfile deleted file mode 100644 index 3cca86e..0000000 --- a/legacy/python-api/app-root/Dockerfile +++ /dev/null @@ -1,53 +0,0 @@ -# Stage 1: builder — compile deps on Ubuntu 24.04 -FROM ubuntu:24.04 AS builder - -ENV DEBIAN_FRONTEND=noninteractive - -RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \ - python3 python3-dev python3-venv python3-pip \ - build-essential gcc curl cmake \ - && rm -rf /var/lib/apt/lists/* \ - && update-alternatives --install /usr/bin/python python /usr/bin/python3 1 - -# Clean pip/setuptools/wheel and reinstall patched versions -RUN rm -f /usr/lib/python3.12/EXTERNALLY-MANAGED && \ - apt-get purge -y python3-setuptools python3-wheel python3-pip 2>/dev/null; \ - curl -sS https://bootstrap.pypa.io/get-pip.py | python && \ - pip install --no-cache-dir "setuptools>=78.1.1" "wheel>=0.46.2" - -# Install all deps (llama-cpp-python will be compiled for CPU) -WORKDIR /build -COPY api/requirements.txt . 
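-# CMAKE_ARGS builds llama-cpp-python from source against OpenBLAS, so CPU-only
-# embedding inference still gets BLAS-accelerated matrix multiplies.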
-RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \ - pip install --no-cache-dir --prefix=/install -r requirements.txt && \ - pip install --no-cache-dir --prefix=/install --force-reinstall --no-deps packaging - -# Pre-download embedding model at build time -ARG EMBEDDING_MODEL="awhiteside/CodeRankEmbed-Q8_0-GGUF" -RUN PYTHONPATH=/install/local/lib/python3.12/dist-packages python -c \ - "from huggingface_hub import hf_hub_download, list_repo_files; \ - files = list_repo_files('${EMBEDDING_MODEL}'); \ - gguf_file = next((f for f in files if f.endswith('.gguf')), None); \ - hf_hub_download(repo_id='${EMBEDDING_MODEL}', filename=gguf_file)" - -# Stage 2: runtime — lightweight image without compilers -FROM ubuntu:24.04 - -ENV DEBIAN_FRONTEND=noninteractive - -RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \ - python3 curl libopenblas-dev \ - && rm -rf /var/lib/apt/lists/* \ - && update-alternatives --install /usr/bin/python python /usr/bin/python3 1 - -# Copy installed Python packages and model from builder -COPY --from=builder /install/local /usr/local -COPY --from=builder /root/.cache/huggingface /root/.cache/huggingface - -WORKDIR /app -COPY api/app/ ./app/ -RUN mkdir -p /data/chroma /data/sqlite -EXPOSE 21847 -HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \ - CMD curl -f http://localhost:21847/health || exit 1 -CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "21847"] diff --git a/legacy/python-api/app-root/Dockerfile.cuda b/legacy/python-api/app-root/Dockerfile.cuda deleted file mode 100644 index 7c8cf10..0000000 --- a/legacy/python-api/app-root/Dockerfile.cuda +++ /dev/null @@ -1,69 +0,0 @@ -# Stage 1: builder — compile deps in full devel image -FROM nvidia/cuda:12.6.3-devel-ubuntu24.04 AS builder - -ENV DEBIAN_FRONTEND=noninteractive - -RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \ - python3 python3-dev python3-venv python3-pip \ - build-essential gcc curl cmake \ - && rm -rf /var/lib/apt/lists/* \ - && update-alternatives --install /usr/bin/python python /usr/bin/python3 1 - -# Clean pip/setuptools/wheel and reinstall patched versions -RUN rm -f /usr/lib/python3.12/EXTERNALLY-MANAGED && \ - apt-get purge -y python3-setuptools python3-wheel python3-pip 2>/dev/null; \ - curl -sS https://bootstrap.pypa.io/get-pip.py | python && \ - pip install --no-cache-dir "setuptools>=78.1.1" "wheel>=0.46.2" - -# Install Python deps into /install prefix -WORKDIR /build -COPY api/requirements-cuda.txt requirements.txt - -# Make CUDA driver stub findable when linking the llama-cpp-python wheel. -# The devel image ships /usr/local/cuda/lib64/stubs/libcuda.so but tools like -# llama.cpp's mtmd-cli look for libcuda.so.1 — create the expected symlink and -# add the stub dir to LIBRARY_PATH (link-time search, runtime uses the driver). -RUN ln -sf /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 -ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LIBRARY_PATH} - -# Enable CUDA for llama-cpp-python. Skip llama.cpp tools/examples (mtmd-cli etc.) -# — we only need embeddings, and those binaries link against libcuda.so.1 which -# isn't available in the builder image (only the stub is). 
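-# (At runtime the NVIDIA container runtime injects the real libcuda.so.1; the
-# stub only ever participates in link-time resolution.)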
-RUN CMAKE_ARGS="-DGGML_CUDA=on -DLLAMA_BUILD_TOOLS=OFF -DLLAMA_BUILD_EXAMPLES=OFF" \ - LDFLAGS="-Wl,-rpath-link,/usr/local/cuda/lib64/stubs" \ - pip install --no-cache-dir --prefix=/install -r requirements.txt && \ - pip install --no-cache-dir --prefix=/install --force-reinstall --no-deps packaging - -# Pre-download embedding model at build time -ARG EMBEDDING_MODEL="awhiteside/CodeRankEmbed-Q8_0-GGUF" -RUN PYTHONPATH=/install/local/lib/python3.12/dist-packages python -c \ - "from huggingface_hub import hf_hub_download, list_repo_files; \ - files = list_repo_files('${EMBEDDING_MODEL}'); \ - gguf_file = next((f for f in files if f.endswith('.gguf')), None); \ - hf_hub_download(repo_id='${EMBEDDING_MODEL}', filename=gguf_file)" - -# Stage 2: runtime — lightweight image without compilers -FROM nvidia/cuda:12.6.3-runtime-ubuntu24.04 - -ENV DEBIAN_FRONTEND=noninteractive -ENV TZ=Etc/UTC -ENV NVIDIA_VISIBLE_DEVICES=all -ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility - -RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \ - python3 curl \ - && rm -rf /var/lib/apt/lists/* \ - && update-alternatives --install /usr/bin/python python /usr/bin/python3 1 - -# Copy installed Python packages and model from builder -COPY --from=builder /install/local /usr/local -COPY --from=builder /root/.cache/huggingface /root/.cache/huggingface - -WORKDIR /app -COPY api/app/ ./app/ -RUN mkdir -p /data/chroma /data/sqlite - -EXPOSE 21847 -HEALTHCHECK --interval=30s --timeout=10s --start-period=90s --retries=3 \ - CMD curl -f http://localhost:21847/health || exit 1 -CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "21847"] \ No newline at end of file diff --git a/legacy/python-api/app-root/app/__init__.py b/legacy/python-api/app-root/app/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/app-root/app/auth.py b/legacy/python-api/app-root/app/auth.py deleted file mode 100644 index 1cacbc7..0000000 --- a/legacy/python-api/app-root/app/auth.py +++ /dev/null @@ -1,18 +0,0 @@ -from fastapi import Depends, HTTPException, status -from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer - -from .config import settings - -_scheme = HTTPBearer() - - -async def verify_api_key( - credentials: HTTPAuthorizationCredentials = Depends(_scheme), -) -> str: - token = credentials.credentials - if not settings.api_key or token != settings.api_key: - raise HTTPException( - status_code=status.HTTP_401_UNAUTHORIZED, - detail="Invalid or missing API key", - ) - return token diff --git a/legacy/python-api/app-root/app/config.py b/legacy/python-api/app-root/app/config.py deleted file mode 100644 index 59330b6..0000000 --- a/legacy/python-api/app-root/app/config.py +++ /dev/null @@ -1,54 +0,0 @@ -import os -from pydantic_settings import BaseSettings, SettingsConfigDict - - -class Settings(BaseSettings): - api_key: str = "" - port: int = 21847 - embedding_model: str = "awhiteside/CodeRankEmbed-Q8_0-GGUF" - chroma_persist_dir: str = "/data/chroma" - sqlite_path: str = "/data/sqlite/projects.db" - max_file_size: int = 524288 - excluded_dirs: str = "node_modules,.git,.venv,__pycache__,dist,build,.next,.cache,.DS_Store" - - @property - def model_safe_name(self) -> str: - return self.embedding_model.replace("/", "_").replace("-", "_").lower() - - @property - def dynamic_chroma_persist_dir(self) -> str: - return f"{self.chroma_persist_dir}_{self.model_safe_name}" - - @property - def dynamic_sqlite_path(self) -> str: - base, ext = 
os.path.splitext(self.sqlite_path) - return f"{base}_{self.model_safe_name}{ext}" - - # Concurrent embedding calls. llama-cpp-python holds a single context per Llama - # instance, so parallel create_embedding() calls on the same model serialize - # anyway. Keep at 1 unless you instantiate separate models. - max_embedding_concurrency: int = 1 - - # Seconds an /index/files request waits for a free embedding slot before the - # server returns HTTP 503 with Retry-After (the Go client auto-retries). - # 0 = reject immediately. - embedding_queue_timeout: int = 300 - - # Maximum chunk length in tokens. 1 token ≈ 4 ASCII chars. - # The chunker enforces this via MAX_CHUNK_SIZE = max_chunk_tokens * 4. - # Also drives n_ctx for the llama.cpp context buffer. - max_chunk_tokens: int = 1500 - - model_config = SettingsConfigDict( - env_file=os.path.join(os.path.dirname(__file__), "../../.env"), - env_file_encoding="utf-8", - case_sensitive=False, - extra="ignore", - ) - - @property - def excluded_dirs_list(self) -> list[str]: - return [d.strip() for d in self.excluded_dirs.split(",") if d.strip()] - - -settings = Settings() diff --git a/legacy/python-api/app-root/app/core/__init__.py b/legacy/python-api/app-root/app/core/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/app-root/app/core/exceptions.py b/legacy/python-api/app-root/app/core/exceptions.py deleted file mode 100644 index 67214c9..0000000 --- a/legacy/python-api/app-root/app/core/exceptions.py +++ /dev/null @@ -1,15 +0,0 @@ -class ProjectNotFoundError(Exception): - def __init__(self, project_id: str): - self.project_id = project_id - super().__init__(f"Project not found: {project_id}") - - -class IndexingError(Exception): - def __init__(self, message: str, project_id: str | None = None): - self.project_id = project_id - super().__init__(message) - - -class AuthError(Exception): - def __init__(self, message: str = "Invalid or missing API key"): - super().__init__(message) diff --git a/legacy/python-api/app-root/app/core/language.py b/legacy/python-api/app-root/app/core/language.py deleted file mode 100644 index 7fe9a97..0000000 --- a/legacy/python-api/app-root/app/core/language.py +++ /dev/null @@ -1,88 +0,0 @@ -EXTENSION_MAP: dict[str, str] = { - # Systems / compiled - ".py": "python", - ".go": "go", - ".rs": "rust", - ".java": "java", - ".c": "c", - ".h": "c", - ".cpp": "cpp", - ".cc": "cpp", - ".cxx": "cpp", - ".hpp": "cpp", - ".cs": "c_sharp", - ".swift": "swift", - ".kt": "kotlin", - ".scala": "scala", - ".zig": "zig", - ".jl": "julia", - ".f90": "fortran", - ".f95": "fortran", - ".f03": "fortran", - ".f": "fortran", - ".m": "objc", - ".mm": "objc", - # Web / scripting - ".ts": "typescript", - ".tsx": "typescript", - ".js": "javascript", - ".jsx": "javascript", - ".rb": "ruby", - ".php": "php", - ".lua": "lua", - ".sh": "bash", - ".bash": "bash", - ".zsh": "bash", - ".r": "r", - ".R": "r", - ".dart": "dart", - ".ex": "elixir", - ".exs": "elixir", - ".erl": "erlang", - ".hs": "haskell", - ".ml": "ocaml", - ".lisp": "commonlisp", - ".cl": "commonlisp", - ".svelte": "svelte", - # Markup / config / data - ".html": "html", - ".css": "css", - ".scss": "scss", - ".sql": "sql", - ".yaml": "yaml", - ".yml": "yaml", - ".json": "json", - ".toml": "toml", - ".xml": "xml", - ".md": "markdown", - ".graphql": "graphql", - ".gql": "graphql", - ".re": "regex", - # Infra / build - ".tf": "hcl", - ".hcl": "hcl", - ".cmake": "cmake", - "CMakeLists.txt": "cmake", - "Makefile": "make", - "Dockerfile": "dockerfile", -} - 
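-# NB: the filename-style keys above ("CMakeLists.txt", "Makefile", "Dockerfile")
-# can never match Path.suffix; FILENAME_MAP below is what actually resolves them.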
-# Filename-based detection (no extension or special names) -FILENAME_MAP: dict[str, str] = { - "CMakeLists.txt": "cmake", - "Makefile": "make", - "GNUmakefile": "make", - "Dockerfile": "dockerfile", -} - - -def detect_language(file_path: str) -> str | None: - from pathlib import Path - p = Path(file_path) - # Check filename first (Makefile, Dockerfile, CMakeLists.txt) - name = p.name - lang = FILENAME_MAP.get(name) - if lang: - return lang - ext = p.suffix.lower() - return EXTENSION_MAP.get(ext) diff --git a/legacy/python-api/app-root/app/core/path_encoding.py b/legacy/python-api/app-root/app/core/path_encoding.py deleted file mode 100644 index 412d9da..0000000 --- a/legacy/python-api/app-root/app/core/path_encoding.py +++ /dev/null @@ -1,29 +0,0 @@ -"""Project path hashing for safe URL routing. - -Project paths contain slashes which conflict with FastAPI path routing. -We use SHA1 hash as a short, URL-safe identifier for projects. -The actual path is stored in the database and resolved by hash. -""" - -import hashlib - - -def hash_project_path(path: str) -> str: - """Compute SHA1 hash of a project path (first 16 hex chars).""" - return hashlib.sha1(path.encode()).hexdigest()[:16] - - -async def resolve_project_path(path_hash: str) -> str: - """Look up the actual project path by its SHA1 hash prefix.""" - from ..core.exceptions import ProjectNotFoundError - from ..database import get_db - - db = await get_db() - cursor = await db.execute("SELECT host_path FROM projects") - rows = await cursor.fetchall() - - for row in rows: - if hash_project_path(row["host_path"]) == path_hash: - return row["host_path"] - - raise ProjectNotFoundError(path_hash) diff --git a/legacy/python-api/app-root/app/database.py b/legacy/python-api/app-root/app/database.py deleted file mode 100644 index a964e05..0000000 --- a/legacy/python-api/app-root/app/database.py +++ /dev/null @@ -1,101 +0,0 @@ -import aiosqlite -from pathlib import Path - -from .config import settings - -_db: aiosqlite.Connection | None = None - -_SCHEMA = """ -CREATE TABLE IF NOT EXISTS projects ( - host_path TEXT PRIMARY KEY, - container_path TEXT NOT NULL, - languages TEXT DEFAULT '[]', - settings TEXT DEFAULT '{}', - stats TEXT DEFAULT '{"total_files":0,"indexed_files":0,"total_chunks":0,"total_symbols":0}', - status TEXT DEFAULT 'created', - created_at TEXT NOT NULL, - updated_at TEXT NOT NULL, - last_indexed_at TEXT -); - -CREATE TABLE IF NOT EXISTS file_hashes ( - project_path TEXT NOT NULL, - file_path TEXT NOT NULL, - content_hash TEXT NOT NULL, - indexed_at TEXT NOT NULL, - PRIMARY KEY (project_path, file_path), - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE -); - -CREATE TABLE IF NOT EXISTS symbols ( - id TEXT PRIMARY KEY, - project_path TEXT NOT NULL, - name TEXT NOT NULL, - kind TEXT NOT NULL, - file_path TEXT NOT NULL, - line INTEGER NOT NULL, - end_line INTEGER NOT NULL, - language TEXT NOT NULL, - signature TEXT, - parent_name TEXT, - docstring TEXT, - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE -); - -CREATE INDEX IF NOT EXISTS idx_symbols_project_name ON symbols(project_path, name); -CREATE INDEX IF NOT EXISTS idx_symbols_project_kind ON symbols(project_path, kind); -CREATE INDEX IF NOT EXISTS idx_symbols_project_file ON symbols(project_path, file_path); - -CREATE TABLE IF NOT EXISTS refs ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - project_path TEXT NOT NULL, - name TEXT NOT NULL, - file_path TEXT NOT NULL, - line INTEGER NOT NULL, - col INTEGER NOT NULL, - language 
TEXT NOT NULL, - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE -); - -CREATE INDEX IF NOT EXISTS idx_refs_project_name ON refs(project_path, name); -CREATE INDEX IF NOT EXISTS idx_refs_project_file ON refs(project_path, file_path); - -CREATE TABLE IF NOT EXISTS index_runs ( - id TEXT PRIMARY KEY, - project_path TEXT NOT NULL, - started_at TEXT NOT NULL, - completed_at TEXT, - files_processed INTEGER DEFAULT 0, - files_total INTEGER DEFAULT 0, - chunks_created INTEGER DEFAULT 0, - status TEXT DEFAULT 'running', - error_message TEXT, - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE -); -""" - - -async def init_db() -> aiosqlite.Connection: - global _db - db_path = Path(settings.dynamic_sqlite_path) - db_path.parent.mkdir(parents=True, exist_ok=True) - _db = await aiosqlite.connect(str(db_path)) - _db.row_factory = aiosqlite.Row - await _db.execute("PRAGMA journal_mode=WAL") - await _db.execute("PRAGMA foreign_keys=ON") - await _db.executescript(_SCHEMA) - await _db.commit() - return _db - - -async def get_db() -> aiosqlite.Connection: - if _db is None: - raise RuntimeError("Database not initialized") - return _db - - -async def close_db() -> None: - global _db - if _db is not None: - await _db.close() - _db = None diff --git a/legacy/python-api/app-root/app/main.py b/legacy/python-api/app-root/app/main.py deleted file mode 100644 index ee2902b..0000000 --- a/legacy/python-api/app-root/app/main.py +++ /dev/null @@ -1,69 +0,0 @@ -import logging -from contextlib import asynccontextmanager - -from fastapi import FastAPI, Request -from fastapi.responses import JSONResponse - -from .config import settings -from .core.exceptions import ProjectNotFoundError, IndexingError, AuthError -from .database import init_db, close_db -from .routers import health, projects, indexing, search - -from .version import SERVER_VERSION - -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(__name__) - - -@asynccontextmanager -async def lifespan(app: FastAPI): - logger.info("Starting up (v%s) — initializing database...", SERVER_VERSION) - await init_db() - logger.info("Database initialized") - - logger.info("Loading embedding model: %s", settings.embedding_model) - from .services.embeddings import embedding_service - await embedding_service.load_model() - logger.info("Embedding model loaded") - - yield - - logger.info("Shutting down...") - await close_db() - - -app = FastAPI( - title="Claude Code Index API", - version=SERVER_VERSION, - lifespan=lifespan, -) - - -@app.middleware("http") -async def log_client_version(request: Request, call_next): - client_version = request.headers.get("X-Client-Version", "unknown") - if client_version != "unknown": - logger.info("Request from client version: %s", client_version) - response = await call_next(request) - return response - - -app.include_router(health.router) -app.include_router(projects.router) -app.include_router(indexing.router) -app.include_router(search.router) - - -@app.exception_handler(ProjectNotFoundError) -async def project_not_found_handler(request: Request, exc: ProjectNotFoundError): - return JSONResponse(status_code=404, content={"detail": str(exc)}) - - -@app.exception_handler(IndexingError) -async def indexing_error_handler(request: Request, exc: IndexingError): - return JSONResponse(status_code=500, content={"detail": str(exc)}) - - -@app.exception_handler(AuthError) -async def auth_error_handler(request: Request, exc: AuthError): - return JSONResponse(status_code=401, 
content={"detail": str(exc)}) diff --git a/legacy/python-api/app-root/app/routers/__init__.py b/legacy/python-api/app-root/app/routers/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/app-root/app/routers/health.py b/legacy/python-api/app-root/app/routers/health.py deleted file mode 100644 index 0857cbd..0000000 --- a/legacy/python-api/app-root/app/routers/health.py +++ /dev/null @@ -1,36 +0,0 @@ -from fastapi import APIRouter, Depends - -from ..auth import verify_api_key -from ..database import get_db - -from ..version import SERVER_VERSION, API_VERSION - -router = APIRouter() - - -@router.get("/health") -async def health(): - return {"status": "ok"} - - -@router.get("/api/v1/status", dependencies=[Depends(verify_api_key)]) -async def status(): - db = await get_db() - cursor = await db.execute("SELECT COUNT(*) FROM projects") - row = await cursor.fetchone() - project_count = row[0] if row else 0 - - cursor = await db.execute( - "SELECT COUNT(*) FROM index_runs WHERE status = 'running'" - ) - row = await cursor.fetchone() - active_jobs = row[0] if row else 0 - - return { - "status": "ok", - "server_version": SERVER_VERSION, - "api_version": API_VERSION, - "model_loaded": True, - "projects": project_count, - "active_indexing_jobs": active_jobs, - } diff --git a/legacy/python-api/app-root/app/routers/indexing.py b/legacy/python-api/app-root/app/routers/indexing.py deleted file mode 100644 index 665d850..0000000 --- a/legacy/python-api/app-root/app/routers/indexing.py +++ /dev/null @@ -1,157 +0,0 @@ -from ..core.path_encoding import resolve_project_path - -from fastapi import APIRouter, Depends, HTTPException, status - -from ..auth import verify_api_key -from ..core.exceptions import ProjectNotFoundError -from ..database import get_db -from ..schemas.indexing import ( - IndexBeginRequest, - IndexBeginResponse, - IndexFilesRequest, - IndexFilesResponse, - IndexFinishRequest, - IndexFinishResponse, - IndexProgressResponse, - IndexRequest, - IndexTriggerResponse, -) -from ..services.embeddings import EmbeddingBusyError -from ..services.indexer import indexer_service - -router = APIRouter( - prefix="/api/v1/projects", - dependencies=[Depends(verify_api_key)], -) - - -async def _ensure_project(project_path: str): - db = await get_db() - cursor = await db.execute("SELECT host_path FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - if not row: - raise ProjectNotFoundError(project_path) - - -@router.post( - "/{project_path}/index", - status_code=status.HTTP_202_ACCEPTED, - response_model=IndexTriggerResponse, -) -async def trigger_index(project_path: str, body: IndexRequest | None = None): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - full = body.full if body else False - batch_size = body.batch_size if body else 20 - run_id = await indexer_service.start_indexing(project_path, full=full, batch_size=batch_size) - return IndexTriggerResponse( - run_id=run_id, - message="Indexing started" if not full else "Full reindex started", - ) - - -@router.get("/{project_path}/index/status", response_model=IndexProgressResponse) -async def index_status(project_path: str): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - progress = await indexer_service.get_progress(project_path) - - if progress is None: - # Check last run - db = await get_db() - cursor = await db.execute( - "SELECT * FROM index_runs WHERE project_path = ? 
ORDER BY started_at DESC LIMIT 1", - (project_path,), - ) - row = await cursor.fetchone() - if row: - return IndexProgressResponse( - status=row["status"], - progress={ - "files_processed": row["files_processed"], - "files_total": row["files_total"], - "chunks_created": row["chunks_created"], - }, - ) - return IndexProgressResponse(status="idle") - - return IndexProgressResponse( - status=progress.status, - progress={ - "phase": progress.phase, - "files_discovered": progress.files_discovered, - "files_processed": progress.files_processed, - "files_total": progress.files_total, - "chunks_created": progress.chunks_created, - "elapsed_seconds": round(progress.elapsed_seconds, 1), - "estimated_remaining": round(progress.estimated_remaining, 1), - }, - ) - - -@router.post("/{project_path}/index/cancel") -async def cancel_index(project_path: str): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - cancelled = await indexer_service.cancel(project_path) - if not cancelled: - raise HTTPException( - status_code=status.HTTP_404_NOT_FOUND, - detail="No active indexing job found", - ) - return {"message": "Indexing cancellation requested"} - - -# --- New three-phase protocol endpoints --- - -@router.post( - "/{project_path}/index/begin", - response_model=IndexBeginResponse, -) -async def begin_index(project_path: str, body: IndexBeginRequest | None = None): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - full = body.full if body else False - run_id, stored_hashes = await indexer_service.begin_indexing(project_path, full=full) - return IndexBeginResponse(run_id=run_id, stored_hashes=stored_hashes) - - -@router.post( - "/{project_path}/index/files", - response_model=IndexFilesResponse, -) -async def index_files(project_path: str, body: IndexFilesRequest): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - try: - files_accepted, chunks_created, total = await indexer_service.process_files( - project_path, body.run_id, body.files, - ) - except EmbeddingBusyError as exc: - raise HTTPException( - status_code=status.HTTP_503_SERVICE_UNAVAILABLE, - detail=f"GPU is busy processing another embedding request, retry after {exc.retry_after}s", - headers={"Retry-After": str(exc.retry_after)}, - ) - return IndexFilesResponse( - files_accepted=files_accepted, - chunks_created=chunks_created, - files_processed_total=total, - ) - - -@router.post( - "/{project_path}/index/finish", - response_model=IndexFinishResponse, -) -async def finish_index(project_path: str, body: IndexFinishRequest): - project_path = await resolve_project_path(project_path) - await _ensure_project(project_path) - status_str, files_processed, chunks_created = await indexer_service.finish_indexing( - project_path, body.run_id, body.deleted_paths, body.total_files_discovered, - ) - return IndexFinishResponse( - status=status_str, - files_processed=files_processed, - chunks_created=chunks_created, - ) diff --git a/legacy/python-api/app-root/app/routers/projects.py b/legacy/python-api/app-root/app/routers/projects.py deleted file mode 100644 index e65dedd..0000000 --- a/legacy/python-api/app-root/app/routers/projects.py +++ /dev/null @@ -1,141 +0,0 @@ -import json -from datetime import datetime, timezone -from ..core.path_encoding import resolve_project_path - -from fastapi import APIRouter, Depends, HTTPException, status - -from ..auth import verify_api_key -from ..core.exceptions import ProjectNotFoundError 
-from ..database import get_db -from ..schemas.project import ( - ProjectCreate, - ProjectListResponse, - ProjectResponse, - ProjectSettings, - ProjectStats, - ProjectUpdate, -) - -router = APIRouter( - prefix="/api/v1/projects", - dependencies=[Depends(verify_api_key)], -) - - -def _row_to_project(row) -> ProjectResponse: - return ProjectResponse( - host_path=row["host_path"], - container_path=row["container_path"], - languages=json.loads(row["languages"]), - settings=ProjectSettings(**json.loads(row["settings"])), - stats=ProjectStats(**json.loads(row["stats"])), - status=row["status"], - created_at=row["created_at"], - updated_at=row["updated_at"], - last_indexed_at=row["last_indexed_at"], - ) - - -@router.post("", status_code=status.HTTP_201_CREATED, response_model=ProjectResponse) -async def create_project(body: ProjectCreate): - db = await get_db() - now = datetime.now(timezone.utc).isoformat() - container_path = body.host_path - default_settings = ProjectSettings() - default_stats = ProjectStats() - - try: - await db.execute( - """INSERT INTO projects (host_path, container_path, languages, settings, stats, status, created_at, updated_at) - VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", - ( - body.host_path, - container_path, - "[]", - default_settings.model_dump_json(), - default_stats.model_dump_json(), - "created", - now, - now, - ), - ) - await db.commit() - except Exception as e: - if "UNIQUE" in str(e): - raise HTTPException( - status_code=status.HTTP_409_CONFLICT, - detail=f"Project at path '{body.host_path}' already exists", - ) - raise - - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (body.host_path,)) - row = await cursor.fetchone() - return _row_to_project(row) - - -@router.get("", response_model=ProjectListResponse) -async def list_projects(): - db = await get_db() - cursor = await db.execute("SELECT * FROM projects ORDER BY created_at DESC") - rows = await cursor.fetchall() - projects = [_row_to_project(row) for row in rows] - return ProjectListResponse(projects=projects, total=len(projects)) - - -@router.get("/{project_path}", response_model=ProjectResponse) -async def get_project(project_path: str): - project_path = await resolve_project_path(project_path) - db = await get_db() - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - if not row: - raise ProjectNotFoundError(project_path) - return _row_to_project(row) - - -@router.patch("/{project_path}", response_model=ProjectResponse) -async def update_project(project_path: str, body: ProjectUpdate): - project_path = await resolve_project_path(project_path) - db = await get_db() - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - if not row: - raise ProjectNotFoundError(project_path) - - now = datetime.now(timezone.utc).isoformat() - updates = [] - values = [] - - if body.settings is not None: - updates.append("settings = ?") - values.append(body.settings.model_dump_json()) - - if updates: - updates.append("updated_at = ?") - values.append(now) - values.append(project_path) - await db.execute( - f"UPDATE projects SET {', '.join(updates)} WHERE host_path = ?", values - ) - await db.commit() - - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - return _row_to_project(row) - - -@router.delete("/{project_path}", status_code=status.HTTP_204_NO_CONTENT) -async def delete_project(project_path: str): 
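-    # Drop the Chroma collection and the projects row; file_hashes, symbols,
-    # and refs rows go with it via ON DELETE CASCADE (foreign_keys=ON).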
- project_path = await resolve_project_path(project_path) - db = await get_db() - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - if not row: - raise ProjectNotFoundError(project_path) - - # Delete ChromaDB collection - from ..services.vector_store import vector_store_service - vector_store_service.delete_collection(project_path) - - await db.execute("DELETE FROM projects WHERE host_path = ?", (project_path,)) - await db.commit() diff --git a/legacy/python-api/app-root/app/routers/search.py b/legacy/python-api/app-root/app/routers/search.py deleted file mode 100644 index b843dc6..0000000 --- a/legacy/python-api/app-root/app/routers/search.py +++ /dev/null @@ -1,280 +0,0 @@ -import json -import time -from collections import Counter -from pathlib import Path -from ..core.path_encoding import resolve_project_path - -from fastapi import APIRouter, Depends - -from ..auth import verify_api_key -from ..core.exceptions import ProjectNotFoundError -from ..database import get_db -from ..schemas.search import ( - DefinitionItem, - DefinitionRequest, - DefinitionResponse, - FileResultItem, - FileSearchRequest, - FileSearchResponse, - ProjectSummary, - ReferenceItem, - ReferenceRequest, - ReferenceResponse, - SearchRequest, - SearchResponse, - SearchResultItem, - SymbolResultItem, - SymbolSearchRequest, - SymbolSearchResponse, -) -from ..services.embeddings import embedding_service -from ..services.reference_index import reference_index_service -from ..services.symbol_index import symbol_index_service -from ..services.vector_store import vector_store_service - -router = APIRouter( - prefix="/api/v1/projects", - dependencies=[Depends(verify_api_key)], -) - - -async def _get_project(project_path: str): - db = await get_db() - cursor = await db.execute("SELECT * FROM projects WHERE host_path = ?", (project_path,)) - row = await cursor.fetchone() - if not row: - raise ProjectNotFoundError(project_path) - return row - - -@router.post("/{project_path}/search", response_model=SearchResponse) -async def semantic_search(project_path: str, body: SearchRequest): - project_path = await resolve_project_path(project_path) - await _get_project(project_path) - start = time.time() - - query_embedding = await embedding_service.embed_query(body.query) - - where = {} - if body.languages: - if len(body.languages) == 1: - where["language"] = body.languages[0] - else: - where["$or"] = [{"language": lang} for lang in body.languages] - - results = await vector_store_service.search( - project_path, query_embedding, limit=body.limit * 2, where=where or None, - ) - - # Filter by min_score and path patterns - filtered = [] - for r in results: - if r["score"] < body.min_score: - continue - if body.paths: - if not any(r["file_path"].startswith(p) or p in r["file_path"] for p in body.paths): - continue - filtered.append(r) - - filtered = filtered[:body.limit] - elapsed = (time.time() - start) * 1000 - - return SearchResponse( - results=[SearchResultItem(**r) for r in filtered], - total=len(filtered), - query_time_ms=round(elapsed, 1), - ) - - -@router.post("/{project_path}/search/symbols", response_model=SymbolSearchResponse) -async def symbol_search(project_path: str, body: SymbolSearchRequest): - project_path = await resolve_project_path(project_path) - await _get_project(project_path) - - symbols = await symbol_index_service.search( - project_path, body.query, kinds=body.kinds or None, limit=body.limit, - ) - - results = [ - SymbolResultItem( - name=s.name, 
- kind=s.kind, - file_path=s.file_path, - line=s.line, - end_line=s.end_line, - language=s.language, - signature=s.signature, - parent_name=s.parent_name, - ) - for s in symbols - ] - - return SymbolSearchResponse(results=results, total=len(results)) - - -@router.post("/{project_path}/search/files", response_model=FileSearchResponse) -async def file_search(project_path: str, body: FileSearchRequest): - project_path = await resolve_project_path(project_path) - await _get_project(project_path) - - db = await get_db() - cursor = await db.execute( - "SELECT file_path FROM file_hashes WHERE project_path = ? AND file_path LIKE ?", - (project_path, f"%{body.query}%"), - ) - rows = await cursor.fetchall() - - from ..core.language import detect_language - - results = [] - for row in rows[:body.limit]: - fp = row["file_path"] - results.append(FileResultItem(file_path=fp, language=detect_language(fp))) - - return FileSearchResponse(results=results, total=len(results)) - - -@router.post("/{project_path}/search/definitions", response_model=DefinitionResponse) -async def definition_search(project_path: str, body: DefinitionRequest): - """Go to Definition — find where a symbol is defined.""" - project_path = await resolve_project_path(project_path) - await _get_project(project_path) - - db = await get_db() - - # Exact name match in symbols table - sql = "SELECT * FROM symbols WHERE project_path = ? AND name = ?" - params: list = [project_path, body.symbol] - - if body.kind: - sql += " AND kind = ?" - params.append(body.kind) - - if body.file_path: - sql += " AND file_path = ?" - params.append(body.file_path) - - sql += " ORDER BY name LIMIT ?" - params.append(body.limit) - - cursor = await db.execute(sql, params) - rows = await cursor.fetchall() - - # If no exact match, try case-insensitive - if not rows: - sql = "SELECT * FROM symbols WHERE project_path = ? AND name LIKE ?" - params = [project_path, body.symbol] - - if body.kind: - sql += " AND kind = ?" - params.append(body.kind) - - if body.file_path: - sql += " AND file_path = ?" - params.append(body.file_path) - - sql += " ORDER BY name LIMIT ?" 
- params.append(body.limit) - - cursor = await db.execute(sql, params) - rows = await cursor.fetchall() - - results = [ - DefinitionItem( - name=row["name"], - kind=row["kind"], - file_path=row["file_path"], - line=row["line"], - end_line=row["end_line"], - language=row["language"], - signature=row["signature"], - parent_name=row["parent_name"], - ) - for row in rows - ] - - return DefinitionResponse(results=results, total=len(results)) - - -@router.post("/{project_path}/search/references", response_model=ReferenceResponse) -async def reference_search(project_path: str, body: ReferenceRequest): - """Find References — find all places where a symbol is used (AST-based).""" - project_path = await resolve_project_path(project_path) - await _get_project(project_path) - - refs = await reference_index_service.search( - project_path, body.symbol, file_path=body.file_path, limit=body.limit, - ) - - results = [ - ReferenceItem( - file_path=ref.file_path, - start_line=ref.line, - end_line=ref.line, - content="", - chunk_type="reference", - symbol_name=ref.name, - language=ref.language, - ) - for ref in refs - ] - - return ReferenceResponse(results=results, total=len(results)) - - -@router.get("/{project_path}/summary", response_model=ProjectSummary) -async def project_summary(project_path: str): - project_path = await resolve_project_path(project_path) - project = await _get_project(project_path) - stats = json.loads(project["stats"]) - languages = json.loads(project["languages"]) - - # Top directories - db = await get_db() - cursor = await db.execute( - "SELECT file_path FROM file_hashes WHERE project_path = ?", - (project_path,), - ) - rows = await cursor.fetchall() - - dir_counter: Counter = Counter() - for row in rows: - parts = Path(row["file_path"]).parts - if len(parts) > 3: - dir_counter[str(Path(*parts[:4]))] += 1 - elif len(parts) > 1: - dir_counter[str(Path(*parts[:2]))] += 1 - - top_dirs = [ - {"path": path, "file_count": count} - for path, count in dir_counter.most_common(10) - ] - - # Recent symbols + accurate count directly from DB - cursor = await db.execute( - "SELECT name, kind, file_path, language FROM symbols WHERE project_path = ? 
LIMIT 20", - (project_path,), - ) - symbol_rows = await cursor.fetchall() - recent_symbols = [ - {"name": r["name"], "kind": r["kind"], "file_path": r["file_path"], "language": r["language"]} - for r in symbol_rows - ] - - cursor = await db.execute( - "SELECT COUNT(*) as cnt FROM symbols WHERE project_path = ?", - (project_path,), - ) - row = await cursor.fetchone() - total_symbols = row["cnt"] if row else 0 - - return ProjectSummary( - host_path=project_path, - status=project["status"], - languages=languages, - total_files=stats.get("total_files", 0), - total_chunks=stats.get("total_chunks", 0), - total_symbols=total_symbols, - top_directories=top_dirs, - recent_symbols=recent_symbols, - ) diff --git a/legacy/python-api/app-root/app/schemas/__init__.py b/legacy/python-api/app-root/app/schemas/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/app-root/app/schemas/common.py b/legacy/python-api/app-root/app/schemas/common.py deleted file mode 100644 index d5511be..0000000 --- a/legacy/python-api/app-root/app/schemas/common.py +++ /dev/null @@ -1,9 +0,0 @@ -from pydantic import BaseModel - - -class ErrorResponse(BaseModel): - detail: str - - -class MessageResponse(BaseModel): - message: str diff --git a/legacy/python-api/app-root/app/schemas/indexing.py b/legacy/python-api/app-root/app/schemas/indexing.py deleted file mode 100644 index 0346429..0000000 --- a/legacy/python-api/app-root/app/schemas/indexing.py +++ /dev/null @@ -1,59 +0,0 @@ -from pydantic import BaseModel, Field - - -class IndexRequest(BaseModel): - """DEPRECATED: Use the three-phase protocol instead.""" - full: bool = False - batch_size: int = 20 # files per batch (lower = less memory, slower) - - -class IndexProgressResponse(BaseModel): - status: str # idle|queued|indexing|completed|failed|cancelled - progress: dict | None = None - - -class IndexTriggerResponse(BaseModel): - run_id: str - message: str - - -# --- New three-phase protocol --- - -class IndexBeginRequest(BaseModel): - full: bool = False - - -class IndexBeginResponse(BaseModel): - run_id: str - stored_hashes: dict[str, str] # {file_path: sha256_hash} - - -class FilePayload(BaseModel): - path: str - content: str - content_hash: str # SHA-256 - language: str | None = None - size: int = 0 - - -class IndexFilesRequest(BaseModel): - run_id: str - files: list[FilePayload] = Field(..., max_length=50) - - -class IndexFilesResponse(BaseModel): - files_accepted: int - chunks_created: int - files_processed_total: int - - -class IndexFinishRequest(BaseModel): - run_id: str - deleted_paths: list[str] = [] - total_files_discovered: int = 0 - - -class IndexFinishResponse(BaseModel): - status: str - files_processed: int - chunks_created: int diff --git a/legacy/python-api/app-root/app/schemas/project.py b/legacy/python-api/app-root/app/schemas/project.py deleted file mode 100644 index c71b3fe..0000000 --- a/legacy/python-api/app-root/app/schemas/project.py +++ /dev/null @@ -1,42 +0,0 @@ -from datetime import datetime - -from pydantic import BaseModel, Field - - -class ProjectSettings(BaseModel): - exclude_patterns: list[str] = Field( - default_factory=lambda: ["node_modules", ".git", ".venv", "__pycache__", "dist", "build", ".next", ".cache", ".DS_Store"] - ) - max_file_size: int = 524288 - - -class ProjectStats(BaseModel): - total_files: int = 0 - indexed_files: int = 0 - total_chunks: int = 0 - total_symbols: int = 0 - - -class ProjectCreate(BaseModel): - host_path: str - - -class ProjectUpdate(BaseModel): - settings: ProjectSettings | 
None = None - - -class ProjectResponse(BaseModel): - host_path: str - container_path: str - languages: list[str] = Field(default_factory=list) - settings: ProjectSettings = Field(default_factory=ProjectSettings) - stats: ProjectStats = Field(default_factory=ProjectStats) - status: str = "created" - created_at: datetime - updated_at: datetime - last_indexed_at: datetime | None = None - - -class ProjectListResponse(BaseModel): - projects: list[ProjectResponse] - total: int diff --git a/legacy/python-api/app-root/app/schemas/search.py b/legacy/python-api/app-root/app/schemas/search.py deleted file mode 100644 index 1a6c8eb..0000000 --- a/legacy/python-api/app-root/app/schemas/search.py +++ /dev/null @@ -1,118 +0,0 @@ -from pydantic import BaseModel, Field - - -class SearchRequest(BaseModel): - query: str - limit: int = 10 - languages: list[str] = Field(default_factory=list) - paths: list[str] = Field(default_factory=list) - min_score: float = 0.1 - - -class SymbolSearchRequest(BaseModel): - query: str - kinds: list[str] = Field(default_factory=list) - limit: int = 20 - - -class FileSearchRequest(BaseModel): - query: str - limit: int = 20 - - -class SearchResultItem(BaseModel): - file_path: str - start_line: int - end_line: int - content: str - score: float - chunk_type: str - symbol_name: str - language: str - - -class SearchResponse(BaseModel): - results: list[SearchResultItem] - total: int - query_time_ms: float - - -class SymbolResultItem(BaseModel): - name: str - kind: str - file_path: str - line: int - end_line: int - language: str - signature: str | None = None - parent_name: str | None = None - - -class SymbolSearchResponse(BaseModel): - results: list[SymbolResultItem] - total: int - - -class FileResultItem(BaseModel): - file_path: str - language: str | None - - -class FileSearchResponse(BaseModel): - results: list[FileResultItem] - total: int - - -class DefinitionRequest(BaseModel): - symbol: str - kind: str | None = None # function|class|method|type - file_path: str | None = None # narrow to a specific file - limit: int = 10 - - -class DefinitionItem(BaseModel): - name: str - kind: str - file_path: str - line: int - end_line: int - language: str - signature: str | None = None - parent_name: str | None = None - - -class DefinitionResponse(BaseModel): - results: list[DefinitionItem] - total: int - - -class ReferenceRequest(BaseModel): - symbol: str - limit: int = 50 - file_path: str | None = None # narrow to a specific file - - -class ReferenceItem(BaseModel): - file_path: str - start_line: int - end_line: int - content: str - chunk_type: str - symbol_name: str - language: str - - -class ReferenceResponse(BaseModel): - results: list[ReferenceItem] - total: int - - -class ProjectSummary(BaseModel): - host_path: str - status: str - languages: list[str] - total_files: int - total_chunks: int - total_symbols: int - top_directories: list[dict] - recent_symbols: list[dict] diff --git a/legacy/python-api/app-root/app/services/__init__.py b/legacy/python-api/app-root/app/services/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/app-root/app/services/chunker.py b/legacy/python-api/app-root/app/services/chunker.py deleted file mode 100644 index 8b8a272..0000000 --- a/legacy/python-api/app-root/app/services/chunker.py +++ /dev/null @@ -1,454 +0,0 @@ -import logging -from dataclasses import dataclass - -logger = logging.getLogger(__name__) - -# Tree-sitter node types to extract per language -LANGUAGE_NODES: dict[str, dict[str, list[str]]] = { - "python": { 
- "function": ["function_definition"], - "class": ["class_definition"], - }, - "typescript": { - "function": ["function_declaration", "arrow_function"], - "class": ["class_declaration"], - "method": ["method_definition"], - "type": ["interface_declaration", "type_alias_declaration"], - }, - "javascript": { - "function": ["function_declaration", "arrow_function"], - "class": ["class_declaration"], - "method": ["method_definition"], - }, - "go": { - "function": ["function_declaration"], - "method": ["method_declaration"], - "type": ["type_spec"], - }, - "rust": { - "function": ["function_item"], - "class": ["struct_item", "enum_item"], - "type": ["trait_item"], - }, - "java": { - "function": ["method_declaration"], - "class": ["class_declaration"], - "type": ["interface_declaration"], - }, -} - -from ..config import settings - -MAX_CHUNK_SIZE = settings.max_chunk_tokens * 4 # chars; 1 token ≈ 4 ASCII chars - -# Identifier leaf-node types per language (for reference extraction) -IDENTIFIER_NODES: dict[str, set[str]] = { - "python": {"identifier"}, - "typescript": {"identifier", "type_identifier", "property_identifier"}, - "javascript": {"identifier", "property_identifier"}, - "go": {"identifier", "type_identifier", "field_identifier"}, - "rust": {"identifier", "type_identifier", "field_identifier"}, - "java": {"identifier", "type_identifier"}, -} - -# Names to skip when extracting references (keywords, builtins, noise) -SKIP_NAMES: set[str] = { - # Python - "self", "cls", "None", "True", "False", "print", "len", "range", "type", - "list", "dict", "set", "tuple", "int", "str", "float", "bool", "bytes", - "object", "Exception", "isinstance", "hasattr", "getattr", "setattr", - # JS/TS - "undefined", "null", "true", "false", "console", "window", "document", - "Array", "Object", "String", "Number", "Boolean", "Promise", "Map", "Set", - # Go - "nil", "fmt", "err", "ctx", - # Rust - "Ok", "Err", "Some", - # Common - "this", "super", "void", -} - -MIN_REF_NAME_LENGTH = 2 - - -@dataclass -class ReferenceInfo: - name: str - file_path: str - line: int # 1-based - col: int # 0-based - language: str - - -@dataclass -class ChunkResult: - chunks: list["CodeChunk"] - references: list[ReferenceInfo] - - -@dataclass -class CodeChunk: - content: str - chunk_type: str # function|class|method|type|module|block - file_path: str - start_line: int - end_line: int - language: str - symbol_name: str | None - symbol_signature: str | None - parent_name: str | None - - -class ChunkerService: - def __init__(self): - self._parsers: dict[str, object] = {} - - def chunk_file(self, file_path: str, content: str, language: str) -> ChunkResult: - try: - return self._chunk_with_treesitter(content, language, file_path) - except Exception as e: - logger.debug("Tree-sitter failed for %s (%s): %s, falling back to sliding window", file_path, language, e) - return ChunkResult( - chunks=self._chunk_sliding_window(content, file_path, language), - references=[], - ) - - # Map cix language names to (module_name, language_function_name). - # Each entry corresponds to a PyPI package tree-sitter-. 
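-    # (e.g. the "python" entry resolves to the tree-sitter-python wheel's
-    # tree_sitter_python module).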
- _LANGUAGE_BINDINGS: dict[str, tuple[str, str]] = { - "python": ("tree_sitter_python", "language"), - "typescript": ("tree_sitter_typescript", "language_typescript"), - "javascript": ("tree_sitter_javascript", "language"), - "go": ("tree_sitter_go", "language"), - "rust": ("tree_sitter_rust", "language"), - "java": ("tree_sitter_java", "language"), - "c": ("tree_sitter_c", "language"), - "cpp": ("tree_sitter_cpp", "language"), - "c_sharp": ("tree_sitter_c_sharp", "language"), - "ruby": ("tree_sitter_ruby", "language"), - "php": ("tree_sitter_php", "language_php"), - "swift": ("tree_sitter_swift", "language"), - "kotlin": ("tree_sitter_kotlin", "language"), - "scala": ("tree_sitter_scala", "language"), - "bash": ("tree_sitter_bash", "language"), - "html": ("tree_sitter_html", "language"), - "css": ("tree_sitter_css", "language"), - "scss": ("tree_sitter_scss", "language"), - "lua": ("tree_sitter_lua", "language"), - "sql": ("tree_sitter_sql", "language"), - "json": ("tree_sitter_json", "language"), - "yaml": ("tree_sitter_yaml", "language"), - "toml": ("tree_sitter_toml", "language"), - "xml": ("tree_sitter_xml", "language_xml"), - "markdown": ("tree_sitter_markdown", "language"), - "haskell": ("tree_sitter_haskell", "language"), - "ocaml": ("tree_sitter_ocaml", "language_ocaml"), - "hcl": ("tree_sitter_hcl", "language"), - "dart": ("tree_sitter_dart", "language"), - "elixir": ("tree_sitter_elixir", "language"), - "erlang": ("tree_sitter_erlang", "language"), - "zig": ("tree_sitter_zig", "language"), - "julia": ("tree_sitter_julia", "language"), - "r": ("tree_sitter_r", "language"), - "svelte": ("tree_sitter_svelte", "language"), - "graphql": ("tree_sitter_graphql", "language"), - "dockerfile": ("tree_sitter_dockerfile", "language"), - "cmake": ("tree_sitter_cmake", "language"), - "make": ("tree_sitter_make", "language"), - "fortran": ("tree_sitter_fortran", "language"), - "objc": ("tree_sitter_objc", "language"), - "commonlisp": ("tree_sitter_commonlisp", "language"), - "regex": ("tree_sitter_regex", "language"), - } - - def _get_parser(self, language: str): - if language not in self._parsers: - try: - from tree_sitter import Language, Parser - binding = self._LANGUAGE_BINDINGS.get(language) - if not binding: - return None - mod_name, func_name = binding - mod = __import__(mod_name) - lang = Language(getattr(mod, func_name)()) - self._parsers[language] = Parser(lang) - except Exception: - return None - return self._parsers[language] - - def _chunk_with_treesitter(self, content: str, language: str, file_path: str) -> ChunkResult: - parser = self._get_parser(language) - if parser is None: - return ChunkResult( - chunks=self._chunk_sliding_window(content, file_path, language), - references=[], - ) - - tree = parser.parse(content.encode("utf-8")) - node_types = LANGUAGE_NODES.get(language, {}) - if not node_types: - return ChunkResult( - chunks=self._chunk_sliding_window(content, file_path, language), - references=[], - ) - - # Build flat list of all target node types - target_types = set() - type_to_kind: dict[str, str] = {} - for kind, types in node_types.items(): - for t in types: - target_types.add(t) - type_to_kind[t] = kind - - lines = content.split("\n") - chunks: list[CodeChunk] = [] - covered_ranges: list[tuple[int, int]] = [] - - self._extract_nodes( - tree.root_node, target_types, type_to_kind, lines, - file_path, language, chunks, covered_ranges, parent_name=None, - ) - - # Extract references from AST - references = self._extract_references( - tree.root_node, target_types, 
file_path, language, - ) - - # Collect gaps as module chunks - covered_ranges.sort() - gap_lines = self._find_gaps(covered_ranges, len(lines)) - for start, end in gap_lines: - gap_content = "\n".join(lines[start:end + 1]).strip() - if gap_content: - chunks.append(CodeChunk( - content=gap_content, - chunk_type="module", - file_path=file_path, - start_line=start + 1, - end_line=end + 1, - language=language, - symbol_name=None, - symbol_signature=None, - parent_name=None, - )) - - # Split oversized chunks - final_chunks = [] - for chunk in chunks: - if len(chunk.content) > MAX_CHUNK_SIZE: - final_chunks.extend(self._split_chunk(chunk)) - else: - final_chunks.append(chunk) - - if not final_chunks: - return ChunkResult( - chunks=self._chunk_sliding_window(content, file_path, language), - references=[], - ) - - return ChunkResult(chunks=final_chunks, references=references) - - def _extract_nodes( - self, node, target_types, type_to_kind, lines, - file_path, language, chunks, covered_ranges, parent_name, - ): - if node.type in target_types: - start_line = node.start_point[0] - end_line = node.end_point[0] - content = "\n".join(lines[start_line:end_line + 1]) - kind = type_to_kind[node.type] - - # Detect if this is a method (function inside a class) - actual_kind = kind - if kind == "function" and parent_name is not None: - actual_kind = "method" - - # Extract symbol name - symbol_name = self._extract_name(node) - - # Extract signature (first line) - signature = lines[start_line].strip() if start_line < len(lines) else None - - chunks.append(CodeChunk( - content=content, - chunk_type=actual_kind, - file_path=file_path, - start_line=start_line + 1, - end_line=end_line + 1, - language=language, - symbol_name=symbol_name, - symbol_signature=signature, - parent_name=parent_name, - )) - covered_ranges.append((start_line, end_line)) - - # For classes, recurse with class name as parent - if kind == "class": - current_parent = symbol_name or parent_name - for child in node.children: - self._extract_nodes( - child, target_types, type_to_kind, lines, - file_path, language, chunks, covered_ranges, - parent_name=current_parent, - ) - return - - for child in node.children: - self._extract_nodes( - child, target_types, type_to_kind, lines, - file_path, language, chunks, covered_ranges, - parent_name=parent_name, - ) - - def _extract_references( - self, root_node, target_types: set, file_path: str, language: str, - ) -> list[ReferenceInfo]: - """Walk AST and collect identifier nodes that are usages (not definitions).""" - id_node_types = IDENTIFIER_NODES.get(language) - if not id_node_types: - return [] - - refs: list[ReferenceInfo] = [] - seen: set[tuple[str, int, int]] = set() - - def _walk(node): - if node.type in id_node_types: - name = node.text.decode("utf-8") if isinstance(node.text, bytes) else node.text - if ( - name - and len(name) >= MIN_REF_NAME_LENGTH - and name not in SKIP_NAMES - ): - # Skip if this identifier is the name child of a definition node - parent = node.parent - if parent and parent.type in target_types: - # Check if this is the "name" child (first identifier) - is_def_name = False - for child in parent.children: - if child.type in id_node_types: - is_def_name = (child.id == node.id) - break - if is_def_name: - return - - line = node.start_point[0] + 1 # 1-based - col = node.start_point[1] # 0-based - key = (name, line, col) - if key not in seen: - seen.add(key) - refs.append(ReferenceInfo( - name=name, - file_path=file_path, - line=line, - col=col, - language=language, - )) - return # 
leaf node, no children to recurse - - for child in node.children: - _walk(child) - - _walk(root_node) - return refs - - @staticmethod - def _extract_name(node) -> str | None: - for child in node.children: - if child.type in ("identifier", "name", "property_identifier", "type_identifier"): - return child.text.decode("utf-8") if isinstance(child.text, bytes) else child.text - return None - - @staticmethod - def _find_gaps(covered: list[tuple[int, int]], total_lines: int) -> list[tuple[int, int]]: - if not covered: - return [(0, total_lines - 1)] if total_lines > 0 else [] - - gaps = [] - prev_end = -1 - for start, end in covered: - if start > prev_end + 1: - gaps.append((prev_end + 1, start - 1)) - prev_end = max(prev_end, end) - if prev_end < total_lines - 1: - gaps.append((prev_end + 1, total_lines - 1)) - return gaps - - @staticmethod - def _split_chunk(chunk: CodeChunk) -> list[CodeChunk]: - lines = chunk.content.split("\n") - sub_chunks = [] - current_lines = [] - current_start = chunk.start_line - - for i, line in enumerate(lines): - current_lines.append(line) - current_content = "\n".join(current_lines) - if len(current_content) >= MAX_CHUNK_SIZE and len(current_lines) > 1: - # Split here - split_content = "\n".join(current_lines[:-1]) - sub_chunks.append(CodeChunk( - content=split_content, - chunk_type=chunk.chunk_type, - file_path=chunk.file_path, - start_line=current_start, - end_line=current_start + len(current_lines) - 2, - language=chunk.language, - symbol_name=chunk.symbol_name, - symbol_signature=chunk.symbol_signature, - parent_name=chunk.parent_name, - )) - current_start = current_start + len(current_lines) - 1 - current_lines = [line] - - if current_lines: - sub_chunks.append(CodeChunk( - content="\n".join(current_lines), - chunk_type=chunk.chunk_type, - file_path=chunk.file_path, - start_line=current_start, - end_line=chunk.end_line, - language=chunk.language, - symbol_name=chunk.symbol_name, - symbol_signature=chunk.symbol_signature, - parent_name=chunk.parent_name, - )) - - return sub_chunks - - def _chunk_sliding_window(self, content: str, file_path: str, language: str) -> list[CodeChunk]: - window_size = 4000 # chars (~1000 tokens) - overlap = 500 # chars (~125 tokens) - chunks = [] - - lines = content.split("\n") - current_pos = 0 - chunk_start_line = 0 - - while current_pos < len(content): - end_pos = min(current_pos + window_size, len(content)) - chunk_content = content[current_pos:end_pos] - - # Count lines - start_line = content[:current_pos].count("\n") - end_line = content[:end_pos].count("\n") - - chunks.append(CodeChunk( - content=chunk_content, - chunk_type="block", - file_path=file_path, - start_line=start_line + 1, - end_line=end_line + 1, - language=language, - symbol_name=None, - symbol_signature=None, - parent_name=None, - )) - - if end_pos >= len(content): - break - current_pos = end_pos - overlap - - return chunks - - -chunker_service = ChunkerService() diff --git a/legacy/python-api/app-root/app/services/embeddings.py b/legacy/python-api/app-root/app/services/embeddings.py deleted file mode 100644 index f76d5cc..0000000 --- a/legacy/python-api/app-root/app/services/embeddings.py +++ /dev/null @@ -1,184 +0,0 @@ -import asyncio -import logging -import os -import platform -import subprocess -import time as _time -from concurrent.futures import ThreadPoolExecutor -from typing import Any - -from ..config import settings - -logger = logging.getLogger(__name__) - -_AVG_BATCH_SEC_DEFAULT = 3.0 -_EMA_ALPHA = 0.25 - -# Models that require a query prefix for 
asymmetric retrieval. -QUERY_PREFIX_MODELS = { - "nomic-ai/CodeRankEmbed": "Represent this query for searching relevant code: ", - "nomic-ai/nomic-embed-text-v1.5": "search_query: ", - "BAAI/bge-base-en-v1.5": "Represent this sentence for searching relevant passages: ", - "BAAI/bge-large-en-v1.5": "Represent this sentence for searching relevant passages: ", - "awhiteside/CodeRankEmbed-Q8_0-GGUF": "Represent this query for searching relevant code: ", -} - - -def _resolve_query_prefix(model_name: str) -> str: - if model_name in QUERY_PREFIX_MODELS: - return QUERY_PREFIX_MODELS[model_name] - lowered = model_name.lower() - if "coderankembed" in lowered: - return QUERY_PREFIX_MODELS["nomic-ai/CodeRankEmbed"] - if "nomic-embed-text" in lowered: - return QUERY_PREFIX_MODELS["nomic-ai/nomic-embed-text-v1.5"] - if "bge-base" in lowered: - return QUERY_PREFIX_MODELS["BAAI/bge-base-en-v1.5"] - if "bge-large" in lowered: - return QUERY_PREFIX_MODELS["BAAI/bge-large-en-v1.5"] - return "" - - -def _detect_gpu_layers() -> int: - # Explicit override wins — e.g. CIX_N_GPU_LAYERS=0 forces CPU on a GPU box. - explicit = os.environ.get("CIX_N_GPU_LAYERS") - if explicit is not None: - return int(explicit) - # macOS: llama-cpp-python pip wheel ships with Metal enabled. - if platform.system() == "Darwin": - return -1 - # Linux: if nvidia-smi responds, llama.cpp was built against CUDA (Dockerfile.cuda). - try: - subprocess.run( - ["nvidia-smi"], - capture_output=True, - timeout=1, - check=True, - ) - return -1 - except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired): - return 0 - - -class EmbeddingBusyError(RuntimeError): - """Raised when the embedding queue is full and the request timed out waiting.""" - - def __init__(self, message: str, retry_after: int = 5) -> None: - super().__init__(message) - self.retry_after = retry_after - - -class EmbeddingService: - def __init__(self): - self._model: Any = None - self._executor = ThreadPoolExecutor( - max_workers=max(1, settings.max_embedding_concurrency) - ) - self._query_prefix = "" - self._semaphore = asyncio.Semaphore(settings.max_embedding_concurrency) - self._avg_batch_sec: float = _AVG_BATCH_SEC_DEFAULT - self._estimated_finish_at: float = 0.0 - - async def load_model(self): - loop = asyncio.get_event_loop() - self._model = await loop.run_in_executor( - self._executor, self._load_model_sync - ) - self._query_prefix = _resolve_query_prefix(settings.embedding_model) - - logger.info( - "Embedding model loaded: %s (dims=%d, query_prefix=%r)", - settings.embedding_model, - self._model.n_embd(), - self._query_prefix, - ) - - def _load_model_sync(self): - os.environ["TOKENIZERS_PARALLELISM"] = "false" - os.environ.setdefault("OMP_NUM_THREADS", str(os.cpu_count() or 2)) - - from huggingface_hub import hf_hub_download, list_repo_files - from llama_cpp import Llama - - model_path = settings.embedding_model - - if "/" in model_path and not os.path.exists(model_path): - logger.info("Downloading GGUF model from Hugging Face: %s", model_path) - files = list_repo_files(model_path) - gguf_file = next((f for f in files if f.endswith(".gguf")), None) - if not gguf_file: - raise ValueError( - f"No .gguf file found in repo {model_path}. " - "Only GGUF repositories are supported." 
- ) - model_path = hf_hub_download(repo_id=model_path, filename=gguf_file) - - n_gpu_layers = _detect_gpu_layers() - logger.info( - "Loading Llama (n_ctx=%d, n_gpu_layers=%d)", - settings.max_chunk_tokens + 128, - n_gpu_layers, - ) - - return Llama( - model_path=model_path, - embedding=True, - n_ctx=settings.max_chunk_tokens + 128, - n_threads=int(os.environ.get("OMP_NUM_THREADS", "4")), - n_gpu_layers=n_gpu_layers, - verbose=False, - ) - - async def embed_texts(self, texts: list[str]) -> list[list[float]]: - if not self._model: - raise RuntimeError("Model not loaded") - - timeout = settings.embedding_queue_timeout - try: - async with asyncio.timeout(timeout if timeout > 0 else 0): - async with self._semaphore: - return await self._embed_locked(texts) - except TimeoutError: - retry_after = max(5, int(self._estimated_finish_at - _time.monotonic())) - raise EmbeddingBusyError( - f"Queue is full — request waited {timeout}s without a free slot", - retry_after=retry_after, - ) - - async def _embed_locked(self, texts: list[str]) -> list[list[float]]: - if not texts: - return [] - - self._estimated_finish_at = _time.monotonic() + self._avg_batch_sec - loop = asyncio.get_event_loop() - t0 = _time.monotonic() - - result = await loop.run_in_executor( - self._executor, - lambda: self._model.create_embedding(texts), - ) - - batch_sec = _time.monotonic() - t0 - self._avg_batch_sec = ( - (1 - _EMA_ALPHA) * self._avg_batch_sec + _EMA_ALPHA * batch_sec - ) - self._estimated_finish_at = 0.0 - - logger.debug("Embedded %d texts in %.2fs", len(texts), batch_sec) - return [item["embedding"] for item in result["data"]] - - async def embed_query(self, query: str) -> list[float]: - if not self._model: - raise RuntimeError("Model not loaded") - - prefixed_query = self._query_prefix + query - loop = asyncio.get_event_loop() - - result = await loop.run_in_executor( - self._executor, - lambda: self._model.create_embedding(prefixed_query), - ) - return result["data"][0]["embedding"] - - -embedding_service = EmbeddingService() diff --git a/legacy/python-api/app-root/app/services/file_discovery.py b/legacy/python-api/app-root/app/services/file_discovery.py deleted file mode 100644 index a837262..0000000 --- a/legacy/python-api/app-root/app/services/file_discovery.py +++ /dev/null @@ -1,121 +0,0 @@ -import hashlib -from dataclasses import dataclass -from pathlib import Path - -import pathspec - -from ..core.language import detect_language -from .project_config import load_project_config, parse_submodule_paths - - -@dataclass -class DiscoveredFile: - path: str # container path - host_path: str # original host path - size: int - content_hash: str # SHA256 - language: str | None # detected from extension - - -class FileDiscoveryService: - def discover( - self, - project_container_path: str, - exclude_patterns: list[str], - max_file_size: int, - ) -> list[DiscoveredFile]: - root = Path(project_container_path) - if not root.exists(): - return [] - - # Load .gitignore and .cixignore if present (same format, merged) - ignore_patterns: list[str] = [] - for ignore_file in (".gitignore", ".cixignore"): - ignore_path = root / ignore_file - if ignore_path.exists(): - with open(ignore_path, "r", errors="ignore") as f: - ignore_patterns.extend(f.readlines()) - - # Load .cixconfig.yaml — if ignore.submodules is true, exclude submodule paths - proj_cfg = load_project_config(project_container_path) - if proj_cfg.ignore.submodules: - for sp in parse_submodule_paths(project_container_path): - ignore_patterns.append(sp + "/\n") - - 
ignore_spec = pathspec.PathSpec.from_lines("gitwildmatch", ignore_patterns) if ignore_patterns else None - - discovered = [] - exclude_set = set(exclude_patterns) - - for file_path in root.rglob("*"): - if not file_path.is_file(): - continue - - # Check excluded directory names - parts = file_path.relative_to(root).parts - if any(part in exclude_set for part in parts): - continue - - # Check .gitignore / .cixignore - relative = str(file_path.relative_to(root)) - if ignore_spec and ignore_spec.match_file(relative): - continue - - # Check file size - try: - size = file_path.stat().st_size - except OSError: - continue - if size > max_file_size or size == 0: - continue - - # Detect language - language = detect_language(str(file_path)) - - # Compute hash - try: - content_hash = self._hash_file(file_path) - except OSError: - continue - - host_path = str(file_path) - - discovered.append( - DiscoveredFile( - path=str(file_path), - host_path=host_path, - size=size, - content_hash=content_hash, - language=language, - ) - ) - - return discovered - - def get_changed_files( - self, - discovered: list[DiscoveredFile], - stored_hashes: dict[str, str], - ) -> tuple[list[DiscoveredFile], list[str]]: - changed_or_new = [] - current_paths = set() - - for f in discovered: - current_paths.add(f.host_path) - stored_hash = stored_hashes.get(f.host_path) - if stored_hash is None or stored_hash != f.content_hash: - changed_or_new.append(f) - - deleted = [p for p in stored_hashes if p not in current_paths] - return changed_or_new, deleted - - @staticmethod - def _hash_file(path: Path) -> str: - h = hashlib.sha256() - with open(path, "rb") as f: - for chunk in iter(lambda: f.read(8192), b""): - h.update(chunk) - return h.hexdigest() - - -file_discovery_service = FileDiscoveryService() diff --git a/legacy/python-api/app-root/app/services/indexer.py b/legacy/python-api/app-root/app/services/indexer.py deleted file mode 100644 index 885e53c..0000000 --- a/legacy/python-api/app-root/app/services/indexer.py +++ /dev/null @@ -1,614 +0,0 @@ -import asyncio -import gc -import json -import logging -import time -import uuid -from dataclasses import dataclass, field -from datetime import datetime, timezone - -from ..config import settings -from ..database import get_db -from .chunker import chunker_service -from .embeddings import embedding_service -from .file_discovery import file_discovery_service -from .reference_index import reference_index_service -from .symbol_index import SymbolInfo, symbol_index_service -from .vector_store import vector_store_service - -logger = logging.getLogger(__name__) - - -@dataclass -class IndexProgress: - run_id: str - project_path: str - status: str = "queued" # queued|indexing|completed|failed|cancelled - phase: str = "queued" # queued|discovering|chunking|embedding|storing|completed - files_discovered: int = 0 - files_processed: int = 0 - files_total: int = 0 - chunks_created: int = 0 - elapsed_seconds: float = 0 - estimated_remaining: float = 0 - error_message: str | None = None - - -@dataclass -class SessionState: - run_id: str - project_path: str - files_processed: int = 0 - chunks_created: int = 0 - languages_seen: set = field(default_factory=set) - start_time: float = field(default_factory=time.time) - status: str = "active" - - -class IndexerService: - def __init__(self): - self._active_jobs: dict[str, IndexProgress] = {} - self._cancel_events: dict[str, asyncio.Event] = {} - self._active_sessions: dict[str, SessionState] = {} # run_id -> SessionState - - # ---- New three-phase protocol 
---- - - async def begin_indexing(self, project_path: str, full: bool = False) -> tuple[str, dict[str, str]]: - """Phase 1: Create indexing session, return stored hashes.""" - run_id = str(uuid.uuid4()) - db = await get_db() - now = datetime.now(timezone.utc).isoformat() - - await db.execute( - "INSERT INTO index_runs (id, project_path, started_at, status) VALUES (?, ?, ?, ?)", - (run_id, project_path, now, "running"), - ) - await db.execute( - "UPDATE projects SET status = 'indexing', updated_at = ? WHERE host_path = ?", - (now, project_path), - ) - await db.commit() - - stored_hashes: dict[str, str] = {} - - if full: - vector_store_service.delete_collection(project_path) - await db.execute("DELETE FROM file_hashes WHERE project_path = ?", (project_path,)) - await db.execute("DELETE FROM symbols WHERE project_path = ?", (project_path,)) - await db.execute("DELETE FROM refs WHERE project_path = ?", (project_path,)) - await db.commit() - else: - cursor = await db.execute( - "SELECT file_path, content_hash FROM file_hashes WHERE project_path = ?", - (project_path,), - ) - rows = await cursor.fetchall() - stored_hashes = {row["file_path"]: row["content_hash"] for row in rows} - - session = SessionState(run_id=run_id, project_path=project_path) - self._active_sessions[run_id] = session - - progress = IndexProgress(run_id=run_id, project_path=project_path, status="indexing", phase="receiving") - self._active_jobs[project_path] = progress - - asyncio.create_task(self._session_ttl_cleanup(run_id)) - - return run_id, stored_hashes - - async def process_files(self, project_path: str, run_id: str, files: list) -> tuple[int, int, int]: - """Phase 2: Process a batch of files (chunk, embed, store). Synchronous within request.""" - session = self._active_sessions.get(run_id) - if not session: - raise ValueError(f"No active session for run_id {run_id}") - if session.project_path != project_path: - raise ValueError("run_id does not match project") - - logger.info("Processing batch of %d files for session %s", len(files), run_id) - - db = await get_db() - now = datetime.now(timezone.utc).isoformat() - files_accepted = 0 - batch_chunks = 0 - batch_symbols: list[SymbolInfo] = [] - batch_references = [] - - for file_payload in files: - try: - content = file_payload.content - if not content.strip(): - continue - - language = file_payload.language or "text" - session.languages_seen.add(language) - - result = chunker_service.chunk_file(file_payload.path, content, language) - chunks = result.chunks - if not chunks: - continue - - for chunk in chunks: - if chunk.symbol_name and chunk.chunk_type in ( - "function", "class", "method", "type" - ): - batch_symbols.append( - SymbolInfo( - name=chunk.symbol_name, - kind=chunk.chunk_type, - file_path=chunk.file_path, - line=chunk.start_line, - end_line=chunk.end_line, - language=chunk.language, - signature=chunk.symbol_signature, - parent_name=chunk.parent_name, - ) - ) - - batch_references.extend(result.references) - - texts = [f"{c.chunk_type}: {c.content}" for c in chunks] - embeddings = await embedding_service.embed_texts(texts) - - # Delete old chunks, symbols, and references BEFORE inserting new ones - await vector_store_service.delete_by_file(project_path, file_payload.path) - await symbol_index_service.delete_by_file(project_path, file_payload.path) - await reference_index_service.delete_by_file(project_path, file_payload.path) - - await vector_store_service.upsert_chunks(project_path, chunks, embeddings) - batch_chunks += len(chunks) - - await db.execute( - 
"""INSERT OR REPLACE INTO file_hashes - (project_path, file_path, content_hash, indexed_at) - VALUES (?, ?, ?, ?)""", - (project_path, file_payload.path, file_payload.content_hash, now), - ) - files_accepted += 1 - - except Exception as e: - logger.error("Error processing %s: %s", file_payload.path, e) - continue - - if batch_symbols: - await symbol_index_service.upsert_symbols(project_path, batch_symbols) - if batch_references: - await reference_index_service.upsert_references(project_path, batch_references) - await db.commit() - gc.collect() - - session.files_processed += files_accepted - session.chunks_created += batch_chunks - - progress = self._active_jobs.get(project_path) - if progress: - progress.files_processed = session.files_processed - progress.chunks_created = session.chunks_created - progress.elapsed_seconds = time.time() - session.start_time - - logger.info( - "Batch done: %d files accepted, %d chunks. Total: %d files, %d chunks", - files_accepted, batch_chunks, session.files_processed, session.chunks_created, - ) - - return files_accepted, batch_chunks, session.files_processed - - async def finish_indexing( - self, project_path: str, run_id: str, - deleted_paths: list[str], total_files_discovered: int, - ) -> tuple[str, int, int]: - """Phase 3: Clean up deleted files, update project stats, close session.""" - session = self._active_sessions.get(run_id) - if not session: - raise ValueError(f"No active session for run_id {run_id}") - if session.project_path != project_path: - raise ValueError("run_id does not match project") - - db = await get_db() - now = datetime.now(timezone.utc).isoformat() - - for del_path in deleted_paths: - await vector_store_service.delete_by_file(project_path, del_path) - await symbol_index_service.delete_by_file(project_path, del_path) - await reference_index_service.delete_by_file(project_path, del_path) - await db.execute( - "DELETE FROM file_hashes WHERE project_path = ? AND file_path = ?", - (project_path, del_path), - ) - - # Compute accurate stats from DB (not just this session) - cursor = await db.execute( - "SELECT COUNT(*) as cnt FROM file_hashes WHERE project_path = ?", - (project_path,), - ) - row = await cursor.fetchone() - total_indexed_files = row["cnt"] if row else session.files_processed - - cursor = await db.execute( - "SELECT COUNT(*) as cnt FROM symbols WHERE project_path = ?", - (project_path,), - ) - row = await cursor.fetchone() - total_symbols = row["cnt"] if row else 0 - - # Get total chunks from vector store collection - try: - collection = vector_store_service.get_or_create_collection(project_path) - total_chunks = collection.count() - except Exception: - total_chunks = session.chunks_created - - # Collect all languages from indexed files - from ..core.language import detect_language - cursor = await db.execute( - "SELECT file_path FROM file_hashes WHERE project_path = ?", - (project_path,), - ) - all_files = await cursor.fetchall() - all_languages: set[str] = set() - for f in all_files: - lang = detect_language(f["file_path"]) - if lang: - all_languages.add(lang) - - stats = { - "total_files": total_files_discovered, - "indexed_files": total_indexed_files, - "total_chunks": total_chunks, - "total_symbols": total_symbols, - } - await db.execute( - """UPDATE projects - SET stats = ?, languages = ?, status = 'indexed', - last_indexed_at = ?, updated_at = ? 
- WHERE host_path = ?""", - ( - json.dumps(stats), - json.dumps(sorted(all_languages)), - now, now, project_path, - ), - ) - - await db.execute( - """UPDATE index_runs - SET status = 'completed', completed_at = ?, - files_processed = ?, chunks_created = ? - WHERE id = ?""", - (now, session.files_processed, session.chunks_created, run_id), - ) - await db.commit() - - progress = self._active_jobs.get(project_path) - if progress: - progress.status = "completed" - progress.phase = "completed" - - session.status = "completed" - - async def _cleanup(): - await asyncio.sleep(60) - self._active_sessions.pop(run_id, None) - self._active_jobs.pop(project_path, None) - - asyncio.create_task(_cleanup()) - - return "completed", session.files_processed, session.chunks_created - - async def _session_ttl_cleanup(self, run_id: str): - """Remove stale sessions after 1 hour.""" - await asyncio.sleep(3600) - session = self._active_sessions.pop(run_id, None) - if session and session.status == "active": - logger.warning("Session %s timed out, cleaning up", run_id) - self._active_jobs.pop(session.project_path, None) - - # ---- Legacy methods ---- - - async def start_indexing(self, project_path: str, full: bool = False, batch_size: int = 20) -> str: - if project_path in self._active_jobs: - existing = self._active_jobs[project_path] - if existing.status in ("queued", "indexing"): - return existing.run_id - - run_id = str(uuid.uuid4()) - progress = IndexProgress(run_id=run_id, project_path=project_path) - self._active_jobs[project_path] = progress - - cancel_event = asyncio.Event() - self._cancel_events[project_path] = cancel_event - - # Record run in DB - db = await get_db() - now = datetime.now(timezone.utc).isoformat() - await db.execute( - "INSERT INTO index_runs (id, project_path, started_at, status) VALUES (?, ?, ?, ?)", - (run_id, project_path, now, "running"), - ) - await db.execute( - "UPDATE projects SET status = 'indexing', updated_at = ? 
WHERE host_path = ?", - (now, project_path), - ) - await db.commit() - - asyncio.create_task(self._run_pipeline(project_path, run_id, cancel_event, full, batch_size)) - return run_id - - async def get_progress(self, project_path: str) -> IndexProgress | None: - return self._active_jobs.get(project_path) - - async def cancel(self, project_path: str) -> bool: - event = self._cancel_events.get(project_path) - if event: - event.set() - return True - return False - - async def _run_pipeline( - self, project_path: str, run_id: str, - cancel_event: asyncio.Event, full: bool = False, - batch_size: int = 20, - ): - progress = self._active_jobs[project_path] - progress.status = "indexing" - start_time = time.time() - - try: - db = await get_db() - - # Get project info - cursor = await db.execute( - "SELECT * FROM projects WHERE host_path = ?", (project_path,) - ) - project = await cursor.fetchone() - if not project: - raise ValueError(f"Project {project_path} not found") - - container_path = project["container_path"] - proj_settings = json.loads(project["settings"]) - exclude_patterns = proj_settings.get( - "exclude_patterns", settings.excluded_dirs_list - ) - max_file_size = proj_settings.get("max_file_size", settings.max_file_size) - - # Phase 1: Discover files - progress.phase = "discovering" - discovered = await asyncio.get_event_loop().run_in_executor( - None, - lambda: file_discovery_service.discover( - container_path, exclude_patterns, max_file_size - ), - ) - progress.files_discovered = len(discovered) - - if cancel_event.is_set(): - await self._finish_run(project_path, run_id, "cancelled", progress) - return - - # Get stored hashes for incremental - if not full: - cursor = await db.execute( - "SELECT file_path, content_hash FROM file_hashes WHERE project_path = ?", - (project_path,), - ) - rows = await cursor.fetchall() - stored_hashes = {row["file_path"]: row["content_hash"] for row in rows} - to_process, deleted = file_discovery_service.get_changed_files( - discovered, stored_hashes - ) - - # Remove deleted files - for del_path in deleted: - await vector_store_service.delete_by_file(project_path, del_path) - await symbol_index_service.delete_by_file(project_path, del_path) - await reference_index_service.delete_by_file(project_path, del_path) - await db.execute( - "DELETE FROM file_hashes WHERE project_path = ? AND file_path = ?", - (project_path, del_path), - ) - else: - to_process = discovered - # Clear all existing data for full reindex - vector_store_service.delete_collection(project_path) - await db.execute( - "DELETE FROM file_hashes WHERE project_path = ?", (project_path,) - ) - await db.execute( - "DELETE FROM symbols WHERE project_path = ?", (project_path,) - ) - await db.execute( - "DELETE FROM refs WHERE project_path = ?", (project_path,) - ) - - progress.files_total = len(to_process) - files_discovered_count = len(discovered) - # Free discovery data — no longer needed - del discovered - gc.collect() - - await db.execute( - "UPDATE index_runs SET files_total = ? 
WHERE id = ?", - (len(to_process), run_id), - ) - await db.commit() - - if not to_process: - await self._finish_run(project_path, run_id, "completed", progress) - return - - # Phase 2-4: Process files in batches to limit memory usage - BATCH_COMMIT_SIZE = max(1, batch_size) # commit DB and flush symbols every N files - batch_symbols: list[SymbolInfo] = [] - batch_references = [] - total_chunks = 0 - total_symbols = 0 - now = datetime.now(timezone.utc).isoformat() - languages_seen: set[str] = set() - - for i, file_info in enumerate(to_process): - if cancel_event.is_set(): - await self._finish_run(project_path, run_id, "cancelled", progress) - return - - progress.phase = "chunking" - progress.files_processed = i - progress.elapsed_seconds = time.time() - start_time - if i > 0: - progress.estimated_remaining = ( - progress.elapsed_seconds / i * (len(to_process) - i) - ) - - try: - # Read file - with open(file_info.path, "r", errors="ignore") as f: - content = f.read() - - if not content.strip(): - continue - - language = file_info.language or "text" - languages_seen.add(language) - - # Chunk - result = chunker_service.chunk_file( - file_info.host_path, content, language - ) - chunks = result.chunks - if not chunks: - continue - - # Collect symbols from this file - for chunk in chunks: - if chunk.symbol_name and chunk.chunk_type in ( - "function", "class", "method", "type" - ): - batch_symbols.append( - SymbolInfo( - name=chunk.symbol_name, - kind=chunk.chunk_type, - file_path=chunk.file_path, - line=chunk.start_line, - end_line=chunk.end_line, - language=chunk.language, - signature=chunk.symbol_signature, - parent_name=chunk.parent_name, - ) - ) - - batch_references.extend(result.references) - - # Embed - progress.phase = "embedding" - texts = [ - f"{c.chunk_type}: {c.content}" for c in chunks - ] - embeddings = await embedding_service.embed_texts(texts) - - # Delete old data BEFORE inserting new - progress.phase = "storing" - await vector_store_service.delete_by_file(project_path, file_info.host_path) - await symbol_index_service.delete_by_file(project_path, file_info.host_path) - await reference_index_service.delete_by_file(project_path, file_info.host_path) - - # Store in vector DB - await vector_store_service.upsert_chunks( - project_path, chunks, embeddings - ) - - total_chunks += len(chunks) - progress.chunks_created = total_chunks - - # Update file hash - await db.execute( - """INSERT OR REPLACE INTO file_hashes - (project_path, file_path, content_hash, indexed_at) - VALUES (?, ?, ?, ?)""", - (project_path, file_info.host_path, file_info.content_hash, now), - ) - - except Exception as e: - logger.error("Error processing %s: %s", file_info.path, e) - continue - - # Flush batch: commit DB, store symbols/refs, free memory every N files - if (i + 1) % BATCH_COMMIT_SIZE == 0: - if batch_symbols: - await symbol_index_service.upsert_symbols(project_path, batch_symbols) - total_symbols += len(batch_symbols) - batch_symbols = [] - if batch_references: - await reference_index_service.upsert_references(project_path, batch_references) - batch_references = [] - await db.commit() - gc.collect() - logger.debug("Batch committed: %d/%d files", i + 1, len(to_process)) - - # Flush remaining symbols and references - if batch_symbols: - await symbol_index_service.upsert_symbols(project_path, batch_symbols) - total_symbols += len(batch_symbols) - if batch_references: - await reference_index_service.upsert_references(project_path, batch_references) - - # Update project stats - progress.files_processed = 
len(to_process) - stats = { - "total_files": files_discovered_count, - "indexed_files": progress.files_processed, - "total_chunks": total_chunks, - "total_symbols": total_symbols, - } - await db.execute( - """UPDATE projects - SET stats = ?, languages = ?, status = 'indexed', - last_indexed_at = ?, updated_at = ? - WHERE host_path = ?""", - ( - json.dumps(stats), - json.dumps(sorted(languages_seen)), - now, - now, - project_path, - ), - ) - await db.commit() - - await self._finish_run(project_path, run_id, "completed", progress) - - except Exception as e: - logger.exception("Indexing failed for project %s", project_path) - progress.status = "failed" - progress.error_message = str(e) - await self._finish_run(project_path, run_id, "failed", progress, str(e)) - - async def _finish_run( - self, project_path: str, run_id: str, status: str, - progress: IndexProgress, error: str | None = None, - ): - progress.status = status - progress.phase = "completed" if status == "completed" else status - now = datetime.now(timezone.utc).isoformat() - - db = await get_db() - await db.execute( - """UPDATE index_runs - SET status = ?, completed_at = ?, files_processed = ?, - chunks_created = ?, error_message = ? - WHERE id = ?""", - (status, now, progress.files_processed, progress.chunks_created, error, run_id), - ) - - if status != "completed": - await db.execute( - "UPDATE projects SET status = ?, updated_at = ? WHERE host_path = ?", - ("error" if status == "failed" else status, now, project_path), - ) - await db.commit() - - # Clean up after a delay - async def _cleanup(): - await asyncio.sleep(60) - self._active_jobs.pop(project_path, None) - self._cancel_events.pop(project_path, None) - - asyncio.create_task(_cleanup()) - - -indexer_service = IndexerService() diff --git a/legacy/python-api/app-root/app/services/project_config.py b/legacy/python-api/app-root/app/services/project_config.py deleted file mode 100644 index e6d425b..0000000 --- a/legacy/python-api/app-root/app/services/project_config.py +++ /dev/null @@ -1,61 +0,0 @@ -import re -from dataclasses import dataclass, field -from pathlib import Path - -import yaml - - -@dataclass -class IgnoreConfig: - submodules: bool = False - - -@dataclass -class ProjectConfig: - ignore: IgnoreConfig = field(default_factory=IgnoreConfig) - - -def load_project_config(project_root: str) -> ProjectConfig: - """Load .cixconfig.yaml from the project root. - Returns default config if the file does not exist.""" - config_path = Path(project_root) / ".cixconfig.yaml" - if not config_path.exists(): - return ProjectConfig() - - try: - data = yaml.safe_load(config_path.read_text()) - except Exception: - return ProjectConfig() - - if not isinstance(data, dict): - return ProjectConfig() - - ignore_data = data.get("ignore", {}) - return ProjectConfig( - ignore=IgnoreConfig( - submodules=bool(ignore_data.get("submodules", False)), - ) - ) - - -def parse_submodule_paths(project_root: str) -> list[str]: - """Parse .gitmodules and return list of submodule paths. 
-    Returns empty list if .gitmodules does not exist."""
-    gitmodules_path = Path(project_root) / ".gitmodules"
-    if not gitmodules_path.exists():
-        return []
-
-    paths: list[str] = []
-    try:
-        for line in gitmodules_path.read_text().splitlines():
-            line = line.strip()
-            if line.startswith("path"):
-                parts = line.split("=", 1)
-                if len(parts) == 2:
-                    p = parts[1].strip()
-                    if p:
-                        paths.append(p)
-    except Exception:
-        pass
-
-    return paths
\ No newline at end of file
diff --git a/legacy/python-api/app-root/app/services/reference_index.py b/legacy/python-api/app-root/app/services/reference_index.py
deleted file mode 100644
index bff60f4..0000000
--- a/legacy/python-api/app-root/app/services/reference_index.py
+++ /dev/null
@@ -1,70 +0,0 @@
-from dataclasses import dataclass
-
-from ..database import get_db
-from .chunker import ReferenceInfo
-
-
-class ReferenceIndexService:
-    async def upsert_references(self, project_path: str, refs: list[ReferenceInfo]):
-        if not refs:
-            return
-        db = await get_db()
-        await db.executemany(
-            """INSERT INTO refs (project_path, name, file_path, line, col, language)
-               VALUES (?, ?, ?, ?, ?, ?)""",
-            [
-                (project_path, r.name, r.file_path, r.line, r.col, r.language)
-                for r in refs
-            ],
-        )
-        await db.commit()
-
-    async def delete_by_file(self, project_path: str, file_path: str):
-        db = await get_db()
-        await db.execute(
-            "DELETE FROM refs WHERE project_path = ? AND file_path = ?",
-            (project_path, file_path),
-        )
-        await db.commit()
-
-    async def delete_by_project(self, project_path: str):
-        db = await get_db()
-        await db.execute(
-            "DELETE FROM refs WHERE project_path = ?",
-            (project_path,),
-        )
-        await db.commit()
-
-    async def search(
-        self,
-        project_path: str,
-        name: str,
-        file_path: str | None = None,
-        limit: int = 50,
-    ) -> list[ReferenceInfo]:
-        db = await get_db()
-        sql = "SELECT name, file_path, line, col, language FROM refs WHERE project_path = ? AND name = ?"
-        params: list = [project_path, name]
-
-        if file_path:
-            sql += " AND file_path = ?"
-            params.append(file_path)
-
-        sql += " ORDER BY file_path, line LIMIT ?"
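-        # Bind order mirrors the placeholders built above: project_path, name,
-        # then the optional file_path filter, then limit last.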
- params.append(limit) - - cursor = await db.execute(sql, params) - rows = await cursor.fetchall() - return [ - ReferenceInfo( - name=row["name"], - file_path=row["file_path"], - line=row["line"], - col=row["col"], - language=row["language"], - ) - for row in rows - ] - - -reference_index_service = ReferenceIndexService() \ No newline at end of file diff --git a/legacy/python-api/app-root/app/services/symbol_index.py b/legacy/python-api/app-root/app/services/symbol_index.py deleted file mode 100644 index b5276c2..0000000 --- a/legacy/python-api/app-root/app/services/symbol_index.py +++ /dev/null @@ -1,119 +0,0 @@ -import uuid -from dataclasses import dataclass - -from ..database import get_db - - -@dataclass -class SymbolInfo: - name: str - kind: str # function|class|method|type - file_path: str # host path - line: int - end_line: int - language: str - signature: str | None = None - parent_name: str | None = None - docstring: str | None = None - - -class SymbolIndexService: - async def upsert_symbols(self, project_path: str, symbols: list[SymbolInfo]): - db = await get_db() - for symbol in symbols: - symbol_id = str(uuid.uuid4()) - await db.execute( - """INSERT OR REPLACE INTO symbols - (id, project_path, name, kind, file_path, line, end_line, language, signature, parent_name, docstring) - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", - ( - symbol_id, - project_path, - symbol.name, - symbol.kind, - symbol.file_path, - symbol.line, - symbol.end_line, - symbol.language, - symbol.signature, - symbol.parent_name, - symbol.docstring, - ), - ) - await db.commit() - - async def search( - self, - project_path: str, - query: str, - kinds: list[str] | None = None, - limit: int = 20, - ) -> list[SymbolInfo]: - db = await get_db() - - # Try exact match first, then prefix, then contains - for pattern in [query, f"{query}%", f"%{query}%"]: - sql = "SELECT * FROM symbols WHERE project_path = ? AND name LIKE ?" - params: list = [project_path, pattern] - - if kinds: - placeholders = ",".join("?" for _ in kinds) - sql += f" AND kind IN ({placeholders})" - params.extend(kinds) - - sql += f" ORDER BY name LIMIT ?" - params.append(limit) - - cursor = await db.execute(sql, params) - rows = await cursor.fetchall() - - if rows: - return [ - SymbolInfo( - name=row["name"], - kind=row["kind"], - file_path=row["file_path"], - line=row["line"], - end_line=row["end_line"], - language=row["language"], - signature=row["signature"], - parent_name=row["parent_name"], - docstring=row["docstring"], - ) - for row in rows - ] - - return [] - - async def delete_by_file(self, project_path: str, file_path: str): - db = await get_db() - await db.execute( - "DELETE FROM symbols WHERE project_path = ? AND file_path = ?", - (project_path, file_path), - ) - await db.commit() - - async def get_project_symbols(self, project_path: str) -> list[SymbolInfo]: - db = await get_db() - cursor = await db.execute( - "SELECT * FROM symbols WHERE project_path = ? 
ORDER BY kind, name", - (project_path,), - ) - rows = await cursor.fetchall() - return [ - SymbolInfo( - name=row["name"], - kind=row["kind"], - file_path=row["file_path"], - line=row["line"], - end_line=row["end_line"], - language=row["language"], - signature=row["signature"], - parent_name=row["parent_name"], - docstring=row["docstring"], - ) - for row in rows - ] - - -symbol_index_service = SymbolIndexService() diff --git a/legacy/python-api/app-root/app/services/vector_store.py b/legacy/python-api/app-root/app/services/vector_store.py deleted file mode 100644 index f493073..0000000 --- a/legacy/python-api/app-root/app/services/vector_store.py +++ /dev/null @@ -1,135 +0,0 @@ -import hashlib -import logging - -import chromadb - -from ..config import settings - -logger = logging.getLogger(__name__) - - -class VectorStoreService: - def __init__(self): - self._client: chromadb.ClientAPI | None = None - - def init(self): - self._client = chromadb.PersistentClient(path=settings.dynamic_chroma_persist_dir) - logger.info("ChromaDB initialized at %s", settings.dynamic_chroma_persist_dir) - - @property - def client(self) -> chromadb.ClientAPI: - if self._client is None: - self.init() - return self._client - - def _collection_name(self, project_path: str) -> str: - # Use hash of path to create valid collection name - path_hash = hashlib.md5(project_path.encode()).hexdigest() - return f"project_{path_hash}" - - def get_or_create_collection(self, project_path: str) -> chromadb.Collection: - return self.client.get_or_create_collection( - name=self._collection_name(project_path), - metadata={"hnsw:space": "cosine"}, - ) - - async def upsert_chunks( - self, - project_path: str, - chunks: list, - embeddings: list[list[float]], - ): - collection = self.get_or_create_collection(project_path) - - ids = [] - documents = [] - metadatas = [] - embs = [] - - for idx, (chunk, embedding) in enumerate(zip(chunks, embeddings)): - path_hash = hashlib.md5(chunk.file_path.encode()).hexdigest()[:12] - doc_id = f"{path_hash}:{chunk.start_line}-{chunk.end_line}:{idx}" - - ids.append(doc_id) - documents.append(chunk.content) - metadatas.append({ - "file_path": chunk.file_path, - "start_line": chunk.start_line, - "end_line": chunk.end_line, - "chunk_type": chunk.chunk_type, - "symbol_name": chunk.symbol_name or "", - "language": chunk.language, - }) - embs.append(embedding) - - # Upsert in batches of 500 (ChromaDB limit) - batch_size = 500 - for i in range(0, len(ids), batch_size): - end = i + batch_size - collection.upsert( - ids=ids[i:end], - documents=documents[i:end], - metadatas=metadatas[i:end], - embeddings=embs[i:end], - ) - - async def search( - self, - project_path: str, - query_embedding: list[float], - limit: int = 10, - where: dict | None = None, - ) -> list[dict]: - collection = self.get_or_create_collection(project_path) - - kwargs = { - "query_embeddings": [query_embedding], - "n_results": limit, - "include": ["documents", "metadatas", "distances"], - } - if where: - kwargs["where"] = where - - try: - results = collection.query(**kwargs) - except Exception as e: - logger.error("ChromaDB search error: %s", e) - return [] - - items = [] - if results and results["ids"] and results["ids"][0]: - for i in range(len(results["ids"][0])): - metadata = results["metadatas"][0][i] - distance = results["distances"][0][i] - # Cosine distance to similarity score - score = 1.0 - distance - - items.append({ - "file_path": metadata["file_path"], - "start_line": metadata["start_line"], - "end_line": metadata["end_line"], - 
"content": results["documents"][0][i], - "score": round(score, 4), - "chunk_type": metadata["chunk_type"], - "symbol_name": metadata.get("symbol_name", ""), - "language": metadata.get("language", ""), - }) - - return items - - async def delete_by_file(self, project_path: str, file_path: str): - collection = self.get_or_create_collection(project_path) - try: - collection.delete(where={"file_path": file_path}) - except Exception as e: - logger.warning("Failed to delete chunks for %s: %s", file_path, e) - - def delete_collection(self, project_path: str): - name = self._collection_name(project_path) - try: - self.client.delete_collection(name) - except Exception: - pass - - -vector_store_service = VectorStoreService() diff --git a/legacy/python-api/app-root/app/version.py b/legacy/python-api/app-root/app/version.py deleted file mode 100644 index cce8bcd..0000000 --- a/legacy/python-api/app-root/app/version.py +++ /dev/null @@ -1,2 +0,0 @@ -SERVER_VERSION = "0.2.0" -API_VERSION = "v1" diff --git a/legacy/python-api/app-root/migrate_to_path_based.py b/legacy/python-api/app-root/migrate_to_path_based.py deleted file mode 100644 index d25480e..0000000 --- a/legacy/python-api/app-root/migrate_to_path_based.py +++ /dev/null @@ -1,299 +0,0 @@ -""" -Migration script to convert from UUID-based project IDs to path-based identification. - -This script: -1. Creates backup of the database -2. Creates new tables with path-based schema -3. Migrates data from old tables to new ones -4. Renames tables (old -> backup, new -> main) - -Usage: - python migrate_to_path_based.py --db-path /path/to/projects.db -""" -import argparse -import asyncio -import shutil -import sqlite3 -from datetime import datetime -from pathlib import Path - - -async def migrate_database(db_path: str, dry_run: bool = False): - """Migrate database from UUID-based to path-based schema.""" - db_file = Path(db_path) - - if not db_file.exists(): - print(f"❌ Database file not found: {db_path}") - return False - - # Create backup - backup_path = db_file.parent / f"{db_file.stem}_backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}.db" - if not dry_run: - print(f"📦 Creating backup: {backup_path}") - shutil.copy2(db_path, backup_path) - else: - print(f"[DRY RUN] Would create backup: {backup_path}") - - conn = sqlite3.connect(db_path) - conn.row_factory = sqlite3.Row - cursor = conn.cursor() - - try: - # Check if migration is needed - cursor.execute("SELECT sql FROM sqlite_master WHERE type='table' AND name='projects'") - table_schema = cursor.fetchone() - if table_schema and 'host_path TEXT PRIMARY KEY' in table_schema[0]: - print("✅ Database already using path-based schema. No migration needed.") - return True - - # Get existing projects - cursor.execute("SELECT * FROM projects") - projects = cursor.fetchall() - - if not projects: - print("ℹ️ No projects found. 
Creating new schema...") - if not dry_run: - # Just rename tables and create new ones - cursor.execute("DROP TABLE IF EXISTS projects_old") - cursor.execute("ALTER TABLE projects RENAME TO projects_old") - _create_new_tables(cursor) - conn.commit() - return True - - print(f"📊 Found {len(projects)} project(s) to migrate") - - # Create new tables with _new suffix - if not dry_run: - _create_new_tables_with_suffix(cursor, "_new") - - # Migrate data - for project in projects: - host_path = project['host_path'] - print(f" Migrating: {host_path}") - - if dry_run: - print(f" [DRY RUN] Would migrate project {project['id']} -> {host_path}") - continue - - # Insert into new projects table - cursor.execute(""" - INSERT INTO projects_new (host_path, container_path, languages, settings, stats, - status, created_at, updated_at, last_indexed_at) - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) - """, ( - project['host_path'], - project['container_path'], - project['languages'], - project['settings'], - project['stats'], - project['status'], - project['created_at'], - project['updated_at'], - project['last_indexed_at'] - )) - - # Migrate file_hashes - cursor.execute(""" - INSERT INTO file_hashes_new (project_path, file_path, content_hash, indexed_at) - SELECT ?, file_path, content_hash, indexed_at - FROM file_hashes - WHERE project_id = ? - """, (host_path, project['id'])) - - # Migrate symbols - cursor.execute(""" - INSERT INTO symbols_new (id, project_path, name, kind, file_path, line, end_line, - language, signature, parent_name, docstring) - SELECT id, ?, name, kind, file_path, line, end_line, - language, signature, parent_name, docstring - FROM symbols - WHERE project_id = ? - """, (host_path, project['id'])) - - # Migrate index_runs - cursor.execute(""" - INSERT INTO index_runs_new (id, project_path, started_at, completed_at, - files_processed, files_total, chunks_created, - status, error_message) - SELECT id, ?, started_at, completed_at, files_processed, files_total, - chunks_created, status, error_message - FROM index_runs - WHERE project_id = ? 
- """, (host_path, project['id'])) - - if not dry_run: - # Rename old tables to _old - cursor.execute("ALTER TABLE projects RENAME TO projects_old") - cursor.execute("ALTER TABLE file_hashes RENAME TO file_hashes_old") - cursor.execute("ALTER TABLE symbols RENAME TO symbols_old") - cursor.execute("ALTER TABLE index_runs RENAME TO index_runs_old") - - # Rename new tables to main names - cursor.execute("ALTER TABLE projects_new RENAME TO projects") - cursor.execute("ALTER TABLE file_hashes_new RENAME TO file_hashes") - cursor.execute("ALTER TABLE symbols_new RENAME TO symbols") - cursor.execute("ALTER TABLE index_runs_new RENAME TO index_runs") - - conn.commit() - print("✅ Migration completed successfully!") - print(f" Old tables kept as: projects_old, file_hashes_old, symbols_old, index_runs_old") - print(f" You can drop them manually if everything works correctly") - else: - print("[DRY RUN] Migration would complete successfully") - - return True - - except Exception as e: - print(f"❌ Migration failed: {e}") - if not dry_run: - conn.rollback() - return False - finally: - conn.close() - - -def _create_new_tables(cursor): - """Create new schema tables.""" - cursor.executescript(""" - CREATE TABLE IF NOT EXISTS projects ( - host_path TEXT PRIMARY KEY, - container_path TEXT NOT NULL, - languages TEXT DEFAULT '[]', - settings TEXT DEFAULT '{}', - stats TEXT DEFAULT '{"total_files":0,"indexed_files":0,"total_chunks":0,"total_symbols":0}', - status TEXT DEFAULT 'created', - created_at TEXT NOT NULL, - updated_at TEXT NOT NULL, - last_indexed_at TEXT - ); - - CREATE TABLE IF NOT EXISTS file_hashes ( - project_path TEXT NOT NULL, - file_path TEXT NOT NULL, - content_hash TEXT NOT NULL, - indexed_at TEXT NOT NULL, - PRIMARY KEY (project_path, file_path), - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE - ); - - CREATE TABLE IF NOT EXISTS symbols ( - id TEXT PRIMARY KEY, - project_path TEXT NOT NULL, - name TEXT NOT NULL, - kind TEXT NOT NULL, - file_path TEXT NOT NULL, - line INTEGER NOT NULL, - end_line INTEGER NOT NULL, - language TEXT NOT NULL, - signature TEXT, - parent_name TEXT, - docstring TEXT, - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE - ); - - CREATE INDEX IF NOT EXISTS idx_symbols_project_name ON symbols(project_path, name); - CREATE INDEX IF NOT EXISTS idx_symbols_project_kind ON symbols(project_path, kind); - CREATE INDEX IF NOT EXISTS idx_symbols_project_file ON symbols(project_path, file_path); - - CREATE TABLE IF NOT EXISTS index_runs ( - id TEXT PRIMARY KEY, - project_path TEXT NOT NULL, - started_at TEXT NOT NULL, - completed_at TEXT, - files_processed INTEGER DEFAULT 0, - files_total INTEGER DEFAULT 0, - chunks_created INTEGER DEFAULT 0, - status TEXT DEFAULT 'running', - error_message TEXT, - FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE - ); - """) - - -def _create_new_tables_with_suffix(cursor, suffix: str): - """Create new schema tables with a suffix.""" - cursor.executescript(f""" - CREATE TABLE projects{suffix} ( - host_path TEXT PRIMARY KEY, - container_path TEXT NOT NULL, - languages TEXT DEFAULT '[]', - settings TEXT DEFAULT '{{}}', - stats TEXT DEFAULT '{{"total_files":0,"indexed_files":0,"total_chunks":0,"total_symbols":0}}', - status TEXT DEFAULT 'created', - created_at TEXT NOT NULL, - updated_at TEXT NOT NULL, - last_indexed_at TEXT - ); - - CREATE TABLE file_hashes{suffix} ( - project_path TEXT NOT NULL, - file_path TEXT NOT NULL, - content_hash TEXT NOT NULL, - indexed_at 
TEXT NOT NULL,
-            PRIMARY KEY (project_path, file_path),
-            FOREIGN KEY (project_path) REFERENCES projects{suffix}(host_path) ON DELETE CASCADE
-        );
-
-        CREATE TABLE symbols{suffix} (
-            id TEXT PRIMARY KEY,
-            project_path TEXT NOT NULL,
-            name TEXT NOT NULL,
-            kind TEXT NOT NULL,
-            file_path TEXT NOT NULL,
-            line INTEGER NOT NULL,
-            end_line INTEGER NOT NULL,
-            language TEXT NOT NULL,
-            signature TEXT,
-            parent_name TEXT,
-            docstring TEXT,
-            FOREIGN KEY (project_path) REFERENCES projects{suffix}(host_path) ON DELETE CASCADE
-        );
-
-        CREATE INDEX idx_symbols{suffix}_project_name ON symbols{suffix}(project_path, name);
-        CREATE INDEX idx_symbols{suffix}_project_kind ON symbols{suffix}(project_path, kind);
-        CREATE INDEX idx_symbols{suffix}_project_file ON symbols{suffix}(project_path, file_path);
-
-        CREATE TABLE index_runs{suffix} (
-            id TEXT PRIMARY KEY,
-            project_path TEXT NOT NULL,
-            started_at TEXT NOT NULL,
-            completed_at TEXT,
-            files_processed INTEGER DEFAULT 0,
-            files_total INTEGER DEFAULT 0,
-            chunks_created INTEGER DEFAULT 0,
-            status TEXT DEFAULT 'running',
-            error_message TEXT,
-            FOREIGN KEY (project_path) REFERENCES projects{suffix}(host_path) ON DELETE CASCADE
-        );
-    """)
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Migrate database from UUID to path-based schema")
-    parser.add_argument("--db-path", required=True, help="Path to the SQLite database file")
-    parser.add_argument("--dry-run", action="store_true", help="Show what would be done without making changes")
-
-    args = parser.parse_args()
-
-    print("🔄 Starting database migration...")
-    print(f"   Database: {args.db_path}")
-    print(f"   Dry run: {args.dry_run}")
-    print()
-
-    success = asyncio.run(migrate_database(args.db_path, args.dry_run))
-
-    if success:
-        print("\n✨ Migration process completed")
-        if not args.dry_run:
-            print("\n⚠️ IMPORTANT: You should also delete old ChromaDB collections manually if needed")
-            print("   Old collection names were: project_<uuid>")
-            print("   New collection names are: project_<md5 of host path>")
-    else:
-        print("\n❌ Migration failed")
-        return 1
-
-    return 0
-
-
-if __name__ == "__main__":
-    exit(main())
diff --git a/legacy/python-api/app-root/requirements-cuda.txt b/legacy/python-api/app-root/requirements-cuda.txt
deleted file mode 100644
index a83ca9e..0000000
--- a/legacy/python-api/app-root/requirements-cuda.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-# CUDA build deps — mirrors requirements.txt
-# llama-cpp-python is compiled with CUDA support in the Dockerfile
-fastapi>=0.115
-uvicorn[standard]>=0.34
-llama-cpp-python>=0.3
-huggingface-hub>=0.29
-chromadb>=0.6
-tree-sitter>=0.24,<0.26
-# tree-sitter language grammars (individual packages replace tree-sitter-languages)
-tree-sitter-python>=0.23
-tree-sitter-javascript>=0.23
-tree-sitter-typescript>=0.23
-tree-sitter-go>=0.23
-tree-sitter-rust>=0.23
-tree-sitter-java>=0.23
-tree-sitter-c>=0.23
-tree-sitter-cpp>=0.23
-tree-sitter-c-sharp>=0.23
-tree-sitter-ruby>=0.23
-tree-sitter-php>=0.23
-tree-sitter-swift>=0.0.1
-tree-sitter-kotlin>=1.0
-tree-sitter-scala>=0.23
-tree-sitter-bash>=0.23
-tree-sitter-html>=0.23
-tree-sitter-css>=0.23
-tree-sitter-scss>=1.0
-tree-sitter-lua>=0.5
-tree-sitter-sql>=0.3
-tree-sitter-json>=0.23
-tree-sitter-yaml>=0.7
-tree-sitter-toml>=0.7
-tree-sitter-xml>=0.7
-tree-sitter-markdown>=0.5
-tree-sitter-haskell>=0.23
-tree-sitter-ocaml>=0.23
-tree-sitter-hcl>=1.0
-tree-sitter-elixir>=0.3
-tree-sitter-zig>=1.0
-tree-sitter-julia>=0.23
-tree-sitter-svelte>=1.0
-tree-sitter-graphql>=0.1
-tree-sitter-dockerfile>=0.2
-tree-sitter-cmake>=0.7 -tree-sitter-make>=1.0 -tree-sitter-fortran>=0.5 -tree-sitter-objc>=3.0 -tree-sitter-commonlisp>=0.4 -tree-sitter-regex>=0.23 -pydantic>=2.10 -pydantic-settings>=2.7 -aiosqlite>=0.20 -pathspec>=0.12 diff --git a/legacy/python-api/app-root/requirements-dev.txt b/legacy/python-api/app-root/requirements-dev.txt deleted file mode 100644 index 8f004bd..0000000 --- a/legacy/python-api/app-root/requirements-dev.txt +++ /dev/null @@ -1,2 +0,0 @@ -pytest>=8.0 -httpx>=0.27 \ No newline at end of file diff --git a/legacy/python-api/app-root/requirements.txt b/legacy/python-api/app-root/requirements.txt deleted file mode 100644 index 1ba4a08..0000000 --- a/legacy/python-api/app-root/requirements.txt +++ /dev/null @@ -1,51 +0,0 @@ -fastapi>=0.115 -uvicorn[standard]>=0.34 -llama-cpp-python>=0.3 -huggingface-hub>=0.29 -chromadb>=0.6 -tree-sitter>=0.24,<0.26 -# tree-sitter language grammars (individual packages replace tree-sitter-languages) -tree-sitter-python>=0.23 -tree-sitter-javascript>=0.23 -tree-sitter-typescript>=0.23 -tree-sitter-go>=0.23 -tree-sitter-rust>=0.23 -tree-sitter-java>=0.23 -tree-sitter-c>=0.23 -tree-sitter-cpp>=0.23 -tree-sitter-c-sharp>=0.23 -tree-sitter-ruby>=0.23 -tree-sitter-php>=0.23 -tree-sitter-swift>=0.0.1 -tree-sitter-kotlin>=1.0 -tree-sitter-scala>=0.23 -tree-sitter-bash>=0.23 -tree-sitter-html>=0.23 -tree-sitter-css>=0.23 -tree-sitter-scss>=1.0 -tree-sitter-lua>=0.5 -tree-sitter-sql>=0.3 -tree-sitter-json>=0.23 -tree-sitter-yaml>=0.7 -tree-sitter-toml>=0.7 -tree-sitter-xml>=0.7 -tree-sitter-markdown>=0.5 -tree-sitter-haskell>=0.23 -tree-sitter-ocaml>=0.23 -tree-sitter-hcl>=1.0 -tree-sitter-elixir>=0.3 -tree-sitter-zig>=1.0 -tree-sitter-julia>=0.23 -tree-sitter-svelte>=1.0 -tree-sitter-graphql>=0.1 -tree-sitter-dockerfile>=0.2 -tree-sitter-cmake>=0.7 -tree-sitter-make>=1.0 -tree-sitter-fortran>=0.5 -tree-sitter-objc>=3.0 -tree-sitter-commonlisp>=0.4 -tree-sitter-regex>=0.23 -pydantic>=2.10 -pydantic-settings>=2.7 -aiosqlite>=0.20 -pathspec>=0.12 diff --git a/legacy/python-api/pyproject.toml b/legacy/python-api/pyproject.toml deleted file mode 100644 index dfc9d6e..0000000 --- a/legacy/python-api/pyproject.toml +++ /dev/null @@ -1,10 +0,0 @@ -[project] -name = "code-index-mcp" -version = "0.1.0" -requires-python = ">=3.11" -dependencies = [ - "mcp>=1.7", - "httpx>=0.27", - "pyjwt>=2.12.0", - "pyyaml>=6.0", -] diff --git a/legacy/python-api/scripts/benchmark_embeddings.py b/legacy/python-api/scripts/benchmark_embeddings.py deleted file mode 100755 index 68a2f8a..0000000 --- a/legacy/python-api/scripts/benchmark_embeddings.py +++ /dev/null @@ -1,428 +0,0 @@ -#!/usr/bin/env python3 -""" -Benchmark GGUF embedding quality against fp16 sentence-transformers baseline. - -Validates the claim that the Q8_0 GGUF build of CodeRankEmbed has negligible -retrieval-quality loss compared to the fp16 reference. Reports Jaccard@k, -Recall@k, and rank-correlation (Kendall tau) on a fixed query set run against -a local code corpus (defaults to this repository). - -Install before running: - uv pip install sentence-transformers torch einops # fp16 reference - uv pip install llama-cpp-python huggingface-hub # already in requirements.txt - -Usage: - python scripts/benchmark_embeddings.py \ - --corpus . 
\ - --gguf-repo awhiteside/CodeRankEmbed-Q8_0-GGUF \ - --fp16-repo nomic-ai/CodeRankEmbed \ - --k 10 \ - --output doc/benchmark-q8-vs-fp16.md - -Acceptance thresholds: - Jaccard@10 >= 0.7 - Recall@10 >= 0.9 - Kendall tau >= 0.5 -""" -from __future__ import annotations - -import argparse -import json -import logging -import math -import os -import sys -import time -from dataclasses import dataclass, field -from pathlib import Path -from typing import Any, Callable - -logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") -logger = logging.getLogger("benchmark") - -QUERIES: list[str] = [ - "async queue timeout", - "parse tree-sitter chunk", - "chroma collection upsert", - "cli root command version", - "embedding service load model", - "project root detection", - "file watcher branch switch", - "config yaml migration legacy keys", - "indexing status estimated finish", - "search by meaning code", - "api key authentication middleware", - "health endpoint status response", - "docker compose cuda healthcheck", - "gitignore pattern matching", - "sqlite projects table schema", - "mean pooling embedding", - "batch size inference throughput", - "incremental reindex sha256", - "client version header compatibility", - "goroutine concurrent walk", -] - -CODE_EXTENSIONS = {".py", ".go", ".js", ".ts", ".rs", ".java", ".cpp", ".c", ".h"} -MAX_CHUNK_CHARS = 2000 -EXCLUDE_DIRS = {".git", ".venv", "node_modules", "build", "dist", "__pycache__", "data"} -QUERY_PREFIX = "Represent this query for searching relevant code: " - - -@dataclass -class Chunk: - chunk_id: str # "relative/path.py:0" - path: str - content: str - - -@dataclass -class BackendResult: - name: str - load_seconds: float = 0.0 - embed_seconds: float = 0.0 - dim: int = 0 - top_k: dict[str, list[str]] = field(default_factory=dict) # query -> chunk_ids - - -def collect_chunks(corpus_root: Path) -> list[Chunk]: - chunks: list[Chunk] = [] - for path in corpus_root.rglob("*"): - if not path.is_file(): - continue - if path.suffix not in CODE_EXTENSIONS: - continue - if any(part in EXCLUDE_DIRS for part in path.parts): - continue - try: - text = path.read_text(encoding="utf-8", errors="replace") - except OSError: - continue - if not text.strip(): - continue - rel = path.relative_to(corpus_root).as_posix() - # Slice to ≤MAX_CHUNK_CHARS chunks, line-aligned where possible. - if len(text) <= MAX_CHUNK_CHARS: - chunks.append(Chunk(f"{rel}:0", rel, text)) - continue - idx = 0 - part = 0 - while idx < len(text): - end = min(idx + MAX_CHUNK_CHARS, len(text)) - # extend to next newline to avoid slicing mid-token - nl = text.find("\n", end) - if nl != -1 and nl - end < 200: - end = nl + 1 - chunks.append(Chunk(f"{rel}:{part}", rel, text[idx:end])) - idx = end - part += 1 - return chunks - - -def cosine(a: list[float], b: list[float]) -> float: - # Fast enough on pure-Python for a few thousand vectors * 20 queries. 
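-    # cosine(a, b) = a·b / (|a| |b|); zero-magnitude inputs score 0.0 rather than dividing by zero.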
-    num = sum(x * y for x, y in zip(a, b))
-    da = math.sqrt(sum(x * x for x in a))
-    db = math.sqrt(sum(y * y for y in b))
-    if da == 0 or db == 0:
-        return 0.0
-    return num / (da * db)
-
-
-def top_k_per_query(
-    chunk_vecs: dict[str, list[float]],
-    query_vecs: dict[str, list[float]],
-    k: int,
-) -> dict[str, list[str]]:
-    result: dict[str, list[str]] = {}
-    for q, qv in query_vecs.items():
-        scored = [(cid, cosine(qv, cv)) for cid, cv in chunk_vecs.items()]
-        scored.sort(key=lambda x: x[1], reverse=True)
-        result[q] = [cid for cid, _ in scored[:k]]
-    return result
-
-
-def run_fp16(
-    chunks: list[Chunk],
-    queries: list[str],
-    repo: str,
-    k: int = 10,
-) -> BackendResult:
-    from sentence_transformers import SentenceTransformer  # type: ignore
-
-    t0 = time.monotonic()
-    model = SentenceTransformer(repo, trust_remote_code=True)
-    load_s = time.monotonic() - t0
-
-    t0 = time.monotonic()
-    chunk_embeddings = model.encode(
-        [c.content for c in chunks], show_progress_bar=True, batch_size=8
-    ).tolist()
-    query_embeddings = model.encode(
-        [QUERY_PREFIX + q for q in queries], show_progress_bar=False
-    ).tolist()
-    embed_s = time.monotonic() - t0
-
-    chunk_vecs = {c.chunk_id: v for c, v in zip(chunks, chunk_embeddings)}
-    query_vecs = dict(zip(queries, query_embeddings))
-    return BackendResult(
-        name=f"fp16/{repo}",
-        load_seconds=load_s,
-        embed_seconds=embed_s,
-        dim=len(chunk_embeddings[0]) if chunk_embeddings else 0,
-        top_k=top_k_per_query(chunk_vecs, query_vecs, k),
-    )
-
-
-def run_gguf(
-    chunks: list[Chunk],
-    queries: list[str],
-    repo: str,
-    gguf_filename: str | None = None,
-    k: int = 10,
-) -> BackendResult:
-    from huggingface_hub import hf_hub_download, list_repo_files  # type: ignore
-    from llama_cpp import Llama  # type: ignore
-
-    t0 = time.monotonic()
-    files = list(list_repo_files(repo))
-    if gguf_filename:
-        gguf_file = gguf_filename if gguf_filename in files else None
-        if not gguf_file:
-            raise RuntimeError(
-                f"File {gguf_filename} not found in {repo}. "
-                f"Available: {[f for f in files if f.endswith('.gguf')]}"
-            )
-    else:
-        gguf_file = next((f for f in files if f.endswith(".gguf")), None)
-        if not gguf_file:
-            raise RuntimeError(f"No .gguf file in {repo}")
-    model_path = hf_hub_download(repo_id=repo, filename=gguf_file)
-
-    n_gpu_layers = int(os.environ.get("CIX_N_GPU_LAYERS", "-1"))
-    # n_ctx matches production config (max_chunk_tokens=1500 + 128 headroom)
-    model = Llama(
-        model_path=model_path,
-        embedding=True,
-        n_ctx=1628,
-        n_gpu_layers=n_gpu_layers,
-        verbose=False,
-    )
-    load_s = time.monotonic() - t0
-
-    t0 = time.monotonic()
-    # Embed one text at a time to avoid context-window overflow across chunks
-    chunk_vecs: dict[str, list[float]] = {}
-    for i, c in enumerate(chunks):
-        result = model.create_embedding([c.content])
-        chunk_vecs[c.chunk_id] = result["data"][0]["embedding"]
-        if (i + 1) % 50 == 0:
-            logger.info("  GGUF embedded %d/%d chunks", i + 1, len(chunks))
-    query_vecs: dict[str, list[float]] = {}
-    for q in queries:
-        result = model.create_embedding([QUERY_PREFIX + q])
-        query_vecs[q] = result["data"][0]["embedding"]
-    embed_s = time.monotonic() - t0
-
-    # derive dim from first embedding
-    first_vec = next(iter(chunk_vecs.values()), [])
-    dim = len(first_vec)
-    return BackendResult(
-        name=f"gguf/{repo}/{gguf_filename or 'auto'}",
-        load_seconds=load_s,
-        embed_seconds=embed_s,
-        dim=dim,
-        top_k=top_k_per_query(chunk_vecs, query_vecs, k),
-    )
-
-
-def jaccard(a: list[str], b: list[str]) -> float:
-    sa, sb = set(a), set(b)
-    if not sa and not sb:
-        return 1.0
-    return len(sa & sb) / len(sa | sb)
-
-
-def recall_at_k(reference: list[str], candidate: list[str]) -> float:
-    if not reference:
-        return 1.0
-    hits = sum(1 for item in reference if item in candidate)
-    return hits / len(reference)
-
-
-def kendall_tau(reference: list[str], candidate: list[str]) -> float:
-    # Rank-correlation restricted to items that appear in both lists.
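-    # For each pair of items present in both lists, compare the sign of the
-    # rank difference under each ordering: same sign counts as concordant,
-    # opposite as discordant, and tau = (C - D) / (C + D) over those pairs.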
- common = [item for item in reference if item in candidate] - if len(common) < 2: - return 1.0 if len(common) == len(reference) else 0.0 - ref_rank = {item: i for i, item in enumerate(reference)} - cand_rank = {item: i for i, item in enumerate(candidate)} - concordant = discordant = 0 - for i in range(len(common)): - for j in range(i + 1, len(common)): - a, b = common[i], common[j] - ra, rb = ref_rank[a] - ref_rank[b], cand_rank[a] - cand_rank[b] - if ra * rb > 0: - concordant += 1 - elif ra * rb < 0: - discordant += 1 - total = concordant + discordant - return (concordant - discordant) / total if total else 0.0 - - -def write_report( - output: Path, - reference: BackendResult, - candidate: BackendResult, - k: int, - raw_path: Path, -) -> dict[str, float]: - per_query = [] - jaccards: list[float] = [] - recalls: list[float] = [] - taus: list[float] = [] - for q in reference.top_k: - ref = reference.top_k[q] - cand = candidate.top_k.get(q, []) - j = jaccard(ref, cand) - r = recall_at_k(ref, cand) - t = kendall_tau(ref, cand) - jaccards.append(j) - recalls.append(r) - taus.append(t) - per_query.append((q, j, r, t)) - - def mean(xs: list[float]) -> float: - return sum(xs) / len(xs) if xs else 0.0 - - summary = { - "jaccard_mean": mean(jaccards), - "recall_mean": mean(recalls), - "kendall_tau_mean": mean(taus), - "reference_embed_seconds": reference.embed_seconds, - "candidate_embed_seconds": candidate.embed_seconds, - "speedup": ( - reference.embed_seconds / candidate.embed_seconds - if candidate.embed_seconds > 0 - else 0.0 - ), - } - - lines: list[str] = [] - lines.append(f"# Embedding Quality Benchmark — {candidate.name} vs {reference.name}\n") - lines.append("") - lines.append(f"**k** = {k} | **queries** = {len(reference.top_k)} | **dim ref/cand** = {reference.dim}/{candidate.dim}") - lines.append("") - lines.append("## Summary") - lines.append("") - lines.append("| Metric | Value | Acceptance |") - lines.append("|---|---:|---:|") - lines.append(f"| Jaccard@{k} (mean) | {summary['jaccard_mean']:.3f} | ≥ 0.70 |") - lines.append(f"| Recall@{k} (mean) | {summary['recall_mean']:.3f} | ≥ 0.90 |") - lines.append(f"| Kendall tau (mean) | {summary['kendall_tau_mean']:.3f} | ≥ 0.50 |") - lines.append(f"| Reference embed time | {reference.embed_seconds:.1f}s | — |") - lines.append(f"| Candidate embed time | {candidate.embed_seconds:.1f}s | — |") - lines.append(f"| Speedup (ref/cand) | {summary['speedup']:.2f}× | — |") - lines.append("") - lines.append("## Per-query scores") - lines.append("") - lines.append("| Query | Jaccard | Recall | Kendall τ |") - lines.append("|---|---:|---:|---:|") - for q, j, r, t in per_query: - lines.append(f"| `{q}` | {j:.3f} | {r:.3f} | {t:.3f} |") - lines.append("") - lines.append(f"Raw top-k lists: `{raw_path.name}`") - lines.append("") - - output.write_text("\n".join(lines), encoding="utf-8") - return summary - - -def main() -> int: - parser = argparse.ArgumentParser(description=__doc__) - parser.add_argument("--corpus", type=Path, default=Path.cwd(), - help="Directory to index (default: CWD)") - parser.add_argument("--gguf-repo", default="awhiteside/CodeRankEmbed-Q8_0-GGUF") - parser.add_argument("--gguf-file", default=None, - help="Specific .gguf filename to use from the repo (optional)") - parser.add_argument("--fp16-repo", default="nomic-ai/CodeRankEmbed") - parser.add_argument("--fp16-cache", type=Path, default=None, - help="Path to JSON file for caching/loading fp16 results. 
" - "If file exists, load from it; otherwise run fp16 and save.") - parser.add_argument("--k", type=int, default=10) - parser.add_argument("--output", type=Path, default=Path("doc/benchmark-q8-vs-fp16.md")) - parser.add_argument("--skip-fp16", action="store_true", - help="Skip fp16 reference — useful for quick sanity checks") - args = parser.parse_args() - - logger.info("Collecting chunks from %s", args.corpus) - chunks = collect_chunks(args.corpus) - logger.info("Collected %d chunks", len(chunks)) - if not chunks: - logger.error("No chunks to benchmark") - return 1 - - logger.info("Running GGUF backend: %s (file: %s)", args.gguf_repo, args.gguf_file or "auto") - gguf = run_gguf(chunks, QUERIES, args.gguf_repo, gguf_filename=args.gguf_file) - - if args.skip_fp16: - logger.info("Skipping fp16 reference (--skip-fp16)") - args.output.parent.mkdir(parents=True, exist_ok=True) - raw_dir = args.output.parent / "benchmark-data" - raw_dir.mkdir(parents=True, exist_ok=True) - raw = raw_dir / (args.output.stem + ".json") - raw.write_text(json.dumps({"gguf": gguf.top_k}, indent=2), encoding="utf-8") - logger.info("Wrote top-k to %s (no comparison possible)", raw) - return 0 - - # fp16 caching: load from cache file if available, else run and save - fp16: BackendResult - if args.fp16_cache and args.fp16_cache.exists(): - logger.info("Loading fp16 results from cache: %s", args.fp16_cache) - cache_data = json.loads(args.fp16_cache.read_text(encoding="utf-8")) - fp16 = BackendResult( - name=cache_data["name"], - load_seconds=cache_data["load_seconds"], - embed_seconds=cache_data["embed_seconds"], - dim=cache_data["dim"], - top_k=cache_data["top_k"], - ) - else: - logger.info("Running fp16 reference backend: %s", args.fp16_repo) - fp16 = run_fp16(chunks, QUERIES, args.fp16_repo) - if args.fp16_cache: - args.fp16_cache.parent.mkdir(parents=True, exist_ok=True) - cache_payload = { - "name": fp16.name, - "load_seconds": fp16.load_seconds, - "embed_seconds": fp16.embed_seconds, - "dim": fp16.dim, - "top_k": fp16.top_k, - } - args.fp16_cache.write_text(json.dumps(cache_payload, indent=2), encoding="utf-8") - logger.info("Saved fp16 cache to %s", args.fp16_cache) - - args.output.parent.mkdir(parents=True, exist_ok=True) - raw_dir = args.output.parent / "benchmark-data" - raw_dir.mkdir(parents=True, exist_ok=True) - raw_path = raw_dir / (args.output.stem + ".json") - raw_path.write_text( - json.dumps({"fp16": fp16.top_k, "gguf": gguf.top_k}, indent=2), - encoding="utf-8", - ) - summary = write_report(args.output, fp16, gguf, args.k, raw_path) - - logger.info("Summary: %s", summary) - logger.info("Report written to %s", args.output) - - failed = [] - if summary["jaccard_mean"] < 0.7: - failed.append(f"Jaccard {summary['jaccard_mean']:.3f} < 0.70") - if summary["recall_mean"] < 0.9: - failed.append(f"Recall {summary['recall_mean']:.3f} < 0.90") - if summary["kendall_tau_mean"] < 0.5: - failed.append(f"Kendall τ {summary['kendall_tau_mean']:.3f} < 0.50") - if failed: - logger.error("Acceptance criteria failed: %s", "; ".join(failed)) - return 2 - logger.info("All acceptance criteria passed") - return 0 - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/legacy/python-api/scripts/profile_vram.py b/legacy/python-api/scripts/profile_vram.py deleted file mode 100644 index 780fd4d..0000000 --- a/legacy/python-api/scripts/profile_vram.py +++ /dev/null @@ -1,115 +0,0 @@ -#!/usr/bin/env python3 -""" -VRAM profiling for the GGUF embedding model. - -Measures peak GPU memory for a GGUF model using llama-cpp-python. 
-Run this with the indexing server STOPPED so measurements are clean.
-
-Usage on the server:
-    docker compose -f /path/to/stack/docker-compose.yml stop code-index-api
-    docker run --rm --gpus all \
-        -e EMBEDDING_MODEL=awhiteside/CodeRankEmbed-Q8_0-GGUF \
-        -v cix_cix_data:/data \
-        dvcdsys/code-index:test-cu130 \
-        python3 /app/scripts/profile_vram.py
-    docker compose ... start code-index-api
-
-Override GPU/CPU behaviour with CIX_N_GPU_LAYERS=0 (CPU) or =-1 (all layers on GPU).
-"""
-import json
-import os
-import subprocess
-
-os.environ["TOKENIZERS_PARALLELISM"] = "false"
-
-from llama_cpp import Llama
-from huggingface_hub import hf_hub_download, list_repo_files
-
-MODEL_NAME = os.environ.get("EMBEDDING_MODEL", "awhiteside/CodeRankEmbed-Q8_0-GGUF")
-
-def get_gpu_memory():
-    """Returns (used, total) in MB via nvidia-smi."""
-    try:
-        output = subprocess.check_output(
-            ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,nounits,noheader"],
-            encoding="utf-8"
-        )
-        used, total = map(int, output.strip().split(","))
-        return used, total
-    except Exception:
-        return 0, 0
-
-def synthetic_text(n_tokens: int) -> str:
-    """Code-like text with ~n_tokens tokens."""
-    word = "variableName"
-    count = max(1, n_tokens * 4 // len(word))
-    return " ".join(f"{word}_{i}" for i in range(count))
-
-def main():
-    used_start, total_vram = get_gpu_memory()
-    if total_vram == 0:
-        print("nvidia-smi unavailable — running on CPU or GPU access is missing.")
-
-    print("GPU   : NVIDIA (via nvidia-smi)")
-    print(f"VRAM  : {total_vram} MB total, {used_start} MB used at start")
-    print(f"Model : {MODEL_NAME}")
-    print("Loading model...", flush=True)
-
-    model_path = MODEL_NAME
-    if "/" in model_path and not os.path.exists(model_path):
-        files = list_repo_files(model_path)
-        gguf_file = next((f for f in files if f.endswith(".gguf")), None)
-        if gguf_file is None:
-            raise RuntimeError(f"No .gguf file found in {MODEL_NAME}")
-        model_path = hf_hub_download(repo_id=model_path, filename=gguf_file)
-
-    n_gpu_layers = int(os.environ.get("CIX_N_GPU_LAYERS", "-1" if total_vram else "0"))
-    model = Llama(
-        model_path=model_path,
-        embedding=True,
-        n_ctx=8192,
-        n_gpu_layers=n_gpu_layers,
-        verbose=False
-    )
-
-    used_after_load, _ = get_gpu_memory()
-    model_size_mb = used_after_load - used_start
-    print(f"Model loaded. VRAM used: {used_after_load} MB (Model ~{model_size_mb} MB)\n", flush=True)
-
-    token_counts = [128, 256, 512, 1024, 2048, 4096, 8192]
-    results = []
-
-    print(f"{'tokens':>7} {'peak_used_MB':>12} {'delta_MB':>8}")
-    print("-" * 35)
-
-    for n_tokens in token_counts:
-        text = synthetic_text(n_tokens)
-
-        # GGUF usually doesn't show huge VRAM spikes for embeddings like PyTorch does
-        # because the context is pre-allocated.
-        model.create_embedding(text)
-
-        used_now, _ = get_gpu_memory()
-        results.append({
-            "n_tokens": n_tokens,
-            "used_mb": used_now,
-            "delta_mb": used_now - used_after_load
-        })
-
-        print(f"{n_tokens:>7} {used_now:>12d} {used_now - used_after_load:>8d}")
-
-    # ---- save JSON ----
-    out = "/tmp/vram_profile.json"
-    dump_data = {
-        "model": MODEL_NAME,
-        "total_vram_mb": total_vram,
-        "load_vram_mb": used_after_load,
-        "results": results
-    }
-    with open(out, "w") as f:
-        json.dump(dump_data, f, indent=2)
-    print(f"\nRaw data saved to {out}")
-
-if __name__ == "__main__":
-    main()
\ No newline at end of file
diff --git a/legacy/python-api/setup-local.sh b/legacy/python-api/setup-local.sh
deleted file mode 100755
index fec4abb..0000000
--- a/legacy/python-api/setup-local.sh
+++ /dev/null
@@ -1,154 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-PROJECT_DIR="$(cd "$(dirname "$0")" && pwd)"
-ENV_FILE="$PROJECT_DIR/.env"
-DATA_DIR="$HOME/.cix/data"
-
-echo "=== Claude Code Index — Local Setup ==="
-
-# 1. Ensure uv is installed (manages Python automatically)
-if ! command -v uv &>/dev/null; then
-  echo "Installing uv (Python package manager)..."
-  curl -LsSf https://astral.sh/uv/install.sh | sh
-  # Add to current session
-  export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
-  if ! command -v uv &>/dev/null; then
-    echo "ERROR: uv installation failed. Install manually: https://docs.astral.sh/uv/"
-    exit 1
-  fi
-fi
-echo "uv: $(uv --version)"
-
-# 2. Create virtual environment with Python 3.12 (auto-downloads if needed)
-if [ ! -d "$PROJECT_DIR/.venv" ]; then
-  echo "Creating virtual environment (Python 3.12)..."
-  uv venv --python 3.12 "$PROJECT_DIR/.venv"
-fi
-
-# 3. Install API dependencies
-echo "Installing dependencies (first time downloads ~650MB GGUF model)..."
-uv pip install --python "$PROJECT_DIR/.venv/bin/python" -r "$PROJECT_DIR/api/requirements.txt"
-
-# 4. Create data directories
-mkdir -p "$DATA_DIR/chroma" "$DATA_DIR/sqlite"
-
-# 5. Generate .env if not exists
-if [ ! -f "$ENV_FILE" ]; then
-  echo "Generating configuration..."
-  API_KEY="cix_$(openssl rand -hex 32)"
-  cat > "$ENV_FILE" <<EOF
-API_KEY=$API_KEY
-PORT=21847
-EOF
-fi
-
-# 6. Load the generated configuration into this shell
-set -a
-source "$ENV_FILE" >/dev/null
-set +a
-
-# 7. Start API server in background
-echo "Starting API server on port ${PORT:-21847}..."
-cd "$PROJECT_DIR/api"
-PYTHONPATH="$PROJECT_DIR/api" \
-API_KEY="$API_KEY" \
-CHROMA_PERSIST_DIR="${CHROMA_PERSIST_DIR:-$DATA_DIR/chroma}" \
-SQLITE_PATH="${SQLITE_PATH:-$DATA_DIR/sqlite/projects.db}" \
-EMBEDDING_MODEL="${EMBEDDING_MODEL:-awhiteside/CodeRankEmbed-Q8_0-GGUF}" \
-MAX_FILE_SIZE="${MAX_FILE_SIZE:-524288}" \
-EXCLUDED_DIRS="${EXCLUDED_DIRS:-node_modules,.git,.venv,__pycache__,dist,build,.next,.cache,.DS_Store}" \
-nohup "$PROJECT_DIR/.venv/bin/uvicorn" app.main:app \
-    --host 0.0.0.0 --port "${PORT:-21847}" \
-    > "$DATA_DIR/server.log" 2>&1 &
-
-SERVER_PID=$!
-echo "$SERVER_PID" > "$DATA_DIR/server.pid"
-echo "Server PID: $SERVER_PID (saved to $DATA_DIR/server.pid)"
-
-cd "$PROJECT_DIR"
-
-# 8. Wait for health
-echo "Waiting for service to be healthy..."
-for i in $(seq 1 30); do
-  if curl -sf "http://localhost:${PORT:-21847}/health" > /dev/null 2>&1; then
-    echo "Service is healthy!"
-    break
-  fi
-  if ! kill -0 "$SERVER_PID" 2>/dev/null; then
-    echo "ERROR: Server process died. Check logs: cat $DATA_DIR/server.log"
-    exit 1
-  fi
-  [ "$i" -eq 30 ] && echo "ERROR: Service failed to start. Check logs: cat $DATA_DIR/server.log" && exit 1
-  sleep 2
-done
-
-# 9. Add instructions to global CLAUDE.md
-CLAUDE_DIR="$HOME/.claude"
-CLAUDE_MD="$CLAUDE_DIR/CLAUDE.md"
-MARKER="<!-- code-index-instructions -->"
-
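-# The marker makes the append idempotent: rerunning setup skips this block.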
-f "$CLAUDE_MD" ] || ! grep -q "$MARKER" "$CLAUDE_MD" 2>/dev/null; then - echo "Adding code-index instructions to $CLAUDE_MD..." - mkdir -p "$CLAUDE_DIR" - cat >> "$CLAUDE_MD" <<'INSTRUCTIONS' - - -## Code Index (`cix`) - -This environment has a semantic code index. Use the `cix` CLI to search code and navigate the project. - -**IMPORTANT — search priority:** -1. ALWAYS use `cix search` or `cix symbols` FIRST when looking for code -2. Only fall back to Grep/Glob if the index returns no results or `cix` is not available -3. The index understands natural language — ask it like you would ask a developer - -**Commands (run via Bash tool):** -- `cix search "authentication middleware"` — semantic code search -- `cix search "error handling" --in ./api` — search within a directory -- `cix search "config" --in README.md` — search within a specific file -- `cix symbols "handleRequest" --kind function` — find symbols by name -- `cix files "config"` — search files by path pattern -- `cix summary` — project overview (languages, directories, symbols) -- `cix status` — check indexing status -- `cix reindex` — trigger incremental reindex after changes - -**First time setup:** -If the project is not yet indexed, run: `cix init` -This registers the project, starts indexing, and launches a file watcher daemon. -The watcher auto-reindexes when files change — no manual reindex needed. - -**Tips:** -- Use `--in` flag to narrow search to a specific file or directory -- Use `--lang go` to filter by language -- Use `--limit 20` to get more results -- If `cix` is not installed, fall back to MCP tools: search_code, find_symbols - -INSTRUCTIONS - echo "Added code-index instructions to $CLAUDE_MD" -else - echo "Code-index instructions already in $CLAUDE_MD" -fi - -echo "" -echo "=== Local Setup Complete ===" -echo "API server running on http://localhost:${PORT:-21847} (PID: $SERVER_PID)" -echo "Instructions added to $CLAUDE_MD." -echo "" -echo "Useful commands:" -echo " Stop server: kill \$(cat $DATA_DIR/server.pid)" -echo " View logs: tail -f $DATA_DIR/server.log" -echo " Restart server: kill \$(cat $DATA_DIR/server.pid) && ./setup-local.sh" diff --git a/legacy/python-api/setup.sh b/legacy/python-api/setup.sh deleted file mode 100755 index 5c37b0d..0000000 --- a/legacy/python-api/setup.sh +++ /dev/null @@ -1,75 +0,0 @@ -#!/usr/bin/env bash -set -euo pipefail - -PROJECT_DIR="$(cd "$(dirname "$0")" && pwd)" -ENV_FILE="$PROJECT_DIR/.env" -DATA_DIR="$HOME/.cix/data" - -echo "=== cix — Code IndeX Setup (Docker) ===" - -# 1. Generate .env if not exists -if [ ! -f "$ENV_FILE" ]; then - echo "Generating configuration..." - API_KEY="cix_$(openssl rand -hex 32)" - cat > "$ENV_FILE" < /dev/null 2>&1; then - echo "Service is healthy!" - break - fi - [ "$i" -eq 30 ] && echo "ERROR: Service failed to start. Check logs: docker compose logs" && exit 1 - sleep 2 -done - -# 6. Configure cix CLI (if installed) -if command -v cix &>/dev/null; then - echo "Configuring cix CLI..." - cix config set api.url "http://localhost:${PORT:-21847}" - cix config set api.key "$API_KEY" - echo "✓ cix configured" -else - echo "cix CLI not installed. 
Install it with: cd cli && make build && make install" - echo "Then configure it:" - echo " cix config set api.url http://localhost:${PORT:-21847}" - echo " cix config set api.key $API_KEY" -fi - -echo "" -echo "=== Setup Complete ===" -echo "" -echo "API: http://localhost:${PORT:-21847}" -echo "API key: $API_KEY" -echo "Data: $DATA_DIR" -echo "" -echo "Next steps:" -echo " Install CLI: cd cli && make build && make install" -echo " Index project: cix init /path/to/your/project" -echo " Search: cix search \"authentication middleware\"" diff --git a/legacy/python-api/tests/__init__.py b/legacy/python-api/tests/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/legacy/python-api/tests/test_api.py b/legacy/python-api/tests/test_api.py deleted file mode 100644 index 0e9aed4..0000000 --- a/legacy/python-api/tests/test_api.py +++ /dev/null @@ -1,106 +0,0 @@ -"""API integration tests — require running Docker container.""" -import os - -import httpx -import pytest - -BASE_URL = os.environ.get("CODE_INDEX_API_URL", "http://localhost:21847") -API_KEY = os.environ.get("CODE_INDEX_API_KEY", "") - - -@pytest.fixture -def client(): - return httpx.Client( - base_url=BASE_URL, - headers={"Authorization": f"Bearer {API_KEY}"}, - timeout=30.0, - ) - - -def test_health_no_auth(): - r = httpx.get(f"{BASE_URL}/health", timeout=10.0) - assert r.status_code == 200 - assert r.json()["status"] == "ok" - - -def test_status_requires_auth(): - r = httpx.get(f"{BASE_URL}/api/v1/status", timeout=10.0) - assert r.status_code in (401, 403) - - -def test_status_with_auth(client): - if not API_KEY: - pytest.skip("API_KEY not set") - r = client.get("/api/v1/status") - assert r.status_code == 200 - data = r.json() - assert "model_loaded" in data - assert "server_version" in data - assert "api_version" in data - assert data["api_version"] == "v1" - - -def test_project_crud(client): - if not API_KEY: - pytest.skip("API_KEY not set") - - # Create - r = client.post( - "/api/v1/projects", - json={"name": "test-project", "host_path": "/tmp/test-project"}, - ) - assert r.status_code == 201 - project = r.json() - project_id = project["id"] - assert project["name"] == "test-project" - - # List - r = client.get("/api/v1/projects") - assert r.status_code == 200 - assert any(p["id"] == project_id for p in r.json()["projects"]) - - # Get - r = client.get(f"/api/v1/projects/{project_id}") - assert r.status_code == 200 - assert r.json()["name"] == "test-project" - - # Update - r = client.patch( - f"/api/v1/projects/{project_id}", - json={"name": "test-project-updated"}, - ) - assert r.status_code == 200 - assert r.json()["name"] == "test-project-updated" - - # Delete - r = client.delete(f"/api/v1/projects/{project_id}") - assert r.status_code == 204 - - # Verify deleted - r = client.get(f"/api/v1/projects/{project_id}") - assert r.status_code == 404 - - -def test_index_trigger(client): - if not API_KEY: - pytest.skip("API_KEY not set") - - # Create project first - r = client.post( - "/api/v1/projects", - json={"name": "test-index", "host_path": "/tmp/test-index"}, - ) - project_id = r.json()["id"] - - # Trigger index - r = client.post(f"/api/v1/projects/{project_id}/index") - assert r.status_code == 202 - assert "run_id" in r.json() - - # Check status - r = client.get(f"/api/v1/projects/{project_id}/index/status") - assert r.status_code == 200 - assert "status" in r.json() - - # Cleanup - client.delete(f"/api/v1/projects/{project_id}") diff --git a/legacy/python-api/tests/test_chunker.py 
b/legacy/python-api/tests/test_chunker.py deleted file mode 100644 index 2f3c5f2..0000000 --- a/legacy/python-api/tests/test_chunker.py +++ /dev/null @@ -1,406 +0,0 @@ -"""Tests for the chunker service — runs locally without Docker.""" -import sys -from pathlib import Path - -# Add api directory to path for local testing -sys.path.insert(0, str(Path(__file__).parent.parent / "api")) - -import pytest - - -def _make_chunker(): - """Create chunker service instance.""" - from app.services.chunker import ChunkerService - return ChunkerService() - - -PYTHON_CODE = ''' -import os -import sys - -CONSTANT = 42 - -def hello(name: str) -> str: - """Say hello.""" - return f"Hello, {name}!" - -class Calculator: - """A simple calculator.""" - - def __init__(self, initial: int = 0): - self.value = initial - - def add(self, n: int) -> int: - self.value += n - return self.value - - def subtract(self, n: int) -> int: - self.value -= n - return self.value - -def main(): - calc = Calculator(10) - print(hello("World")) - print(calc.add(5)) -''' - -GO_CODE = '''package main - -import "fmt" - -type Server struct { - host string - port int -} - -func NewServer(host string, port int) *Server { - return &Server{host: host, port: port} -} - -func (s *Server) Start() error { - fmt.Printf("Starting on %s:%d\\n", s.host, s.port) - return nil -} - -func main() { - s := NewServer("localhost", 8080) - s.Start() -} -''' - -PLAIN_TEXT = "Just some plain text that has no code structure at all. " * 20 - - -class TestTreeSitterIntegration: - """Verify tree-sitter bindings load correctly — catches version incompatibilities.""" - - def test_parser_loads_for_all_language_nodes(self): - """Every language in LANGUAGE_NODES must have a working parser (not None).""" - from app.services.chunker import LANGUAGE_NODES - chunker = _make_chunker() - for language in LANGUAGE_NODES: - parser = chunker._get_parser(language) - assert parser is not None, ( - f"_get_parser('{language}') returned None — " - f"tree-sitter binding broken or missing for '{language}'" - ) - - def test_parser_produces_ast(self): - """Parser.parse() must return a tree with a root_node.""" - chunker = _make_chunker() - parser = chunker._get_parser("python") - assert parser is not None - tree = parser.parse(b"def foo(): pass") - assert tree.root_node is not None - assert tree.root_node.type == "module" - - -class TestChunkerPython: - def test_extracts_functions(self): - chunker = _make_chunker() - result = chunker.chunk_file("test.py", PYTHON_CODE, "python") - chunks = result.chunks - func_chunks = [c for c in chunks if c.chunk_type == "function"] - func_names = {c.symbol_name for c in func_chunks} - assert "hello" in func_names - assert "main" in func_names - - def test_extracts_class(self): - chunker = _make_chunker() - chunks = chunker.chunk_file("test.py", PYTHON_CODE, "python").chunks - class_chunks = [c for c in chunks if c.chunk_type == "class"] - assert any(c.symbol_name == "Calculator" for c in class_chunks) - - def test_extracts_methods(self): - chunker = _make_chunker() - chunks = chunker.chunk_file("test.py", PYTHON_CODE, "python").chunks - method_chunks = [c for c in chunks if c.chunk_type == "method"] - method_names = {c.symbol_name for c in method_chunks} - assert "add" in method_names - assert "__init__" in method_names - - def test_module_chunks(self): - chunker = _make_chunker() - chunks = chunker.chunk_file("test.py", PYTHON_CODE, "python").chunks - module_chunks = [c for c in chunks if c.chunk_type == "module"] - # Should capture imports and 
constants
-        assert len(module_chunks) > 0
-
-    def test_line_numbers(self):
-        chunker = _make_chunker()
-        chunks = chunker.chunk_file("test.py", PYTHON_CODE, "python").chunks
-        for chunk in chunks:
-            assert chunk.start_line >= 1
-            assert chunk.end_line >= chunk.start_line
-
-
-class TestChunkerGo:
-    def test_extracts_functions(self):
-        chunker = _make_chunker()
-        chunks = chunker.chunk_file("main.go", GO_CODE, "go").chunks
-        func_chunks = [c for c in chunks if c.chunk_type == "function"]
-        func_names = {c.symbol_name for c in func_chunks}
-        assert "NewServer" in func_names or "main" in func_names
-
-    def test_extracts_type(self):
-        chunker = _make_chunker()
-        chunks = chunker.chunk_file("main.go", GO_CODE, "go").chunks
-        type_chunks = [c for c in chunks if c.chunk_type == "type"]
-        assert any(c.symbol_name == "Server" for c in type_chunks)
-
-
-TYPESCRIPT_CODE = '''
-import { Request, Response } from "express";
-
-interface User {
-    id: number;
-    name: string;
-}
-
-type UserRole = "admin" | "user";
-
-function getUser(id: number): User {
-    return { id, name: "test" };
-}
-
-class UserService {
-    private users: User[] = [];
-
-    addUser(user: User): void {
-        this.users.push(user);
-    }
-}
-
-const fetchUser = (id: number): Promise<User> => {
-    return Promise.resolve({ id, name: "test" });
-};
-'''
-
-JAVASCRIPT_CODE = '''
-const express = require("express");
-
-function createApp() {
-    const app = express();
-    return app;
-}
-
-class Router {
-    constructor() {
-        this.routes = [];
-    }
-
-    addRoute(path, handler) {
-        this.routes.push({ path, handler });
-    }
-}
-
-const handler = (req, res) => {
-    res.json({ ok: true });
-};
-'''
-
-RUST_CODE = '''
-use std::collections::HashMap;
-
-struct Config {
-    host: String,
-    port: u16,
-}
-
-enum AppError {
-    NotFound,
-    Internal(String),
-}
-
-trait Handler {
-    fn handle(&self, req: &str) -> Result<String, AppError>;
-}
-
-fn create_config() -> Config {
-    Config { host: "localhost".to_string(), port: 8080 }
-}
-'''
-
-JAVA_CODE = '''
-package com.example;
-
-import java.util.List;
-
-interface Repository {
-    List<User> findAll();
-}
-
-class UserService {
-    private final Repository repo;
-
-    UserService(Repository repo) {
-        this.repo = repo;
-    }
-
-    public List<User> getUsers() {
-        return repo.findAll();
-    }
-}
-'''
-
-LUA_CODE = '''
-local M = {}
-
-function M.setup(opts)
-    opts = opts or {}
-    M.debug = opts.debug or false
-end
-
-function M.greet(name)
-    return "Hello, " .. 
name -end - -return M -''' - -YAML_CODE = ''' -name: CI Pipeline -on: - push: - branches: [main] -jobs: - build: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - run: make test -''' - -JSON_CODE = ''' -{ - "name": "my-project", - "version": "1.0.0", - "dependencies": { - "express": "^4.18.0" - }, - "scripts": { - "start": "node index.js", - "test": "jest" - } -} -''' - - -class TestChunkerMultiLanguage: - """Verify tree-sitter parses all LANGUAGE_NODES languages (not falling back to sliding window).""" - - @pytest.mark.parametrize("filename,code,language,expected_symbols", [ - ("test.py", PYTHON_CODE, "python", {"hello", "Calculator"}), - ("test.ts", TYPESCRIPT_CODE, "typescript", {"getUser", "UserService"}), - ("test.js", JAVASCRIPT_CODE, "javascript", {"createApp", "Router"}), - ("main.go", GO_CODE, "go", {"NewServer", "Server"}), - ("lib.rs", RUST_CODE, "rust", {"Config", "Handler", "create_config"}), - ("Main.java", JAVA_CODE, "java", {"UserService", "Repository"}), - ]) - def test_treesitter_parses_language(self, filename, code, language, expected_symbols): - chunker = _make_chunker() - result = chunker.chunk_file(filename, code, language) - structured_types = {"function", "class", "method", "type"} - structured = [c for c in result.chunks if c.chunk_type in structured_types] - assert len(structured) > 0, f"{language}: fell back to sliding window, no structured chunks" - found_names = {c.symbol_name for c in structured if c.symbol_name} - for sym in expected_symbols: - assert sym in found_names, f"{language}: expected symbol '{sym}' not found in {found_names}" - - @pytest.mark.parametrize("filename,code,language", [ - ("test.py", PYTHON_CODE, "python"), - ("test.ts", TYPESCRIPT_CODE, "typescript"), - ("test.js", JAVASCRIPT_CODE, "javascript"), - ("main.go", GO_CODE, "go"), - ("lib.rs", RUST_CODE, "rust"), - ("Main.java", JAVA_CODE, "java"), - ]) - def test_references_extracted(self, filename, code, language): - chunker = _make_chunker() - result = chunker.chunk_file(filename, code, language) - assert len(result.references) > 0, f"{language}: no references extracted" - for ref in result.references: - assert ref.file_path == filename - assert ref.line >= 1 - assert ref.language == language - - @pytest.mark.parametrize("filename,code,language", [ - ("script.lua", LUA_CODE, "lua"), - ("config.yaml", YAML_CODE, "yaml"), - ("package.json", JSON_CODE, "json"), - ]) - def test_no_crash_on_data_languages(self, filename, code, language): - """Languages without LANGUAGE_NODES fall back to sliding window without errors.""" - chunker = _make_chunker() - result = chunker.chunk_file(filename, code, language) - assert len(result.chunks) > 0, f"{language}: produced no chunks at all" - assert all(c.chunk_type == "block" for c in result.chunks), ( - f"{language}: expected sliding-window blocks" - ) - - -class TestChunkerFallback: - def test_sliding_window(self): - chunker = _make_chunker() - result = chunker.chunk_file("readme.txt", PLAIN_TEXT, "text") - assert len(result.chunks) > 0 - assert all(c.chunk_type == "block" for c in result.chunks) - assert result.references == [] - - def test_empty_file(self): - chunker = _make_chunker() - result = chunker.chunk_file("empty.py", "", "python") - assert len(result.chunks) == 0 - - -class TestReferenceExtraction: - def test_extracts_references_python(self): - chunker = _make_chunker() - result = chunker.chunk_file("test.py", PYTHON_CODE, "python") - ref_names = {r.name for r in result.references} - # Calculator and hello are used in main() - 
assert "Calculator" in ref_names - assert "hello" in ref_names - - def test_skips_definition_names(self): - chunker = _make_chunker() - result = chunker.chunk_file("test.py", PYTHON_CODE, "python") - # "hello" should appear as reference (in main), but not at def line - hello_refs = [r for r in result.references if r.name == "hello"] - # The definition is at line 7 (def hello(...)), refs should not be there - hello_def_line = None - for c in result.chunks: - if c.symbol_name == "hello" and c.chunk_type == "function": - hello_def_line = c.start_line - break - assert hello_def_line is not None - assert all(r.line != hello_def_line for r in hello_refs) - - def test_skips_keywords(self): - chunker = _make_chunker() - result = chunker.chunk_file("test.py", PYTHON_CODE, "python") - ref_names = {r.name for r in result.references} - assert "self" not in ref_names - assert "None" not in ref_names - assert "True" not in ref_names - - def test_refs_have_correct_file_path(self): - chunker = _make_chunker() - result = chunker.chunk_file("test.py", PYTHON_CODE, "python") - for ref in result.references: - assert ref.file_path == "test.py" - assert ref.line >= 1 - assert ref.col >= 0 - assert ref.language == "python" - - def test_extracts_references_go(self): - chunker = _make_chunker() - result = chunker.chunk_file("main.go", GO_CODE, "go") - ref_names = {r.name for r in result.references} - # NewServer and Start are used in main() - assert "NewServer" in ref_names or "Server" in ref_names - - def test_no_refs_for_unsupported_language(self): - chunker = _make_chunker() - result = chunker.chunk_file("readme.txt", PLAIN_TEXT, "text") - assert result.references == [] diff --git a/legacy/python-api/tests/test_file_discovery.py b/legacy/python-api/tests/test_file_discovery.py deleted file mode 100644 index 192ac37..0000000 --- a/legacy/python-api/tests/test_file_discovery.py +++ /dev/null @@ -1,111 +0,0 @@ -import os -import tempfile -from pathlib import Path - -import pytest - -from api.app.services.file_discovery import FileDiscoveryService - - -def _write(root: Path, rel_path: str, content: str = "hello") -> None: - p = root / rel_path - p.parent.mkdir(parents=True, exist_ok=True) - p.write_text(content) - - -@pytest.fixture -def svc() -> FileDiscoveryService: - return FileDiscoveryService() - - -class TestCixignore: - def test_root_cixignore_excludes_files(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".cixignore", "*.log\nsecret.txt\n") - _write(root, "main.go", "package main") - _write(root, "app.log", "log data") - _write(root, "secret.txt", "password") - _write(root, "readme.txt", "hello") - - files = svc.discover(tmp, [], 524288) - paths = sorted(f.path for f in files) - - assert any("main.go" in p for p in paths) - assert any("readme.txt" in p for p in paths) - assert not any("app.log" in p for p in paths) - assert not any("secret.txt" in p for p in paths) - - def test_cixignore_directory_pattern(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".cixignore", "submodules/\n") - _write(root, "main.go", "package main") - _write(root, "submodules/vendor/lib.go", "package lib") - _write(root, "src/app.go", "package src") - - files = svc.discover(tmp, [], 524288) - paths = [f.path for f in files] - - assert any("main.go" in p for p in paths) - assert any("app.go" in p for p in paths) - assert not any("submodules" in p for p in paths) - - def 
test_cixignore_and_gitignore_merged(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".gitignore", "*.log\n") - _write(root, ".cixignore", "*.tmp\n") - _write(root, "main.go", "package main") - _write(root, "app.log", "log") - _write(root, "cache.tmp", "temp") - _write(root, "readme.txt", "hello") - - files = svc.discover(tmp, [], 524288) - paths = sorted(f.path for f in files) - - assert any("main.go" in p for p in paths) - assert any("readme.txt" in p for p in paths) - assert not any("app.log" in p for p in paths) - assert not any("cache.tmp" in p for p in paths) - - def test_only_cixignore_no_gitignore(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".cixignore", "generated/\n*.bak\n") - _write(root, "main.go", "package main") - _write(root, "config.bak", "old config") - _write(root, "generated/api.go", "package gen") - - files = svc.discover(tmp, [], 524288) - paths = [f.path for f in files] - - assert any("main.go" in p for p in paths) - assert not any("config.bak" in p for p in paths) - assert not any("generated" in p for p in paths) - - def test_only_gitignore_no_cixignore(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".gitignore", "*.log\n") - _write(root, "main.go", "package main") - _write(root, "app.log", "log data") - - files = svc.discover(tmp, [], 524288) - paths = [f.path for f in files] - - assert any("main.go" in p for p in paths) - assert not any("app.log" in p for p in paths) - - def test_no_ignore_files(self, svc: FileDiscoveryService) -> None: - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, "main.go", "package main") - _write(root, "data.txt", "data") - - files = svc.discover(tmp, [], 524288) - paths = sorted(f.path for f in files) - - assert len(paths) == 2 - assert any("main.go" in p for p in paths) - assert any("data.txt" in p for p in paths) \ No newline at end of file diff --git a/legacy/python-api/tests/test_project_config.py b/legacy/python-api/tests/test_project_config.py deleted file mode 100644 index 51f528b..0000000 --- a/legacy/python-api/tests/test_project_config.py +++ /dev/null @@ -1,136 +0,0 @@ -import tempfile -from pathlib import Path - -import pytest - -from api.app.services.project_config import ( - IgnoreConfig, - ProjectConfig, - load_project_config, - parse_submodule_paths, -) - - -def _write(root: Path, rel_path: str, content: str) -> None: - p = root / rel_path - p.parent.mkdir(parents=True, exist_ok=True) - p.write_text(content) - - -class TestLoadProjectConfig: - def test_submodules_true(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write(Path(tmp), ".cixconfig.yaml", "ignore:\n submodules: true\n") - cfg = load_project_config(tmp) - assert cfg.ignore.submodules is True - - def test_submodules_false(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write(Path(tmp), ".cixconfig.yaml", "ignore:\n submodules: false\n") - cfg = load_project_config(tmp) - assert cfg.ignore.submodules is False - - def test_no_file(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - cfg = load_project_config(tmp) - assert cfg.ignore.submodules is False - - def test_empty_file(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write(Path(tmp), ".cixconfig.yaml", "") - cfg = load_project_config(tmp) - assert cfg.ignore.submodules is False - - def test_invalid_yaml(self) 
-> None: - with tempfile.TemporaryDirectory() as tmp: - _write(Path(tmp), ".cixconfig.yaml", ":::bad{{{yaml") - cfg = load_project_config(tmp) - # Should return default config, not crash - assert cfg.ignore.submodules is False - - -class TestParseSubmodulePaths: - def test_standard_gitmodules(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write( - Path(tmp), - ".gitmodules", - '[submodule "api/schema"]\n' - "\tpath = api/schema\n" - "\turl = https://example.com/schema.git\n" - '[submodule "libs/vendor"]\n' - "\tpath = libs/vendor\n" - "\turl = https://example.com/vendor.git\n", - ) - paths = parse_submodule_paths(tmp) - assert sorted(paths) == ["api/schema", "libs/vendor"] - - def test_no_gitmodules(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - paths = parse_submodule_paths(tmp) - assert paths == [] - - def test_empty_gitmodules(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write(Path(tmp), ".gitmodules", "") - paths = parse_submodule_paths(tmp) - assert paths == [] - - def test_single_submodule(self) -> None: - with tempfile.TemporaryDirectory() as tmp: - _write( - Path(tmp), - ".gitmodules", - '[submodule "vendor"]\n\tpath = vendor\n\turl = https://example.com/v.git\n', - ) - paths = parse_submodule_paths(tmp) - assert paths == ["vendor"] - - -class TestFileDiscoveryWithSubmodules: - def test_submodules_excluded(self) -> None: - from api.app.services.file_discovery import FileDiscoveryService - - svc = FileDiscoveryService() - - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".cixconfig.yaml", "ignore:\n submodules: true\n") - _write( - root, - ".gitmodules", - '[submodule "vendor"]\n\tpath = vendor\n\turl = https://example.com/v.git\n', - ) - _write(root, "main.go", "package main") - _write(root, "vendor/lib.go", "package vendor") - _write(root, "vendor/deep/util.go", "package deep") - _write(root, "src/app.go", "package src") - - files = svc.discover(tmp, [], 524288) - paths = sorted(f.path for f in files) - - assert any("main.go" in p for p in paths) - assert any("app.go" in p for p in paths) - assert not any("vendor" in p for p in paths) - - def test_submodules_not_excluded_when_false(self) -> None: - from api.app.services.file_discovery import FileDiscoveryService - - svc = FileDiscoveryService() - - with tempfile.TemporaryDirectory() as tmp: - root = Path(tmp) - _write(root, ".cixconfig.yaml", "ignore:\n submodules: false\n") - _write( - root, - ".gitmodules", - '[submodule "vendor"]\n\tpath = vendor\n\turl = https://example.com/v.git\n', - ) - _write(root, "main.go", "package main") - _write(root, "vendor/lib.go", "package vendor") - - files = svc.discover(tmp, [], 524288) - paths = [f.path for f in files] - - assert any("main.go" in p for p in paths) - assert any("vendor" in p for p in paths) \ No newline at end of file diff --git a/legacy/python-api/tests/test_search.py b/legacy/python-api/tests/test_search.py deleted file mode 100644 index e162765..0000000 --- a/legacy/python-api/tests/test_search.py +++ /dev/null @@ -1,111 +0,0 @@ -"""Search integration tests — require running Docker container with indexed project.""" -import os - -import httpx -import pytest - -BASE_URL = os.environ.get("CODE_INDEX_API_URL", "http://localhost:21847") -API_KEY = os.environ.get("CODE_INDEX_API_KEY", "") - - -@pytest.fixture -def client(): - return httpx.Client( - base_url=BASE_URL, - headers={"Authorization": f"Bearer {API_KEY}"}, - timeout=60.0, - ) - - -@pytest.fixture -def project_with_index(client): - """Create a 
project, wait for indexing, return project_id. Cleanup after.""" - if not API_KEY: - pytest.skip("API_KEY not set") - - r = client.post( - "/api/v1/projects", - json={"name": "test-search", "host_path": "/tmp/test-search"}, - ) - if r.status_code == 409: - # Already exists, find it - r = client.get("/api/v1/projects") - for p in r.json()["projects"]: - if p["name"] == "test-search": - yield p["id"] - client.delete(f"/api/v1/projects/{p['id']}") - return - - project_id = r.json()["id"] - yield project_id - client.delete(f"/api/v1/projects/{project_id}") - - -def test_semantic_search(client, project_with_index): - r = client.post( - f"/api/v1/projects/{project_with_index}/search", - json={"query": "test function", "limit": 5}, - ) - assert r.status_code == 200 - data = r.json() - assert "results" in data - assert "total" in data - assert "query_time_ms" in data - - -def test_symbol_search(client, project_with_index): - r = client.post( - f"/api/v1/projects/{project_with_index}/search/symbols", - json={"query": "main", "limit": 5}, - ) - assert r.status_code == 200 - data = r.json() - assert "results" in data - assert "total" in data - - -def test_file_search(client, project_with_index): - r = client.post( - f"/api/v1/projects/{project_with_index}/search/files", - json={"query": "test", "limit": 5}, - ) - assert r.status_code == 200 - data = r.json() - assert "files" in data - assert "total" in data - - -def test_project_summary(client, project_with_index): - r = client.get(f"/api/v1/projects/{project_with_index}/summary") - assert r.status_code == 200 - data = r.json() - assert "name" in data - assert "languages" in data - assert "total_files" in data - - -def test_search_with_filters(client, project_with_index): - r = client.post( - f"/api/v1/projects/{project_with_index}/search", - json={ - "query": "function", - "limit": 5, - "languages": ["python"], - "min_score": 0.1, - }, - ) - assert r.status_code == 200 - data = r.json() - assert "results" in data - assert "total" in data - assert "query_time_ms" in data - - -def test_search_nonexistent_project(client): - if not API_KEY: - pytest.skip("API_KEY not set") - r = client.post( - "/api/v1/projects/nonexistent-id/search", - json={"query": "test"}, - ) - assert r.status_code == 404 diff --git a/legacy/python-api/uv.lock b/legacy/python-api/uv.lock deleted file mode 100644 index bc0614e..0000000 --- a/legacy/python-api/uv.lock +++ /dev/null @@ -1,742 +0,0 @@ -version = 1 -revision = 3 -requires-python = ">=3.11" - -[[package]] -name = "annotated-types" -version = "0.7.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" }, -] - -[[package]] -name = "anyio" -version = "4.12.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "idna" }, - { name = "typing-extensions", marker = "python_full_version < '3.13'" }, -] -sdist = { url = 
"https://files.pythonhosted.org/packages/96/f0/5eb65b2bb0d09ac6776f2eb54adee6abe8228ea05b20a5ad0e4945de8aac/anyio-4.12.1.tar.gz", hash = "sha256:41cfcc3a4c85d3f05c932da7c26d0201ac36f72abd4435ba90d0464a3ffed703", size = 228685, upload-time = "2026-01-06T11:45:21.246Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/38/0e/27be9fdef66e72d64c0cdc3cc2823101b80585f8119b5c112c2e8f5f7dab/anyio-4.12.1-py3-none-any.whl", hash = "sha256:d405828884fc140aa80a3c667b8beed277f1dfedec42ba031bd6ac3db606ab6c", size = 113592, upload-time = "2026-01-06T11:45:19.497Z" }, -] - -[[package]] -name = "attrs" -version = "25.4.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/6b/5c/685e6633917e101e5dcb62b9dd76946cbb57c26e133bae9e0cd36033c0a9/attrs-25.4.0.tar.gz", hash = "sha256:16d5969b87f0859ef33a48b35d55ac1be6e42ae49d5e853b597db70c35c57e11", size = 934251, upload-time = "2025-10-06T13:54:44.725Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/3a/2a/7cc015f5b9f5db42b7d48157e23356022889fc354a2813c15934b7cb5c0e/attrs-25.4.0-py3-none-any.whl", hash = "sha256:adcf7e2a1fb3b36ac48d97835bb6d8ade15b8dcce26aba8bf1d14847b57a3373", size = 67615, upload-time = "2025-10-06T13:54:43.17Z" }, -] - -[[package]] -name = "certifi" -version = "2026.2.25" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/af/2d/7bf41579a8986e348fa033a31cdd0e4121114f6bce2457e8876010b092dd/certifi-2026.2.25.tar.gz", hash = "sha256:e887ab5cee78ea814d3472169153c2d12cd43b14bd03329a39a9c6e2e80bfba7", size = 155029, upload-time = "2026-02-25T02:54:17.342Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/9a/3c/c17fb3ca2d9c3acff52e30b309f538586f9f5b9c9cf454f3845fc9af4881/certifi-2026.2.25-py3-none-any.whl", hash = "sha256:027692e4402ad994f1c42e52a4997a9763c646b73e4096e4d5d6db8af1d6f0fa", size = 153684, upload-time = "2026-02-25T02:54:15.766Z" }, -] - -[[package]] -name = "cffi" -version = "2.0.0" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "pycparser", marker = "implementation_name != 'PyPy'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/eb/56/b1ba7935a17738ae8453301356628e8147c79dbb825bcbc73dc7401f9846/cffi-2.0.0.tar.gz", hash = "sha256:44d1b5909021139fe36001ae048dbdde8214afa20200eda0f64c068cac5d5529", size = 523588, upload-time = "2025-09-08T23:24:04.541Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/12/4a/3dfd5f7850cbf0d06dc84ba9aa00db766b52ca38d8b86e3a38314d52498c/cffi-2.0.0-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:b4c854ef3adc177950a8dfc81a86f5115d2abd545751a304c5bcf2c2c7283cfe", size = 184344, upload-time = "2025-09-08T23:22:26.456Z" }, - { url = "https://files.pythonhosted.org/packages/4f/8b/f0e4c441227ba756aafbe78f117485b25bb26b1c059d01f137fa6d14896b/cffi-2.0.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2de9a304e27f7596cd03d16f1b7c72219bd944e99cc52b84d0145aefb07cbd3c", size = 180560, upload-time = "2025-09-08T23:22:28.197Z" }, - { url = "https://files.pythonhosted.org/packages/b1/b7/1200d354378ef52ec227395d95c2576330fd22a869f7a70e88e1447eb234/cffi-2.0.0-cp311-cp311-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:baf5215e0ab74c16e2dd324e8ec067ef59e41125d3eade2b863d294fd5035c92", size = 209613, upload-time = "2025-09-08T23:22:29.475Z" }, - { url = 
"https://files.pythonhosted.org/packages/b8/56/6033f5e86e8cc9bb629f0077ba71679508bdf54a9a5e112a3c0b91870332/cffi-2.0.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:730cacb21e1bdff3ce90babf007d0a0917cc3e6492f336c2f0134101e0944f93", size = 216476, upload-time = "2025-09-08T23:22:31.063Z" }, - { url = "https://files.pythonhosted.org/packages/dc/7f/55fecd70f7ece178db2f26128ec41430d8720f2d12ca97bf8f0a628207d5/cffi-2.0.0-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:6824f87845e3396029f3820c206e459ccc91760e8fa24422f8b0c3d1731cbec5", size = 203374, upload-time = "2025-09-08T23:22:32.507Z" }, - { url = "https://files.pythonhosted.org/packages/84/ef/a7b77c8bdc0f77adc3b46888f1ad54be8f3b7821697a7b89126e829e676a/cffi-2.0.0-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:9de40a7b0323d889cf8d23d1ef214f565ab154443c42737dfe52ff82cf857664", size = 202597, upload-time = "2025-09-08T23:22:34.132Z" }, - { url = "https://files.pythonhosted.org/packages/d7/91/500d892b2bf36529a75b77958edfcd5ad8e2ce4064ce2ecfeab2125d72d1/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8941aaadaf67246224cee8c3803777eed332a19d909b47e29c9842ef1e79ac26", size = 215574, upload-time = "2025-09-08T23:22:35.443Z" }, - { url = "https://files.pythonhosted.org/packages/44/64/58f6255b62b101093d5df22dcb752596066c7e89dd725e0afaed242a61be/cffi-2.0.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:a05d0c237b3349096d3981b727493e22147f934b20f6f125a3eba8f994bec4a9", size = 218971, upload-time = "2025-09-08T23:22:36.805Z" }, - { url = "https://files.pythonhosted.org/packages/ab/49/fa72cebe2fd8a55fbe14956f9970fe8eb1ac59e5df042f603ef7c8ba0adc/cffi-2.0.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:94698a9c5f91f9d138526b48fe26a199609544591f859c870d477351dc7b2414", size = 211972, upload-time = "2025-09-08T23:22:38.436Z" }, - { url = "https://files.pythonhosted.org/packages/0b/28/dd0967a76aab36731b6ebfe64dec4e981aff7e0608f60c2d46b46982607d/cffi-2.0.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:5fed36fccc0612a53f1d4d9a816b50a36702c28a2aa880cb8a122b3466638743", size = 217078, upload-time = "2025-09-08T23:22:39.776Z" }, - { url = "https://files.pythonhosted.org/packages/2b/c0/015b25184413d7ab0a410775fdb4a50fca20f5589b5dab1dbbfa3baad8ce/cffi-2.0.0-cp311-cp311-win32.whl", hash = "sha256:c649e3a33450ec82378822b3dad03cc228b8f5963c0c12fc3b1e0ab940f768a5", size = 172076, upload-time = "2025-09-08T23:22:40.95Z" }, - { url = "https://files.pythonhosted.org/packages/ae/8f/dc5531155e7070361eb1b7e4c1a9d896d0cb21c49f807a6c03fd63fc877e/cffi-2.0.0-cp311-cp311-win_amd64.whl", hash = "sha256:66f011380d0e49ed280c789fbd08ff0d40968ee7b665575489afa95c98196ab5", size = 182820, upload-time = "2025-09-08T23:22:42.463Z" }, - { url = "https://files.pythonhosted.org/packages/95/5c/1b493356429f9aecfd56bc171285a4c4ac8697f76e9bbbbb105e537853a1/cffi-2.0.0-cp311-cp311-win_arm64.whl", hash = "sha256:c6638687455baf640e37344fe26d37c404db8b80d037c3d29f58fe8d1c3b194d", size = 177635, upload-time = "2025-09-08T23:22:43.623Z" }, - { url = "https://files.pythonhosted.org/packages/ea/47/4f61023ea636104d4f16ab488e268b93008c3d0bb76893b1b31db1f96802/cffi-2.0.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6d02d6655b0e54f54c4ef0b94eb6be0607b70853c45ce98bd278dc7de718be5d", size = 185271, upload-time = "2025-09-08T23:22:44.795Z" }, - { url = 
"https://files.pythonhosted.org/packages/df/a2/781b623f57358e360d62cdd7a8c681f074a71d445418a776eef0aadb4ab4/cffi-2.0.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:8eca2a813c1cb7ad4fb74d368c2ffbbb4789d377ee5bb8df98373c2cc0dee76c", size = 181048, upload-time = "2025-09-08T23:22:45.938Z" }, - { url = "https://files.pythonhosted.org/packages/ff/df/a4f0fbd47331ceeba3d37c2e51e9dfc9722498becbeec2bd8bc856c9538a/cffi-2.0.0-cp312-cp312-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:21d1152871b019407d8ac3985f6775c079416c282e431a4da6afe7aefd2bccbe", size = 212529, upload-time = "2025-09-08T23:22:47.349Z" }, - { url = "https://files.pythonhosted.org/packages/d5/72/12b5f8d3865bf0f87cf1404d8c374e7487dcf097a1c91c436e72e6badd83/cffi-2.0.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b21e08af67b8a103c71a250401c78d5e0893beff75e28c53c98f4de42f774062", size = 220097, upload-time = "2025-09-08T23:22:48.677Z" }, - { url = "https://files.pythonhosted.org/packages/c2/95/7a135d52a50dfa7c882ab0ac17e8dc11cec9d55d2c18dda414c051c5e69e/cffi-2.0.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:1e3a615586f05fc4065a8b22b8152f0c1b00cdbc60596d187c2a74f9e3036e4e", size = 207983, upload-time = "2025-09-08T23:22:50.06Z" }, - { url = "https://files.pythonhosted.org/packages/3a/c8/15cb9ada8895957ea171c62dc78ff3e99159ee7adb13c0123c001a2546c1/cffi-2.0.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:81afed14892743bbe14dacb9e36d9e0e504cd204e0b165062c488942b9718037", size = 206519, upload-time = "2025-09-08T23:22:51.364Z" }, - { url = "https://files.pythonhosted.org/packages/78/2d/7fa73dfa841b5ac06c7b8855cfc18622132e365f5b81d02230333ff26e9e/cffi-2.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3e17ed538242334bf70832644a32a7aae3d83b57567f9fd60a26257e992b79ba", size = 219572, upload-time = "2025-09-08T23:22:52.902Z" }, - { url = "https://files.pythonhosted.org/packages/07/e0/267e57e387b4ca276b90f0434ff88b2c2241ad72b16d31836adddfd6031b/cffi-2.0.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:3925dd22fa2b7699ed2617149842d2e6adde22b262fcbfada50e3d195e4b3a94", size = 222963, upload-time = "2025-09-08T23:22:54.518Z" }, - { url = "https://files.pythonhosted.org/packages/b6/75/1f2747525e06f53efbd878f4d03bac5b859cbc11c633d0fb81432d98a795/cffi-2.0.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:2c8f814d84194c9ea681642fd164267891702542f028a15fc97d4674b6206187", size = 221361, upload-time = "2025-09-08T23:22:55.867Z" }, - { url = "https://files.pythonhosted.org/packages/7b/2b/2b6435f76bfeb6bbf055596976da087377ede68df465419d192acf00c437/cffi-2.0.0-cp312-cp312-win32.whl", hash = "sha256:da902562c3e9c550df360bfa53c035b2f241fed6d9aef119048073680ace4a18", size = 172932, upload-time = "2025-09-08T23:22:57.188Z" }, - { url = "https://files.pythonhosted.org/packages/f8/ed/13bd4418627013bec4ed6e54283b1959cf6db888048c7cf4b4c3b5b36002/cffi-2.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:da68248800ad6320861f129cd9c1bf96ca849a2771a59e0344e88681905916f5", size = 183557, upload-time = "2025-09-08T23:22:58.351Z" }, - { url = "https://files.pythonhosted.org/packages/95/31/9f7f93ad2f8eff1dbc1c3656d7ca5bfd8fb52c9d786b4dcf19b2d02217fa/cffi-2.0.0-cp312-cp312-win_arm64.whl", hash = "sha256:4671d9dd5ec934cb9a73e7ee9676f9362aba54f7f34910956b84d727b0d73fb6", size = 177762, upload-time = "2025-09-08T23:22:59.668Z" }, - { url = 
"https://files.pythonhosted.org/packages/4b/8d/a0a47a0c9e413a658623d014e91e74a50cdd2c423f7ccfd44086ef767f90/cffi-2.0.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:00bdf7acc5f795150faa6957054fbbca2439db2f775ce831222b66f192f03beb", size = 185230, upload-time = "2025-09-08T23:23:00.879Z" }, - { url = "https://files.pythonhosted.org/packages/4a/d2/a6c0296814556c68ee32009d9c2ad4f85f2707cdecfd7727951ec228005d/cffi-2.0.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:45d5e886156860dc35862657e1494b9bae8dfa63bf56796f2fb56e1679fc0bca", size = 181043, upload-time = "2025-09-08T23:23:02.231Z" }, - { url = "https://files.pythonhosted.org/packages/b0/1e/d22cc63332bd59b06481ceaac49d6c507598642e2230f201649058a7e704/cffi-2.0.0-cp313-cp313-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:07b271772c100085dd28b74fa0cd81c8fb1a3ba18b21e03d7c27f3436a10606b", size = 212446, upload-time = "2025-09-08T23:23:03.472Z" }, - { url = "https://files.pythonhosted.org/packages/a9/f5/a2c23eb03b61a0b8747f211eb716446c826ad66818ddc7810cc2cc19b3f2/cffi-2.0.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d48a880098c96020b02d5a1f7d9251308510ce8858940e6fa99ece33f610838b", size = 220101, upload-time = "2025-09-08T23:23:04.792Z" }, - { url = "https://files.pythonhosted.org/packages/f2/7f/e6647792fc5850d634695bc0e6ab4111ae88e89981d35ac269956605feba/cffi-2.0.0-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:f93fd8e5c8c0a4aa1f424d6173f14a892044054871c771f8566e4008eaa359d2", size = 207948, upload-time = "2025-09-08T23:23:06.127Z" }, - { url = "https://files.pythonhosted.org/packages/cb/1e/a5a1bd6f1fb30f22573f76533de12a00bf274abcdc55c8edab639078abb6/cffi-2.0.0-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:dd4f05f54a52fb558f1ba9f528228066954fee3ebe629fc1660d874d040ae5a3", size = 206422, upload-time = "2025-09-08T23:23:07.753Z" }, - { url = "https://files.pythonhosted.org/packages/98/df/0a1755e750013a2081e863e7cd37e0cdd02664372c754e5560099eb7aa44/cffi-2.0.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:c8d3b5532fc71b7a77c09192b4a5a200ea992702734a2e9279a37f2478236f26", size = 219499, upload-time = "2025-09-08T23:23:09.648Z" }, - { url = "https://files.pythonhosted.org/packages/50/e1/a969e687fcf9ea58e6e2a928ad5e2dd88cc12f6f0ab477e9971f2309b57c/cffi-2.0.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:d9b29c1f0ae438d5ee9acb31cadee00a58c46cc9c0b2f9038c6b0b3470877a8c", size = 222928, upload-time = "2025-09-08T23:23:10.928Z" }, - { url = "https://files.pythonhosted.org/packages/36/54/0362578dd2c9e557a28ac77698ed67323ed5b9775ca9d3fe73fe191bb5d8/cffi-2.0.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6d50360be4546678fc1b79ffe7a66265e28667840010348dd69a314145807a1b", size = 221302, upload-time = "2025-09-08T23:23:12.42Z" }, - { url = "https://files.pythonhosted.org/packages/eb/6d/bf9bda840d5f1dfdbf0feca87fbdb64a918a69bca42cfa0ba7b137c48cb8/cffi-2.0.0-cp313-cp313-win32.whl", hash = "sha256:74a03b9698e198d47562765773b4a8309919089150a0bb17d829ad7b44b60d27", size = 172909, upload-time = "2025-09-08T23:23:14.32Z" }, - { url = "https://files.pythonhosted.org/packages/37/18/6519e1ee6f5a1e579e04b9ddb6f1676c17368a7aba48299c3759bbc3c8b3/cffi-2.0.0-cp313-cp313-win_amd64.whl", hash = "sha256:19f705ada2530c1167abacb171925dd886168931e0a7b78f5bffcae5c6b5be75", size = 183402, upload-time = "2025-09-08T23:23:15.535Z" }, - { url = 
"https://files.pythonhosted.org/packages/cb/0e/02ceeec9a7d6ee63bb596121c2c8e9b3a9e150936f4fbef6ca1943e6137c/cffi-2.0.0-cp313-cp313-win_arm64.whl", hash = "sha256:256f80b80ca3853f90c21b23ee78cd008713787b1b1e93eae9f3d6a7134abd91", size = 177780, upload-time = "2025-09-08T23:23:16.761Z" }, - { url = "https://files.pythonhosted.org/packages/92/c4/3ce07396253a83250ee98564f8d7e9789fab8e58858f35d07a9a2c78de9f/cffi-2.0.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:fc33c5141b55ed366cfaad382df24fe7dcbc686de5be719b207bb248e3053dc5", size = 185320, upload-time = "2025-09-08T23:23:18.087Z" }, - { url = "https://files.pythonhosted.org/packages/59/dd/27e9fa567a23931c838c6b02d0764611c62290062a6d4e8ff7863daf9730/cffi-2.0.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c654de545946e0db659b3400168c9ad31b5d29593291482c43e3564effbcee13", size = 181487, upload-time = "2025-09-08T23:23:19.622Z" }, - { url = "https://files.pythonhosted.org/packages/d6/43/0e822876f87ea8a4ef95442c3d766a06a51fc5298823f884ef87aaad168c/cffi-2.0.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:24b6f81f1983e6df8db3adc38562c83f7d4a0c36162885ec7f7b77c7dcbec97b", size = 220049, upload-time = "2025-09-08T23:23:20.853Z" }, - { url = "https://files.pythonhosted.org/packages/b4/89/76799151d9c2d2d1ead63c2429da9ea9d7aac304603de0c6e8764e6e8e70/cffi-2.0.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:12873ca6cb9b0f0d3a0da705d6086fe911591737a59f28b7936bdfed27c0d47c", size = 207793, upload-time = "2025-09-08T23:23:22.08Z" }, - { url = "https://files.pythonhosted.org/packages/bb/dd/3465b14bb9e24ee24cb88c9e3730f6de63111fffe513492bf8c808a3547e/cffi-2.0.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:d9b97165e8aed9272a6bb17c01e3cc5871a594a446ebedc996e2397a1c1ea8ef", size = 206300, upload-time = "2025-09-08T23:23:23.314Z" }, - { url = "https://files.pythonhosted.org/packages/47/d9/d83e293854571c877a92da46fdec39158f8d7e68da75bf73581225d28e90/cffi-2.0.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:afb8db5439b81cf9c9d0c80404b60c3cc9c3add93e114dcae767f1477cb53775", size = 219244, upload-time = "2025-09-08T23:23:24.541Z" }, - { url = "https://files.pythonhosted.org/packages/2b/0f/1f177e3683aead2bb00f7679a16451d302c436b5cbf2505f0ea8146ef59e/cffi-2.0.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:737fe7d37e1a1bffe70bd5754ea763a62a066dc5913ca57e957824b72a85e205", size = 222828, upload-time = "2025-09-08T23:23:26.143Z" }, - { url = "https://files.pythonhosted.org/packages/c6/0f/cafacebd4b040e3119dcb32fed8bdef8dfe94da653155f9d0b9dc660166e/cffi-2.0.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:38100abb9d1b1435bc4cc340bb4489635dc2f0da7456590877030c9b3d40b0c1", size = 220926, upload-time = "2025-09-08T23:23:27.873Z" }, - { url = "https://files.pythonhosted.org/packages/3e/aa/df335faa45b395396fcbc03de2dfcab242cd61a9900e914fe682a59170b1/cffi-2.0.0-cp314-cp314-win32.whl", hash = "sha256:087067fa8953339c723661eda6b54bc98c5625757ea62e95eb4898ad5e776e9f", size = 175328, upload-time = "2025-09-08T23:23:44.61Z" }, - { url = "https://files.pythonhosted.org/packages/bb/92/882c2d30831744296ce713f0feb4c1cd30f346ef747b530b5318715cc367/cffi-2.0.0-cp314-cp314-win_amd64.whl", hash = "sha256:203a48d1fb583fc7d78a4c6655692963b860a417c0528492a6bc21f1aaefab25", size = 185650, upload-time = "2025-09-08T23:23:45.848Z" }, - { url = 
"https://files.pythonhosted.org/packages/9f/2c/98ece204b9d35a7366b5b2c6539c350313ca13932143e79dc133ba757104/cffi-2.0.0-cp314-cp314-win_arm64.whl", hash = "sha256:dbd5c7a25a7cb98f5ca55d258b103a2054f859a46ae11aaf23134f9cc0d356ad", size = 180687, upload-time = "2025-09-08T23:23:47.105Z" }, - { url = "https://files.pythonhosted.org/packages/3e/61/c768e4d548bfa607abcda77423448df8c471f25dbe64fb2ef6d555eae006/cffi-2.0.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:9a67fc9e8eb39039280526379fb3a70023d77caec1852002b4da7e8b270c4dd9", size = 188773, upload-time = "2025-09-08T23:23:29.347Z" }, - { url = "https://files.pythonhosted.org/packages/2c/ea/5f76bce7cf6fcd0ab1a1058b5af899bfbef198bea4d5686da88471ea0336/cffi-2.0.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:7a66c7204d8869299919db4d5069a82f1561581af12b11b3c9f48c584eb8743d", size = 185013, upload-time = "2025-09-08T23:23:30.63Z" }, - { url = "https://files.pythonhosted.org/packages/be/b4/c56878d0d1755cf9caa54ba71e5d049479c52f9e4afc230f06822162ab2f/cffi-2.0.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7cc09976e8b56f8cebd752f7113ad07752461f48a58cbba644139015ac24954c", size = 221593, upload-time = "2025-09-08T23:23:31.91Z" }, - { url = "https://files.pythonhosted.org/packages/e0/0d/eb704606dfe8033e7128df5e90fee946bbcb64a04fcdaa97321309004000/cffi-2.0.0-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:92b68146a71df78564e4ef48af17551a5ddd142e5190cdf2c5624d0c3ff5b2e8", size = 209354, upload-time = "2025-09-08T23:23:33.214Z" }, - { url = "https://files.pythonhosted.org/packages/d8/19/3c435d727b368ca475fb8742ab97c9cb13a0de600ce86f62eab7fa3eea60/cffi-2.0.0-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:b1e74d11748e7e98e2f426ab176d4ed720a64412b6a15054378afdb71e0f37dc", size = 208480, upload-time = "2025-09-08T23:23:34.495Z" }, - { url = "https://files.pythonhosted.org/packages/d0/44/681604464ed9541673e486521497406fadcc15b5217c3e326b061696899a/cffi-2.0.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:28a3a209b96630bca57cce802da70c266eb08c6e97e5afd61a75611ee6c64592", size = 221584, upload-time = "2025-09-08T23:23:36.096Z" }, - { url = "https://files.pythonhosted.org/packages/25/8e/342a504ff018a2825d395d44d63a767dd8ebc927ebda557fecdaca3ac33a/cffi-2.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:7553fb2090d71822f02c629afe6042c299edf91ba1bf94951165613553984512", size = 224443, upload-time = "2025-09-08T23:23:37.328Z" }, - { url = "https://files.pythonhosted.org/packages/e1/5e/b666bacbbc60fbf415ba9988324a132c9a7a0448a9a8f125074671c0f2c3/cffi-2.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6c6c373cfc5c83a975506110d17457138c8c63016b563cc9ed6e056a82f13ce4", size = 223437, upload-time = "2025-09-08T23:23:38.945Z" }, - { url = "https://files.pythonhosted.org/packages/a0/1d/ec1a60bd1a10daa292d3cd6bb0b359a81607154fb8165f3ec95fe003b85c/cffi-2.0.0-cp314-cp314t-win32.whl", hash = "sha256:1fc9ea04857caf665289b7a75923f2c6ed559b8298a1b8c49e59f7dd95c8481e", size = 180487, upload-time = "2025-09-08T23:23:40.423Z" }, - { url = "https://files.pythonhosted.org/packages/bf/41/4c1168c74fac325c0c8156f04b6749c8b6a8f405bbf91413ba088359f60d/cffi-2.0.0-cp314-cp314t-win_amd64.whl", hash = "sha256:d68b6cef7827e8641e8ef16f4494edda8b36104d79773a334beaa1e3521430f6", size = 191726, upload-time = "2025-09-08T23:23:41.742Z" }, - { url = 
"https://files.pythonhosted.org/packages/ae/3a/dbeec9d1ee0844c679f6bb5d6ad4e9f198b1224f4e7a32825f47f6192b0c/cffi-2.0.0-cp314-cp314t-win_arm64.whl", hash = "sha256:0a1527a803f0a659de1af2e1fd700213caba79377e27e4693648c2923da066f9", size = 184195, upload-time = "2025-09-08T23:23:43.004Z" }, -] - -[[package]] -name = "click" -version = "8.3.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "colorama", marker = "sys_platform == 'win32'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/3d/fa/656b739db8587d7b5dfa22e22ed02566950fbfbcdc20311993483657a5c0/click-8.3.1.tar.gz", hash = "sha256:12ff4785d337a1bb490bb7e9c2b1ee5da3112e94a8622f26a6c77f5d2fc6842a", size = 295065, upload-time = "2025-11-15T20:45:42.706Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/98/78/01c019cdb5d6498122777c1a43056ebb3ebfeef2076d9d026bfe15583b2b/click-8.3.1-py3-none-any.whl", hash = "sha256:981153a64e25f12d547d3426c367a4857371575ee7ad18df2a6183ab0545b2a6", size = 108274, upload-time = "2025-11-15T20:45:41.139Z" }, -] - -[[package]] -name = "code-index-mcp" -version = "0.1.0" -source = { virtual = "." } -dependencies = [ - { name = "httpx" }, - { name = "mcp" }, - { name = "pyjwt" }, - { name = "pyyaml" }, -] - -[package.metadata] -requires-dist = [ - { name = "httpx", specifier = ">=0.27" }, - { name = "mcp", specifier = ">=1.7" }, - { name = "pyjwt", specifier = ">=2.12.0" }, - { name = "pyyaml", specifier = ">=6.0" }, -] - -[[package]] -name = "colorama" -version = "0.4.6" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" }, -] - -[[package]] -name = "cryptography" -version = "46.0.5" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "cffi", marker = "platform_python_implementation != 'PyPy'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/60/04/ee2a9e8542e4fa2773b81771ff8349ff19cdd56b7258a0cc442639052edb/cryptography-46.0.5.tar.gz", hash = "sha256:abace499247268e3757271b2f1e244b36b06f8515cf27c4d49468fc9eb16e93d", size = 750064, upload-time = "2026-02-10T19:18:38.255Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/f7/81/b0bb27f2ba931a65409c6b8a8b358a7f03c0e46eceacddff55f7c84b1f3b/cryptography-46.0.5-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:351695ada9ea9618b3500b490ad54c739860883df6c1f555e088eaf25b1bbaad", size = 7176289, upload-time = "2026-02-10T19:17:08.274Z" }, - { url = "https://files.pythonhosted.org/packages/ff/9e/6b4397a3e3d15123de3b1806ef342522393d50736c13b20ec4c9ea6693a6/cryptography-46.0.5-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c18ff11e86df2e28854939acde2d003f7984f721eba450b56a200ad90eeb0e6b", size = 4275637, upload-time = "2026-02-10T19:17:10.53Z" }, - { url = "https://files.pythonhosted.org/packages/63/e7/471ab61099a3920b0c77852ea3f0ea611c9702f651600397ac567848b897/cryptography-46.0.5-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", 
hash = "sha256:4d7e3d356b8cd4ea5aff04f129d5f66ebdc7b6f8eae802b93739ed520c47c79b", size = 4424742, upload-time = "2026-02-10T19:17:12.388Z" }, - { url = "https://files.pythonhosted.org/packages/37/53/a18500f270342d66bf7e4d9f091114e31e5ee9e7375a5aba2e85a91e0044/cryptography-46.0.5-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:50bfb6925eff619c9c023b967d5b77a54e04256c4281b0e21336a130cd7fc263", size = 4277528, upload-time = "2026-02-10T19:17:13.853Z" }, - { url = "https://files.pythonhosted.org/packages/22/29/c2e812ebc38c57b40e7c583895e73c8c5adb4d1e4a0cc4c5a4fdab2b1acc/cryptography-46.0.5-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:803812e111e75d1aa73690d2facc295eaefd4439be1023fefc4995eaea2af90d", size = 4947993, upload-time = "2026-02-10T19:17:15.618Z" }, - { url = "https://files.pythonhosted.org/packages/6b/e7/237155ae19a9023de7e30ec64e5d99a9431a567407ac21170a046d22a5a3/cryptography-46.0.5-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ee190460e2fbe447175cda91b88b84ae8322a104fc27766ad09428754a618ed", size = 4456855, upload-time = "2026-02-10T19:17:17.221Z" }, - { url = "https://files.pythonhosted.org/packages/2d/87/fc628a7ad85b81206738abbd213b07702bcbdada1dd43f72236ef3cffbb5/cryptography-46.0.5-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:f145bba11b878005c496e93e257c1e88f154d278d2638e6450d17e0f31e558d2", size = 3984635, upload-time = "2026-02-10T19:17:18.792Z" }, - { url = "https://files.pythonhosted.org/packages/84/29/65b55622bde135aedf4565dc509d99b560ee4095e56989e815f8fd2aa910/cryptography-46.0.5-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e9251e3be159d1020c4030bd2e5f84d6a43fe54b6c19c12f51cde9542a2817b2", size = 4277038, upload-time = "2026-02-10T19:17:20.256Z" }, - { url = "https://files.pythonhosted.org/packages/bc/36/45e76c68d7311432741faf1fbf7fac8a196a0a735ca21f504c75d37e2558/cryptography-46.0.5-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:47fb8a66058b80e509c47118ef8a75d14c455e81ac369050f20ba0d23e77fee0", size = 4912181, upload-time = "2026-02-10T19:17:21.825Z" }, - { url = "https://files.pythonhosted.org/packages/6d/1a/c1ba8fead184d6e3d5afcf03d569acac5ad063f3ac9fb7258af158f7e378/cryptography-46.0.5-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4c3341037c136030cb46e4b1e17b7418ea4cbd9dd207e4a6f3b2b24e0d4ac731", size = 4456482, upload-time = "2026-02-10T19:17:25.133Z" }, - { url = "https://files.pythonhosted.org/packages/f9/e5/3fb22e37f66827ced3b902cf895e6a6bc1d095b5b26be26bd13c441fdf19/cryptography-46.0.5-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:890bcb4abd5a2d3f852196437129eb3667d62630333aacc13dfd470fad3aaa82", size = 4405497, upload-time = "2026-02-10T19:17:26.66Z" }, - { url = "https://files.pythonhosted.org/packages/1a/df/9d58bb32b1121a8a2f27383fabae4d63080c7ca60b9b5c88be742be04ee7/cryptography-46.0.5-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80a8d7bfdf38f87ca30a5391c0c9ce4ed2926918e017c29ddf643d0ed2778ea1", size = 4667819, upload-time = "2026-02-10T19:17:28.569Z" }, - { url = "https://files.pythonhosted.org/packages/ea/ed/325d2a490c5e94038cdb0117da9397ece1f11201f425c4e9c57fe5b9f08b/cryptography-46.0.5-cp311-abi3-win32.whl", hash = "sha256:60ee7e19e95104d4c03871d7d7dfb3d22ef8a9b9c6778c94e1c8fcc8365afd48", size = 3028230, upload-time = "2026-02-10T19:17:30.518Z" }, - { url = "https://files.pythonhosted.org/packages/e9/5a/ac0f49e48063ab4255d9e3b79f5def51697fce1a95ea1370f03dc9db76f6/cryptography-46.0.5-cp311-abi3-win_amd64.whl", hash = 
"sha256:38946c54b16c885c72c4f59846be9743d699eee2b69b6988e0a00a01f46a61a4", size = 3480909, upload-time = "2026-02-10T19:17:32.083Z" }, - { url = "https://files.pythonhosted.org/packages/00/13/3d278bfa7a15a96b9dc22db5a12ad1e48a9eb3d40e1827ef66a5df75d0d0/cryptography-46.0.5-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:94a76daa32eb78d61339aff7952ea819b1734b46f73646a07decb40e5b3448e2", size = 7119287, upload-time = "2026-02-10T19:17:33.801Z" }, - { url = "https://files.pythonhosted.org/packages/67/c8/581a6702e14f0898a0848105cbefd20c058099e2c2d22ef4e476dfec75d7/cryptography-46.0.5-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5be7bf2fb40769e05739dd0046e7b26f9d4670badc7b032d6ce4db64dddc0678", size = 4265728, upload-time = "2026-02-10T19:17:35.569Z" }, - { url = "https://files.pythonhosted.org/packages/dd/4a/ba1a65ce8fc65435e5a849558379896c957870dd64fecea97b1ad5f46a37/cryptography-46.0.5-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fe346b143ff9685e40192a4960938545c699054ba11d4f9029f94751e3f71d87", size = 4408287, upload-time = "2026-02-10T19:17:36.938Z" }, - { url = "https://files.pythonhosted.org/packages/f8/67/8ffdbf7b65ed1ac224d1c2df3943553766914a8ca718747ee3871da6107e/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:c69fd885df7d089548a42d5ec05be26050ebcd2283d89b3d30676eb32ff87dee", size = 4270291, upload-time = "2026-02-10T19:17:38.748Z" }, - { url = "https://files.pythonhosted.org/packages/f8/e5/f52377ee93bc2f2bba55a41a886fd208c15276ffbd2569f2ddc89d50e2c5/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:8293f3dea7fc929ef7240796ba231413afa7b68ce38fd21da2995549f5961981", size = 4927539, upload-time = "2026-02-10T19:17:40.241Z" }, - { url = "https://files.pythonhosted.org/packages/3b/02/cfe39181b02419bbbbcf3abdd16c1c5c8541f03ca8bda240debc467d5a12/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:1abfdb89b41c3be0365328a410baa9df3ff8a9110fb75e7b52e66803ddabc9a9", size = 4442199, upload-time = "2026-02-10T19:17:41.789Z" }, - { url = "https://files.pythonhosted.org/packages/c0/96/2fcaeb4873e536cf71421a388a6c11b5bc846e986b2b069c79363dc1648e/cryptography-46.0.5-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:d66e421495fdb797610a08f43b05269e0a5ea7f5e652a89bfd5a7d3c1dee3648", size = 3960131, upload-time = "2026-02-10T19:17:43.379Z" }, - { url = "https://files.pythonhosted.org/packages/d8/d2/b27631f401ddd644e94c5cf33c9a4069f72011821cf3dc7309546b0642a0/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:4e817a8920bfbcff8940ecfd60f23d01836408242b30f1a708d93198393a80b4", size = 4270072, upload-time = "2026-02-10T19:17:45.481Z" }, - { url = "https://files.pythonhosted.org/packages/f4/a7/60d32b0370dae0b4ebe55ffa10e8599a2a59935b5ece1b9f06edb73abdeb/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:68f68d13f2e1cb95163fa3b4db4bf9a159a418f5f6e7242564fc75fcae667fd0", size = 4892170, upload-time = "2026-02-10T19:17:46.997Z" }, - { url = "https://files.pythonhosted.org/packages/d2/b9/cf73ddf8ef1164330eb0b199a589103c363afa0cf794218c24d524a58eab/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:a3d1fae9863299076f05cb8a778c467578262fae09f9dc0ee9b12eb4268ce663", size = 4441741, upload-time = "2026-02-10T19:17:48.661Z" }, - { url = 
"https://files.pythonhosted.org/packages/5f/eb/eee00b28c84c726fe8fa0158c65afe312d9c3b78d9d01daf700f1f6e37ff/cryptography-46.0.5-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c4143987a42a2397f2fc3b4d7e3a7d313fbe684f67ff443999e803dd75a76826", size = 4396728, upload-time = "2026-02-10T19:17:50.058Z" }, - { url = "https://files.pythonhosted.org/packages/65/f4/6bc1a9ed5aef7145045114b75b77c2a8261b4d38717bd8dea111a63c3442/cryptography-46.0.5-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7d731d4b107030987fd61a7f8ab512b25b53cef8f233a97379ede116f30eb67d", size = 4652001, upload-time = "2026-02-10T19:17:51.54Z" }, - { url = "https://files.pythonhosted.org/packages/86/ef/5d00ef966ddd71ac2e6951d278884a84a40ffbd88948ef0e294b214ae9e4/cryptography-46.0.5-cp314-cp314t-win32.whl", hash = "sha256:c3bcce8521d785d510b2aad26ae2c966092b7daa8f45dd8f44734a104dc0bc1a", size = 3003637, upload-time = "2026-02-10T19:17:52.997Z" }, - { url = "https://files.pythonhosted.org/packages/b7/57/f3f4160123da6d098db78350fdfd9705057aad21de7388eacb2401dceab9/cryptography-46.0.5-cp314-cp314t-win_amd64.whl", hash = "sha256:4d8ae8659ab18c65ced284993c2265910f6c9e650189d4e3f68445ef82a810e4", size = 3469487, upload-time = "2026-02-10T19:17:54.549Z" }, - { url = "https://files.pythonhosted.org/packages/e2/fa/a66aa722105ad6a458bebd64086ca2b72cdd361fed31763d20390f6f1389/cryptography-46.0.5-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:4108d4c09fbbf2789d0c926eb4152ae1760d5a2d97612b92d508d96c861e4d31", size = 7170514, upload-time = "2026-02-10T19:17:56.267Z" }, - { url = "https://files.pythonhosted.org/packages/0f/04/c85bdeab78c8bc77b701bf0d9bdcf514c044e18a46dcff330df5448631b0/cryptography-46.0.5-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7d1f30a86d2757199cb2d56e48cce14deddf1f9c95f1ef1b64ee91ea43fe2e18", size = 4275349, upload-time = "2026-02-10T19:17:58.419Z" }, - { url = "https://files.pythonhosted.org/packages/5c/32/9b87132a2f91ee7f5223b091dc963055503e9b442c98fc0b8a5ca765fab0/cryptography-46.0.5-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:039917b0dc418bb9f6edce8a906572d69e74bd330b0b3fea4f79dab7f8ddd235", size = 4420667, upload-time = "2026-02-10T19:18:00.619Z" }, - { url = "https://files.pythonhosted.org/packages/a1/a6/a7cb7010bec4b7c5692ca6f024150371b295ee1c108bdc1c400e4c44562b/cryptography-46.0.5-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:ba2a27ff02f48193fc4daeadf8ad2590516fa3d0adeeb34336b96f7fa64c1e3a", size = 4276980, upload-time = "2026-02-10T19:18:02.379Z" }, - { url = "https://files.pythonhosted.org/packages/8e/7c/c4f45e0eeff9b91e3f12dbd0e165fcf2a38847288fcfd889deea99fb7b6d/cryptography-46.0.5-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:61aa400dce22cb001a98014f647dc21cda08f7915ceb95df0c9eaf84b4b6af76", size = 4939143, upload-time = "2026-02-10T19:18:03.964Z" }, - { url = "https://files.pythonhosted.org/packages/37/19/e1b8f964a834eddb44fa1b9a9976f4e414cbb7aa62809b6760c8803d22d1/cryptography-46.0.5-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ce58ba46e1bc2aac4f7d9290223cead56743fa6ab94a5d53292ffaac6a91614", size = 4453674, upload-time = "2026-02-10T19:18:05.588Z" }, - { url = "https://files.pythonhosted.org/packages/db/ed/db15d3956f65264ca204625597c410d420e26530c4e2943e05a0d2f24d51/cryptography-46.0.5-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:420d0e909050490d04359e7fdb5ed7e667ca5c3c402b809ae2563d7e66a92229", size = 3978801, upload-time = "2026-02-10T19:18:07.167Z" }, - { url = 
"https://files.pythonhosted.org/packages/41/e2/df40a31d82df0a70a0daf69791f91dbb70e47644c58581d654879b382d11/cryptography-46.0.5-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:582f5fcd2afa31622f317f80426a027f30dc792e9c80ffee87b993200ea115f1", size = 4276755, upload-time = "2026-02-10T19:18:09.813Z" }, - { url = "https://files.pythonhosted.org/packages/33/45/726809d1176959f4a896b86907b98ff4391a8aa29c0aaaf9450a8a10630e/cryptography-46.0.5-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:bfd56bb4b37ed4f330b82402f6f435845a5f5648edf1ad497da51a8452d5d62d", size = 4901539, upload-time = "2026-02-10T19:18:11.263Z" }, - { url = "https://files.pythonhosted.org/packages/99/0f/a3076874e9c88ecb2ecc31382f6e7c21b428ede6f55aafa1aa272613e3cd/cryptography-46.0.5-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:a3d507bb6a513ca96ba84443226af944b0f7f47dcc9a399d110cd6146481d24c", size = 4452794, upload-time = "2026-02-10T19:18:12.914Z" }, - { url = "https://files.pythonhosted.org/packages/02/ef/ffeb542d3683d24194a38f66ca17c0a4b8bf10631feef44a7ef64e631b1a/cryptography-46.0.5-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9f16fbdf4da055efb21c22d81b89f155f02ba420558db21288b3d0035bafd5f4", size = 4404160, upload-time = "2026-02-10T19:18:14.375Z" }, - { url = "https://files.pythonhosted.org/packages/96/93/682d2b43c1d5f1406ed048f377c0fc9fc8f7b0447a478d5c65ab3d3a66eb/cryptography-46.0.5-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:ced80795227d70549a411a4ab66e8ce307899fad2220ce5ab2f296e687eacde9", size = 4667123, upload-time = "2026-02-10T19:18:15.886Z" }, - { url = "https://files.pythonhosted.org/packages/45/2d/9c5f2926cb5300a8eefc3f4f0b3f3df39db7f7ce40c8365444c49363cbda/cryptography-46.0.5-cp38-abi3-win32.whl", hash = "sha256:02f547fce831f5096c9a567fd41bc12ca8f11df260959ecc7c3202555cc47a72", size = 3010220, upload-time = "2026-02-10T19:18:17.361Z" }, - { url = "https://files.pythonhosted.org/packages/48/ef/0c2f4a8e31018a986949d34a01115dd057bf536905dca38897bacd21fac3/cryptography-46.0.5-cp38-abi3-win_amd64.whl", hash = "sha256:556e106ee01aa13484ce9b0239bca667be5004efb0aabbed28d353df86445595", size = 3467050, upload-time = "2026-02-10T19:18:18.899Z" }, - { url = "https://files.pythonhosted.org/packages/eb/dd/2d9fdb07cebdf3d51179730afb7d5e576153c6744c3ff8fded23030c204e/cryptography-46.0.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:3b4995dc971c9fb83c25aa44cf45f02ba86f71ee600d81091c2f0cbae116b06c", size = 3476964, upload-time = "2026-02-10T19:18:20.687Z" }, - { url = "https://files.pythonhosted.org/packages/e9/6f/6cc6cc9955caa6eaf83660b0da2b077c7fe8ff9950a3c5e45d605038d439/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:bc84e875994c3b445871ea7181d424588171efec3e185dced958dad9e001950a", size = 4218321, upload-time = "2026-02-10T19:18:22.349Z" }, - { url = "https://files.pythonhosted.org/packages/3e/5d/c4da701939eeee699566a6c1367427ab91a8b7088cc2328c09dbee940415/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:2ae6971afd6246710480e3f15824ed3029a60fc16991db250034efd0b9fb4356", size = 4381786, upload-time = "2026-02-10T19:18:24.529Z" }, - { url = "https://files.pythonhosted.org/packages/ac/97/a538654732974a94ff96c1db621fa464f455c02d4bb7d2652f4edc21d600/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:d861ee9e76ace6cf36a6a89b959ec08e7bc2493ee39d07ffe5acb23ef46d27da", size = 4217990, upload-time = "2026-02-10T19:18:25.957Z" }, - { url = 
"https://files.pythonhosted.org/packages/ae/11/7e500d2dd3ba891197b9efd2da5454b74336d64a7cc419aa7327ab74e5f6/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:2b7a67c9cd56372f3249b39699f2ad479f6991e62ea15800973b956f4b73e257", size = 4381252, upload-time = "2026-02-10T19:18:27.496Z" }, - { url = "https://files.pythonhosted.org/packages/bc/58/6b3d24e6b9bc474a2dcdee65dfd1f008867015408a271562e4b690561a4d/cryptography-46.0.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:8456928655f856c6e1533ff59d5be76578a7157224dbd9ce6872f25055ab9ab7", size = 3407605, upload-time = "2026-02-10T19:18:29.233Z" }, -] - -[[package]] -name = "h11" -version = "0.16.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/01/ee/02a2c011bdab74c6fb3c75474d40b3052059d95df7e73351460c8588d963/h11-0.16.0.tar.gz", hash = "sha256:4e35b956cf45792e4caa5885e69fba00bdbc6ffafbfa020300e549b208ee5ff1", size = 101250, upload-time = "2025-04-24T03:35:25.427Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" }, -] - -[[package]] -name = "httpcore" -version = "1.0.9" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "certifi" }, - { name = "h11" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/06/94/82699a10bca87a5556c9c59b5963f2d039dbd239f25bc2a63907a05a14cb/httpcore-1.0.9.tar.gz", hash = "sha256:6e34463af53fd2ab5d807f399a9b45ea31c3dfa2276f15a2c3f00afff6e176e8", size = 85484, upload-time = "2025-04-24T22:06:22.219Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" }, -] - -[[package]] -name = "httpx" -version = "0.28.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "anyio" }, - { name = "certifi" }, - { name = "httpcore" }, - { name = "idna" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/b1/df/48c586a5fe32a0f01324ee087459e112ebb7224f646c0b5023f5e79e9956/httpx-0.28.1.tar.gz", hash = "sha256:75e98c5f16b0f35b567856f597f06ff2270a374470a5c2392242528e3e3e42fc", size = 141406, upload-time = "2024-12-06T15:37:23.222Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" }, -] - -[[package]] -name = "httpx-sse" -version = "0.4.3" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/0f/4c/751061ffa58615a32c31b2d82e8482be8dd4a89154f003147acee90f2be9/httpx_sse-0.4.3.tar.gz", hash = "sha256:9b1ed0127459a66014aec3c56bebd93da3c1bc8bb6618c8082039a44889a755d", size = 15943, upload-time = "2025-10-10T21:48:22.271Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/d2/fd/6668e5aec43ab844de6fc74927e155a3b37bf40d7c3790e49fc0406b6578/httpx_sse-0.4.3-py3-none-any.whl", hash = "sha256:0ac1c9fe3c0afad2e0ebb25a934a59f4c7823b60792691f779fad2c5568830fc", size = 8960, 
upload-time = "2025-10-10T21:48:21.158Z" }, -] - -[[package]] -name = "idna" -version = "3.11" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" }, -] - -[[package]] -name = "jsonschema" -version = "4.26.0" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "attrs" }, - { name = "jsonschema-specifications" }, - { name = "referencing" }, - { name = "rpds-py" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/b3/fc/e067678238fa451312d4c62bf6e6cf5ec56375422aee02f9cb5f909b3047/jsonschema-4.26.0.tar.gz", hash = "sha256:0c26707e2efad8aa1bfc5b7ce170f3fccc2e4918ff85989ba9ffa9facb2be326", size = 366583, upload-time = "2026-01-07T13:41:07.246Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/69/90/f63fb5873511e014207a475e2bb4e8b2e570d655b00ac19a9a0ca0a385ee/jsonschema-4.26.0-py3-none-any.whl", hash = "sha256:d489f15263b8d200f8387e64b4c3a75f06629559fb73deb8fdfb525f2dab50ce", size = 90630, upload-time = "2026-01-07T13:41:05.306Z" }, -] - -[[package]] -name = "jsonschema-specifications" -version = "2025.9.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "referencing" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/19/74/a633ee74eb36c44aa6d1095e7cc5569bebf04342ee146178e2d36600708b/jsonschema_specifications-2025.9.1.tar.gz", hash = "sha256:b540987f239e745613c7a9176f3edb72b832a4ac465cf02712288397832b5e8d", size = 32855, upload-time = "2025-09-08T01:34:59.186Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/41/45/1a4ed80516f02155c51f51e8cedb3c1902296743db0bbc66608a0db2814f/jsonschema_specifications-2025.9.1-py3-none-any.whl", hash = "sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe", size = 18437, upload-time = "2025-09-08T01:34:57.871Z" }, -] - -[[package]] -name = "mcp" -version = "1.26.0" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "anyio" }, - { name = "httpx" }, - { name = "httpx-sse" }, - { name = "jsonschema" }, - { name = "pydantic" }, - { name = "pydantic-settings" }, - { name = "pyjwt", extra = ["crypto"] }, - { name = "python-multipart" }, - { name = "pywin32", marker = "sys_platform == 'win32'" }, - { name = "sse-starlette" }, - { name = "starlette" }, - { name = "typing-extensions" }, - { name = "typing-inspection" }, - { name = "uvicorn", marker = "sys_platform != 'emscripten'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/fc/6d/62e76bbb8144d6ed86e202b5edd8a4cb631e7c8130f3f4893c3f90262b10/mcp-1.26.0.tar.gz", hash = "sha256:db6e2ef491eecc1a0d93711a76f28dec2e05999f93afd48795da1c1137142c66", size = 608005, upload-time = "2026-01-24T19:40:32.468Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/fd/d9/eaa1f80170d2b7c5ba23f3b59f766f3a0bb41155fbc32a69adfa1adaaef9/mcp-1.26.0-py3-none-any.whl", hash = "sha256:904a21c33c25aa98ddbeb47273033c435e595bbacfdb177f4bd87f6dceebe1ca", size = 233615, upload-time = 
"2026-01-24T19:40:30.652Z" }, -] - -[[package]] -name = "pycparser" -version = "3.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/1b/7d/92392ff7815c21062bea51aa7b87d45576f649f16458d78b7cf94b9ab2e6/pycparser-3.0.tar.gz", hash = "sha256:600f49d217304a5902ac3c37e1281c9fe94e4d0489de643a9504c5cdfdfc6b29", size = 103492, upload-time = "2026-01-21T14:26:51.89Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/0c/c3/44f3fbbfa403ea2a7c779186dc20772604442dde72947e7d01069cbe98e3/pycparser-3.0-py3-none-any.whl", hash = "sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992", size = 48172, upload-time = "2026-01-21T14:26:50.693Z" }, -] - -[[package]] -name = "pydantic" -version = "2.12.5" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "annotated-types" }, - { name = "pydantic-core" }, - { name = "typing-extensions" }, - { name = "typing-inspection" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/69/44/36f1a6e523abc58ae5f928898e4aca2e0ea509b5aa6f6f392a5d882be928/pydantic-2.12.5.tar.gz", hash = "sha256:4d351024c75c0f085a9febbb665ce8c0c6ec5d30e903bdb6394b7ede26aebb49", size = 821591, upload-time = "2025-11-26T15:11:46.471Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/5a/87/b70ad306ebb6f9b585f114d0ac2137d792b48be34d732d60e597c2f8465a/pydantic-2.12.5-py3-none-any.whl", hash = "sha256:e561593fccf61e8a20fc46dfc2dfe075b8be7d0188df33f221ad1f0139180f9d", size = 463580, upload-time = "2025-11-26T15:11:44.605Z" }, -] - -[[package]] -name = "pydantic-core" -version = "2.41.5" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "typing-extensions" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/71/70/23b021c950c2addd24ec408e9ab05d59b035b39d97cdc1130e1bce647bb6/pydantic_core-2.41.5.tar.gz", hash = "sha256:08daa51ea16ad373ffd5e7606252cc32f07bc72b28284b6bc9c6df804816476e", size = 460952, upload-time = "2025-11-04T13:43:49.098Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/e8/72/74a989dd9f2084b3d9530b0915fdda64ac48831c30dbf7c72a41a5232db8/pydantic_core-2.41.5-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:a3a52f6156e73e7ccb0f8cced536adccb7042be67cb45f9562e12b319c119da6", size = 2105873, upload-time = "2025-11-04T13:39:31.373Z" }, - { url = "https://files.pythonhosted.org/packages/12/44/37e403fd9455708b3b942949e1d7febc02167662bf1a7da5b78ee1ea2842/pydantic_core-2.41.5-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7f3bf998340c6d4b0c9a2f02d6a400e51f123b59565d74dc60d252ce888c260b", size = 1899826, upload-time = "2025-11-04T13:39:32.897Z" }, - { url = "https://files.pythonhosted.org/packages/33/7f/1d5cab3ccf44c1935a359d51a8a2a9e1a654b744b5e7f80d41b88d501eec/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:378bec5c66998815d224c9ca994f1e14c0c21cb95d2f52b6021cc0b2a58f2a5a", size = 1917869, upload-time = "2025-11-04T13:39:34.469Z" }, - { url = "https://files.pythonhosted.org/packages/6e/6a/30d94a9674a7fe4f4744052ed6c5e083424510be1e93da5bc47569d11810/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e7b576130c69225432866fe2f4a469a85a54ade141d96fd396dffcf607b558f8", size = 2063890, upload-time = "2025-11-04T13:39:36.053Z" }, - { url = 
"https://files.pythonhosted.org/packages/50/be/76e5d46203fcb2750e542f32e6c371ffa9b8ad17364cf94bb0818dbfb50c/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6cb58b9c66f7e4179a2d5e0f849c48eff5c1fca560994d6eb6543abf955a149e", size = 2229740, upload-time = "2025-11-04T13:39:37.753Z" }, - { url = "https://files.pythonhosted.org/packages/d3/ee/fed784df0144793489f87db310a6bbf8118d7b630ed07aa180d6067e653a/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:88942d3a3dff3afc8288c21e565e476fc278902ae4d6d134f1eeda118cc830b1", size = 2350021, upload-time = "2025-11-04T13:39:40.94Z" }, - { url = "https://files.pythonhosted.org/packages/c8/be/8fed28dd0a180dca19e72c233cbf58efa36df055e5b9d90d64fd1740b828/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f31d95a179f8d64d90f6831d71fa93290893a33148d890ba15de25642c5d075b", size = 2066378, upload-time = "2025-11-04T13:39:42.523Z" }, - { url = "https://files.pythonhosted.org/packages/b0/3b/698cf8ae1d536a010e05121b4958b1257f0b5522085e335360e53a6b1c8b/pydantic_core-2.41.5-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:c1df3d34aced70add6f867a8cf413e299177e0c22660cc767218373d0779487b", size = 2175761, upload-time = "2025-11-04T13:39:44.553Z" }, - { url = "https://files.pythonhosted.org/packages/b8/ba/15d537423939553116dea94ce02f9c31be0fa9d0b806d427e0308ec17145/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:4009935984bd36bd2c774e13f9a09563ce8de4abaa7226f5108262fa3e637284", size = 2146303, upload-time = "2025-11-04T13:39:46.238Z" }, - { url = "https://files.pythonhosted.org/packages/58/7f/0de669bf37d206723795f9c90c82966726a2ab06c336deba4735b55af431/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:34a64bc3441dc1213096a20fe27e8e128bd3ff89921706e83c0b1ac971276594", size = 2340355, upload-time = "2025-11-04T13:39:48.002Z" }, - { url = "https://files.pythonhosted.org/packages/e5/de/e7482c435b83d7e3c3ee5ee4451f6e8973cff0eb6007d2872ce6383f6398/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:c9e19dd6e28fdcaa5a1de679aec4141f691023916427ef9bae8584f9c2fb3b0e", size = 2319875, upload-time = "2025-11-04T13:39:49.705Z" }, - { url = "https://files.pythonhosted.org/packages/fe/e6/8c9e81bb6dd7560e33b9053351c29f30c8194b72f2d6932888581f503482/pydantic_core-2.41.5-cp311-cp311-win32.whl", hash = "sha256:2c010c6ded393148374c0f6f0bf89d206bf3217f201faa0635dcd56bd1520f6b", size = 1987549, upload-time = "2025-11-04T13:39:51.842Z" }, - { url = "https://files.pythonhosted.org/packages/11/66/f14d1d978ea94d1bc21fc98fcf570f9542fe55bfcc40269d4e1a21c19bf7/pydantic_core-2.41.5-cp311-cp311-win_amd64.whl", hash = "sha256:76ee27c6e9c7f16f47db7a94157112a2f3a00e958bc626e2f4ee8bec5c328fbe", size = 2011305, upload-time = "2025-11-04T13:39:53.485Z" }, - { url = "https://files.pythonhosted.org/packages/56/d8/0e271434e8efd03186c5386671328154ee349ff0354d83c74f5caaf096ed/pydantic_core-2.41.5-cp311-cp311-win_arm64.whl", hash = "sha256:4bc36bbc0b7584de96561184ad7f012478987882ebf9f9c389b23f432ea3d90f", size = 1972902, upload-time = "2025-11-04T13:39:56.488Z" }, - { url = "https://files.pythonhosted.org/packages/5f/5d/5f6c63eebb5afee93bcaae4ce9a898f3373ca23df3ccaef086d0233a35a7/pydantic_core-2.41.5-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:f41a7489d32336dbf2199c8c0a215390a751c5b014c2c1c5366e817202e9cdf7", size = 2110990, upload-time = "2025-11-04T13:39:58.079Z" }, - 
{ url = "https://files.pythonhosted.org/packages/aa/32/9c2e8ccb57c01111e0fd091f236c7b371c1bccea0fa85247ac55b1e2b6b6/pydantic_core-2.41.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:070259a8818988b9a84a449a2a7337c7f430a22acc0859c6b110aa7212a6d9c0", size = 1896003, upload-time = "2025-11-04T13:39:59.956Z" }, - { url = "https://files.pythonhosted.org/packages/68/b8/a01b53cb0e59139fbc9e4fda3e9724ede8de279097179be4ff31f1abb65a/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e96cea19e34778f8d59fe40775a7a574d95816eb150850a85a7a4c8f4b94ac69", size = 1919200, upload-time = "2025-11-04T13:40:02.241Z" }, - { url = "https://files.pythonhosted.org/packages/38/de/8c36b5198a29bdaade07b5985e80a233a5ac27137846f3bc2d3b40a47360/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed2e99c456e3fadd05c991f8f437ef902e00eedf34320ba2b0842bd1c3ca3a75", size = 2052578, upload-time = "2025-11-04T13:40:04.401Z" }, - { url = "https://files.pythonhosted.org/packages/00/b5/0e8e4b5b081eac6cb3dbb7e60a65907549a1ce035a724368c330112adfdd/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:65840751b72fbfd82c3c640cff9284545342a4f1eb1586ad0636955b261b0b05", size = 2208504, upload-time = "2025-11-04T13:40:06.072Z" }, - { url = "https://files.pythonhosted.org/packages/77/56/87a61aad59c7c5b9dc8caad5a41a5545cba3810c3e828708b3d7404f6cef/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e536c98a7626a98feb2d3eaf75944ef6f3dbee447e1f841eae16f2f0a72d8ddc", size = 2335816, upload-time = "2025-11-04T13:40:07.835Z" }, - { url = "https://files.pythonhosted.org/packages/0d/76/941cc9f73529988688a665a5c0ecff1112b3d95ab48f81db5f7606f522d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eceb81a8d74f9267ef4081e246ffd6d129da5d87e37a77c9bde550cb04870c1c", size = 2075366, upload-time = "2025-11-04T13:40:09.804Z" }, - { url = "https://files.pythonhosted.org/packages/d3/43/ebef01f69baa07a482844faaa0a591bad1ef129253ffd0cdaa9d8a7f72d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d38548150c39b74aeeb0ce8ee1d8e82696f4a4e16ddc6de7b1d8823f7de4b9b5", size = 2171698, upload-time = "2025-11-04T13:40:12.004Z" }, - { url = "https://files.pythonhosted.org/packages/b1/87/41f3202e4193e3bacfc2c065fab7706ebe81af46a83d3e27605029c1f5a6/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:c23e27686783f60290e36827f9c626e63154b82b116d7fe9adba1fda36da706c", size = 2132603, upload-time = "2025-11-04T13:40:13.868Z" }, - { url = "https://files.pythonhosted.org/packages/49/7d/4c00df99cb12070b6bccdef4a195255e6020a550d572768d92cc54dba91a/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:482c982f814460eabe1d3bb0adfdc583387bd4691ef00b90575ca0d2b6fe2294", size = 2329591, upload-time = "2025-11-04T13:40:15.672Z" }, - { url = "https://files.pythonhosted.org/packages/cc/6a/ebf4b1d65d458f3cda6a7335d141305dfa19bdc61140a884d165a8a1bbc7/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:bfea2a5f0b4d8d43adf9d7b8bf019fb46fdd10a2e5cde477fbcb9d1fa08c68e1", size = 2319068, upload-time = "2025-11-04T13:40:17.532Z" }, - { url = "https://files.pythonhosted.org/packages/49/3b/774f2b5cd4192d5ab75870ce4381fd89cf218af999515baf07e7206753f0/pydantic_core-2.41.5-cp312-cp312-win32.whl", hash = 
"sha256:b74557b16e390ec12dca509bce9264c3bbd128f8a2c376eaa68003d7f327276d", size = 1985908, upload-time = "2025-11-04T13:40:19.309Z" }, - { url = "https://files.pythonhosted.org/packages/86/45/00173a033c801cacf67c190fef088789394feaf88a98a7035b0e40d53dc9/pydantic_core-2.41.5-cp312-cp312-win_amd64.whl", hash = "sha256:1962293292865bca8e54702b08a4f26da73adc83dd1fcf26fbc875b35d81c815", size = 2020145, upload-time = "2025-11-04T13:40:21.548Z" }, - { url = "https://files.pythonhosted.org/packages/f9/22/91fbc821fa6d261b376a3f73809f907cec5ca6025642c463d3488aad22fb/pydantic_core-2.41.5-cp312-cp312-win_arm64.whl", hash = "sha256:1746d4a3d9a794cacae06a5eaaccb4b8643a131d45fbc9af23e353dc0a5ba5c3", size = 1976179, upload-time = "2025-11-04T13:40:23.393Z" }, - { url = "https://files.pythonhosted.org/packages/87/06/8806241ff1f70d9939f9af039c6c35f2360cf16e93c2ca76f184e76b1564/pydantic_core-2.41.5-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:941103c9be18ac8daf7b7adca8228f8ed6bb7a1849020f643b3a14d15b1924d9", size = 2120403, upload-time = "2025-11-04T13:40:25.248Z" }, - { url = "https://files.pythonhosted.org/packages/94/02/abfa0e0bda67faa65fef1c84971c7e45928e108fe24333c81f3bfe35d5f5/pydantic_core-2.41.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:112e305c3314f40c93998e567879e887a3160bb8689ef3d2c04b6cc62c33ac34", size = 1896206, upload-time = "2025-11-04T13:40:27.099Z" }, - { url = "https://files.pythonhosted.org/packages/15/df/a4c740c0943e93e6500f9eb23f4ca7ec9bf71b19e608ae5b579678c8d02f/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbaad15cb0c90aa221d43c00e77bb33c93e8d36e0bf74760cd00e732d10a6a0", size = 1919307, upload-time = "2025-11-04T13:40:29.806Z" }, - { url = "https://files.pythonhosted.org/packages/9a/e3/6324802931ae1d123528988e0e86587c2072ac2e5394b4bc2bc34b61ff6e/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:03ca43e12fab6023fc79d28ca6b39b05f794ad08ec2feccc59a339b02f2b3d33", size = 2063258, upload-time = "2025-11-04T13:40:33.544Z" }, - { url = "https://files.pythonhosted.org/packages/c9/d4/2230d7151d4957dd79c3044ea26346c148c98fbf0ee6ebd41056f2d62ab5/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:dc799088c08fa04e43144b164feb0c13f9a0bc40503f8df3e9fde58a3c0c101e", size = 2214917, upload-time = "2025-11-04T13:40:35.479Z" }, - { url = "https://files.pythonhosted.org/packages/e6/9f/eaac5df17a3672fef0081b6c1bb0b82b33ee89aa5cec0d7b05f52fd4a1fa/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:97aeba56665b4c3235a0e52b2c2f5ae9cd071b8a8310ad27bddb3f7fb30e9aa2", size = 2332186, upload-time = "2025-11-04T13:40:37.436Z" }, - { url = "https://files.pythonhosted.org/packages/cf/4e/35a80cae583a37cf15604b44240e45c05e04e86f9cfd766623149297e971/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:406bf18d345822d6c21366031003612b9c77b3e29ffdb0f612367352aab7d586", size = 2073164, upload-time = "2025-11-04T13:40:40.289Z" }, - { url = "https://files.pythonhosted.org/packages/bf/e3/f6e262673c6140dd3305d144d032f7bd5f7497d3871c1428521f19f9efa2/pydantic_core-2.41.5-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b93590ae81f7010dbe380cdeab6f515902ebcbefe0b9327cc4804d74e93ae69d", size = 2179146, upload-time = "2025-11-04T13:40:42.809Z" }, - { url = 
"https://files.pythonhosted.org/packages/75/c7/20bd7fc05f0c6ea2056a4565c6f36f8968c0924f19b7d97bbfea55780e73/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:01a3d0ab748ee531f4ea6c3e48ad9dac84ddba4b0d82291f87248f2f9de8d740", size = 2137788, upload-time = "2025-11-04T13:40:44.752Z" }, - { url = "https://files.pythonhosted.org/packages/3a/8d/34318ef985c45196e004bc46c6eab2eda437e744c124ef0dbe1ff2c9d06b/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:6561e94ba9dacc9c61bce40e2d6bdc3bfaa0259d3ff36ace3b1e6901936d2e3e", size = 2340133, upload-time = "2025-11-04T13:40:46.66Z" }, - { url = "https://files.pythonhosted.org/packages/9c/59/013626bf8c78a5a5d9350d12e7697d3d4de951a75565496abd40ccd46bee/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:915c3d10f81bec3a74fbd4faebe8391013ba61e5a1a8d48c4455b923bdda7858", size = 2324852, upload-time = "2025-11-04T13:40:48.575Z" }, - { url = "https://files.pythonhosted.org/packages/1a/d9/c248c103856f807ef70c18a4f986693a46a8ffe1602e5d361485da502d20/pydantic_core-2.41.5-cp313-cp313-win32.whl", hash = "sha256:650ae77860b45cfa6e2cdafc42618ceafab3a2d9a3811fcfbd3bbf8ac3c40d36", size = 1994679, upload-time = "2025-11-04T13:40:50.619Z" }, - { url = "https://files.pythonhosted.org/packages/9e/8b/341991b158ddab181cff136acd2552c9f35bd30380422a639c0671e99a91/pydantic_core-2.41.5-cp313-cp313-win_amd64.whl", hash = "sha256:79ec52ec461e99e13791ec6508c722742ad745571f234ea6255bed38c6480f11", size = 2019766, upload-time = "2025-11-04T13:40:52.631Z" }, - { url = "https://files.pythonhosted.org/packages/73/7d/f2f9db34af103bea3e09735bb40b021788a5e834c81eedb541991badf8f5/pydantic_core-2.41.5-cp313-cp313-win_arm64.whl", hash = "sha256:3f84d5c1b4ab906093bdc1ff10484838aca54ef08de4afa9de0f5f14d69639cd", size = 1981005, upload-time = "2025-11-04T13:40:54.734Z" }, - { url = "https://files.pythonhosted.org/packages/ea/28/46b7c5c9635ae96ea0fbb779e271a38129df2550f763937659ee6c5dbc65/pydantic_core-2.41.5-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:3f37a19d7ebcdd20b96485056ba9e8b304e27d9904d233d7b1015db320e51f0a", size = 2119622, upload-time = "2025-11-04T13:40:56.68Z" }, - { url = "https://files.pythonhosted.org/packages/74/1a/145646e5687e8d9a1e8d09acb278c8535ebe9e972e1f162ed338a622f193/pydantic_core-2.41.5-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1d1d9764366c73f996edd17abb6d9d7649a7eb690006ab6adbda117717099b14", size = 1891725, upload-time = "2025-11-04T13:40:58.807Z" }, - { url = "https://files.pythonhosted.org/packages/23/04/e89c29e267b8060b40dca97bfc64a19b2a3cf99018167ea1677d96368273/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:25e1c2af0fce638d5f1988b686f3b3ea8cd7de5f244ca147c777769e798a9cd1", size = 1915040, upload-time = "2025-11-04T13:41:00.853Z" }, - { url = "https://files.pythonhosted.org/packages/84/a3/15a82ac7bd97992a82257f777b3583d3e84bdb06ba6858f745daa2ec8a85/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:506d766a8727beef16b7adaeb8ee6217c64fc813646b424d0804d67c16eddb66", size = 2063691, upload-time = "2025-11-04T13:41:03.504Z" }, - { url = "https://files.pythonhosted.org/packages/74/9b/0046701313c6ef08c0c1cf0e028c67c770a4e1275ca73131563c5f2a310a/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:4819fa52133c9aa3c387b3328f25c1facc356491e6135b459f1de698ff64d869", size = 2213897, upload-time = "2025-11-04T13:41:05.804Z" }, - { url = 
"https://files.pythonhosted.org/packages/8a/cd/6bac76ecd1b27e75a95ca3a9a559c643b3afcd2dd62086d4b7a32a18b169/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2b761d210c9ea91feda40d25b4efe82a1707da2ef62901466a42492c028553a2", size = 2333302, upload-time = "2025-11-04T13:41:07.809Z" }, - { url = "https://files.pythonhosted.org/packages/4c/d2/ef2074dc020dd6e109611a8be4449b98cd25e1b9b8a303c2f0fca2f2bcf7/pydantic_core-2.41.5-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:22f0fb8c1c583a3b6f24df2470833b40207e907b90c928cc8d3594b76f874375", size = 2064877, upload-time = "2025-11-04T13:41:09.827Z" }, - { url = "https://files.pythonhosted.org/packages/18/66/e9db17a9a763d72f03de903883c057b2592c09509ccfe468187f2a2eef29/pydantic_core-2.41.5-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2782c870e99878c634505236d81e5443092fba820f0373997ff75f90f68cd553", size = 2180680, upload-time = "2025-11-04T13:41:12.379Z" }, - { url = "https://files.pythonhosted.org/packages/d3/9e/3ce66cebb929f3ced22be85d4c2399b8e85b622db77dad36b73c5387f8f8/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:0177272f88ab8312479336e1d777f6b124537d47f2123f89cb37e0accea97f90", size = 2138960, upload-time = "2025-11-04T13:41:14.627Z" }, - { url = "https://files.pythonhosted.org/packages/a6/62/205a998f4327d2079326b01abee48e502ea739d174f0a89295c481a2272e/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:63510af5e38f8955b8ee5687740d6ebf7c2a0886d15a6d65c32814613681bc07", size = 2339102, upload-time = "2025-11-04T13:41:16.868Z" }, - { url = "https://files.pythonhosted.org/packages/3c/0d/f05e79471e889d74d3d88f5bd20d0ed189ad94c2423d81ff8d0000aab4ff/pydantic_core-2.41.5-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:e56ba91f47764cc14f1daacd723e3e82d1a89d783f0f5afe9c364b8bb491ccdb", size = 2326039, upload-time = "2025-11-04T13:41:18.934Z" }, - { url = "https://files.pythonhosted.org/packages/ec/e1/e08a6208bb100da7e0c4b288eed624a703f4d129bde2da475721a80cab32/pydantic_core-2.41.5-cp314-cp314-win32.whl", hash = "sha256:aec5cf2fd867b4ff45b9959f8b20ea3993fc93e63c7363fe6851424c8a7e7c23", size = 1995126, upload-time = "2025-11-04T13:41:21.418Z" }, - { url = "https://files.pythonhosted.org/packages/48/5d/56ba7b24e9557f99c9237e29f5c09913c81eeb2f3217e40e922353668092/pydantic_core-2.41.5-cp314-cp314-win_amd64.whl", hash = "sha256:8e7c86f27c585ef37c35e56a96363ab8de4e549a95512445b85c96d3e2f7c1bf", size = 2015489, upload-time = "2025-11-04T13:41:24.076Z" }, - { url = "https://files.pythonhosted.org/packages/4e/bb/f7a190991ec9e3e0ba22e4993d8755bbc4a32925c0b5b42775c03e8148f9/pydantic_core-2.41.5-cp314-cp314-win_arm64.whl", hash = "sha256:e672ba74fbc2dc8eea59fb6d4aed6845e6905fc2a8afe93175d94a83ba2a01a0", size = 1977288, upload-time = "2025-11-04T13:41:26.33Z" }, - { url = "https://files.pythonhosted.org/packages/92/ed/77542d0c51538e32e15afe7899d79efce4b81eee631d99850edc2f5e9349/pydantic_core-2.41.5-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:8566def80554c3faa0e65ac30ab0932b9e3a5cd7f8323764303d468e5c37595a", size = 2120255, upload-time = "2025-11-04T13:41:28.569Z" }, - { url = "https://files.pythonhosted.org/packages/bb/3d/6913dde84d5be21e284439676168b28d8bbba5600d838b9dca99de0fad71/pydantic_core-2.41.5-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:b80aa5095cd3109962a298ce14110ae16b8c1aece8b72f9dafe81cf597ad80b3", size = 1863760, upload-time = "2025-11-04T13:41:31.055Z" }, - { url = 
"https://files.pythonhosted.org/packages/5a/f0/e5e6b99d4191da102f2b0eb9687aaa7f5bea5d9964071a84effc3e40f997/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3006c3dd9ba34b0c094c544c6006cc79e87d8612999f1a5d43b769b89181f23c", size = 1878092, upload-time = "2025-11-04T13:41:33.21Z" }, - { url = "https://files.pythonhosted.org/packages/71/48/36fb760642d568925953bcc8116455513d6e34c4beaa37544118c36aba6d/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:72f6c8b11857a856bcfa48c86f5368439f74453563f951e473514579d44aa612", size = 2053385, upload-time = "2025-11-04T13:41:35.508Z" }, - { url = "https://files.pythonhosted.org/packages/20/25/92dc684dd8eb75a234bc1c764b4210cf2646479d54b47bf46061657292a8/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5cb1b2f9742240e4bb26b652a5aeb840aa4b417c7748b6f8387927bc6e45e40d", size = 2218832, upload-time = "2025-11-04T13:41:37.732Z" }, - { url = "https://files.pythonhosted.org/packages/e2/09/f53e0b05023d3e30357d82eb35835d0f6340ca344720a4599cd663dca599/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:bd3d54f38609ff308209bd43acea66061494157703364ae40c951f83ba99a1a9", size = 2327585, upload-time = "2025-11-04T13:41:40Z" }, - { url = "https://files.pythonhosted.org/packages/aa/4e/2ae1aa85d6af35a39b236b1b1641de73f5a6ac4d5a7509f77b814885760c/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ff4321e56e879ee8d2a879501c8e469414d948f4aba74a2d4593184eb326660", size = 2041078, upload-time = "2025-11-04T13:41:42.323Z" }, - { url = "https://files.pythonhosted.org/packages/cd/13/2e215f17f0ef326fc72afe94776edb77525142c693767fc347ed6288728d/pydantic_core-2.41.5-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d0d2568a8c11bf8225044aa94409e21da0cb09dcdafe9ecd10250b2baad531a9", size = 2173914, upload-time = "2025-11-04T13:41:45.221Z" }, - { url = "https://files.pythonhosted.org/packages/02/7a/f999a6dcbcd0e5660bc348a3991c8915ce6599f4f2c6ac22f01d7a10816c/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:a39455728aabd58ceabb03c90e12f71fd30fa69615760a075b9fec596456ccc3", size = 2129560, upload-time = "2025-11-04T13:41:47.474Z" }, - { url = "https://files.pythonhosted.org/packages/3a/b1/6c990ac65e3b4c079a4fb9f5b05f5b013afa0f4ed6780a3dd236d2cbdc64/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:239edca560d05757817c13dc17c50766136d21f7cd0fac50295499ae24f90fdf", size = 2329244, upload-time = "2025-11-04T13:41:49.992Z" }, - { url = "https://files.pythonhosted.org/packages/d9/02/3c562f3a51afd4d88fff8dffb1771b30cfdfd79befd9883ee094f5b6c0d8/pydantic_core-2.41.5-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:2a5e06546e19f24c6a96a129142a75cee553cc018ffee48a460059b1185f4470", size = 2331955, upload-time = "2025-11-04T13:41:54.079Z" }, - { url = "https://files.pythonhosted.org/packages/5c/96/5fb7d8c3c17bc8c62fdb031c47d77a1af698f1d7a406b0f79aaa1338f9ad/pydantic_core-2.41.5-cp314-cp314t-win32.whl", hash = "sha256:b4ececa40ac28afa90871c2cc2b9ffd2ff0bf749380fbdf57d165fd23da353aa", size = 1988906, upload-time = "2025-11-04T13:41:56.606Z" }, - { url = "https://files.pythonhosted.org/packages/22/ed/182129d83032702912c2e2d8bbe33c036f342cc735737064668585dac28f/pydantic_core-2.41.5-cp314-cp314t-win_amd64.whl", hash = 
"sha256:80aa89cad80b32a912a65332f64a4450ed00966111b6615ca6816153d3585a8c", size = 1981607, upload-time = "2025-11-04T13:41:58.889Z" }, - { url = "https://files.pythonhosted.org/packages/9f/ed/068e41660b832bb0b1aa5b58011dea2a3fe0ba7861ff38c4d4904c1c1a99/pydantic_core-2.41.5-cp314-cp314t-win_arm64.whl", hash = "sha256:35b44f37a3199f771c3eaa53051bc8a70cd7b54f333531c59e29fd4db5d15008", size = 1974769, upload-time = "2025-11-04T13:42:01.186Z" }, - { url = "https://files.pythonhosted.org/packages/11/72/90fda5ee3b97e51c494938a4a44c3a35a9c96c19bba12372fb9c634d6f57/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_10_12_x86_64.whl", hash = "sha256:b96d5f26b05d03cc60f11a7761a5ded1741da411e7fe0909e27a5e6a0cb7b034", size = 2115441, upload-time = "2025-11-04T13:42:39.557Z" }, - { url = "https://files.pythonhosted.org/packages/1f/53/8942f884fa33f50794f119012dc6a1a02ac43a56407adaac20463df8e98f/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_11_0_arm64.whl", hash = "sha256:634e8609e89ceecea15e2d61bc9ac3718caaaa71963717bf3c8f38bfde64242c", size = 1930291, upload-time = "2025-11-04T13:42:42.169Z" }, - { url = "https://files.pythonhosted.org/packages/79/c8/ecb9ed9cd942bce09fc888ee960b52654fbdbede4ba6c2d6e0d3b1d8b49c/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:93e8740d7503eb008aa2df04d3b9735f845d43ae845e6dcd2be0b55a2da43cd2", size = 1948632, upload-time = "2025-11-04T13:42:44.564Z" }, - { url = "https://files.pythonhosted.org/packages/2e/1b/687711069de7efa6af934e74f601e2a4307365e8fdc404703afc453eab26/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f15489ba13d61f670dcc96772e733aad1a6f9c429cc27574c6cdaed82d0146ad", size = 2138905, upload-time = "2025-11-04T13:42:47.156Z" }, - { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495, upload-time = "2025-11-04T13:42:49.689Z" }, - { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388, upload-time = "2025-11-04T13:42:52.215Z" }, - { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879, upload-time = "2025-11-04T13:42:56.483Z" }, - { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" }, - { url = "https://files.pythonhosted.org/packages/5f/9b/1b3f0e9f9305839d7e84912f9e8bfbd191ed1b1ef48083609f0dabde978c/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:b2379fa7ed44ddecb5bfe4e48577d752db9fc10be00a6b7446e9663ba143de26", size = 2101980, upload-time = 
"2025-11-04T13:43:25.97Z" }, - { url = "https://files.pythonhosted.org/packages/a4/ed/d71fefcb4263df0da6a85b5d8a7508360f2f2e9b3bf5814be9c8bccdccc1/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:266fb4cbf5e3cbd0b53669a6d1b039c45e3ce651fd5442eff4d07c2cc8d66808", size = 1923865, upload-time = "2025-11-04T13:43:28.763Z" }, - { url = "https://files.pythonhosted.org/packages/ce/3a/626b38db460d675f873e4444b4bb030453bbe7b4ba55df821d026a0493c4/pydantic_core-2.41.5-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:58133647260ea01e4d0500089a8c4f07bd7aa6ce109682b1426394988d8aaacc", size = 2134256, upload-time = "2025-11-04T13:43:31.71Z" }, - { url = "https://files.pythonhosted.org/packages/83/d9/8412d7f06f616bbc053d30cb4e5f76786af3221462ad5eee1f202021eb4e/pydantic_core-2.41.5-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:287dad91cfb551c363dc62899a80e9e14da1f0e2b6ebde82c806612ca2a13ef1", size = 2174762, upload-time = "2025-11-04T13:43:34.744Z" }, - { url = "https://files.pythonhosted.org/packages/55/4c/162d906b8e3ba3a99354e20faa1b49a85206c47de97a639510a0e673f5da/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:03b77d184b9eb40240ae9fd676ca364ce1085f203e1b1256f8ab9984dca80a84", size = 2143141, upload-time = "2025-11-04T13:43:37.701Z" }, - { url = "https://files.pythonhosted.org/packages/1f/f2/f11dd73284122713f5f89fc940f370d035fa8e1e078d446b3313955157fe/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:a668ce24de96165bb239160b3d854943128f4334822900534f2fe947930e5770", size = 2330317, upload-time = "2025-11-04T13:43:40.406Z" }, - { url = "https://files.pythonhosted.org/packages/88/9d/b06ca6acfe4abb296110fb1273a4d848a0bfb2ff65f3ee92127b3244e16b/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:f14f8f046c14563f8eb3f45f499cc658ab8d10072961e07225e507adb700e93f", size = 2316992, upload-time = "2025-11-04T13:43:43.602Z" }, - { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302, upload-time = "2025-11-04T13:43:46.64Z" }, -] - -[[package]] -name = "pydantic-settings" -version = "2.13.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "pydantic" }, - { name = "python-dotenv" }, - { name = "typing-inspection" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/52/6d/fffca34caecc4a3f97bda81b2098da5e8ab7efc9a66e819074a11955d87e/pydantic_settings-2.13.1.tar.gz", hash = "sha256:b4c11847b15237fb0171e1462bf540e294affb9b86db4d9aa5c01730bdbe4025", size = 223826, upload-time = "2026-02-19T13:45:08.055Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/00/4b/ccc026168948fec4f7555b9164c724cf4125eac006e176541483d2c959be/pydantic_settings-2.13.1-py3-none-any.whl", hash = "sha256:d56fd801823dbeae7f0975e1f8c8e25c258eb75d278ea7abb5d9cebb01b56237", size = 58929, upload-time = "2026-02-19T13:45:06.034Z" }, -] - -[[package]] -name = "pyjwt" -version = "2.12.1" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/c2/27/a3b6e5bf6ff856d2509292e95c8f57f0df7017cf5394921fc4e4ef40308a/pyjwt-2.12.1.tar.gz", hash = "sha256:c74a7a2adf861c04d002db713dd85f84beb242228e671280bf709d765b03672b", size = 102564, upload-time = 
"2026-03-13T19:27:37.25Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/e5/7a/8dd906bd22e79e47397a61742927f6747fe93242ef86645ee9092e610244/pyjwt-2.12.1-py3-none-any.whl", hash = "sha256:28ca37c070cad8ba8cd9790cd940535d40274d22f80ab87f3ac6a713e6e8454c", size = 29726, upload-time = "2026-03-13T19:27:35.677Z" }, -] - -[package.optional-dependencies] -crypto = [ - { name = "cryptography" }, -] - -[[package]] -name = "python-dotenv" -version = "1.2.2" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135, upload-time = "2026-03-01T16:00:26.196Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101, upload-time = "2026-03-01T16:00:25.09Z" }, -] - -[[package]] -name = "python-multipart" -version = "0.0.22" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/94/01/979e98d542a70714b0cb2b6728ed0b7c46792b695e3eaec3e20711271ca3/python_multipart-0.0.22.tar.gz", hash = "sha256:7340bef99a7e0032613f56dc36027b959fd3b30a787ed62d310e951f7c3a3a58", size = 37612, upload-time = "2026-01-25T10:15:56.219Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/1b/d0/397f9626e711ff749a95d96b7af99b9c566a9bb5129b8e4c10fc4d100304/python_multipart-0.0.22-py3-none-any.whl", hash = "sha256:2b2cd894c83d21bf49d702499531c7bafd057d730c201782048f7945d82de155", size = 24579, upload-time = "2026-01-25T10:15:54.811Z" }, -] - -[[package]] -name = "pywin32" -version = "311" -source = { registry = "https://pypi.org/simple" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/7c/af/449a6a91e5d6db51420875c54f6aff7c97a86a3b13a0b4f1a5c13b988de3/pywin32-311-cp311-cp311-win32.whl", hash = "sha256:184eb5e436dea364dcd3d2316d577d625c0351bf237c4e9a5fabbcfa5a58b151", size = 8697031, upload-time = "2025-07-14T20:13:13.266Z" }, - { url = "https://files.pythonhosted.org/packages/51/8f/9bb81dd5bb77d22243d33c8397f09377056d5c687aa6d4042bea7fbf8364/pywin32-311-cp311-cp311-win_amd64.whl", hash = "sha256:3ce80b34b22b17ccbd937a6e78e7225d80c52f5ab9940fe0506a1a16f3dab503", size = 9508308, upload-time = "2025-07-14T20:13:15.147Z" }, - { url = "https://files.pythonhosted.org/packages/44/7b/9c2ab54f74a138c491aba1b1cd0795ba61f144c711daea84a88b63dc0f6c/pywin32-311-cp311-cp311-win_arm64.whl", hash = "sha256:a733f1388e1a842abb67ffa8e7aad0e70ac519e09b0f6a784e65a136ec7cefd2", size = 8703930, upload-time = "2025-07-14T20:13:16.945Z" }, - { url = "https://files.pythonhosted.org/packages/e7/ab/01ea1943d4eba0f850c3c61e78e8dd59757ff815ff3ccd0a84de5f541f42/pywin32-311-cp312-cp312-win32.whl", hash = "sha256:750ec6e621af2b948540032557b10a2d43b0cee2ae9758c54154d711cc852d31", size = 8706543, upload-time = "2025-07-14T20:13:20.765Z" }, - { url = "https://files.pythonhosted.org/packages/d1/a8/a0e8d07d4d051ec7502cd58b291ec98dcc0c3fff027caad0470b72cfcc2f/pywin32-311-cp312-cp312-win_amd64.whl", hash = "sha256:b8c095edad5c211ff31c05223658e71bf7116daa0ecf3ad85f3201ea3190d067", size = 9495040, upload-time = "2025-07-14T20:13:22.543Z" }, - { url = 
"https://files.pythonhosted.org/packages/ba/3a/2ae996277b4b50f17d61f0603efd8253cb2d79cc7ae159468007b586396d/pywin32-311-cp312-cp312-win_arm64.whl", hash = "sha256:e286f46a9a39c4a18b319c28f59b61de793654af2f395c102b4f819e584b5852", size = 8710102, upload-time = "2025-07-14T20:13:24.682Z" }, - { url = "https://files.pythonhosted.org/packages/a5/be/3fd5de0979fcb3994bfee0d65ed8ca9506a8a1260651b86174f6a86f52b3/pywin32-311-cp313-cp313-win32.whl", hash = "sha256:f95ba5a847cba10dd8c4d8fefa9f2a6cf283b8b88ed6178fa8a6c1ab16054d0d", size = 8705700, upload-time = "2025-07-14T20:13:26.471Z" }, - { url = "https://files.pythonhosted.org/packages/e3/28/e0a1909523c6890208295a29e05c2adb2126364e289826c0a8bc7297bd5c/pywin32-311-cp313-cp313-win_amd64.whl", hash = "sha256:718a38f7e5b058e76aee1c56ddd06908116d35147e133427e59a3983f703a20d", size = 9494700, upload-time = "2025-07-14T20:13:28.243Z" }, - { url = "https://files.pythonhosted.org/packages/04/bf/90339ac0f55726dce7d794e6d79a18a91265bdf3aa70b6b9ca52f35e022a/pywin32-311-cp313-cp313-win_arm64.whl", hash = "sha256:7b4075d959648406202d92a2310cb990fea19b535c7f4a78d3f5e10b926eeb8a", size = 8709318, upload-time = "2025-07-14T20:13:30.348Z" }, - { url = "https://files.pythonhosted.org/packages/c9/31/097f2e132c4f16d99a22bfb777e0fd88bd8e1c634304e102f313af69ace5/pywin32-311-cp314-cp314-win32.whl", hash = "sha256:b7a2c10b93f8986666d0c803ee19b5990885872a7de910fc460f9b0c2fbf92ee", size = 8840714, upload-time = "2025-07-14T20:13:32.449Z" }, - { url = "https://files.pythonhosted.org/packages/90/4b/07c77d8ba0e01349358082713400435347df8426208171ce297da32c313d/pywin32-311-cp314-cp314-win_amd64.whl", hash = "sha256:3aca44c046bd2ed8c90de9cb8427f581c479e594e99b5c0bb19b29c10fd6cb87", size = 9656800, upload-time = "2025-07-14T20:13:34.312Z" }, - { url = "https://files.pythonhosted.org/packages/c0/d2/21af5c535501a7233e734b8af901574572da66fcc254cb35d0609c9080dd/pywin32-311-cp314-cp314-win_arm64.whl", hash = "sha256:a508e2d9025764a8270f93111a970e1d0fbfc33f4153b388bb649b7eec4f9b42", size = 8932540, upload-time = "2025-07-14T20:13:36.379Z" }, -] - -[[package]] -name = "pyyaml" -version = "6.0.3" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/6d/16/a95b6757765b7b031c9374925bb718d55e0a9ba8a1b6a12d25962ea44347/pyyaml-6.0.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:44edc647873928551a01e7a563d7452ccdebee747728c1080d881d68af7b997e", size = 185826, upload-time = "2025-09-25T21:31:58.655Z" }, - { url = "https://files.pythonhosted.org/packages/16/19/13de8e4377ed53079ee996e1ab0a9c33ec2faf808a4647b7b4c0d46dd239/pyyaml-6.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:652cb6edd41e718550aad172851962662ff2681490a8a711af6a4d288dd96824", size = 175577, upload-time = "2025-09-25T21:32:00.088Z" }, - { url = "https://files.pythonhosted.org/packages/0c/62/d2eb46264d4b157dae1275b573017abec435397aa59cbcdab6fc978a8af4/pyyaml-6.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:10892704fc220243f5305762e276552a0395f7beb4dbf9b14ec8fd43b57f126c", size = 775556, upload-time = "2025-09-25T21:32:01.31Z" }, - { url = 
"https://files.pythonhosted.org/packages/10/cb/16c3f2cf3266edd25aaa00d6c4350381c8b012ed6f5276675b9eba8d9ff4/pyyaml-6.0.3-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:850774a7879607d3a6f50d36d04f00ee69e7fc816450e5f7e58d7f17f1ae5c00", size = 882114, upload-time = "2025-09-25T21:32:03.376Z" }, - { url = "https://files.pythonhosted.org/packages/71/60/917329f640924b18ff085ab889a11c763e0b573da888e8404ff486657602/pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b8bb0864c5a28024fac8a632c443c87c5aa6f215c0b126c449ae1a150412f31d", size = 806638, upload-time = "2025-09-25T21:32:04.553Z" }, - { url = "https://files.pythonhosted.org/packages/dd/6f/529b0f316a9fd167281a6c3826b5583e6192dba792dd55e3203d3f8e655a/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37d57ad971609cf3c53ba6a7e365e40660e3be0e5175fa9f2365a379d6095a", size = 767463, upload-time = "2025-09-25T21:32:06.152Z" }, - { url = "https://files.pythonhosted.org/packages/f2/6a/b627b4e0c1dd03718543519ffb2f1deea4a1e6d42fbab8021936a4d22589/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:37503bfbfc9d2c40b344d06b2199cf0e96e97957ab1c1b546fd4f87e53e5d3e4", size = 794986, upload-time = "2025-09-25T21:32:07.367Z" }, - { url = "https://files.pythonhosted.org/packages/45/91/47a6e1c42d9ee337c4839208f30d9f09caa9f720ec7582917b264defc875/pyyaml-6.0.3-cp311-cp311-win32.whl", hash = "sha256:8098f252adfa6c80ab48096053f512f2321f0b998f98150cea9bd23d83e1467b", size = 142543, upload-time = "2025-09-25T21:32:08.95Z" }, - { url = "https://files.pythonhosted.org/packages/da/e3/ea007450a105ae919a72393cb06f122f288ef60bba2dc64b26e2646fa315/pyyaml-6.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:9f3bfb4965eb874431221a3ff3fdcddc7e74e3b07799e0e84ca4a0f867d449bf", size = 158763, upload-time = "2025-09-25T21:32:09.96Z" }, - { url = "https://files.pythonhosted.org/packages/d1/33/422b98d2195232ca1826284a76852ad5a86fe23e31b009c9886b2d0fb8b2/pyyaml-6.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196", size = 182063, upload-time = "2025-09-25T21:32:11.445Z" }, - { url = "https://files.pythonhosted.org/packages/89/a0/6cf41a19a1f2f3feab0e9c0b74134aa2ce6849093d5517a0c550fe37a648/pyyaml-6.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0", size = 173973, upload-time = "2025-09-25T21:32:12.492Z" }, - { url = "https://files.pythonhosted.org/packages/ed/23/7a778b6bd0b9a8039df8b1b1d80e2e2ad78aa04171592c8a5c43a56a6af4/pyyaml-6.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28", size = 775116, upload-time = "2025-09-25T21:32:13.652Z" }, - { url = "https://files.pythonhosted.org/packages/65/30/d7353c338e12baef4ecc1b09e877c1970bd3382789c159b4f89d6a70dc09/pyyaml-6.0.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c", size = 844011, upload-time = "2025-09-25T21:32:15.21Z" }, - { url = "https://files.pythonhosted.org/packages/8b/9d/b3589d3877982d4f2329302ef98a8026e7f4443c765c46cfecc8858c6b4b/pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc", size = 807870, 
upload-time = "2025-09-25T21:32:16.431Z" }, - { url = "https://files.pythonhosted.org/packages/05/c0/b3be26a015601b822b97d9149ff8cb5ead58c66f981e04fedf4e762f4bd4/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e", size = 761089, upload-time = "2025-09-25T21:32:17.56Z" }, - { url = "https://files.pythonhosted.org/packages/be/8e/98435a21d1d4b46590d5459a22d88128103f8da4c2d4cb8f14f2a96504e1/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea", size = 790181, upload-time = "2025-09-25T21:32:18.834Z" }, - { url = "https://files.pythonhosted.org/packages/74/93/7baea19427dcfbe1e5a372d81473250b379f04b1bd3c4c5ff825e2327202/pyyaml-6.0.3-cp312-cp312-win32.whl", hash = "sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5", size = 137658, upload-time = "2025-09-25T21:32:20.209Z" }, - { url = "https://files.pythonhosted.org/packages/86/bf/899e81e4cce32febab4fb42bb97dcdf66bc135272882d1987881a4b519e9/pyyaml-6.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b", size = 154003, upload-time = "2025-09-25T21:32:21.167Z" }, - { url = "https://files.pythonhosted.org/packages/1a/08/67bd04656199bbb51dbed1439b7f27601dfb576fb864099c7ef0c3e55531/pyyaml-6.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd", size = 140344, upload-time = "2025-09-25T21:32:22.617Z" }, - { url = "https://files.pythonhosted.org/packages/d1/11/0fd08f8192109f7169db964b5707a2f1e8b745d4e239b784a5a1dd80d1db/pyyaml-6.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8", size = 181669, upload-time = "2025-09-25T21:32:23.673Z" }, - { url = "https://files.pythonhosted.org/packages/b1/16/95309993f1d3748cd644e02e38b75d50cbc0d9561d21f390a76242ce073f/pyyaml-6.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1", size = 173252, upload-time = "2025-09-25T21:32:25.149Z" }, - { url = "https://files.pythonhosted.org/packages/50/31/b20f376d3f810b9b2371e72ef5adb33879b25edb7a6d072cb7ca0c486398/pyyaml-6.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c", size = 767081, upload-time = "2025-09-25T21:32:26.575Z" }, - { url = "https://files.pythonhosted.org/packages/49/1e/a55ca81e949270d5d4432fbbd19dfea5321eda7c41a849d443dc92fd1ff7/pyyaml-6.0.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a33284e20b78bd4a18c8c2282d549d10bc8408a2a7ff57653c0cf0b9be0afce5", size = 841159, upload-time = "2025-09-25T21:32:27.727Z" }, - { url = "https://files.pythonhosted.org/packages/74/27/e5b8f34d02d9995b80abcef563ea1f8b56d20134d8f4e5e81733b1feceb2/pyyaml-6.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0f29edc409a6392443abf94b9cf89ce99889a1dd5376d94316ae5145dfedd5d6", size = 801626, upload-time = "2025-09-25T21:32:28.878Z" }, - { url = "https://files.pythonhosted.org/packages/f9/11/ba845c23988798f40e52ba45f34849aa8a1f2d4af4b798588010792ebad6/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f7057c9a337546edc7973c0d3ba84ddcdf0daa14533c2065749c9075001090e6", size = 753613, upload-time = "2025-09-25T21:32:30.178Z" 
}, - { url = "https://files.pythonhosted.org/packages/3d/e0/7966e1a7bfc0a45bf0a7fb6b98ea03fc9b8d84fa7f2229e9659680b69ee3/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:eda16858a3cab07b80edaf74336ece1f986ba330fdb8ee0d6c0d68fe82bc96be", size = 794115, upload-time = "2025-09-25T21:32:31.353Z" }, - { url = "https://files.pythonhosted.org/packages/de/94/980b50a6531b3019e45ddeada0626d45fa85cbe22300844a7983285bed3b/pyyaml-6.0.3-cp313-cp313-win32.whl", hash = "sha256:d0eae10f8159e8fdad514efdc92d74fd8d682c933a6dd088030f3834bc8e6b26", size = 137427, upload-time = "2025-09-25T21:32:32.58Z" }, - { url = "https://files.pythonhosted.org/packages/97/c9/39d5b874e8b28845e4ec2202b5da735d0199dbe5b8fb85f91398814a9a46/pyyaml-6.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:79005a0d97d5ddabfeeea4cf676af11e647e41d81c9a7722a193022accdb6b7c", size = 154090, upload-time = "2025-09-25T21:32:33.659Z" }, - { url = "https://files.pythonhosted.org/packages/73/e8/2bdf3ca2090f68bb3d75b44da7bbc71843b19c9f2b9cb9b0f4ab7a5a4329/pyyaml-6.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:5498cd1645aa724a7c71c8f378eb29ebe23da2fc0d7a08071d89469bf1d2defb", size = 140246, upload-time = "2025-09-25T21:32:34.663Z" }, - { url = "https://files.pythonhosted.org/packages/9d/8c/f4bd7f6465179953d3ac9bc44ac1a8a3e6122cf8ada906b4f96c60172d43/pyyaml-6.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:8d1fab6bb153a416f9aeb4b8763bc0f22a5586065f86f7664fc23339fc1c1fac", size = 181814, upload-time = "2025-09-25T21:32:35.712Z" }, - { url = "https://files.pythonhosted.org/packages/bd/9c/4d95bb87eb2063d20db7b60faa3840c1b18025517ae857371c4dd55a6b3a/pyyaml-6.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:34d5fcd24b8445fadc33f9cf348c1047101756fd760b4dacb5c3e99755703310", size = 173809, upload-time = "2025-09-25T21:32:36.789Z" }, - { url = "https://files.pythonhosted.org/packages/92/b5/47e807c2623074914e29dabd16cbbdd4bf5e9b2db9f8090fa64411fc5382/pyyaml-6.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:501a031947e3a9025ed4405a168e6ef5ae3126c59f90ce0cd6f2bfc477be31b7", size = 766454, upload-time = "2025-09-25T21:32:37.966Z" }, - { url = "https://files.pythonhosted.org/packages/02/9e/e5e9b168be58564121efb3de6859c452fccde0ab093d8438905899a3a483/pyyaml-6.0.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b3bc83488de33889877a0f2543ade9f70c67d66d9ebb4ac959502e12de895788", size = 836355, upload-time = "2025-09-25T21:32:39.178Z" }, - { url = "https://files.pythonhosted.org/packages/88/f9/16491d7ed2a919954993e48aa941b200f38040928474c9e85ea9e64222c3/pyyaml-6.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c458b6d084f9b935061bc36216e8a69a7e293a2f1e68bf956dcd9e6cbcd143f5", size = 794175, upload-time = "2025-09-25T21:32:40.865Z" }, - { url = "https://files.pythonhosted.org/packages/dd/3f/5989debef34dc6397317802b527dbbafb2b4760878a53d4166579111411e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7c6610def4f163542a622a73fb39f534f8c101d690126992300bf3207eab9764", size = 755228, upload-time = "2025-09-25T21:32:42.084Z" }, - { url = "https://files.pythonhosted.org/packages/d7/ce/af88a49043cd2e265be63d083fc75b27b6ed062f5f9fd6cdc223ad62f03e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5190d403f121660ce8d1d2c1bb2ef1bd05b5f68533fc5c2ea899bd15f4399b35", size = 789194, upload-time = "2025-09-25T21:32:43.362Z" }, - { url = 
"https://files.pythonhosted.org/packages/23/20/bb6982b26a40bb43951265ba29d4c246ef0ff59c9fdcdf0ed04e0687de4d/pyyaml-6.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:4a2e8cebe2ff6ab7d1050ecd59c25d4c8bd7e6f400f5f82b96557ac0abafd0ac", size = 156429, upload-time = "2025-09-25T21:32:57.844Z" }, - { url = "https://files.pythonhosted.org/packages/f4/f4/a4541072bb9422c8a883ab55255f918fa378ecf083f5b85e87fc2b4eda1b/pyyaml-6.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:93dda82c9c22deb0a405ea4dc5f2d0cda384168e466364dec6255b293923b2f3", size = 143912, upload-time = "2025-09-25T21:32:59.247Z" }, - { url = "https://files.pythonhosted.org/packages/7c/f9/07dd09ae774e4616edf6cda684ee78f97777bdd15847253637a6f052a62f/pyyaml-6.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:02893d100e99e03eda1c8fd5c441d8c60103fd175728e23e431db1b589cf5ab3", size = 189108, upload-time = "2025-09-25T21:32:44.377Z" }, - { url = "https://files.pythonhosted.org/packages/4e/78/8d08c9fb7ce09ad8c38ad533c1191cf27f7ae1effe5bb9400a46d9437fcf/pyyaml-6.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c1ff362665ae507275af2853520967820d9124984e0f7466736aea23d8611fba", size = 183641, upload-time = "2025-09-25T21:32:45.407Z" }, - { url = "https://files.pythonhosted.org/packages/7b/5b/3babb19104a46945cf816d047db2788bcaf8c94527a805610b0289a01c6b/pyyaml-6.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6adc77889b628398debc7b65c073bcb99c4a0237b248cacaf3fe8a557563ef6c", size = 831901, upload-time = "2025-09-25T21:32:48.83Z" }, - { url = "https://files.pythonhosted.org/packages/8b/cc/dff0684d8dc44da4d22a13f35f073d558c268780ce3c6ba1b87055bb0b87/pyyaml-6.0.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a80cb027f6b349846a3bf6d73b5e95e782175e52f22108cfa17876aaeff93702", size = 861132, upload-time = "2025-09-25T21:32:50.149Z" }, - { url = "https://files.pythonhosted.org/packages/b1/5e/f77dc6b9036943e285ba76b49e118d9ea929885becb0a29ba8a7c75e29fe/pyyaml-6.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00c4bdeba853cc34e7dd471f16b4114f4162dc03e6b7afcc2128711f0eca823c", size = 839261, upload-time = "2025-09-25T21:32:51.808Z" }, - { url = "https://files.pythonhosted.org/packages/ce/88/a9db1376aa2a228197c58b37302f284b5617f56a5d959fd1763fb1675ce6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e1674c3ef6f541c35191caae2d429b967b99e02040f5ba928632d9a7f0f065", size = 805272, upload-time = "2025-09-25T21:32:52.941Z" }, - { url = "https://files.pythonhosted.org/packages/da/92/1446574745d74df0c92e6aa4a7b0b3130706a4142b2d1a5869f2eaa423c6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:16249ee61e95f858e83976573de0f5b2893b3677ba71c9dd36b9cf8be9ac6d65", size = 829923, upload-time = "2025-09-25T21:32:54.537Z" }, - { url = "https://files.pythonhosted.org/packages/f0/7a/1c7270340330e575b92f397352af856a8c06f230aa3e76f86b39d01b416a/pyyaml-6.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4ad1906908f2f5ae4e5a8ddfce73c320c2a1429ec52eafd27138b7f1cbe341c9", size = 174062, upload-time = "2025-09-25T21:32:55.767Z" }, - { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, -] - -[[package]] -name = "referencing" -version = 
"0.37.0" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "attrs" }, - { name = "rpds-py" }, - { name = "typing-extensions", marker = "python_full_version < '3.13'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/22/f5/df4e9027acead3ecc63e50fe1e36aca1523e1719559c499951bb4b53188f/referencing-0.37.0.tar.gz", hash = "sha256:44aefc3142c5b842538163acb373e24cce6632bd54bdb01b21ad5863489f50d8", size = 78036, upload-time = "2025-10-13T15:30:48.871Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/2c/58/ca301544e1fa93ed4f80d724bf5b194f6e4b945841c5bfd555878eea9fcb/referencing-0.37.0-py3-none-any.whl", hash = "sha256:381329a9f99628c9069361716891d34ad94af76e461dcb0335825aecc7692231", size = 26766, upload-time = "2025-10-13T15:30:47.625Z" }, -] - -[[package]] -name = "rpds-py" -version = "0.30.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/20/af/3f2f423103f1113b36230496629986e0ef7e199d2aa8392452b484b38ced/rpds_py-0.30.0.tar.gz", hash = "sha256:dd8ff7cf90014af0c0f787eea34794ebf6415242ee1d6fa91eaba725cc441e84", size = 69469, upload-time = "2025-11-30T20:24:38.837Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/4d/6e/f964e88b3d2abee2a82c1ac8366da848fce1c6d834dc2132c3fda3970290/rpds_py-0.30.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:a2bffea6a4ca9f01b3f8e548302470306689684e61602aa3d141e34da06cf425", size = 370157, upload-time = "2025-11-30T20:21:53.789Z" }, - { url = "https://files.pythonhosted.org/packages/94/ba/24e5ebb7c1c82e74c4e4f33b2112a5573ddc703915b13a073737b59b86e0/rpds_py-0.30.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:dc4f992dfe1e2bc3ebc7444f6c7051b4bc13cd8e33e43511e8ffd13bf407010d", size = 359676, upload-time = "2025-11-30T20:21:55.475Z" }, - { url = "https://files.pythonhosted.org/packages/84/86/04dbba1b087227747d64d80c3b74df946b986c57af0a9f0c98726d4d7a3b/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:422c3cb9856d80b09d30d2eb255d0754b23e090034e1deb4083f8004bd0761e4", size = 389938, upload-time = "2025-11-30T20:21:57.079Z" }, - { url = "https://files.pythonhosted.org/packages/42/bb/1463f0b1722b7f45431bdd468301991d1328b16cffe0b1c2918eba2c4eee/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:07ae8a593e1c3c6b82ca3292efbe73c30b61332fd612e05abee07c79359f292f", size = 402932, upload-time = "2025-11-30T20:21:58.47Z" }, - { url = "https://files.pythonhosted.org/packages/99/ee/2520700a5c1f2d76631f948b0736cdf9b0acb25abd0ca8e889b5c62ac2e3/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:12f90dd7557b6bd57f40abe7747e81e0c0b119bef015ea7726e69fe550e394a4", size = 525830, upload-time = "2025-11-30T20:21:59.699Z" }, - { url = "https://files.pythonhosted.org/packages/e0/ad/bd0331f740f5705cc555a5e17fdf334671262160270962e69a2bdef3bf76/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:99b47d6ad9a6da00bec6aabe5a6279ecd3c06a329d4aa4771034a21e335c3a97", size = 412033, upload-time = "2025-11-30T20:22:00.991Z" }, - { url = "https://files.pythonhosted.org/packages/f8/1e/372195d326549bb51f0ba0f2ecb9874579906b97e08880e7a65c3bef1a99/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33f559f3104504506a44bb666b93a33f5d33133765b0c216a5bf2f1e1503af89", size = 390828, upload-time = "2025-11-30T20:22:02.723Z" }, - { url = 
"https://files.pythonhosted.org/packages/ab/2b/d88bb33294e3e0c76bc8f351a3721212713629ffca1700fa94979cb3eae8/rpds_py-0.30.0-cp311-cp311-manylinux_2_31_riscv64.whl", hash = "sha256:946fe926af6e44f3697abbc305ea168c2c31d3e3ef1058cf68f379bf0335a78d", size = 404683, upload-time = "2025-11-30T20:22:04.367Z" }, - { url = "https://files.pythonhosted.org/packages/50/32/c759a8d42bcb5289c1fac697cd92f6fe01a018dd937e62ae77e0e7f15702/rpds_py-0.30.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:495aeca4b93d465efde585977365187149e75383ad2684f81519f504f5c13038", size = 421583, upload-time = "2025-11-30T20:22:05.814Z" }, - { url = "https://files.pythonhosted.org/packages/2b/81/e729761dbd55ddf5d84ec4ff1f47857f4374b0f19bdabfcf929164da3e24/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d9a0ca5da0386dee0655b4ccdf46119df60e0f10da268d04fe7cc87886872ba7", size = 572496, upload-time = "2025-11-30T20:22:07.713Z" }, - { url = "https://files.pythonhosted.org/packages/14/f6/69066a924c3557c9c30baa6ec3a0aa07526305684c6f86c696b08860726c/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:8d6d1cc13664ec13c1b84241204ff3b12f9bb82464b8ad6e7a5d3486975c2eed", size = 598669, upload-time = "2025-11-30T20:22:09.312Z" }, - { url = "https://files.pythonhosted.org/packages/5f/48/905896b1eb8a05630d20333d1d8ffd162394127b74ce0b0784ae04498d32/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:3896fa1be39912cf0757753826bc8bdc8ca331a28a7c4ae46b7a21280b06bb85", size = 561011, upload-time = "2025-11-30T20:22:11.309Z" }, - { url = "https://files.pythonhosted.org/packages/22/16/cd3027c7e279d22e5eb431dd3c0fbc677bed58797fe7581e148f3f68818b/rpds_py-0.30.0-cp311-cp311-win32.whl", hash = "sha256:55f66022632205940f1827effeff17c4fa7ae1953d2b74a8581baaefb7d16f8c", size = 221406, upload-time = "2025-11-30T20:22:13.101Z" }, - { url = "https://files.pythonhosted.org/packages/fa/5b/e7b7aa136f28462b344e652ee010d4de26ee9fd16f1bfd5811f5153ccf89/rpds_py-0.30.0-cp311-cp311-win_amd64.whl", hash = "sha256:a51033ff701fca756439d641c0ad09a41d9242fa69121c7d8769604a0a629825", size = 236024, upload-time = "2025-11-30T20:22:14.853Z" }, - { url = "https://files.pythonhosted.org/packages/14/a6/364bba985e4c13658edb156640608f2c9e1d3ea3c81b27aa9d889fff0e31/rpds_py-0.30.0-cp311-cp311-win_arm64.whl", hash = "sha256:47b0ef6231c58f506ef0b74d44e330405caa8428e770fec25329ed2cb971a229", size = 229069, upload-time = "2025-11-30T20:22:16.577Z" }, - { url = "https://files.pythonhosted.org/packages/03/e7/98a2f4ac921d82f33e03f3835f5bf3a4a40aa1bfdc57975e74a97b2b4bdd/rpds_py-0.30.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a161f20d9a43006833cd7068375a94d035714d73a172b681d8881820600abfad", size = 375086, upload-time = "2025-11-30T20:22:17.93Z" }, - { url = "https://files.pythonhosted.org/packages/4d/a1/bca7fd3d452b272e13335db8d6b0b3ecde0f90ad6f16f3328c6fb150c889/rpds_py-0.30.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6abc8880d9d036ecaafe709079969f56e876fcf107f7a8e9920ba6d5a3878d05", size = 359053, upload-time = "2025-11-30T20:22:19.297Z" }, - { url = "https://files.pythonhosted.org/packages/65/1c/ae157e83a6357eceff62ba7e52113e3ec4834a84cfe07fa4b0757a7d105f/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ca28829ae5f5d569bb62a79512c842a03a12576375d5ece7d2cadf8abe96ec28", size = 390763, upload-time = "2025-11-30T20:22:21.661Z" }, - { url = 
"https://files.pythonhosted.org/packages/d4/36/eb2eb8515e2ad24c0bd43c3ee9cd74c33f7ca6430755ccdb240fd3144c44/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a1010ed9524c73b94d15919ca4d41d8780980e1765babf85f9a2f90d247153dd", size = 408951, upload-time = "2025-11-30T20:22:23.408Z" }, - { url = "https://files.pythonhosted.org/packages/d6/65/ad8dc1784a331fabbd740ef6f71ce2198c7ed0890dab595adb9ea2d775a1/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f8d1736cfb49381ba528cd5baa46f82fdc65c06e843dab24dd70b63d09121b3f", size = 514622, upload-time = "2025-11-30T20:22:25.16Z" }, - { url = "https://files.pythonhosted.org/packages/63/8e/0cfa7ae158e15e143fe03993b5bcd743a59f541f5952e1546b1ac1b5fd45/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d948b135c4693daff7bc2dcfc4ec57237a29bd37e60c2fabf5aff2bbacf3e2f1", size = 414492, upload-time = "2025-11-30T20:22:26.505Z" }, - { url = "https://files.pythonhosted.org/packages/60/1b/6f8f29f3f995c7ffdde46a626ddccd7c63aefc0efae881dc13b6e5d5bb16/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47f236970bccb2233267d89173d3ad2703cd36a0e2a6e92d0560d333871a3d23", size = 394080, upload-time = "2025-11-30T20:22:27.934Z" }, - { url = "https://files.pythonhosted.org/packages/6d/d5/a266341051a7a3ca2f4b750a3aa4abc986378431fc2da508c5034d081b70/rpds_py-0.30.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:2e6ecb5a5bcacf59c3f912155044479af1d0b6681280048b338b28e364aca1f6", size = 408680, upload-time = "2025-11-30T20:22:29.341Z" }, - { url = "https://files.pythonhosted.org/packages/10/3b/71b725851df9ab7a7a4e33cf36d241933da66040d195a84781f49c50490c/rpds_py-0.30.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a8fa71a2e078c527c3e9dc9fc5a98c9db40bcc8a92b4e8858e36d329f8684b51", size = 423589, upload-time = "2025-11-30T20:22:31.469Z" }, - { url = "https://files.pythonhosted.org/packages/00/2b/e59e58c544dc9bd8bd8384ecdb8ea91f6727f0e37a7131baeff8d6f51661/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:73c67f2db7bc334e518d097c6d1e6fed021bbc9b7d678d6cc433478365d1d5f5", size = 573289, upload-time = "2025-11-30T20:22:32.997Z" }, - { url = "https://files.pythonhosted.org/packages/da/3e/a18e6f5b460893172a7d6a680e86d3b6bc87a54c1f0b03446a3c8c7b588f/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:5ba103fb455be00f3b1c2076c9d4264bfcb037c976167a6047ed82f23153f02e", size = 599737, upload-time = "2025-11-30T20:22:34.419Z" }, - { url = "https://files.pythonhosted.org/packages/5c/e2/714694e4b87b85a18e2c243614974413c60aa107fd815b8cbc42b873d1d7/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7cee9c752c0364588353e627da8a7e808a66873672bcb5f52890c33fd965b394", size = 563120, upload-time = "2025-11-30T20:22:35.903Z" }, - { url = "https://files.pythonhosted.org/packages/6f/ab/d5d5e3bcedb0a77f4f613706b750e50a5a3ba1c15ccd3665ecc636c968fd/rpds_py-0.30.0-cp312-cp312-win32.whl", hash = "sha256:1ab5b83dbcf55acc8b08fc62b796ef672c457b17dbd7820a11d6c52c06839bdf", size = 223782, upload-time = "2025-11-30T20:22:37.271Z" }, - { url = "https://files.pythonhosted.org/packages/39/3b/f786af9957306fdc38a74cef405b7b93180f481fb48453a114bb6465744a/rpds_py-0.30.0-cp312-cp312-win_amd64.whl", hash = "sha256:a090322ca841abd453d43456ac34db46e8b05fd9b3b4ac0c78bcde8b089f959b", size = 240463, upload-time = "2025-11-30T20:22:39.021Z" }, - { url = 
"https://files.pythonhosted.org/packages/f3/d2/b91dc748126c1559042cfe41990deb92c4ee3e2b415f6b5234969ffaf0cc/rpds_py-0.30.0-cp312-cp312-win_arm64.whl", hash = "sha256:669b1805bd639dd2989b281be2cfd951c6121b65e729d9b843e9639ef1fd555e", size = 230868, upload-time = "2025-11-30T20:22:40.493Z" }, - { url = "https://files.pythonhosted.org/packages/ed/dc/d61221eb88ff410de3c49143407f6f3147acf2538c86f2ab7ce65ae7d5f9/rpds_py-0.30.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:f83424d738204d9770830d35290ff3273fbb02b41f919870479fab14b9d303b2", size = 374887, upload-time = "2025-11-30T20:22:41.812Z" }, - { url = "https://files.pythonhosted.org/packages/fd/32/55fb50ae104061dbc564ef15cc43c013dc4a9f4527a1f4d99baddf56fe5f/rpds_py-0.30.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e7536cd91353c5273434b4e003cbda89034d67e7710eab8761fd918ec6c69cf8", size = 358904, upload-time = "2025-11-30T20:22:43.479Z" }, - { url = "https://files.pythonhosted.org/packages/58/70/faed8186300e3b9bdd138d0273109784eea2396c68458ed580f885dfe7ad/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2771c6c15973347f50fece41fc447c054b7ac2ae0502388ce3b6738cd366e3d4", size = 389945, upload-time = "2025-11-30T20:22:44.819Z" }, - { url = "https://files.pythonhosted.org/packages/bd/a8/073cac3ed2c6387df38f71296d002ab43496a96b92c823e76f46b8af0543/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0a59119fc6e3f460315fe9d08149f8102aa322299deaa5cab5b40092345c2136", size = 407783, upload-time = "2025-11-30T20:22:46.103Z" }, - { url = "https://files.pythonhosted.org/packages/77/57/5999eb8c58671f1c11eba084115e77a8899d6e694d2a18f69f0ba471ec8b/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:76fec018282b4ead0364022e3c54b60bf368b9d926877957a8624b58419169b7", size = 515021, upload-time = "2025-11-30T20:22:47.458Z" }, - { url = "https://files.pythonhosted.org/packages/e0/af/5ab4833eadc36c0a8ed2bc5c0de0493c04f6c06de223170bd0798ff98ced/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:692bef75a5525db97318e8cd061542b5a79812d711ea03dbc1f6f8dbb0c5f0d2", size = 414589, upload-time = "2025-11-30T20:22:48.872Z" }, - { url = "https://files.pythonhosted.org/packages/b7/de/f7192e12b21b9e9a68a6d0f249b4af3fdcdff8418be0767a627564afa1f1/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9027da1ce107104c50c81383cae773ef5c24d296dd11c99e2629dbd7967a20c6", size = 394025, upload-time = "2025-11-30T20:22:50.196Z" }, - { url = "https://files.pythonhosted.org/packages/91/c4/fc70cd0249496493500e7cc2de87504f5aa6509de1e88623431fec76d4b6/rpds_py-0.30.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:9cf69cdda1f5968a30a359aba2f7f9aa648a9ce4b580d6826437f2b291cfc86e", size = 408895, upload-time = "2025-11-30T20:22:51.87Z" }, - { url = "https://files.pythonhosted.org/packages/58/95/d9275b05ab96556fefff73a385813eb66032e4c99f411d0795372d9abcea/rpds_py-0.30.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a4796a717bf12b9da9d3ad002519a86063dcac8988b030e405704ef7d74d2d9d", size = 422799, upload-time = "2025-11-30T20:22:53.341Z" }, - { url = "https://files.pythonhosted.org/packages/06/c1/3088fc04b6624eb12a57eb814f0d4997a44b0d208d6cace713033ff1a6ba/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:5d4c2aa7c50ad4728a094ebd5eb46c452e9cb7edbfdb18f9e1221f597a73e1e7", size = 572731, upload-time = "2025-11-30T20:22:54.778Z" }, - { url 
= "https://files.pythonhosted.org/packages/d8/42/c612a833183b39774e8ac8fecae81263a68b9583ee343db33ab571a7ce55/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:ba81a9203d07805435eb06f536d95a266c21e5b2dfbf6517748ca40c98d19e31", size = 599027, upload-time = "2025-11-30T20:22:56.212Z" }, - { url = "https://files.pythonhosted.org/packages/5f/60/525a50f45b01d70005403ae0e25f43c0384369ad24ffe46e8d9068b50086/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:945dccface01af02675628334f7cf49c2af4c1c904748efc5cf7bbdf0b579f95", size = 563020, upload-time = "2025-11-30T20:22:58.2Z" }, - { url = "https://files.pythonhosted.org/packages/0b/5d/47c4655e9bcd5ca907148535c10e7d489044243cc9941c16ed7cd53be91d/rpds_py-0.30.0-cp313-cp313-win32.whl", hash = "sha256:b40fb160a2db369a194cb27943582b38f79fc4887291417685f3ad693c5a1d5d", size = 223139, upload-time = "2025-11-30T20:23:00.209Z" }, - { url = "https://files.pythonhosted.org/packages/f2/e1/485132437d20aa4d3e1d8b3fb5a5e65aa8139f1e097080c2a8443201742c/rpds_py-0.30.0-cp313-cp313-win_amd64.whl", hash = "sha256:806f36b1b605e2d6a72716f321f20036b9489d29c51c91f4dd29a3e3afb73b15", size = 240224, upload-time = "2025-11-30T20:23:02.008Z" }, - { url = "https://files.pythonhosted.org/packages/24/95/ffd128ed1146a153d928617b0ef673960130be0009c77d8fbf0abe306713/rpds_py-0.30.0-cp313-cp313-win_arm64.whl", hash = "sha256:d96c2086587c7c30d44f31f42eae4eac89b60dabbac18c7669be3700f13c3ce1", size = 230645, upload-time = "2025-11-30T20:23:03.43Z" }, - { url = "https://files.pythonhosted.org/packages/ff/1b/b10de890a0def2a319a2626334a7f0ae388215eb60914dbac8a3bae54435/rpds_py-0.30.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:eb0b93f2e5c2189ee831ee43f156ed34e2a89a78a66b98cadad955972548be5a", size = 364443, upload-time = "2025-11-30T20:23:04.878Z" }, - { url = "https://files.pythonhosted.org/packages/0d/bf/27e39f5971dc4f305a4fb9c672ca06f290f7c4e261c568f3dea16a410d47/rpds_py-0.30.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:922e10f31f303c7c920da8981051ff6d8c1a56207dbdf330d9047f6d30b70e5e", size = 353375, upload-time = "2025-11-30T20:23:06.342Z" }, - { url = "https://files.pythonhosted.org/packages/40/58/442ada3bba6e8e6615fc00483135c14a7538d2ffac30e2d933ccf6852232/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cdc62c8286ba9bf7f47befdcea13ea0e26bf294bda99758fd90535cbaf408000", size = 383850, upload-time = "2025-11-30T20:23:07.825Z" }, - { url = "https://files.pythonhosted.org/packages/14/14/f59b0127409a33c6ef6f5c1ebd5ad8e32d7861c9c7adfa9a624fc3889f6c/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47f9a91efc418b54fb8190a6b4aa7813a23fb79c51f4bb84e418f5476c38b8db", size = 392812, upload-time = "2025-11-30T20:23:09.228Z" }, - { url = "https://files.pythonhosted.org/packages/b3/66/e0be3e162ac299b3a22527e8913767d869e6cc75c46bd844aa43fb81ab62/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1f3587eb9b17f3789ad50824084fa6f81921bbf9a795826570bda82cb3ed91f2", size = 517841, upload-time = "2025-11-30T20:23:11.186Z" }, - { url = "https://files.pythonhosted.org/packages/3d/55/fa3b9cf31d0c963ecf1ba777f7cf4b2a2c976795ac430d24a1f43d25a6ba/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:39c02563fc592411c2c61d26b6c5fe1e51eaa44a75aa2c8735ca88b0d9599daa", size = 408149, upload-time = "2025-11-30T20:23:12.864Z" }, - { url = 
"https://files.pythonhosted.org/packages/60/ca/780cf3b1a32b18c0f05c441958d3758f02544f1d613abf9488cd78876378/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:51a1234d8febafdfd33a42d97da7a43f5dcb120c1060e352a3fbc0c6d36e2083", size = 383843, upload-time = "2025-11-30T20:23:14.638Z" }, - { url = "https://files.pythonhosted.org/packages/82/86/d5f2e04f2aa6247c613da0c1dd87fcd08fa17107e858193566048a1e2f0a/rpds_py-0.30.0-cp313-cp313t-manylinux_2_31_riscv64.whl", hash = "sha256:eb2c4071ab598733724c08221091e8d80e89064cd472819285a9ab0f24bcedb9", size = 396507, upload-time = "2025-11-30T20:23:16.105Z" }, - { url = "https://files.pythonhosted.org/packages/4b/9a/453255d2f769fe44e07ea9785c8347edaf867f7026872e76c1ad9f7bed92/rpds_py-0.30.0-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:6bdfdb946967d816e6adf9a3d8201bfad269c67efe6cefd7093ef959683c8de0", size = 414949, upload-time = "2025-11-30T20:23:17.539Z" }, - { url = "https://files.pythonhosted.org/packages/a3/31/622a86cdc0c45d6df0e9ccb6becdba5074735e7033c20e401a6d9d0e2ca0/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c77afbd5f5250bf27bf516c7c4a016813eb2d3e116139aed0096940c5982da94", size = 565790, upload-time = "2025-11-30T20:23:19.029Z" }, - { url = "https://files.pythonhosted.org/packages/1c/5d/15bbf0fb4a3f58a3b1c67855ec1efcc4ceaef4e86644665fff03e1b66d8d/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:61046904275472a76c8c90c9ccee9013d70a6d0f73eecefd38c1ae7c39045a08", size = 590217, upload-time = "2025-11-30T20:23:20.885Z" }, - { url = "https://files.pythonhosted.org/packages/6d/61/21b8c41f68e60c8cc3b2e25644f0e3681926020f11d06ab0b78e3c6bbff1/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:4c5f36a861bc4b7da6516dbdf302c55313afa09b81931e8280361a4f6c9a2d27", size = 555806, upload-time = "2025-11-30T20:23:22.488Z" }, - { url = "https://files.pythonhosted.org/packages/f9/39/7e067bb06c31de48de3eb200f9fc7c58982a4d3db44b07e73963e10d3be9/rpds_py-0.30.0-cp313-cp313t-win32.whl", hash = "sha256:3d4a69de7a3e50ffc214ae16d79d8fbb0922972da0356dcf4d0fdca2878559c6", size = 211341, upload-time = "2025-11-30T20:23:24.449Z" }, - { url = "https://files.pythonhosted.org/packages/0a/4d/222ef0b46443cf4cf46764d9c630f3fe4abaa7245be9417e56e9f52b8f65/rpds_py-0.30.0-cp313-cp313t-win_amd64.whl", hash = "sha256:f14fc5df50a716f7ece6a80b6c78bb35ea2ca47c499e422aa4463455dd96d56d", size = 225768, upload-time = "2025-11-30T20:23:25.908Z" }, - { url = "https://files.pythonhosted.org/packages/86/81/dad16382ebbd3d0e0328776d8fd7ca94220e4fa0798d1dc5e7da48cb3201/rpds_py-0.30.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:68f19c879420aa08f61203801423f6cd5ac5f0ac4ac82a2368a9fcd6a9a075e0", size = 362099, upload-time = "2025-11-30T20:23:27.316Z" }, - { url = "https://files.pythonhosted.org/packages/2b/60/19f7884db5d5603edf3c6bce35408f45ad3e97e10007df0e17dd57af18f8/rpds_py-0.30.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:ec7c4490c672c1a0389d319b3a9cfcd098dcdc4783991553c332a15acf7249be", size = 353192, upload-time = "2025-11-30T20:23:29.151Z" }, - { url = "https://files.pythonhosted.org/packages/bf/c4/76eb0e1e72d1a9c4703c69607cec123c29028bff28ce41588792417098ac/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f251c812357a3fed308d684a5079ddfb9d933860fc6de89f2b7ab00da481e65f", size = 384080, upload-time = "2025-11-30T20:23:30.785Z" }, - { url = 
"https://files.pythonhosted.org/packages/72/87/87ea665e92f3298d1b26d78814721dc39ed8d2c74b86e83348d6b48a6f31/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ac98b175585ecf4c0348fd7b29c3864bda53b805c773cbf7bfdaffc8070c976f", size = 394841, upload-time = "2025-11-30T20:23:32.209Z" }, - { url = "https://files.pythonhosted.org/packages/77/ad/7783a89ca0587c15dcbf139b4a8364a872a25f861bdb88ed99f9b0dec985/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3e62880792319dbeb7eb866547f2e35973289e7d5696c6e295476448f5b63c87", size = 516670, upload-time = "2025-11-30T20:23:33.742Z" }, - { url = "https://files.pythonhosted.org/packages/5b/3c/2882bdac942bd2172f3da574eab16f309ae10a3925644e969536553cb4ee/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4e7fc54e0900ab35d041b0601431b0a0eb495f0851a0639b6ef90f7741b39a18", size = 408005, upload-time = "2025-11-30T20:23:35.253Z" }, - { url = "https://files.pythonhosted.org/packages/ce/81/9a91c0111ce1758c92516a3e44776920b579d9a7c09b2b06b642d4de3f0f/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47e77dc9822d3ad616c3d5759ea5631a75e5809d5a28707744ef79d7a1bcfcad", size = 382112, upload-time = "2025-11-30T20:23:36.842Z" }, - { url = "https://files.pythonhosted.org/packages/cf/8e/1da49d4a107027e5fbc64daeab96a0706361a2918da10cb41769244b805d/rpds_py-0.30.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:b4dc1a6ff022ff85ecafef7979a2c6eb423430e05f1165d6688234e62ba99a07", size = 399049, upload-time = "2025-11-30T20:23:38.343Z" }, - { url = "https://files.pythonhosted.org/packages/df/5a/7ee239b1aa48a127570ec03becbb29c9d5a9eb092febbd1699d567cae859/rpds_py-0.30.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4559c972db3a360808309e06a74628b95eaccbf961c335c8fe0d590cf587456f", size = 415661, upload-time = "2025-11-30T20:23:40.263Z" }, - { url = "https://files.pythonhosted.org/packages/70/ea/caa143cf6b772f823bc7929a45da1fa83569ee49b11d18d0ada7f5ee6fd6/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:0ed177ed9bded28f8deb6ab40c183cd1192aa0de40c12f38be4d59cd33cb5c65", size = 565606, upload-time = "2025-11-30T20:23:42.186Z" }, - { url = "https://files.pythonhosted.org/packages/64/91/ac20ba2d69303f961ad8cf55bf7dbdb4763f627291ba3d0d7d67333cced9/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:ad1fa8db769b76ea911cb4e10f049d80bf518c104f15b3edb2371cc65375c46f", size = 591126, upload-time = "2025-11-30T20:23:44.086Z" }, - { url = "https://files.pythonhosted.org/packages/21/20/7ff5f3c8b00c8a95f75985128c26ba44503fb35b8e0259d812766ea966c7/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:46e83c697b1f1c72b50e5ee5adb4353eef7406fb3f2043d64c33f20ad1c2fc53", size = 553371, upload-time = "2025-11-30T20:23:46.004Z" }, - { url = "https://files.pythonhosted.org/packages/72/c7/81dadd7b27c8ee391c132a6b192111ca58d866577ce2d9b0ca157552cce0/rpds_py-0.30.0-cp314-cp314-win32.whl", hash = "sha256:ee454b2a007d57363c2dfd5b6ca4a5d7e2c518938f8ed3b706e37e5d470801ed", size = 215298, upload-time = "2025-11-30T20:23:47.696Z" }, - { url = "https://files.pythonhosted.org/packages/3e/d2/1aaac33287e8cfb07aab2e6b8ac1deca62f6f65411344f1433c55e6f3eb8/rpds_py-0.30.0-cp314-cp314-win_amd64.whl", hash = "sha256:95f0802447ac2d10bcc69f6dc28fe95fdf17940367b21d34e34c737870758950", size = 228604, upload-time = "2025-11-30T20:23:49.501Z" }, - { url = 
"https://files.pythonhosted.org/packages/e8/95/ab005315818cc519ad074cb7784dae60d939163108bd2b394e60dc7b5461/rpds_py-0.30.0-cp314-cp314-win_arm64.whl", hash = "sha256:613aa4771c99f03346e54c3f038e4cc574ac09a3ddfb0e8878487335e96dead6", size = 222391, upload-time = "2025-11-30T20:23:50.96Z" }, - { url = "https://files.pythonhosted.org/packages/9e/68/154fe0194d83b973cdedcdcc88947a2752411165930182ae41d983dcefa6/rpds_py-0.30.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:7e6ecfcb62edfd632e56983964e6884851786443739dbfe3582947e87274f7cb", size = 364868, upload-time = "2025-11-30T20:23:52.494Z" }, - { url = "https://files.pythonhosted.org/packages/83/69/8bbc8b07ec854d92a8b75668c24d2abcb1719ebf890f5604c61c9369a16f/rpds_py-0.30.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:a1d0bc22a7cdc173fedebb73ef81e07faef93692b8c1ad3733b67e31e1b6e1b8", size = 353747, upload-time = "2025-11-30T20:23:54.036Z" }, - { url = "https://files.pythonhosted.org/packages/ab/00/ba2e50183dbd9abcce9497fa5149c62b4ff3e22d338a30d690f9af970561/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0d08f00679177226c4cb8c5265012eea897c8ca3b93f429e546600c971bcbae7", size = 383795, upload-time = "2025-11-30T20:23:55.556Z" }, - { url = "https://files.pythonhosted.org/packages/05/6f/86f0272b84926bcb0e4c972262f54223e8ecc556b3224d281e6598fc9268/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5965af57d5848192c13534f90f9dd16464f3c37aaf166cc1da1cae1fd5a34898", size = 393330, upload-time = "2025-11-30T20:23:57.033Z" }, - { url = "https://files.pythonhosted.org/packages/cb/e9/0e02bb2e6dc63d212641da45df2b0bf29699d01715913e0d0f017ee29438/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9a4e86e34e9ab6b667c27f3211ca48f73dba7cd3d90f8d5b11be56e5dbc3fb4e", size = 518194, upload-time = "2025-11-30T20:23:58.637Z" }, - { url = "https://files.pythonhosted.org/packages/ee/ca/be7bca14cf21513bdf9c0606aba17d1f389ea2b6987035eb4f62bd923f25/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e5d3e6b26f2c785d65cc25ef1e5267ccbe1b069c5c21b8cc724efee290554419", size = 408340, upload-time = "2025-11-30T20:24:00.2Z" }, - { url = "https://files.pythonhosted.org/packages/c2/c7/736e00ebf39ed81d75544c0da6ef7b0998f8201b369acf842f9a90dc8fce/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:626a7433c34566535b6e56a1b39a7b17ba961e97ce3b80ec62e6f1312c025551", size = 383765, upload-time = "2025-11-30T20:24:01.759Z" }, - { url = "https://files.pythonhosted.org/packages/4a/3f/da50dfde9956aaf365c4adc9533b100008ed31aea635f2b8d7b627e25b49/rpds_py-0.30.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:acd7eb3f4471577b9b5a41baf02a978e8bdeb08b4b355273994f8b87032000a8", size = 396834, upload-time = "2025-11-30T20:24:03.687Z" }, - { url = "https://files.pythonhosted.org/packages/4e/00/34bcc2565b6020eab2623349efbdec810676ad571995911f1abdae62a3a0/rpds_py-0.30.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fe5fa731a1fa8a0a56b0977413f8cacac1768dad38d16b3a296712709476fbd5", size = 415470, upload-time = "2025-11-30T20:24:05.232Z" }, - { url = "https://files.pythonhosted.org/packages/8c/28/882e72b5b3e6f718d5453bd4d0d9cf8df36fddeb4ddbbab17869d5868616/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:74a3243a411126362712ee1524dfc90c650a503502f135d54d1b352bd01f2404", size = 565630, upload-time = "2025-11-30T20:24:06.878Z" }, 
- { url = "https://files.pythonhosted.org/packages/3b/97/04a65539c17692de5b85c6e293520fd01317fd878ea1995f0367d4532fb1/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:3e8eeb0544f2eb0d2581774be4c3410356eba189529a6b3e36bbbf9696175856", size = 591148, upload-time = "2025-11-30T20:24:08.445Z" }, - { url = "https://files.pythonhosted.org/packages/85/70/92482ccffb96f5441aab93e26c4d66489eb599efdcf96fad90c14bbfb976/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:dbd936cde57abfee19ab3213cf9c26be06d60750e60a8e4dd85d1ab12c8b1f40", size = 556030, upload-time = "2025-11-30T20:24:10.956Z" }, - { url = "https://files.pythonhosted.org/packages/20/53/7c7e784abfa500a2b6b583b147ee4bb5a2b3747a9166bab52fec4b5b5e7d/rpds_py-0.30.0-cp314-cp314t-win32.whl", hash = "sha256:dc824125c72246d924f7f796b4f63c1e9dc810c7d9e2355864b3c3a73d59ade0", size = 211570, upload-time = "2025-11-30T20:24:12.735Z" }, - { url = "https://files.pythonhosted.org/packages/d0/02/fa464cdfbe6b26e0600b62c528b72d8608f5cc49f96b8d6e38c95d60c676/rpds_py-0.30.0-cp314-cp314t-win_amd64.whl", hash = "sha256:27f4b0e92de5bfbc6f86e43959e6edd1425c33b5e69aab0984a72047f2bcf1e3", size = 226532, upload-time = "2025-11-30T20:24:14.634Z" }, - { url = "https://files.pythonhosted.org/packages/69/71/3f34339ee70521864411f8b6992e7ab13ac30d8e4e3309e07c7361767d91/rpds_py-0.30.0-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:c2262bdba0ad4fc6fb5545660673925c2d2a5d9e2e0fb603aad545427be0fc58", size = 372292, upload-time = "2025-11-30T20:24:16.537Z" }, - { url = "https://files.pythonhosted.org/packages/57/09/f183df9b8f2d66720d2ef71075c59f7e1b336bec7ee4c48f0a2b06857653/rpds_py-0.30.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:ee6af14263f25eedc3bb918a3c04245106a42dfd4f5c2285ea6f997b1fc3f89a", size = 362128, upload-time = "2025-11-30T20:24:18.086Z" }, - { url = "https://files.pythonhosted.org/packages/7a/68/5c2594e937253457342e078f0cc1ded3dd7b2ad59afdbf2d354869110a02/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3adbb8179ce342d235c31ab8ec511e66c73faa27a47e076ccc92421add53e2bb", size = 391542, upload-time = "2025-11-30T20:24:20.092Z" }, - { url = "https://files.pythonhosted.org/packages/49/5c/31ef1afd70b4b4fbdb2800249f34c57c64beb687495b10aec0365f53dfc4/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:250fa00e9543ac9b97ac258bd37367ff5256666122c2d0f2bc97577c60a1818c", size = 404004, upload-time = "2025-11-30T20:24:22.231Z" }, - { url = "https://files.pythonhosted.org/packages/e3/63/0cfbea38d05756f3440ce6534d51a491d26176ac045e2707adc99bb6e60a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9854cf4f488b3d57b9aaeb105f06d78e5529d3145b1e4a41750167e8c213c6d3", size = 527063, upload-time = "2025-11-30T20:24:24.302Z" }, - { url = "https://files.pythonhosted.org/packages/42/e6/01e1f72a2456678b0f618fc9a1a13f882061690893c192fcad9f2926553a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:993914b8e560023bc0a8bf742c5f303551992dcb85e247b1e5c7f4a7d145bda5", size = 413099, upload-time = "2025-11-30T20:24:25.916Z" }, - { url = "https://files.pythonhosted.org/packages/b8/25/8df56677f209003dcbb180765520c544525e3ef21ea72279c98b9aa7c7fb/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:58edca431fb9b29950807e301826586e5bbf24163677732429770a697ffe6738", size = 392177, upload-time = 
"2025-11-30T20:24:27.834Z" }, - { url = "https://files.pythonhosted.org/packages/4a/b4/0a771378c5f16f8115f796d1f437950158679bcd2a7c68cf251cfb00ed5b/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_31_riscv64.whl", hash = "sha256:dea5b552272a944763b34394d04577cf0f9bd013207bc32323b5a89a53cf9c2f", size = 406015, upload-time = "2025-11-30T20:24:29.457Z" }, - { url = "https://files.pythonhosted.org/packages/36/d8/456dbba0af75049dc6f63ff295a2f92766b9d521fa00de67a2bd6427d57a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:ba3af48635eb83d03f6c9735dfb21785303e73d22ad03d489e88adae6eab8877", size = 423736, upload-time = "2025-11-30T20:24:31.22Z" }, - { url = "https://files.pythonhosted.org/packages/13/64/b4d76f227d5c45a7e0b796c674fd81b0a6c4fbd48dc29271857d8219571c/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_aarch64.whl", hash = "sha256:dff13836529b921e22f15cb099751209a60009731a68519630a24d61f0b1b30a", size = 573981, upload-time = "2025-11-30T20:24:32.934Z" }, - { url = "https://files.pythonhosted.org/packages/20/91/092bacadeda3edf92bf743cc96a7be133e13a39cdbfd7b5082e7ab638406/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_i686.whl", hash = "sha256:1b151685b23929ab7beec71080a8889d4d6d9fa9a983d213f07121205d48e2c4", size = 599782, upload-time = "2025-11-30T20:24:35.169Z" }, - { url = "https://files.pythonhosted.org/packages/d1/b7/b95708304cd49b7b6f82fdd039f1748b66ec2b21d6a45180910802f1abf1/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_x86_64.whl", hash = "sha256:ac37f9f516c51e5753f27dfdef11a88330f04de2d564be3991384b2f3535d02e", size = 562191, upload-time = "2025-11-30T20:24:36.853Z" }, -] - -[[package]] -name = "sse-starlette" -version = "3.3.2" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "anyio" }, - { name = "starlette" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/5a/9f/c3695c2d2d4ef70072c3a06992850498b01c6bc9be531950813716b426fa/sse_starlette-3.3.2.tar.gz", hash = "sha256:678fca55a1945c734d8472a6cad186a55ab02840b4f6786f5ee8770970579dcd", size = 32326, upload-time = "2026-02-28T11:24:34.36Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/61/28/8cb142d3fe80c4a2d8af54ca0b003f47ce0ba920974e7990fa6e016402d1/sse_starlette-3.3.2-py3-none-any.whl", hash = "sha256:5c3ea3dad425c601236726af2f27689b74494643f57017cafcb6f8c9acfbb862", size = 14270, upload-time = "2026-02-28T11:24:32.984Z" }, -] - -[[package]] -name = "starlette" -version = "0.52.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "anyio" }, - { name = "typing-extensions", marker = "python_full_version < '3.13'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/c4/68/79977123bb7be889ad680d79a40f339082c1978b5cfcf62c2d8d196873ac/starlette-0.52.1.tar.gz", hash = "sha256:834edd1b0a23167694292e94f597773bc3f89f362be6effee198165a35d62933", size = 2653702, upload-time = "2026-01-18T13:34:11.062Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/81/0d/13d1d239a25cbfb19e740db83143e95c772a1fe10202dda4b76792b114dd/starlette-0.52.1-py3-none-any.whl", hash = "sha256:0029d43eb3d273bc4f83a08720b4912ea4b071087a3b48db01b7c839f7954d74", size = 74272, upload-time = "2026-01-18T13:34:09.188Z" }, -] - -[[package]] -name = "typing-extensions" -version = "4.15.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = 
"sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" }, -] - -[[package]] -name = "typing-inspection" -version = "0.4.2" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "typing-extensions" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/55/e3/70399cb7dd41c10ac53367ae42139cf4b1ca5f36bb3dc6c9d33acdb43655/typing_inspection-0.4.2.tar.gz", hash = "sha256:ba561c48a67c5958007083d386c3295464928b01faa735ab8547c5692e87f464", size = 75949, upload-time = "2025-10-01T02:14:41.687Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611, upload-time = "2025-10-01T02:14:40.154Z" }, -] - -[[package]] -name = "uvicorn" -version = "0.41.0" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "click" }, - { name = "h11" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/32/ce/eeb58ae4ac36fe09e3842eb02e0eb676bf2c53ae062b98f1b2531673efdd/uvicorn-0.41.0.tar.gz", hash = "sha256:09d11cf7008da33113824ee5a1c6422d89fbc2ff476540d69a34c87fab8b571a", size = 82633, upload-time = "2026-02-16T23:07:24.1Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/83/e4/d04a086285c20886c0daad0e026f250869201013d18f81d9ff5eada73a88/uvicorn-0.41.0-py3-none-any.whl", hash = "sha256:29e35b1d2c36a04b9e180d4007ede3bcb32a85fbdfd6c6aeb3f26839de088187", size = 68783, upload-time = "2026-02-16T23:07:22.357Z" }, -] diff --git a/server/internal/vectorstore/store.go b/server/internal/vectorstore/store.go index 4b1435e..69c124a 100644 --- a/server/internal/vectorstore/store.go +++ b/server/internal/vectorstore/store.go @@ -63,17 +63,15 @@ func collectionName(projectPath string) string { return fmt.Sprintf("project_%x", h) } -// docID mirrors the Python VectorStoreService format: -// -// "{md5hex(filePath)[:12]}:{startLine}-{endLine}:{idx}" +// docID format: "{md5hex(filePath)[:12]}:{startLine}-{endLine}:{idx}" // // The positional `idx` is required because overlapping-window or repeated // chunkers can emit two chunks with identical (filePath, startLine, endLine); // without idx the second silently overwrites the first in chromem-go. // -// `h[:6]` gives 12 hex characters, matching Python's `md5[:12]`. Keep this -// function byte-compatible with `legacy/python-api/app-root/app/services/vector_store.py` -// so a future migration tool can diff ids between backends. +// `h[:6]` gives 12 hex characters. Format is frozen — existing prod indexes +// (including those imported from the prior Python backend) reference these +// ids on disk; changing the shape requires a full reindex. 
func docID(filePath string, startLine, endLine, idx int) string {
	h := md5.Sum([]byte(filePath))
	return fmt.Sprintf("%x:%d-%d:%d", h[:6], startLine, endLine, idx)

From ca15e72ef91f5f3659c7f84e30e1bb3633501faf Mon Sep 17 00:00:00 2001
From: dvcdsys
Date: Tue, 28 Apr 2026 14:03:05 +0100
Subject: [PATCH 9/9] upd skill

---
 doc/benchmark-cix-vs-grep-2026-04-28.md | 329 ++++++++++++++++++++++++
 skills/README.md                        |  10 +-
 skills/cix/SKILL.md                     | 168 +++++++-----
 3 files changed, 445 insertions(+), 62 deletions(-)
 create mode 100644 doc/benchmark-cix-vs-grep-2026-04-28.md

diff --git a/doc/benchmark-cix-vs-grep-2026-04-28.md b/doc/benchmark-cix-vs-grep-2026-04-28.md
new file mode 100644
index 0000000..66a4fe2
--- /dev/null
+++ b/doc/benchmark-cix-vs-grep-2026-04-28.md
@@ -0,0 +1,329 @@
+# Benchmark — CIX-first vs grep-only navigation (2026-04-28)
+
+Re-run of the 32-cell head-to-head from 2026-04-27 after a bundle of
+search-quality changes landed: path-aware embeddings, `--min-score` default
+0.4, `--exclude` flag, relative-path output. Same fixture, same prompts,
+same `claude-sonnet-4-6` workers, same 192.168.1.168 cix server — only
+the server binary differs from the 2026-04-27 run.
+
+The point is the **delta vs 2026-04-27**, not the absolute numbers.
+
+Raw transcripts and metric JSON live in `/tmp/cix-bench/results/runs/`;
+prior run preserved at `/tmp/cix-bench/results/runs.2026-04-27/` and
+`/tmp/cix-bench/results/results.2026-04-27.csv`.
+
+---
+
+## 1. Headline comparison (16 runs each)
+
+| Metric                    | Worker A (grep-only) | Worker B (cix-first) | Δ (B − A) | Δ %     |
+|---------------------------|----------------------|----------------------|-----------|---------|
+| Mean elapsed time (s)     | 102.5                | **94.9**             | −7.6      | −7.4 %  |
+| Median elapsed time (s)   | 78.5                 | **77.0**             | −1.5      | −1.9 %  |
+| Mean tool calls           | 20.3                 | **19.3**             | −1.0      | −4.6 %  |
+| Mean tokens_in            | 1629†                | **43**               | †         | †       |
+| Mean tokens_out           | 3222                 | **3111**             | −111      | −3.4 %  |
+| Pass rate                 | 13 / 16              | **15 / 16**          | +2        | +15.4 % |
+
+† Worker A's `tokens_in` mean is dominated by a single anomaly:
+`refactor_04_A` reported 25 641 input tokens (likely a cache-miss accounting
+spike), versus 15–66 for the other 15 A cells. **Excluding that one cell, A's
+mean tokens_in is 28.9** — the cleaner number for comparison. Both workers'
+input-token totals are uncached `input_tokens` only; cache-creation tokens
+that dominate real cost on Sonnet are not included by `metrics.sh`.
+
+**One-glance read:** B is faster, leaner, and more reliable than A on every
+headline metric. This is the inverse of the 2026-04-27 run, where B was
+*slower and more expensive* than A on average. The pass-rate gap held steady
+(was 14/16 vs 16/16, now 13/16 vs 15/16) — both workers regressed by one
+cell each, and B is still the more reliable navigator.
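+For anyone re-deriving these means, a minimal sketch — assuming
+`results.csv` has a header row and the same column order as the per-run
+table in §3 (`run_id` first, `elapsed_s` second); the exact schema
+emitted by `metrics.sh` is not reproduced here:
+
+```bash
+# Mean elapsed_s per worker, keyed on the run_id suffix (_A / _B).
+awk -F, 'NR > 1 {
+  w = ($1 ~ /_A$/) ? "A" : "B"
+  s[w] += $2; n[w]++
+} END {
+  for (w in s) printf "%s: %.1f\n", w, s[w] / n[w]
+}' /tmp/cix-bench/results/results.csv
+```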
+ +--- + +## 1.5 Delta vs 2026-04-27 + +### Worker B (the cell where the new code is exercised) + +| Metric (Worker B) | 2026-04-27 | 2026-04-28 | Δ | Δ % | +|--------------------|------------|------------|------|--------| +| Mean elapsed s | 69.9 | 94.9 | +25.0 | +35.8 % | +| Mean tool calls | 19.2 | 19.3 | +0.1 | +0.5 % | +| Mean tokens_in | 38 | 43 | +5 | +13.2 % | +| Mean tokens_out | 2754 | 3111 | +357 | +13.0 % | +| Pass rate | 16/16 | 15/16 | −1 | −6.3 % | + +### Worker A (control — A doesn't use the cix server) + +| Metric (Worker A) | 2026-04-27 | 2026-04-28 | Δ | Δ % | +|--------------------|------------|------------|-------|---------| +| Mean elapsed s | 62.2 | 102.5 | +40.3 | +64.8 % | +| Mean tool calls | 14.5 | 20.3 | +5.8 | +40.0 % | +| Mean tokens_in† | 33 | 28.9 | −4.1 | −12.4 % | +| Mean tokens_out | 2447 | 3222 | +775 | +31.7 % | +| Pass rate | 14/16 | 13/16 | −1 | −7.1 % | + +† Excluding `refactor_04_A` token-count anomaly (25 641 in). + +**Both workers' absolute numbers grew.** This is Sonnet-side variance — A +doesn't even talk to the cix server, yet it slowed down 65 % on elapsed and +spent 32 % more output tokens. The dev box was idle and on the same +hardware, so the most plausible explanation is run-to-run variance from +the model itself. The 2026-04-27 run finished in ~75 minutes; this run +took ~110 minutes, consistent with a slower-but-equally-clean execution. + +The honest story is therefore in the **A↔B gap within each run**, not the +absolute deltas vs the prior run: + +- Prior run: B was +12 % slower, +32 % more tool calls, +13 % more + output tokens than A. B's only win was pass rate. +- New run: B is −7 % faster, −5 % fewer tool calls, −3 % fewer output + tokens than A — *and* still wins on pass rate. + +The cix-first strategy went from "more expensive, more reliable" to +"strictly better than grep on every headline metric." That flip is what +the new code bought. + +--- + +## 2. Per-task comparison (where the gap moved) + +### bugfix — flat (cix overhead always negligible here) + +| Metric | A (new) | B (new) | Δ B−A | Δ % | (prior B−A %) | +|-------------------|---------|---------|---------|---------|---------------| +| Mean elapsed s | 70.3 | 69.0 | −1.3 | −1.8 % | (−10.2 %) | +| Mean tool calls | 13.3 | 13.5 | +0.2 | +1.5 % | (+3.7 %) | +| Mean tokens_in | 20.5 | 21.0 | +0.5 | +2.4 % | (−4.8 %) | +| Mean tokens_out | 1600.0 | 1665.8 | +65.8 | +4.1 % | (−5.0 %) | +| Pass rate | 4/4 | 4/4 | 0 | 0 % | (0 %) | + +bugfix is a draw both times — when there's a failing test pointing at the +call site, neither navigator needs much exploration. + +### refactor — A regressed, B held steady, gap widened + +| Metric | A (new) | B (new) | Δ B−A | Δ % | (prior B−A %) | +|-------------------|---------|---------|-----------|----------|---------------| +| Mean elapsed s | 79.8 | 96.0 | +16.2 | +20.3 % | (+4.0 %) | +| Mean tool calls | 16.8 | 19.8 | +3.0 | +18.0 % | (+6.2 %) | +| Mean tokens_in† | 23.3 | 28.3 | +5.0 | +21.4 % | (+4.8 %) | +| Mean tokens_out | 2497.5 | 2879.3 | +381.8 | +15.3 % | (−8.1 %) | +| Pass rate | 1/4 | 3/4 | +2 | +200 % | (+50 %) | + +† A excludes refactor_04_A 25 641 anomaly. + +B is slower than A on time *and* tokens here — this is the one task type +where the cix-first overhead still bites. But B's pass rate is 3× A's: +A picked non-seeded ambient inefficiencies (`chunkSlidingWindow`, `topN`) +in 3 of 4 variants, while B hit the seeded function in 3 of 4 (refactor_03 +was the only B-miss, where B picked `topN` instead of `joinLines`). 
Net: +B trades wall-clock for a much higher chance of finding the right +function. + +### tests — biggest win for B (was the prior tax cell) + +| Metric | A (new) | B (new) | Δ B−A | Δ % | (prior B−A %) | +|-------------------|---------|---------|-----------|----------|---------------| +| Mean elapsed s | 191.3 | **154.3** | −37.0 | **−19.3 %** | (+36.8 %) | +| Mean tool calls | 36.3 | **26.8** | −9.5 | **−26.2 %** | (+103 %) | +| Mean tokens_in | 52.5 | **37.8** | −14.7 | **−28.0 %** | (+79 %) | +| Mean tokens_out | 6789.8 | **5728.8** | −1061.0 | **−15.6 %** | (+26.9 %) | +| Pass rate | 4/4 | 4/4 | 0 | 0 % | (0 %) | + +This is the cell that motivated the search-quality work. **B paid a ++103 % tool-call tax in the prior run; in this run it's a −26 % win.** +And mechanically B did much less reading: B's per-cell `files_read_count` +mean dropped to **7.25** vs A's **15.25** — half. The path-aware +embeddings + min-score 0.4 made the top-K hits relevant enough that B +didn't need to range-read the codebase. + +The most striking single cell: `tests_03_B` finished in 146 s with 6 files +read; `tests_03_A` took 245 s and read 28 files. B chose the public +`Service.CancelIndexing` method (a real exported function); A picked +`splitPath` (unexported) on 3 of 4 variants — the runbook's verification +gap from the prior report is still there. + +### summary — small but consistent flip + +| Metric | A (new) | B (new) | Δ B−A | Δ % | (prior B−A %) | +|-------------------|---------|---------|-----------|----------|---------------| +| Mean elapsed s | 68.8 | **60.3** | −8.5 | **−12.4 %** | (+11.7 %) | +| Mean tool calls | 14.8 | 17.3 | +2.5 | +16.9 % | (+12.1 %) | +| Mean tokens_in† | 17.8 | 19.3 | +1.5 | +8.6 % | (+3.1 %) | +| Mean tokens_out | 2000.8 | 2168.5 | +167.7 | +8.4 % | (+24.0 %) | +| Pass rate | 4/4 | 4/4 | 0 | 0 % | (0 %) | + +† B excludes summary_04_B 285-token-in anomaly. + +Both workers grounded the summaries; rubric scores are flat at 6/7 across +all 8 cells (vs prior A=6,6,6,7 / B=6,5,6,6). B is now ~12 % faster and +spent only +8 % output tokens (vs +24 % before). + +--- + +## 3. 
Per-run table (all 32 rows, sorted) + +| run_id | elapsed_s | tools | toks_total | toks_in | toks_out | cix_ops | grep_ops | files_read | outcome | +|-----------------|-----------|-------|------------|---------|----------|---------|----------|------------|---------| +| bugfix_01_A | 78 | 15 | 1643 | 24 | 1619 | 0 | 1 | 2 | pass | +| bugfix_01_B | 75 | 13 | 1710 | 21 | 1689 | 0 | 0 | 2 | pass | +| bugfix_02_A | 67 | 10 | 1190 | 16 | 1174 | 0 | 0 | 2 | pass | +| bugfix_02_B | 48 | 11 | 1307 | 16 | 1291 | 0 | 0 | 2 | pass | +| bugfix_03_A | 67 | 13 | 1760 | 19 | 1741 | 0 | 2 | 2 | pass | +| bugfix_03_B | 83 | 15 | 1988 | 26 | 1962 | 0 | 2 | 2 | pass | +| bugfix_04_A | 69 | 15 | 1889 | 23 | 1866 | 0 | 1 | 2 | pass | +| bugfix_04_B | 70 | 15 | 1742 | 21 | 1721 | 0 | 1 | 2 | pass | +| refactor_01_A | 68 | 15 | 2306 | 22 | 2284 | 0 | 3 | 5 | partial | +| refactor_01_B | 104 | 19 | 3052 | 32 | 3020 | 2 | 3 | 1 | pass | +| refactor_02_A | 86 | 16 | 2267 | 22 | 2245 | 0 | 4 | 1 | pass | +| refactor_02_B | 90 | 21 | 2875 | 29 | 2846 | 2 | 5 | 1 | pass | +| refactor_03_A | 80 | 18 | 2263 | 26 | 2237 | 0 | 5 | 1 | partial | +| refactor_03_B | 91 | 18 | 3093 | 25 | 3068 | 2 | 6 | 2 | partial | +| refactor_04_A | 85 | 18 | 28865 | 25641 | 3224 | 0 | 6 | 4 | partial | +| refactor_04_B | 99 | 21 | 2610 | 27 | 2583 | 2 | 2 | 2 | pass | +| summary_01_A | 65 | 12 | 1497 | 15 | 1482 | 0 | 0 | 0 | pass | +| summary_01_B | 64 | 20 | 2156 | 23 | 2133 | 0 | 0 | 8 | pass | +| summary_02_A | 65 | 15 | 2076 | 18 | 2058 | 0 | 0 | 7 | pass | +| summary_02_B | 41 | 13 | 1829 | 16 | 1813 | 1 | 0 | 5 | pass | +| summary_03_A | 79 | 19 | 2398 | 22 | 2376 | 0 | 1 | 10 | pass | +| summary_03_B | 74 | 16 | 2043 | 19 | 2024 | 0 | 1 | 8 | pass | +| summary_04_A | 66 | 13 | 2103 | 16 | 2087 | 0 | 0 | 0 | pass | +| summary_04_B | 62 | 20 | 2989 | 285 | 2704 | 6 | 0 | 0 | pass | +| tests_01_A | 200 | 37 | 7345 | 51 | 7294 | 0 | 3 | 16 | pass | +| tests_01_B | 189 | 31 | 6482 | 46 | 6436 | 0 | 1 | 14 | pass | +| tests_02_A | 163 | 29 | 6042 | 45 | 5997 | 0 | 4 | 9 | pass | +| tests_02_B | 148 | 23 | 6689 | 32 | 6657 | 1 | 2 | 2 | pass | +| tests_03_A | 245 | 50 | 8422 | 66 | 8356 | 0 | 6 | 28 | pass | +| tests_03_B | 146 | 30 | 5490 | 39 | 5451 | 1 | 3 | 6 | pass | +| tests_04_A | 157 | 29 | 5560 | 48 | 5512 | 0 | 5 | 8 | pass | +| tests_04_B | 134 | 23 | 4405 | 34 | 4371 | 1 | 2 | 7 | pass | + +Pass = 28/32 (15 B + 13 A). Partial = 4/32 (3 A refactor + 1 B refactor). +No `(violation)` rows: every A cell has `cix_ops = 0`. + +Summary rubric scores: A = {6, 6, 6, 6}, B = {6, 6, 6, 6}. Both pass +(threshold ≥5). + +--- + +## 4. Methodology (abridged) + +Same as 2026-04-27 (see `docs/benchmark-runbook.md` for the runbook). +Two procedural deviations from the runbook, **identical to the prior +run unless noted**: + +1. PREAMBLE_B URL = `http://192.168.1.168:21847` (RTX 3090 prod box, + not literal `localhost`). Same as prior run. +2. **Per-cell unique workspace** at `/tmp/cix-bench-runs/${RUN_ID}/` + instead of one shared `/tmp/cix-bench-run/`. Different paths produce + different `projectHash` on the server, so each B-cell hits a fresh + index — no residual chunks bleeding between cells. **This is new in + this run.** Effect: every B cell pays a one-time index cost (180-s + wait deadline; observed 30–60 s actual), absorbed inside cell setup + and excluded from `elapsed_s`. + +The cix server on .168 ran the working-tree binary with +`CIX_EMBED_INCLUDE_PATH=true` (default) and the new `min-score=0.4` +default. 
Spot check before launch: `cix search "main entry point server"` +ranked `server/cmd/cix-server/main.go` first at 0.52, confirming the +path-aware embeddings were live. + +All 32 transcripts identify the worker model as `claude-sonnet-4-6` — +audited via `grep -L 'claude-sonnet-4-6' /tmp/cix-bench/results/runs/*.log` +returning zero lines. + +Fixture manifest (`fixture-manifest.txt`, 3744 hashed files) verified +clean both before and after the run. + +--- + +## 5. Headline numbers (executive summary) + +The 2026-04-27 run found that cix-first navigation was *more reliable but +no faster* than grep-only. The 2026-04-28 re-run, with path-aware +embeddings + `min-score=0.4` shipped, finds cix-first is now +**−7.4 % faster**, **−4.6 % fewer tool calls**, and **−3.4 % fewer +output tokens** than grep-only — while still beating it on pass rate +(15/16 vs 13/16). The single biggest gain is the **tests** task, which +flipped from a +37 % B-tax to a −19 % B-win, with B reading half as many +files per cell. The summary task also flipped (+12 % B-tax → −12 % +B-win). Refactor remains the one task where B costs more wall-clock +than A on average, but B's pass rate (3/4) is 3× A's (1/4) — same +direction as the prior run. + +--- + +## 6. Caveats + +- **Both workers got slower in absolute terms vs 2026-04-27.** A grew + +65 % on elapsed and +32 % on output tokens despite never talking to + the cix server — pure Sonnet variance. B grew +36 % on elapsed. + The honest comparison is therefore the *within-run gap* between A and + B, not the absolute delta vs the prior run. Both within-run gap + measurements are in §1.5 and §2. +- **Per-cell unique paths** are new this run. Prior run reused a single + `/tmp/cix-bench-run/` path so all 32 cells hit the same `projectHash` + on the server. This run isolates each cell on a fresh hash. Effect on + B should be small (server-side caches keyed by chunk content, not + project), but it's a real procedural difference worth flagging. +- **`refactor_04_A` token spike**: 25 641 input tokens vs 16–26 for the + other 15 A cells. Almost certainly cache-miss accounting; treated as + an outlier in the per-task means but kept in the per-run table. +- **`tokens_in` is uncached input only.** Cache-creation and cache-read + tokens dominate real Sonnet cost and are not summed by `metrics.sh`. + This is consistent with the prior run's accounting — the relative gap + is comparable, the absolute number is not the whole bill. +- **Fixture is a snapshot of the cix project itself** — the model may + recognise it from training. Same caveat as 2026-04-27. +- **Tool restriction is enforced via prompt, not at the harness level.** + No A cell violated (`cix_ops = 0` everywhere); we still trust the + prompt because of post-hoc audit, not architecture. +- **Single machine, single model (`claude-sonnet-4-6`), single embedding + model, single random seed per worker.** No warm/cold cache split. +- **Pre-run cix indexing time is excluded from `elapsed_s`** (B gets a + "free" index), as before. Indexing took 30–60 s per B cell on .168 — + not amortised in the workload comparison. +- **Refactor verification still depends on naming the seeded function.** + A's "asymptotically inefficient" picks (`chunkSlidingWindow`, + insertion sort, `topN`) are real wins on the merits but score + `partial` because they aren't the runbook's planted target. The + runbook gap from the prior report (§7.2 too strict) hasn't been + patched. 
+- **Tests verification is exportedness-blind.** Both workers picked + unexported helpers (`splitPath` and friends) on tests_01/02 and still + scored `pass`. The new code didn't change this. + +--- + +## 7. Verbatim prompts + +Identical to 2026-04-27 (see `docs/benchmark-runbook.md` §3 and §4): +COMMON_PREAMBLE, PREAMBLE_A, PREAMBLE_B, BUGFIX_PROMPT, REFACTOR_PROMPT, +TESTS_PROMPT, SUMMARY_PROMPT — all unchanged. The only deltas in +PREAMBLE_B vs the runbook's literal text are the api URL +(`http://192.168.1.168:21847`) and the per-cell `cd` path +(`/tmp/cix-bench-runs/${RUN_ID}/`). + +For Worker A, the runbook §5.2 auth-error gate line was appended to +every assembled prompt: +> Note: the env var CIX_API_KEY is set to an invalid value for this run; +> any cix call will fail with an auth error. + +--- + +## 8. Where the artefacts live + +- This report: + `doc/benchmark-cix-vs-grep-2026-04-28.md` +- Prior report (preserved): + `doc/benchmark-cix-vs-grep.md` (2026-04-27) +- New CSV: `/tmp/cix-bench/results/results.csv` +- Prior CSV (preserved): `/tmp/cix-bench/results/results.2026-04-27.csv` +- New per-run logs + metrics: `/tmp/cix-bench/results/runs/` +- Prior per-run logs + metrics (preserved): + `/tmp/cix-bench/results/runs.2026-04-27/` +- Summary rubric scores (this run only): + `/tmp/cix-bench/results/rubric.json` +- Fixture (frozen, byte-identical to 2026-04-27): + `/tmp/cix-bench/baseline/`, `/tmp/cix-bench/variants/`, + `/tmp/cix-bench/fixture-manifest.txt` diff --git a/skills/README.md b/skills/README.md index 6bb6a51..230130d 100644 --- a/skills/README.md +++ b/skills/README.md @@ -2,7 +2,9 @@ ## cix — Semantic Code Search -Teaches an AI agent how to use `cix` for code navigation instead of Grep/Glob. +Teaches an AI agent when to reach for `cix` (semantic, cross-file, +exploratory) versus Grep / Glob / Read (exact strings, known pointers, +non-code files). ### Install @@ -18,6 +20,8 @@ In a Claude Code session: /cix ``` -Loads search guidance into context. Claude will use `cix search` instead of Grep/Glob for the rest of the session. +Loads navigation guidance into context for the rest of the session. -To activate automatically in every session, add `cix` usage instructions to `~/.claude/CLAUDE.md` (see the [Agent Integration](../README.md#agent-integration) section in the main README). \ No newline at end of file +To activate automatically in every session, add `cix` usage instructions +to `~/.claude/CLAUDE.md` (see the [Agent Integration](../README.md#agent-integration) +section in the main README). \ No newline at end of file diff --git a/skills/cix/SKILL.md b/skills/cix/SKILL.md index f755307..d85c0b5 100644 --- a/skills/cix/SKILL.md +++ b/skills/cix/SKILL.md @@ -1,25 +1,36 @@ --- name: cix -description: Semantic code search and navigation using the cix index. Use BEFORE Grep/Glob/Read for faster, smarter code discovery. Covers search, definitions, references, symbols, files, and indexing. +description: Semantic code search and navigation using the cix index. Reach for cix when you don't already know where to look. Covers search, definitions, references, symbols, files, and indexing. user-invocable: true --- # Code Index (`cix`) — Semantic Code Search & Navigation -You have access to `cix`, a semantic code index that understands your codebase. It uses embeddings and AST parsing to provide intelligent search — **always prefer `cix` over Grep/Glob** when looking for code. +You have access to `cix`, a semantic code index that understands the +codebase via embeddings + AST parsing. 
The right reflex is **"cix when
+you don't have a pointer; grep when you do."**

-## Why use `cix` first?
+## When to use which

-1. **Saves tokens** — returns only relevant snippets, not entire files
-2. **Understands meaning** — "authentication middleware" finds auth code even if those words aren't in the source
-3. **Structured navigation** — go-to-definition and find-references like an IDE
-4. **Fast** — pre-indexed, no filesystem scanning needed
+**Reach for `cix` first when:**
+- The starting point is open-ended ("how does indexing work?", "find the
+  authentication middleware", "where is the main entry point?")
+- You need cross-file navigation (definitions / references / callers)
+- You're searching by *meaning*, not by an exact string
+  (`"JWT validation"` should find `verifyToken` even without that phrase)
+- You're exploring an unfamiliar package or codebase

-## Search priority
+**Skip `cix`, use Read / Grep / Glob directly when:**
+- A failing test or stack trace already names the file and function —
+  just `Read` it
+- You're chasing an exact literal: a specific error message, a config
+  key, a commit-message phrase, an import path
+- You're inside dependencies (`node_modules`, `vendor`, `.venv`) — they
+  aren't indexed
+- You're editing a non-code file (Dockerfile, yaml, lockfile)

-1. `cix search` or `cix symbols` — FIRST choice
-2. `cix definitions` / `cix references` — for navigation
-3. Grep/Glob — only if `cix` returns no results or is unavailable
+If `cix` returns nothing relevant after one well-formed query, fall
+back to grep — don't loop on cix.

---

@@ -32,29 +43,24 @@
cix search "database connection retry logic"
cix search "error handling in payment flow" --limit 20
cix search "config parsing" --in ./internal/config/
cix search "API routes" --lang go
-cix search "validation" --in ./api --lang python
+cix search "main entry point" --exclude bench/fixtures --exclude legacy
```

**Flags:**
- `--in <path>` — restrict to file or directory (can repeat)
+- `--exclude <path>` — drop a directory or substring from results (can repeat)
- `--lang <lang>` — filter by language (can repeat)
-- `--limit <n>` — max results (default: 10)
-- `--min-score <score>` — minimum relevance 0.0-1.0 (default: 0.1)
+- `--limit <n>` — max **files** returned (default: 10) — output is
+  grouped per file with all matches inside, so 10 files ≈ many snippets
+- `--min-score <score>` — minimum relevance 0.0–1.0 (default: **0.4**)

### Go to Definition — find where a symbol is defined
```bash
cix definitions HandleRequest
cix def AuthMiddleware --kind function
-cix goto UserService --kind class
cix def Config --file ./internal/config.go
```
-
-**Aliases:** `definitions`, `def`, `goto`
-
-**Flags:**
-- `--kind <kind>` — filter: function, class, method, type
-- `--file <path>` — narrow to specific file
-- `--limit <n>` — max results (default: 10)
+Aliases: `definitions`, `def`, `goto`. Flags: `--kind`, `--file`, `--limit`.

### Find References — find where a symbol is used
```bash
@@ -62,12 +68,7 @@
cix references HandleRequest
cix refs AuthMiddleware --limit 50
cix usages UserService --file ./internal/api/
```
-
-**Aliases:** `references`, `refs`, `usages`
-
-**Flags:**
-- `--file <path>` — narrow to specific file
-- `--limit <n>` — max results (default: 30)
+Aliases: `references`, `refs`, `usages`. Flags: `--file`, `--limit`.
### Symbol Search — find symbols by name
```bash
@@ -75,10 +76,7 @@
cix symbols handleRequest
cix symbols User --kind class
cix symbols Auth --kind function --kind method
```
-
-**Flags:**
-- `--kind <kind>` — filter: function, class, method, type (can repeat)
-- `--limit <n>` — max results (default: 20)
+Flags: `--kind` (function/class/method/type, repeatable), `--limit`.

### File Search — find files by path pattern
```bash
@@ -88,60 +86,112 @@
cix files "middleware" --limit 20
```

### Project Overview
```bash
-cix summary        # languages, directories, key symbols
-cix status         # indexing status, file counts
+cix summary        # languages, top dirs, key symbols
+cix status         # indexing status + file watcher status
cix list           # all indexed projects
```

### Indexing
```bash
cix init [path]    # register + index + start watcher
-cix reindex        # incremental (only changed files)
-cix reindex --full # full reindex from scratch
-cix watch          # start auto-reindex daemon
+cix reindex        # incremental
+cix reindex --full # full reindex
+cix cancel         # cancel an in-flight indexing run
+cix watch          # start file-change auto-reindex daemon
cix watch stop     # stop daemon
```
+The watcher auto-reindexes on file change — manual `reindex` is rarely
+needed. `cix status` shows whether the watcher is running and the
+last-sync timestamp.
+
+---
+
+## Search quality — what scores mean
+
+Default `--min-score 0.4` is calibrated for the production embedding
+model (CodeRankEmbed-Q8 with path-aware preamble). Rough landscape:
+
+| Score     | Meaning                                                |
+|-----------|--------------------------------------------------------|
+| 0.65+     | Exact / very strong match — almost certainly relevant  |
+| 0.50–0.65 | Strong match — usually relevant                        |
+| 0.40–0.50 | Weaker match — sometimes useful, sometimes not         |
+| <0.40     | Noise — filtered out by default                        |
+
+**If a query returns nothing**, lower the floor explicitly:
+`--min-score 0.2` for very specific or long-tail queries. Don't drop
+below 0.2 — results below that are noise.
+
+---
+
+## Writing better queries — leverage path-aware embedding
+
+Each chunk is embedded with its file path, language, and symbol name in
+the preamble. This means **mentioning a file/dir/symbol you already
+know about boosts ranking**:
+
+```bash
+# Generic
+cix search "validation"
+# Better — pins the search to the auth area
+cix search "validation in auth middleware"
+# Even better when you know the symbol — its name is in the embedding
+cix search "ValidateToken"
+```
+
+Natural-language queries that name the *kind of thing* and *where it
+lives* outperform single-word queries.
+
---

## Usage Patterns

-### Exploring unfamiliar code
+### Exploring unfamiliar code (`cix`'s strongest case)
```bash
-cix summary                      # understand project structure
-cix search "main entry point"    # find where it starts
-cix search "database" --limit 20 # find all DB-related code
+cix summary                             # project structure, top dirs
+cix search "main entry point server"    # find where it starts
+cix search "database connection setup"  # find DB wiring
+cix search "request handler" --in ./api # narrow to API
```

-### Finding specific functionality
+### Tracing a symbol end-to-end
```bash
-cix search "JWT token validation"       # semantic — finds by meaning
-cix symbols "Validate" --kind function  # exact name lookup
-cix def ValidateToken                   # jump to definition
-cix refs ValidateToken                  # find all callers
+cix def HandleRequest                      # where is it defined?
+cix refs HandleRequest                     # who calls it?
+cix search "HandleRequest error handling"  # how are errors handled?
``` -### Understanding a symbol +### Chasing a known target (often grep is enough) ```bash -cix def HandleRequest # where is it defined? -cix refs HandleRequest # who calls it? -cix search "HandleRequest error" # how are errors handled? +# Stack trace says "internal/auth/middleware.go:42 — invalid token" +# → just Read that file. No cix needed. + +# Config key "max_concurrent_requests" used somewhere? +# → grep is more precise. ``` ### Narrowing scope ```bash -cix search "middleware" --in ./api/ # only in api directory -cix search "config" --in ./cmd/ # only in cmd directory -cix refs Config --file ./internal/server.go # only in one file +cix search "middleware" --in ./api/ +cix search "config" --in ./cmd/ --exclude legacy +cix refs Config --file ./internal/server.go ``` --- ## Tips -- Search queries are natural language — write what you're looking for, not regex -- `cix def` is faster than `cix symbols` for exact name matches -- `cix refs` finds usages across the entire codebase in indexed chunks -- Use `--in` to avoid noise from irrelevant directories -- The index auto-updates via file watcher — no need to manually reindex -- If results seem stale, run `cix reindex` \ No newline at end of file +- Search queries are natural language, not regex. Write what you'd ask + a colleague. +- Output groups by file: each result line is a file with all relevant + matches inside, ordered top-to-bottom by line number. The + `[best 0.NN]` is the score of the top hit in that file. +- `cix def` is a faster path than `cix symbols` when you already know + the exact name. +- `--exclude` complements `--in` — use it to drop noisy dirs (`bench/`, + `legacy/`, vendored code) inline without touching `.cixignore`. +- The watcher keeps the index fresh. If results feel stale, check + `cix status` first — `Watcher: ✗ not running` is the usual cause. +- Don't loop. If a query returns nothing useful after one well-phrased + attempt + one `--min-score 0.2` retry, drop to grep.
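+
+A worked version of that escalation ladder — the query and directory are
+illustrative; every command and flag used is documented above:
+
+```bash
+# 1. One well-phrased semantic query at the default floor (0.4)
+cix search "token refresh retry logic" --in ./internal
+
+# 2. Nothing relevant? One retry with a lowered floor for long-tail phrasing
+cix search "token refresh retry logic" --in ./internal --min-score 0.2
+
+# 3. Still nothing — drop to grep and chase the exact string
+grep -rn "refresh" ./internal
+```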