7 changes: 3 additions & 4 deletions .github/workflows/codeql.yml
@@ -2,8 +2,8 @@ name: "CodeQL"

# Advanced setup. Replaces GitHub's "default setup" which auto-detects
# and scans every language it finds — that included java-kotlin, ruby,
-# rust, javascript-typescript, c-cpp, python false-positives from
-# vendored CGO deps and the archived legacy/python-api/ tree.
+# rust, javascript-typescript, and c-cpp false-positives from vendored
+# CGO deps.
#
# To stop the duplicate runs you also need to disable the default
# setup once: GitHub repo → Settings → Code security → Code scanning
@@ -31,8 +31,7 @@ jobs:
matrix:
# Keep tightly scoped: only languages that actually ship code.
# `actions` lints workflow YAML; `go` covers server + CLI.
-# Do NOT add python (only legacy/python-api/, archived) or
-# c-cpp (only transitive CGO deps, no first-party C).
+# Do NOT add c-cpp (only transitive CGO deps, no first-party C).
language: [actions, go]
steps:
- name: Checkout
1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -12,7 +12,6 @@ code-index/
├── cli/ # Go CLI (cix binary)
│ ├── cmd/ # cobra commands
│ └── internal/ # client, config, daemon, indexer, watcher
-├── legacy/python-api/ # archived Python backend (deprecated, see doc/MIGRATION_FROM_PYTHON.md)
└── skills/ # Claude Code skill definitions
```

4 changes: 2 additions & 2 deletions Makefile
@@ -1,4 +1,4 @@
-.PHONY: help build test bundle test-gate docker-build-cuda clean
+.PHONY: help build test bundle test-gate docker-build-cuda docker-build-cuda-dev clean

-help build test bundle test-gate docker-build-cuda clean:
+help build test bundle test-gate docker-build-cuda docker-build-cuda-dev clean:
 	@$(MAKE) -C server $@
55 changes: 53 additions & 2 deletions README.md
@@ -244,9 +244,10 @@ cix summary
# Semantic search — natural language, finds by meaning
cix search <query> [flags]
--in <path> restrict to file or directory (repeatable)
+--exclude <path> exclude file or directory (repeatable)
--lang <language> filter by language (repeatable)
--limit, -l <n> max results (default: 10)
---min-score <0-1> minimum relevance score (default: 0.1)
+--min-score <0-1> minimum relevance score (default: 0.4)
-p <path> project path (default: cwd)

# Symbol search — fast lookup by name
@@ -365,6 +366,56 @@ Supported languages: Python, TypeScript, JavaScript, Go, Rust, Java (+ 40+ other

---

## Tuning Search Quality

### `--min-score` threshold

`cix` defaults to `--min-score 0.4`. This is calibrated for **CodeRankEmbed-Q8_0** with the path-aware embedding format (`CIX_EMBED_INCLUDE_PATH=true`, default).

A typical score landscape on this codebase:

| Match strength | Score range | Notes |
|---|---|---|
| Exact symbol or filename match | 0.65 – 0.80 | rare; very high confidence |
| Strong path-aware concept match | 0.50 – 0.65 | typical "good" match for `cix search "cli watch daemon"` |
| Weaker concept / partial path overlap | 0.40 – 0.50 | typical for ambiguous or multi-token queries |
| Likely unrelated noise | < 0.40 | filtered out by default |
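
The bands in the table can be read as a simple classifier plus a cutoff. The following is a minimal sketch, not `cix`'s actual implementation: the `Result` type, the `band` and `filterByMinScore` helpers, and the sample paths and scores are all hypothetical, and it assumes each band's lower bound is inclusive.

```go
package main

import "fmt"

// Result is a hypothetical search hit; the real cix result type is not shown here.
type Result struct {
	Path  string
	Score float64
}

// band labels a score using the ranges from the table above.
// Assumption: lower bounds are inclusive.
func band(score float64) string {
	switch {
	case score >= 0.65:
		return "exact"
	case score >= 0.50:
		return "strong"
	case score >= 0.40:
		return "weak"
	default:
		return "noise"
	}
}

// filterByMinScore keeps hits at or above minScore, preserving their order.
func filterByMinScore(hits []Result, minScore float64) []Result {
	kept := make([]Result, 0, len(hits))
	for _, h := range hits {
		if h.Score >= minScore {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	// Made-up paths and scores, for illustration only.
	hits := []Result{
		{Path: "cli/cmd/watch.go", Score: 0.58},
		{Path: "cli/internal/daemon/daemon.go", Score: 0.44},
		{Path: "vendor/lib/util.go", Score: 0.31},
	}
	for _, h := range filterByMinScore(hits, 0.4) {
		fmt.Printf("%.2f %-6s %s\n", h.Score, band(h.Score), h.Path)
	}
}
```

With the default cutoff of `0.4`, only the "noise" band is dropped; raising it to `0.5` would also drop the "weak" band.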

**When to lower the threshold**:

- The query returns `No results` but you know matching code exists — try `--min-score 0.25`
- Your query is intentionally vague (exploring an unfamiliar codebase) — `--min-score 0.2`
- Single-word identifier queries on rare names

**When to raise the threshold**:

- Agent context is filling up with weak matches — `--min-score 0.5`
- You only want clear top hits — `--min-score 0.6`

> [!NOTE]
> CodeRankEmbed is **asymmetric**: queries get a `"Represent this query for searching relevant code: "` prefix, which puts query and passage vectors into separate regions of the embedding space. Cosine similarities are systematically lower than for symmetric models — a "strong" match here is 0.55, not 0.80. Don't compare these numbers to thresholds quoted for OpenAI / Voyage / generic sentence-transformers.

> [!TIP]
> If you switched embedding models or toggled `CIX_EMBED_INCLUDE_PATH`, rerun `cix reindex --full` and recalibrate. Old vectors and new vectors live in the same store but score differently.
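
To make the asymmetry in the note above concrete, here is a small sketch of how the query prefix and cosine scoring fit together. The prefix string is quoted from the note; the `embedInput` and `cosine` helpers are hypothetical, the vectors are toy values rather than real embeddings, and it assumes passages are embedded without any prefix.

```go
package main

import (
	"fmt"
	"math"
)

// queryPrefix is the prefix quoted in the note above.
const queryPrefix = "Represent this query for searching relevant code: "

// embedInput returns the text that would be sent to the embedding model.
// Assumption: only queries are prefixed; passages are embedded as-is.
func embedInput(text string, isQuery bool) string {
	if isQuery {
		return queryPrefix + text
	}
	return text
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, have different lengths, or either has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(embedInput("cli watch daemon", true))

	// Toy 3-dimensional vectors standing in for a query and a passage embedding.
	query := []float64{1, 0, 1}
	passage := []float64{1, 1, 0}
	fmt.Printf("similarity: %.2f\n", cosine(query, passage))
}
```

Because the prefixed query and the unprefixed passage never share identical input text, a score near 1.0 is not expected even for a perfect match, which is why the thresholds above sit lower than for symmetric models.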

### `--exclude` for noisy directories

Repos with vendored code, fixtures, or legacy migrations can pull unrelated paths into top results because path tokens contribute to scoring. Two options:

```bash
# One-off exclude for a single search
cix search "main entry point" --exclude vendor --exclude bench/fixtures

# Permanent exclude — add to .cixignore (skips indexing entirely)
echo "vendor/" >> .cixignore
echo "bench/fixtures/" >> .cixignore
cix reindex --full
```

`.cixignore` is preferred for directories you never want in results — they don't take up index space. `--exclude` is a per-query escape hatch.
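
As a rough illustration of what a per-query exclude does, the sketch below filters paths by directory prefix. This is an assumption: the diff does not show how `--exclude` or `.cixignore` actually match (globs and anchoring rules may differ), and the `excluded` helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded reports whether file lies at or under any excluded directory.
// Assumption: plain directory-prefix matching on slash-separated relative
// paths; cix's real matching rules are not shown in this diff.
func excluded(file string, excludes []string) bool {
	file = path.Clean(file)
	for _, e := range excludes {
		e = path.Clean(e) // "bench/fixtures/" and "bench/fixtures" behave the same
		if file == e || strings.HasPrefix(file, e+"/") {
			return true
		}
	}
	return false
}

func main() {
	excludes := []string{"vendor", "bench/fixtures/"}
	files := []string{
		"vendor/lib/a.go",
		"bench/fixtures/x.json",
		"vendored/b.go", // shares a prefix string but is a different directory
		"cli/cmd/search.go",
	}
	for _, f := range files {
		fmt.Println(f, excluded(f, excludes))
	}
}
```

Matching on `e + "/"` rather than on the raw string keeps `vendored/` from being caught by an exclude for `vendor`.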

---

## Per-Project Configuration

### `.cixignore` — exclude files from indexing
@@ -557,7 +608,7 @@ cix watch stop && cix watch /path/to/project

**Search returns no results**
- Check project is indexed: `cix status`
-- Lower the threshold: `cix search "query" --min-score 0.05`
+- Lower the threshold: `cix search "query" --min-score 0.2` (default is `0.4`; see [Tuning Search Quality](#tuning-search-quality))
- Docker mode: run `cix list` to verify the project is registered

---
68 changes: 68 additions & 0 deletions cli/cmd/cancel.go
@@ -0,0 +1,68 @@
package cmd

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/spf13/cobra"
)

var cancelProject string

var cancelCmd = &cobra.Command{
	Use:   "cancel",
	Short: "Cancel an active indexing session",
	Long: `Cancel any in-flight indexing session for a project.

Useful when a previous 'cix reindex' was interrupted by a network issue or
client-side timeout but the server is still holding a session lock and
returning 409 Conflict on subsequent /index/begin attempts.

Idempotent: succeeds (no-op) when no session is active.

Examples:
cix cancel
cix cancel -p /path/to/project`,
	RunE: runCancel,
}

func init() {
	rootCmd.AddCommand(cancelCmd)
	cancelCmd.Flags().StringVarP(&cancelProject, "project", "p", "", "Project path (default: current directory)")
}

func runCancel(cmd *cobra.Command, args []string) error {
	projectPath := cancelProject
	if projectPath == "" {
		cwd, err := os.Getwd()
		if err != nil {
			return fmt.Errorf("get working directory: %w", err)
		}
		projectPath = cwd
	}

	absPath, err := filepath.Abs(projectPath)
	if err != nil {
		return fmt.Errorf("resolve path: %w", err)
	}

	apiClient, err := getClient()
	if err != nil {
		return err
	}

	absPath = findProjectRoot(absPath, apiClient)

	resp, err := apiClient.CancelIndex(absPath)
	if err != nil {
		return fmt.Errorf("cancel: %w", err)
	}

	if resp.Cancelled {
		fmt.Printf("✓ Cancelled active indexing session for %s\n", absPath)
	} else {
		fmt.Printf("No active session for %s (nothing to cancel)\n", absPath)
	}
	return nil
}
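
The command's help text describes a per-project session lock that rejects a second `/index/begin` with 409 Conflict and an idempotent cancel. The server code is not part of this diff, so the following is only a sketch of that contract under stated assumptions: the `sessionLocks` type, its `Begin` and `Cancel` methods, and `errConflict` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict stands in for the server's 409 Conflict response.
var errConflict = errors.New("409 Conflict: indexing session already active")

// sessionLocks is a hypothetical in-memory model of per-project session locks.
type sessionLocks struct {
	mu     sync.Mutex
	active map[string]bool
}

func newSessionLocks() *sessionLocks {
	return &sessionLocks{active: make(map[string]bool)}
}

// Begin takes the lock for project, or returns errConflict if it is held.
func (s *sessionLocks) Begin(project string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.active[project] {
		return errConflict
	}
	s.active[project] = true
	return nil
}

// Cancel releases the lock for project. It is idempotent: it reports true
// only when a session was actually active, and never fails otherwise.
func (s *sessionLocks) Cancel(project string) (cancelled bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cancelled = s.active[project]
	delete(s.active, project)
	return cancelled
}

func main() {
	locks := newSessionLocks()
	const project = "/path/to/project"

	fmt.Println("begin:", locks.Begin(project))
	fmt.Println("begin again:", locks.Begin(project))
	fmt.Println("cancel:", locks.Cancel(project))
	fmt.Println("cancel again:", locks.Cancel(project))
	fmt.Println("begin after cancel:", locks.Begin(project))
}
```

The boolean returned by `Cancel` mirrors the `cancelled` field that `runCancel` uses to choose between its two messages.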
100 changes: 100 additions & 0 deletions cli/cmd/cancel_test.go
@@ -0,0 +1,100 @@
package cmd

import (
	"net/http"
	"strings"
	"testing"
)

func TestRunCancel_ActiveSession(t *testing.T) {
	proj := t.TempDir()
	hash := projectHash(proj)

	srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) {
		switch {
		case strings.HasSuffix(r.URL.Path, "/api/v1/projects"):
			writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0})
		case strings.Contains(r.URL.Path, hash+"/index/cancel") && r.Method == http.MethodPost:
			writeJSON(w, 200, map[string]any{"cancelled": true})
		default:
			http.NotFound(w, r)
		}
	})
	useAPI(t, srv)

	old := cancelProject
	defer func() { cancelProject = old }()
	cancelProject = proj

	out, err := captureOutput(func() error {
		return runCancel(nil, nil)
	})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if !strings.Contains(out, "Cancelled active indexing session") {
		t.Errorf("expected success message, got:\n%s", out)
	}
}

func TestRunCancel_NoActiveSession(t *testing.T) {
	proj := t.TempDir()
	hash := projectHash(proj)

	srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) {
		switch {
		case strings.HasSuffix(r.URL.Path, "/api/v1/projects"):
			writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0})
		case strings.Contains(r.URL.Path, hash+"/index/cancel"):
			writeJSON(w, 200, map[string]any{"cancelled": false})
		default:
			http.NotFound(w, r)
		}
	})
	useAPI(t, srv)

	old := cancelProject
	defer func() { cancelProject = old }()
	cancelProject = proj

	out, err := captureOutput(func() error {
		return runCancel(nil, nil)
	})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if !strings.Contains(out, "No active session") {
		t.Errorf("expected idempotent message, got:\n%s", out)
	}
}

func TestRunCancel_APIError(t *testing.T) {
	proj := t.TempDir()
	hash := projectHash(proj)

	srv := mockServer(t, func(w http.ResponseWriter, r *http.Request) {
		switch {
		case strings.HasSuffix(r.URL.Path, "/api/v1/projects"):
			writeJSON(w, 200, map[string]any{"projects": []any{}, "total": 0})
		case strings.Contains(r.URL.Path, hash+"/index/cancel"):
			apiError(w, 500, "internal error")
		default:
			http.NotFound(w, r)
		}
	})
	useAPI(t, srv)

	old := cancelProject
	defer func() { cancelProject = old }()
	cancelProject = proj

	_, err := captureOutput(func() error {
		return runCancel(nil, nil)
	})
	if err == nil {
		t.Fatal("expected error, got nil")
	}
	if !strings.Contains(err.Error(), "cancel") {
		t.Errorf("expected 'cancel' in error, got: %v", err)
	}
}
2 changes: 1 addition & 1 deletion cli/cmd/init.go
@@ -75,7 +75,7 @@ func runInit(cmd *cobra.Command, args []string) error {
cfg, _ := config.Load()
batchSize := cfg.Indexing.BatchSize
fmt.Printf("Starting indexing (batch size: %d)...\n", batchSize)
-result, err := indexer.Run(client, absPath, false, batchSize)
+result, err := indexer.Run(cmd.Context(), client, absPath, false, batchSize, indexer.AutoProgressMode())
if err != nil {
return fmt.Errorf("indexing failed: %w", err)
}
17 changes: 16 additions & 1 deletion cli/cmd/reindex.go
@@ -1,9 +1,12 @@
package cmd

 import (
+	"context"
 	"fmt"
 	"os"
+	"os/signal"
 	"path/filepath"
+	"syscall"
 	"time"

"github.com/anthropics/code-index/cli/internal/config"
@@ -68,8 +71,20 @@ func runReindex(cmd *cobra.Command, args []string) error {

fmt.Printf("%s reindexing: %s (batch size: %d)\n", indexType, absPath, batchSize)

-	result, err := indexer.Run(apiClient, absPath, reindexFull, batchSize)
+	// SIGINT/SIGTERM → ctx cancellation. The indexer propagates ctx through
+	// SendFilesStreaming, which closes the HTTP connection; the server's
+	// streaming handler sees the disconnect and calls CancelIndexing,
+	// freeing the project lock immediately rather than at the 1-hour TTL.
+	ctx, stop := signal.NotifyContext(cmd.Context(), syscall.SIGINT, syscall.SIGTERM)
+	defer stop()
+
+	result, err := indexer.Run(ctx, apiClient, absPath, reindexFull, batchSize, indexer.AutoProgressMode())
 	if err != nil {
+		// If the user hit Ctrl+C, surface a friendlier message — the deferred
+		// CancelIndex inside indexer.Run already freed the server lock.
+		if ctx.Err() == context.Canceled {
+			return fmt.Errorf("indexing cancelled by user")
+		}
return fmt.Errorf("indexing failed: %w", err)
}
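
The signal-to-context pattern in this hunk can be tried in isolation. Below is a minimal standalone sketch, assuming a hypothetical `indexFiles` worker in place of `indexer.Run` (whose internals are not shown in this diff); `describe` and `simulateInterrupt` are likewise illustrative helpers, not part of the CLI.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// indexFiles is a hypothetical stand-in for indexer.Run: it processes n
// batches and stops early, returning ctx.Err(), once ctx is cancelled.
func indexFiles(ctx context.Context, n int) (int, error) {
	for i := 0; i < n; i++ {
		select {
		case <-ctx.Done():
			return i, ctx.Err()
		case <-time.After(10 * time.Millisecond): // pretend to send one batch
		}
	}
	return n, nil
}

// describe maps a failed run to the error shown to the user: a friendly
// message when the context was cancelled, a wrapped error otherwise.
func describe(ctx context.Context, err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(ctx.Err(), context.Canceled) {
		return errors.New("indexing cancelled by user")
	}
	return fmt.Errorf("indexing failed: %w", err)
}

// simulateInterrupt cancels the context up front, which is what
// signal.NotifyContext does when SIGINT or SIGTERM arrives.
func simulateInterrupt() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := indexFiles(ctx, 3)
	return describe(ctx, err)
}

func main() {
	// ctx is cancelled on Ctrl+C (SIGINT) or SIGTERM; stop releases the handler.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	done, err := indexFiles(ctx, 5)
	if err := describe(ctx, err); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("indexed %d batches\n", done)
}
```

Checking `ctx.Err()` rather than the returned error means the friendly message appears even when the worker wraps or replaces the cancellation error on its way up.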

7 changes: 6 additions & 1 deletion cli/cmd/root.go
@@ -4,6 +4,7 @@ import (
"fmt"
"os"
"strings"
+	"time"

"github.com/anthropics/code-index/cli/internal/client"
"github.com/anthropics/code-index/cli/internal/config"
@@ -126,5 +127,9 @@ func getClient() (*client.Client, error) {
}
}

-	return client.New(url, key), nil
+	c := client.New(url, key)
+	if cfg.Indexing.StreamingIdleTimeoutSec > 0 {
+		c.SetStreamingIdleTimeout(time.Duration(cfg.Indexing.StreamingIdleTimeoutSec) * time.Second)
+	}
+	return c, nil
}
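
How `SetStreamingIdleTimeout` works inside the client is not shown in this diff. One common way to implement an idle timeout on a streamed response is a reader wrapper that restarts a timer on every chunk; the sketch below illustrates that idea only. The `idleReader` type and the `drain` helper are hypothetical, and in a real client the `onIdle` callback would have to cancel the request context or close the connection, since a timer cannot interrupt a blocked `Read` by itself.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync/atomic"
	"time"
)

// idleReader wraps a stream and calls onIdle if no data arrives for idle.
type idleReader struct {
	src   io.Reader
	idle  time.Duration
	timer *time.Timer
}

func newIdleReader(src io.Reader, idle time.Duration, onIdle func()) *idleReader {
	return &idleReader{src: src, idle: idle, timer: time.AfterFunc(idle, onIdle)}
}

// Read forwards to the wrapped reader, restarting the idle timer whenever
// data arrives and stopping it once the stream ends or fails.
func (r *idleReader) Read(p []byte) (int, error) {
	n, err := r.src.Read(p)
	if n > 0 {
		r.timer.Reset(r.idle)
	}
	if err != nil {
		r.timer.Stop()
	}
	return n, err
}

// drain reads src to the end through an idleReader. idleSec is converted to
// a Duration the same way the config value is in getClient. It returns the
// data read and whether the idle callback fired.
func drain(src string, idleSec int) (string, bool, error) {
	var fired int32
	r := newIdleReader(strings.NewReader(src), time.Duration(idleSec)*time.Second, func() {
		atomic.StoreInt32(&fired, 1)
	})
	data, err := io.ReadAll(r)
	return string(data), atomic.LoadInt32(&fired) == 1, err
}

func main() {
	out, fired, err := drain("chunk-1\nchunk-2\n", 30)
	fmt.Printf("%q fired=%v err=%v\n", out, fired, err)
}
```

An idle timeout of this kind bounds the gap between chunks rather than the total transfer time, so a long but steadily progressing upload is not cut off.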