
`engraph search` latency scales sharply with query length (1.4 s for a 2-word query → 32 s for a 50-word natural-language query) #34


Description

@GrinNoCat

Version: engraph 1.6.1 (prebuilt Linux x86_64 binary)
OS: Linux 6.17.0 (Ubuntu)
Embedder in use: all-MiniLM-L6-v2 (per engraph status)
Vault size: 723 files, 11,739 chunks, 40 MB SQLite index

Summary

engraph search runtime grows dramatically with input length. A short keyword query returns in ~1.4 s; the same vault queried with a 50-word natural-language sentence takes ~32 s — a >20× slowdown that's not explained by candidate-set size (top-N is identical, results overlap).

This blocks LLM-driven retrieval pipelines that pass the user's prompt directly to engraph search, because realistic user prompts are usually full sentences, not keyword strings.

Reproduction

$ engraph status
Vault:      /home/jeb/Vaults/Archives
Files:      723
Chunks:     11739
Edges:      4846
Index size: 40.1 MB
Model:      all-MiniLM-L6-v2

$ time engraph search "ternary quantization" --json -n 10 >/dev/null
real    0m1.422s

$ time engraph search "I have an interest Ternary computing and there's probably some additional videos that should be ingested into the vault.  Is Ternary quant still going to be GPU compatible in addition to loading into RAM for performance on both?" --json -n 10 >/dev/null
real    0m32.474s

Both queries return overlapping top-N candidates, so the work being done isn't proportionally more useful — it's just slower.

Hypothesis

Without source access I can only speculate, but the likely suspects are:

  • Embedder tokenization or pooling cost growing with input length (MiniLM is a 512-token-context model; anything past that limit is presumably being truncated or chunked oddly).
  • A lexical / BM25-style scoring lane doing per-term passes against the full chunk corpus.
  • Hybrid-fusion ranking allocating per-query-term work.

A keyword-only fast path (short-query bypass) or an input clamp would mask the symptom, but the underlying scaling is what's surprising.
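
A quick way to characterize the curve from the CLI alone (a rough sketch; the word list and counts are arbitrary, and it assumes GNU time at /usr/bin/time):

base="ternary quantization gpu ram video ingest compatible loading"
for n in 2 5 10 20 35 50; do
  # Build an n-word query by repeating a fixed phrase, so content
  # stays comparable while only length changes.
  q=$(printf '%s ' $base $base $base $base $base $base $base | cut -d' ' -f1-"$n")
  /usr/bin/time -f "$n words: %e s" engraph search "$q" --json -n 10 >/dev/null
done

If latency grows roughly linearly with word count, that points at a per-term scoring lane; a sharp cliff near the embedder's context limit would point at tokenization/chunking instead.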

Workaround we're using

Added a "Stage 0" distillation step in the calling pipeline that calls a local LLM (Ollama, gemma4:26b) to extract 5–10 keywords from long prompts before calling engraph search. Long-query end-to-end dropped from 30 s+ timeout to ~6 s, and recall actually improved (the keyword form retrieves notes the natural-language form missed).
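
Trimmed to its essentials, the hook looks like this (a sketch rather than our exact code; it assumes a local Ollama daemon on the default port and jq on PATH):

#!/usr/bin/env sh
# Stage 0: distill a long natural-language prompt into keywords with a
# local Ollama model, then pass only the keywords to engraph search.
prompt="$1"
keywords=$(curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg p "$prompt" \
        '{model: "gemma4:26b", stream: false,
          prompt: ("Extract 5-10 search keywords, space-separated, no prose:\n\n" + $p)}')" \
  | jq -r '.response')
engraph search "$keywords" --json -n 10

Running the distiller locally keeps the whole pipeline offline, and even with the extra LLM call it nets out far faster than the raw long-query search.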

Happy to share the full hook if useful.

Minor observation: engraph status Model field

After engraph configure --model embed hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf and a full engraph index --rebuild (19.9 min, all 721 files re-embedded), engraph status still reports Model: all-MiniLM-L6-v2. The [models] config block contains the new URI, and the rebuild took the expected wall-clock time for re-embedding everything, so I assume the display field is hardcoded or not wired to read the active config. Low impact, but confusing during diagnosis.

Asks

  1. Is the long-query scaling expected, or does it look like a bug?
  2. Would an input-length clamp / warning be reasonable at the CLI layer?
  3. Is the engraph status "Model:" field expected to reflect the configured embedder, or is it intentionally a static label?
