Version: engraph 1.6.1 (prebuilt Linux x86_64 binary)
OS: Linux 6.17.0 (Ubuntu)
Embedder in use: all-MiniLM-L6-v2 (per engraph status)
Vault size: 723 files, 11,739 chunks, 40 MB SQLite index
Summary
engraph search runtime grows dramatically with input length. A short keyword query returns in ~1.4 s; the same vault queried with a 50-word natural-language sentence takes ~32 s — a >20× slowdown that's not explained by candidate-set size (top-N is identical, results overlap).
This blocks LLM-driven retrieval pipelines that pass the user's prompt directly to engraph search, because realistic user prompts are usually full sentences, not keyword strings.
Reproduction
$ engraph status
Vault: /home/jeb/Vaults/Archives
Files: 723
Chunks: 11739
Edges: 4846
Index size: 40.1 MB
Model: all-MiniLM-L6-v2
$ time engraph search "ternary quantization" --json -n 10 >/dev/null
real 0m1.422s
$ time engraph search "I have an interest Ternary computing and there's probably some additional videos that should be ingested into the vault. Is Ternary quant still going to be GPU compatible in addition to loading into RAM for performance on both?" --json -n 10 >/dev/null
real 0m32.474s
Both queries return overlapping top-N candidates, so the work being done isn't proportionally more useful — it's just slower.
Hypothesis
Without source access I can only speculate, but the likely suspects:
- Embedder tokenization or pooling cost growing with input length (MiniLM has a 512-token context, so anything past that is presumably being truncated or chunked oddly).
- A lexical / BM25-style scoring lane doing per-term passes against the full chunk corpus.
- Hybrid-fusion ranking allocating per-query-term work.
A keyword-only path (short query bypass) or input clamp would mask this, but the underlying scaling is what's surprising.
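To make the "input clamp" suggestion concrete, here is a minimal sketch of what I mean at the CLI layer. This is pure illustration, not engraph's API: the function name, the whitespace-token heuristic, and the 64-token budget are all my invention (a real clamp would count embedder tokens, not words).

```python
def clamp_query(query: str, max_tokens: int = 64) -> tuple[str, bool]:
    """Truncate a query to its first max_tokens whitespace tokens.

    Returns the (possibly shortened) query plus a flag indicating
    whether truncation happened, so the CLI could print a warning
    like "query clamped to 64 tokens" instead of silently slowing down.
    """
    tokens = query.split()
    if len(tokens) <= max_tokens:
        return query, False
    return " ".join(tokens[:max_tokens]), True
```

Even a clamp this crude would keep the worst-case runtime bounded while the underlying scaling is investigated.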
Workaround we're using
Added a "Stage 0" distillation step in the calling pipeline that calls a local LLM (Ollama, gemma4:26b) to extract 5–10 keywords from long prompts before calling engraph search. Long-query end-to-end dropped from 30 s+ timeout to ~6 s, and recall actually improved (the keyword form retrieves notes the natural-language form missed).
Happy to share the hook code if useful.
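In the meantime, the shape of the hook is roughly the following. Sketch only: the Ollama call is replaced by a trivial stopword filter so it runs standalone, and the stopword list, keyword cap, and function name are placeholders for whatever your pipeline uses, not part of our actual hook.

```python
import re

# Stand-in for the LLM distillation step. The real hook POSTs the user
# prompt to a local Ollama server and asks for 5-10 keywords; here a
# stopword filter approximates that so the sketch is self-contained.
STOPWORDS = {
    "an", "the", "and", "or", "is", "are", "there", "that", "this",
    "to", "in", "into", "of", "for", "on", "be", "it", "have", "some",
    "should", "still", "going", "probably", "additional", "both",
}

def distill_query(prompt: str, max_keywords: int = 10) -> str:
    """Reduce a long natural-language prompt to a short keyword string."""
    words = re.findall(r"[a-z0-9]{2,}", prompt.lower())
    seen, keywords = set(), []
    for w in words:
        if w in STOPWORDS or w in seen:
            continue
        seen.add(w)
        keywords.append(w)
        if len(keywords) >= max_keywords:
            break
    return " ".join(keywords)

# The distilled string, not the raw prompt, is what gets passed to
# `engraph search`.
```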
Minor observation: engraph status Model field
After engraph configure --model embed hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf and a full engraph index --rebuild (19.9 min, all 721 files re-embedded), engraph status still reports Model: all-MiniLM-L6-v2. The [models] config block contains the new URI and the rebuild took the expected wall-clock time for re-embedding everything, so I assume the display field is just hardcoded or not wired to read the active config. Low-impact, but confusing during diagnosis.
Asks
- Is the long-query scaling expected, or does it look like a bug?
- Would an input-length clamp / warning be reasonable at the CLI layer?
- Is the engraph status "Model:" field expected to reflect the configured embedder, or is it intentionally a static label?