Benchmark system for comparing CAGRA (GPU) vs Lucene HNSW (CPU) vector search algorithms.
Prerequisites:
- JDK 22+
- CUDA libraries
- Python 3.7+
- Python packages: pip install pyyaml matplotlib numpy click pandas
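A quick pre-flight check can catch a missing prerequisite before a long sweep starts. The sketch below is not part of the repository; the helper names (need_cmd, need_pymod) are hypothetical. It only checks that the tools and Python modules are present and that python3 is at least 3.7. It does not verify the JDK version or the CUDA libraries.

```shell
# Hypothetical pre-flight check; not part of the repository.
missing=0

# Report a missing command-line tool without aborting the whole check.
need_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing command: $1"
        missing=1
    fi
}

# Report a Python module that python3 cannot import.
need_pymod() {
    if ! python3 -c "import $1" >/dev/null 2>&1; then
        echo "missing Python module: $1"
        missing=1
    fi
}

need_cmd java
need_cmd python3

# Python 3.7+ is listed as a prerequisite.
if ! python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' >/dev/null 2>&1; then
    echo "python3 is missing or older than 3.7"
    missing=1
fi

# The pyyaml package is imported under the name "yaml".
for mod in yaml matplotlib numpy click pandas; do
    need_pymod "$mod"
done

if [ "$missing" -eq 0 ]; then
    echo "all checked prerequisites found"
else
    echo "some prerequisites are missing"
fi
```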
Set library paths:
export LD_LIBRARY_PATH="/path/to/cuvs/build:/path/to/cuda/lib64:/path/to/conda/lib:$LD_LIBRARY_PATH"
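The export above can also be built step by step, which makes a wrong directory easier to spot. This is a minimal sketch: the variable names (CUVS_LIB_DIR, CUDA_LIB_DIR, CONDA_LIB_DIR) and the helper prepend_lib_dir are assumptions made for illustration, and the default paths are the same placeholders as in the export line.

```shell
# Hypothetical variables; replace the placeholder paths with real directories.
CUVS_LIB_DIR="${CUVS_LIB_DIR:-/path/to/cuvs/build}"
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/path/to/cuda/lib64}"
CONDA_LIB_DIR="${CONDA_LIB_DIR:-/path/to/conda/lib}"

# Prepend one directory to LD_LIBRARY_PATH, warning if it does not exist.
prepend_lib_dir() {
    if [ ! -d "$1" ]; then
        echo "warning: $1 is not a directory" >&2
    fi
    # ${VAR:+...} adds the ":" separator only when the variable is non-empty,
    # which avoids a trailing ":" (an empty entry means the current directory).
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Prepend in reverse so the final order is cuvs, cuda, conda.
prepend_lib_dir "$CONDA_LIB_DIR"
prepend_lib_dir "$CUDA_LIB_DIR"
prepend_lib_dir "$CUVS_LIB_DIR"
export LD_LIBRARY_PATH
```

Unlike the one-line export, this form leaves no trailing colon when LD_LIBRARY_PATH was previously empty.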
Run a benchmark sweep:
./run_sweep.sh --data-dir /data2/vsbench-datasets --datasets datasets.json --sweeps sweeps.json --configs-dir configs --results-dir results --run-benchmarks
Run a sweep in Solr mode (--mode solr), which builds Apache Solr's main branch and then runs the benchmarks:
./run_sweep.sh --data-dir /data2/vsbench-datasets --datasets datasets.json --mode solr --sweeps solr-sweeps.json --configs-dir configs --results-dir results --run-benchmarks
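The two run_sweep.sh invocations above differ only in the sweeps file and the --mode solr flag, so a small wrapper can keep the shared flags in one place. The sketch below is an assumption, not something shipped with the repository: run_sweep, DATA_DIR and DRY_RUN are hypothetical names, and only flags that appear in the commands above are used. With DRY_RUN=1 (the default here) it prints each command instead of running it.

```shell
# Hypothetical wrapper; the flags are copied from the commands above.
DATA_DIR="${DATA_DIR:-/data2/vsbench-datasets}"
DRY_RUN="${DRY_RUN:-1}"

# Usage: run_sweep SWEEPS_FILE [extra run_sweep.sh flags...]
run_sweep() {
    sweeps_file="$1"
    shift
    # Rebuild the positional parameters as the full command line.
    set -- ./run_sweep.sh --data-dir "$DATA_DIR" --datasets datasets.json "$@" \
        --sweeps "$sweeps_file" --configs-dir configs --results-dir results \
        --run-benchmarks
    if [ "$DRY_RUN" = 1 ]; then
        # Print the command without executing it.
        echo "$*"
    else
        "$@"
    fi
}

run_sweep sweeps.json
run_sweep solr-sweeps.json --mode solr
```

Set DRY_RUN=0 to actually execute the sweeps from the directory that contains run_sweep.sh.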
Edit datasets.json to configure the datasets.
Edit (or copy and edit) sweeps.json to configure the sweeps.
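After hand-editing these files, it can help to confirm that they still parse as JSON before starting a sweep. The sketch below uses Python's standard json.tool module; the helper name check_json is hypothetical, and the check is syntax-only, since the expected schema is not described here.

```shell
# Succeed only if the file exists and parses as JSON (syntax check only).
check_json() {
    if python3 -m json.tool "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "invalid or missing JSON: $1"
        return 1
    fi
}

# File names as passed to --datasets and --sweeps; adjust if you copied a sweep file.
for f in datasets.json sweeps.json; do
    check_json "$f" || true
done
```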
Run the Pareto analysis (run_sweep.sh already calls it), for example:
./run_pareto_analysis.sh 3cNWY5 wiki10m
Serve the web UI on port 8000 (the http.server default):
cd web-ui-new; python3 -m http.server