Switch reranker from Jina API to local cross-encoder#64

Merged: CodeNinjaSarthak merged 5 commits into main from feature/local-cross-encoder-migration on Apr 26, 2026

@CodeNinjaSarthak
Owner

Summary

  • Migrates the reranker from the hosted Jina API to a local cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2) loaded via sentence-transformers. The local model runs on CPU with ONNX INT8 quantization and scores within noise of the Jina API, while removing the API-key, rate-limit, and reproducibility concerns that come with a hosted service.
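As a rough illustration of local reranking, the sketch below scores (query, passage) pairs with a sentence-transformers `CrossEncoder` and keeps the best few. The helper names `load_local_scorer` and `rerank`, the `top_k` default, and the scorer-injection design are assumptions made for illustration, not the code in this PR. The ONNX INT8 quantization step is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# A pair scorer takes (query, passage) pairs and returns one score per pair.
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def load_local_scorer(
    model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
) -> PairScorer:
    """Build a scorer backed by a sentence-transformers CrossEncoder on CPU."""
    # Imported lazily so rerank() can be used and tested without the dependency.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name, device="cpu")
    return lambda pairs: model.predict(pairs)


def rerank(
    query: str,
    passages: List[str],
    scorer: PairScorer,
    top_k: int = 5,  # hypothetical default
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k by score."""
    if not passages:
        return []
    scores = scorer([(query, p) for p in passages])
    ranked = sorted(
        zip(passages, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]
```

Passing the scorer in as a function keeps the ranking logic independent of the model, so the same `rerank` call would work with `load_local_scorer()` or with a stub in tests.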

Results (LoCoMo, n=1540)

| Metric | Score |
|---|---|
| Overall | 56.3% (vs. 57.5% with Jina; within bootstrap CI) |
| Temporal | 64.2% (vs. 68.2% with Jina; within CI) |
| Held-out (n=718) | 55.0% overall, 68.3% temporal |

Key finding

Ablations show that the cross-encoder reranker is the load-bearing component. With the reranker in place, round-robin merge and score-based merge produce identical results (55.8%), so the merge strategy becomes irrelevant. Without the reranker, neither merge strategy improves over isolation alone.
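For readers unfamiliar with the two merge strategies, here is a minimal sketch of each. The function names, the `(doc_id, score)` hit format, and the deduplication rule are assumptions; the eval script's actual implementation may differ.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, retrieval score)


def round_robin_merge(result_lists: List[List[Hit]], limit: int) -> List[str]:
    """Interleave the lists rank by rank, skipping ids already taken."""
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None or hit[0] in seen:
                continue
            seen.add(hit[0])
            merged.append(hit[0])
    return merged[:limit]


def score_merge(result_lists: List[List[Hit]], limit: int) -> List[str]:
    """Pool all hits, keep each id's best score, sort by score descending."""
    best: Dict[str, float] = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    return sorted(best, key=lambda d: best[d], reverse=True)[:limit]


list_a = [("a1", 0.9), ("a2", 0.8)]
list_b = [("b1", 0.95), ("a1", 0.5), ("b2", 0.1)]
print(round_robin_merge([list_a, list_b], limit=3))  # ['a1', 'b1', 'a2']
print(score_merge([list_a, list_b], limit=3))        # ['b1', 'a1', 'a2']
```

In this toy input both strategies return the same candidate set in a different order, so a reranker that re-sorts the candidates would give the same final ranking either way. Whether that is the mechanism behind the identical 55.8% is not stated in the PR.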

Changes

  • README.md: numbers updated, Jina references replaced, 4 new ablation rows added, fact extraction precision corrected (58.6% → 52% to match measured value)
  • .env.development.example: JINA_API_KEY section removed
  • eval/eval_qa_accuracy.py: --local-rerank, --merge-strategy, --top-k, --fetch-multiplier, --no-rr-rerank flags added; help strings updated

Note: JinaRateLimiter dead code (424f359) is intentionally left in — cleanup belongs in a follow-up PR.

Test plan

  • Verify README numbers match eval output JSON
  • Smoke test the --local-rerank flag on a single conversation before the full run
  • Confirm JINA_API_KEY is no longer referenced in .env.development.example
  • Confirm --no-rerank ablation still works (no reranker path)

- Archive 18 superseded/debug result files (untracked)
- Rename canonical result files for clarity
- Add --no-isolation and --no-rerank flags to eval script
- Add 5-attempt retry on generation, retrieval, judge calls
- Add run_ablation_parallel.sh for parallel ablation runs
- Record no_isolation/no_rerank/model in result metadata
- Update README numbers to 56.3% overall, 64.2% temporal (local cross-encoder/ms-marco-MiniLM-L-6-v2 results, n=1540)
- Replace Jina branding in mermaid diagrams and text
- Add ablation rows showing reranker is load-bearing component
- Fix fact extraction precision discrepancy (58.6% -> 52%)
- Remove JINA_API_KEY from .env.development.example
- Update eval_qa_accuracy.py help strings for --no-rerank flags
- Document --local-rerank flag in Reproduce Results section
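
The 5-attempt retry on generation, retrieval, and judge calls mentioned in the commit notes above could look roughly like the sketch below. The helper name `call_with_retry`, the exponential backoff, and the delay values are assumptions; the commit message only states the attempt count.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 5,          # attempt count taken from the commit message
    base_delay: float = 1.0,    # hypothetical backoff base, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or the attempts are exhausted."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Taking `sleep` as a parameter lets tests substitute a recorder for the real delay.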
@CodeNinjaSarthak CodeNinjaSarthak merged commit f3c0dfb into main Apr 26, 2026
2 checks passed
@CodeNinjaSarthak CodeNinjaSarthak deleted the feature/local-cross-encoder-migration branch April 26, 2026 14:58