Retrieval-augmented question answering pipeline that indexes the vendored vLLM codebase and answers codebase questions with BM25 retrieval plus a local Qwen model.
The pipeline is split into four stages:
- Ingestion – walk the repository, read supported text files, and split them into chunks.
- Indexing – build a BM25 index over chunk contents.
- Retrieval – return the top-$k$ source spans for a question.
- Generation – feed the retrieved context into
Qwen/Qwen3-0.6Bthroughtransformersand generate a short answer.
Implementation details worth knowing:
- Python files are chunked with a simple AST-aware split by
def/classboundaries. - Markdown and text files use a sliding-window strategy.
- Code and docs are indexed separately.
- The generator disables Qwen3 thinking mode with
enable_thinking=Falseso answers stay direct and concise. - Metadata for source spans is preserved as
MinimalSourceobjects, which makes evaluation and traceability straightforward.
- Python 3.10
uv- A working local PyTorch installation with
transformers - Enough disk space to unpack the vendored vLLM corpus and dataset archives
make installThe repository expects the raw codebase corpus under data/raw/ and the evaluation dataset archives under data/.
Build the index, then query the pipeline:
make index
uv run python -m student search "OpenAI compatible server" --k 5
uv run python -m student answer "How to configure an OpenAI server" --k 10If you prefer the Makefile wrappers, the equivalent commands are make search and make answer.
The repo includes a local evaluator that reports Recall@1, Recall@3, Recall@5, and Recall@10 on the public question datasets.
Published results in the README and evaluator docs:
- Recall@5 on docs: 86%
- Recall@5 on code: 48%
- Required thresholds in the evaluator: 55% docs and 45% code at Recall@5
answer_datasetthroughput: about 2m16s for 100 questions (~1.36 s/it)
The repository vendors the vLLM 0.10.1 codebase under data/raw/vllm-0.10.1/. That corpus is the main indexing target.
For evaluation, the pipeline uses public question datasets in data/datasets/ and writes search outputs to data/output/search_results/ and answered outputs to data/output/search_results_and_answer/.
bm25sfor sparse retrieval and predictable ranking behaviorpydanticmodels for stable serialization of datasets, chunks, and answersfirefor lightweight CLI wiringtqdmfor dataset progress reportingtransformers+torchfor local generation instead of a hosted API
- No API keys or tokens are hardcoded in the implementation files.
- The repository is optimized for fast repository-question lookup, not for free-form chat.