A modular Retrieval-Augmented Generation (RAG) system designed to reflect real-world production pipelines.
The project combines hybrid retrieval, reranking, and LLM-based generation with full transparency of token usage and cost.
This system allows users to:
- Query their own documents using natural language
- Generate answers grounded in retrieved context
- Create summaries based on document content
- Inspect retrieved sources and debug the pipeline
- Track token usage and estimated API costs
- Switch between local and API-based language models
The architecture is modular and extensible, following patterns used in modern AI systems.
- Hybrid retrieval
  - Dense search using FAISS embeddings
  - Sparse search using TF-IDF
  - Score fusion for improved relevance (see the fusion sketch below the feature list)
- Query preprocessing
  - Basic query normalization
  - Typo correction for improved robustness
- Reranking
  - Combines semantic similarity and keyword overlap
  - Improves precision of the top retrieved results
- Generation
  - Supports:
    - OpenAI models (e.g. gpt-4.1-mini, gpt-4o-mini)
    - Local models via Ollama
  - Generates answers strictly based on the retrieved context
- Token and cost tracking (see the sketch just below)
  - Input and output token tracking
  - Estimated cost per request
  - Aggregated session usage
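A minimal sketch of how per-request cost estimation and session aggregation might look; the price table and function names below are illustrative placeholders, not the project's actual implementation or current OpenAI pricing:

```python
# Illustrative per-1K-token prices; check the provider's pricing page for real values.
PRICES_PER_1K = {"gpt-4o-mini": {"input": 0.00015, "output": 0.0006}}

session = {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}

def track_usage(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request and add it to the session totals."""
    price = PRICES_PER_1K[model]
    cost = (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]
    session["input_tokens"] += input_tokens
    session["output_tokens"] += output_tokens
    session["cost_usd"] += cost
    return cost
```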
- Evaluation
  - Retrieval evaluation framework
  - Metrics:
    - Top-1 / Top-3 / Top-5 accuracy
    - Mean Reciprocal Rank (MRR)
    - Recall@k
  - Comparison of retrieval modes: dense vs. sparse vs. hybrid
- User interface
  - Streamlit-based interface
  - Chat mode and summary mode
  - Adjustable retrieval parameters
  - Source inspection for each answer
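As referenced in the feature list, a minimal sketch of hybrid score fusion, assuming min-max normalization and a weighted sum of dense (FAISS) and sparse (TF-IDF) scores; `alpha` and the function shapes are illustrative, not the project's exact implementation:

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize scores to [0, 1] so dense and sparse scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def fuse(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.6) -> list[tuple[str, float]]:
    """Weighted sum of normalized dense and sparse scores, highest first."""
    dense, sparse = minmax(dense), minmax(sparse)
    doc_ids = set(dense) | set(sparse)
    fused = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0) for d in doc_ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

The fused ranking can then be handed to the reranker, which rescores the top candidates against the query before context construction.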
```
Pre-Retrieval
  ↓
Query Processing (normalization + typo correction)
  ↓
Hybrid Retrieval (FAISS + TF-IDF)
  ↓
Reranking and Filtering
  ↓
Context Construction
  ↓
Generation (LLM)
  ↓
Pipeline (chat / summary)
```
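A rough sketch of how the orchestration layer might wire these stages together for the chat pipeline; the callables are stand-ins for the corresponding modules rather than the project's real function names:

```python
from typing import Callable, Sequence

def answer(
    query: str,
    preprocess: Callable[[str], str],          # normalization + typo correction
    retrieve: Callable[[str, int], Sequence[str]],  # hybrid retrieval + reranking, returns text chunks
    generate: Callable[[str], str],            # LLM call (OpenAI or Ollama)
    top_k: int = 5,
) -> str:
    """Chat pipeline: preprocess -> retrieve -> build context -> generate."""
    query = preprocess(query)
    chunks = retrieve(query, top_k)
    context = "\n\n".join(chunks)              # context construction
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```

The summary pipeline would follow the same shape, swapping the question-answering prompt for a summarization prompt over the retrieved document content.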
```
rag/
├── indexing/
├── retrieval/
├── pre_retrieval/
├── post_retrieval/
├── generation/
├── orchestration/
├── utils/
└── config.py

evaluation/
├── test_cases.py
└── evaluate.py

app.py
```
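The contents of `rag/config.py` are not shown here; purely as an illustration, a central config module for a pipeline like this might collect settings such as the following (all names and values are hypothetical):

```python
# Hypothetical example; the real rag/config.py may look entirely different.
EMBEDDING_MODEL = "nomic-embed-text"   # embedding model used to build the FAISS index
LLM_MODEL = "gpt-4o-mini"              # default generation model
CHUNK_SIZE = 500                       # size of indexed document chunks
TOP_K = 5                              # number of chunks passed to the LLM
FUSION_ALPHA = 0.6                     # weight of dense scores in hybrid fusion
```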
- Install dependencies: `pip install -r requirements.txt`
- Configure environment: `cp .env.example .env`, then add your OpenAI API key (`OPENAI_API_KEY=your_key_here`)
- (Optional) Run local models with Ollama: `ollama pull llama3.1:8b` and `ollama pull nomic-embed-text`
- Build the knowledge base: `python -m rag.indexing.builder`
- Run the application: `streamlit run app.py`
- Evaluate retrieval quality: `python -m evaluation.evaluate`
The retrieval component was evaluated using standard information retrieval metrics.
| Mode | Top-1 | Top-3 | Top-5 | MRR | Recall@5 |
|---|---|---|---|---|---|
| Dense | 0.90 | 0.90 | 1.00 | 0.92 | 1.00 |
| Sparse | 0.80 | 0.90 | 1.00 | 0.85 | 1.00 |
| Hybrid | 0.90 | 1.00 | 1.00 | 0.95 | 1.00 |
Hybrid retrieval achieves the best ranking performance (MRR), while all methods reach full recall at top-5.
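For reference, a minimal sketch of how MRR and Recall@k can be computed from ranked retrieval results, assuming one relevant document per test query (illustrative, not the project's evaluation code):

```python
def mrr(rankings: list[list[str]], relevant: list[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit per query."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id == rel:
                total += 1.0 / rank
                break
    return total / len(rankings)

def recall_at_k(rankings: list[list[str]], relevant: list[str], k: int = 5) -> float:
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = sum(rel in ranking[:k] for ranking, rel in zip(rankings, relevant))
    return hits / len(rankings)
```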
- Python
- FAISS
- Scikit-learn
- OpenAI API
- Ollama
- Streamlit
Bartłomiej Jamiołkowski

