Extrag is a high-performance, agentic ETL-to-RAG engine built in Rust. It goes beyond traditional RAG by incorporating MemRL patterns (Value-Aware Retrieval) and Reinforcement Learning feedback loops, allowing the system's memory to evolve based on its real-world performance.
- Agentic Memory (MemRL): Every document chunk carries a Utility Profile ($Q$-score). Retrieval is a fusion of Semantic Similarity and Historical Utility (see the fusion sketch after this list).
- Delta Vectorization: Content-aware ingestion pipeline that hashes files to skip unchanged documents, drastically reducing embedding API costs.
- Advanced Retrieval Stage:
  - HyDE (Hypothetical Document Embeddings): Context expansion by generating "ideal" answers before searching.
  - Z-Score Fusion: Balanced retrieval using normalized semantic and value scores.
  - Multi-Backend Support: Concrete, stable implementations for Qdrant (Vector Store) and Ollama (LLM & Embeddings).
- Concurrent Ingestion: High-throughput streaming pipeline built with `tokio-stream` and buffered async processing (see the ingestion sketch after this list).
- REST-First API: A professional Axum-based API providing clean endpoints for agents to ingest, retrieve, and provide feedback.
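As a minimal sketch of the value-aware fusion idea: semantic similarity and the historical $Q$-score are each z-score normalized across the candidate set, then combined with a utility weight. The names here (`Candidate`, `q_score`, `utility_weight`) are illustrative, not Extrag's actual API.

```rust
/// Illustrative only: fuse semantic similarity with historical utility
/// (Q-score) via z-score normalization. Names are hypothetical.
struct Candidate {
    id: String,
    semantic: f32, // cosine similarity from the vector store
    q_score: f32,  // learned utility from the feedback loop
}

fn z_scores(values: &[f32]) -> Vec<f32> {
    let n = values.len() as f32;
    let mean = values.iter().sum::<f32>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt().max(f32::EPSILON);
    values.iter().map(|v| (v - mean) / std).collect()
}

/// Rank candidates by a weighted sum of the two normalized scores.
fn fuse(mut candidates: Vec<Candidate>, utility_weight: f32) -> Vec<Candidate> {
    let sem = z_scores(&candidates.iter().map(|c| c.semantic).collect::<Vec<_>>());
    let util = z_scores(&candidates.iter().map(|c| c.q_score).collect::<Vec<_>>());
    let mut fused: Vec<(f32, Candidate)> = sem
        .iter()
        .zip(util.iter())
        .zip(candidates.drain(..))
        .map(|((s, u), c)| ((1.0 - utility_weight) * s + utility_weight * u, c))
        .collect();
    fused.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
    fused.into_iter().map(|(_, c)| c).collect()
}
```

Normalizing both signals before mixing keeps either one from dominating when their raw scales differ (e.g. cosine similarities vs. learned utilities).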
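And a rough sketch of delta-aware, concurrent ingestion: hash each file, skip anything whose hash is unchanged, and process the rest with bounded concurrency. For brevity this uses the `futures` crate's `buffer_unordered` and a toy in-memory hash map; `content_hash` and `embed_and_store` are placeholders, not the real pipeline.

```rust
// Illustrative sketch only. Assumes the `futures` and `tokio` crates;
// helper functions stand in for the real chunk/embed/upsert steps.
use futures::{stream, StreamExt};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

fn content_hash(bytes: &[u8]) -> u64 {
    // Stand-in for a real content hash (e.g. SHA-256 of the file bytes).
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

async fn embed_and_store(path: &PathBuf, _bytes: &[u8]) {
    // Placeholder: chunk the text, call the embedder, upsert into the vector store.
    println!("ingesting {}", path.display());
}

async fn ingest_dir(paths: Vec<PathBuf>, seen: &HashMap<PathBuf, u64>, concurrency: usize) {
    stream::iter(paths)
        .map(|path| async move {
            let Ok(bytes) = tokio::fs::read(&path).await else { return };
            // Delta check: unchanged files are skipped and never re-embedded.
            if seen.get(&path) == Some(&content_hash(&bytes)) {
                return;
            }
            embed_and_store(&path, &bytes).await;
        })
        .buffer_unordered(concurrency) // at most `concurrency` files in flight
        .collect::<Vec<_>>()
        .await;
}
```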
Workspace crates:

- `extrag-core`: Foundational traits and common types (Embedder, VectorStore, LlmClient, etc.).
- `etl`: Extraction, Transformation, and Loading logic. Scans files, parses formats, and chunks text.
- `rag`: The orchestration layer for Ingestion Pipelines and Retrieval Engines.
- `api`: The Axum REST server providing the engine interface.
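To give a feel for the core abstractions, here is a hedged sketch of what such traits might look like. The trait names come from `extrag-core`, but the method signatures below are assumptions (and assume the `async-trait`, `anyhow`, and `serde_json` crates), not the crate's actual API.

```rust
// Illustrative signatures only; consult extrag-core for the real definitions.
use async_trait::async_trait;

#[async_trait]
pub trait Embedder: Send + Sync {
    /// Embed a batch of texts into dense vectors.
    async fn embed(&self, texts: &[String]) -> anyhow::Result<Vec<Vec<f32>>>;
}

#[async_trait]
pub trait VectorStore: Send + Sync {
    /// Upsert a vector with an arbitrary JSON payload.
    async fn upsert(&self, id: &str, vector: Vec<f32>, payload: serde_json::Value) -> anyhow::Result<()>;
    /// Return (id, score) pairs for the top-k nearest vectors.
    async fn search(&self, query: Vec<f32>, top_k: usize) -> anyhow::Result<Vec<(String, f32)>>;
}

#[async_trait]
pub trait LlmClient: Send + Sync {
    /// Generate a completion for a prompt (used e.g. for HyDE expansion).
    async fn complete(&self, prompt: &str) -> anyhow::Result<String>;
}
```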
- Ollama: Install Ollama and pull the required models:

  ```bash
  ollama pull gemma4:latest
  ```

- Docker: Required to run Qdrant.
- Start the Infrastructure:

  ```bash
  docker-compose up -d
  ```

- Start the API Server:

  ```bash
  cargo run -p api
  ```
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/ingest` | Delta-aware ingestion of a directory. |
| POST | `/v1/retrieve` | Advanced Retrieval (HyDE + MemRL Fusion). |
| POST | `/v1/feedback` | RL reward loop for updating chunk utility. |
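The feedback endpoint is what closes the MemRL loop. As a minimal sketch, assuming a simple incremental update rule: a reward nudges the chunk's $Q$-score toward the observed outcome. The struct, field names, and learning rate below are illustrative, not necessarily how Extrag updates utility.

```rust
// Illustrative only: one way a /v1/feedback reward could update a chunk's
// utility (Q-score). The update rule and learning rate are assumptions.
struct ChunkUtility {
    q_score: f32,
    feedback_count: u32,
}

impl ChunkUtility {
    /// Move the Q-score toward the observed reward (e.g. 1.0 = helpful, 0.0 = not).
    fn apply_feedback(&mut self, reward: f32, learning_rate: f32) {
        self.q_score += learning_rate * (reward - self.q_score);
        self.feedback_count += 1;
    }
}

fn main() {
    let mut chunk = ChunkUtility { q_score: 0.5, feedback_count: 0 };
    chunk.apply_feedback(1.0, 0.1); // positive reward nudges utility upward
    println!("q_score = {:.3}", chunk.q_score); // prints 0.550
}
```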
Run the full workspace test suite:
```bash
cargo test --workspace
```

License: MIT
