
LocalRAG

🧠 AI-Powered Document Chunking and Querying Project

LocalRAG ingests new documents into a database, splits them into smaller chunks, and answers queries by generating responses from the most relevant chunks retrieved. It combines document chunking, database updating, and querying over a Chroma vector database with locally downloaded pre-trained language models, providing an efficient way to manage and query large documents.

🚀 Features

  • Document chunking: splits documents into smaller chunks of a configurable size
  • Database updating: adds new document chunks to the database
  • Querying: retrieves the chunks most relevant to a query text and generates a response
  • Vector database: stores and queries document chunks in a Chroma vector database
  • Pre-trained language models: generates responses with locally downloaded models
  • Embeddings generation: computes embeddings for document chunks and query texts
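The chunking feature above can be sketched in plain Python. This is an illustrative stand-in for what langchain's RecursiveCharacterTextSplitter does, not the project's actual splitter; the chunk size and overlap values are assumptions:

```python
def split_into_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context at
    chunk boundaries is shared between neighbouring chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A toy 300-character document; real inputs come from the PDF loader.
document = "word " * 60
chunks = split_into_chunks(document, chunk_size=100, overlap=20)
print(len(chunks))  # 4 chunks, each at most 100 characters
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, which helps retrieval when a relevant sentence straddles a chunk boundary.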

πŸ› οΈ Tech Stack

  • langchain.vectorstores for Chroma vector database
  • langchain.text_splitter for RecursiveCharacterTextSplitter
  • langchain.document_loaders.pdf for PyPDFDirectoryLoader
  • Embeddings.py (local module) for embeddings generation
  • transformers for loading pre-trained language models
  • torch for GPU acceleration and tensor operations
  • sentence_transformers for computing sentence embeddings
  • huggingface_hub for downloading models from the Hugging Face Hub
  • pathlib for handling file paths
  • os for interacting with the operating system
  • dotenv for loading environment variables
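Under the hood, a vector store like Chroma ranks stored chunks by how similar their embeddings are to the query embedding, typically via cosine similarity. A minimal sketch of that scoring with made-up three-dimensional vectors (real embeddings from all-MiniLM-L12-v2 have 384 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for three stored chunks; the names and values are illustrative.
chunk_vectors = {
    "chunk_about_dogs": [0.9, 0.1, 0.0],
    "chunk_about_cats": [0.8, 0.3, 0.1],
    "chunk_about_tax":  [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.2, 0.05]

# Rank chunks by similarity to the query, highest first.
ranked = sorted(
    chunk_vectors,
    key=lambda k: cosine_similarity(chunk_vectors[k], query_vector),
    reverse=True,
)
print(ranked[0])  # the chunk whose embedding points closest to the query's
```

Because cosine similarity depends only on direction, not magnitude, two chunks of very different lengths can still score as equally relevant.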

📦 Installation

To install the project, follow these steps:

  1. Clone the repository using git clone
  2. Install the required dependencies using pip install -r requirements.txt
  3. Download the required models using the following scripts:
  • installations/mistral_install.py or
  • installations/phi2_download.py (whichever is compatible with your hardware)
  • installations/embeddings_install.py

💻 Usage

To use the project, follow these steps:

  1. Update the database with new documents using update_db.py
  2. Query the database with a query text using query_rag.py
  3. Evaluate the response generated by query_rag.py using eval_resp.py
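A typical RAG query step embeds the query, retrieves the top-k most similar chunks, and stuffs them into a prompt for the language model. The exact prompt used by query_rag.py is not shown here; the sketch below illustrates the prompt-assembly pattern with a hypothetical helper and example strings:

```python
def build_rag_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that instructs the model to answer only from
    the retrieved context, the standard RAG prompting pattern."""
    context = "\n\n---\n\n".join(retrieved_chunks)
    return (
        "Answer the question based only on the following context:\n\n"
        f"{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

# Example with two retrieved chunks (contents are illustrative).
prompt = build_rag_prompt(
    "What is the refund policy?",
    ["Refunds are issued within 30 days.", "Contact support for returns."],
)
print(prompt)
```

Grounding the prompt in retrieved chunks this way is what lets a small local model like Mistral or Phi-2 answer questions about documents it was never trained on.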

📂 Project Structure

.
├── update_db.py
├── query_rag.py
├── eval_resp.py
├── Embeddings.py
├── installations
│   ├── mistral_install.py
│   ├── phi2_download.py
│   ├── embeddings_install.py
├── models
│   ├── mistral
│   ├── phi-2
│   ├── all-MiniLM-L12-v2