This project maintains a vector database of documents and answers questions about them. New documents are split into smaller chunks and added to the database. A query text is then used to retrieve the most relevant chunks, and a pre-trained language model generates a response from them. The aim is an efficient way to manage and query large documents. The core features are:
- Document chunking: splits documents into smaller chunks based on a specified size
- Database updating: updates the database with new document chunks
- Querying: queries the database using a given query text to generate a response
- Vector database: uses a Chroma vector database to store and query document chunks
- Pre-trained language models: uses pre-trained language models for generating responses
- Embeddings generation: generates embeddings for document chunks and query texts
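
The chunking feature can be illustrated with a minimal, library-free sketch. It is a simplified stand-in and not the project's actual splitter: the project lists `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences. The function name `split_into_chunks` and the default sizes are assumptions made for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a plain sliding window, without the separator-aware
    logic of a recursive splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
    # → ['abcd', 'defg', 'ghij']
```

The overlap means that a sentence cut at a chunk boundary still appears intact in at least one neighbouring chunk, which tends to help retrieval.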

The project uses the following libraries and modules:
- `langchain.vectorstores` for Chroma vector database
- `langchain.text_splitter` for RecursiveCharacterTextSplitter
- `langchain.document_loaders.pdf` for PyPDFDirectoryLoader
- `Embeddings` for embeddings generation
- `transformers` for loading pre-trained language models
- `torch` for GPU acceleration and tensor operations
- `sentence_transformers` for natural language processing tasks
- `huggingface_hub` for downloading models from the Hugging Face Hub
- `pathlib` for handling file paths
- `os` for interacting with the operating system
- `dotenv` for loading environment variables
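
Before running the scripts, it can help to confirm that the third-party modules are importable. The following is a small sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the list contains only the top-level import names mentioned above (note that `dotenv` is distributed on PyPI as `python-dotenv`).

```python
from importlib.util import find_spec

# Top-level import names of the third-party dependencies listed above.
REQUIRED_MODULES = [
    "langchain",
    "transformers",
    "torch",
    "sentence_transformers",
    "huggingface_hub",
    "dotenv",
]


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules can be imported.")
```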

To install the project, follow these steps:
- Clone the repository using `git clone`
- Install the required dependencies using `pip install -r requirements.txt`

Additionally, you need to download the required models using the following scripts:
- `installations/mistral_install.py` or `installations/phi2_download.py` (whichever is compatible)
- `installations/embeddings_install.py`
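
After running the download scripts, a quick check like the one below can confirm that the models are in place. This is a sketch under one assumption: that the scripts store the models in the `models/` subdirectories shown in the project layout (`mistral`, `phi-2`, `all-MiniLM-L12-v2`). The helper names are hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names taken from the project layout; that the install scripts
# write to them is an assumption.
EXPECTED_MODEL_DIRS = ["mistral", "phi-2", "all-MiniLM-L12-v2"]


def find_downloaded_models(models_root):
    """Map each expected model directory to True if it exists and is non-empty."""
    root = Path(models_root)
    status = {}
    for name in EXPECTED_MODEL_DIRS:
        model_dir = root / name
        status[name] = model_dir.is_dir() and any(model_dir.iterdir())
    return status


def models_ready(status):
    """The embedding model plus at least one language model must be present."""
    has_llm = status["mistral"] or status["phi-2"]
    return bool(status["all-MiniLM-L12-v2"] and has_llm)


if __name__ == "__main__":
    status = find_downloaded_models("models")
    for name, present in status.items():
        print(f"{name}: {'found' if present else 'missing'}")
    print("Ready to run:", models_ready(status))
```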

To use the project, follow these steps:
- Update the database with new documents using `update_db.py`
- Query the database using a given query text using `query_rag.py`
- Evaluate the response generated by `query_rag.py` using `eval_resp.py`
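
To make the query step more concrete, here is a library-free sketch of retrieval followed by prompt assembly. In the project itself the similarity search is handled by the Chroma vector database and the embeddings come from a pre-trained model. The toy vectors, the function names, and the prompt template below are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=3):
    """Return the texts of the k chunks most similar to the query.

    indexed_chunks is a list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Join the retrieved chunks into a context block and append the question."""
    context = "\n\n---\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    index = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("kittens", [0.9, 0.1])]
    best = top_k_chunks([1.0, 0.0], index, k=2)
    print(best)  # → ['cats', 'kittens']
    print(build_prompt("What is a kitten?", best))
```

The resulting prompt would then be passed to the language model, which generates the response that `eval_resp.py` evaluates.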

.
├── update_db.py
├── query_rag.py
├── eval_resp.py
├── Embeddings.py
├── installations
│   ├── mistral_install.py
│   ├── phi2_download.py
│   └── embeddings_install.py
├── models
│   ├── mistral
│   ├── phi-2
│   └── all-MiniLM-L12-v2