This project implements a Retrieval-Augmented Generation (RAG) system using Azure AI Services. The system allows users to:
- Upload a document (PDF or Word).
- Convert it into embeddings and store it in Azure AI Search.
- Ask questions related to the document.
- Retrieve relevant document chunks using Azure Cognitive Search.
- Generate responses using Azure OpenAI (GPT-4o).
The project is built with Streamlit for the user interface and integrates Azure OpenAI & Azure Cognitive Search for document indexing and retrieval.
✅ Document Upload & Processing - Upload PDF or Word files, extract text, and convert it into embeddings.
✅ Vector-Based Search - Store document embeddings in Azure AI Search for similarity search.
✅ Query Handling - Users can ask questions, and the system retrieves relevant document content.
✅ AI-Powered Responses - Uses Azure OpenAI GPT-4o to generate answers based on retrieved data.
✅ Scalability - Uses Azure AI Services for efficient document handling.
Here are the core services and tools used in this project:
- Azure OpenAI (GPT-4o) → For generating AI responses.
- Azure Cognitive Search → For storing & retrieving document embeddings.
- Azure AI Search Indexing → Converts uploaded documents into searchable embeddings.
- Streamlit → For building the interactive web app UI.
- Azure SDK for Python → For integrating Azure AI services.
- OpenAI Python SDK → For accessing Azure OpenAI models.
- pdfplumber → For extracting text from PDFs.
- python-docx → For extracting text from Word documents.
- tiktoken → For tokenization.
📂 Azure-RAG-System/
│── 📜 main.py # Streamlit UI & RAG workflow
│── 📜 rag_index2.py # Azure AI Search Indexing logic
│── 📜 requirements.txt # Dependencies & Python libraries
│── 📂 assets/ # Placeholder for future assets
- Handles user interactions via Streamlit UI.
- Uploads documents (PDF/Word), extracts text, and generates embeddings.
- Sends embeddings to Azure AI Search for indexing.
- Accepts user queries and retrieves relevant document content.
- Passes retrieved content to Azure OpenAI GPT-4o to generate responses.
- Manages Azure Cognitive Search indexing.
- Converts documents into vector embeddings.
- Stores indexed content in Azure AI Search for retrieval.
- Contains all dependencies and libraries needed to run the project.
- Install dependencies using:
pip install -r requirements.txt
1️⃣ Clone the Repository:
git clone https://github.com/your-github-username/Azure-RAG-System.git
cd Azure-RAG-System2️⃣ Install Dependencies:
pip install -r requirements.txt3️⃣ Set up Environment Variables:
- Create a
.envfile and add your Azure API Keys & Endpoints:AZURE_SEARCH_SERVICE=your-azure-search-service AZURE_SEARCH_INDEX=user_specific_rag_index2 AZURE_SEARCH_KEY=your-azure-search-key AZURE_OPENAI_API_KEY=your-azure-openai-api-key AZURE_OPENAI_ENDPOINT=your-azure-openai-endpoint AZURE_OPENAI_DEPLOYMENT=your-azure-openai-deployment AZURE_OPENAI_EMBEDDING_MODEL=your-embedding-model-name
4️⃣ Run the Streamlit App:
streamlit run main.py5️⃣ Upload a document and ask questions about its content! 🚀
🔹 Add multi-document retrieval support.
🔹 Improve UI & user experience in Streamlit.
🔹 Implement user authentication for secure access.
🔹 Optimize query ranking in Azure Cognitive Search.
Feel free to fork the repo and submit a pull request! Contributions are always welcome.