Skip to content

Adityag009/Azure-RAG-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Azure RAG System 🧠🔍

Overview

This project implements a Retrieval-Augmented Generation (RAG) system using Azure AI Services. The system allows users to:

  1. Upload a document (PDF or Word).
  2. Convert it into embeddings and store it in Azure AI Search.
  3. Ask questions related to the document.
  4. Retrieve relevant document chunks using Azure Cognitive Search.
  5. Generate responses using Azure OpenAI (GPT-4o).

The project is built with Streamlit for the user interface and integrates Azure OpenAI & Azure Cognitive Search for document indexing and retrieval.


🚀 Features

Document Upload & Processing - Upload PDF or Word files, extract text, and convert it into embeddings.
Vector-Based Search - Store document embeddings in Azure AI Search for similarity search.
Query Handling - Users can ask questions, and the system retrieves relevant document content.
AI-Powered Responses - Uses Azure OpenAI GPT-4o to generate answers based on retrieved data.
Scalability - Uses Azure AI Services for efficient document handling.


🛠️ Services & Tools Used

Here are the core services and tools used in this project:

🟢 Azure Services

  • Azure OpenAI (GPT-4o) → For generating AI responses.
  • Azure Cognitive Search → For storing & retrieving document embeddings.
  • Azure AI Search Indexing → Converts uploaded documents into searchable embeddings.

📜 Libraries & Frameworks

  • Streamlit → For building the interactive web app UI.
  • Azure SDK for Python → For integrating Azure AI services.
  • OpenAI Python SDK → For accessing Azure OpenAI models.
  • pdfplumber → For extracting text from PDFs.
  • python-docx → For extracting text from Word documents.
  • tiktoken → For tokenization.

📂 Project Structure & Important Files

📂 Azure-RAG-System/
│── 📜 main.py            # Streamlit UI & RAG workflow
│── 📜 rag_index2.py      # Azure AI Search Indexing logic
│── 📜 requirements.txt   # Dependencies & Python libraries
│── 📂 assets/            # Placeholder for future assets

🔹 main.py

  • Handles user interactions via Streamlit UI.
  • Uploads documents (PDF/Word), extracts text, and generates embeddings.
  • Sends embeddings to Azure AI Search for indexing.
  • Accepts user queries and retrieves relevant document content.
  • Passes retrieved content to Azure OpenAI GPT-4o to generate responses.

🔹 rag_index2.py

  • Manages Azure Cognitive Search indexing.
  • Converts documents into vector embeddings.
  • Stores indexed content in Azure AI Search for retrieval.

🔹 requirements.txt

  • Contains all dependencies and libraries needed to run the project.
  • Install dependencies using:
    pip install -r requirements.txt

🚀 How to Run the Project Locally

1️⃣ Clone the Repository:

git clone https://github.com/your-github-username/Azure-RAG-System.git
cd Azure-RAG-System

2️⃣ Install Dependencies:

pip install -r requirements.txt

3️⃣ Set up Environment Variables:

  • Create a .env file and add your Azure API Keys & Endpoints:
    AZURE_SEARCH_SERVICE=your-azure-search-service
    AZURE_SEARCH_INDEX=user_specific_rag_index2
    AZURE_SEARCH_KEY=your-azure-search-key
    AZURE_OPENAI_API_KEY=your-azure-openai-api-key
    AZURE_OPENAI_ENDPOINT=your-azure-openai-endpoint
    AZURE_OPENAI_DEPLOYMENT=your-azure-openai-deployment
    AZURE_OPENAI_EMBEDDING_MODEL=your-embedding-model-name
    

4️⃣ Run the Streamlit App:

streamlit run main.py

5️⃣ Upload a document and ask questions about its content! 🚀


🛠️ Future Enhancements

🔹 Add multi-document retrieval support.
🔹 Improve UI & user experience in Streamlit.
🔹 Implement user authentication for secure access.
🔹 Optimize query ranking in Azure Cognitive Search.


📢 Contributing

Feel free to fork the repo and submit a pull request! Contributions are always welcome.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors