A modular, production-ready Retrieval-Augmented Generation (RAG) backend that scrapes website content using Scrapling, chunks and embeds the text, and answers questions with an LLM (Groq API). Comes with a React frontend for seamless QA over any website.
- Web Scraping: Extract website content using Scrapling.
- Text Chunking: Efficiently splits large texts for better retrieval.
- Embeddings & Vector Store: Uses Sentence Transformers and FAISS for semantic search.
- LLM Integration: Leverages Groq API for context-grounded answers.
- CORS-ready FastAPI Backend: Plug-and-play with Next.js or React frontends.
- React UI: Quickstart frontend in /web-rag-ui.
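To give a feel for the chunking step, here is a minimal sketch of word-based chunking with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for illustration; the actual logic in utils.py may differ.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so a passage cut at a chunk
    boundary still appears in full in one of the neighbouring chunks.
    Hypothetical helper, not taken from the project source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Smaller chunks tend to give more precise retrieval, while the overlap reduces the chance that an answer is split across two chunks.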
- Clone the Repository
    git clone https://github.com/yourusername/Website-rag-using-scrapling.git
    cd Website-rag-using-scrapling
- Create Virtual Environment & Install Dependencies
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  (You may need to manually install: fastapi, scrapling, sentence-transformers, faiss-cpu, python-dotenv, groq)
- Set up Environment Variables
  - Copy .env.example to .env and add your Groq API Key:
      GROQ_API_KEY=your_groq_api_key
- Run the Backend
    uvicorn app:app --reload
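If the backend starts but requests to the LLM fail, a missing API key is a likely cause. The sketch below shows one way to fail early with a clear message. The helper name `require_env` is an assumption for illustration and is not part of the project.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. Hypothetical helper, not taken from the project source.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `load_dotenv()` from python-dotenv first, then `require_env("GROQ_API_KEY")`, would read the key from the .env file created in the previous step.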
- Navigate to UI Folder
    cd web-rag-ui
- Install Dependencies
    npm install
- Start the Frontend
    npm start
  Open http://localhost:3000 in your browser.
- Start both backend and frontend as above.
- In the web UI, enter a target website URL and your question.
- The backend will:
  - Scrape the website,
  - Chunk and embed the content,
  - Retrieve relevant context,
  - Generate an answer using the LLM.
- The answer appears in the UI, sourced only from the provided site.
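The retrieve-then-generate steps above can be sketched in a few lines. This is a toy version under stated assumptions: word counts with cosine similarity stand in for Sentence Transformers embeddings and the FAISS index, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical. It only illustrates how the retrieved context ends up in the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that asks the LLM to answer from the context only."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "Our office is open Monday to Friday from 9am to 5pm.",
        "We ship orders worldwide within three business days.",
        "Contact support by email for refunds.",
    ]
    question = "When is the office open?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

Restricting the prompt to retrieved chunks is what keeps the answer grounded in the scraped site rather than in the model's general knowledge.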
Contributions are welcome! Please:
- Fork the repo.
- Create your feature branch:
    git checkout -b feature/your-feature
- Commit your changes:
    git commit -am 'Add new feature'
- Push to the branch:
    git push origin feature/your-feature
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
.
├── app.py # FastAPI entrypoint
├── scraper.py # Web scraping logic with Scrapling
├── utils.py # Text chunking utilities
├── embedder.py # Embedding & vector store
├── llm.py # LLM prompt & completion
├── web-rag-ui/ # React frontend
│ ├── package.json
│ └── ...
└── README.md
Built with ❤️ for accessible, modular, and production-ready RAG pipelines.
🔗 GitHub Repo: https://github.com/Tharanika-R-Git/Website-rag-using-scrapling
