🕸️ Website RAG using Scrapling

A modular, production-ready Retrieval-Augmented Generation (RAG) backend that scrapes website content using Scrapling, chunks and embeds the text, and answers questions with an LLM (Groq API). Comes with a React frontend for seamless QA over any website.

🌟 Features

Web Scraping: Extract website content using Scrapling.
Text Chunking: Efficiently splits large texts for better retrieval.
Embeddings & Vector Store: Uses Sentence Transformers and FAISS for semantic search.
LLM Integration: Leverages Groq API for context-grounded answers.
CORS-ready FastAPI Backend: Plug-and-play with Next.js or React frontends.
React UI: Quickstart frontend in /web-rag-ui.

Output

🚀 Installation

Backend (Python, FastAPI)

Clone the Repository

git clone https://github.com/yourusername/Website-rag-using-scrapling.git
cd Website-rag-using-scrapling

Create Virtual Environment & Install Dependencies
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
(You may need to manually install: fastapi, scrapling, sentence-transformers, faiss-cpu, python-dotenv, groq)
Set up Environment Variables
- Copy .env.example to .env and add your Groq API Key:
```
GROQ_API_KEY=your_groq_api_key
```
Run the Backend
```
uvicorn app:app --reload
```

Frontend (React)

Navigate to UI Folder
```
cd web-rag-ui
```
Install Dependencies
```
npm install
```
Start the Frontend
```
npm start
```
Open http://localhost:3000 in your browser.

🛠️ Usage

Start both backend and frontend as above.
In the web UI, enter a target website URL and your question.
The backend will:
- Scrape the website,
- Chunk and embed the content,
- Retrieve relevant context,
- Generate an answer using the LLM.
The answer appears in the UI, sourced only from the provided site.

🤝 Contributing

Contributions are welcome! Please:

Fork the repo.
Create your feature branch: git checkout -b feature/your-feature
Commit your changes: git commit -am 'Add new feature'
Push to the branch: git push origin feature/your-feature
Open a Pull Request.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

📂 Project Structure

.
├── app.py               # FastAPI entrypoint
├── scraper.py           # Web scraping logic with Scrapling
├── utils.py             # Text chunking utilities
├── embedder.py          # Embedding & vector store
├── llm.py               # LLM prompt & completion
├── web-rag-ui/          # React frontend
│   ├── package.json
│   └── ...
└── README.md

Built with ❤️ for accessible, modular, and production-ready RAG pipelines.

License

This project is licensed under the MIT License.

🔗 GitHub Repo: https://github.com/Tharanika-R-Git/Website-rag-using-scrapling

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🕸️ Website RAG using Scrapling

🌟 Features

Output

🚀 Installation

Backend (Python, FastAPI)

Frontend (React)

🛠️ Usage

🤝 Contributing

📄 License

📂 Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.vscode		.vscode
__pycache__		__pycache__
web-rag-ui		web-rag-ui
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
embedder.py		embedder.py
gradioui.py		gradioui.py
llm.py		llm.py
output.jpg		output.jpg
requirements.txt		requirements.txt
scraper.py		scraper.py
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

🕸️ Website RAG using Scrapling

🌟 Features

Output

🚀 Installation

Backend (Python, FastAPI)

Frontend (React)

🛠️ Usage

🤝 Contributing

📄 License

📂 Project Structure

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages