RAG implementation by simplysandeepp · Pull Request #218 · endee-io/endee

simplysandeepp · 2026-04-18T12:28:36Z

No description provided.

- Add RAG application with Streamlit UI for document ingestion and querying - Implement Endee vector database integration for semantic search - Add Groq LLM integration for AI-powered answer generation - Create document chunking and embedding pipeline using sentence-transformers - Add environment configuration with .env support for API keys and database URL - Include comprehensive README with setup instructions and architecture overview - Add requirements.txt with all necessary dependencies - Create ingest.py module for document processing and vector storage - Create query.py module for semantic search and retrieval functionality - Add todo tracking file for project management

…tor-db

… including theme injection and improved document ingestion process

Main

Copilot

Pull request overview

Adds a Streamlit-based RAG app that ingests text documents into an Endee vector index and uses Groq to generate answers, plus Render deployment configuration and supporting scripts/docs.

Changes:

Introduces a Streamlit UI (project-RAG/app.py) for ingestion, retrieval (Endee), and generation (Groq).
Adds standalone ingestion/query utilities and Python dependencies under project-RAG/.
Adds a Render Blueprint (render.yaml) and extensive project documentation (project-RAG/README.md).

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 14 comments.

Show a summary per file

File	Description
`render.yaml`	Defines two Render web services: Endee (Docker image) and the Streamlit RAG app.
`project-RAG/app.py`	Main Streamlit RAG UI with Endee connectivity, ingestion, vector search, and Groq answering.
`project-RAG/ingest.py`	Standalone PDF ingestion script (currently does not match Endee API).
`project-RAG/query.py`	Standalone query script (currently incompatible with Endee search response format).
`project-RAG/requirements.txt`	Declares Python dependencies for the app/scripts (missing `pypdf`).
`project-RAG/README.md`	Full system documentation, architecture diagrams, and usage/deploy guide (some mismatches with code).
`project-RAG/.gitignore`	Ignores local env/caches for the Python app.
`project-RAG/.env.example`	Example environment variables for Groq + Endee configuration.
`.gitignore`	Adds ignores for `project-RAG` artifacts including `test_insert.py` (which is also committed).
`project-RAG/test_insert.py`	Local debug script for vector insertion testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-18T12:35:27Z

+
+    def create_query_embedding(self, question):
+        """Convert question to embedding"""
+        embedding = self.model.encode(question)


For cosine similarity, the query embedding should be normalized the same way as stored document embeddings. create_query_embedding() uses self.model.encode(question) without normalization, so similarity scoring can be skewed. Use normalize_embeddings=True (or L2-normalize the returned vector) for query embeddings.

Suggested change

embedding = self.model.encode(question)

embedding = self.model.encode(question, normalize_embeddings=True)

Copilot · 2026-04-18T12:35:27Z

+def search_similar(question: str, model, top_k: int = 3):
+    """Search Endee and return source objects."""
+    query_embedding = model.encode([question])[0]
+
+    try:
+        response = requests.post(
+            f"{ENDEE_URL}/api/v1/index/{INDEX_NAME}/search",
+            json={"vector": query_embedding.astype(np.float32).tolist(), "k": top_k},
+            headers=endee_headers(content_type_json=True),


search_similar() builds the query vector with model.encode([question])[0] but does not normalize it, while ingestion uses normalize_embeddings=True. For Endee’s cosine metric (implemented as inner product on unit vectors), the query should also be unit-normalized; otherwise retrieval scores/ranking can be wrong. Encode the query with normalization (or normalize the vector before sending).

Copilot · 2026-04-18T12:35:28Z

+    with st.container(border=True):
+        st.subheader("Document Upload")
+        uploaded_file = st.file_uploader("Upload a .txt file", type=["txt"])
+
+        if uploaded_file:
+            file_text = uploaded_file.getvalue().decode("utf-8", errors="replace")
+            estimated_chunks = len(chunk_text(file_text, settings["chunk_size"]))
+
+            c1, c2, c3 = st.columns(3)
+            with c1:
+                st.caption(f"File: {uploaded_file.name}")
+            with c2:
+                st.caption(f"Characters: {len(file_text)}")
+            with c3:
+                st.caption(f"Estimated Chunks: {estimated_chunks}")
+
+            if st.button("Ingest Document", type="primary", disabled=not endee_available):
+                model = load_embedding_model()
+                progress = st.progress(0)


create_index()/delete_index() helpers exist, but the UI never calls create_index() proactively (only on a very specific insert error). If the index doesn’t already exist, the first ingestion will fail and there’s no in-app way to initialize it, despite README instructions. Add an explicit “Initialize Index” action (e.g., a sidebar button) that calls create_index() and surfaces the result to the user.

Copilot · 2026-04-18T12:35:28Z

+### What makes this production-grade?
+
+- **Semantic search** instead of keyword matching — the system finds the *meaning* of a question, not just matching words.
+- **Chunked document ingestion** — large documents are split into overlapping windows and each chunk is independently searchable.


The README claims chunks are created as “overlapping windows”, but the implemented chunk_text() in app.py uses non-overlapping range(0, len(words), chunk_size). Either implement overlap (and document the stride/overlap) or update the README to match the current behavior.

Suggested change

- **Chunked document ingestion** — large documents are split into overlapping windows and each chunk is independently searchable.

- **Chunked document ingestion** — large documents are split into fixed-size, non-overlapping chunks and each chunk is independently searchable.

Copilot · 2026-04-18T12:35:28Z

+### Vector Storage Schema
+
+Each vector inserted into Endee carries:
+- **`id`**: A unique string combining the document name, Unix timestamp, and chunk index (e.g., `contract.txt-1710000000-12`). This ensures idempotent re-ingestion does not create duplicate IDs.


The README states the chunk ID scheme “ensures idempotent re-ingestion does not create duplicate IDs”, but app.py generates IDs with a timestamp-based seed (doc_name + current time). Re-ingesting the same file will create a new set of IDs rather than being idempotent. Update the README to reflect the actual behavior, or change the ID scheme to be deterministic per (doc, chunk_index) if idempotency is required.

Suggested change

- **`id`**: A unique string combining the document name, Unix timestamp, and chunk index (e.g., `contract.txt-1710000000-12`). This ensures idempotent re-ingestion does not create duplicate IDs.

- **`id`**: A unique string combining the document name, Unix timestamp, and chunk index (e.g., `contract.txt-1710000000-12`). This guarantees uniqueness across ingestion runs, but it is **not** idempotent: re-ingesting the same file will generate a new set of IDs because the timestamp changes. If idempotent re-ingestion is required, the ID scheme should be deterministic per document and chunk index instead.

Copilot · 2026-04-18T12:35:29Z

+        response = requests.post(url, json=payload)
+
+        if response.status_code == 200:
+            results = response.json()
+            return results
+        else:
+            print(f"✗ Search failed: {response.text}")
+            return None


/api/v1/index/{index}/search returns a MessagePack payload (Content-Type: application/msgpack) in this repo’s server implementation, but this code calls response.json(), which will raise a JSON decode error on successful responses. Decode response.content with msgpack.unpackb(...) (and import msgpack) the same way app.py does.

Copilot · 2026-04-18T12:35:29Z

+    seed = f"{doc_name}-{int(time.time())}"
+
+    vectors = []
+    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
+        meta = json.dumps({"doc": doc_name, "text": chunk})
+        vectors.append(
+            {
+                "id": f"{seed}-{i}",
+                "vector": embedding.astype(np.float32).tolist(),
+                "meta": meta,


Vector IDs are derived from int(time.time()) (seconds) plus the chunk index. Ingesting the same filename twice within the same second can generate duplicate IDs and overwrite/merge previous vectors unexpectedly. Use a higher-resolution or collision-resistant seed (e.g., time.time_ns() or a UUID) to avoid accidental ID reuse.

Copilot · 2026-04-18T12:35:30Z

+
+# Create a simple test vector
+text = "This is a test"
+embedding = model.encode([text])[0]


This debug script generates embeddings without normalization, but the server’s cosine metric assumes unit-normalized vectors. Consider using normalized embeddings here as well so insert/search behavior matches the main app and avoids misleading test results.

Suggested change

embedding = model.encode([text])[0]

embedding = model.encode([text], normalize_embeddings=True)[0]

Copilot · 2026-04-18T12:35:30Z

+        """Convert text chunks to embeddings"""
+        embeddings = self.model.encode(chunks, show_progress_bar=True)


The Endee cosine metric implementation assumes vectors are normalized (cosine distance is treated as inner product on unit vectors). create_embeddings() currently uses model.encode(chunks, show_progress_bar=True) without normalization, so inserted vectors may not be unit-length and cosine similarity results will be incorrect. Encode with normalization (or explicitly L2-normalize the vectors before insert).

Suggested change

"""Convert text chunks to embeddings"""

embeddings = self.model.encode(chunks, show_progress_bar=True)

"""Convert text chunks to normalized embeddings"""

embeddings = self.model.encode(

chunks,

show_progress_bar=True,

normalize_embeddings=True

)

Copilot · 2026-04-18T12:35:30Z

+### Ingesting a Document
+
+1. Ensure Endee is running and the **Vector DB Connected** indicator is green.
+2. Click **Initialize Index** on first use (or if the index was lost).


The README instructs users to click an Initialize Index button (and troubleshooting references it), but the Streamlit UI in app.py doesn’t expose any such control. Either add the button/workflow in the app, or remove/update these README steps so users aren’t blocked by missing UI.

Suggested change

2. Click **Initialize Index** on first use (or if the index was lost).

2. The app uses the configured Endee index automatically, so once the connection is healthy you can proceed directly to upload.

simplysandeepp

Retrieval-Augmented Generation (RAG) Implementation with Endee Vector DB

Successfully designed and implemented a Retrieval-Augmented Generation (RAG) pipeline using Endee Vector Database to enhance response accuracy and contextual relevance.

Key Highlights

Integrated vector-based semantic search for efficient document retrieval.
Implemented embedding generation pipeline for indexing structured and unstructured data.
Optimized query flow to retrieve the most relevant context before generation.
Improved response quality by grounding outputs in retrieved knowledge.
Ensured scalable and low-latency retrieval using Endee's vector indexing capabilities.

Outcome

Achieved significantly higher factual accuracy in generated responses.
Reduced hallucinations by anchoring outputs to real data.
Built a modular and extensible RAG architecture suitable for production use.

Tech Stack

Endee Vector DB
Embedding Models
LLM Integration (RAG pipeline)
Backend API Layer

Status

Completed and validated with successful end-to-end testing.

simplysandeepp and others added 12 commits March 12, 2026 21:37

Add RAG application with Endee and Groq

58097ed

feat: enhance environment configuration and add Render deployment setup

aae77b9

fix(render.yaml): update service type and configuration for endee-vec…

71a5ded

…tor-db

refactor: remove outdated deployment documentation and streamline README

b9a4ea7

refactor: improve code structure and enhance readability in app.py

1a63ec3

feat: enhance UI and functionality of RAG application with Streamlit,…

cd5b320

… including theme injection and improved document ingestion process

Merge pull request #1 from simplysandeepp/main

8309ac5

Main

Update README.md

7d5db1d

Update README.md

c169e86

Update README.md

6f5407d

Merge branch 'endee-io:master' into master

29091d6

Copilot AI review requested due to automatic review settings April 18, 2026 12:28

Copilot started reviewing on behalf of simplysandeepp April 18, 2026 12:28 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

simplysandeepp commented Apr 18, 2026

View reviewed changes

hemant-endee closed this Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RAG implementation #218

RAG implementation #218
simplysandeepp wants to merge 12 commits intoendee-io:masterfrom
simplysandeepp:master

simplysandeepp commented Apr 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

simplysandeepp left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	embedding = self.model.encode(question)
	embedding = self.model.encode(question, normalize_embeddings=True)

	- Chunked document ingestion — large documents are split into overlapping windows and each chunk is independently searchable.
	- Chunked document ingestion — large documents are split into fixed-size, non-overlapping chunks and each chunk is independently searchable.

	embedding = model.encode([text])[0]
	embedding = model.encode([text], normalize_embeddings=True)[0]

		"""Convert text chunks to embeddings"""
		embeddings = self.model.encode(chunks, show_progress_bar=True)

	2. Click Initialize Index on first use (or if the index was lost).
	2. The app uses the configured Endee index automatically, so once the connection is healthy you can proceed directly to upload.

Conversation

simplysandeepp commented Apr 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

simplysandeepp left a comment

Choose a reason for hiding this comment

Retrieval-Augmented Generation (RAG) Implementation with Endee Vector DB

Key Highlights

Outcome

Tech Stack

Status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants