Added YouTube Transcript RAG pipeline using Haystack. #291

Open
Sanjjjayyy wants to merge 3 commits into deepset-ai:main from Sanjjjayyy:main

Conversation

@Sanjjjayyy

Description

Added a new cookbook notebook that builds a RAG pipeline over the transcript of any
YouTube video, using Haystack and the free Hugging Face Inference API.

What this cookbook covers

  • Fetching YouTube transcripts using youtube-transcript-api
  • Splitting and indexing using DocumentSplitter and InMemoryDocumentStore
  • Embedding using SentenceTransformers with BAAI/bge-base-en-v1.5
  • Retrieval using InMemoryEmbeddingRetriever
  • Answer generation using Qwen2.5-7B-Instruct via the Hugging Face Inference API
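
The steps listed above can be sketched without any dependencies to show how the data flows. This is only an illustration, not the notebook's code: `split_words` stands in for DocumentSplitter, a toy bag-of-words `embed` stands in for BAAI/bge-base-en-v1.5, `retrieve` stands in for InMemoryEmbeddingRetriever, and the final prompt is what would be sent to the generator. All function names and the sample transcript are hypothetical.

```python
import math
import re
from collections import Counter


def split_words(text: str, length: int = 12, overlap: int = 3) -> list[str]:
    """Split text into overlapping word windows (stand-in for DocumentSplitter)."""
    words = text.split()
    step = length - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + length]))
        if start + length >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would use a sentence embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


# Hypothetical transcript text; the notebook fetches this with youtube-transcript-api.
transcript = (
    "Welcome to the channel. Today we talk about retrieval augmented generation. "
    "A retriever finds relevant chunks in a document store. "
    "Then a language model writes an answer grounded in those chunks. "
    "Thanks for watching and see you next time."
)
question = "What does a retriever find?"

chunks = split_words(transcript, length=12, overlap=3)
context = retrieve(question, chunks, top_k=2)
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\nQuestion: {question}"
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, which is the usual reason for setting it to a non-zero value.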

@Sanjjjayyy requested a review from a team as a code owner on May 26, 2026 16:57
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@@ -0,0 +1,429 @@
{

@kacperlukawski (Member), Jun 9, 2026

I think it would be more in the Haystack spirit to create a custom component that fetches YT transcripts, so that we can then build a pipeline around it.



@@ -0,0 +1,429 @@
{

@kacperlukawski (Member), Jun 9, 2026

It would be better to turn this manual wiring into a pipeline.



@@ -0,0 +1,429 @@
{

@kacperlukawski (Member), Jun 9, 2026

It would also be easier for Haystack users to follow if we just created a pipeline.



@Sanjjjayyy (Author), Jun 9, 2026

Hi @kacperlukawski, thanks for the detailed feedback!

I understand the changes needed:

  1. Wrap the transcript fetching in a custom Haystack component
  2. Rebuild the pipelines the proper Haystack way, using Pipeline

I'll work on these changes and update the PR shortly.

@kacperlukawski self-requested a review on June 9, 2026 15:23