A Google Colab notebook that collects YouTube video transcripts and combines them into a single text file.
- Collect by Channel: Enter a YouTube channel URL/ID to download transcripts from all videos or filter by date
- Collect by Search: Search for videos by keyword, view results with view counts, and select specific videos
- Custom Video List: Enter specific video URLs or IDs directly
- Multi-language Support: Automatically detects and fetches available transcripts (English, Japanese, and more)
- Formatted Output: Exports all transcripts to a single text file with video metadata
- Open in Google Colab: Click the button below to open the notebook
- Get a YouTube Data API Key (free):
  - Go to Google Cloud Console
  - Create a project and enable "YouTube Data API v3"
  - Create an API key under Credentials
- Run the notebook cells in order
++++++++++++++[Video Title | 2024-01-15]
transcript text goes here...
==============END=================
++++++++++++++[Another Video | 2024-01-10]
another transcript text...
==============END=================
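The output layout above can be reproduced with a few lines of Python. This is a minimal sketch, not the notebook's actual code: the names `TranscriptEntry`, `format_entry`, and `combine` are hypothetical, and the two marker strings are copied from the sample output.

```python
from dataclasses import dataclass

# Marker strings copied verbatim from the sample output above.
HEADER_PREFIX = "++++++++++++++"
END_MARKER = "==============END================="


@dataclass
class TranscriptEntry:
    """Hypothetical container for one video's metadata and transcript."""
    title: str
    published: str  # publication date as text, e.g. "2024-01-15"
    text: str


def format_entry(entry: TranscriptEntry) -> str:
    """Render one entry as header line, transcript text, and end marker."""
    header = f"{HEADER_PREFIX}[{entry.title} | {entry.published}]"
    return f"{header}\n{entry.text}\n{END_MARKER}\n"


def combine(entries: list[TranscriptEntry]) -> str:
    """Concatenate all entries into the body of a single text file."""
    return "".join(format_entry(entry) for entry in entries)


if __name__ == "__main__":
    sample = [
        TranscriptEntry("Video Title", "2024-01-15", "transcript text goes here..."),
        TranscriptEntry("Another Video", "2024-01-10", "another transcript text..."),
    ]
    print(combine(sample), end="")
```

Because every entry ends with the same marker line, the combined file can later be split back into individual transcripts by scanning for that marker.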
- Google Colab (recommended) or a Python 3.x environment
- YouTube Data API Key (free, 10,000 units/day quota)
youtube-transcript-api>=1.0.0
google-api-python-client
python-dateutil
- Enter a YouTube channel URL (supports multiple formats):
  - https://www.youtube.com/@ChannelHandle
  - https://www.youtube.com/channel/UCxxxxxx
  - Channel name (looked up via search)
- Choose to download:
  - All videos from the channel
  - Only videos published after a specific date
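To illustrate how the accepted channel input formats might be told apart, here is a minimal sketch. The function name `classify_channel_input` and its return labels are hypothetical and not taken from the notebook. The bare-ID pattern (`UC` followed by 22 characters) is an assumption based on how channel IDs commonly look.

```python
import re
from urllib.parse import urlparse

# Assumption: a bare channel ID is "UC" plus 22 URL-safe characters.
CHANNEL_ID_RE = re.compile(r"UC[0-9A-Za-z_-]{22}")


def classify_channel_input(value: str) -> tuple[str, str]:
    """Return (kind, identifier); kind is 'handle', 'channel_id', 'search', or 'unknown'."""
    value = value.strip()
    if "youtube.com/" in value:
        # Add a scheme if the user pasted the URL without one.
        url = value if "://" in value else "https://" + value
        parts = [p for p in urlparse(url).path.split("/") if p]
        if parts and parts[0].startswith("@"):
            return ("handle", parts[0][1:])
        if len(parts) >= 2 and parts[0] == "channel":
            return ("channel_id", parts[1])
        return ("unknown", value)
    if CHANNEL_ID_RE.fullmatch(value):
        return ("channel_id", value)
    if value.startswith("@"):
        return ("handle", value[1:])
    # Anything else is treated as a channel name to be resolved by search.
    return ("search", value)


if __name__ == "__main__":
    for text in (
        "https://www.youtube.com/@ChannelHandle",
        "https://www.youtube.com/channel/UCxxxxxx",
        "Some Channel Name",
    ):
        print(text, "->", classify_channel_input(text))
```

Resolving a handle or an ID costs 1 quota unit for the channel info call, whereas falling back to a name search costs 100 units, so it pays to recognize the direct formats first.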
- Enter a search keyword
- View results with title, channel, date, and view count
- Select videos:
  - Select all results
  - Select specific videos by number (e.g., 1,3,5,7-10)
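A selection string such as 1,3,5,7-10 can be expanded into result numbers along the following lines. This is a sketch only: `parse_selection` is a hypothetical helper, and accepting the literal word `all` for "select all results" is an assumption about the input format.

```python
def parse_selection(spec: str, total: int) -> list[int]:
    """Expand a selection like '1,3,5,7-10' into sorted 1-based result numbers.

    Numbers outside 1..total are ignored; duplicates are removed.
    """
    if spec.strip().lower() == "all":  # assumed keyword for "select all results"
        return list(range(1, total + 1))

    chosen: set[int] = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        if start > end:  # tolerate reversed ranges such as "10-7"
            start, end = end, start
        chosen.update(n for n in range(start, end + 1) if 1 <= n <= total)
    return sorted(chosen)


if __name__ == "__main__":
    print(parse_selection("1,3,5,7-10", total=10))
```

Non-numeric tokens raise `ValueError` from `int()`, which a caller could catch to prompt the user again.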
Enter specific video URLs or IDs directly to collect their transcripts.
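As an illustration of accepting either a URL or a bare ID, the sketch below normalizes user input to a video ID. The helper name `extract_video_id`, the set of URL shapes it recognizes, and the 11-character ID pattern are assumptions, not a description of the notebook's own parsing.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
VIDEO_ID_RE = re.compile(r"[0-9A-Za-z_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a URL or bare ID, or None if none is found."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


if __name__ == "__main__":
    # "abcdefghijk" is a placeholder ID, not a real video.
    print(extract_video_id("https://www.youtube.com/watch?v=abcdefghijk&t=42s"))
```

Returning `None` rather than raising lets a caller collect all unrecognized entries and report them together.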
| API Call | Cost |
|---|---|
| Search | 100 units |
| Channel info | 1 unit |
| Video list | 1 unit |
| Video statistics | 1 unit |
Daily quota: 10,000 units (sufficient for ~100 searches or thousands of video info requests)
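The arithmetic behind that estimate can be made explicit with a small calculator. The unit costs and the daily quota come from the table above; the dictionary keys and function names are hypothetical.

```python
# Unit costs from the table above; the key names are illustrative.
COSTS = {
    "search": 100,
    "channel_info": 1,
    "video_list": 1,
    "video_statistics": 1,
}
DAILY_QUOTA = 10_000


def estimate_units(calls: dict[str, int]) -> int:
    """Total quota units for a planned number of calls of each kind."""
    return sum(COSTS[name] * count for name, count in calls.items())


def remaining_units(calls: dict[str, int], quota: int = DAILY_QUOTA) -> int:
    """Units left after the planned calls; negative means over quota."""
    return quota - estimate_units(calls)


if __name__ == "__main__":
    plan = {"search": 3, "channel_info": 1, "video_list": 5, "video_statistics": 5}
    print(estimate_units(plan), "units used,", remaining_units(plan), "remaining")
```

Since one search costs as much as 100 metadata calls, searches dominate the budget: 100 searches alone use the full 10,000 units.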
YouTube-Transcript-Collector/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── YouTube_Transcript_Collector.ipynb # Main notebook
├── CLAUDE.md # Development guidelines
└── docs/
    ├── requirement.md # Requirements specification
    ├── architecture.md # System architecture
    ├── api_reference.md # API reference
    ├── testing_guide.md # Testing guide
    └── changelog.md # Change history
- Videos with disabled captions cannot be processed
- Some videos may not have transcripts available
- API quota limits apply (10,000 units/day)
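One way to live with the first two limitations is to skip videos whose transcript cannot be fetched and report them at the end, rather than aborting the whole run. The sketch below shows that pattern under stated assumptions: `TranscriptUnavailable` and `collect_transcripts` are hypothetical names, and `fetch` stands in for whatever function wraps the transcript library and raises that error when captions are disabled or missing.

```python
from typing import Callable, Iterable


class TranscriptUnavailable(Exception):
    """Hypothetical error: captions are disabled or no transcript exists."""


def collect_transcripts(
    video_ids: Iterable[str],
    fetch: Callable[[str], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch each transcript, skipping videos that have none.

    Returns (collected, skipped): transcripts keyed by video ID, and the
    reason each skipped video could not be processed.
    """
    collected: dict[str, str] = {}
    skipped: dict[str, str] = {}
    for video_id in video_ids:
        try:
            collected[video_id] = fetch(video_id)
        except TranscriptUnavailable as exc:
            skipped[video_id] = str(exc)
    return collected, skipped


if __name__ == "__main__":
    def fake_fetch(video_id: str) -> str:
        # Stand-in for a real fetcher; the IDs here are placeholders.
        if video_id == "no_captions":
            raise TranscriptUnavailable("captions disabled")
        return f"transcript of {video_id}"

    done, missing = collect_transcripts(["video_a", "no_captions"], fake_fetch)
    print(done)
    print(missing)
```

Passing the fetcher in as a parameter keeps the loop testable without network access.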
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- youtube-transcript-api for transcript fetching
- Google YouTube Data API for video metadata