Fully automated Reddit story scraping → AI voice synthesis → viral short-form video generation
Features • Architecture • Installation • Usage • Tech Stack
ViralContent Factory is an end-to-end automated content generation system that transforms Reddit stories into professionally edited, viral-ready short-form videos for TikTok, YouTube Shorts, and Instagram Reels. The pipeline handles everything from content discovery to final video rendering with zero manual intervention.
- Fully Autonomous: Set it and forget it. The system runs daily via scheduled tasks
- AI-Powered Intelligence: Multi-provider LLM router with automatic failover across 5+ AI services
- Production-Ready: Includes failover systems, cold storage backups, and email alerting
- Optimized Performance: Multi-threaded rendering, smart caching, and resource management
- Scalable Architecture: Modular phase-based design for easy extension and maintenance
- Smart LLM Routing: Automatic failover between Groq, Cerebras, Gemini, HuggingFace, and OpenRouter
- Multi-Source Scraping: Waterfall system across 10+ high-engagement subreddits (AITA, TIFU, TrueOffMyChest, etc.)
- Smart Filtering:
  - Language detection (English-only)
  - Optimal word count (120-200 words for 60-second videos)
  - Duplicate prevention via persistent database
  - Automatic removal of deleted/removed posts
- AI Enhancement:
  - Multi-provider LLM router with automatic quota management
  - Gender detection for voice matching (fast models)
  - Viral hook generation with creative reasoning (strong models)
  - Slang/acronym normalization (AITA → "Am I the jerk", etc.)
- Failover System: Falls back to local cold storage if all live sources fail
- Upload Automation: YouTube and Instagram automation modules (in development)
- Edge TTS Integration: Microsoft's neural voices for natural-sounding narration
- Dynamic Voice Selection: Gender-matched voices (3 female variants, 1 male)
- Word-Level Timing: Precise timestamp extraction for perfect subtitle synchronization
- Fallback Mechanisms: Sentence-level heuristics if word boundaries fail
- 9:16 Vertical Format: Optimized for mobile-first platforms
- Dynamic Background Selection: Random gameplay footage (Minecraft, GTA 5)
- Animated Subtitles:
  - Impact font with stroke for maximum readability
  - 2-word chunks with pop-in animations
  - Mathematically synced to audio timestamps
- Smart Cropping: Automatic center-crop from 16:9 to 9:16
- Random Start Points: Prevents repetitive background footage
- Multi-Provider Architecture: Supports 5 AI providers with automatic failover
- Intelligent Task Routing:
  - Fast models (Gemini, HuggingFace, OpenRouter) for classification and tagging
  - Strong models (Groq, Cerebras) for creative writing and reasoning
- Quota Management: Automatically detects rate limits and switches providers
- Error Recovery: Retry logic with exponential backoff
- Cost Optimization: Routes cheap tasks to free tiers, expensive tasks to premium models
- Automated Cleanup: Removes temporary files after each run
- Batch Management: Collects 7 videos before triggering upload alert
- Email Notifications: Gmail SMTP alerts when batch is ready
- Sanitized Filenames: OS-safe naming with ID-based uniqueness
- Error Handling: Comprehensive try/except blocks with detailed logging
- Video Path Utilities: Batch processing helpers for upload automation
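The "Sanitized Filenames" item can be illustrated with a short sketch. This is not the project's actual code: the function name `safe_filename`, the 60-character cap, and the `.mp4` suffix are assumptions made for illustration. The idea is to replace characters that are unsafe on common filesystems and append the Reddit post ID so that two stories with similar titles cannot collide.

```python
import re


def safe_filename(title: str, post_id: str, max_len: int = 60) -> str:
    """Build an OS-safe video filename; the post ID suffix keeps names unique."""
    # Collapse every run of characters outside letters, digits, "-" and "_" into "_".
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    # Cap the title part so the full path stays short (60 is an assumed limit).
    cleaned = cleaned[:max_len].rstrip("_") or "untitled"
    return f"{cleaned}_{post_id}.mp4"


print(safe_filename('AITA for "borrowing" my sister\'s car?', "1abc9z"))
# prints AITA_for_borrowing_my_sister_s_car_1abc9z.mp4
```

Keeping the ID at the end means the title can be truncated freely without losing uniqueness.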
    ┌─────────────────────────────────────────────────────────────┐
    │                 MAIN PIPELINE ORCHESTRATOR                  │
    │                     (main_pipeline.py)                      │
    └────────────┬────────────────────────────────────────────────┘
                 │
         ┌───────┴────────┐
         │                │
         ▼                ▼
    ┌─────────┐      ┌─────────┐
    │ Phase 1 │─────▶│ Phase 2 │
    │ Scraper │      │  Audio  │
    └────┬────┘      └────┬────┘
         │                │
         │                ▼
         │           ┌─────────┐
         │           │ Phase 3 │
         └──────────▶│  Video  │
                     └────┬────┘
                          │
                          ▼
                  ┌───────────────┐
                  │   Cleanup &   │
                  │ Notification  │
                  └───────┬───────┘
                          │
                          ▼
                  ┌───────────────┐
                  │    Upload     │
                  │  Automation   │
                  └───────────────┘
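The flow in the diagram can be sketched as a small orchestration function. This is a sketch under assumptions, not the contents of main_pipeline.py: the name `run_pipeline` and the four callables passed into it are hypothetical stand-ins for the real phase modules. It shows the ordering (Phase 1, then Phase 2, then Phase 3 using the output of both) and that cleanup runs whether or not a phase fails.

```python
from typing import Callable, Optional


def run_pipeline(
    scrape: Callable[[], Optional[dict]],
    synthesize: Callable[[dict], dict],
    render: Callable[[dict, dict], str],
    cleanup: Callable[[], None],
) -> Optional[str]:
    """Run the phases in order and always clean up temporary files."""
    try:
        story = scrape()               # Phase 1: select and process a story
        if story is None:
            return None                # no usable story in this run
        audio = synthesize(story)      # Phase 2: narration plus timestamps
        return render(story, audio)    # Phase 3: path of the final video
    finally:
        cleanup()                      # runs on success and on failure
```

Passing the phases in as callables keeps the sketch testable with stubs; the real orchestrator may simply import the phase modules directly.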
    ViralContent-Factory/
    ├── main_pipeline.py        # Orchestrator - coordinates all phases
    ├── phase1.py               # Content acquisition & AI processing
    ├── phase2.py               # Audio synthesis & timestamp extraction
    ├── phase3.py               # Video composition & rendering
    ├── llm_router.py           # Multi-provider LLM failover system
    ├── yt_downloader.py        # Background footage downloader
    ├── reminder.py             # Batch management & email alerts
    ├── yt_automation.py        # YouTube upload automation
    ├── insta_automation.py     # Instagram upload automation (WIP)
    ├── get_videopaths.py       # Video path utility for batch processing
    ├── run_factory.bat         # Windows Task Scheduler entry point
    ├── requirements.txt        # Python dependencies
    ├── scripts.json            # Persistent story database
    ├── downloads/              # Background video assets
    ├── reels/                  # Final rendered videos
    └── ready_to_upload/        # Batched videos ready for upload
| Category | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core runtime |
| AI/LLM | Multi-Provider Router | Groq, Cerebras, Gemini, HuggingFace, OpenRouter |
| Voice Synthesis | Edge-TTS | Neural text-to-speech |
| Video Processing | MoviePy 1.0.3 | Compositing & rendering |
| Image Processing | ImageMagick | Text rendering backend |
| Web Scraping | Requests | Reddit API interaction |
| NLP | langdetect | Language filtering |
| Video Download | yt-dlp | Background footage acquisition |
| Email | smtplib | Gmail notifications |
| Environment | python-dotenv | Secure credential management |
Required system dependencies:
- Python 3.11 or higher
- FFmpeg (for audio/video processing)
- ImageMagick (for subtitle rendering)
- Deno or Node.js (for yt-dlp)

Clone the repository:

    git clone https://github.com/indiser/viralcontent-factory.git
    cd viralcontent-factory

Install the Python dependencies:

    pip install -r requirements.txt

Windows (via winget):

    winget install Gyan.FFmpeg
    winget install ImageMagick.ImageMagick
    winget install DenoLand.Deno

macOS (via Homebrew):

    brew install ffmpeg imagemagick deno

Linux (Ubuntu/Debian):

    sudo apt update
    sudo apt install ffmpeg imagemagick
    curl -fsSL https://deno.land/install.sh | sh

Create a .env file in the project root:
    # LLM API Keys (at least one required, more = better failover)
    GROQ_API_KEY=your_groq_api_key_here
    CEREBRAS_API_KEY=your_cerebras_api_key_here
    GEMINI_API_KEY=your_gemini_api_key_here
    HUGGINGFACE_API_KEY=your_huggingface_api_key_here
    OPENROUTER_API_KEY=your_openrouter_api_key_here
    # Gmail SMTP (for notifications)
    EMAIL_USER=your_email@gmail.com
    EMAIL_APP_PASS=your_gmail_app_password

Note: For Gmail, you need to generate an App Password (not your regular password).

LLM keys: You only need one API key to start, but configuring several improves reliability through automatic failover.
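Because only one key is required, it can help to check at startup which providers are actually configured. The sketch below is an illustration, not code from llm_router.py: `PROVIDER_KEYS` and `configured_providers` are hypothetical names, and treating values that start with `your_` as unset is an assumption based on the placeholder values shown in the .env example. The variable names themselves are the ones listed in that example. It reads from `os.environ`, which python-dotenv's `load_dotenv()` populates from the .env file.

```python
import os
from typing import Mapping

# Environment variable names as listed in the .env example.
PROVIDER_KEYS = {
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the providers whose key is set and is not a template placeholder."""
    found = []
    for name, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        # Assumption: values still starting with "your_" were never filled in.
        if value and not value.startswith("your_"):
            found.append(name)
    return found
```

An empty result would mean the router has nothing to fail over to, which is worth reporting before any scraping starts.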
Download background footage:

    python yt_downloader.py "https://youtube.com/watch?v=MINECRAFT_VIDEO_ID"
    python yt_downloader.py "https://youtube.com/watch?v=GTA5_VIDEO_ID"

Or manually place 9:16 or 16:9 gameplay videos in the downloads/ folder.
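Footage in 16:9 is center-cropped to 9:16 before rendering. The arithmetic behind such a crop can be sketched as follows. The function name `center_crop_box` and the rounding to even dimensions are assumptions for illustration; phase3.py may compute its crop differently.

```python
def center_crop_box(width: int, height: int, ratio_w: int = 9, ratio_h: int = 16):
    """Return (x1, y1, x2, y2) of the largest centered ratio_w:ratio_h region."""
    if width * ratio_h >= height * ratio_w:
        # Frame is at least as wide as the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Frame is narrower than the target: keep full width, trim top and bottom.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    # Round down to even sizes, which common video encoders expect (assumption).
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    x1 = (width - crop_w) // 2
    y1 = (height - crop_h) // 2
    return x1, y1, x1 + crop_w, y1 + crop_h


print(center_crop_box(1920, 1080))  # prints (657, 0, 1263, 1080)
print(center_crop_box(1080, 1920))  # prints (0, 0, 1080, 1920)
```

A 1920x1080 source therefore keeps a 606-pixel-wide column from the middle of the frame, which is then scaled up to the output resolution. A source that is already 9:16 passes through uncropped.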
Edit phase3.py line 5 to match your ImageMagick installation:

    os.environ["IMAGEMAGICK_BINARY"] = r"C:\Program Files\ImageMagick-7.1.2-Q16-HDRI\magick.exe"

Run the pipeline:

    python main_pipeline.py

- Open Task Scheduler
- Create a new task:
  - Trigger: Daily at 3:00 AM
  - Action: Run run_factory.bat
- The system will automatically:
  - Generate 1 video per day
  - Collect 7 videos per week
  - Send an email alert when the batch is ready
    python reminder.py

This checks whether 7+ videos are ready and moves them to the ready_to_upload/ folder.

    python get_videopaths.py

Returns the absolute paths of all videos in ready_to_upload/ for batch upload scripts.
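The batch check described here can be sketched in a few lines. This is an illustrative sketch, not the contents of reminder.py or get_videopaths.py: the function name `promote_batch` and the `*.mp4` pattern are assumptions, while the threshold of 7 and the reels/ and ready_to_upload/ folders come from this README.

```python
import shutil
from pathlib import Path

BATCH_SIZE = 7  # batch threshold stated in this README


def promote_batch(reels: Path, ready: Path, batch_size: int = BATCH_SIZE) -> list[Path]:
    """Move rendered videos into `ready` once at least `batch_size` exist."""
    videos = sorted(reels.glob("*.mp4"))   # assumption: final videos are .mp4
    if len(videos) < batch_size:
        return []                          # not enough inventory yet
    ready.mkdir(parents=True, exist_ok=True)
    moved = []
    for video in videos:
        target = ready / video.name
        shutil.move(str(video), str(target))
        moved.append(target.resolve())     # absolute path for upload scripts
    return moved
```

Calling `promote_batch(Path("reels"), Path("ready_to_upload"))` returns an empty list until the seventh video exists, then returns the absolute paths of everything it moved.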
1. [03:00 AM] Task Scheduler triggers run_factory.bat
2. [03:00:05] Phase 1 scrapes r/AmItheAsshole
3. [03:00:12] LLM Router tries Groq → generates viral hook
4. [03:00:15] Gender detected: Female → Voice: en-US-AriaNeural
5. [03:00:45] Phase 2 generates audio + word timestamps
6. [03:01:30] Phase 3 renders 60-second vertical video
7. [03:02:00] Cleanup removes temporary files
8. [03:02:05] Reminder script checks inventory (3/7 videos)
9. [Day 7] Email sent: "FACTORY ALERT: Weekly Batch Ready"
10. [Manual] Run upload automation scripts
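The word timestamps produced in step 5 are what drive the 2-word subtitle chunks rendered in step 6. A minimal sketch of that grouping, assuming each word arrives with a start and end time in seconds, is shown below. The `Word` type and `chunk_words` function are hypothetical names, not identifiers from phase2.py or phase3.py.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the narration
    end: float    # seconds from the start of the narration


def chunk_words(words: list[Word], size: int = 2) -> list[Word]:
    """Group consecutive words into subtitle chunks that inherit their timing."""
    chunks = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        # A chunk appears when its first word starts and disappears when its last word ends.
        chunks.append(Word(" ".join(w.text for w in group), group[0].start, group[-1].end))
    return chunks


words = [Word("I", 0.0, 0.2), Word("told", 0.2, 0.5), Word("her", 0.5, 0.7)]
print(chunk_words(words))
# prints [Word(text='I told', start=0.0, end=0.5), Word(text='her', start=0.5, end=0.7)]
```

Each chunk's start and end can then be used as the on-screen interval for one subtitle clip.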
Edit phase1.py:

    SUBREDDITS = [
        "AmItheAsshole",
        "YourNewSubreddit",  # Add here
    ]

Edit phase2.py:

    WOMAN_VOICE_LIST = [
        "en-US-JennyNeural",
        "en-GB-SoniaNeural",  # Add British accent
    ]

Edit phase1.py line 175:

    if 120 < len(words) < 200:  # Change word count range

Edit phase3.py lines 50-60:

    txt_clip = TextClip(
        chunk_text,
        font="Arial",      # Change font
        fontsize=100,      # Increase size
        color="yellow",    # Change color
        stroke_width=8,    # Thicker outline
    )

Edit llm_router.py:

    CHEAP_PROVIDERS = [openrouter_chat, hf_chat, gemini_chat]
    STRONG_PROVIDERS = [groq_chat, cerebras_chat]

Solution (ImageMagick not found): Update the path in phase3.py line 5 to match your installation.
Solution (no story found): The subreddit may have no posts matching the filter criteria. The system automatically tries the next subreddit.
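The waterfall behavior described here, including the final fallback to cold storage, can be sketched as follows. This is a simplified illustration rather than the logic in phase1.py: `is_usable`, `pick_story`, and the `fetch` callable are hypothetical, language detection is left out, and the posts are assumed to be dictionaries with the `id` and `selftext` fields found in Reddit's listing JSON. The 120-200 word range is the one quoted in this README.

```python
from typing import Callable, Iterable, Optional

MIN_WORDS, MAX_WORDS = 120, 200  # word count range quoted in this README


def is_usable(post: dict, seen_ids: set[str]) -> bool:
    """Reject deleted, removed, duplicate, or wrongly sized posts."""
    body = post.get("selftext", "")
    if body in ("[deleted]", "[removed]") or post.get("id") in seen_ids:
        return False
    return MIN_WORDS < len(body.split()) < MAX_WORDS


def pick_story(
    subreddits: Iterable[str],
    fetch: Callable[[str], Iterable[dict]],
    seen_ids: set[str],
    cold_storage: Iterable[dict] = (),
) -> Optional[dict]:
    """Try each subreddit in order, then fall back to locally stored posts."""
    for name in subreddits:
        try:
            posts = fetch(name)
        except Exception:
            continue  # treat a failed source like an empty one
        for post in posts:
            if is_usable(post, seen_ids):
                return post
    # Every live source came up empty: use cold storage as the last resort.
    for post in cold_storage:
        if is_usable(post, seen_ids):
            return post
    return None
```

Here `seen_ids` plays the role of the persistent story database: any ID already in it is skipped, which is what prevents duplicates across runs.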
Solution (FFmpeg not found): Ensure FFmpeg is in your system PATH. Run ffmpeg -version to verify.
Solution (Gmail authentication fails):
- Enable 2FA on Gmail
- Generate an App Password
- Use the App Password in .env, not your regular password
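For reference, a Gmail alert sent with an App Password might look like the sketch below. It is an assumption-laden illustration, not the code in reminder.py: the function names, the message body, and sending the alert to the same address it comes from are all hypothetical. The `EMAIL_USER` and `EMAIL_APP_PASS` variables and the subject line come from this README, and `smtp.gmail.com` on port 465 is Gmail's SSL SMTP endpoint.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, video_count: int) -> EmailMessage:
    """Compose the batch-ready notification (recipient = sender is an assumption)."""
    msg = EmailMessage()
    msg["Subject"] = "FACTORY ALERT: Weekly Batch Ready"
    msg["From"] = sender
    msg["To"] = sender
    msg.set_content(f"{video_count} videos are waiting in ready_to_upload/.")
    return msg


def send_alert(msg: EmailMessage) -> None:
    """Send the message through Gmail using the App Password from the environment."""
    user = os.environ["EMAIL_USER"]
    app_password = os.environ["EMAIL_APP_PASS"]  # App Password, not the account password
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, app_password)
        server.send_message(msg)
```

If `server.login` raises `smtplib.SMTPAuthenticationError`, the usual cause is a regular password in .env instead of an App Password.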
Solution (LLM requests fail):
- Check that at least one API key is valid in .env
- Verify API quotas haven't been exceeded
- Check internet connection
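To make the failure modes above easier to reason about, here is a minimal sketch of provider failover with exponential backoff. It is not the implementation in llm_router.py: `QuotaExceeded`, `route`, the retry count, and the delays are all hypothetical. Providers are modeled as plain callables that take a prompt and return text, similar in shape to the entries of `CHEAP_PROVIDERS` and `STRONG_PROVIDERS`.

```python
import time
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical signal that a provider hit its rate limit or quota."""


def route(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try providers in order; retry transient errors, skip exhausted quotas."""
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except QuotaExceeded:
                break  # retrying will not help: switch provider immediately
            except Exception:
                if attempt < retries - 1:
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("all providers failed")
```

The `RuntimeError` at the end corresponds to the situation this troubleshooting entry covers: every configured provider was either out of quota or kept failing. The `sleep` parameter exists only so the backoff can be replaced in tests.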
Solution (word-level timestamps missing): The system automatically falls back to sentence-level timing. This is expected behavior for some voices.
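One plausible form of such a sentence-level heuristic is to spread each sentence's duration across its words in proportion to word length. The sketch below illustrates that idea only. It is an assumption about how a fallback could work, not a description of what phase2.py does, and `estimate_word_times` is a hypothetical name.

```python
def estimate_word_times(sentence: str, start: float, end: float) -> list[tuple[str, float, float]]:
    """Split a sentence's time span across its words, weighted by word length."""
    words = sentence.split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []
    timings = []
    cursor = start
    for w in words:
        # Longer words are assumed to take proportionally longer to speak.
        duration = (end - start) * len(w) / total_chars
        timings.append((w, cursor, cursor + duration))
        cursor += duration
    return timings


print(estimate_word_times("ab cdef", 0.0, 3.0))
# prints [('ab', 0.0, 1.0), ('cdef', 1.0, 3.0)]
```

Estimates like these drift from the real audio more than true word boundaries do, so subtitles may look slightly less tight when the fallback is active.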
- Average Runtime: 2-3 minutes per video
- Video Quality: 1080x1920 @ 30fps
- Audio Quality: 192kbps MP3
- Storage: ~15-25MB per final video
- Success Rate: 95%+ (with failover systems)
- LLM Failover: <2 seconds between provider switches
- No user data collection
- API keys stored in .env (gitignored)
- Reddit scraping complies with API terms
- All source content comes from publicly accessible Reddit posts
- No personal information in generated videos
- Multi-provider LLM routing prevents vendor lock-in
- Multi-provider LLM router with automatic failover
- Batch video management system
- YouTube upload automation (in progress)
- Instagram Reels upload automation (in progress)
- TikTok API integration
- A/B testing for hooks and thumbnails
- Analytics dashboard (views, engagement tracking)
- GPU-accelerated rendering (NVENC support)
- Cloud deployment (AWS Lambda + S3)
- Web UI for manual overrides
- Multi-language support (Spanish, French, etc.)
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Reddit API - Content source
- Microsoft Edge TTS - Neural voice synthesis
- Groq, Cerebras, Gemini, HuggingFace, OpenRouter - LLM infrastructure
- MoviePy - Video processing framework
- yt-dlp - Video download utility
Project Link: https://github.com/indiser/viralcontent-factory