ClipAI is an end-to-end AI-powered autonomous video editing pipeline that transforms raw talking-head videos into professional, engagement-ready shorts — completely hands-free.
It uses cutting-edge Large Language Models (LLaMA-3), Text-to-Image Diffusion (Stable Diffusion XL), and FFmpeg compositing to:
- 🎤 Transcribe your video using Groq Whisper
- 🧠 Analyze context via LLaMA-3.3-70B to find visually interesting moments
- 🎨 Generate cinematic B-Roll images matching the speaker's words
- 🎬 Composite everything with zoompan animations, fades, and subtitle burns
- ☁️ Deliver the final cut via Cloudinary CDN
💡 Why Generative AI instead of Stock APIs?
Stock footage APIs (like Pexels) return generic results. If a speaker says "A glowing coffee cup next to a 1980s computer," a stock API returns plain coffee images. Our Stable Diffusion pipeline generates context-aware visuals that depict what the speaker actually describes, giving far tighter semantic relevance than keyword-based stock search.
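To make this concrete, here is a minimal Python sketch of how a spoken line could be wrapped into an SDXL request for the Hugging Face Inference API. The model id, prompt template, and generation parameters below are illustrative assumptions, not the repo's exact code.

```python
# Sketch: turn a transcript line into an SDXL request payload.
# Model id and endpoint follow the Hugging Face Inference API
# convention; treat both as assumptions.

HF_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
HF_URL = f"https://api-inference.huggingface.co/models/{HF_MODEL}"

def build_sdxl_payload(spoken_line: str) -> dict:
    """Wrap what the speaker said in a cinematic prompt."""
    return {
        "inputs": f"cinematic still, photorealistic, 35mm film look: {spoken_line}",
        "parameters": {
            "negative_prompt": "blurry, low quality, watermark, text",
            "width": 1024,
            "height": 1024,
        },
    }

payload = build_sdxl_payload(
    "A glowing coffee cup next to a 1980s computer"
)
# POST this payload to HF_URL with an
# "Authorization: Bearer <HF_API_KEY>" header to get image bytes back.
```

Because the prompt is built from the transcript itself, the generated B-roll follows the speech rather than whatever a stock library happens to index.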
┌──────────────────────────────────────────────────────────────────┐
│ ClipAI Pipeline │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌───────────┐ ┌──────────────────────────┐ │
│ │ Next.js │───▶│ Node.js │───▶│ FastAPI (Python) │ │
│ │ Client │ │ Proxy │ │ │ │
│ │ :3000 │ │ :5001 │ │ ┌────────────────────┐ │ │
│ └─────────┘ └───────────┘ │ │ 1. FFmpeg Extract │ │ │
│ ▲ │ │ 2. Groq Whisper │ │ │
│ │ │ │ 3. LLaMA-3 Analysis │ │ │
│ │ ┌───────────┐ │ │ 4. SDXL Image Gen │ │ │
│ └─────────│ Cloudinary│◀───│ │ 5. FFmpeg Composite │ │ │
│ │ CDN │ │ └────────────────────┘ │ │
│ └───────────┘ └──────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
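The five stages inside the FastAPI box can be sketched as plain functions. The names, signatures, and return values below are illustrative stubs, not the actual helpers in `ai-service/main.py`.

```python
# Structural sketch of the pipeline; each stage is a stub standing in
# for the real FFmpeg / Groq / SDXL calls in ai-service/main.py.

def extract_audio(video_path: str) -> str:
    # 1. FFmpeg: demux audio to an .mp3 next to the video
    return video_path.rsplit(".", 1)[0] + ".mp3"

def transcribe(audio_path: str) -> list[dict]:
    # 2. Groq Whisper: timestamped text segments
    return [{"start": 0.0, "end": 2.5, "text": "placeholder"}]

def analyze(segments: list[dict]) -> list[str]:
    # 3. LLaMA-3: turn visually interesting segments into image prompts
    return [f"cinematic still: {s['text']}" for s in segments]

def generate_images(prompts: list[str]) -> list[str]:
    # 4. SDXL: one generated image per prompt
    return [f"broll_{i:02d}.png" for i, _ in enumerate(prompts)]

def composite(video_path: str, images: list[str]) -> str:
    # 5. FFmpeg: overlay B-roll, burn subtitles, write the final cut
    return "final_cut.mp4"

def run_pipeline(video_path: str) -> str:
    audio = extract_audio(video_path)
    segments = transcribe(audio)
    prompts = analyze(segments)
    images = generate_images(prompts)
    return composite(video_path, images)
```

Each stage's output feeds the next, which is why a failure in transcription or prompt generation halts the run before any FFmpeg work starts.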
| Layer | Technology | Purpose |
|---|---|---|
| 🖥️ Frontend | Next.js 14 + Tailwind CSS | Responsive UI with real-time status |
| 🔌 Backend Proxy | Node.js + Express + Multer | File upload buffering & API routing |
| 🧠 AI Engine | Python + FastAPI | Core pipeline orchestration |
| 🗣️ Transcription | Groq Whisper API | Speech-to-text with timestamps |
| 💬 LLM | LLaMA-3.3-70B (Groq) | Context analysis & prompt engineering |
| 🎨 Image Gen | Stable Diffusion XL (HuggingFace) | Text-to-image B-roll generation |
| 🎬 Video Engine | FFmpeg | Compositing, transitions, subtitles |
| ☁️ CDN | Cloudinary | Cloud storage & video delivery |
graph LR
A[📹 Upload Video] --> B[🎵 Extract Audio]
B --> C[🗣️ Groq Whisper<br/>Transcription]
C --> D[🧠 LLaMA-3 70B<br/>Context Analysis]
D --> E[🎨 Stable Diffusion XL<br/>Image Generation]
E --> F[🎬 FFmpeg Compositing<br/>Zoompan + Subtitles]
F --> G[☁️ Cloudinary Upload]
G --> H[✅ Final Video Ready]
style A fill:#f97316,stroke:#ea580c,color:#fff
style B fill:#f59e0b,stroke:#d97706,color:#fff
style C fill:#10b981,stroke:#059669,color:#fff
style D fill:#3b82f6,stroke:#2563eb,color:#fff
style E fill:#a855f7,stroke:#9333ea,color:#fff
style F fill:#ec4899,stroke:#db2777,color:#fff
style G fill:#06b6d4,stroke:#0891b2,color:#fff
style H fill:#22c55e,stroke:#16a34a,color:#fff
| Step | Process | Technology | What Happens |
|---|---|---|---|
| 1️⃣ | Audio Extraction | FFmpeg subprocess | Video → .mp3 audio file extracted |
| 2️⃣ | Transcription | Groq Whisper API | Audio → timestamped text segments |
| 3️⃣ | Context Analysis | LLaMA-3.3-70B | Transcript → cinematic image prompts |
| 4️⃣ | B-Roll Generation | Stable Diffusion XL | Prompts → photorealistic images |
| 5️⃣ | Motion Animation | FFmpeg zoompan | Static images → animated video clips |
| 6️⃣ | Compositing | FFmpeg filter_complex | Overlay B-roll + burn SRT subtitles |
| 7️⃣ | Cloud Delivery | Cloudinary API | Upload → global CDN URL returned |
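Step 5️⃣ can be illustrated with a small helper that assembles the FFmpeg `zoompan` command. The zoom rate, 9:16 output size, and filenames are illustrative assumptions, not values taken from the repo.

```python
# Sketch of step 5: animate one still image into a short clip with
# FFmpeg's zoompan filter (slow push-in, portrait 1080x1920 for shorts).
import subprocess

def zoompan_cmd(image: str, out: str,
                seconds: float = 3.0, fps: int = 30) -> list[str]:
    frames = int(seconds * fps)
    # Zoom grows ~0.1% per frame, capped at 1.2x for a subtle push-in.
    vf = (
        f"zoompan=z='min(zoom+0.001,1.2)':d={frames}:fps={fps}"
        f":s=1080x1920,format=yuv420p"
    )
    return [
        "ffmpeg", "-y", "-loop", "1", "-i", image,
        "-vf", vf, "-t", str(seconds), "-an", out,
    ]

cmd = zoompan_cmd("broll_01.png", "broll_01.mp4")
# subprocess.run(cmd, check=True)  # requires FFmpeg on the system PATH
```

The resulting clips are what step 6️⃣ overlays onto the talking-head track with `filter_complex`.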
✅ Node.js v18+
✅ Python 3.9+
✅ FFmpeg (in system PATH)
git clone https://github.com/samay-hash/ClipAI_Intern.git
cd ClipAI_Intern

📁 ai-service/.env
# Sarvam API — for speech-to-text / Hindi captions
SARVAM_API_KEY="sk_..."
# Groq API — for LLaMA-3 and Whisper
GROQ_API_KEY="gsk_..."
# Hugging Face — for Stable Diffusion XL
HF_API_KEY="hf_..."
# Cloudinary — for video cloud storage
CLOUDINARY_CLOUD_NAME="..."
CLOUDINARY_API_KEY="..."
CLOUDINARY_API_SECRET="..."

📁 backend/.env
PORT=5001
AI_SERVICE_URL="http://localhost:8000"
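These `.env` files are typically loaded with `python-dotenv`; as a dependency-free illustration, here is a tiny stand-in parser (the real service may load its configuration differently).

```python
# Minimal .env loader: puts KEY="value" lines into os.environ.
# A stand-in for python-dotenv's load_dotenv(), for illustration only.
import os
import pathlib

def load_env(path: str = ".env") -> None:
    for line in pathlib.Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over file values
        os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Keys already present in the environment are left untouched, so deployment platforms like Render can still override the file.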
⚠️ Run each in a separate terminal
# Terminal 1: AI Engine (Python)
cd ai-service
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
# → Running on http://localhost:8000

# Terminal 2: Backend Proxy (Node.js)
cd backend
npm install
npm run dev
# → Running on http://localhost:5001

# Terminal 3: Frontend (Next.js)
cd frontend
npm install
npm run dev
# → Running on http://localhost:3000

Navigate to http://localhost:3000 → Upload a video → Watch the AI magic happen! ✨
Deployment targets: Vercel · Render (Node.js) · Render (Docker)
ClipAI_Intern/
├── 🎨 frontend/ # Next.js 14 + Tailwind CSS
│ ├── src/app/page.tsx # Main UI (upload, progress, gallery)
│ ├── src/app/globals.css # Design system (CSS variables)
│ └── package.json
│
├── 🔌 backend/ # Node.js Express Proxy
│ ├── server.js # Upload handling, AI service proxy
│ └── package.json
│
├── 🧠 ai-service/ # Python FastAPI AI Engine
│ ├── main.py # Core pipeline (Whisper → LLaMA → SDXL → FFmpeg)
│ ├── Dockerfile # Docker config for Render deployment
│ └── requirements.txt # Python dependencies
│
└── 📄 README.md # You are here!
Contributions are welcome! Here's how to get involved:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
| Area | Idea | Difficulty |
|---|---|---|
| 🎨 | Add more visual styles (Watercolor, Pixel Art) | 🟢 Easy |
| 🔊 | Add background music overlay | 🟡 Medium |
| 📊 | Redis/Celery job queue for scaling | 🟡 Medium |
| 🎥 | AI video generation (instead of images) | 🔴 Hard |
| 🌐 | Multi-language subtitle support | 🟡 Medium |
This project is licensed under the MIT License — see the LICENSE file for details.