diff --git a/GenAI.md b/GenAI.md
index 3c1fd31b..c7e10cf9 100644
--- a/GenAI.md
+++ b/GenAI.md
@@ -1,3 +1,16 @@
+
+
+### Submitted by: **Nithin N**
+
+> ⚠️ **Note:** My primary GitHub account is **[Nithin9585](https://github.com/Nithin9585)**; due to login issues, this submission is made from a secondary account.
+> To verify my profile, projects, and work history, please visit:
+>
+> ## [github.com/Nithin9585](https://github.com/Nithin9585)
+
+
+
+---
+
# GenAI Assignment:
**Evaluation Criteria**
@@ -26,7 +39,7 @@ No code required. We want a **clear, practical proposal** with architecture and
### Your Solution for problem 1:
-You need to put your solution here.
+[**> Click here to view the Architectural Proposal (Solution_1_Video_Notes.md)**](./Solution_1_Video_Notes.md)
## Problem 2: **Zero-Shot Prompt to generate 3 LinkedIn Post**
@@ -36,7 +49,7 @@ Design a **single zero-shot prompt** that takes a userβs persona configuration
### Your Solution for problem 2:
-You need to put your solution here.
+[**> Click here to view the Prompt Design (Solution_2_LinkedIn.md)**](./Solution_2_LinkedIn.md)
## Problem 3: **Smart DOCX Template β Bulk DOCX/PDF Generator (Proposal + Prompt)**
@@ -45,7 +58,7 @@ Users have many Word documents that act like templates (offer letters, certifica
We want a system that:
1. Converts an uploaded **DOCX** into a reusable **template** by identifying editable fields.
-2. Supports **single generation** (form-fill → DOCX/PDF download).
+2. Supports **single generation** (form-fill → DOCX/PDF output).
3. Supports **bulk generation** via **Excel/Google Sheet** rows.
### **Task (No coding)**
@@ -54,7 +67,7 @@ Submit a **proposal** for building this system using GenAI (OpenAI/Gemini) for
### Your Solution for problem 3:
-You need to put your solution here.
+[**> Click here to view the Template Engine Proposal (Solution_3_Doc_Template.md)**](./Solution_3_Doc_Template.md)
## Problem 4: Architecture Proposal for 5-Min Character Video Series Generator
@@ -66,4 +79,4 @@ Create a **small, clear architecture proposal** (no code, no prompts) describing
### Your Solution for problem 4:
-You need to put your solution here.
+[**> Click here to view the Character Video Architecture (Solution_4_Character_Video.md)**](./Solution_4_Character_Video.md)
diff --git a/Solution_1_Video_Notes.md b/Solution_1_Video_Notes.md
new file mode 100644
index 00000000..8155f4fa
--- /dev/null
+++ b/Solution_1_Video_Notes.md
@@ -0,0 +1,221 @@
+# Solution: Proposal for "Video-to-Notes" Platform
+
+## The Problem
+
+We have a local folder of long videos (3–4 hours each, 200MB+). We need an automated pipeline to generate a **Summary Package** per video:
+- `Summary.md` β structured notes with key takeaways
+- **Highlight clips** β short video segments of key moments
+- **Screenshots** β frames from important slides/moments
+
+---
+
+## Approach Comparison
+
+### Approach 1: Online/Cloud-Based SaaS (e.g., Pictory, ScreenApp, Exemplary.ai)
+
+```mermaid
+graph LR
+ User[User] -->|Upload 5GB+ per video| Cloud[SaaS Platform]
+ Cloud -->|Black-Box AI| Output[Summary + Clips]
+ Output -->|Download| User
+```
+
+**How it works:** Upload videos to a third-party platform. The platform transcribes, summarizes, and generates clips automatically.
+
+| Factor | Assessment |
+|--------|-----------|
+| File Size | [NO] Upload bottleneck – uploading 200MB–9GB per video is slow and fragile |
+| Duration | [NO] Most platforms cap at 3 hours (ScreenApp Business Plan) – our 3–4hr videos may fail |
+| Batch Processing | [NO] No bulk automation – manual upload per file via browser |
+| Customization | [NO] Black-box AI optimized for "viral" clips, not technical/informational content |
+| Cost | [NO] Subscription-based; 10 × 4hr videos = 2,400 min, exceeds most Pro plan limits |
+
+**Verdict: REJECTED** – upload friction, duration limits, and no batch control make this unworkable.
+
+---
+
+### Approach 2: Hybrid Architecture – Local Processing + Cloud AI (RECOMMENDED)
+
+```mermaid
+graph LR
+ A[Local Video 2GB+] -->|FFmpeg: Extract Audio| B[Audio File ~60MB]
+ B -->|Upload only audio| C[Deepgram STT API]
+ C -->|Transcript + Timestamps| D[Claude 3.5 Sonnet LLM]
+ D -->|Structured JSON| E[Local FFmpeg]
+ A -->|Original quality source| E
+ E --> F[Clips + Screenshots + Summary.md]
+```
+
+**How it works:**
+1. **Local FFmpeg** extracts only the audio from each video (Opus codec, 64kbps -> ~60MB for 4hrs)
+2. **Deepgram API** transcribes the audio with word-level timestamps (~12 sec per hour of audio)
+3. **Claude 3.5 Sonnet** (200k token context) reads the full transcript and returns a JSON with summary + highlight timestamps
+4. **Local FFmpeg** cuts clips and screenshots from the original high-quality video using those timestamps
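Steps 1 and 4 can be sketched as FFmpeg command construction (the file names, bitrate, and helper names here are illustrative; the commands would be executed with `subprocess.run` once FFmpeg is on `PATH`):

```python
from pathlib import Path

def extract_audio_cmd(video: Path, out_audio: Path) -> list[str]:
    """Step 1: drop the video stream (-vn), keep compact Opus audio (~60MB for 4hrs)."""
    return ["ffmpeg", "-i", str(video), "-vn",
            "-acodec", "libopus", "-b:a", "64k", str(out_audio)]

def cut_clip_cmd(video: Path, start: str, end: str, out_clip: Path) -> list[str]:
    """Step 4: cut a highlight from the original file without re-encoding (-c copy)."""
    return ["ffmpeg", "-ss", start, "-to", end, "-i", str(video),
            "-c", "copy", str(out_clip)]

cmd = extract_audio_cmd(Path("input.mp4"), Path("output_audio.opus"))
# e.g. subprocess.run(cmd, check=True)
```

Stream-copying (`-c copy`) keeps the clips at original quality and avoids a slow re-encode; exact flags would be tuned per container format.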
+
+**Full Pipeline (Detailed):**
+
+```mermaid
+graph TD
+ subgraph "Phase 1: Ingestion"
+ Start([Start Batch]) --> Scan[Scan Input Folder]
+ Scan --> Check{Valid File?}
+ Check -- No --> LogError[Log to skipped.csv]
+ Check -- Yes --> FFprobe[Extract Metadata via ffprobe]
+ end
+
+ subgraph "Phase 2: Audio Extraction"
+ FFprobe --> Extract["FFmpeg: Extract Opus Audio (-vn -acodec libopus -b:a 64k)"]
+ Extract --> AudioFile(output_audio.opus ~60MB)
+ end
+
+ subgraph "Phase 3: Transcription"
+ AudioFile --> Deepgram["Deepgram Nova-2 API (diarize + timestamps)"]
+ Deepgram --> Transcript[Full Transcript + Word Timestamps JSON]
+ end
+
+ subgraph "Phase 4: Intelligence"
+ Transcript --> LLM["Claude 3.5 Sonnet (200k context window)"]
+ LLM --> Analysis[Structured JSON: Summary + Highlight Segments]
+ end
+
+ subgraph "Phase 5 & 6: Asset Production"
+ Analysis --> Cut["FFmpeg: Cut Clips (-ss start -t duration)"]
+ Analysis --> Snap["FFmpeg: Screenshots (-vframes 1)"]
+ Cut --> Assets[assets folder]
+ Snap --> Assets
+ Assets --> Assemble[Generate Summary.md via Jinja2]
+ end
+
+ Assemble --> End([Done])
+```
+
+| Factor | Assessment |
+|--------|-----------|
+| File Size | [YES] Only ~60MB audio uploaded (97% bandwidth reduction) |
+| Duration | [YES] No limit – Claude 3.5 handles 200k tokens (full 4hr transcript) |
+| Batch Processing | [YES] Python script with retry logic, state persistence, skips corrupt files |
+| Customization | [YES] Full control over prompt – prioritize technical/informational content |
+| Cost | ~$1.50 per 4hr video (Deepgram $0.0043/min + Claude API) |
+
+**Verdict: RECOMMENDED** – solves the bandwidth problem (audio extraction) and the context problem (200k token LLM).
+
+---
+
+### Approach 3: Fully Offline – Open-Source Models (Faster-Whisper + Llama 3)
+
+```mermaid
+graph LR
+ A[Local Video] -->|Faster-Whisper on GPU| B[Local Transcript]
+ B -->|Llama 3 70B| C[Local JSON Summary]
+ C -->|FFmpeg| D[Clips + Screenshots]
+```
+
+**How it works:** Run everything locally: Faster-Whisper for transcription, Llama 3 70B for summarization, FFmpeg for asset generation. Zero data leaves the machine.
+
+| Factor | Assessment |
+|--------|-----------|
+| File Size | [YES] No upload needed |
+| Duration | [WARN] Long transcripts need a large local model – Llama 3 70B requires ~40GB VRAM (dual GPU or A6000, $4,000+) |
+| Batch Processing | [WARN] Prone to OOM crashes on long files; requires chunking (lossy summaries) |
+| Customization | [YES] Full control |
+| Cost | [WARN] High CapEx (hardware); $0 per-run after setup |
+| Privacy | [YES] Air-gapped – no data leaves premises |
+
+**Verdict: CONDITIONAL – viable only if data is classified.** Requires enterprise GPU hardware. Smaller models (8B) hallucinate timestamps and lose context on 4hr videos.
+
+---
+
+## Strategic Recommendation Summary
+
+| Feature | SaaS (Cloud Only) | **Hybrid (Local + API)** | Offline (Local Only) |
+|---|---|---|---|
+| Data Movement | [NO] Upload GBs | [YES] Upload MBs (audio only) | [YES] Zero transfer |
+| Long Context (4hr) | [NO] Often capped <3hrs | [YES] 200k+ tokens | [WARN] Hardware limited |
+| Cost Efficiency | [NO] High subscriptions | ~$1.50/video | [WARN] High CapEx |
+| Privacy | [NO] 3rd party storage | [WARN] Transient API calls | [YES] Air-gapped |
+| Batch Automation | [NO] Manual uploads | [YES] Fully scripted | [WARN] OOM risk |
+| **Recommendation** | **Reject** | **Adopt** | **Reject (unless classified)** |
+
+---
+
+## JSON Schema (LLM Output Contract)
+
+The LLM must return a strict JSON so FFmpeg commands can be generated reliably:
+
+```json
+{
+ "meta": {
+ "title": "Q3 All-Hands Meeting",
+ "main_topics": ["Financials", "Roadmap", "Q&A"]
+ },
+ "summary_content": {
+ "executive_summary": "200-300 word overview...",
+ "key_takeaways": ["Insight 1", "Insight 2"],
+ "action_items": ["Follow up on budget", "Schedule roadmap review"]
+ },
+ "segments": [
+ {
+ "id": "seg_001",
+ "timestamp_start": "00:15:20",
+ "timestamp_end": "00:18:45",
+ "segment_title": "Q3_Financials_Overview",
+ "description": "CFO presents Q3 revenue breakdown",
+ "reasoning": "High information density β key financial decision point",
+ "assets_to_generate": { "clip": true, "screenshot": false }
+ }
+ ]
+}
+```
+
+**Key design decisions:**
+- `timestamp_start/end` enforced as `HH:MM:SS` regex – any other format is rejected before FFmpeg runs
+- `reasoning` field forces Chain-of-Thought, reducing hallucinated timestamps
+- `assets_to_generate` flags let the LLM decide: not every moment needs a 50MB clip
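Enforcing the timestamp and duration contract in code keeps a hallucinated value from ever reaching FFmpeg. A minimal validator sketch (function names are illustrative):

```python
import re

TS = re.compile(r"^(\d{2}):([0-5]\d):([0-5]\d)$")  # strict HH:MM:SS

def ts_to_seconds(ts: str) -> int:
    """Reject anything that isn't strict HH:MM:SS before building FFmpeg commands."""
    m = TS.match(ts)
    if not m:
        raise ValueError(f"bad timestamp: {ts!r}")
    h, mnt, s = map(int, m.groups())
    return h * 3600 + mnt * 60 + s

def validate_segment(seg: dict) -> int:
    """Return the clip duration in seconds; enforce the 30s-3min rule from the prompt."""
    dur = ts_to_seconds(seg["timestamp_end"]) - ts_to_seconds(seg["timestamp_start"])
    if not 30 <= dur <= 180:
        raise ValueError(f"clip {seg['id']} duration {dur}s outside 30s-180s")
    return dur
```

Segments that fail validation would be logged and the LLM re-prompted, rather than silently producing broken clips.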
+
+---
+
+## Zero-Shot Prompt (The LLM Instruction)
+
+```
+You are a Senior Technical Archivist. Process the transcript below into a structured JSON knowledge artifact.
+
+RULES (Anti-Hallucination Protocol):
+1. Only use timestamps that exist verbatim in the transcript. Never guess.
+2. Add a 10-second pad: subtract 10s from start, add 10s to end of each clip.
+3. Clips must be 30 seconds–3 minutes long.
+4. Prioritize: technical demos, decisions, debates, conclusions. Skip banter/logistics.
+5. Output ONLY valid JSON. No markdown fencing, no preamble.
+
+PROCESS:
+1. Scan the full transcript to map the video structure.
+2. Identify 5–10 highlight candidates.
+3. Verify timestamps exist in the source text.
+4. Output the JSON.
+
+[TRANSCRIPT BELOW]
+```
+
+**Why Zero-Shot?** Few-shot examples waste context window tokens. With a 4hr transcript (40k tokens), we need every token for the actual content. Claude 3.5 follows detailed zero-shot instructions reliably.
+
+---
+
+## Bulk Processing & Error Handling
+
+**Resilience features:**
+- `ffprobe` validates each file before processing → corrupt files logged to `skipped.csv`, batch continues
+- API calls wrapped in exponential backoff retry (2s → 4s → 8s, max 5 retries)
+- `job_status.json` tracks completed videos – if the script crashes at video #49, it resumes at #50
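The exponential backoff wrapper described above could be sketched as follows (the `sleep` parameter is injected so the delay schedule is testable; names are illustrative):

```python
import time

def with_backoff(fn, *, base_delay=2.0, max_retries=5, sleep=time.sleep):
    """Retry a flaky API call with exponential backoff: 2s, 4s, 8s, ..."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the batch report
            sleep(base_delay * (2 ** attempt))
```

In the pipeline this would wrap the Deepgram and Claude calls, so a transient network failure never kills the batch.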
+
+**Output structure:**
+```
+Output/
+ 2024-11-05_Q3_All_Hands/
+ Summary.md
+ manifest.json
+ assets/
+ Clip_01_Financials_00-15-20.mp4
+ Clip_02_Roadmap_01-10-00.mp4
+ Screenshot_01_Slide_A.jpg
+```
+
+**Batch Report** generated at end: `Batch_Report.csv` with filename, duration, status, cost estimate per video.
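The `job_status.json` resume behavior could be sketched like this (file name matches the resilience notes above; helper names are assumptions):

```python
import json
from pathlib import Path

STATE_FILE = Path("job_status.json")

def load_done() -> set[str]:
    """Read the set of already-completed videos, if a previous run left state behind."""
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def mark_done(done: set[str], video: str) -> None:
    done.add(video)
    STATE_FILE.write_text(json.dumps(sorted(done)))  # flush state after every video

def pending(all_videos: list[str]) -> list[str]:
    """Videos still to process; a crashed run resumes exactly where it stopped."""
    done = load_done()
    return [v for v in all_videos if v not in done]
```

Writing state after every video (not once at the end) is what makes the crash-at-#49, resume-at-#50 behavior possible.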
diff --git a/Solution_2_LinkedIn.md b/Solution_2_LinkedIn.md
new file mode 100644
index 00000000..6e890a93
--- /dev/null
+++ b/Solution_2_LinkedIn.md
@@ -0,0 +1,234 @@
+# Solution: Zero-Shot Prompt for LinkedIn Post Generation
+
+## The Task
+
+Design a **single zero-shot prompt** that takes a user's persona configuration + a topic and generates **3 LinkedIn post drafts in 3 distinct styles**, each aligned to the user's voice. Output must be structured JSON so the app can display the 3 drafts.
+
+---
+
+## The 3 Post Styles
+
+| Style | Name | Goal | Hook Type |
+|---|---|---|---|
+| Style 1 | Personal Narrative | Empathy & Trust | Vulnerability / "I" statement |
+| Style 2 | Actionable Listicle | Saves & Utility | Value promise ("X steps to...") |
+| Style 3 | Contrarian Insight | Comments & Debate | Pattern interruption / myth-busting |
+
+---
+
+## JSON Output Schema
+
+The LLM must return this exact structure so the app can render 3 draft cards:
+
+```json
+{
+ "meta": {
+ "topic": "string",
+ "persona_analysis": "string β how the AI interpreted the voice settings"
+ },
+ "posts": [
+ {
+ "style_id": "narrative",
+ "style_label": "Personal Narrative",
+ "hook": "string β first 2-3 lines (the 'See More' bait)",
+ "body": "string β full post body using \\n for line breaks",
+ "cta": "string β closing call to action",
+ "hook_analysis": "string β why this hook works for the persona",
+ "estimated_length_words": 150
+ },
+ {
+ "style_id": "listicle",
+ "style_label": "Actionable Listicle",
+ "hook": "string",
+ "body": "string",
+ "cta": "string",
+ "hook_analysis": "string",
+ "estimated_length_words": 120
+ },
+ {
+ "style_id": "contrarian",
+ "style_label": "Contrarian Insight",
+ "hook": "string",
+ "body": "string",
+ "cta": "string",
+ "hook_analysis": "string",
+ "estimated_length_words": 130
+ }
+ ]
+}
+```
+
+**Critical JSON rules:**
+- Use `\n` (literal backslash-n) for line breaks inside strings – never actual newlines
+- No markdown fencing, no preamble – raw JSON only
+- `hook_analysis` forces Chain-of-Thought reasoning before committing to the hook text
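Before rendering the draft cards, the app can enforce this contract with a small validator (a sketch; the function name and error handling are assumptions, and a parse failure would trigger the retry described in Error Handling below):

```python
import json

REQUIRED_KEYS = {"style_id", "style_label", "hook", "body", "cta",
                 "hook_analysis", "estimated_length_words"}

def validate_response(raw: str) -> dict:
    """Parse the model output and enforce the 3-post schema before rendering."""
    data = json.loads(raw)  # raises on invalid JSON -> retry the generation
    posts = data["posts"]
    assert len(posts) == 3, "expected exactly 3 drafts"
    assert {p["style_id"] for p in posts} == {"narrative", "listicle", "contrarian"}
    for p in posts:
        missing = REQUIRED_KEYS - p.keys()
        assert not missing, f"post {p.get('style_id')} missing {missing}"
    return data
```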
+
+---
+
+## The Zero-Shot Prompt
+
+This is the full system prompt to paste into the API call:
+
+```
+ROLE
+You are an elite LinkedIn Personal Brand Strategist and Ghostwriter.
+Your job: take a Topic and User Persona and generate 3 high-fidelity LinkedIn post drafts.
+
+HARD RULES
+1. Output ONLY raw valid JSON. No markdown fencing, no preamble, no explanation.
+2. Use the literal characters \n for line breaks inside JSON strings. Never break the line.
+3. Adopt the user's exact voice. Do not revert to generic AI tone.
+4. Each post must be meaningfully different in structure and hook – not just paraphrases.
+5. Do NOT use: "In today's fast-paced world", "game-changer", "synergy", or generic buzzwords.
+6. Do NOT use hashtags in the hook.
+
+INPUT FORMAT
+You will receive a JSON object with:
+- topic: the subject to write about
+- persona.role: the user's professional identity
+- persona.tone: array of tone adjectives (e.g. ["direct", "analytical"])
+- persona.experience: years of experience
+- persona.formatting: "emojis" or "no-emojis"
+- persona.dos: content guidelines to follow
+- persona.donts: content guidelines to avoid
+
+VOICE CALIBRATION
+- Tone "direct/no-nonsense" β short sentences, no adjectives, zero emojis
+- Tone "empathetic/coach" β softer transitions, question marks, moderate emojis
+- Role "executive" β high-level strategy, avoid tactical weeds
+- Role "builder/engineer" β technical accuracy, specifics over fluff
+- Experience "10+ years" β speak with authority; "junior" β speak with enthusiasm
+
+STYLE DEFINITIONS
+
+STYLE 1 – PERSONAL NARRATIVE (Framework: SLA – Story, Lesson, Application)
+- Hook: Cold open. Start mid-action. Use "I" statements. Vulnerability or failure.
+ Example pattern: "I [did X]. It [went wrong/changed everything]."
+- Body: Chronological micro-story. Short paragraphs. Emotional arc.
+- Takeaway: One universal lesson from the story.
+- Grammar: First-person singular throughout.
+
+STYLE 2 – ACTIONABLE LISTICLE (Framework: EDF – Educational Framework)
+- Hook: Specific value promise with a number.
+ Example pattern: "X [things/steps/mistakes] that [outcome]:"
+- Body: Vertical list. Each item on its own line. One idea per bullet. No fluff.
+- CTA: Tell the reader what to do with this list (save it, share it, try #1 today).
+- Grammar: Second-person ("you") or imperative voice.
+
+STYLE 3 – CONTRARIAN INSIGHT (Framework: CA – Contrarian Approach)
+- Hook: Challenge a widely-held belief or "best practice" directly.
+ Example pattern: "Stop [doing X]. Here's why it's hurting you."
+- Body: Dismantle the myth with logic or data. Offer the "new way."
+- Tone: Firm, authoritative, slightly polarizing – but professional, not aggressive.
+- Grammar: Short, punchy lines. One sentence per paragraph.
+
+CHAIN-OF-THOUGHT PROCESS (internal – do not output this)
+1. Read the persona. Identify 3 linguistic rules to apply (vocabulary, sentence length, emoji use).
+2. Read the topic. Identify the core insight, a personal angle, and a contrarian angle.
+3. For each style, write the hook_analysis first, then write the post.
+4. Verify: Do all 3 posts sound like the same person but look structurally different?
+5. Verify: Is the JSON valid? Are all line breaks escaped as \n?
+
+OUTPUT SCHEMA
+Return exactly this JSON structure with all 3 posts populated:
+
+{
+ "meta": {
+ "topic": "...",
+ "persona_analysis": "..."
+ },
+ "posts": [
+ {
+ "style_id": "narrative",
+ "style_label": "Personal Narrative",
+ "hook": "...",
+ "body": "...",
+ "cta": "...",
+ "hook_analysis": "...",
+ "estimated_length_words": 0
+ },
+ {
+ "style_id": "listicle",
+ "style_label": "Actionable Listicle",
+ "hook": "...",
+ "body": "...",
+ "cta": "...",
+ "hook_analysis": "...",
+ "estimated_length_words": 0
+ },
+ {
+ "style_id": "contrarian",
+ "style_label": "Contrarian Insight",
+ "hook": "...",
+ "body": "...",
+ "cta": "...",
+ "hook_analysis": "...",
+ "estimated_length_words": 0
+ }
+ ]
+}
+```
+
+---
+
+## How the App Uses This Prompt
+
+The app injects the user's persona + topic as the user message:
+
+```python
+import json
+
+SYSTEM_PROMPT = "..." # The full prompt above
+
+user_input = {
+ "topic": "The future of remote work for creative agencies",
+ "persona": {
+ "role": "Agency Founder",
+ "tone": ["direct", "ambitious"],
+ "experience": "15 years",
+ "formatting": "no-emojis",
+ "dos": ["share real lessons", "use specific numbers"],
+ "donts": ["avoid corporate jargon", "no motivational fluff"]
+ }
+}
+
+user_message = f"Generate 3 LinkedIn posts for this input:\n{json.dumps(user_input, indent=2)}"
+
+# OpenAI
+response = client.chat.completions.create(
+ model="gpt-4o",
+ messages=[
+ {"role": "system", "content": SYSTEM_PROMPT},
+ {"role": "user", "content": user_message}
+ ],
+ response_format={"type": "json_object"} # Forces valid JSON output
+)
+
+# Gemini (google-generativeai SDK; the system prompt goes in system_instruction)
+model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=SYSTEM_PROMPT)
+response = model.generate_content(
+    user_message,
+    generation_config={"response_mime_type": "application/json"}  # Forces valid JSON output
+)
+```
+
+---
+
+## Why Zero-Shot Works Here
+
+- **No examples needed** – the style definitions (SLA, EDF, CA) are precise enough to guide structure
+- **Saves context window** – few-shot examples would consume tokens needed for the topic and persona input
+- **Chain-of-thought via `hook_analysis`** – forces the model to reason before generating, reducing hallucination
+- **Negative constraints** – "Do NOT use..." instructions are more effective than positive ones at preventing generic output
+
+---
+
+## Error Handling
+
+| Failure | Detection | Fix |
+|---|---|---|
+| Invalid JSON | Pydantic/Zod parse error | Retry with error message: "Invalid JSON at line X. Regenerate." |
+| All 3 posts sound the same | Style-bleed | Add to prompt: "Verify posts are structurally distinct before outputting." |
+| Contrarian post is too soft | Safety alignment | Reframe as "professional debate" not "attack" |
+| Line breaks broken | Literal newline in JSON | Enforce `\n` rule in prompt + post-process: `content.replace('\n', '\\n')` |
+
+---
diff --git a/Solution_3_Doc_Template.md b/Solution_3_Doc_Template.md
new file mode 100644
index 00000000..647ca0d8
--- /dev/null
+++ b/Solution_3_Doc_Template.md
@@ -0,0 +1,202 @@
+# Solution: Smart DOCX Template → Bulk DOCX/PDF Generator
+
+## The Task (from GenAI.md)
+
+Build a system that:
+1. Converts an uploaded DOCX into a reusable template by **auto-detecting editable fields using GenAI**
+2. Supports **single generation** (form-fill → DOCX/PDF output)
+3. Supports **bulk generation** via Excel/Google Sheet rows
+
+No code required – a practical design using GenAI (OpenAI/Gemini) for field detection and schema generation.
+
+---
+
+## System Architecture
+
+```mermaid
+graph TD
+ subgraph "Step 1: Template Creation"
+ Upload[User uploads DOCX] --> Cleaner[XML Run Merger]
+ Cleaner --> LLM[GenAI Field Detector]
+ LLM --> Schema[JSON Field Schema]
+ Schema --> UI[Smart Mapper UI]
+ end
+
+ subgraph "Step 2: Single Generation"
+ UI --> Form[Dynamic Web Form]
+ Form --> Engine[Jinja2 Template Engine]
+ Engine --> Gotenberg[PDF Renderer - Gotenberg]
+ Gotenberg --> Output[DOCX / PDF Download]
+ end
+
+ subgraph "Step 3: Bulk Generation"
+ Sheet[Excel / Google Sheet] --> Validator[Schema Validator]
+ Validator --> Queue[Task Queue - Redis/Celery]
+ Queue --> Workers[Worker Pool]
+ Workers --> Gotenberg
+ Workers --> S3[S3 Storage]
+ S3 --> ZIP[ZIP Bundle + Report.csv]
+ end
+```
+
+---
+
+## The Core GenAI Role: Field Detection
+
+The hardest part of this system is **automatically identifying which parts of a DOCX are dynamic fields**. This is where GenAI is used.
+
+### The Problem: Split Runs in DOCX XML
+
+When a user types `{{CandidateName}}` in Word, the internal XML often looks like this due to autocorrect/spell-check interruptions:
+
+```xml
+<w:p>
+  <w:r><w:t>{{Candidate</w:t></w:r>
+  <w:r><w:t>Na</w:t></w:r>
+  <w:r><w:t>me}}</w:t></w:r>
+</w:p>
+```
+
+A regex search for `{{CandidateName}}` fails. The system must first **merge split runs**, then scan.
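Merging runs before scanning can be sketched as plain string work over the extracted `<w:t>` texts (a simplification; real code would rewrite the XML in place with lxml or python-docx):

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")

def merge_runs(runs: list[str]) -> str:
    """Concatenate adjacent text runs so placeholders split by Word become whole again."""
    return "".join(runs)

def find_fields(runs: list[str]) -> list[str]:
    """Scan the merged text for {{FieldName}} placeholders."""
    return PLACEHOLDER.findall(merge_runs(runs))
```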
+
+### GenAI Field Detection Prompt
+
+After XML cleaning, the plain text of the document is sent to the LLM:
+
+```
+ROLE
+You are a document analysis engine. Analyze the following document text and identify all dynamic fields that should be replaced per-recipient.
+
+RULES
+1. Identify explicit placeholders: {{FieldName}}, [FieldName], <FieldName>, or ALL_CAPS_WORDS used as variables.
+2. Identify implicit fields: dates, names, amounts, addresses that appear to be instance-specific.
+3. For each field, infer its data type: String, Date, Currency, Email, Boolean, List.
+4. Identify any repeating blocks (e.g., invoice line items) as Loop fields.
+5. Identify any conditional blocks (e.g., "If EU client, include GDPR clause") as Boolean fields.
+6. Output ONLY valid JSON. No explanation.
+
+OUTPUT SCHEMA:
+{
+ "fields": [
+ {
+ "name": "string β camelCase field name",
+ "label": "string β human readable label",
+ "type": "String | Date | Currency | Email | Boolean | Number",
+ "required": true,
+ "detected_from": "string β the exact text that triggered detection",
+ "description": "string β hint for the user filling the form"
+ }
+ ],
+ "loops": [
+ {
+ "name": "string β loop variable name",
+ "description": "string β what each row represents",
+ "fields": ["field1", "field2"]
+ }
+ ],
+ "conditionals": [
+ {
+ "name": "string β boolean flag name",
+ "description": "string β what block this controls"
+ }
+ ]
+}
+
+DOCUMENT TEXT:
+[DOCUMENT PLAIN TEXT HERE]
+```
+
+### Example Output
+
+For an offer letter containing `Dear {{CandidateName}}`, `Start Date: [StartDate]`, and a salary table:
+
+```json
+{
+ "fields": [
+ { "name": "candidateName", "label": "Candidate Full Name", "type": "String", "required": true, "detected_from": "{{CandidateName}}", "description": "Enter the candidate's full legal name" },
+ { "name": "startDate", "label": "Start Date", "type": "Date", "required": true, "detected_from": "[StartDate]", "description": "First day of employment" },
+ { "name": "salary", "label": "Annual Salary", "type": "Currency", "required": true, "detected_from": "{{Salary}}", "description": "Gross annual compensation in USD" },
+ { "name": "includeRelocation", "label": "Include Relocation Package?", "type": "Boolean", "required": false, "detected_from": "Relocation Allowance clause", "description": "Toggle to include/exclude relocation terms" }
+ ],
+ "loops": [],
+ "conditionals": [
+ { "name": "includeRelocation", "description": "Entire relocation package paragraph" }
+ ]
+}
+```
+
+---
+
+## Template Engine: How Fields Get Injected
+
+The DOCX template uses **Jinja2 syntax** (via `python-docx-template`):
+
+| Use Case | Syntax in DOCX |
+|---|---|
+| Simple field | `{{ candidateName }}` |
+| Date formatting | `{{ startDate \| date_format }}` |
+| Currency formatting | `{{ salary \| currency }}` |
+| Conditional block | `{%p if includeRelocation %}...{%p endif %}` |
+| Table row loop | `{%tr for item in lineItems %}...{%tr endfor %}` |
+
+The system wraps the cleaned XML with these tags based on the GenAI-detected schema.
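The `date_format` and `currency` filters are not Jinja2 built-ins; minimal implementations (the formats shown are assumptions) could be registered with the environment that `python-docx-template` uses:

```python
from datetime import date

def date_format(value: str, fmt: str = "%B %d, %Y") -> str:
    """Render an ISO date from the form as a human-readable string."""
    return date.fromisoformat(value).strftime(fmt)

def currency(value: float, symbol: str = "$") -> str:
    """Render a number with thousands separators and two decimals."""
    return f"{symbol}{value:,.2f}"

# Registration sketch (assumed usage of python-docx-template + jinja2):
#   tpl = DocxTemplate("offer_letter.docx")
#   env = jinja2.Environment()
#   env.filters.update({"date_format": date_format, "currency": currency})
#   tpl.render(context, jinja_env=env)
```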
+
+---
+
+## Single Generation Flow
+
+1. User uploads DOCX → GenAI detects fields → JSON schema saved
+2. App renders a **dynamic web form** from the schema (date pickers for Date fields, currency inputs for Currency, toggles for Boolean)
+3. User fills form → app injects data into template → sends to **Gotenberg** (Dockerized LibreOffice) for PDF conversion
+4. User downloads DOCX + PDF
+
+---
+
+## Bulk Generation Flow
+
+```mermaid
+graph LR
+ A[Upload Excel / Connect Google Sheet] --> B[Validate columns against schema]
+ B --> C{All rows valid?}
+ C -- No --> D[Pre-flight Report: show errors before running]
+ C -- Yes --> E[Push one task per row to Redis queue]
+ E --> F[Worker pool: inject data + render PDF per row]
+ F --> G[Upload PDFs to S3]
+ G --> H[Stream ZIP + Generation_Report.csv to user]
+```
+
+**Key design decisions:**
+- **Pre-flight validation** before any generation starts – show all errors upfront, not mid-job
+- **Fan-out architecture** – each row is an independent task; one failure doesn't kill the batch
+- **Streaming ZIP** – PDFs piped directly from S3 into the ZIP stream; the server never holds the full file in RAM
+- **Generation_Report.csv** – lists every row with status (Success/Failed) and the error reason for failed rows
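A minimal sketch of the pre-flight validator (function name and error-message format are assumptions; a type check per field would follow the same shape):

```python
def preflight(rows: list[dict], schema: list[dict]) -> list[str]:
    """Check every sheet row against the template schema before any PDF is rendered."""
    required = [f["name"] for f in schema if f.get("required")]
    errors = []
    for i, row in enumerate(rows, start=2):  # row 1 of the sheet is the header
        for field in required:
            if not str(row.get(field, "")).strip():
                errors.append(f"Row {i}: missing required field '{field}'")
    return errors  # empty list means the batch may start
```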
+
+---
+
+## File Naming
+
+User defines a naming pattern during template setup using the same field names:
+
+```
+Pattern: {{ candidateName }}_OfferLetter_{{ startDate }}.pdf
+Output: John-Doe_OfferLetter_2024-03-15.pdf
+```
+
+**Sanitization:** `/`, `\`, `:`, `*` and other illegal characters in field values are replaced with `-` before filename construction. Duplicates get `_1`, `_2` suffixes.
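The sanitization and de-duplication rules above could be sketched like this (the exact illegal-character set and whether spaces are also replaced are implementation choices):

```python
import re

ILLEGAL = re.compile(r'[\\/:*?"<>|]')  # characters invalid on common filesystems

def sanitize(value: str) -> str:
    """Replace illegal filename characters in a field value with '-'."""
    return ILLEGAL.sub("-", value).strip()

def unique_name(name: str, taken: set[str]) -> str:
    """Append _1, _2, ... when a generated filename collides with an earlier row."""
    if name not in taken:
        return name
    stem, dot, ext = name.rpartition(".")
    n = 1
    while f"{stem}_{n}{dot}{ext}" in taken:
        n += 1
    return f"{stem}_{n}{dot}{ext}"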
+
+---
+
+## Technology Stack
+
+| Layer | Choice | Why |
+|---|---|---|
+| Field Detection | OpenAI GPT-4o / Gemini 1.5 Pro | Best at inferring field types from context |
+| Template Engine | `python-docx-template` (Jinja2) | Handles loops, conditionals natively in DOCX XML |
+| Excel Parsing | `python-calamine` | Rust-based, ~10x faster than pandas for large files |
+| PDF Rendering | Gotenberg (Dockerized LibreOffice) | Preserves fonts, tables, headers – no Word license needed |
+| Task Queue | Redis + Celery | Industry standard for Python async bulk jobs |
+| Storage | AWS S3 / MinIO | Lifecycle rules auto-delete temp files after 24hrs |
+| Google Sheets | Sheets API v4 + OAuth 2.0 | Batch fetch entire range in one API call |
+
+---
+
diff --git a/Solution_4_Character_Video.md b/Solution_4_Character_Video.md
new file mode 100644
index 00000000..d5021cb3
--- /dev/null
+++ b/Solution_4_Character_Video.md
@@ -0,0 +1,373 @@
+# Architecture: Character-Based Video Series Generator (5-Min Episodes)
+
+> **Real-World Reference Implementation:**
+> This architecture is based on **[LoraFrame (IDLock Engine)](https://github.com/Nithin9585/LoraFrame_)** – a project I built and presented at the **CENI AI Hackathon, Hyderabad** (Top 10 Finalist).
+> LoraFrame is a persistent character memory & video generation system that combines episodic memory, LLM reasoning, and identity-preservation technology to create "permanent digital actors" that maintain visual consistency across generated images and videos.
+> *(Note: This is my personal GitHub account β [Nithin9585](https://github.com/Nithin9585))*
+
+---
+
+## The Problem
+
+We need to generate a **5-minute episode video** using AI-generated characters, but **no current AI video model can produce 5 minutes in one shot**. State-of-the-art models (Veo 3.1, Runway Gen-4, Kling 3.0) cap out at **5–15 seconds per clip**. The core challenge is:
+
+1. **Generate many short clips** that are individually high-quality
+2. **Maintain character identity** (face, clothing, style) across all clips
+3. **Assemble clips** into a coherent 5-minute narrative with transitions, audio, and pacing
+4. **Remember character history** across episodes so a series feels continuous
+
+This proposal describes a system that solves all four problems.
+
+---
+
+## System Architecture (Core Engine)
+
+The following diagram shows the **LoraFrame IDLock Engine** β the core system that powers character creation, identity-locked generation, memory, and self-healing refinement.
+
+```mermaid
+graph TD
+
+ %% ===== CLIENT LAYER =====
+ U["User / Client App"] --> API["API Gateway (FastAPI)"]
+
+ %% ===== JOB QUEUE =====
+ API --> Q["Redis Queue (RQ)"]
+
+ %% ===== WORKERS =====
+ Q --> GEN["Generator Worker"]
+ Q --> REF["Refiner Worker"]
+ Q --> STA["State Analyzer Worker"]
+ Q --> COL["LoRA Collector Worker"]
+ Q --> TRN["LoRA Trainer Worker"]
+
+ %% ===== DATA LAYER =====
+ PG["Postgres Metadata DB"]
+ VDB["Vector DB (Embeddings)"]
+ OBJ["Object Storage (Images / Models)"]
+
+ %% ===== MEMORY =====
+ GEN --> PG
+ GEN --> VDB
+ STA --> PG
+ STA --> VDB
+
+ %% ===== PROMPT ENGINE =====
+ GEN --> LLM["LLM Prompt Engine"]
+
+ %% ===== LORA SYSTEM =====
+ GEN --> LR["LoRA Registry"]
+ LR --> GEN
+ COL --> TRN
+ TRN --> OBJ
+ TRN --> LR
+ LR --> PG
+
+ %% ===== GENERATION =====
+ GEN --> AI["Image / Video Generator"]
+ AI --> OBJ
+
+ %% ===== VALIDATION =====
+ AI --> VAL["Vision Validator (IDR)"]
+ VAL -->|Pass| STA
+ VAL -->|Fail| REF
+
+ %% ===== REFINEMENT LOOP =====
+ REF --> AI
+
+ %% ===== LORA DATA PIPELINE =====
+ STA --> COL
+
+ %% ===== ADMIN UI =====
+ API --> UI["Admin / Dashboard"]
+```
+
+### Component Breakdown
+
+| Component | Technology | Role |
+|---|---|---|
+| **API Gateway** | FastAPI (Python) | REST API for character creation, episode requests, job status |
+| **Redis Queue** | Redis / RQ | Async job dispatch to workers; decouples API from heavy GPU tasks |
+| **Generator Worker** | Veo 3.1 / Imagen 3 / Kling 3.0 | Produces identity-locked images/video clips per scene |
+| **Refiner Worker** | InsightFace + Inpainting | Self-healing loop – if IDR detects identity drift, re-generates the face region |
+| **State Analyzer** | LLM + Vector DB | Updates episodic memory after each generation (injuries, costume changes, mood) |
+| **LoRA Collector/Trainer** | SDXL LoRA training pipeline | Collects validated images → fine-tunes a character-specific LoRA adapter |
+| **LLM Prompt Engine** | Groq (Llama 3 70B) | Converts simple user prompts into rich, context-aware scene descriptions |
+| **Vision Validator (IDR)** | InsightFace + ONNX Runtime | Compares generated face embeddings against the canonical reference; rejects if similarity < threshold |
+| **LoRA Registry** | Postgres + Object Storage | Tracks which LoRA weights belong to which character; version-controlled |
+| **Postgres** | PostgreSQL (SQLAlchemy) | Stores character metadata, episode scripts, scene timelines, generation logs |
+| **Vector DB** | Pinecone / FAISS | Stores episodic memory embeddings for RAG-based character state retrieval |
+| **Object Storage** | GCS / S3 | Stores generated images, video clips, LoRA model weights |
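The IDR pass/fail decision reduces to an embedding similarity check. A sketch with plain cosine similarity (the 0.6 threshold is illustrative; in production the embeddings come from InsightFace):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def idr_check(candidate: list[float], reference: list[float],
              threshold: float = 0.6) -> bool:
    """Pass -> route to State Analyzer; Fail -> Refiner worker re-generates the face."""
    return cosine_similarity(candidate, reference) >= threshold
```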
+
+---
+
+## Long Video Assembly Pipeline (5-Minute Episodes)
+
+Since no single AI model can generate 5 minutes of video at once, we use a **Scene-by-Scene Generation + Assembly** pipeline. This is the key architectural addition that turns short AI clips into a full episode.
+
+```mermaid
+graph TD
+ subgraph "Phase 1: Script & Storyboard"
+ EP["Episode Prompt (user story)"] --> SCRIPT["LLM Script Engine"]
+ BIBLE["Series Bible (characters, relationships)"] --> SCRIPT
+ MEM["Episodic Memory (Vector DB)"] --> SCRIPT
+ SCRIPT --> SCENES["Scene Breakdown JSON"]
+        SCENES --> |"12-18 scenes × 15-25s each"| SB["Storyboard Plan"]
+ end
+
+ subgraph "Phase 2: Scene-by-Scene Generation"
+ SB --> LOOP["Scene Generation Loop"]
+ LOOP --> |"For each scene"| IDGEN["IDLock Generator (Core Engine)"]
+ IDGEN --> |"5-8s clip"| EXTEND["Scene Extension (last-frame chaining)"]
+ EXTEND --> |"15-25s clip"| VALID["IDR Validation"]
+ VALID --> |Pass| CLIP["Validated Scene Clip"]
+ VALID --> |Fail| REFINE["Refiner β Re-generate"]
+ REFINE --> IDGEN
+ end
+
+ subgraph "Phase 3: Audio Pipeline"
+ SCENES --> TTS["TTS Engine (per character voice)"]
+ SCENES --> MUSIC["Background Music / SFX Selection"]
+ TTS --> AUDIO["Scene Audio Tracks"]
+ MUSIC --> AUDIO
+ end
+
+ subgraph "Phase 4: Assembly & Post-Production"
+ CLIP --> ASSEMBLE["FFmpeg Video Assembler"]
+ AUDIO --> ASSEMBLE
+ ASSEMBLE --> TRANS["Transition Engine (cross-fade, cuts)"]
+ TRANS --> FINAL["Final 5-Min Episode MP4"]
+ FINAL --> QC["Quality Check (duration, lip-sync, continuity)"]
+ QC --> |Pass| DELIVER["Deliver to User"]
+ QC --> |Fail| RETRY["Flag scenes for regeneration"]
+ RETRY --> LOOP
+ end
+```
+
+### How the Long Video Pipeline Works
+
+#### Phase 1: Script & Storyboard Generation
+
+The user submits a **short episode prompt** (e.g., *"Ava discovers the secret lab; her mentor warns her about the consequences"*). The LLM Script Engine:
+
+1. **Loads the Series Bible**: character profiles, relationship maps, visual rules
+2. **Retrieves episodic memory** via RAG: what happened in previous episodes (Vector DB)
+3. **Generates a structured scene breakdown**: typically **12–18 scenes**, each 15–25 seconds long, totaling ~5 minutes
+
+Example Scene Breakdown JSON:
+```json
+{
+ "episode": {
+ "title": "The Hidden Lab",
+ "total_target_duration_sec": 300,
+ "scenes": [
+ {
+ "scene_id": "sc_001",
+        "description": "Establishing shot – Ava walks toward the abandoned building at dusk",
+ "characters": ["ava"],
+ "duration_sec": 20,
+ "camera": "wide tracking shot, golden hour lighting",
+ "dialogue": null,
+ "narration": "Ava had always been curious. But tonight, curiosity felt dangerous.",
+ "mood": "tense, mysterious"
+ },
+ {
+ "scene_id": "sc_002",
+        "description": "Close-up – Ava pushes open the heavy metal door, revealing blue lab lighting inside",
+ "characters": ["ava"],
+ "duration_sec": 15,
+ "camera": "close-up face, rack focus to door interior",
+ "dialogue": null,
+ "narration": null,
+ "mood": "suspenseful"
+ }
+ ]
+ }
+}
+```
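+
+As a sanity check at the end of Phase 1, the scene durations should roughly sum to the episode target before any video generation starts. A minimal sketch (hypothetical helper; the field names follow the JSON above):
+
+```python
+def validate_breakdown(breakdown: dict, tolerance_sec: int = 30) -> bool:
+    """Return True if the scenes roughly fill the target episode length."""
+    episode = breakdown["episode"]
+    total = sum(scene["duration_sec"] for scene in episode["scenes"])
+    return abs(total - episode["total_target_duration_sec"]) <= tolerance_sec
+```
+
+If this check fails, the Script Engine is asked to re-balance scene durations before Phase 2 begins.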
+
+#### Phase 2: Scene-by-Scene Generation with Last-Frame Chaining
+
+This is the critical technique for producing **long, continuous video from short AI clips**:
+
+1. **Generate an initial 5–8 second clip** for each scene using the IDLock Generator (Veo 3.1 / Kling 3.0 API)
+2. **Extract the last frame** of the generated clip
+3. **Feed the last frame as a reference image** to the next generation call; this is **"Scene Extension" / "Last-Frame Chaining"**
+4. **Repeat 2–3 times** per scene to extend each scene to 15–25 seconds
+5. **IDR Validation** checks every clip; if the character's face drifts beyond the similarity threshold, the Refiner re-generates that clip
+
+This approach is inspired by how **Veo 3.1's SceneBuilder** and **Kling 3.0's multi-shot generation** work:
+
+| Technique | Source | Used For |
+|---|---|---|
+| **Last-Frame Chaining** | Veo 3.1 Scene Extension API | Extend a scene from 8s → 20s while maintaining visual continuity |
+| **Multi-Shot Generation** | Kling 3.0 MVL Architecture | Generate 2–6 distinct scenes with character consistency in a single session |
+| **Element Reference** | Kling 3.0 Character Reference 3.0 | Lock character identity across all shots using reference images |
+| **IDR Self-Healing** | LoraFrame (InsightFace) | If face similarity drops below 0.85, regenerate that specific clip |
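+
+The chaining loop described above can be sketched as plain Python. Everything here is illustrative: `generate_clip`, `last_frame`, and `similarity` are hypothetical callables standing in for the provider API, a frame extractor, and the InsightFace comparison, so the control flow can be shown without tying it to one vendor.
+
+```python
+def extend_scene(prompt, ref_image, generate_clip, last_frame, similarity,
+                 target_sec=20, clip_sec=8, threshold=0.85, max_retries=3):
+    """Last-frame chaining: keep generating short clips, feeding each clip's
+    final frame back in as the anchor image, until the scene is long enough."""
+    clips, anchor = [], ref_image
+    while sum(c["duration_sec"] for c in clips) < target_sec:
+        clip = None
+        for _ in range(max_retries):
+            candidate = generate_clip(prompt, anchor, clip_sec)  # provider call
+            # IDR check: the face must still match the canonical reference
+            if similarity(last_frame(candidate), ref_image) >= threshold:
+                clip = candidate
+                break
+        if clip is None:
+            raise RuntimeError("identity drift: scene needs manual review")
+        clips.append(clip)
+        anchor = last_frame(clip)  # chain: next clip starts from this frame
+    return clips
+```
+
+Note that every clip is validated against the *original* reference image, not the previous frame, so drift cannot accumulate across chained clips.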
+
+#### Phase 3: Audio Pipeline (Parallel)
+
+While video is being generated, the audio pipeline runs in parallel:
+
+- **TTS Engine** (ElevenLabs / Coqui XTTS) generates character-specific voice lines from the script
+- **Music Selection** picks background tracks matching mood tags (tense, joyful, dramatic)
+- **SFX Engine** adds ambient sounds (footsteps, door creaks, wind)
+
+Each character has a **fixed voice profile** stored in the Series Bible, ensuring the same voice across episodes.
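+
+Turning the scene breakdown into TTS jobs might look like the sketch below. The dialogue item shape (`{"character": ..., "text": ...}`) and the voice ids are assumptions for illustration; the scene JSON above only shows `dialogue: null`.
+
+```python
+def build_tts_tasks(scenes: list[dict], voices: dict[str, str]) -> list[dict]:
+    """Expand the scene breakdown into per-line TTS jobs, resolving each
+    speaker to the fixed voice id stored in the Series Bible."""
+    tasks = []
+    for scene in scenes:
+        if scene.get("narration"):
+            tasks.append({"scene_id": scene["scene_id"],
+                          "voice_id": voices["narrator"],
+                          "text": scene["narration"]})
+        for line in scene.get("dialogue") or []:  # dialogue may be null
+            tasks.append({"scene_id": scene["scene_id"],
+                          "voice_id": voices[line["character"]],
+                          "text": line["text"]})
+    return tasks
+```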
+
+#### Phase 4: Assembly & Post-Production
+
+```
+FFmpeg Assembly Pipeline:
+1. Concatenate scene clips in order → raw_video.mp4
+2. Add cross-fade transitions (0.5s) → smooth_video.mp4
+3. Mix dialogue audio with music/SFX → mixed_audio.aac
+4. Merge video + audio → episode_final.mp4
+5. Validate total duration ≈ 300 seconds → QC pass/fail
+```
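+
+A worker could drive these steps roughly as follows. The command shapes (concat demuxer, stream copy, audio mux) are standard ffmpeg usage; the file names are the illustrative ones from the pipeline above.
+
+```python
+def build_assembly_commands(concat_list: str = "scenes.txt",
+                            audio: str = "mixed_audio.aac",
+                            out: str = "episode_final.mp4") -> list[list[str]]:
+    """Return argv lists for steps 1 and 4 of the assembly pipeline."""
+    # Step 1: losslessly concatenate the scene clips listed in scenes.txt
+    concat_cmd = ["ffmpeg", "-f", "concat", "-safe", "0",
+                  "-i", concat_list, "-c", "copy", "raw_video.mp4"]
+    # Step 4: mux the mixed audio track onto the assembled video
+    mux_cmd = ["ffmpeg", "-i", "raw_video.mp4", "-i", audio,
+               "-c:v", "copy", "-c:a", "aac", "-shortest", out]
+    return [concat_cmd, mux_cmd]
+```
+
+Each list can be handed to `subprocess.run`. Cross-fades (step 2) require ffmpeg's `xfade` filter, which re-encodes, so `-c copy` no longer applies for that step.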
+
+If QC fails (e.g., total duration is 247s instead of 300s), the system flags the shortest scenes and regenerates them with longer durations.
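+
+The "flag the shortest scenes" rule could be implemented as below. This is a hypothetical heuristic that assumes a regenerated scene can roughly double its length, which is consistent with the 2–3 extension passes per scene described in Phase 2.
+
+```python
+def flag_scenes_for_regeneration(durations: dict[str, float],
+                                 target_sec: float = 300.0) -> list[str]:
+    """Pick the shortest scenes first, until regenerating them (at up to
+    double their current length) could cover the missing seconds."""
+    deficit = target_sec - sum(durations.values())
+    if deficit <= 0:
+        return []  # episode is already long enough
+    flagged, recoverable = [], 0.0
+    for scene_id, dur in sorted(durations.items(), key=lambda kv: kv[1]):
+        flagged.append(scene_id)
+        recoverable += dur  # extra seconds gained if this scene doubles
+        if recoverable >= deficit:
+            break
+    return flagged
+```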
+
+---
+
+## Key System: Identity Persistence (IDLock)
+
+The biggest challenge in AI video series is **keeping the same character looking the same** across hundreds of generated clips. LoraFrame solves this with a multi-layer identity system:
+
+```
+┌─────────────────────────────────────────────────────┐
+│                    IDLock Stack                     │
+├─────────────────────────────────────────────────────┤
+│ Layer 1: Reference Images (canonical face + angles) │
+│ Layer 2: InsightFace Embeddings (512-d face vector) │
+│ Layer 3: LoRA Weights (fine-tuned on character)     │
+│ Layer 4: Style Anchors (clothing, palette, props)   │
+│ Layer 5: Episodic Memory (RAG: what happened)       │
+└─────────────────────────────────────────────────────┘
+
+Generation Flow:
+User Prompt → LLM enriches with memory + style anchors
+            → Generator uses reference images + LoRA weights
+            → Output validated by InsightFace (cosine similarity ≥ 0.85)
+            → PASS: save to memory | FAIL: refine and retry (max 3 loops)
+```
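+
+The validation step at the heart of this flow is a plain cosine-similarity check against the canonical 512-d InsightFace embedding. A minimal sketch using NumPy, with the 0.85 threshold from above:
+
+```python
+import numpy as np
+
+def idr_check(candidate: np.ndarray, canonical: np.ndarray,
+              threshold: float = 0.85) -> bool:
+    """IDR validation: cosine similarity between a generated face embedding
+    and the character's canonical 512-d InsightFace embedding."""
+    cos = float(np.dot(candidate, canonical)
+                / (np.linalg.norm(candidate) * np.linalg.norm(canonical)))
+    return cos >= threshold
+```
+
+Because cosine similarity ignores magnitude, the embeddings do not need to be pre-normalized before the check.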
+
+---
+
+## Episodic Memory: How Characters "Remember"
+
+Each generation event creates a **memory record** stored in the Vector DB:
+
+```json
+{
+ "character_id": "char_ava_001",
+ "episode": 3,
+ "scene": "sc_007",
+ "state": {
+ "clothing": "torn lab coat, no glasses",
+ "injuries": "bandaged left hand",
+ "mood": "determined but shaken",
+ "location": "underground lab corridor"
+ },
+ "embedding": [0.012, -0.445, 0.893, ...]
+}
+```
+
+Before generating a new scene, the **LLM Prompt Engine** runs a RAG query:
+- *"What was Ava wearing in the most recent scene?"*
+- Vector DB returns the latest state → LLM includes `"torn lab coat, bandaged left hand"` in the generation prompt
+- This ensures **visual continuity within and across episodes**
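+
+In production this is a Vector DB query; the logic can be shown with an in-memory stand-in whose record shape matches the memory JSON above (the `enrich_prompt` helper is illustrative):
+
+```python
+def latest_state(memories: list[dict], character_id: str) -> dict | None:
+    """Return the most recent stored state for a character, ordered by
+    (episode, scene) - an in-memory stand-in for the Vector DB query."""
+    records = [m for m in memories if m["character_id"] == character_id]
+    if not records:
+        return None
+    records.sort(key=lambda m: (m["episode"], m["scene"]))
+    return records[-1]["state"]
+
+def enrich_prompt(user_prompt: str, state: dict) -> str:
+    """Fold the persisted appearance details into the generation prompt."""
+    details = ", ".join(f"{k}: {v}" for k, v in state.items())
+    return f"{user_prompt}. Continuity: {details}"
+```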
+
+---
+
+## Technology Stack
+
+### Backend (LoraFrame Engine)
+| Layer | Technology |
+|---|---|
+| Framework | Python 3.10+, FastAPI |
+| Database | PostgreSQL (SQLAlchemy) |
+| Cache / Queue | Redis (RQ workers) |
+| Vector DB | Pinecone / FAISS |
+| LLM Inference | Groq (Llama 3 70B/8B) |
+| Image/Video Gen | Google Veo 3.1, Imagen 3, Kling 3.0 (multi-provider) |
+| Identity Lock | InsightFace, ONNX Runtime |
+| LoRA Training | SDXL LoRA fine-tuning pipeline |
+| Storage | Google Cloud Storage (GCS) / AWS S3 |
+
+### Long Video Assembly Layer
+| Component | Technology |
+|---|---|
+| Scene Extension | Veo 3.1 Scene Extension API (last-frame chaining) |
+| Multi-Shot | Kling 3.0 Multi-Shot Generation |
+| TTS | ElevenLabs API / Coqui XTTS (self-hosted) |
+| Music/SFX | Mubert API / local library |
+| Video Assembly | FFmpeg (concat, transitions, audio merge) |
+| Quality Control | Duration validator + lip-sync checker |
+
+### Frontend
+| Layer | Technology |
+|---|---|
+| Framework | React.js |
+| Language | JavaScript |
+
+---
+
+## Project Structure (LoraFrame Backend)
+
+```
+cineAI/
+├── app/
+│   ├── api/             # API Routes (characters, generate, video, episodes)
+│   ├── core/            # Config, Database, Redis setup
+│   ├── models/          # SQLAlchemy Database Models
+│   ├── schemas/         # Pydantic Request/Response Models
+│   ├── services/        # Core Logic (Groq, Gemini, MemoryEngine)
+│   └── workers/         # Async Task Workers (generator, refiner, trainer, assembler)
+├── assembly/
+│   ├── script_engine/   # LLM-based scene breakdown generator
+│   ├── chainer/         # Last-frame chaining / scene extension logic
+│   ├── audio/           # TTS, music selection, SFX mixing
+│   └── ffmpeg_ops/      # FFmpeg concat, transitions, final render
+├── scripts/             # Utility scripts
+├── tests/               # Pytest suite
+├── uploads/             # Local storage for dev
+├── .env.example         # Environment variable template
+├── requirements.txt     # Python dependencies
+└── README.md
+```
+
+---
+
+## API Endpoints
+
+| Method | Endpoint | Description |
+|---|---|---|
+| `POST` | `/api/v1/characters` | Create a new character from reference images |
+| `POST` | `/api/v1/generate` | Generate a consistent image for a character |
+| `POST` | `/api/v1/video/generate` | Generate a single video scene (short clip) |
+| `POST` | `/api/v1/episodes/create` | Create a full 5-min episode from a story prompt |
+| `GET` | `/api/v1/episodes/{episode_id}/status` | Poll episode assembly progress |
+| `GET` | `/api/v1/episodes/{episode_id}/download` | Download the final episode MP4 + assets |
+| `GET` | `/api/v1/jobs/{job_id}` | Check individual generation job status |
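+
+Since episode assembly is long-running, clients are expected to create an episode and then poll the status endpoint. A sketch of the polling loop; the `state` field of the status response is an assumption, and the HTTP call is abstracted behind a callable so it can be stubbed in tests.
+
+```python
+import time
+
+def wait_for_episode(fetch_status, episode_id: str,
+                     poll_sec: float = 2.0, max_polls: int = 300) -> dict:
+    """Poll GET /api/v1/episodes/{episode_id}/status until the job settles."""
+    for _ in range(max_polls):
+        status = fetch_status(episode_id)  # e.g. an HTTP GET under the hood
+        if status["state"] in ("completed", "failed"):
+            return status
+        time.sleep(poll_sec)
+    raise TimeoutError(f"episode {episode_id} did not finish in time")
+```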
+
+---
+
+## Live Deployment
+
+| Component | URL |
+|---|---|
+| **Frontend (Live App)** | [https://lore-frame-in.vercel.app](https://lore-frame-in.vercel.app) |
+| **Backend API (GCP)** | [https://cineai-api-4sjsy6xola-uc.a.run.app/docs](https://cineai-api-4sjsy6xola-uc.a.run.app/docs) |
+
+---
+
+## Why This Architecture Works for 5-Minute Videos
+
+| Challenge | Solution |
+|---|---|
+| AI models only generate 5–15s clips | **Scene-by-scene generation** with last-frame chaining extends each to 15–25s; 15 scenes = 5 min |
+| Character faces change between clips | **IDLock (InsightFace + LoRA)** validates every frame; self-healing refiner fixes drift |
+| Stories lack continuity across episodes | **Episodic Memory (RAG + Vector DB)** ensures characters "remember" past events and state |
+| Audio doesn't match video | **Parallel audio pipeline** with per-character voice profiles + mood-tagged music |
+| Quality varies across scenes | **Vision Validator + QC pipeline** rejects and regenerates below-threshold clips |
+| No single tool does everything | **Modular, multi-provider architecture**: swap Veo for Kling or Runway per scene as needed |