From 25b205caf1282193c2498b27a4e0ccbf526a154b Mon Sep 17 00:00:00 2001 From: Dinesh Kumar <154494542+Dineshchoudhary6229@users.noreply.github.com> Date: Thu, 19 Feb 2026 19:34:15 +0530 Subject: [PATCH 1/2] Add backend architecture solutions for all problems Updated the Backend.md document to reflect changes in problem solutions, including system design, schema, and security considerations for various projects. --- Backend.md | 162 ++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 148 insertions(+), 14 deletions(-) diff --git a/Backend.md b/Backend.md index 4f7ff501..69176bc6 100644 --- a/Backend.md +++ b/Backend.md @@ -15,7 +15,7 @@ **Goal:** Upload video → async processing → outputs: transcript, **Summary.md**, highlights (timestamps), screenshot/clip references. [READ MORE ABOUT THE PROJECT](./Video-summary-platform.md) -**Your solution must include** +**My solution includes** * **System design:** API + worker(s) + storage + external AI/transcription boundaries * **DB choice:** Postgres vs other (short justification) @@ -24,9 +24,58 @@ * **Job lifecycle:** queued → processing → success/failed (+ retry/idempotency rule) * **Storage layout:** how files/artifacts are stored + safe download strategy -**Your Solution for problem 1:** +**My Solution for problem 1:** +## System Design +API Service (Spring Boot) handles video upload, job creation, and artifact retrieval. +Worker Service processes jobs asynchronously using a queue (Redis/RabbitMQ). +PostgreSQL stores metadata. +Object storage (S3-compatible) stores videos and generated outputs. +External AI services handle transcription and summarization. + +Supports batch folder processing, chunked upload, and streaming for large files (200MB+, 3–4 hours). + +Flow: +1. Upload video → create VideoAsset + Job (QUEUED) +2. Worker picks job → PROCESSING → calls AI +3. 
Stores transcript, summary, highlights → SUCCESS/FAILED + +## Database Choice +PostgreSQL for strong relational modeling, indexing on job status, and JSONB for AI metadata. + +## Schema (Tables) +User(id, email, created_at) +VideoAsset(id, user_id, storage_path, duration, created_at) +Job(id, video_id, status, retry_count, idempotency_key, created_at) +JobEvent(id, job_id, status, message, created_at) +Artifact(id, job_id, type, storage_path, created_at) +Highlight(id, job_id, timestamp_start, timestamp_end, text) + +## Constraints & Indexes +FK: video_asset.user_id → user.id +FK: job.video_id → video_asset.id +Index on job(status) +Unique job.idempotency_key +Index on artifact.job_id + +## Job Lifecycle +QUEUED → PROCESSING → SUCCESS | FAILED +Max 3 retries with idempotency check. + +## Storage Strategy +/tenant/{userId}/videos/{videoId}.mp4 +/tenant/{userId}/jobs/{jobId}/summary.md + +Secure downloads via pre-signed URLs after auth validation. + +## Security +User ownership validation and private object storage. + +## Reliability +Job state machine, JobEvent logging, retries with backoff. + +## Cost & Scalability +MVP single worker → v1 horizontal workers + S3 + queue partitioning. -You need to put your solution here. --- @@ -34,7 +83,7 @@ You need to put your solution here. **Goal:** Connect LinkedIn → store persona → generate drafts (handled by GenAI team) → approve → schedule → auto-post + audit logs. [READ MORE ABOUT THE PROJECT](./linkedin-automation.md) -**Your solution must include** +**My solution includes** * **System design:** OAuth flow, token storage/refresh, scheduler/worker design * **Schema:** User, LinkedInAccount, Persona, Draft, Schedule, PostAttempt/PostLog @@ -42,9 +91,32 @@ You need to put your solution here. 
* **Reliability:** retry posting, dedupe to prevent double-post, rate limiting * **Prompt/config storage proposal:** how backend stores “prompt versions” or “config packs” provided by GenAI team (DB vs repo vs hybrid, versioning + rollback) -**Your Solution for problem 2:** +**My Solution for problem 2:** + +## System Design +OAuth connection to LinkedIn → store encrypted tokens. +Persona stored per user → Draft generation → Schedule → Worker auto-post → PostLog for audit. + +## Schema +User(id, email) +LinkedInAccount(id, user_id, access_token_enc, refresh_token_enc, expires_at) +Persona(id, user_id, tone, topics) +Draft(id, user_id, content, status) +Schedule(id, draft_id, scheduled_time, status) +PostLog(id, schedule_id, status, response, created_at) + +## Security +AES-256 encrypted tokens, least-privilege scopes, per-user access control. + +## Reliability +Dedupe key = hash(content + scheduled_time) +Retry with exponential backoff +Rate limiting per user +Automatic token refresh via worker + +## Prompt Storage +PromptConfig(id, version, content, created_at) for versioning and rollback. -You need to put your solution here. --- @@ -52,7 +124,7 @@ You need to put your solution here. **Goal:** Upload DOCX template → detect fields → single fill export → bulk fill via CSV/Sheet → ZIP + per-row report. [READ MORE ABOUT THE PROJECT](./docs-template-output-generation.md) -**Your solution must include** +**My solution includes** * **System design:** template ingestion, field extraction service, bulk job worker, export service * **Schema:** Template, TemplateVersion, TemplateField, BulkRun, BulkRow, Artifact, JobEvent @@ -60,9 +132,31 @@ You need to put your solution here. 
* **Reliability:** partial success handling, per-row status, retries, resumable bulk run * **Security:** template isolation per tenant/user, safe downloads, anti-path traversal -**Your Solution for problem 3:** +**My Solution for problem 3:** + +## System Design +Template upload → extract fields → store TemplateVersion + TemplateFields. +CSV upload → BulkRun → Worker processes rows → generate DOCX/PDF → ZIP export. + +## Schema +Template(id, user_id, name) +TemplateVersion(id, template_id, storage_path, created_at) +TemplateField(id, template_version_id, field_name) +BulkRun(id, template_version_id, status, total_rows, processed_rows) +BulkRow(id, bulk_run_id, status, output_artifact_id) +Artifact(id, storage_path, type) +JobEvent(id, bulk_run_id, message) + +## Storage Strategy +CSV input, generated docs, and ZIP stored in object storage. +Temporary files deleted after ZIP creation. + +## Reliability +Per-row status, resumable using processed_rows, partial success supported. + +## Security +Tenant isolation via user_id, signed URLs, path traversal prevention. -You need to put your solution here. --- @@ -70,7 +164,7 @@ You need to put your solution here. **Goal:** Define characters once (image + traits + relationships). For each episode story → output episode package (script/scenes/assets plan/render plan), optionally render. [READ MORE ABOUT THE PROJECT](./char-based-video-generation.md) -**Your solution must include** +**My solution includes** * **System design:** episodic pipeline as jobs, asset management, consistency strategy storage * **Schema:** Character, Relationship, Episode, Scene, Asset, RenderJob, Artifact @@ -78,9 +172,30 @@ You need to put your solution here. * **Storage:** images/audio/video assets, versioning, dedupe strategy * **Security + cost controls:** quotas, rate limits, large asset constraints -**Your Solution for problem 4:** +**My Solution for problem 4:** + +## System Design +Characters defined once with traits and assets. 
+Episode pipeline: Episode → scenes → assets → render jobs processed asynchronously. + +## Schema +Character(id, user_id, name, traits, voice_id, appearance_ref) +Relationship(id, character_a_id, character_b_id, type) +Episode(id, user_id, title, character_snapshot_version) +Scene(id, episode_id, script_text) +Asset(id, scene_id, type, storage_path, hash) +RenderJob(id, episode_id, status, retry_count) +Artifact(id, render_job_id, storage_path) + +## Consistency +Character snapshot version stored per episode to maintain continuity. + +## Storage +Versioned assets with hash-based deduplication and per-user quotas. + +## Security & Cost +Rate limits on render jobs, file size limits, per-user storage quotas. -You need to put your solution here. ## Problem 5: Cross-Cutting @@ -92,6 +207,25 @@ Answer briefly for the whole platform: 4. **Data retention:** what to delete and when (inputs, artifacts, logs) 5. **Secrets & compliance:** token encryption, key management approach, PII handling -**Your Answer for problem 5:** +**My Answer for problem 5:** + +## Multi-Tenancy +User-level tenancy with user_id present in all tables. + +## AuthZ Model +RBAC (USER, ADMIN) enforced at API layer and query filters. + +## Observability +Logs include job_id, correlation_id, and status transitions. +Metrics: job latency, failure rate, queue depth. + +## Data Retention +Raw inputs: 30 days +Logs: 14 days +Artifacts: user-controlled deletion. + +## Secrets & Compliance +Tokens encrypted with AES-256. +Keys stored in environment/secret manager. +Minimal PII storage. -You need to put your solution here. 
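The token handling described in the Secrets & Compliance answer above (AES-256 encryption at rest, keys from an environment/secret manager) can be sketched as follows. This is a minimal illustration, not code from the patch: the `TokenCrypto` class name and method shapes are assumptions, and a real deployment would load the key from the secret manager rather than hold it in process.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch of encrypting access/refresh tokens at rest
// with AES-256-GCM (authenticated encryption, random IV per token).
public final class TokenCrypto {
    private static final int IV_LEN = 12;    // 96-bit IV, the recommended size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    private final SecretKey key;             // 256-bit key, loaded from the secret manager
    private final SecureRandom random = new SecureRandom();

    public TokenCrypto(SecretKey key) { this.key = key; }

    // Encrypts a token; the random IV is prepended to the ciphertext.
    public String encrypt(String token) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    public String decrypt(String encoded) throws Exception {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(in, 0, IV_LEN)));
        byte[] pt = cipher.doFinal(Arrays.copyOfRange(in, IV_LEN, in.length));
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

GCM is chosen here over plain CBC because tampering with a stored ciphertext then fails authentication on decrypt, which matters for tokens the worker later replays against an external API.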
From 23b139c3cae1bec0a460d393b1219c1653ebe2b3 Mon Sep 17 00:00:00 2001 From: Dinesh Kumar <154494542+Dineshchoudhary6229@users.noreply.github.com> Date: Thu, 19 Feb 2026 19:45:10 +0530 Subject: [PATCH 2/2] Update Backend.md --- Backend.md | 113 ++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 98 insertions(+), 15 deletions(-) diff --git a/Backend.md b/Backend.md index 69176bc6..14bd2faf 100644 --- a/Backend.md +++ b/Backend.md @@ -94,29 +94,112 @@ MVP single worker → v1 horizontal workers + S3 + queue partitioning. **My Solution for problem 2:** ## System Design -OAuth connection to LinkedIn → store encrypted tokens. -Persona stored per user → Draft generation → Schedule → Worker auto-post → PostLog for audit. +User connects LinkedIn via OAuth → backend stores encrypted access + refresh tokens. +User creates Persona (tone, topics, do/don’t rules). +GenAI service generates 3 drafts → stored as Draft records. +User approves one draft → creates Schedule entry. +Scheduler service polls due schedules → sends job to worker. +Worker posts to LinkedIn API → stores result in PostLog. + +Components: +- API Service (Spring Boot): OAuth, persona CRUD, draft approval, scheduling +- Scheduler (cron/queue based): finds due posts using scheduled_time index +- Worker Service: posts to LinkedIn, handles retries, token refresh +- PostgreSQL: metadata storage +- Redis/Queue: async posting jobs + +Flow: +1. OAuth connect → store encrypted tokens +2. Create persona → request draft generation (GenAI boundary) +3. Save drafts → user approves one +4. Create Schedule (PENDING) +5. Scheduler → enqueue job when scheduled_time reached +6. 
Worker → post to LinkedIn → update status POSTED/FAILED → write PostLog ## Schema -User(id, email) -LinkedInAccount(id, user_id, access_token_enc, refresh_token_enc, expires_at) -Persona(id, user_id, tone, topics) -Draft(id, user_id, content, status) -Schedule(id, draft_id, scheduled_time, status) -PostLog(id, schedule_id, status, response, created_at) +User(id, email) + +LinkedInAccount( + id, + user_id, + access_token_enc, + refresh_token_enc, + expires_at, + created_at +) + +Persona( + id, + user_id, + tone, + topics, + do_dont_rules, + created_at +) + +Draft( + id, + user_id, + persona_id, + content, + status, -- DRAFT | APPROVED | REJECTED + created_at +) + +Schedule( + id, + draft_id, + scheduled_time, + status, -- PENDING | POSTED | FAILED + dedupe_key, + created_at +) + +PostLog( + id, + schedule_id, + status, + linkedin_post_id, + response, + created_at +) + +PromptConfig( + id, + version, + content, + created_at +) + +## Constraints & Indexes +Unique index on LinkedInAccount.user_id +Index on Schedule.scheduled_time for scheduler polling +Unique index on Schedule.dedupe_key to prevent double posting +FK: Draft.persona_id → Persona.id +FK: Schedule.draft_id → Draft.id +FK: PostLog.schedule_id → Schedule.id ## Security -AES-256 encrypted tokens, least-privilege scopes, per-user access control. +OAuth with least-privilege scopes. +Access and refresh tokens encrypted using AES-256. +Tokens decrypted only inside worker at posting time. +User ownership validation on persona, drafts, and schedules. ## Reliability -Dedupe key = hash(content + scheduled_time) -Retry with exponential backoff -Rate limiting per user -Automatic token refresh via worker +Dedupe key = hash(draft_id + scheduled_time) to avoid double posting. +Retry with exponential backoff for transient LinkedIn failures. +Automatic token refresh using refresh_token before expiry. +Rate limiting per user to respect LinkedIn API limits. +PostLog keeps full audit trail. 
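The dedupe rule above — `hash(draft_id + scheduled_time)` stored under the unique index on `Schedule.dedupe_key` — could be derived like this. A sketch only: the class name, the `|` delimiter, and hashing to a hex string are assumptions; the double-post guarantee comes from the database rejecting a second insert with the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;

// Illustrative derivation of Schedule.dedupe_key: a deterministic
// SHA-256 over the draft id and scheduled time, so re-enqueuing the
// same approved draft for the same slot collides on the unique index.
public final class DedupeKey {
    public static String of(long draftId, Instant scheduledTime) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest((draftId + "|" + scheduledTime.getEpochSecond())
                        .getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString(); // 64 hex chars, stored under the unique index
    }
}
```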
-## Prompt Storage -PromptConfig(id, version, content, created_at) for versioning and rollback. +## Prompt / Config Storage +PromptConfig table stores versioned prompt templates from GenAI team. +Each draft stores prompt version used → enables rollback and reproducibility. +## Cost & Scalability +MVP: single scheduler + single worker. +v1: horizontally scalable workers, queue partitioning, and delayed job queues. +Use scheduled_time index to avoid full table scans. ---
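The retry-with-exponential-backoff behaviour described for the workers can be sketched as a small delay calculator. The base/cap values and the full-jitter strategy are assumptions, not something the patch specifies.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative backoff calculator: delay grows as base * 2^attempt,
// capped, with full jitter so concurrent workers retrying against the
// same external API (e.g. LinkedIn) don't hammer it in lockstep.
public final class Backoff {
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis * (1L << Math.min(attempt, 20)); // clamp the shift
        long capped = Math.min(exponential, capMillis);
        return ThreadLocalRandom.current().nextLong(capped + 1); // uniform in [0, capped]
    }
}
```

A worker loop would sleep for `delayMillis(attempt, …)` after each transient failure and mark the job FAILED (writing a JobEvent/PostLog row) once the retry budget — max 3 in patch 1 — is exhausted.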