From aa8507b4d5448b6405e9653f1cbe00376eebb5c0 Mon Sep 17 00:00:00 2001 From: Mohmed Husain Date: Sat, 28 Feb 2026 13:44:44 +0530 Subject: [PATCH] All questions ans --- Backend.md | 314 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 309 insertions(+), 5 deletions(-) diff --git a/Backend.md b/Backend.md index 4f7ff501..bb8b28ca 100644 --- a/Backend.md +++ b/Backend.md @@ -26,7 +26,70 @@ **Your Solution for problem 1:** -You need to put your solution here. +**Database & Storage Selection** + +PostgreSQL for all metadata — it gives us ACID transactions (critical for job state integrity), JSONB for flexible metadata fields, strong indexing, and mature migration tooling (Flyway/Alembic). For MVP it scales vertically; for v1 we add read replicas and PgBouncer. + +S3 (or MinIO self-hosted for MVP) for blob storage — videos, transcripts, clips, screenshots, and Summary.md files. Objects are stored under a predictable key layout: +`uploads/{user_id}/{video_id}/original.mp4` and `artifacts/{job_id}/transcript.json|summary.md|clips/|screenshots/`. + +**Schema (tables, key constraints, indexes)** + +- **User** — `id (PK, UUID)`, email (UNIQUE), password_hash, display_name, created_at +- **VideoAsset** — `id (PK)`, `user_id (FK → User, CASCADE)`, original_filename, `storage_key (UNIQUE)`, mime_type, size_bytes, duration_secs, metadata (JSONB), uploaded_at. *Index on user_id.* +- **Job** — `id (PK)`, `video_asset_id (FK → VideoAsset, CASCADE)`, `idempotency_key (UNIQUE)` — prevents duplicate jobs for the same video, status (CHECK: queued/processing/success/failed), stage (CHECK: transcription/summarization/asset_extraction/complete), attempt_count, max_attempts (default 3), locked_by, locked_at, error_message, timestamps. *Index on (status, stage) for worker polling.* +- **JobEvent** — `id (PK)`, `job_id (FK → Job, CASCADE)`, event_type, from_status, to_status, message, metadata (JSONB), created_at. 
*Index on job_id and created_at for audit queries.* +- **Artifact** — `id (PK)`, `job_id (FK → Job, CASCADE)`, artifact_type (CHECK: transcript/summary_md/clip/screenshot), `storage_key (UNIQUE)`, mime_type, size_bytes, created_at. *Index on job_id.* +- **Highlight** — `id (PK)`, `job_id (FK → Job, CASCADE)`, title, description, start_ts_secs, end_ts_secs, `clip_artifact_id (FK → Artifact, SET NULL)`, `screenshot_artifact_id (FK → Artifact, SET NULL)`, sort_order. *Index on (job_id, sort_order).* + +*Migration order:* User → VideoAsset → Job → JobEvent → Artifact → Highlight (follows FK dependencies). + +**Job Lifecycle & Reliability** + +The job follows: **queued → processing → success/failed**. Internally, the job tracks the current `stage` (transcription → summarization → asset_extraction → complete). Each stage completion publishes a message for the next stage, so if a failure happens mid-pipeline, retry resumes from the failed stage — not from scratch. + +- **Idempotency:** The `idempotency_key` (hash of user_id + video storage_key) prevents duplicate job creation at the API level. +- **Worker locking:** Workers claim jobs using `SELECT ... FOR UPDATE SKIP LOCKED` — no double-processing. +- **Retries:** On transient failure, the worker increments attempt_count, logs a JobEvent, and re-enqueues with exponential backoff. After max_attempts (3), the job moves to `failed`. +- **Stale lock recovery:** A cron reaper reclaims jobs where locked_at is older than 10 minutes. +- **Observability:** Every state transition is recorded in JobEvent with timestamps and worker IDs. We log correlation_ids through the queue for end-to-end tracing. + +**Security** + +- All S3 buckets are private. Uploads and downloads use **pre-signed URLs** (15-min expiry) so the API never proxies large files. +- Storage keys are server-generated UUIDs — users never control path components (prevents path traversal). 
+- API authenticates via JWT; every request checks `user_id` ownership before returning data. +- Content-Disposition header is set on downloads to force the original filename. + +**Cost & Scalability (MVP → v1)** + +- *MVP:* Single API instance, 2–3 workers, one Postgres, MinIO. Handles tens of concurrent jobs. +- *v1:* Horizontal worker autoscaling based on queue depth, Postgres read replicas, move to managed S3, add CDN (CloudFront) for frequently accessed artifacts. + +**System Design** + +``` +┌──────────┐ ┌──────────────┐ ┌───────────────────┐ +│ Client │──────▶│ API Server │──────▶│ Message Queue │ +│ │◀──────│ (REST) │ │ (Redis/RabbitMQ) │ +└──────────┘ └──────┬───────┘ └────────┬──────────┘ + │ │ + ▼ ▼ + ┌──────────────┐ ┌──────────────────┐ + │ PostgreSQL │ │ Worker Pool │ + │ (metadata) │ │ • Transcription │ + └──────────────┘ │ • Summarization │ + │ │ • Asset Extractor│ + ┌──────────────┐ └────────┬─────────┘ + │ S3 / MinIO │ │ + │ (blob store) │◀───────────────┘ + └──────────────┘ ┌──────────────────┐ + │ External AI │ + │ (Whisper, GPT) │ + └──────────────────┘ +``` + +**Flow:** Client uploads video via pre-signed URL → API creates Job (queued) → Queue dispatches to Transcription Worker → worker calls Whisper API, stores transcript in S3, advances stage → Summarization Worker calls LLM, produces Summary.md + Highlights → Asset Extraction Worker uses ffmpeg to cut clips/screenshots at highlight timestamps → Job marked success → Client polls status and downloads artifacts via pre-signed URLs. --- @@ -44,7 +107,70 @@ You need to put your solution here. **Your Solution for problem 2:** -You need to put your solution here. +**Database & Storage** + +PostgreSQL — relational integrity between users, accounts, personas, drafts, and schedules is essential. Token encryption is done at the application level before storage. No blob storage needed here (text-only data). 
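One flow worth pinning down early is the signed OAuth `state` check called out under Security. A minimal stdlib sketch of a signed, short-lived state value (the secret sourcing, TTL, and helper names are illustrative assumptions, not a prescribed implementation):

```python
import hashlib
import hmac
import secrets
import time

# Assumption: in production this secret comes from the secrets manager, never code.
STATE_SECRET = b"illustrative-server-side-secret"

def make_state(ttl_secs: int = 300) -> str:
    """Build a signed, short-lived OAuth state value: nonce.expiry.signature."""
    nonce = secrets.token_urlsafe(16)       # token_urlsafe never contains "."
    expires = str(int(time.time()) + ttl_secs)
    payload = f"{nonce}.{expires}"
    sig = hmac.new(STATE_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_state(state: str) -> bool:
    """Reject tampered or expired state values in constant time."""
    try:
        nonce, expires, sig = state.rsplit(".", 2)
    except ValueError:
        return False
    expected = hmac.new(
        STATE_SECRET, f"{nonce}.{expires}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expires) >= time.time()
```

Because the value is self-validating, the API server needs no session storage for the OAuth round trip.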
+ +**Schema** + +- **User** — `id (PK, UUID)`, email (UNIQUE), password_hash, display_name, created_at +- **LinkedInAccount** — `id (PK)`, `user_id (FK → User, CASCADE)`, linkedin_user_id, `access_token_enc (BYTEA)` — AES-256-GCM encrypted, refresh_token_enc (BYTEA), token_expires_at, scopes, is_active, timestamps. *UNIQUE(user_id, linkedin_user_id).* +- **Persona** — `id (PK)`, `user_id (FK → User, CASCADE)`, name, background, tone, language_style, dos, donts, extra_config (JSONB), is_default, timestamps. *Index on user_id.* +- **Draft** — `id (PK)`, `user_id (FK)`, `persona_id (FK → Persona, CASCADE)`, topic, context, style_variant (concise_insight / story_based / actionable_checklist), content (TEXT), status (CHECK: generated/approved/rejected/posted/failed), prompt_version_id (FK), generation_meta (JSONB), timestamps. *Index on (user_id, status).* +- **Schedule** — `id (PK)`, `draft_id (FK → Draft, CASCADE, UNIQUE)` — a draft can only be scheduled once (dedupe), `linkedin_account_id (FK)`, scheduled_at, timezone, status (CHECK: pending/enqueued/posted/failed/cancelled). *Partial index on (status, scheduled_at) WHERE status = 'pending' for scheduler queries.* +- **PostAttempt** — `id (PK)`, `schedule_id (FK → Schedule, CASCADE)`, attempt_number, linkedin_post_id, http_status, response_body, status (success/failed/rate_limited), attempted_at. *UNIQUE(schedule_id, attempt_number).* +- **PromptVersion** — `id (PK)`, name, version, prompt_template (TEXT), model_config (JSONB), is_active, changelog, created_at. *UNIQUE(name, version).* *Partial index on (name, is_active) WHERE is_active = true.* + +*Migration order:* User → LinkedInAccount → Persona → PromptVersion → Draft → Schedule → PostAttempt. + +**Security** + +- LinkedIn tokens are encrypted with **AES-256-GCM** before DB storage. The encryption key lives in a secrets manager (AWS KMS / Vault) — never in code. +- OAuth flow uses the `state` parameter (signed, short-lived) to prevent CSRF. 
+- Only least-privilege LinkedIn scopes are requested: `w_member_social` (post) and `r_liteprofile` (name). +- All queries enforce `WHERE user_id = :current_user` — users can only access their own accounts, personas, and drafts. + +**Reliability** + +- **Retry posting:** On 5xx or network timeout, retry up to 3 times with exponential backoff. Permanent errors (401/403) are not retried — user is notified to re-authorize. +- **Double-post prevention:** `draft_id UNIQUE` on Schedule means one draft = one schedule only. Post Worker checks for an existing `success` PostAttempt before calling LinkedIn. Scheduler uses `FOR UPDATE SKIP LOCKED` so two workers never pick the same job. +- **Rate limiting:** Worker respects LinkedIn's 429 `Retry-After` header. API also limits users to max 5 posts/day and 10 draft generations/hour. +- **Token refresh:** A background cron refreshes tokens 7 days before expiry. If refresh fails, `is_active` is set to false and the user is notified. + +**Prompt/Config Storage Proposal** + +Hybrid approach: the GenAI team maintains prompts in a **Git repo** (source of truth, reviewed via PRs). On merge, a CI step upserts the new version into the **PromptVersion DB table** and marks it active. Workers read the active version from the DB at runtime. Rollback is instant: flip `is_active` to a previous version via an admin API. Every Draft stores `prompt_version_id` so we can trace which prompt produced which output. + +**Cost & Scalability (MVP → v1)** + +- *MVP:* One API server, one scheduler cron (runs every minute), one post worker. Handles hundreds of users. +- *v1:* Separate draft-generation workers (slow, I/O-bound external LLM calls) from lightweight post workers. Queue-based autoscaling. Rate-limit-aware scheduling with priority lanes.
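The double-post guard described under Reliability (`draft_id UNIQUE` on Schedule) can be exercised end-to-end. A minimal sketch using sqlite3 as a stand-in for PostgreSQL (table shape abbreviated to the relevant columns):

```python
import sqlite3

# Stand-in for Postgres: the UNIQUE(draft_id) constraint from the Schedule table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE schedule (
        id INTEGER PRIMARY KEY,
        draft_id TEXT NOT NULL UNIQUE,  -- one draft = one schedule
        scheduled_at TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'
    )
""")

conn.execute("INSERT INTO schedule (draft_id, scheduled_at) VALUES (?, ?)",
             ("draft-123", "2026-03-01T09:00:00Z"))

# A second attempt to schedule the same draft fails at the database level,
# even if an application-level check is bypassed or raced.
try:
    conn.execute("INSERT INTO schedule (draft_id, scheduled_at) VALUES (?, ?)",
                 ("draft-123", "2026-03-02T09:00:00Z"))
    duplicated = True
except sqlite3.IntegrityError:
    duplicated = False

print(duplicated)  # → False
```

The point of pushing the rule into a constraint is that it holds under concurrency: two racing workers cannot both insert, regardless of application logic.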
+ +**System Design** + +``` +┌──────────┐ ┌──────────────────┐ ┌──────────────┐ +│ Frontend │─────▶│ API Server │─────▶│ PostgreSQL │ +│ │◀─────│ (REST + OAuth) │ └──────────────┘ +└──────────┘ └───────┬──────────┘ + │ │ + │ OAuth redirect │ enqueue + ▼ ▼ +┌──────────────┐ ┌─────────────────────┐ +│ LinkedIn │ │ Task Queue │ +│ OAuth 2.0 │ │ (Redis + BullMQ) │ +└──────────────┘ └──────┬──────────────┘ + │ + ┌──────────┼───────────┐ + ▼ ▼ ▼ + ┌───────────┐ ┌─────────┐ ┌───────────┐ + │ Draft Gen │ │Scheduler│ │Post Worker│ + │ Worker │ │ (cron) │ │(publishes │ + │(calls AI) │ │ │ │to LinkedIn│ + └───────────┘ └─────────┘ └───────────┘ +``` + +**Flow:** User connects LinkedIn via OAuth → tokens encrypted and stored → User sets up Persona → Requests drafts → Draft Gen Worker calls GenAI with persona + topic + active PromptVersion, returns 3 variants → User approves one, schedules it → Scheduler cron picks up due schedules, enqueues publish jobs → Post Worker decrypts token, posts to LinkedIn API, logs result in PostAttempt. --- @@ -62,7 +188,75 @@ You need to put your solution here. **Your Solution for problem 3:** -You need to put your solution here. +**Database & Storage** + +PostgreSQL for metadata (templates, fields, bulk run tracking, per-row status). S3/MinIO for blob storage — uploaded DOCX templates, CSV inputs, generated documents, and ZIP bundles. + +Storage layout: `templates/{user_id}/{template_id}/v{N}/template.docx`, `inputs/{bulk_run_id}/input.csv`, `outputs/{bulk_run_id}/rows/001_Name.pdf`, `outputs/{bulk_run_id}/bundle.zip`. + +**Cleanup policy:** CSV inputs deleted after 7 days, generated docs after 30 days, ZIPs after 14 days (large, expensive). Automated via S3 lifecycle rules. + +**Schema** + +- **Template** — `id (PK)`, `user_id (FK → User, CASCADE)`, name, description, current_version, timestamps. 
*Index on user_id.* +- **TemplateVersion** — `id (PK)`, `template_id (FK → Template, CASCADE)`, version, `storage_key (UNIQUE)`, original_filename, file_hash (SHA-256), created_at. *UNIQUE(template_id, version).* +- **TemplateField** — `id (PK)`, `template_version_id (FK → TemplateVersion, CASCADE)`, field_key, field_type (CHECK: text/number/date/currency/boolean/optional_block), is_required, default_value, sort_order. *UNIQUE(template_version_id, field_key).* +- **BulkRun** — `id (PK)`, `template_version_id (FK)`, `user_id (FK)`, input_storage_key, total_rows, processed_rows, succeeded_rows, failed_rows, status (CHECK: pending/processing/completed/failed/cancelled), zip_storage_key, report_storage_key, timestamps. *Index on (user_id) and (status).* +- **BulkRow** — `id (PK)`, `bulk_run_id (FK → BulkRun, CASCADE)`, row_number, input_data (JSONB), status (CHECK: pending/processing/success/failed/skipped), error_message, output_storage_key, attempt_count. *UNIQUE(bulk_run_id, row_number). Index on (bulk_run_id, status).* +- **Artifact** — `id (PK)`, `user_id (FK)`, source_type (single_fill/bulk_row/bulk_zip), source_id, `storage_key (UNIQUE)`, filename, mime_type, size_bytes, created_at. +- **JobEvent** — `id (PK)`, entity_type, entity_id, event_type, message, metadata (JSONB), created_at. *Index on (entity_type, entity_id).* + +*Migration order:* User → Template → TemplateVersion → TemplateField → BulkRun → BulkRow → Artifact → JobEvent. + +**Reliability** + +- **Partial success:** Each row is processed independently. Failed rows don't stop the bulk run. Final status shows `succeeded_rows` vs `failed_rows` counts. +- **Per-row status:** Every BulkRow tracks its own status and error_message. The generation report (CSV) lists each row's outcome. +- **Retries:** Transient errors (PDF conversion timeout) retry up to 3 times. Validation errors (missing field) fail immediately — no retry. 
+- **Resumable runs:** If the worker crashes, it restarts by querying `WHERE status IN ('pending','processing')` on that BulkRun and resumes. Already-succeeded rows are skipped. +- **Idempotency:** `UNIQUE(bulk_run_id, row_number)` forces a retried row to upsert in place rather than insert again, so reprocessing never produces duplicate documents. +- **Concurrency:** Worker processes rows in batches of 10 to avoid overloading the PDF renderer. + +**Security** + +- All queries scoped to `user_id`. Templates and outputs are isolated per user via S3 key prefixes (`{user_id}/...`). +- Downloads via pre-signed S3 URLs (15-min expiry), Content-Disposition forces download. +- Storage keys are server-generated UUIDs — no user-controlled paths (anti-path-traversal). +- DOCX uploads are scanned for macros/VBA and rejected if found. Field values are escaped to prevent XML injection. +- Max file sizes enforced: DOCX 20MB, CSV 50MB. + +**Cost & Scalability (MVP → v1)** + +- *MVP:* Single API + one bulk worker. PDF conversion via LibreOffice headless (or Gotenberg). Handles hundreds of rows per run. +- *v1:* Pool of bulk workers with concurrency-limited row processing. Move PDF conversion to a dedicated microservice for better resource isolation. Add progress webhooks for real-time UI updates.
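The per-row status model is what makes both partial success and crash resume work. A minimal in-memory sketch, with table rows as dicts and `render_pdf` as a hypothetical fill-and-convert stage:

```python
# In-memory stand-in for the BulkRow table: per-row status drives both
# partial success and crash resume. render_pdf is a hypothetical stage.

def render_pdf(row):
    if "name" not in row["input_data"]:
        raise ValueError("missing required field: name")  # validation error: no retry
    return f"outputs/row_{row['row_number']:03d}.pdf"

def process_run(rows):
    """Process only rows not already succeeded, so a restarted worker resumes."""
    for row in rows:
        if row["status"] == "success":
            continue  # resume: skip work completed before the crash
        try:
            row["output_storage_key"] = render_pdf(row)
            row["status"] = "success"
        except ValueError as exc:
            row["status"] = "failed"  # a bad row never stops the run
            row["error_message"] = str(exc)
    return {
        "succeeded": sum(r["status"] == "success" for r in rows),
        "failed": sum(r["status"] == "failed" for r in rows),
    }

rows = [
    {"row_number": 1, "input_data": {"name": "Alice"}, "status": "success"},  # done pre-crash
    {"row_number": 2, "input_data": {"name": "Bob"}, "status": "pending"},
    {"row_number": 3, "input_data": {}, "status": "pending"},                 # invalid row
]
print(process_run(rows))  # → {'succeeded': 2, 'failed': 1}
```

Re-running `process_run` on the same rows is harmless: succeeded rows are skipped, which is exactly the resume behaviour described above.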
+ +**System Design** + +``` +┌──────────┐ ┌──────────────────┐ ┌──────────────┐ +│ Frontend │─────▶│ API Server │─────▶│ PostgreSQL │ +│ (form UI,│◀─────│ (REST API) │ └──────────────┘ +│ uploads)│ └───────┬──────────┘ +└──────────┘ │ + ┌──────────┼──────────────┐ + ▼ ▼ ▼ + ┌──────────────┐ ┌────────────┐ ┌──────────────────┐ + │ Template │ │ Export │ │ Bulk Job Worker │ + │ Ingestion │ │ Service │ │ (parallel rows, │ + │ (parse DOCX, │ │ (single │ │ ZIP builder, │ + │ detect │ │ fill + │ │ report gen) │ + │ fields) │ │ PDF) │ └──────────────────┘ + └──────────────┘ └────────────┘ │ + │ │ ▼ + ▼ ▼ ┌──────────────┐ + ┌────────────────────┐ │ PDF Converter│ + │ S3 / MinIO │ │ (LibreOffice │ + │ (templates, CSVs, │ │ headless) │ + │ outputs, ZIPs) │ └──────────────┘ + └────────────────────┘ +``` + +**Flow:** User uploads DOCX → Template Ingestion parses it, detects `{{field}}` placeholders, stores fields in DB → For single fill: user fills form → Export Service fills template + converts to PDF → returns download URL. For bulk fill: user uploads CSV → API creates BulkRun → Bulk Worker processes each row (fill + convert), tracks per-row status → builds ZIP + report → uploads to S3 → user downloads via pre-signed URL. --- @@ -80,7 +274,89 @@ You need to put your solution here. **Your Solution for problem 4:** -You need to put your solution here. +**Database & Storage** + +PostgreSQL for all structured data (characters, relationships, episodes, scenes, render jobs). S3/MinIO for all media assets — character reference images, generated scene images, voice lines, music, and rendered videos. + +Storage layout: `series/{series_id}/characters/{char_id}/reference_v{N}.png`, `series/{series_id}/episodes/{ep_id}/scenes/{scene_num}/scene_image.png|voice_{char}.mp3`, `series/{series_id}/episodes/{ep_id}/render/final.mp4`. 
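To keep every path component server-controlled, the key layout above can be centralized in one helper module. A minimal sketch (the function names are assumptions; the point is that IDs are typed UUIDs, so user-supplied strings and `../` traversal can never reach a key):

```python
from uuid import UUID

# Each builder accepts typed UUIDs, so callers cannot smuggle arbitrary
# path segments into the object key.

def character_ref_key(series_id: UUID, char_id: UUID, version: int) -> str:
    return f"series/{series_id}/characters/{char_id}/reference_v{version}.png"

def scene_image_key(series_id: UUID, ep_id: UUID, scene_num: int) -> str:
    return f"series/{series_id}/episodes/{ep_id}/scenes/{scene_num}/scene_image.png"

def render_key(series_id: UUID, ep_id: UUID) -> str:
    return f"series/{series_id}/episodes/{ep_id}/render/final.mp4"
```

Keeping the layout in one place also makes later changes (e.g. adding a version segment to renders) a single-file edit.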
+ +**Schema** + +- **Character** — `id (PK)`, `user_id (FK)`, series_id, name, personality (TEXT), speaking_style, visual_desc, `ref_image_key` (S3), voice_profile (JSONB — voice clone ID, pitch, speed), consistency_embedding (BYTEA — CLIP vector for visual consistency), extra_meta (JSONB), timestamps. *Index on series_id and user_id.* +- **Relationship** — `id (PK)`, series_id, `character_a_id (FK → Character, CASCADE)`, `character_b_id (FK → Character, CASCADE)`, rel_type (friend/rival/parent_child/mentor), description, is_bidirectional. *UNIQUE(character_a_id, character_b_id). CHECK(a ≠ b).* +- **Episode** — `id (PK)`, series_id, `user_id (FK)`, episode_number, title, story_prompt, style (CHECK: comedy/motivational/slice_of_life/drama/action), target_duration_secs (default 300), output_format (16:9/9:16/1:1), language, narration_ratio, status (CHECK: draft/generating/ready/rendering/complete/failed), timestamps. *UNIQUE(series_id, episode_number).* +- **EpisodeCharacter** — (episode_id, character_id) composite PK — tracks which characters appear in each episode. +- **Scene** — `id (PK)`, `episode_id (FK → Episode, CASCADE)`, scene_number, description, dialogue (JSONB array of {character_id, line, emotion}), narration, duration_secs, background_desc, music_cue. *UNIQUE(episode_id, scene_number).* +- **Asset** — `id (PK)`, episode_id (FK), scene_id (FK), character_id (FK), asset_type (CHECK: scene_image/character_image/voice_line/background_music/sound_effect/script_doc), `storage_key (UNIQUE)`, `content_hash` (SHA-256 for dedup), mime_type, size_bytes, generation_meta (JSONB), version. *Index on episode_id and content_hash.* +- **RenderJob** — `id (PK)`, `episode_id (FK → Episode, CASCADE)`, stage (CHECK: script_gen/asset_gen/video_render/complete), status (CHECK: queued/processing/success/failed), attempt_count, max_attempts, locked_by, locked_at, error_message, timestamps. 
*Index on (status, stage).* +- **Artifact** — `id (PK)`, `episode_id (FK)`, artifact_type (CHECK: final_video/script_pdf/storyboard/asset_bundle), `storage_key (UNIQUE)`, mime_type, size_bytes, created_at. + +*Migration order:* User → Character → Relationship → Episode → EpisodeCharacter → Scene → Asset → RenderJob → Artifact. + +**Consistency Data (Character Continuity Across Episodes)** + +This is the core challenge. We persist: +- **Reference image + CLIP embedding** — stored per character. Every image generation call receives the reference image and embedding as conditioning input (via IP-Adapter or similar) to maintain the same face/style. +- **Visual description** — detailed text ("short brown hair, blue hoodie, round glasses") appended to every scene image prompt. +- **Voice profile** — voice clone ID (ElevenLabs) ensures the same voice across all episodes. +- **Personality + speaking style** — injected into the LLM system prompt so dialogue stays in character. +- **Relationship graph** — fed to the LLM so characters interact correctly (mentor gives advice, rivals disagree). +- **Episode history** — short LLM-generated summaries of past episodes stored in DB, fed as context for new episodes to maintain narrative continuity. + +Before generating any episode, the system assembles a "character context pack" (all the above) and passes it to both the LLM and the image/voice generation models. + +**Versioning & Dedup** + +- Character reference images are versioned (v1, v2, ...). The Character row points to the current version. +- Every generated asset stores a `content_hash` (SHA-256). Before writing a new asset, the worker checks for an existing match — if found, it reuses that S3 object. Background music and sound effects benefit most from dedup. + +**Reliability** + +The episodic pipeline runs as sequential stages per episode: **script_gen → asset_gen → video_render → complete**. 
Each stage is a RenderJob with its own state machine (queued → processing → success/failed). Same retry/locking patterns as Problem 1 (SKIP LOCKED, exponential backoff, max 3 attempts, stale lock reaper). Episodes can be processed in parallel across different episodes. + +**Security + Cost Controls** + +- **Quotas:** Per-user limits (e.g., 5 series, 50 episodes/series, 20 characters/series) enforced at the API layer. +- **Rate limits:** Max 10 episode generations/hour per user. External AI calls tracked in a `usage_ledger` table (tokens × cost) with daily/monthly caps. +- **Large assets:** Max reference image 10MB, max rendered video 500MB. Render jobs timeout after 30 minutes. +- **Access control:** All resources scoped by user_id. Downloads via pre-signed URLs. + +**Cost & Scalability (MVP → v1)** + +- *MVP:* Single API, 1–2 workers covering all stages, one Postgres. Good for single-user / small team. +- *v1:* Separate worker pools per stage (script gen is LLM-heavy, asset gen is GPU-heavy, render is CPU-heavy). Queue-based autoscaling. Usage-based billing tied to the cost ledger. + +**System Design** + +``` +┌──────────┐ ┌──────────────────┐ ┌──────────────┐ +│ Frontend │─────▶│ API Server │─────▶│ PostgreSQL │ +│ (series │◀─────│ (REST API) │ └──────────────┘ +│ bible, │ └───────┬──────────┘ +│ episode │ │ +│ editor) │ ┌─────────┼──────────────┐ +└──────────┘ ▼ ▼ ▼ + ┌──────────┐ ┌────────────┐ ┌──────────────┐ + │ Script │ │ Asset Gen │ │ Render │ + │ Generator│ │ Worker │ │ Pipeline │ + │ (LLM → │ │ (image gen,│ │ (compose │ + │ script + │ │ voice gen, │ │ video, │ + │ scenes) │ │ music) │ │ export mp4) │ + └──────────┘ └────────────┘ └──────────────┘ + │ │ + ▼ ▼ + ┌──────────────────────────────────┐ + │ S3 / MinIO (blob store) │ + └──────────────────────────────────┘ + │ + ┌───────────┴────────────┐ + │ External AI Services │ + │ (GPT-4, DALL-E, │ + │ ElevenLabs, etc.) 
│ + └────────────────────────┘ +``` + +**Flow:** User defines characters with reference images and personality in a series bible → Creates an episode with a story prompt and selects characters → System assembles the "character context pack" → Script Generator (LLM) produces script + scene breakdown → Asset Gen Worker generates scene images (using character embeddings for consistency), voice lines (using cloned voices), and music cues → Render Pipeline composites everything into a ~5min video → Artifact stored in S3, episode marked complete. ## Problem 5: Cross-Cutting @@ -94,4 +370,32 @@ Answer briefly for the whole platform: **Your Answer for problem 5:** -You need to put your solution here. +**1. Multi-Tenancy** + +Workspace-level. Every resource belongs to a `workspace` (not directly to a user). A `workspace_member` join table links users to workspaces with a role. This way, for personal use it's one user = one workspace, but teams can share resources (templates, personas, series) without data duplication. All resource tables carry a `workspace_id` FK, and every query filters by `WHERE workspace_id = :current_workspace`. + +**2. AuthZ Model** + +RBAC with four roles: **owner** (full control + billing), **admin** (CRUD all resources, manage members), **member** (CRUD own resources, read shared), **viewer** (read-only). Enforced at two layers: (1) API middleware checks the user's role from `workspace_member` against the endpoint's required permission, (2) PostgreSQL Row-Level Security as a defense-in-depth backup — even a bug in the API can't leak data across workspaces. + +**3. Observability** + +Every job state transition is logged in a JobEvent table with `job_id`, `from_status`, `to_status`, `worker_id`, `timestamp`, and `duration_ms`. Every inbound API request gets a `correlation_id` (UUID) that propagates through queue messages and worker logs, enabling end-to-end tracing. 
Key metrics (Prometheus): `jobs_total{status, stage}`, `job_duration_seconds{stage}`, `queue_depth{queue_name}`, `external_api_latency{service}`, `http_request_duration{method, path}`. Logs are structured JSON shipped to a centralized store (ELK/Loki). + +**4. Data Retention** + +- Uploaded inputs (videos, CSVs, DOCX): deleted 30 days after job completion. +- Generated artifacts (transcripts, PDFs, clips): 60 days, then deleted. +- ZIP bundles: 14 days (large, expensive). +- Job events / audit logs: 90 days hot, archived for 1 year, then deleted. +- LinkedIn tokens: hard-deleted immediately on user disconnect. +- Deleted user accounts: 30-day soft-delete grace period, then hard-purged (GDPR). + +Automated via S3 lifecycle rules + a daily DB cleanup cron. + +**5. Secrets & Compliance** + +- **Token encryption:** AES-256-GCM envelope encryption — a Data Encryption Key (DEK) encrypts the token, the DEK itself is encrypted by a Key Encryption Key (KEK) in AWS KMS / Vault. KEK never leaves the HSM. +- **Key rotation:** DEK rotated every 90 days. A cron re-encrypts active tokens. Old DEKs retained (encrypted) to decrypt legacy data. +- **PII handling:** Emails, names, LinkedIn profiles are classified as PII. On account deletion, all PII is hard-deleted within 30 days. PII is never logged — emails are hashed in logs if needed. GDPR data export supported via an API endpoint. +- **Transport:** All traffic over TLS 1.2+. Internal services use mTLS or private VPC. S3 enforces SecureTransport policy.
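
As a closing illustration of the "PII is never logged" rule: a keyed digest lets log lines about the same user correlate without ever storing the address. A minimal stdlib sketch (key sourcing and the token format are illustrative assumptions):

```python
import hashlib
import hmac

# Assumption: this key would live in the secrets manager alongside the
# DEK/KEK material, not in code, and would rotate on its own schedule.
LOG_PII_KEY = b"illustrative-log-pseudonymization-key"

def pii_token(email: str) -> str:
    """Stable keyed digest: the same email always maps to the same token,
    so log lines can be joined per user, but the raw address never appears
    in logs and cannot be recovered without the key."""
    digest = hmac.new(LOG_PII_KEY, email.strip().lower().encode(), hashlib.sha256)
    return "pii_" + digest.hexdigest()[:16]
```

Using HMAC rather than a bare hash matters: without the key, an attacker with log access could confirm a guessed email by hashing it themselves.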