From 25b205caf1282193c2498b27a4e0ccbf526a154b Mon Sep 17 00:00:00 2001 From: Dinesh Kumar <154494542+Dineshchoudhary6229@users.noreply.github.com> Date: Thu, 19 Feb 2026 19:34:15 +0530 Subject: [PATCH 1/2] Add backend architecture solutions for all problems Updated the Backend.md document to reflect changes in problem solutions, including system design, schema, and security considerations for various projects. --- Backend.md | 162 ++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 148 insertions(+), 14 deletions(-) diff --git a/Backend.md b/Backend.md index 4f7ff501..69176bc6 100644 --- a/Backend.md +++ b/Backend.md @@ -15,7 +15,7 @@ **Goal:** Upload video → async processing → outputs: transcript, **Summary.md**, highlights (timestamps), screenshot/clip references. [READ MORE ABOUT THE PROJECT](./Video-summary-platform.md) -**Your solution must include** +**My solution includes** * **System design:** API + worker(s) + storage + external AI/transcription boundaries * **DB choice:** Postgres vs other (short justification) @@ -24,9 +24,58 @@ * **Job lifecycle:** queued → processing → success/failed (+ retry/idempotency rule) * **Storage layout:** how files/artifacts are stored + safe download strategy -**Your Solution for problem 1:** +**My Solution for problem 1:** +## System Design +API Service (Spring Boot) handles video upload, job creation, and artifact retrieval. +Worker Service processes jobs asynchronously using a queue (Redis/RabbitMQ). +PostgreSQL stores metadata. +Object storage (S3-compatible) stores videos and generated outputs. +External AI services handle transcription and summarization. + +Supports batch folder processing, chunked upload, and streaming for large files (200MB+, 3–4 hours). + +Flow: +1. Upload video → create VideoAsset + Job (QUEUED) +2. Worker picks job → PROCESSING → calls AI +3. 
Stores transcript, summary, highlights → SUCCESS/FAILED + +## Database Choice +PostgreSQL for strong relational modeling, indexing on job status, and JSONB for AI metadata. + +## Schema (Tables) +User(id, email, created_at) +VideoAsset(id, user_id, storage_path, duration, created_at) +Job(id, video_id, status, retry_count, idempotency_key, created_at) +JobEvent(id, job_id, status, message, created_at) +Artifact(id, job_id, type, storage_path, created_at) +Highlight(id, job_id, timestamp_start, timestamp_end, text) + +## Constraints & Indexes +FK: video_asset.user_id → user.id +FK: job.video_id → video_asset.id +Index on job(status) +Unique job.idempotency_key +Index on artifact.job_id + +## Job Lifecycle +QUEUED → PROCESSING → SUCCESS | FAILED +Max 3 retries with idempotency check. + +## Storage Strategy +/tenant/{userId}/videos/{videoId}.mp4 +/tenant/{userId}/jobs/{jobId}/summary.md + +Secure downloads via pre-signed URLs after auth validation. + +## Security +User ownership validation and private object storage. + +## Reliability +Job state machine, JobEvent logging, retries with backoff. + +## Cost & Scalability +MVP single worker → v1 horizontal workers + S3 + queue partitioning. -You need to put your solution here. --- @@ -34,7 +83,7 @@ You need to put your solution here. **Goal:** Connect LinkedIn → store persona → generate drafts (handled by GenAI team) → approve → schedule → auto-post + audit logs. [READ MORE ABOUT THE PROJECT](./linkedin-automation.md) -**Your solution must include** +**My solution includes** * **System design:** OAuth flow, token storage/refresh, scheduler/worker design * **Schema:** User, LinkedInAccount, Persona, Draft, Schedule, PostAttempt/PostLog @@ -42,9 +91,32 @@ You need to put your solution here. 
* **Reliability:** retry posting, dedupe to prevent double-post, rate limiting * **Prompt/config storage proposal:** how backend stores “prompt versions” or “config packs” provided by GenAI team (DB vs repo vs hybrid, versioning + rollback) -**Your Solution for problem 2:** +**My Solution for problem 2:** + +## System Design +OAuth connection to LinkedIn → store encrypted tokens. +Persona stored per user → Draft generation → Schedule → Worker auto-post → PostLog for audit. + +## Schema +User(id, email) +LinkedInAccount(id, user_id, access_token_enc, refresh_token_enc, expires_at) +Persona(id, user_id, tone, topics) +Draft(id, user_id, content, status) +Schedule(id, draft_id, scheduled_time, status) +PostLog(id, schedule_id, status, response, created_at) + +## Security +AES-256 encrypted tokens, least-privilege scopes, per-user access control. + +## Reliability +Dedupe key = hash(content + scheduled_time) +Retry with exponential backoff +Rate limiting per user +Automatic token refresh via worker + +## Prompt Storage +PromptConfig(id, version, content, created_at) for versioning and rollback. -You need to put your solution here. --- @@ -52,7 +124,7 @@ You need to put your solution here. **Goal:** Upload DOCX template → detect fields → single fill export → bulk fill via CSV/Sheet → ZIP + per-row report. [READ MORE ABOUT THE PROJECT](./docs-template-output-generation.md) -**Your solution must include** +**My solution includes** * **System design:** template ingestion, field extraction service, bulk job worker, export service * **Schema:** Template, TemplateVersion, TemplateField, BulkRun, BulkRow, Artifact, JobEvent @@ -60,9 +132,31 @@ You need to put your solution here. 
* **Reliability:** partial success handling, per-row status, retries, resumable bulk run * **Security:** template isolation per tenant/user, safe downloads, anti-path traversal -**Your Solution for problem 3:** +**My Solution for problem 3:** + +## System Design +Template upload → extract fields → store TemplateVersion + TemplateFields. +CSV upload → BulkRun → Worker processes rows → generate DOCX/PDF → ZIP export. + +## Schema +Template(id, user_id, name) +TemplateVersion(id, template_id, storage_path, created_at) +TemplateField(id, template_version_id, field_name) +BulkRun(id, template_version_id, status, total_rows, processed_rows) +BulkRow(id, bulk_run_id, status, output_artifact_id) +Artifact(id, storage_path, type) +JobEvent(id, bulk_run_id, message) + +## Storage Strategy +CSV input, generated docs, and ZIP stored in object storage. +Temporary files deleted after ZIP creation. + +## Reliability +Per-row status, resumable using processed_rows, partial success supported. + +## Security +Tenant isolation via user_id, signed URLs, path traversal prevention. -You need to put your solution here. --- @@ -70,7 +164,7 @@ You need to put your solution here. **Goal:** Define characters once (image + traits + relationships). For each episode story → output episode package (script/scenes/assets plan/render plan), optionally render. [READ MORE ABOUT THE PROJECT](./char-based-video-generation.md) -**Your solution must include** +**My solution includes** * **System design:** episodic pipeline as jobs, asset management, consistency strategy storage * **Schema:** Character, Relationship, Episode, Scene, Asset, RenderJob, Artifact @@ -78,9 +172,30 @@ You need to put your solution here. * **Storage:** images/audio/video assets, versioning, dedupe strategy * **Security + cost controls:** quotas, rate limits, large asset constraints -**Your Solution for problem 4:** +**My Solution for problem 4:** + +## System Design +Characters defined once with traits and assets. 
+Episode pipeline: Episode → scenes → assets → render jobs processed asynchronously. + +## Schema +Character(id, user_id, name, traits, voice_id, appearance_ref) +Relationship(id, character_a_id, character_b_id, type) +Episode(id, user_id, title, character_snapshot_version) +Scene(id, episode_id, script_text) +Asset(id, scene_id, type, storage_path, hash) +RenderJob(id, episode_id, status, retry_count) +Artifact(id, render_job_id, storage_path) + +## Consistency +Character snapshot version stored per episode to maintain continuity. + +## Storage +Versioned assets with hash-based deduplication and per-user quotas. + +## Security & Cost +Rate limits on render jobs, file size limits, per-user storage quotas. -You need to put your solution here. ## Problem 5: Cross-Cutting @@ -92,6 +207,25 @@ Answer briefly for the whole platform: 4. **Data retention:** what to delete and when (inputs, artifacts, logs) 5. **Secrets & compliance:** token encryption, key management approach, PII handling -**Your Answer for problem 5:** +**My Answer for problem 5:** + +## Multi-Tenancy +User-level tenancy with user_id present in all tables. + +## AuthZ Model +RBAC (USER, ADMIN) enforced at API layer and query filters. + +## Observability +Logs include job_id, correlation_id, and status transitions. +Metrics: job latency, failure rate, queue depth. + +## Data Retention +Raw inputs: 30 days +Logs: 14 days +Artifacts: user-controlled deletion. + +## Secrets & Compliance +Tokens encrypted with AES-256. +Keys stored in environment/secret manager. +Minimal PII storage. -You need to put your solution here. 
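The token handling described in the Secrets & Compliance answer above (AES-256 encryption at rest, keys from an environment/secret manager) can be sketched as follows. This is a minimal illustration, not code from the patch: the `TokenCrypto` class name and method shapes are assumptions, and a real deployment would load the key from the secret manager rather than hold it in process.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch of encrypting access/refresh tokens at rest
// with AES-256-GCM (authenticated encryption, random IV per token).
public final class TokenCrypto {
    private static final int IV_LEN = 12;    // 96-bit IV, the recommended size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    private final SecretKey key;             // 256-bit key, loaded from the secret manager
    private final SecureRandom random = new SecureRandom();

    public TokenCrypto(SecretKey key) { this.key = key; }

    // Encrypts a token; the random IV is prepended to the ciphertext.
    public String encrypt(String token) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    public String decrypt(String encoded) throws Exception {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(in, 0, IV_LEN)));
        byte[] pt = cipher.doFinal(Arrays.copyOfRange(in, IV_LEN, in.length));
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

GCM is chosen here over plain CBC because tampering with a stored ciphertext then fails authentication on decrypt, which matters for tokens the worker later replays against an external API.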
From 23b139c3cae1bec0a460d393b1219c1653ebe2b3 Mon Sep 17 00:00:00 2001 From: Dinesh Kumar <154494542+Dineshchoudhary6229@users.noreply.github.com> Date: Thu, 19 Feb 2026 19:45:10 +0530 Subject: [PATCH 2/2] Update Backend.md --- Backend.md | 113 ++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 98 insertions(+), 15 deletions(-) diff --git a/Backend.md b/Backend.md index 69176bc6..14bd2faf 100644 --- a/Backend.md +++ b/Backend.md @@ -94,29 +94,112 @@ MVP single worker → v1 horizontal workers + S3 + queue partitioning. **My Solution for problem 2:** ## System Design -OAuth connection to LinkedIn → store encrypted tokens. -Persona stored per user → Draft generation → Schedule → Worker auto-post → PostLog for audit. +User connects LinkedIn via OAuth → backend stores encrypted access + refresh tokens. +User creates Persona (tone, topics, do/don’t rules). +GenAI service generates 3 drafts → stored as Draft records. +User approves one draft → creates Schedule entry. +Scheduler service polls due schedules → sends job to worker. +Worker posts to LinkedIn API → stores result in PostLog. + +Components: +- API Service (Spring Boot): OAuth, persona CRUD, draft approval, scheduling +- Scheduler (cron/queue based): finds due posts using scheduled_time index +- Worker Service: posts to LinkedIn, handles retries, token refresh +- PostgreSQL: metadata storage +- Redis/Queue: async posting jobs + +Flow: +1. OAuth connect → store encrypted tokens +2. Create persona → request draft generation (GenAI boundary) +3. Save drafts → user approves one +4. Create Schedule (PENDING) +5. Scheduler → enqueue job when scheduled_time reached +6. 
Worker → post to LinkedIn → update status POSTED/FAILED → write PostLog ## Schema -User(id, email) -LinkedInAccount(id, user_id, access_token_enc, refresh_token_enc, expires_at) -Persona(id, user_id, tone, topics) -Draft(id, user_id, content, status) -Schedule(id, draft_id, scheduled_time, status) -PostLog(id, schedule_id, status, response, created_at) +User(id, email) + +LinkedInAccount( + id, + user_id, + access_token_enc, + refresh_token_enc, + expires_at, + created_at +) + +Persona( + id, + user_id, + tone, + topics, + do_dont_rules, + created_at +) + +Draft( + id, + user_id, + persona_id, + content, + status, -- DRAFT | APPROVED | REJECTED + created_at +) + +Schedule( + id, + draft_id, + scheduled_time, + status, -- PENDING | POSTED | FAILED + dedupe_key, + created_at +) + +PostLog( + id, + schedule_id, + status, + linkedin_post_id, + response, + created_at +) + +PromptConfig( + id, + version, + content, + created_at +) + +## Constraints & Indexes +Unique index on LinkedInAccount.user_id +Index on Schedule.scheduled_time for scheduler polling +Unique index on Schedule.dedupe_key to prevent double posting +FK: Draft.persona_id → Persona.id +FK: Schedule.draft_id → Draft.id +FK: PostLog.schedule_id → Schedule.id ## Security -AES-256 encrypted tokens, least-privilege scopes, per-user access control. +OAuth with least-privilege scopes. +Access and refresh tokens encrypted using AES-256. +Tokens decrypted only inside worker at posting time. +User ownership validation on persona, drafts, and schedules. ## Reliability -Dedupe key = hash(content + scheduled_time) -Retry with exponential backoff -Rate limiting per user -Automatic token refresh via worker +Dedupe key = hash(draft_id + scheduled_time) to avoid double posting. +Retry with exponential backoff for transient LinkedIn failures. +Automatic token refresh using refresh_token before expiry. +Rate limiting per user to respect LinkedIn API limits. +PostLog keeps full audit trail. 
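The dedupe rule above — `hash(draft_id + scheduled_time)` stored under the unique index on `Schedule.dedupe_key` — could be derived like this. A sketch only: the class name, the `|` delimiter, and hashing to a hex string are assumptions; the double-post guarantee comes from the database rejecting a second insert with the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;

// Illustrative derivation of Schedule.dedupe_key: a deterministic
// SHA-256 over the draft id and scheduled time, so re-enqueuing the
// same approved draft for the same slot collides on the unique index.
public final class DedupeKey {
    public static String of(long draftId, Instant scheduledTime) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest((draftId + "|" + scheduledTime.getEpochSecond())
                        .getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString(); // 64 hex chars, stored under the unique index
    }
}
```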
-## Prompt Storage -PromptConfig(id, version, content, created_at) for versioning and rollback. +## Prompt / Config Storage +PromptConfig table stores versioned prompt templates from GenAI team. +Each draft stores prompt version used → enables rollback and reproducibility. +## Cost & Scalability +MVP: single scheduler + single worker. +v1: horizontally scalable workers, queue partitioning, and delayed job queues. +Use scheduled_time index to avoid full table scans. ---
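The retry-with-exponential-backoff behaviour described for the workers can be sketched as a small delay calculator. The base/cap values and the full-jitter strategy are assumptions, not something the patch specifies.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative backoff calculator: delay grows as base * 2^attempt,
// capped, with full jitter so concurrent workers retrying against the
// same external API (e.g. LinkedIn) don't hammer it in lockstep.
public final class Backoff {
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis * (1L << Math.min(attempt, 20)); // clamp the shift
        long capped = Math.min(exponential, capMillis);
        return ThreadLocalRandom.current().nextLong(capped + 1); // uniform in [0, capped]
    }
}
```

A worker loop would sleep for `delayMillis(attempt, …)` after each transient failure and mark the job FAILED (writing a JobEvent/PostLog row) once the retry budget — max 3 in patch 1 — is exhausted.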