Scene-to-Image Generation Platform

An interactive web application that retrieves scene context from markdown files, generates a natural language description and a Stable Diffusion prompt using an LLM (DeepSeek), and then produces an AI-generated image with Stable Diffusion (SD1.5).

This project demonstrates the integration of RAG, LLMs, and generative image models in a full-stack environment, with attention to UX design and extensibility.

Quick Start (Running Locally)

Backend

cd backend
pip install -r requirements.txt
uvicorn main:app --log-level debug

Frontend

cd frontend
npm install
npm run dev

Features

Query a scene in natural language.
Retrieve relevant context from markdown-based scene files (RAG).
Generate two-part LLM output:
1. Scene description (narrative text).
2. Stable Diffusion prompt (structured for SD).
Render the scene with SD1.5, optionally with custom checkpoints and LoRAs.
Handle nonsensical or inappropriate requests with safe defaults.
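The two-part LLM output and the safe-default behavior above could be handled by a small parser. This is a minimal sketch under assumptions, not the project's code. It supposes the LLM is instructed to label its two parts with hypothetical `DESCRIPTION:` and `SD_PROMPT:` markers. When either part is missing or empty (for example, a refusal or a nonsensical answer), it falls back to a hypothetical neutral default.

```python
from dataclasses import dataclass

# Hypothetical labels the LLM is assumed to be instructed to emit.
DESCRIPTION_LABEL = "DESCRIPTION:"
PROMPT_LABEL = "SD_PROMPT:"

# Hypothetical safe defaults used when the output cannot be parsed.
SAFE_DEFAULT_DESCRIPTION = "The scene is shrouded in mist; nothing can be made out."
SAFE_DEFAULT_PROMPT = "misty empty room, soft light, fantasy illustration"


@dataclass
class SceneOutput:
    description: str
    sd_prompt: str
    used_default: bool = False


def parse_llm_output(text: str) -> SceneOutput:
    """Split a two-part LLM response into a description and an SD prompt."""
    desc_start = text.find(DESCRIPTION_LABEL)
    prompt_start = text.find(PROMPT_LABEL)
    # Both labels must be present, description first, for the output to be usable.
    if desc_start == -1 or prompt_start == -1 or prompt_start < desc_start:
        return SceneOutput(SAFE_DEFAULT_DESCRIPTION, SAFE_DEFAULT_PROMPT, True)
    description = text[desc_start + len(DESCRIPTION_LABEL):prompt_start].strip()
    sd_prompt = text[prompt_start + len(PROMPT_LABEL):].strip()
    # An empty section is treated like a refusal or a nonsense answer.
    if not description or not sd_prompt:
        return SceneOutput(SAFE_DEFAULT_DESCRIPTION, SAFE_DEFAULT_PROMPT, True)
    return SceneOutput(description, sd_prompt)


if __name__ == "__main__":
    out = parse_llm_output(
        "DESCRIPTION: A crowded tavern at dusk.\nSD_PROMPT: tavern interior, warm candlelight"
    )
    print(out.description)  # A crowded tavern at dusk.
    print(out.sd_prompt)    # tavern interior, warm candlelight
```

Returning a default rather than raising an error keeps the frontend's rendering path the same for valid and invalid requests. The `used_default` flag still lets the UI signal that the input was rejected.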

Tech Stack

Frontend: Next.js (React, TypeScript, Tailwind CSS).
Backend: FastAPI.
Retrieval: Custom RAG pipeline over local markdown scene files.
LLM: DeepSeek API.
Image Generation: Diffusers (SD1.5).
Checkpoints / LoRAs: Models from CivitAI (see below).
Infrastructure: DirectML for local inference; a move to cloud infrastructure is planned.
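The Future Development section notes that retrieval currently relies on keyword search. Below is a minimal sketch of keyword retrieval over markdown scene files. It is not the project's actual pipeline. The following are all hypothetical choices made for illustration: splitting files into one chunk per `#` heading, scoring by the number of shared words, and the function names.

```python
import re
from pathlib import Path

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words; a deliberately simple tokenizer."""
    return set(WORD_RE.findall(text.lower()))


def chunk_markdown(text: str) -> list[str]:
    """Split a markdown document into chunks, one per heading section."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A new heading closes the previous chunk and starts a new one.
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def keyword_retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return up to k chunks ranked by how many query words they contain."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(c)), c) for c in chunks]
    # Drop chunks with no overlap; the sort is stable, so ties keep file order.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [c for _, c in scored[:k]]


def load_scene_chunks(scene_dir: str) -> list[str]:
    """Read every .md file in scene_dir (e.g. backend/scenes) and chunk it."""
    chunks: list[str] = []
    for path in sorted(Path(scene_dir).glob("*.md")):
        chunks.extend(chunk_markdown(path.read_text(encoding="utf-8")))
    return chunks
```

The retrieved chunks would then be inserted into the LLM prompt as context. Word overlap ignores synonyms, which is the weakness that embedding-based retrieval is meant to address.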

Stable Diffusion 1.5 Setup

Checkpoint: https://civitai.com/models/4384/dreamshaper/

Sampler: Euler a (other samplers can be selected in code)

LoRA Used: https://civitai.com/models/256907/tavern-scenes

Typical Parameters:

Steps: 55
CFG Scale: 7
Resolution: 512x512
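The typical parameters above map onto keyword arguments of the Diffusers `StableDiffusionPipeline` call. The sketch below keeps them in a small dataclass so they can be tweaked in one place. The class name and the checkpoint and LoRA file paths are hypothetical. The Diffusers calls appear only as comments, so the snippet runs without downloading a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDParams:
    """Typical SD1.5 settings from this README (hypothetical container)."""
    steps: int = 55
    cfg_scale: float = 7.0
    width: int = 512
    height: int = 512

    def to_pipeline_kwargs(self) -> dict:
        # Keys match keyword arguments of StableDiffusionPipeline.__call__.
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.cfg_scale,
            "width": self.width,
            "height": self.height,
        }


# Assumed wiring with Diffusers (not executed here; file paths are hypothetical):
#   from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
#   pipe = StableDiffusionPipeline.from_single_file("models/dreamshaper.safetensors")
#   # "Euler a" corresponds to the Euler ancestral scheduler in Diffusers.
#   pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
#   pipe.load_lora_weights("models/lora", weight_name="tavern_scenes.safetensors")
#   image = pipe(prompt, **SDParams().to_pipeline_kwargs()).images[0]
```

Because the dataclass is frozen, per-request overrides are made with `dataclasses.replace(SDParams(), steps=30)`, which leaves the shared defaults untouched.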

Project Structure

/frontend → Next.js (UI, rendering images + text)

/backend → FastAPI (API routes, RAG pipeline, LLM + SD integration)

/backend/scenes → Markdown scene files used for retrieval

Screenshots

Limitations

Quality inconsistency: Although the generated descriptions and images depict a relevant scene, image quality varies considerably from run to run.

Limited scenes and angles: Even though the scene is a toy example, rendering it consistently from different angles requires more detail than either SD or the LLM can reliably handle. Future work will focus on consistency with this limitation in mind.

Deployment: While the app runs locally, there’s no CI/CD pipeline or production hosting configured.

Model/computation constraints: Only SD1.5 is currently supported because SDXL and Flux are too computationally demanding for the local setup. A future cloud deployment should allow these larger models and higher image quality.

Future Development

Deployment improvements

Migrate backend to GPU hosting.
Containerize backend with Docker for reproducibility.
Add CI/CD pipeline for easier deployment and updates.

Frontend polish

Implement an orb/magic theme.
Implement a loading screen fed by Diffusers callbacks (showing actual step progress, distorted logs, or percentages).
Expand error visuals (orb reactions to invalid inputs).
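A step-progress loading screen could be fed by Diffusers' `callback_on_step_end` hook. Diffusers calls this hook after each denoising step as `callback(pipe, step_index, timestep, callback_kwargs)` and expects it to return `callback_kwargs`. The tracker below is a hypothetical sketch. It assumes one callback per inference step, and it assumes a backend route (not shown) would expose `percent` to the frontend.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressTracker:
    """Records denoising progress so a frontend could poll it (hypothetical)."""
    total_steps: int
    completed: int = 0
    history: list[int] = field(default_factory=list)

    def __call__(self, pipe, step_index: int, timestep, callback_kwargs: dict) -> dict:
        # step_index is 0-based, so step_index + 1 steps have finished.
        self.completed = step_index + 1
        self.history.append(self.percent)
        # Diffusers expects the callback to return callback_kwargs.
        return callback_kwargs

    @property
    def percent(self) -> int:
        if self.total_steps <= 0:
            return 0
        return min(100, round(100 * self.completed / self.total_steps))


# Assumed usage with a loaded pipeline (not executed here):
#   tracker = ProgressTracker(total_steps=55)
#   pipe(prompt, num_inference_steps=55, callback_on_step_end=tracker)
```

Because the callback must return `callback_kwargs` unchanged, the tracker only observes progress and does not alter the latents.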

Model enhancements

Use SDXL or Flux for higher-quality images.
Create a UI to dynamically toggle LoRAs, adjust weights, and layer multiple styles.
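A LoRA control panel would ultimately need to produce the inputs for Diffusers' `set_adapters(adapter_names, adapter_weights=...)`, which activates several loaded LoRAs with per-adapter weights. This requires the PEFT backend. The sketch below converts hypothetical UI rows into those two lists. The `LoraSetting` type and the 0.0 to 1.5 clamp range are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LoraSetting:
    """One row of a hypothetical LoRA control panel."""
    name: str      # adapter_name given when the LoRA was loaded
    enabled: bool
    weight: float


def to_adapter_args(settings: list[LoraSetting],
                    min_weight: float = 0.0,
                    max_weight: float = 1.5) -> tuple[list[str], list[float]]:
    """Turn UI settings into (adapter_names, adapter_weights) lists."""
    names: list[str] = []
    weights: list[float] = []
    for s in settings:
        if not s.enabled:
            continue
        names.append(s.name)
        # Clamp to a hypothetical range chosen for the UI sliders.
        weights.append(max(min_weight, min(max_weight, s.weight)))
    return names, weights


# Assumed Diffusers wiring (not executed here; paths are hypothetical):
#   pipe.load_lora_weights("models/lora", weight_name="tavern_scenes.safetensors",
#                          adapter_name="tavern")
#   names, weights = to_adapter_args(settings)
#   if names:
#       pipe.set_adapters(names, adapter_weights=weights)
#   else:
#       pipe.disable_lora()
```

Keeping the conversion separate from the pipeline makes the UI logic testable without loading any model weights.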

Retrieval and LLM chain

Use a graph to improve retrieval.
Replace keyword search with embedding-based retrieval for richer RAG results.
Feed LLM outputs back into retrieval (backtracking) to further focus and refine the retrieved context.
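Embedding-based retrieval ranks chunks by the cosine similarity between their embedding vectors and the query's vector, rather than by shared keywords. The sketch below shows only the ranking step. The `embed` function is a hypothetical stand-in for whatever embedding model would be chosen. In practice, chunk embeddings would be computed once and stored, not recomputed for every query.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_retrieve(query: str,
                       chunks: list[str],
                       embed: Callable[[str], Vector],
                       k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k (chunk, similarity) pairs for the query."""
    q = embed(query)
    scored = [(c, cosine(q, embed(c))) for c in chunks]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Returning the similarity scores alongside the chunks lets the caller apply a minimum-relevance cutoff. Such a cutoff is one way to detect queries with no matching scene.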

Evaluation & Testing

Add unit tests.
Introduce automated tests for RAG quality (scene relevance).
Add image evaluation pipelines.
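One way to automate RAG-quality (scene relevance) tests is to measure hit rate at k. Hit rate at k is the share of hand-labeled queries whose expected chunk appears among the top k retrieved results. The sketch below is written in pytest style. The labeled cases, the toy substring retriever, and the 0.9 threshold are all hypothetical placeholders for a real labeled set and the project's retriever.

```python
from typing import Callable


def hit_rate_at_k(cases: list[tuple[str, str]],
                  retrieve: Callable[[str, int], list[str]],
                  k: int = 3) -> float:
    """Fraction of (query, expected_chunk) cases whose chunk is in the top k."""
    if not cases:
        return 0.0
    hits = sum(1 for query, expected in cases if expected in retrieve(query, k))
    return hits / len(cases)


def test_scene_retrieval_quality():
    # Hypothetical fixture: two tiny chunks and a trivial substring retriever.
    chunks = ["# Tavern\nA bard plays by the fire.", "# Cellar\nBarrels of ale."]

    def retrieve(query: str, k: int) -> list[str]:
        return [c for c in chunks if query.lower() in c.lower()][:k]

    cases = [("bard", chunks[0]), ("ale", chunks[1])]
    # Hypothetical threshold; a real suite would calibrate it on labeled data.
    assert hit_rate_at_k(cases, retrieve, k=1) >= 0.9
```

Because the metric takes any retriever as a callable, the same labeled cases can compare keyword search with embedding-based retrieval before switching between them.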
