GitHub - 25Devmaker/llama4-image-caption-generator: When user uploads an image it generates an prompt for the user to create similar images

AI Image → Prompt Generator (Groq + Llama-4)

Convert any image into a high-quality AI generation prompt using the Groq API and Meta Llama-4 Scout. This tool analyzes an uploaded image and produces a structured prompt optimized for Stable Diffusion, Midjourney, and other generative models. Built with Python, Gradio, and Groq's ultra-fast inference platform.

🎯 Features

Upload Images: Support for JPG, PNG, GIF, and WebP formats
AI-Powered Analysis: Uses Groq's vision-capable LLM (Llama-4-Scout) to analyze images
Prompt Generation: Generates detailed, AI-ready prompts for image generation tools like Stable Diffusion, Midjourney, or DALL-E
Chat Interface: Clean, interactive Gradio-based chat UI
Auto-Generation: Automatically generates prompts when you upload an image
Copy Functionality: Easy copy-to-clipboard for generated prompts

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set Up Your API Key

Option A: Edit the .env file and add your Groq API key:

GROQ_API_KEY="your_actual_groq_api_key_here"

Option B: Enter your API key directly in the UI when you run the application

Get a Groq API Key: Visit https://console.groq.com/ to sign up and get your free API key.

3. Run the Application

Method 1 - Using the test script:

python run_test.py

Method 2 - Run directly:

python imgcap.py

4. Use the Application

Open your browser to http://127.0.0.1:7860
Upload an image (or enter API key if not in .env)
The AI will automatically analyze the image and generate a detailed prompt
Copy the prompt to use in your favorite image generation tool!

How It Works

The application:

Accepts an uploaded image from the user
Encodes the image in base64 format
Sends it to Groq's vision-capable LLM (Llama-4-Scout-17B)
The LLM analyzes the image and generates:
- A ready-to-use comma-separated prompt
- Detailed breakdown (subject, style, lighting, color palette, composition, mood, etc.)

Tech Stack

Python: Main programming language
Gradio: Interactive web UI framework
Groq: Fast LLM inference platform
Llama-4-Scout-17B: Vision-capable language model

Project Structure

.
├── imgcap.py          # Main application file with Gradio UI
├── run_test.py        # Test/launch script
├── requirements.txt   # Python dependencies
├── .env              # Environment variables (API keys)
└── README.md         # This file

💡 Tips

The model automatically generates prompts as soon as you upload an image
Use the "Clear" button to reset the chat and upload a new image
The generated prompts are optimized for AI image generators
You can use the copy button on the chat interface to quickly copy prompts

🔧 Troubleshooting

API Key Error: Make sure your GROQ_API_KEY is set correctly in the .env file or entered in the UI.

Import Error: Run pip install -r requirements.txt to install all dependencies.

Image Upload Error: Ensure your image is in a supported format (JPG, PNG, GIF, WebP).

📄 License

Free to use and modify for personal and commercial projects.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎯 Features

Quick Start

1. Install Dependencies

2. Set Up Your API Key

3. Run the Application

4. Use the Application

How It Works

Tech Stack

Project Structure

💡 Tips

🔧 Troubleshooting

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.env		.env
README.md		README.md
example1.png		example1.png
example2.png		example2.png
imgcap.py		imgcap.py
requirements.txt		requirements.txt
run_test.py		run_test.py

Folders and files

Latest commit

History

Repository files navigation

🎯 Features

Quick Start

1. Install Dependencies

2. Set Up Your API Key

3. Run the Application

4. Use the Application

How It Works

Tech Stack

Project Structure

💡 Tips

🔧 Troubleshooting

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages