+```
+
+### Flags
+
+Name of the Flash app to delete. Required explicitly for safety.
+
+Unlike other subcommands, `delete` requires the `--app` flag explicitly. This is a safety measure for destructive operations.
+
+### Process
+
+1. Shows app details and resources to be deleted.
+2. Prompts for confirmation (required).
+3. Deletes all environments and their resources.
+4. Deletes all builds.
+5. Deletes the app.
+
+
+
+This operation is irreversible. All environments, builds, endpoints, volumes, and configuration will be permanently deleted.
+
+
+
+---
+
+## App hierarchy
+
+A Flash app contains environments and builds:
+
+```text
+Flash App (my-project)
+│
+├── Environments
+│ ├── dev
+│ │ ├── Endpoints (ep1, ep2)
+│ │ └── Volumes (vol1)
+│ ├── staging
+│ │ ├── Endpoints (ep1, ep2)
+│ │ └── Volumes (vol1)
+│ └── production
+│ ├── Endpoints (ep1, ep2)
+│ └── Volumes (vol1)
+│
+└── Builds
+ ├── build_v1 (2024-01-15)
+ ├── build_v2 (2024-01-18)
+ └── build_v3 (2024-01-20)
+```
+
+## Auto-detection
+
+Flash CLI automatically detects the app name from your current directory:
+
+```bash
+cd /path/to/APP_NAME
+flash deploy # Deploys to 'APP_NAME' app
+flash env list # Lists 'APP_NAME' environments
+```
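The detection rule is simple enough to sketch in a few lines; `detect_app_name` is a hypothetical helper shown only to illustrate the behavior, not part of the Flash CLI:

```python
from pathlib import Path

# Sketch of the rule above: the app name defaults to the basename of
# the working directory. Illustrative helper, not Flash's internals.
def detect_app_name(cwd=None):
    return Path(cwd or Path.cwd()).name
```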
+
+Override with the `--app` flag:
+
+```bash
+flash deploy --app other-project
+flash env list --app other-project
+```
+
+## Related commands
+
+- [`flash env`](/flash/cli/env) - Manage environments within an app
+- [`flash deploy`](/flash/cli/deploy) - Deploy to an app's environment
+- [`flash init`](/flash/cli/init) - Create a new project
diff --git a/flash/cli/build.mdx b/flash/cli/build.mdx
new file mode 100644
index 00000000..fb6da58f
--- /dev/null
+++ b/flash/cli/build.mdx
@@ -0,0 +1,184 @@
+---
+title: "build"
+sidebarTitle: "build"
+---
+
+Build a deployment-ready artifact for your Flash application without deploying. Use it when you need more control over the build process or want to inspect the artifact before deploying.
+
+```bash
+flash build [OPTIONS]
+```
+
+## Examples
+
+Build with all dependencies:
+
+```bash
+flash build
+```
+
+Build and launch local preview environment:
+
+```bash
+flash build --preview
+```
+
+Build with excluded packages (for smaller deployment size):
+
+```bash
+flash build --exclude torch,torchvision,torchaudio
+```
+
+Keep the build directory for inspection:
+
+```bash
+flash build --keep-build
+```
+
+## Flags
+
+
+Skip transitive dependencies during pip install. Only installs direct dependencies specified in `@remote` decorators. Useful when the base image already includes dependencies.
+
+
+
+Keep the `.flash/.build` directory after creating the archive. Useful for debugging build issues or inspecting generated files.
+
+
+
+Custom name for the output archive file.
+
+
+
+Comma-separated list of packages to exclude from the build (e.g., `torch,torchvision`). Use this to skip packages already in the base image.
+
+
+
+Launch a local Docker-based test environment after building. Automatically enables `--keep-build`.
+
+
+## What happens during build
+
+1. **Function discovery**: Finds all `@remote` decorated functions.
+2. **Grouping**: Groups functions by their `resource_config`.
+3. **Manifest generation**: Creates `.flash/flash_manifest.json` with endpoint definitions.
+4. **Dependency installation**: Installs Python packages for Linux x86_64.
+5. **Packaging**: Bundles everything into `.flash/artifact.tar.gz`.
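The grouping step (2) can be sketched as follows. The function records and the `group_by_resource_config` helper are illustrative, not Flash's actual internals; the idea is that each distinct `resource_config` becomes one endpoint:

```python
from collections import defaultdict

# Illustrative sketch: bucket discovered @remote functions by the name
# of their resource_config, so each config maps to one endpoint.
def group_by_resource_config(functions):
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["resource_config"]["name"]].append(fn["name"])
    return dict(groups)

funcs = [
    {"name": "embed", "resource_config": {"name": "gpu-worker"}},
    {"name": "rank", "resource_config": {"name": "gpu-worker"}},
    {"name": "clean", "resource_config": {"name": "cpu-worker"}},
]
```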
+
+## Build artifacts
+
+After running `flash build`:
+
+| File/Directory | Description |
+|----------------|-------------|
+| `.flash/artifact.tar.gz` | Deployment package ready for Runpod |
+| `.flash/flash_manifest.json` | Service discovery configuration |
+| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) |
+
+## Cross-platform builds
+
+Flash automatically handles cross-platform builds:
+
+- **Automatic platform targeting**: Dependencies are installed for Linux x86_64, regardless of your build platform.
+- **Python version matching**: Uses your current Python version for package compatibility.
+- **Binary wheel enforcement**: Only pre-built wheels are used, preventing compilation issues.
+
+You can build on macOS, Windows, or Linux, and the deployment will work on Runpod.
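The three behaviors above map onto standard `pip` flags. This sketch only shows how such a command could be assembled; the exact flags Flash passes are not documented here:

```python
import sys

# Hedged sketch of a cross-platform install command using real pip
# flags. Builds the argument list without running it.
def linux_install_cmd(packages, target_dir):
    major, minor = sys.version_info[:2]
    return [
        sys.executable, "-m", "pip", "install",
        "--platform", "manylinux2014_x86_64",    # automatic platform targeting
        "--python-version", f"{major}.{minor}",  # python version matching
        "--only-binary", ":all:",                # binary wheel enforcement
        "--target", target_dir,
        *packages,
    ]
```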
+
+## Managing deployment size
+
+Runpod Serverless has a **500MB deployment limit**. Use `--exclude` to skip packages already in your base image:
+
+```bash
+# For GPU deployments (PyTorch pre-installed)
+flash build --exclude torch,torchvision,torchaudio
+```
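If you want to verify the result before deploying, a quick size check against the limit looks like this (the artifact path comes from the build artifacts table above; the helper itself is not part of Flash):

```python
import os

# Check the built artifact against the 500MB Serverless limit.
LIMIT_BYTES = 500 * 1024 * 1024

def artifact_fits(path=".flash/artifact.tar.gz"):
    return os.path.getsize(path) <= LIMIT_BYTES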
+
+### Base image reference
+
+| Resource type | Base image | Safe to exclude |
+|--------------|------------|-----------------|
+| GPU | PyTorch base | `torch`, `torchvision`, `torchaudio` |
+| CPU | Python slim | Do not exclude ML packages |
+
+
+
+Check the [worker-flash repository](https://github.com/runpod-workers/worker-flash) for current base images and pre-installed packages.
+
+
+
+## Preview environment
+
+Test your deployment locally before pushing to Runpod:
+
+```bash
+flash build --preview
+```
+
+This:
+
+1. Builds your project (creates archive and manifest).
+2. Creates a Docker network for inter-container communication.
+3. Starts one container per resource config (mothership + workers).
+4. Exposes the mothership on `localhost:8000`.
+5. On shutdown (`Ctrl+C`), stops and removes all containers.
+
+### When to use preview
+
+- Test deployment configuration before production.
+- Validate manifest structure.
+- Debug resource provisioning.
+- Verify cross-endpoint function calls.
+
+## Troubleshooting
+
+### Build fails with "functions not found"
+
+Ensure your project has `@remote` decorated functions:
+
+```python
+from runpod_flash import remote, LiveServerless
+
+config = LiveServerless(name="my-worker")
+
+@remote(resource_config=config)
+def my_function(data):
+ return {"result": data}
+```
+
+### Archive is too large
+
+Use `--exclude` or `--no-deps`:
+
+```bash
+flash build --exclude torch,torchvision,torchaudio
+```
+
+### Dependency installation fails
+
+If a package doesn't have Linux x86_64 wheels:
+
+1. Ensure standard pip is installed: `python -m ensurepip --upgrade`
+2. Check PyPI for Linux wheel availability.
+3. For Python 3.13+, some packages may require newer manylinux versions.
+
+### Need to examine generated files
+
+Use `--keep-build`:
+
+```bash
+flash build --keep-build
+ls .flash/.build/
+```
+
+## Related commands
+
+- [`flash deploy`](/flash/cli/deploy) - Build and deploy in one step
+- [`flash run`](/flash/cli/run) - Start development server
+- [`flash env`](/flash/cli/env) - Manage environments
+
+
+
+Most users should use `flash deploy` instead, which runs build and deploy in one step. Use `flash build` when you need more control or want to inspect the artifact.
+
+
diff --git a/flash/cli/deploy.mdx b/flash/cli/deploy.mdx
new file mode 100644
index 00000000..bd4224fa
--- /dev/null
+++ b/flash/cli/deploy.mdx
@@ -0,0 +1,247 @@
+---
+title: "deploy"
+sidebarTitle: "deploy"
+---
+
+Build and deploy your Flash application to Runpod Serverless endpoints in one step. This is the primary command for getting your application running in the cloud.
+
+```bash
+flash deploy [OPTIONS]
+```
+
+## Examples
+
+Build and deploy a Flash app from the current directory (auto-selects environment if only one exists):
+
+```bash
+flash deploy
+```
+
+Deploy to a specific environment:
+
+```bash
+flash deploy --env production
+```
+
+Deploy with excluded packages to reduce size:
+
+```bash
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+Build and test locally before deploying:
+
+```bash
+flash deploy --preview
+```
+
+## Flags
+
+
+Target environment name (e.g., `dev`, `staging`, `production`). Auto-selected if only one exists. Creates the environment if it doesn't exist.
+
+
+
+Flash app name. Auto-detected from the current directory if not specified.
+
+
+
+Skip transitive dependencies during pip install. Useful when the base image already includes dependencies.
+
+
+
+Comma-separated packages to exclude (e.g., `torch,torchvision`). Use this to stay under the 500MB deployment limit.
+
+
+
+Custom archive name for the build artifact.
+
+
+
+Build and launch a local Docker-based preview environment instead of deploying to Runpod.
+
+
+
+Bundle local `runpod_flash` source instead of the PyPI version. For development and testing only.
+
+
+## What happens during deployment
+
+1. **Build phase**: Creates the deployment artifact (same as `flash build`).
+2. **Environment resolution**: Detects or creates the target environment.
+3. **Upload**: Sends the artifact to Runpod storage.
+4. **Provisioning**: Creates or updates Serverless endpoints.
+5. **Configuration**: Sets up environment variables and service discovery.
+6. **Verification**: Confirms endpoints are healthy.
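The verification step (6) amounts to polling until endpoints report healthy. A generic sketch, where `check` stands in for the real health call (which this page doesn't specify):

```python
import time

# Poll a health check until it passes or a timeout expires.
def wait_until_healthy(check, timeout=60.0, interval=2.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```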
+
+## Architecture
+
+After deployment, your entire application runs on Runpod Serverless:
+
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
+
+flowchart TB
+ Users(["USERS"])
+
+ subgraph Runpod ["RUNPOD SERVERLESS"]
+    Mothership["MOTHERSHIP ENDPOINT
+(your FastAPI app from main.py)
+• Your HTTP routes
+• Orchestrates @remote calls
+• Public URL for users"]
+    GPU["gpu-worker
+(your @remote function)"]
+    CPU["cpu-worker
+(your @remote function)"]
+
+ Mothership -->|"internal"| GPU
+ Mothership -->|"internal"| CPU
+ end
+
+ Users -->|"HTTPS (authenticated)"| Mothership
+
+ style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style Users fill:#4D38F5,stroke:#4D38F5,color:#fff
+ style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff
+ style GPU fill:#22C55E,stroke:#22C55E,color:#000
+ style CPU fill:#22C55E,stroke:#22C55E,color:#000
+```
+
+
+## Environment management
+
+### Automatic creation
+
+If the specified environment doesn't exist, `flash deploy` creates it:
+
+```bash
+# Creates 'staging' if it doesn't exist
+flash deploy --env staging
+```
+
+### Auto-selection
+
+When you have only one environment, it's selected automatically:
+
+```bash
+# Auto-selects the only available environment
+flash deploy
+```
+
+When multiple environments exist, you must specify one:
+
+```bash
+# Required when multiple environments exist
+flash deploy --env staging
+```
+
+### Default environment
+
+If no environment exists and none is specified, Flash creates a `production` environment by default.
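Taken together, the resolution rules above can be summarized in a short sketch (illustrative only, not Flash's actual implementation):

```python
# Environment-resolution rules: explicit flag wins, otherwise
# auto-select a sole environment, default to 'production' if none
# exist, and error when the choice is ambiguous.
def resolve_env(existing, requested=None):
    if requested:
        return requested          # created on deploy if it doesn't exist
    if not existing:
        return "production"       # default environment
    if len(existing) == 1:
        return existing[0]        # auto-selection
    raise ValueError("Multiple environments found: " + ", ".join(existing))
```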
+
+## Post-deployment
+
+After successful deployment, Flash displays:
+
+```text
+✓ Deployment Complete
+
+Your mothership is deployed at:
+https://api-xxxxx.runpod.net
+
+Available Routes:
+POST /api/hello
+POST /gpu/process
+
+All endpoints require authentication:
+curl -X POST https://api-xxxxx.runpod.net/api/hello \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"param": "value"}'
+```
+
+### Authentication
+
+All deployed endpoints require authentication with your Runpod API key:
+
+```bash
+export RUNPOD_API_KEY="your_key_here"
+
+curl -X POST https://YOUR_ENDPOINT_URL/path \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"param": "value"}'
+```
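The same call can be made from Python using only the standard library. As in the curl example, the URL is a placeholder, and `call_endpoint` is just an illustrative wrapper:

```python
import json
import os
import urllib.request

# Build the auth headers the endpoints expect.
def auth_headers():
    return {
        "Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}",
        "Content-Type": "application/json",
    }

# POST a JSON payload to a deployed endpoint and decode the response.
def call_endpoint(url, payload):
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers=auth_headers()
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```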
+
+## Preview mode
+
+Test locally before deploying:
+
+```bash
+flash deploy --preview
+```
+
+This builds your project and runs it in Docker containers locally:
+
+- Mothership exposed on `localhost:8000`.
+- All containers communicate via Docker network.
+- Press `Ctrl+C` to stop.
+
+## Managing deployment size
+
+Runpod Serverless has a **500MB limit**. Use `--exclude` to skip packages in the base image:
+
+```bash
+# GPU deployments (PyTorch pre-installed)
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+| Resource type | Safe to exclude |
+|--------------|-----------------|
+| GPU | `torch`, `torchvision`, `torchaudio` |
+| CPU | Do not exclude ML packages |
+
+## flash run vs flash deploy
+
+| Aspect | `flash run` | `flash deploy` |
+|--------|-------------|----------------|
+| FastAPI app runs on | Your machine | Runpod Serverless |
+| `@remote` functions run on | Runpod Serverless | Runpod Serverless |
+| Endpoint naming | `live-` prefix | No prefix |
+| Automatic updates | Yes | No |
+| Use case | Development | Production |
+
+## Troubleshooting
+
+### Multiple environments error
+
+```text
+Error: Multiple environments found: dev, staging, production
+```
+
+Specify the target environment:
+
+```bash
+flash deploy --env staging
+```
+
+### Deployment size limit
+
+Use `--exclude` to reduce size:
+
+```bash
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+### Authentication fails
+
+Ensure your API key is set:
+
+```bash
+echo $RUNPOD_API_KEY
+export RUNPOD_API_KEY="your_key_here"
+```
+
+## Related commands
+
+- [`flash build`](/flash/cli/build) - Build without deploying
+- [`flash run`](/flash/cli/run) - Local development server
+- [`flash env`](/flash/cli/env) - Manage environments
+- [`flash app`](/flash/cli/app) - Manage applications
+- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints
diff --git a/flash/cli/env.mdx b/flash/cli/env.mdx
new file mode 100644
index 00000000..7d4494ba
--- /dev/null
+++ b/flash/cli/env.mdx
@@ -0,0 +1,255 @@
+---
+title: "env"
+sidebarTitle: "env"
+---
+
+Manage deployment environments for Flash applications. Environments are isolated deployment contexts (like `dev`, `staging`, `production`) within a Flash app.
+
+```bash Command
+flash env SUBCOMMAND [OPTIONS]
+```
+
+## Subcommands
+
+| Subcommand | Description |
+|------------|-------------|
+| `list` | Show all environments for an app |
+| `create` | Create a new environment |
+| `get` | Show details of an environment |
+| `delete` | Delete an environment and its resources |
+
+---
+
+## env list
+
+Show all available environments for an app.
+
+```bash Command
+flash env list [OPTIONS]
+```
+
+### Example
+
+```bash
+# List environments for current app
+flash env list
+
+# List environments for specific app
+flash env list --app APP_NAME
+```
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Output
+
+```text
+┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ ID ┃ Active Build ┃ Created At ┃
+┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
+│ dev │ env_abc123 │ build_xyz789 │ 2024-01-15 10:30 │
+│ staging │ env_def456 │ build_uvw456 │ 2024-01-16 14:20 │
+│ production │ env_ghi789 │ build_rst123 │ 2024-01-20 09:15 │
+└────────────┴─────────────────────┴───────────────────┴──────────────────┘
+```
+
+---
+
+## env create
+
+Create a new deployment environment.
+
+```bash Command
+flash env create ENVIRONMENT_NAME [OPTIONS]
+```
+
+### Example
+
+```bash
+# Create staging environment
+flash env create staging
+
+# Create environment in specific app
+flash env create production --app APP_NAME
+```
+
+### Arguments
+
+
+Name for the new environment (e.g., `dev`, `staging`, `production`).
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Notes
+
+- If the app doesn't exist, it's created automatically.
+- Environment names must be unique within an app.
+- Newly created environments have no active build until first deployment.
+
+
+
+You don't always need to create environments explicitly. Running `flash deploy --env ENVIRONMENT_NAME` creates the environment automatically if it doesn't exist.
+
+
+
+---
+
+## env get
+
+Show detailed information about a deployment environment.
+
+```bash Command
+flash env get ENVIRONMENT_NAME [OPTIONS]
+```
+
+### Example
+
+```bash
+# Get details for production environment
+flash env get production
+
+# Get details for specific app's environment
+flash env get staging --app APP_NAME
+```
+
+### Arguments
+
+
+Name of the environment to inspect.
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Output
+
+```text
+╭────────────────────────────────────╮
+│ Environment: production │
+├────────────────────────────────────┤
+│ ID: env_ghi789 │
+│ State: DEPLOYED │
+│ Active Build: build_rst123 │
+│ Created: 2024-01-20 09:15:00 │
+╰────────────────────────────────────╯
+
+ Associated Endpoints
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ ID ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
+│ my-gpu │ ep_abc123 │
+│ my-cpu │ ep_def456 │
+└────────────────┴────────────────────┘
+
+ Associated Network Volumes
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ ID ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
+│ model-cache │ nv_xyz789 │
+└────────────────┴────────────────────┘
+```
+
+---
+
+## env delete
+
+Delete a deployment environment and all its associated resources.
+
+```bash Command
+flash env delete ENVIRONMENT_NAME [OPTIONS]
+```
+
+### Examples
+
+```bash
+# Delete development environment
+flash env delete dev
+
+# Delete environment in specific app
+flash env delete staging --app APP_NAME
+```
+
+### Arguments
+
+
+Name of the environment to delete.
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Process
+
+1. Shows environment details and resources to be deleted.
+2. Prompts for confirmation (required).
+3. Undeploys all associated endpoints.
+4. Removes all associated network volumes.
+5. Deletes the environment from the app.
+
+
+
+This operation is irreversible. All endpoints, volumes, and configuration associated with the environment will be permanently deleted.
+
+
+
+---
+
+## Environment states
+
+| State | Description |
+|-------|-------------|
+| PENDING | Environment created but not deployed |
+| DEPLOYING | Deployment in progress |
+| DEPLOYED | Successfully deployed and running |
+| FAILED | Deployment or health check failed |
+| DELETING | Deletion in progress |
+
+## Common workflows
+
+### Three-tier deployment
+
+```bash
+# Create environments
+flash env create dev
+flash env create staging
+flash env create production
+
+# Deploy to each
+flash deploy --env dev
+flash deploy --env staging
+flash deploy --env production
+```
+
+### Feature branch testing
+
+```bash
+# Create feature environment
+flash env create FEATURE_NAME
+
+# Deploy feature branch
+git checkout FEATURE_NAME
+flash deploy --env FEATURE_NAME
+
+# Clean up after merge
+flash env delete FEATURE_NAME
+```
+
+## Related commands
+
+- [`flash deploy`](/flash/cli/deploy) - Deploy to an environment
+- [`flash app`](/flash/cli/app) - Manage applications
+- [`flash undeploy`](/flash/cli/undeploy) - Remove specific endpoints
diff --git a/flash/cli/init.mdx b/flash/cli/init.mdx
new file mode 100644
index 00000000..12f93b93
--- /dev/null
+++ b/flash/cli/init.mdx
@@ -0,0 +1,89 @@
+---
+title: "init"
+sidebarTitle: "init"
+---
+
+Create a new Flash project with a ready-to-use template structure including a FastAPI server, example GPU and CPU workers, and configuration files.
+
+```bash
+flash init [PROJECT_NAME] [OPTIONS]
+```
+
+## Example
+
+Create a new project directory:
+
+```bash
+flash init PROJECT_NAME
+cd PROJECT_NAME
+pip install -r requirements.txt
+flash run
+```
+
+Initialize in the current directory:
+
+```bash
+flash init .
+```
+
+## Arguments
+
+
+Name of the project directory to create. If omitted or set to `.`, initializes in the current directory.
+
+
+## Flags
+
+
+Overwrite existing files if they already exist in the target directory.
+
+
+## What it creates
+
+The command creates the following project structure:
+
+```text
+PROJECT_NAME/
+├── main.py # FastAPI application entry point
+├── workers/
+│ ├── gpu/ # GPU worker example
+│ │ ├── __init__.py
+│ │ └── endpoint.py
+│ └── cpu/ # CPU worker example
+│ ├── __init__.py
+│ └── endpoint.py
+├── .env # Environment variables template
+├── .gitignore # Git ignore patterns
+├── .flashignore # Flash deployment ignore patterns
+├── requirements.txt # Python dependencies
+└── README.md # Project documentation
+```
+
+### Template contents
+
+- **main.py**: FastAPI application that imports routers from the `workers/` directory.
+- **workers/gpu/endpoint.py**: Example GPU worker with a `@remote` decorated function using `LiveServerless`.
+- **workers/cpu/endpoint.py**: Example CPU worker with a `@remote` decorated function using CPU configuration.
+- **.env**: Template for environment variables including `RUNPOD_API_KEY`.
+
+## Next steps
+
+After initialization:
+
+1. Copy `.env.example` to `.env` (if needed) and add your `RUNPOD_API_KEY`.
+2. Install dependencies: `pip install -r requirements.txt`
+3. Start the development server: `flash run`
+4. Open http://localhost:8888/docs to explore the API.
+5. Customize the workers for your use case.
+6. Deploy with `flash deploy` when ready.
+
+
+
+This command only creates local files. It doesn't interact with Runpod or create any cloud resources. Cloud resources are created when you run `flash run` or `flash deploy`.
+
+
+
+## Related commands
+
+- [`flash run`](/flash/cli/run) - Start the development server
+- [`flash deploy`](/flash/cli/deploy) - Build and deploy to Runpod
diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx
new file mode 100644
index 00000000..db53b4bb
--- /dev/null
+++ b/flash/cli/overview.mdx
@@ -0,0 +1,84 @@
+---
+title: "CLI overview"
+sidebarTitle: "Overview"
+description: "Learn how to use the Flash CLI for local development and deployment."
+---
+
+The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless.
+
+Before using the CLI, make sure you've [installed Flash](/flash/overview#install-flash) and set your [Runpod API key](/get-started/api-keys) in your environment.
+
+## Available commands
+
+| Command | Description |
+|---------|-------------|
+| [`flash init`](/flash/cli/init) | Create a new Flash project with a template structure |
+| [`flash run`](/flash/cli/run) | Start the local development server with automatic updates |
+| [`flash build`](/flash/cli/build) | Build a deployment artifact without deploying |
+| [`flash deploy`](/flash/cli/deploy) | Build and deploy your application to Runpod |
+| [`flash env`](/flash/cli/env) | Manage deployment environments |
+| [`flash app`](/flash/cli/app) | Manage Flash applications |
+| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints |
+
+## Getting help
+
+View help for any command by adding `--help`:
+
+```bash
+flash --help
+flash deploy --help
+flash env --help
+```
+
+## Common workflows
+
+### Local development
+
+```bash
+# Create a new project
+flash init PROJECT_NAME
+cd PROJECT_NAME
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Add your API key to .env
+# Start the development server
+flash run
+```
+
+### Deploy to production
+
+```bash
+# Build and deploy
+flash deploy
+
+# Deploy to a specific environment
+flash deploy --env ENVIRONMENT_NAME
+```
+
+### Manage deployments
+
+```bash
+# List environments
+flash env list
+
+# Check environment status
+flash env get ENVIRONMENT_NAME
+
+# Remove an environment
+flash env delete ENVIRONMENT_NAME
+```
+
+### Clean up endpoints
+
+```bash
+# List deployed endpoints
+flash undeploy list
+
+# Remove specific endpoint
+flash undeploy ENDPOINT_NAME
+
+# Remove all endpoints
+flash undeploy --all
+```
\ No newline at end of file
diff --git a/flash/cli/run.mdx b/flash/cli/run.mdx
new file mode 100644
index 00000000..4dab9e6c
--- /dev/null
+++ b/flash/cli/run.mdx
@@ -0,0 +1,156 @@
+---
+title: "run"
+sidebarTitle: "run"
+---
+
+Start the Flash development server for local testing with automatic updates. Your FastAPI app runs locally while `@remote` functions execute on Runpod Serverless.
+
+```bash
+flash run [OPTIONS]
+```
+
+## Example
+
+Start the development server with defaults:
+
+```bash
+flash run
+```
+
+Start with auto-provisioning to eliminate cold-start delays:
+
+```bash
+flash run --auto-provision
+```
+
+Start on a custom port:
+
+```bash
+flash run --port 3000
+```
+
+## Flags
+
+
+Host address to bind the server to.
+
+
+
+Port number to bind the server to.
+
+
+
+Enable or disable auto-reload on code changes. Enabled by default.
+
+
+
+Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
+
+
+## Architecture
+
+With `flash run`, your system runs in a hybrid architecture:
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
+
+flowchart TB
+ subgraph Local ["YOUR MACHINE (localhost:8888)"]
+    FastAPI["FastAPI App (main.py)
+• Your HTTP routes
+• Orchestrates @remote calls
+• Updates automatically"]
+  end
+
+  subgraph Runpod ["RUNPOD SERVERLESS"]
+    GPU["live-gpu-worker
+(your @remote function)"]
+    CPU["live-cpu-worker
+(your @remote function)"]
+ end
+
+ FastAPI -->|"HTTPS"| GPU
+ FastAPI -->|"HTTPS"| CPU
+
+ style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff
+ style GPU fill:#22C55E,stroke:#22C55E,color:#000
+ style CPU fill:#22C55E,stroke:#22C55E,color:#000
+```
+
+**Key points:**
+
+- Your FastAPI app runs locally and updates automatically for rapid iteration.
+- `@remote` functions run on Runpod as Serverless endpoints.
+- Endpoints are prefixed with `live-` to distinguish from production.
+- Changes to local code are picked up instantly.
+
+This is different from `flash deploy`, where everything runs on Runpod.
+
+## Auto-provisioning
+
+By default, endpoints are provisioned lazily on first `@remote` function call. Use `--auto-provision` to provision all endpoints at server startup:
+
+```bash
+flash run --auto-provision
+```
+
+### How it works
+
+1. **Discovery**: Scans your app for `@remote` decorated functions.
+2. **Deployment**: Deploys resources concurrently (up to 3 at a time).
+3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints.
+4. **Caching**: Stores deployed resources in `.runpod/resources.pkl` for reuse.
+5. **Updates**: Recognizes existing endpoints and updates if configuration changed.
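The concurrent deployment step (2) can be sketched with a thread pool. Here `provision` stands in for the real per-endpoint deployment call, which this page doesn't expose:

```python
from concurrent.futures import ThreadPoolExecutor

# Deploy resources concurrently, at most three at a time.
def provision_all(configs, provision, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(provision, configs))
```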
+
+### Benefits
+
+- **Zero cold start**: All endpoints ready before you test them.
+- **Faster development**: No waiting for deployment on first HTTP call.
+- **Resource reuse**: Cached endpoints are reused across server restarts.
+
+### When to use
+
+- Local development with multiple endpoints.
+- Testing workflows that call multiple remote functions.
+- Debugging where you want deployment separated from handler logic.
+
+## Provisioning modes
+
+| Mode | When endpoints are deployed |
+|------|----------------------------|
+| Default (lazy) | On first `@remote` function call |
+| `--auto-provision` | At server startup |
+
+## Testing your API
+
+Once the server is running, test your endpoints:
+
+```bash
+# Health check
+curl http://localhost:8888/
+
+# Call a GPU endpoint
+curl -X POST http://localhost:8888/gpu/hello \
+ -H "Content-Type: application/json" \
+ -d '{"message": "Hello from GPU!"}'
+```
+
+Open http://localhost:8888/docs for the interactive API explorer.
+
+## Requirements
+
+- `RUNPOD_API_KEY` must be set in your `.env` file or environment.
+- A valid Flash project structure (created by `flash init` or manually).
+
+## flash run vs flash deploy
+
+| Aspect | `flash run` | `flash deploy` |
+|--------|-------------|----------------|
+| FastAPI app runs on | Your machine (localhost) | Runpod Serverless |
+| `@remote` functions run on | Runpod Serverless | Runpod Serverless |
+| Endpoint naming | `live-` prefix | No prefix |
+| Automatic updates | Yes | No |
+| Use case | Development | Production |
+
+## Related commands
+
+- [`flash init`](/flash/cli/init) - Create a new project
+- [`flash deploy`](/flash/cli/deploy) - Deploy to production
+- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints
diff --git a/flash/cli/undeploy.mdx b/flash/cli/undeploy.mdx
new file mode 100644
index 00000000..8225182f
--- /dev/null
+++ b/flash/cli/undeploy.mdx
@@ -0,0 +1,213 @@
+---
+title: "undeploy"
+sidebarTitle: "undeploy"
+---
+
+Manage and delete Runpod Serverless endpoints deployed via Flash. Use this command to clean up endpoints created during local development with `flash run`.
+
+```bash
+flash undeploy [NAME|list] [OPTIONS]
+```
+
+## Example
+
+List all tracked endpoints:
+
+```bash
+flash undeploy list
+```
+
+Remove a specific endpoint:
+
+```bash
+flash undeploy ENDPOINT_NAME
+```
+
+Remove all endpoints:
+
+```bash
+flash undeploy --all
+```
+
+## Usage modes
+
+### List endpoints
+
+Display all tracked endpoints with their current status:
+
+```bash
+flash undeploy list
+```
+
+Output includes:
+
+- **Name**: Endpoint name
+- **Endpoint ID**: Runpod endpoint identifier
+- **Status**: Current health status (Active/Inactive/Unknown)
+- **Type**: Resource type (Live Serverless, Cpu Live Serverless, etc.)
+
+**Status indicators:**
+
+| Status | Meaning |
+|--------|---------|
+| Active | Endpoint is running and responding |
+| Inactive | Tracking exists but endpoint deleted externally |
+| Unknown | Error during health check |
+
+### Undeploy by name
+
+Delete a specific endpoint:
+
+```bash
+flash undeploy ENDPOINT_NAME
+```
+
+This:
+
+1. Searches for endpoints matching the name.
+2. Shows endpoint details.
+3. Prompts for confirmation.
+4. Deletes the endpoint from Runpod.
+5. Removes from local tracking.
+
+### Undeploy all
+
+Delete all tracked endpoints (requires double confirmation):
+
+```bash
+flash undeploy --all
+```
+
+Safety features:
+
+1. Shows total count of endpoints.
+2. First confirmation: Yes/No prompt.
+3. Second confirmation: Type "DELETE ALL" exactly.
+4. Deletes all endpoints from Runpod.
+5. Removes all from tracking.
+
+### Interactive selection
+
+Select endpoints to undeploy using checkboxes:
+
+```bash
+flash undeploy --interactive
+```
+
+Use arrow keys to navigate, space bar to select/deselect, and Enter to confirm.
+
+### Clean up stale tracking
+
+Remove inactive endpoints from tracking without API deletion:
+
+```bash
+flash undeploy --cleanup-stale
+```
+
+Use this when endpoints were deleted via the Runpod console or API (not through Flash). The local tracking file (`.runpod/resources.pkl`) becomes stale, and this command cleans it up.
+
+## Flags
+
+
+Undeploy all tracked endpoints. Requires double confirmation for safety.
+
+
+
+Interactive checkbox selection mode. Select multiple endpoints to undeploy.
+
+
+
+Remove inactive endpoints from local tracking without attempting API deletion. Use when endpoints were deleted externally.
+
+
+## Arguments
+
+
+Name of the endpoint to undeploy. Use `list` to show all endpoints.
+
+
+## undeploy vs env delete
+
+| Command | Scope | When to use |
+|---------|-------|-------------|
+| `flash undeploy` | Individual endpoints from local tracking | Development cleanup, granular control |
+| `flash env delete` | Entire environment + all resources | Production cleanup, full teardown |
+
+For production deployments, use `flash env delete` to remove entire environments and all associated resources.
+
+## How tracking works
+
+Flash tracks deployed endpoints in `.runpod/resources.pkl`. Endpoints are added when you:
+
+- Run `flash run --auto-provision`
+- Run `flash run` and call `@remote` functions
+- Run `flash deploy`
+
+The tracking file is in `.gitignore` and should never be committed. It contains local deployment state.
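Conceptually, the tracking file behaves like a pickled mapping from endpoint name to endpoint ID. The sketch below is illustrative; the real format of `.runpod/resources.pkl` is a Flash internal and may differ:

```python
import os
import pickle

# Load the tracking mapping, treating a missing file as empty state.
def read_tracking(path):
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)

# Persist the tracking mapping back to disk.
def write_tracking(entries, path):
    with open(path, "wb") as f:
        pickle.dump(entries, f)
```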
+
+## Common workflows
+
+### Basic cleanup
+
+```bash
+# Check what's deployed
+flash undeploy list
+
+# Remove a specific endpoint
+flash undeploy ENDPOINT_NAME
+
+# Clean up stale tracking
+flash undeploy --cleanup-stale
+```
+
+### Bulk operations
+
+```bash
+# Undeploy all endpoints
+flash undeploy --all
+
+# Interactive selection
+flash undeploy --interactive
+```
+
+### Managing external deletions
+
+If you delete endpoints via the Runpod console:
+
+```bash
+# Check status - will show as "Inactive"
+flash undeploy list
+
+# Remove stale tracking entries
+flash undeploy --cleanup-stale
+```
+
+## Troubleshooting
+
+### Endpoint shows as "Inactive"
+
+The endpoint was deleted via Runpod console or API. Clean up:
+
+```bash
+flash undeploy --cleanup-stale
+```
+
+### Can't find endpoint by name
+
+Check the exact name:
+
+```bash
+flash undeploy list
+```
+
+### Undeploy fails with API error
+
+1. Check `RUNPOD_API_KEY` in `.env`.
+2. Verify network connectivity.
+3. Check if the endpoint still exists on Runpod.
+
+## Related commands
+
+- [`flash run`](/flash/cli/run) - Development server (creates endpoints)
+- [`flash deploy`](/flash/cli/deploy) - Deploy to Runpod
+- [`flash env delete`](/flash/cli/env) - Delete entire environment
diff --git a/flash/custom-docker-images.mdx b/flash/custom-docker-images.mdx
new file mode 100644
index 00000000..da7b4842
--- /dev/null
+++ b/flash/custom-docker-images.mdx
@@ -0,0 +1,327 @@
+---
+title: "Use custom Docker images with Flash"
+sidebarTitle: "Custom Docker images"
+description: "Deploy pre-built Docker images with Flash using ServerlessEndpoint."
+tag: "BETA"
+---
+
+Flash's `LiveServerless` configuration handles most use cases by automatically managing dependencies and executing arbitrary Python code. However, for specialized environments that require custom Docker images—such as pre-built ML frameworks, specific CUDA versions, or system-level dependencies—you can use `ServerlessEndpoint` or `CpuServerlessEndpoint`.
+
+## When to use custom Docker images
+
+Use custom Docker images when you need:
+
+- **Pre-built inference servers**: vLLM, TensorRT-LLM, or other specialized serving frameworks.
+- **System-level dependencies**: Custom CUDA versions, cuDNN, or system libraries not installable via `pip`.
+- **Baked-in models**: Large models pre-downloaded in the image to avoid runtime downloads.
+- **Existing Serverless workers**: You already have a working Runpod Serverless Docker image that you want to use with Flash.
+
+
+For most use cases, you should use `LiveServerless` and [remote functions](/flash/remote-functions). It's simpler, faster, and lets you execute arbitrary Python code remotely.
+
+
+## How it works
+
+Unlike `LiveServerless`, which delivers your Python code to pre-built Flash workers, `ServerlessEndpoint` creates a traditional [Runpod Serverless endpoint](/serverless/overview) using any Docker image you specify.
+
+
+
+Here are the key differences between `ServerlessEndpoint` and `LiveServerless` resources:
+
+| Aspect | LiveServerless | ServerlessEndpoint |
+|--------|---------------|-------------------|
+| **Code execution** | Delivers Python code with each request | Uses the [handler function](/serverless/workers/handler-functions) in your Docker image |
+| **Input format** | Any Python arguments | Dictionary: `{"input": {...}}` |
+| **Docker image** | Pre-built Flash images | Your custom image |
+| **Dependencies** | Specified in decorator | Baked into Docker image |
+| **Use case** | Dynamic Python functions | Pre-built inference servers |
+
+## Basic usage
+
+
+
+Create a `ServerlessEndpoint` resource configuration pointing to your Docker image. For example:
+
+```python
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+config = ServerlessEndpoint(
+ name="my-custom-worker",
+ imageName="your-registry/your-image:tag",
+ gpus=[GpuGroup.AMPERE_24],
+ workersMax=3
+)
+```
+
+
+
+
+Call `.run()` with a dictionary payload in the format `{"input": {...}}`:
+
+```python
+import asyncio
+from runpod_flash import ServerlessEndpoint, GpuGroup, ResourceManager
+
+async def main():
+ # Explicitly provision the endpoint if it doesn't already exist
+ manager = ResourceManager()
+ deployed_endpoint = await manager.get_or_deploy_resource(config)
+
+ # Send a request to the endpoint
+    result = await deployed_endpoint.run({
+ "input": {
+ "prompt": "Your input data",
+ "param1": "value1"
+ }
+ })
+ print(result)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+**No `@remote` decorator is needed**. The endpoint will process the request using the [handler function](/serverless/workers/handler-functions) that's baked into your Docker image.
+
+
+
+
+## Complete example: vLLM inference
+
+This example uses Runpod's official [vLLM worker](/serverless/vllm/overview) to deploy the `microsoft/Phi-3.5-mini-instruct` language model:
+
+```python title="vllm_example.py"
+import asyncio
+from runpod_flash import ServerlessEndpoint, GpuGroup, ResourceManager
+
+# Configure vLLM endpoint
+vllm_config = ServerlessEndpoint(
+ name="vllm-small-model",
+ imageName="runpod/worker-vllm:stable-cuda12.1.0",
+ gpus=[GpuGroup.AMPERE_24], # RTX 4090 or similar (24GB)
+ workersMax=3,
+ env={
+ "MODEL_NAME": "microsoft/Phi-3.5-mini-instruct",
+ "MAX_MODEL_LEN": "4096",
+ "GPU_MEMORY_UTILIZATION": "0.9",
+ "MAX_CONCURRENCY": "30",
+ }
+)
+
+async def main():
+ # Explicitly provision the endpoint if it doesn't exist
+ manager = ResourceManager()
+ deployed_endpoint = await manager.get_or_deploy_resource(vllm_config)
+
+ print(f"Endpoint deployed at: {deployed_endpoint.endpoint_url}")
+
+ # Generate text
+ result = await deployed_endpoint.run({
+ "input": {
+ "prompt": "Explain quantum computing in simple terms:",
+ "max_tokens": 100,
+ "temperature": 0.7
+ }
+ })
+
+ # Extract the generated text
+ text = result.output[0]['choices'][0]['tokens'][0]
+ print(f"\nGenerated text: {text}")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Here's what happens when you run this code:
+
+1. **Resource configuration**: The `ServerlessEndpoint` configuration specifies the official Runpod [vLLM worker](/serverless/vllm/overview) Docker image and GPU requirements.
+2. **Environment variables**: Model and vLLM settings are configured via `env`.
+3. **Provisioning**: In `main()`, `ResourceManager.get_or_deploy_resource()` creates the endpoint if it doesn't already exist.
+4. **Request**: The input is sent as a dictionary via `.run()` to the deployed vLLM endpoint, matching the worker's expected input format.
+5. **Response**: The results are extracted from the nested response structure.
+
+## Available Docker images
+
+### Official Runpod workers
+
+Runpod provides pre-built worker images for common frameworks:
+
+| Framework | Image | Image link |
+|-----------|-------|---------------|
+| vLLM | `runpod/worker-vllm` | [Link](https://hub.docker.com/r/runpod/worker-vllm) |
+| Automatic1111 | `runpod/worker-a1111:stable` | [A1111 docs](/serverless/workers/sdxl-a1111) |
+| ComfyUI | `runpod/worker-comfyui` | [Link](https://hub.docker.com/r/runpod/worker-comfyui) |
+
+### Custom images
+
+To use your own Docker image:
+
+1. **Build a handler**: Follow the [Serverless handler guide](/serverless/workers/handler-functions).
+2. **Create a Dockerfile**: Package your handler with dependencies.
+3. **Push to registry**: Upload to Docker Hub, GitHub Container Registry, or Runpod's registry.
+4. **Use in Flash**: Reference the image in `imageName`.
+
+See [Deploy custom workers](/serverless/workers/deploy) for details.
+
+## Configuration options
+
+All parameters from `LiveServerless` are available:
+
+```python
+config = ServerlessEndpoint(
+ name="custom-worker",
+ imageName="your-registry/image:tag", # Required
+ gpus=[GpuGroup.AMPERE_80],
+ workersMin=0,
+ workersMax=5,
+ idleTimeout=10,
+ env={
+ "MODEL_PATH": "/models/llama",
+ "MAX_BATCH_SIZE": "32"
+ },
+ networkVolumeId="vol_abc123", # Optional: persistent storage
+ executionTimeoutMs=300000 # 5 minutes
+)
+```
+
+See the [resource configuration reference](/flash/resource-configuration) for all available options.
+
+## CPU endpoints
+
+For CPU workloads, use `CpuServerlessEndpoint`:
+
+```python
+from runpod_flash import CpuServerlessEndpoint, CpuInstanceType
+
+config = CpuServerlessEndpoint(
+ name="cpu-worker",
+ imageName="your-registry/cpu-worker:latest",
+ instanceIds=[CpuInstanceType.CPU5C_4_8] # 4 vCPU, 8GB RAM
+)
+```
+
+## Environment variables
+
+Pass configuration to your Docker image via environment variables. For example:
+
+```python
+config = ServerlessEndpoint(
+ name="vllm-worker",
+ imageName="runpod/worker-vllm:stable-cuda12.1.0",
+ env={
+ "MODEL_NAME": "meta-llama/Llama-3.2-3B-Instruct",
+ "MAX_MODEL_LEN": "8192",
+ "HF_TOKEN": "hf_...", # For gated models
+ "TRUST_REMOTE_CODE": "True"
+ }
+)
+```
+
+## Explicit provisioning
+
+Before you can make requests, you'll need to provision the endpoint if it doesn't already exist. For example:
+
+```python
+from runpod_flash import ResourceManager
+
+async def main():
+ manager = ResourceManager()
+ deployed = await manager.get_or_deploy_resource(config)
+
+ print(f"Endpoint ID: {deployed.id}")
+ print(f"Endpoint URL: {deployed.endpoint_url}")
+
+ # Now make requests
+ result = await deployed.run({"input": {...}})
+```
+
+## Request/response format
+
+### Request structure
+
+All requests must use the format `{"input": {...}}`. For example:
+
+```python
+{
+ "input": {
+ # Your worker-specific parameters
+ "param1": "value1",
+ "param2": "value2"
+ }
+}
+```
+
+### Response structure
+
+The response is a `JobOutput` object with these attributes:
+
+```python
+result.id # Job ID
+result.workerId # Worker that processed the request
+result.status # COMPLETED, IN_PROGRESS, FAILED
+result.delayTime # Queue delay in ms
+result.executionTime # Execution time in ms
+result.output # Worker response (structure varies by worker)
+result.error # Error message if failed
+```
+
+Extract data from `result.output` based on your worker's output format.
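+
+Because the nested shape varies by worker, it's safer to navigate it defensively. A small, hypothetical helper for the vLLM-style response shown above (`extract_generated_text` is not part of the Flash SDK):
+
+```python
+def extract_generated_text(output):
+    """Pull the first generated token entry from a vLLM-style
+    response; return None if the shape doesn't match."""
+    try:
+        return output[0]["choices"][0]["tokens"][0]
+    except (KeyError, IndexError, TypeError):
+        return None
+```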
+
+
+
+## Limitations
+
+- **Input format**: Only supports dictionary payloads `{"input": {...}}`. You cannot pass arbitrary Python arguments like with `LiveServerless`.
+- **Code execution**: Cannot execute arbitrary Python code remotely. Your Docker image must include all logic.
+- **@remote decorator**: The `@remote` decorator does not work with `ServerlessEndpoint`. Use `.run()` directly.
+- **Handler required**: Your Docker image must implement a Runpod Serverless [handler function](/serverless/workers/handler-functions).
+
+## Troubleshooting
+
+### Endpoint fails to initialize
+
+**Problem**: Workers fail to start or crash immediately.
+
+**Solutions**:
+
+- Check that your Docker image is compatible with [Runpod Serverless](/serverless/overview).
+- Verify environment variables are correct.
+- Ensure the image includes a valid handler function.
+- Check worker logs in the Runpod console.
+
+### Out of memory errors
+
+**Problem**: Workers crash with CUDA OOM or RAM errors.
+
+**Solutions**:
+
+- Use a larger GPU: `gpus=[GpuGroup.AMPERE_80]`
+- Reduce `GPU_MEMORY_UTILIZATION` (for vLLM/ML frameworks).
+- Lower `MAX_MODEL_LEN` or batch size.
+- Reduce `workersMax` to limit parallel execution.
+
+### Wrong response format
+
+**Problem**: Cannot extract data from `result.output`.
+
+**Solutions**:
+
+- Check your worker's documentation for response format.
+- Print the full `result` to see the structure.
+- Look at worker logs for errors.
+
+### Authentication errors
+
+**Problem**: Cannot download gated models or private images.
+
+**Solutions**:
+
+- Add `HF_TOKEN` to `env` for Hugging Face gated models.
+- Configure Docker registry authentication in Runpod console for private images.
+- Verify API keys are correct.
+
+## Next steps
+
+- [View the resource configuration reference](/flash/resource-configuration) for all `ServerlessEndpoint` options.
+- [Learn about vLLM deployment](/serverless/vllm/overview) for LLM inference.
+- [Build custom Serverless workers](/serverless/workers/overview) for specialized use cases.
+- [Create Flash apps](/flash/apps/build-app) combining custom images with FastAPI.
diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx
new file mode 100644
index 00000000..96212791
--- /dev/null
+++ b/flash/monitoring.mdx
@@ -0,0 +1,177 @@
+---
+title: "Monitor and debug remote functions"
+sidebarTitle: "Monitor and debug"
+description: "Monitor, debug, and troubleshoot Flash deployments."
+tag: "BETA"
+---
+
+This page covers how to monitor and debug your Flash deployments, including viewing logs, troubleshooting common issues, and optimizing performance.
+
+## Viewing logs
+
+When running Flash functions, logs are displayed in your terminal. The output includes:
+
+- Endpoint creation and reuse status.
+- Job submission and queue status.
+- Execution progress.
+- Worker information (delay time, execution time).
+
+Example output:
+
+```text
+2025-11-19 12:35:15,109 | INFO | Created endpoint: rb50waqznmn2kg - flash-quickstart-fb
+2025-11-19 12:35:15,112 | INFO | URL: https://console.runpod.io/serverless/user/endpoint/rb50waqznmn2kg
+2025-11-19 12:35:15,114 | INFO | LiveServerless:rb50waqznmn2kg | API /run
+2025-11-19 12:35:15,655 | INFO | LiveServerless:rb50waqznmn2kg | Started Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2
+2025-11-19 12:35:15,762 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: IN_QUEUE
+2025-11-19 12:36:09,983 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: COMPLETED
+2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Delay Time: 51842 ms
+2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Execution Time: 1533 ms
+```
+
+### Log levels
+
+You can control log verbosity using the `LOG_LEVEL` environment variable:
+
+```bash
+LOG_LEVEL=DEBUG python your_script.py
+```
+
+Available log levels: `DEBUG`, `INFO`, `WARNING`, `ERROR`.
+
+## Monitoring in the Runpod console
+
+View detailed metrics and logs in the [Runpod console](https://www.runpod.io/console/serverless):
+
+1. Navigate to the **Serverless** section.
+2. Click on your endpoint to view:
+ - Active workers and queue depth.
+ - Request history and job status.
+ - Worker logs and execution details.
+ - Metrics (requests, latency, errors).
+
+### Endpoint metrics
+
+The console provides metrics including:
+
+- **Request rate**: Number of requests per minute.
+- **Queue depth**: Number of pending requests.
+- **Latency**: Average response time.
+- **Worker count**: Active and idle workers.
+- **Error rate**: Failed requests percentage.
+
+## Debugging common issues
+
+### Cold start delays
+
+If you're experiencing slow initial responses:
+
+- **Cause**: Workers need time to start, load dependencies, and initialize models.
+- **Solutions**:
+ - Set `workersMin=1` to keep at least one worker warm.
+ - Use smaller models or optimize model loading.
+ - Use `--auto-provision` with `flash run` for development.
+
+```python
+config = LiveServerless(
+ name="always-warm",
+ workersMin=1, # Keep one worker always running
+ idleTimeout=30 # Longer idle timeout
+)
+```
+
+### Timeout errors
+
+If requests are timing out:
+
+- **Cause**: Execution taking longer than the timeout limit.
+- **Solutions**:
+ - Increase `executionTimeoutMs` in your configuration.
+ - Optimize your function to run faster.
+ - Break long operations into smaller chunks.
+
+```python
+config = LiveServerless(
+ name="long-running",
+ executionTimeoutMs=600000 # 10 minutes
+)
+```
+
+### Memory errors
+
+If you're seeing out-of-memory errors:
+
+- **Cause**: Model or data too large for available GPU/CPU memory.
+- **Solutions**:
+ - Use a larger GPU type (e.g., `GpuGroup.AMPERE_80` for 80GB VRAM).
+ - Use model quantization or smaller batch sizes.
+ - Clear GPU memory between operations.
+
+```python
+config = LiveServerless(
+ name="large-model",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ template=PodTemplate(containerDiskInGb=100) # More disk space
+)
+```
+
+### Dependency errors
+
+If packages aren't being installed correctly:
+
+- **Cause**: Missing or incompatible dependencies.
+- **Solutions**:
+ - Verify package names and versions in the `dependencies` list.
+ - Check that packages have Linux `x86_64` wheels available.
+ - Import packages inside the function, not at the top of the file.
+
+```python
+@remote(
+ resource_config=config,
+ dependencies=["torch==2.0.0", "transformers==4.36.0"] # Pin versions
+)
+def my_function(data):
+ import torch # Import inside the function
+ import transformers
+ # ...
+```
+
+### Authentication errors
+
+If you're seeing API key errors:
+
+- **Cause**: Missing or invalid Runpod API key.
+- **Solutions**:
+ - Verify your API key is set in the environment.
+ - Check that the `.env` file is in the correct directory.
+ - Ensure the API key has the required permissions.
+
+```bash
+# Check if API key is set
+echo $RUNPOD_API_KEY
+
+# Set API key directly
+export RUNPOD_API_KEY=your_api_key_here
+```
+
+## Performance optimization
+
+### Reducing cold starts
+
+- Set `workersMin=1` for endpoints that need fast responses.
+- Use `idleTimeout` to balance cost and warm worker availability.
+- Cache models on network volumes to reduce loading time.
+
+### Optimizing execution time
+
+- Profile your functions to identify bottlenecks.
+- Use appropriate GPU types for your workload.
+- Batch multiple inputs into a single request when possible.
+- Use async operations to parallelize independent tasks.
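+
+One way to reduce per-request overhead is to batch inputs client-side before calling a remote function. A minimal, hypothetical helper (not part of the Flash SDK):
+
+```python
+def make_batches(items, batch_size=8):
+    # Group items into fixed-size batches; the last batch may be smaller
+    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
+```
+
+Each batch can then be passed to a single `@remote` call instead of one call per item.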
+
+### Managing costs
+
+- Set appropriate `workersMax` limits to control scaling.
+- Use CPU workers for non-GPU tasks.
+- Monitor usage in the console to identify optimization opportunities.
+- Use shorter `idleTimeout` for sporadic workloads.
\ No newline at end of file
diff --git a/flash/overview.mdx b/flash/overview.mdx
new file mode 100644
index 00000000..106a00b8
--- /dev/null
+++ b/flash/overview.mdx
@@ -0,0 +1,260 @@
+---
+title: "Overview"
+sidebarTitle: "Overview"
+description: "Rapidly develop and deploy AI/ML apps with the Flash Python SDK."
+tag: "BETA"
+---
+
+import { ServerlessTooltip, PodsTooltip, WorkersTooltip } from "/snippets/tooltips.jsx";
+
+
+Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to provide feedback and get support.
+
+
+Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serverless](/serverless/overview). You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically.
+
+
+
+ Write a standalone Flash script for instant access to Runpod infrastructure.
+
+
+ Create a Flash app with a FastAPI server and deploy it on Runpod to serve production endpoints.
+
+
+
+## Why use Flash?
+
+Flash is the easiest and fastest way to develop and deploy AI/ML workloads on Runpod:
+
+- **No Docker images or manual resource management:** Unlike traditional Runpod Serverless (which requires you to build custom Docker images) or Pods (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators.
+- **Write [remote functions](#remote-functions) using local Python scripts:** Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically.
+- **Instant updates without rebuilds:** When you update your code, changes can be deployed to your workers instantly without requiring you to rebuild/redeploy the worker image—just run the script again.
+- **Granular hardware control:** Specify the [exact hardware](#resource-configuration) you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks.
+- **Production-ready architecture:** When you're ready to deploy your code to production, build a [Flash app](/flash/apps/overview) with a FastAPI server to route requests between GPU/CPU workers. The [Flash CLI](/flash/cli/overview) gives you full control over the app's development and deployment lifecycle.
+- **Pay only for compute time:** Flash uses the same per-second pricing model as [Runpod Serverless](/serverless/pricing). You're only charged for actual compute time—there are no costs when your code isn't running.
+
+## Install Flash
+
+Create a Python virtual environment and use `pip` to install Flash:
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install runpod-flash
+```
+
+
+Flash requires Python 3.10 or higher.
+
+
+In your project directory, create a `.env` file and add your [Runpod API key](/get-started/api-keys), replacing `YOUR_API_KEY` with your actual API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+
+Your Flash API key needs **All** access permissions to your Runpod account. You can generate an API key with the correct permissions from [Settings > API Keys](https://www.runpod.io/console/user/settings) in the Runpod console.
+
+
+
+## Core concepts
+
+### Remote functions
+
+The `@remote` decorator marks functions for execution on Runpod's infrastructure. Code inside the decorated function runs remotely on a Serverless worker, while code outside the function runs locally on your machine.
+
+```python
+@remote(resource_config=config, dependencies=["pandas"])
+def process_data(data):
+ # This code runs remotely on Runpod
+ import pandas as pd
+ df = pd.DataFrame(data)
+ return df.describe().to_dict()
+
+async def main():
+ # This code runs locally
+ result = await process_data(my_data)
+```
+
+When you run a remote function, Flash:
+- Automatically provisions resources on Runpod's infrastructure.
+- Installs your dependencies automatically.
+- Runs your function on a remote GPU/CPU.
+- Returns the result to your local environment.
+
+[Learn more about remote functions](/flash/remote-functions).
+
+### Resource configuration
+
+Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more.
+
+```python
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ workersMax=5
+)
+```
+
+[View the complete resource configuration reference](/flash/resource-configuration).
+
+### Dependency management
+
+Specify Python packages in the decorator, and Flash installs them automatically on the remote worker:
+
+```python
+@remote(
+ resource_config=gpu_config,
+ dependencies=["transformers==4.36.0", "torch", "pillow"]
+)
+def generate_image(prompt):
+ # Import inside the function
+ from transformers import pipeline
+ # ...
+```
+
+Imports should be placed inside the function body because they need to happen on the remote worker, not in your local environment.
+
+[Learn more about dependency management](/flash/remote-functions#dependency-management).
+
+### Parallel execution
+
+Run multiple remote functions concurrently using Python's async capabilities:
+
+```python
+results = await asyncio.gather(
+ process_item(item1),
+ process_item(item2),
+ process_item(item3)
+)
+```
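+
+Because `@remote` functions are awaitable, standard asyncio patterns apply. Here's a self-contained sketch using plain local coroutines as stand-ins for remote calls (no Runpod resources involved):
+
+```python
+import asyncio
+
+async def process_item(item):
+    # Stand-in for an awaitable @remote function call
+    await asyncio.sleep(0)
+    return item * 2
+
+async def main():
+    # Fan out three calls concurrently and collect the results in order
+    return await asyncio.gather(*(process_item(i) for i in range(3)))
+
+if __name__ == "__main__":
+    print(asyncio.run(main()))  # [0, 2, 4]
+```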
+
+## Development workflows
+
+Flash supports two primary workflows for running workloads on Runpod: standalone scripts and Flash apps.
+
+### Standalone scripts
+
+This is the fastest way to get started with Flash. Just write a Python script with `@remote` decorated functions and run it locally with `python script.py`.
+
+```python
+import asyncio
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+config = LiveServerless(
+ name="gpu-inference",
+ gpus=[GpuGroup.ADA_24],
+)
+
+@remote(resource_config=config, dependencies=["torch"])
+def process_on_gpu(data):
+ import torch
+ # Your GPU workload here
+ return {"result": "processed"}
+
+async def main():
+ result = await process_on_gpu({"input": "data"})
+ print(result)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Run the script locally, and Flash executes the `@remote` function on Runpod's infrastructure:
+
+```bash
+python script.py
+```
+
+**Use this approach for:**
+- Quick prototypes and experiments.
+- Batch processing jobs.
+- One-off data processing tasks.
+- Local development and testing.
+
+[Follow the quickstart](/flash/quickstart) to create your first Flash script.
+
+### Flash apps
+
+When you're ready to build a production-ready API, you can create a [Flash app](/flash/apps/overview) with FastAPI and deploy it to Runpod. Flash apps provide a complete development and deployment workflow with local testing and production deployment.
+
+To get started, initialize a new Flash app project in your current directory:
+
+```bash
+flash init
+```
+
+This creates a new project with a FastAPI server and example workers. Remote functions are defined in the `workers/` directory.
+
+Start a local development server to test your app:
+
+```bash
+flash run
+```
+
+Deploy your app to production when ready:
+
+```bash
+flash deploy
+```
+
+**Use this approach for:**
+
+- Production HTTP APIs.
+- Persistent endpoints.
+- Long-running services.
+- Team collaboration with staging/production environments.
+
+[Follow this tutorial](/flash/apps/build-app) to build your first Flash app.
+
+## Use cases
+
+Flash is well-suited for a range of AI and data processing workloads:
+
+- **Multi-modal AI pipelines**: Orchestrate unified workflows combining text, image, and audio models with GPU acceleration.
+- **Distributed model training**: Scale training operations across multiple GPU workers for faster model development.
+- **AI research experimentation**: Rapidly prototype and test complex model combinations without infrastructure overhead.
+- **Production inference systems**: Deploy multi-stage inference pipelines for real-world applications.
+- **Data processing workflows**: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks.
+- **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference.
+
+## Coding agent integration
+
+Flash provides a skill package for AI coding agents like Claude Code, Cline, and Cursor. The skill gives these agents detailed context about the Flash SDK, CLI, best practices, and common patterns.
+
+Install the Flash skill by running the following command in your terminal:
+
+```bash
+npx skills add runpod/skills
+```
+
+This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. You can find the Flash `SKILL.md` file in the [runpod/skills repository](https://github.com/runpod/skills/blob/main/flash/SKILL.md).
+
+## Limitations
+
+- Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter.
+- Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact [Runpod support](https://www.runpod.io/contact) to increase your account's capacity allocation if needed.
+
+## Next steps
+
+
+
+ Write your first standalone script with Flash
+
+
+ Create a FastAPI app with Flash
+
+
+ Complete reference for resource configuration
+
+
+ Learn about Flash CLI commands
+
+
+
+## Getting help
+
+Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion.
diff --git a/flash/pricing.mdx b/flash/pricing.mdx
new file mode 100644
index 00000000..28ca0df8
--- /dev/null
+++ b/flash/pricing.mdx
@@ -0,0 +1,109 @@
+---
+title: "Pricing"
+sidebarTitle: "Pricing"
+description: "Understand Flash pricing and optimize your costs."
+tag: "BETA"
+---
+
+Flash follows the same pricing model as [Runpod Serverless](/serverless/pricing). You pay per second of compute time, with no charges when your code isn't running. Pricing depends on the GPU or CPU type you configure for your endpoints.
+
+## How pricing works
+
+You're billed from when a worker starts until it completes your request, plus any idle time before scaling down. If a worker is already warm, you skip the cold start and only pay for execution time.
+
+### Compute cost breakdown
+
+Flash workers incur charges during these periods:
+
+1. **Start time**: The time required to initialize a worker and load models into GPU memory. This includes starting the container, installing dependencies, and preparing the runtime environment.
+2. **Execution time**: The time spent processing your request (running your `@remote` decorated function).
+3. **Idle time**: The period a worker remains active after completing a request, waiting for additional requests before scaling down.
+
+### Pricing by resource type
+
+Flash supports both GPU and CPU workers. Pricing varies based on the hardware type:
+
+- **GPU workers**: Use `LiveServerless` or `ServerlessEndpoint` with GPU configurations. Pricing depends on the GPU type (e.g., RTX 4090, A100 80GB).
+- **CPU workers**: Use `LiveServerless` or `CpuServerlessEndpoint` with CPU configurations. Pricing depends on the CPU instance type.
+
+See the [Serverless pricing page](/serverless/pricing) for current rates by GPU and CPU type.
+
+## How to estimate and optimize costs
+
+To estimate costs for your Flash workloads, consider:
+
+- How long each function takes to execute.
+- How many concurrent workers you need (`workersMax` setting).
+- Which GPU or CPU types you'll use.
+- Your idle timeout configuration (`idleTimeout` setting).
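+
+As a back-of-envelope model, these factors combine into a simple estimate. The rate below is a placeholder for illustration, not an actual Runpod price; check the [Serverless pricing page](/serverless/pricing) for current rates:
+
+```python
+RATE_PER_SECOND = 0.00076  # hypothetical GPU rate in USD per second
+
+def estimate_cost(requests, exec_seconds, idle_seconds=5):
+    """Rough cost: each request pays for execution time plus the
+    idle window before the worker scales down."""
+    billable_seconds = requests * (exec_seconds + idle_seconds)
+    return billable_seconds * RATE_PER_SECOND
+```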
+
+### Cost optimization strategies
+
+#### Choose appropriate hardware
+
+Select the smallest GPU or CPU that meets your performance requirements. For example, if your workload fits in 24GB of VRAM, use `GpuGroup.ADA_24` or `GpuGroup.AMPERE_24` instead of larger GPUs.
+
+```python
+# Cost-effective configuration for workloads that fit in 24GB VRAM
+config = LiveServerless(
+ name="cost-optimized",
+ gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24], # RTX 4090, L4, A5000, 3090
+)
+```
+
+#### Configure idle timeouts
+
+Balance responsiveness and cost by adjusting the `idleTimeout` parameter. Shorter timeouts reduce idle costs but increase cold starts for sporadic traffic.
+
+```python
+# Lower idle timeout for cost savings (more cold starts)
+config = LiveServerless(
+ name="low-idle",
+ idleTimeout=5, # 5 seconds (default)
+)
+
+# Higher idle timeout for responsiveness (higher idle costs)
+config = LiveServerless(
+ name="responsive",
+ idleTimeout=30, # 30 seconds
+)
+```
+
+#### Use CPU workers for non-GPU tasks
+
+For data preprocessing, postprocessing, or other tasks that don't require GPU acceleration, use CPU workers instead of GPU workers.
+
+```python
+from runpod_flash import LiveServerless, CpuInstanceType
+
+# CPU configuration for non-GPU tasks
+cpu_config = LiveServerless(
+ name="data-processor",
+ instanceIds=[CpuInstanceType.CPU5C_2_4], # 2 vCPU, 4GB RAM
+)
+```
+
+#### Limit maximum workers
+
+Set `workersMax` to prevent runaway scaling and unexpected costs:
+
+```python
+config = LiveServerless(
+ name="controlled-scaling",
+ workersMax=3, # Limit to 3 concurrent workers
+)
+```
+
+### Monitoring costs
+
+Monitor your usage in the [Runpod console](https://www.runpod.io/console/serverless) to track:
+
+- Total compute time across endpoints.
+- Worker utilization and idle time.
+- Cost breakdown by endpoint.
+
+## Next steps
+
+- [Create remote functions](/flash/remote-functions) with optimized resource configurations.
+- [View Serverless pricing details](/serverless/pricing) for current rates.
+- [Configure resources](/flash/resource-configuration) for your workloads.
diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx
new file mode 100644
index 00000000..60770d89
--- /dev/null
+++ b/flash/quickstart.mdx
@@ -0,0 +1,358 @@
+---
+title: "Get started with Flash"
+sidebarTitle: "Quickstart"
+description: "Run your first GPU workload with Flash in less than 5 minutes."
+tag: "BETA"
+---
+
+This quickstart gets you running GPU workloads on Runpod in minutes. You'll execute a function on a remote GPU and see the results immediately.
+
+## Requirements
+
+- [Runpod account](/get-started/manage-accounts).
+- [An API key](/get-started/api-keys) with **All** access permissions to your Runpod account.
+- [Python 3.10+](https://www.python.org/downloads/) installed.
+
+## Step 1: Install Flash
+
+Create a virtual environment and install Flash:
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install runpod-flash
+```
+
+## Step 2: Add your API key to your environment
+
+Create a `.env` file with your Runpod API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+Replace `YOUR_API_KEY` with your actual API key from the [Runpod console](https://www.runpod.io/console/user/settings).
+
+
+Your API key needs **All** access permissions to your Runpod account.
+
+
+## Step 3: Copy this code
+
+Create a file called `gpu_demo.py` and paste this code into it:
+
+```python
+import asyncio
+from dotenv import load_dotenv
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+# Load API key from .env
+load_dotenv()
+
+# Configure GPU resources
+config = LiveServerless(
+ name="flash-quickstart",
+ gpus=[GpuGroup.ADA_24], # RTX 4090
+ workersMax=3
+)
+
+# Define a function that runs on GPU
+@remote(resource_config=config, dependencies=["numpy", "torch"])
+def gpu_matrix_multiply(size):
+ # IMPORTANT: Import packages INSIDE the function
+ import numpy as np
+ import torch
+
+ # Get GPU name
+ device_name = torch.cuda.get_device_name(0)
+
+ # Create random matrices
+ A = np.random.rand(size, size)
+ B = np.random.rand(size, size)
+
+ # Multiply matrices
+ C = np.dot(A, B)
+
+ return {
+ "matrix_size": size,
+ "result_mean": float(np.mean(C)),
+ "gpu": device_name
+ }
+
+# Call the function
+async def main():
+ print("Running matrix multiplication on Runpod GPU...")
+ result = await gpu_matrix_multiply(1000)
+
+ print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
+ print(f"✓ Result mean: {result['result_mean']:.4f}")
+ print(f"✓ GPU used: {result['gpu']}")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+## Step 4: Run it
+
+Execute the script:
+
+```bash
+python gpu_demo.py
+```
+
+You'll see Flash provision a GPU worker and execute your function:
+
+```text
+Running matrix multiplication on Runpod GPU...
+Creating endpoint: server_LiveServerless_a1b2c3d4
+Provisioning Serverless endpoint...
+Endpoint ready
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Job completed, output received
+
+✓ Matrix size: 1000x1000
+✓ Result mean: 249.8286
+✓ GPU used: NVIDIA GeForce RTX 4090
+```
+
+The first run takes 30-60 seconds while Runpod provisions the endpoint, installs dependencies, and starts a worker. Subsequent runs take only 2-3 seconds, because the worker is already running.
+
+
+Try running the script again immediately and notice how much faster it is. Flash reuses the same endpoint and cached dependencies. You can even update the code and run it again to see the changes take effect instantly.
+
+
+## Step 5: Understand what you just did
+
+Let's break down the code you just ran:
+
+### Imports and setup
+
+```python
+import asyncio
+from dotenv import load_dotenv
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+load_dotenv()
+```
+
+- **`asyncio`**: Enables asynchronous execution (Flash functions run async).
+- **`load_dotenv()`**: Loads your `RUNPOD_API_KEY` from the `.env` file for authentication.
+- **`remote`, `LiveServerless`, `GpuGroup`**: Core Flash components.
+
+### Resource configuration
+
+```python
+config = LiveServerless(
+ name="flash-quickstart",
+ gpus=[GpuGroup.ADA_24],
+ workersMax=3
+)
+```
+
+This tells Flash what hardware to use:
+
+- **`name`**: Identifies your endpoint in the [Runpod console](https://www.runpod.io/console/serverless).
+- **`gpus`**: Which GPU types to use (here: RTX 4090 with 24GB VRAM).
+- **`workersMax`**: Maximum parallel workers (allows 3 concurrent executions).
+
+See [GPU types](/references/gpu-types#gpu-pools) for all available GPUs or [resource configuration](/flash/resource-configuration) for all options.
+
+### Remote function
+
+```python
+@remote(resource_config=config, dependencies=["numpy", "torch"])
+def gpu_matrix_multiply(size):
+ import numpy as np
+ import torch
+
+ # Get GPU name
+ device_name = torch.cuda.get_device_name(0)
+
+ # Create random matrices
+ A = np.random.rand(size, size)
+ B = np.random.rand(size, size)
+
+ # Multiply matrices
+ C = np.dot(A, B)
+
+ return {
+ "matrix_size": size,
+ "result_mean": float(np.mean(C)),
+ "gpu": device_name
+ }
+```
+
+The `@remote` decorator marks the function to run on Runpod's infrastructure:
+
+- **`resource_config=config`**: Uses the GPU configuration you defined.
+- **`dependencies=["numpy", "torch"]`**: Packages to install on the worker.
+- **Function body**: The matrix multiplication code runs on the remote GPU, not your local machine.
+- **Return value**: The result is returned to your local machine as a Python dictionary.
+
+
+You must import packages **inside the function body**, not at the top of your file. These imports need to happen on the remote worker.
+
+
+### Calling the function
+
+```python
+async def main():
+ print("Running matrix multiplication on Runpod GPU...")
+ result = await gpu_matrix_multiply(1000)
+
+ print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
+ print(f"✓ Result mean: {result['result_mean']:.4f}")
+ print(f"✓ GPU used: {result['gpu']}")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Here's what happens when you run the `@remote` decorated function:
+
+1. Flash checks if the endpoint specified in your `LiveServerless` configuration already exists.
+ - If yes: It updates the endpoint if the configuration has changed.
+ - If no: It creates a new endpoint, initializes a worker, and installs your dependencies.
+2. Flash sends your code to the GPU worker.
+3. The GPU worker executes the function with the provided inputs.
+4. The result is returned to your local machine as a Python dictionary, where it's printed in your terminal.
+
+Everything outside the `@remote` function (all the `print` statements, etc.) runs **locally on your machine**. Only the decorated function runs remotely.
+
+## Step 6: Run multiple operations in parallel
+
+Flash makes it easy to run multiple GPU operations concurrently. Replace your `main()` function with the code below:
+
+```python
+async def main():
+ print("Running 3 matrix operations in parallel...")
+
+ # Run all three operations at once
+ results = await asyncio.gather(
+ gpu_matrix_multiply(500),
+ gpu_matrix_multiply(1000),
+ gpu_matrix_multiply(2000)
+ )
+
+ # Print results
+ for i, result in enumerate(results, 1):
+ print(f"\n{i}. Size: {result['matrix_size']}x{result['matrix_size']}")
+ print(f" Mean: {result['result_mean']:.4f}")
+ print(f" GPU: {result['gpu']}")
+```
+
+Run the script again:
+
+```bash
+python gpu_demo.py
+```
+
+All three operations execute simultaneously:
+
+```text
+Running 3 matrix operations in parallel...
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Job completed, output received
+Job completed, output received
+Job completed, output received
+
+1. Size: 500x500
+ Mean: 125.3097
+ GPU: NVIDIA GeForce RTX 4090
+
+2. Size: 1000x1000
+ Mean: 249.9442
+ GPU: NVIDIA GeForce RTX 4090
+
+3. Size: 2000x2000
+ Mean: 500.1321
+ GPU: NVIDIA GeForce RTX 4090
+```
+
+## Clean up
+
+When you're done testing, clean up the endpoints:
+
+```bash
+# List all endpoints
+flash undeploy list
+
+# Remove the quickstart endpoint
+flash undeploy flash-quickstart
+
+# Or remove all endpoints
+flash undeploy --all
+```
+
+## Next steps
+
+You've successfully run GPU code on Runpod! Now you're ready to learn more about Flash:
+
+
+
+ Use Hugging Face transformers to generate text with GPT-2
+
+
+ Learn how to configure and optimize remote functions
+
+
+ Choose GPUs, adjust workers, set timeouts
+
+
+ Deploy production APIs with FastAPI
+
+
+
+## Troubleshooting
+
+### Authentication error
+
+```text
+Error: API key is not set
+```
+
+**Solution**: Make sure your `.env` file is in the same directory as your Python script and contains `RUNPOD_API_KEY=your_key`.
+
+You can also export the API key in your terminal as a workaround:
+
+```bash
+export RUNPOD_API_KEY=your_key
+```
+
+### Job stuck in queue
+
+```text
+Initial job status: IN_QUEUE
+[Stays in queue for >60 seconds]
+```
+
+**Solution**: No GPUs in your selected pools are currently available. Add more GPU types as fallbacks:
+
+```python
+config = LiveServerless(
+ name="flash-quickstart",
+ gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24, GpuGroup.AMPERE_48]
+)
+```
+
+Or check [GPU availability](https://www.runpod.io/console/serverless) in the console.
+
+### Import errors
+
+```text
+ModuleNotFoundError: No module named 'numpy'
+```
+
+**Solution**: Move imports inside the `@remote` function:
+
+```python
+@remote(resource_config=config, dependencies=["numpy"])
+def my_function():
+ import numpy as np # Import here, not at top of file
+ # ...
+```
+
+See the [execution model](/flash/execution-model#common-execution-issues) for more troubleshooting.
diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx
new file mode 100644
index 00000000..3a5da808
--- /dev/null
+++ b/flash/remote-functions.mdx
@@ -0,0 +1,273 @@
+---
+title: "Create remote functions"
+sidebarTitle: "Create remote functions"
+description: "Learn how to create and configure remote functions with Flash."
+tag: "BETA"
+---
+
+Remote functions are the core building blocks of Flash. The `@remote` decorator marks Python functions for execution on Runpod's Serverless infrastructure, handling resource provisioning, dependency installation, and data transfer automatically.
+
+## How remote functions work
+
+A remote function is just a Python function that's been marked with the `@remote` decorator. For example:
+
+```python
+@remote(resource_config=config, dependencies=["torch"])
+def run_inference(data):
+ import torch
+ # Your inference code here
+ return result
+```
+
+When you call a remote function from a local Python script or [Flash app](/flash/apps/overview), the function code is sent to a Runpod worker. The worker executes the function code and returns the result to your local environment.
+
+## Resource configuration
+
+Every remote function requires a resource configuration that specifies the compute resources to use.
+
+`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.
+
+### GPU configuration
+
+For GPU workloads, create a `LiveServerless` configuration and specify the [GPU pool(s)](/references/gpu-types#gpu-pools) that your workers will use with the `gpus` parameter.
+
+```python
+from runpod_flash import LiveServerless, GpuGroup
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ workersMax=5,
+ idleTimeout=10
+)
+
+@remote(resource_config=gpu_config, dependencies=["torch"])
+def run_inference(data):
+ import torch
+ # Your inference code here
+ return result
+```
+
+Here are the common configuration options for `LiveServerless`:
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `name` | Name for your endpoint (required) | - |
+| `gpus` | [GPU pool IDs](/references/gpu-types#gpu-pools) that can be used by workers | `[GpuGroup.ANY]` |
+| `workersMax` | Maximum number of workers | 3 |
+| `workersMin` | Minimum number of workers | 0 |
+| `idleTimeout` | Minutes before scaling down | 5 |
+
+See the [resource configuration reference](/flash/resource-configuration) for all available options.
+
+
+Learn how to view, update, and delete your endpoints in the [managing endpoints guide](/flash/managing-endpoints).
+
+
+### CPU configuration
+
+For CPU workloads, specify `instanceIds` instead of `gpus`:
+
+```python
+from runpod_flash import LiveServerless, CpuInstanceType
+
+cpu_config = LiveServerless(
+ name="data-processor",
+ instanceIds=[CpuInstanceType.CPU5C_4_8], # 4 vCPU, 8GB RAM
+ workersMax=3
+)
+
+@remote(resource_config=cpu_config, dependencies=["pandas"])
+def process_data(data):
+ import pandas as pd
+ df = pd.DataFrame(data)
+ return df.describe().to_dict()
+```
+
+### Custom Docker images
+
+For specialized environments that require pre-built Docker images—such as vLLM, TensorRT, or images with custom system dependencies—you'll need to use the `ServerlessEndpoint` configuration.
+
+See [Custom Docker images](/flash/custom-docker-images) for details.
+
+
+## Dependency management
+
+Specify Python packages in the `dependencies` parameter of the `@remote` decorator. Flash installs these packages on the remote worker before executing your function.
+
+```python
+@remote(
+ resource_config=config,
+ dependencies=["transformers==4.36.0", "torch", "pillow"]
+)
+def generate_image(prompt):
+ from transformers import pipeline
+ import torch
+ from PIL import Image
+ # Your code here
+```
+
+
+Some packages (like PyTorch) are pre-installed on GPU workers, but including them in dependencies ensures the correct version is available.
+
+
+
+### Import packages inside the function body
+
+You must import packages **inside the decorated function body**, not at the top of your file. This ensures the imports happen on the remote worker, not in your local environment.
+
+
+**Correct:** imports inside the function.
+```python
+@remote(resource_config=config, dependencies=["numpy"])
+def compute(data):
+ import numpy as np # Import here
+ return np.sum(data)
+```
+**Incorrect:** imports at top of file won't work.
+
+```python
+import numpy as np # This import happens locally, not on the worker
+
+@remote(resource_config=config, dependencies=["numpy"])
+def compute(data):
+ return np.sum(data) # numpy not available on the remote worker
+```
+
+### Version pinning
+
+You can pin specific versions using standard pip syntax:
+
+```python
+dependencies=["transformers==4.36.0", "torch>=2.0.0"]
+```
+
+## Parallel execution
+
+Flash functions are asynchronous by default. Use Python's `asyncio` to run multiple functions in parallel:
+
+```python
+import asyncio
+
+async def main():
+ # Run three functions in parallel
+ results = await asyncio.gather(
+ process_item(item1),
+ process_item(item2),
+ process_item(item3)
+ )
+ return results
+```
+
+This is particularly useful for:
+
+- Batch processing multiple inputs.
+- Running different models on the same data.
+- Parallelizing independent pipeline stages.
+
+### Example: Parallel batch processing
+
+```python
+import asyncio
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+config = LiveServerless(
+ name="batch-processor",
+ gpus=[GpuGroup.ADA_24],
+ workersMax=5 # Allow up to 5 parallel workers
+)
+
+@remote(resource_config=config, dependencies=["torch"])
+def process_batch(batch_id, data):
+ import torch
+ # Process batch
+ return {"batch_id": batch_id, "result": len(data)}
+
+async def main():
+ batches = [
+ (1, [1, 2, 3]),
+ (2, [4, 5, 6]),
+ (3, [7, 8, 9])
+ ]
+
+ # Process all batches in parallel
+ results = await asyncio.gather(*[
+ process_batch(batch_id, data)
+ for batch_id, data in batches
+ ])
+
+ print(results)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+## Using persistent storage
+
+Attach [network volumes](/storage/network-volumes) for persistent storage across workers and endpoints. This is useful for sharing large models or datasets between workers without downloading them each time.
+
+```python
+from runpod_flash import LiveServerless, PodTemplate
+
+config = LiveServerless(
+ name="model-server",
+ networkVolumeId="vol_abc123", # Your network volume ID
+ template=PodTemplate(containerDiskInGb=100)
+)
+```
+
+To find your network volume ID:
+
+1. Go to the [Storage page](https://www.runpod.io/console/storage) in the Runpod console.
+2. Click on your network volume.
+3. Copy the volume ID from the URL or volume details.
+
+### Example: Using a network volume for model storage
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, PodTemplate
+
+config = LiveServerless(
+ name="model-inference",
+ gpus=[GpuGroup.AMPERE_80],
+ networkVolumeId="vol_abc123",
+ template=PodTemplate(containerDiskInGb=100)
+)
+
+@remote(resource_config=config, dependencies=["torch", "transformers"])
+def run_inference(prompt):
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model from network volume
+ model_path = "/runpod-volume/models/llama-7b"
+ model = AutoModelForCausalLM.from_pretrained(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Run inference
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs)
+ return tokenizer.decode(outputs[0])
+```
+
+## Environment variables
+
+Pass environment variables to remote functions using the `env` parameter:
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"}
+)
+```
+
+
+
+Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection.
+
+
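+Inside the function body, these variables are read with ordinary environment lookups via `os.environ`. A minimal sketch (the `MODEL_ID` variable name and the fallback value are illustrative, matching the example above):
+
+```python
+import os
+
+def resolve_model_id():
+ # On the worker, MODEL_ID is populated from the endpoint's `env` configuration;
+ # fall back to a default when it isn't set (e.g. during local testing).
+ return os.environ.get("MODEL_ID", "gpt2")
+```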
+
+
+## Next steps
+
+- [Create API endpoints](/flash/apps/build-app) using FastAPI.
+- [Deploy Flash applications](/flash/apps/deploy-apps) for production.
+- [View the resource configuration reference](/flash/resource-configuration) for all available options.
+- [Clean up development endpoints](/flash/cli/undeploy) when you're done testing.
diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx
new file mode 100644
index 00000000..bd773ad9
--- /dev/null
+++ b/flash/resource-configuration.mdx
@@ -0,0 +1,324 @@
+---
+title: "Resource configuration reference"
+sidebarTitle: "Configuration reference"
+description: "A complete reference for Flash GPU/CPU resource configuration options."
+tag: "BETA"
+---
+
+Flash provides several resource configuration classes for different use cases. This reference covers all available parameters and options.
+
+## LiveServerless
+
+`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80],
+ workersMax=5,
+ idleTimeout=10,
+ template=PodTemplate(containerDiskInGb=100)
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `name` | `string` | Name for your endpoint (required) | - |
+| `gpus` | `list[GpuGroup]` | GPU pool IDs that can be used by workers | `[GpuGroup.ANY]` |
+| `gpuCount` | `int` | Number of GPUs per worker | 1 |
+| `instanceIds` | `list[CpuInstanceType]` | CPU instance types (forces CPU endpoint) | `None` |
+| `workersMin` | `int` | Minimum number of workers | 0 |
+| `workersMax` | `int` | Maximum number of workers | 3 |
+| `idleTimeout` | `int` | Minutes before scaling down | 5 |
+| `env` | `dict` | Environment variables | `None` |
+| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
+| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |
+| `scalerType` | `string` | Scaling strategy | `QUEUE_DELAY` |
+| `scalerValue` | `int` | Scaling parameter value | 4 |
+| `locations` | `string` | Preferred datacenter locations | `None` |
+| `template` | `PodTemplate` | Pod template overrides | `None` |
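+
+The timeout and scaling parameters are easy to overlook. Here is a hedged sketch combining them (the values shown are illustrative, not recommendations):
+
+```python
+from runpod_flash import LiveServerless
+
+config = LiveServerless(
+ name="bounded-worker",
+ executionTimeoutMs=600000, # cancel any job that runs longer than 10 minutes
+ scalerType="QUEUE_DELAY", # default strategy: scale based on queue wait time
+ scalerValue=4, # default value for the strategy
+)
+```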
+
+### GPU configuration example
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, PodTemplate
+
+config = LiveServerless(
+ name="gpu-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ gpuCount=1,
+ workersMin=0,
+ workersMax=5,
+ idleTimeout=10,
+ template=PodTemplate(containerDiskInGb=100),
+ env={"MODEL_ID": "llama-7b"}
+)
+```
+
+### Handling GPU unavailability issues
+
+If no GPUs in your list are available, you'll see:
+
+```text
+Initial job status: IN_QUEUE
+```
+
+The job will stay in the queue until capacity becomes available.
+
+**Solutions**:
+1. **Check console**: View [GPU availability](https://www.runpod.io/console/serverless) in the Runpod console.
+2. **Add more GPU types**: Expand your `gpus` list to include more fallback options.
+3. **Use GpuGroup.ANY**: Switch to `[GpuGroup.ANY]` for maximum availability.
+4. **Contact support**: If capacity is consistently unavailable, contact [Runpod support](https://www.runpod.io/contact) to discuss increasing your account limits.
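+
+Solutions 2 and 3 can be combined by listing several pools, with `GpuGroup.ANY` as a catch-all (a sketch; the specific pools are illustrative):
+
+```python
+from runpod_flash import LiveServerless, GpuGroup
+
+config = LiveServerless(
+ name="gpu-inference",
+ # Multiple pools give the scheduler fallback options when one is full
+ gpus=[GpuGroup.AMPERE_80, GpuGroup.ADA_80_PRO, GpuGroup.ANY],
+)
+```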
+
+### CPU configuration example
+
+```python
+from runpod_flash import LiveServerless, CpuInstanceType
+
+config = LiveServerless(
+ name="cpu-processor",
+ instanceIds=[CpuInstanceType.CPU5C_4_8], # 4 vCPU, 8GB RAM
+ workersMax=3,
+ idleTimeout=5
+)
+```
+
+## ServerlessEndpoint
+
+`ServerlessEndpoint` is for GPU workloads that require custom Docker images.
+
+These resources work similarly to [traditional Serverless endpoints](/serverless/overview). Before you can run your function, you'll need to:
+- Write a [handler function](/serverless/workers/handler-functions) that processes the input dictionary.
+- [Create a Dockerfile](/serverless/workers/create-dockerfile) that packages your handler function and its dependencies.
+- [Push the image to a container registry](/serverless/workers/deploy).
+
+You'll then add the image name to your resource configuration:
+
+```python highlight="5"
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+config = ServerlessEndpoint(
+ name="custom-ml-env",
+ imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
+ gpus=[GpuGroup.AMPERE_80]
+)
+```
+
+
+### Request structure
+
+When you make requests to the endpoint, you'll need to provide the input as a dictionary in the form of `{"input": {...}}`. For example:
+
+```json
+{
+ "input": {
+ "prompt": "Hello, world!"
+ }
+}
+```
+
+### Parameters
+
+All parameters from `LiveServerless` are available, plus:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `imageName` | `string` | Custom Docker image | - |
+
+### Limitations
+
+- Only supports dictionary payloads in the form of `{"input": {...}}`.
+- Cannot execute arbitrary Python functions remotely.
+- Requires a custom Docker image with a [handler function](/serverless/workers/handler-functions) that processes the input dictionary.
+
+### Example
+
+```python
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+# Custom image with pre-installed models
+config = ServerlessEndpoint(
+ name="stable-diffusion",
+ imageName="my-registry/stable-diffusion:v1.0",
+ gpus=[GpuGroup.AMPERE_24],
+ workersMax=3
+)
+
+# Send requests as dictionaries
+result = await config.run({
+ "input": {
+ "prompt": "A beautiful sunset over mountains",
+ "width": 512,
+ "height": 512
+ }
+})
+```
+
+## CpuServerlessEndpoint
+
+`CpuServerlessEndpoint` is for CPU workloads that require custom Docker images. Like `ServerlessEndpoint`, it only supports dictionary payloads.
+
+```python
+from runpod_flash import CpuServerlessEndpoint, CpuInstanceType
+
+config = CpuServerlessEndpoint(
+ name="data-processor",
+ imageName="python:3.11-slim",
+ instanceIds=[CpuInstanceType.CPU5C_4_8]
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `name` | `string` | Name for your endpoint (required) | - |
+| `imageName` | `string` | Custom Docker image | - |
+| `instanceIds` | `list[CpuInstanceType]` | CPU instance types | - |
+| `workersMin` | `int` | Minimum number of workers | 0 |
+| `workersMax` | `int` | Maximum number of workers | 3 |
+| `idleTimeout` | `int` | Minutes before scaling down | 5 |
+| `env` | `dict` | Environment variables | `None` |
+| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
+| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |
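+
+Requests follow the same dictionary shape as `ServerlessEndpoint`. A sketch, assuming the `run` method shown in the `ServerlessEndpoint` example applies here too and that your image's handler accepts this payload:
+
+```python
+import asyncio
+
+async def main():
+ # Same request structure: {"input": {...}}
+ result = await config.run({"input": {"rows": [1, 2, 3]}})
+ print(result)
+
+asyncio.run(main())
+```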
+
+## Resource class comparison
+
+| Feature | LiveServerless | ServerlessEndpoint | CpuServerlessEndpoint |
+|---------|----------------|--------------------|-----------------------|
+| Remote code execution | ✅ Full Python function execution | ❌ Dictionary payload only | ❌ Dictionary payload only |
+| Custom Docker images | ❌ Fixed optimized images | ✅ Any Docker image | ✅ Any Docker image |
+| Use case | Dynamic remote functions | Traditional API endpoints | Traditional CPU endpoints |
+| Function returns | Any Python object | Dictionary only | Dictionary only |
+| `@remote` decorator | Full functionality | Limited to payload passing | Limited to payload passing |
+
+## Available GPU types
+
+The `GpuGroup` enum provides access to GPU pools. Each pool contains specific GPU models grouped by architecture and VRAM.
+
+| GpuGroup | GPUs Included | VRAM | Best For |
+|----------|---------------|------|----------|
+| `GpuGroup.ANY` | Any available GPU | Varies | Fast provisioning, prototyping, development |
+| `GpuGroup.AMPERE_16` | RTX A4000 | 16GB | Small models, basic inference |
+| `GpuGroup.AMPERE_24` | L4, A5000, RTX 3090 | 24GB | General ML workloads, mid-size models |
+| `GpuGroup.ADA_24` | RTX 4090 | 24GB | Cost-effective inference, gaming GPUs |
+| `GpuGroup.AMPERE_48` | A40, RTX A6000 | 48GB | Large models, fine-tuning |
+| `GpuGroup.ADA_48_PRO` | L40, L40S | 48GB | Professional inference, large models |
+| `GpuGroup.AMPERE_80` | A100 80GB | 80GB | XL models, intensive training |
+| `GpuGroup.ADA_80_PRO` | H100 | 80GB | Cutting-edge inference, latest architecture |
+| `GpuGroup.HOPPER_141` | H200 | 141GB | Largest models, maximum VRAM |
+
+### Pool naming conventions
+
+GPU pool names follow the pattern `{ARCHITECTURE}_{VRAM}`, with an optional `_{TIER}` suffix:
+
+- **AMPERE**: NVIDIA Ampere architecture (A-series, RTX 30-series)
+- **ADA**: NVIDIA Ada Lovelace architecture (RTX 40-series, L40)
+- **HOPPER**: NVIDIA Hopper architecture (H-series)
+- **VRAM number**: Memory capacity in GB (16, 24, 48, 80, 141)
+- **PRO suffix**: Professional/datacenter GPUs (L40, H100, H200)
+
+**Examples**:
+- `AMPERE_80` = Ampere architecture with 80GB VRAM (A100)
+- `ADA_24` = Ada Lovelace with 24GB VRAM (RTX 4090)
+- `ADA_48_PRO` = Professional Ada GPUs with 48GB (L40/L40S)
+
+See the [complete GPU types reference](/references/gpu-types#gpu-pools) for detailed specifications and availability.
+
+## Available CPU instance types
+
+The `CpuInstanceType` enum provides access to CPU configurations:
+
+### 3rd generation general purpose
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU3G_1_4` | cpu3g-1-4 | 1 | 4GB |
+| `CPU3G_2_8` | cpu3g-2-8 | 2 | 8GB |
+| `CPU3G_4_16` | cpu3g-4-16 | 4 | 16GB |
+| `CPU3G_8_32` | cpu3g-8-32 | 8 | 32GB |
+
+### 3rd generation compute-optimized
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU3C_1_2` | cpu3c-1-2 | 1 | 2GB |
+| `CPU3C_2_4` | cpu3c-2-4 | 2 | 4GB |
+| `CPU3C_4_8` | cpu3c-4-8 | 4 | 8GB |
+| `CPU3C_8_16` | cpu3c-8-16 | 8 | 16GB |
+
+### 5th generation compute-optimized
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU5C_1_2` | cpu5c-1-2 | 1 | 2GB |
+| `CPU5C_2_4` | cpu5c-2-4 | 2 | 4GB |
+| `CPU5C_4_8` | cpu5c-4-8 | 4 | 8GB |
+| `CPU5C_8_16` | cpu5c-8-16 | 8 | 16GB |
+
+## PodTemplate
+
+Use `PodTemplate` to configure additional pod settings:
+
+```python
+from runpod_flash import LiveServerless, PodTemplate
+
+config = LiveServerless(
+ name="custom-template",
+ template=PodTemplate(
+ containerDiskInGb=100,
+ env=[{"key": "PYTHONPATH", "value": "/workspace"}]
+ )
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `containerDiskInGb` | `int` | Container disk size in GB | 20 |
+| `env` | `list[dict]` | Environment variables as key-value pairs | `None` |
+
+## Environment variables
+
+Environment variables can be set in two ways:
+
+### Using the `env` parameter
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"}
+)
+```
+
+### Using PodTemplate
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ template=PodTemplate(
+ env=[
+ {"key": "HF_TOKEN", "value": "your_token"},
+ {"key": "MODEL_ID", "value": "gpt2"}
+ ]
+ )
+)
+```
+
+
+
+Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection. Only structural changes (like GPU type, image, or template modifications) trigger endpoint updates.
+
+
+
+## Next steps
+
+- [Create remote functions](/flash/remote-functions) using these configurations.
+- [Deploy Flash applications](/flash/apps/deploy-apps) for production.
+- [Learn about pricing](/flash/pricing) to optimize costs.
diff --git a/images/flash_sdxl_output.png b/images/flash_sdxl_output.png
new file mode 100644
index 00000000..07dbbe29
Binary files /dev/null and b/images/flash_sdxl_output.png differ
diff --git a/snippets/tooltips.jsx b/snippets/tooltips.jsx
index 1751ca50..ace6967f 100644
--- a/snippets/tooltips.jsx
+++ b/snippets/tooltips.jsx
@@ -83,7 +83,7 @@ export const WorkerTooltip = () => {
export const WorkersTooltip = () => {
return (
- worker
+ workers
);
};
diff --git a/tutorials/flash/image-generation-with-sdxl.mdx b/tutorials/flash/image-generation-with-sdxl.mdx
new file mode 100644
index 00000000..5cb122a6
--- /dev/null
+++ b/tutorials/flash/image-generation-with-sdxl.mdx
@@ -0,0 +1,657 @@
+---
+title: "Generate images with Flash and SDXL"
+sidebarTitle: "Generate images with Flash + SDXL"
+description: "Learn how to use Flash with Stable Diffusion XL to generate high-quality images from text prompts."
+---
+
+This tutorial shows you how to build an image generation application using Flash and Stable Diffusion XL (SDXL). You'll learn how to load a pretrained diffusion model on a GPU worker and generate images from text prompts.
+
+
+
+
+
+## What you'll learn
+
+In this tutorial you'll learn how to:
+
+- Use the Hugging Face diffusers library with Flash.
+- Load and run Stable Diffusion XL models on GPU workers.
+- Generate high-quality images from text prompts.
+- Save generated images to disk.
+- Configure generation parameters like guidance scale and steps.
+
+## Requirements
+
+- You've [created a Runpod account](/get-started/manage-accounts).
+- You've [created a Runpod API key](/get-started/api-keys).
+- You've installed [Python 3.10 or higher](https://www.python.org/downloads/).
+- You've completed the [Flash quickstart](/flash/quickstart) or are familiar with Flash basics.
+
+## What you'll build
+
+By the end of this tutorial, you'll have a working image generation application that:
+
+- Accepts text prompts as input.
+- Generates photorealistic images using Stable Diffusion XL.
+- Runs entirely on Runpod's GPU infrastructure.
+- Saves generated images to your local machine.
+
+## Step 1: Set up your project
+
+Create a new directory for your project and set up a Python virtual environment:
+
+```bash
+mkdir flash-image-generation
+cd flash-image-generation
+python3 -m venv venv
+source venv/bin/activate
+```
+
+Install Flash:
+
+```bash
+pip install runpod-flash
+```
+
+Create a `.env` file with your Runpod API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+Replace `YOUR_API_KEY` with your actual API key from the [Runpod console](https://www.runpod.io/console/user/settings).
+
+## Step 2: Understand Stable Diffusion XL
+
+[Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is a state-of-the-art text-to-image model from Stability AI. It offers:
+
+- **High-quality images**: Generates photorealistic 1024x1024 images
+- **Better prompt understanding**: Improved text comprehension compared to SD 1.5
+- **Fine details**: Enhanced rendering of hands, faces, and text
+- **Open source**: Available for free on Hugging Face
+
+SDXL requires significant GPU resources:
+- **Model size**: ~7GB of weights
+- **VRAM requirement**: Minimum 16GB (24GB recommended)
+- **Generation time**: 20-40 seconds per image on RTX 4090
+
+We'll use the [diffusers](https://huggingface.co/docs/diffusers/index) library from Hugging Face, which provides a clean Python API for Stable Diffusion models.
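+
+The VRAM and download figures above follow from simple parameter arithmetic. As a rough sketch (the ~3.5B figure is an approximation for the base model's UNet, text encoders, and VAE combined):
+
+```python
+# Back-of-the-envelope size estimate for SDXL weights.
+params = 3.5e9       # approximate total parameter count (assumption)
+bytes_per_param = 2  # float16 stores 2 bytes per parameter
+weights_gb = params * bytes_per_param / 1e9
+
+print(f"~{weights_gb:.0f} GB of fp16 weights")        # ~7 GB
+print(f"~{weights_gb * 2:.0f} GB at fp32 precision")  # ~14 GB
+```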
+
+## Step 3: Create your project file
+
+Create a new file called `image_generation.py`:
+
+```bash
+touch image_generation.py
+```
+
+Open this file in your code editor. The following steps walk through building the image generation application.
+
+## Step 4: Add imports and configuration
+
+Add the necessary imports and Flash configuration:
+
+```python
+import asyncio
+import base64
+from pathlib import Path
+from dotenv import load_dotenv
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+# Load environment variables from .env file
+load_dotenv()
+
+# Configuration for GPU workers
+gpu_config = LiveServerless(
+ name="image-generation",
+ gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24], # 24GB GPUs
+ workersMax=2,
+ idleTimeout=15
+)
+```
+
+**Configuration breakdown**:
+
+- **`name="image-generation"`**: Identifies your endpoint in the Runpod console.
+- **`gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24]`**: Uses RTX 4090 or L4/A5000 GPUs (both have 24GB VRAM, sufficient for SDXL).
+- **`workersMax=2`**: Allows up to 2 parallel workers.
+- **`idleTimeout=15`**: Keeps workers active for 15 minutes (SDXL models are large, so we want longer caching).
+
+
+SDXL requires at least 16GB VRAM. Using 24GB GPUs provides comfortable headroom and faster generation.
+
+
+## Step 5: Define the image generation function
+
+Add the remote function that will run on the GPU worker:
+
+```python
+@remote(
+ resource_config=gpu_config,
+ dependencies=["diffusers", "torch", "transformers", "accelerate"]
+)
+def generate_image(prompt, negative_prompt="", num_steps=30, guidance_scale=7.5):
+ """Generate an image using Stable Diffusion XL."""
+ import torch
+ from diffusers import StableDiffusionXLPipeline
+ import base64
+ from io import BytesIO
+
+ # Load the SDXL model
+ model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+ model_id,
+ torch_dtype=torch.float16,
+ use_safetensors=True,
+ variant="fp16"
+ )
+
+ # Move model to GPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ pipe = pipe.to(device)
+
+ # Generate image
+ image = pipe(
+ prompt=prompt,
+ negative_prompt=negative_prompt,
+ num_inference_steps=num_steps,
+ guidance_scale=guidance_scale,
+ height=1024,
+ width=1024
+ ).images[0]
+
+ # Convert image to base64 for transmission
+ buffered = BytesIO()
+ image.save(buffered, format="PNG")
+ img_str = base64.b64encode(buffered.getvalue()).decode()
+
+ return {
+ "image_base64": img_str,
+ "prompt": prompt,
+ "negative_prompt": negative_prompt,
+ "num_steps": num_steps,
+ "guidance_scale": guidance_scale,
+ "device": device,
+ "resolution": "1024x1024"
+ }
+```
+
+**Key concepts**:
+
+**1. Dependencies**: The function requires four packages:
+ - `diffusers`: Hugging Face library for diffusion models
+ - `torch`: PyTorch for GPU computation
+ - `transformers`: Text encoder dependencies
+ - `accelerate`: Efficient model loading
+
+**2. Model loading**:
+ ```python
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+ model_id,
+ torch_dtype=torch.float16,
+ use_safetensors=True,
+ variant="fp16"
+ )
+ ```
+ This downloads SDXL from Hugging Face. Key parameters:
+ - `torch_dtype=torch.float16`: Use half-precision (saves VRAM, faster)
+ - `use_safetensors=True`: Use safe tensor format
+ - `variant="fp16"`: Download the fp16 version (~7GB instead of ~14GB)
+
+**3. GPU acceleration**:
+ ```python
+ pipe = pipe.to(device)
+ ```
+ Moves the entire pipeline (text encoder, UNet, VAE) to GPU.
+
+**4. Image generation**:
+ ```python
+ image = pipe(
+ prompt=prompt,
+ negative_prompt=negative_prompt,
+ num_inference_steps=num_steps,
+ guidance_scale=guidance_scale,
+ height=1024,
+ width=1024
+ ).images[0]
+ ```
+
+ Parameters:
+ - **`prompt`**: What you want to see in the image
+ - **`negative_prompt`**: What you don't want (e.g., "blurry, low quality")
+ - **`num_inference_steps`**: More steps = better quality but slower (20-50 typical)
+ - **`guidance_scale`**: How closely to follow the prompt (7-10 recommended)
+ - **`height/width`**: SDXL is trained for 1024x1024
+
+**5. Image encoding**:
+ ```python
+ buffered = BytesIO()
+ image.save(buffered, format="PNG")
+ img_str = base64.b64encode(buffered.getvalue()).decode()
+ ```
+ We encode the image as base64 to return it through Flash. This allows us to transmit the image data as a string.
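+
+You can sanity-check this round trip locally with plain bytes; no model is needed. The payload below is just stand-in binary data:
+
+```python
+import base64
+
+# Stand-in binary payload (a fake PNG header plus padding)
+payload = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
+
+encoded = base64.b64encode(payload).decode()  # plain str, safe to return from a remote function
+decoded = base64.b64decode(encoded)
+
+assert decoded == payload
+print(f"{len(payload)} bytes -> {len(encoded)} characters")
+```
+
+Base64 adds roughly 33% overhead, which is the price of moving binary data through string-based transports.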
+
+## Step 6: Add the main function and image saving
+
+Create functions to call the generator and save images:
+
+```python
+def save_image(base64_string, filename):
+ """Save a base64-encoded image to disk."""
+ import base64
+ from PIL import Image
+ from io import BytesIO
+
+ # Decode base64 string
+ img_data = base64.b64decode(base64_string)
+
+ # Open and save image
+ image = Image.open(BytesIO(img_data))
+ image.save(filename)
+    print(f"✓ Image saved to {filename}")
+
+async def main():
+ print("Generating image with Stable Diffusion XL on Runpod GPU...")
+ print("This may take 1-2 minutes on first run (downloading model)...\n")
+
+ # Define your prompt
+ prompt = "A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic"
+ negative_prompt = "blurry, low quality, distorted, ugly"
+
+ # Generate image
+ result = await generate_image(
+ prompt=prompt,
+ negative_prompt=negative_prompt,
+ num_steps=30,
+ guidance_scale=7.5
+ )
+
+ # Save the generated image
+ output_dir = Path("generated_images")
+ output_dir.mkdir(exist_ok=True)
+
+ filename = output_dir / "sdxl_output.png"
+ save_image(result["image_base64"], filename)
+
+ # Display metadata
+ print(f"\n{'='*60}")
+ print("GENERATION DETAILS")
+ print('='*60)
+ print(f"Prompt: {result['prompt']}")
+ print(f"Negative prompt: {result['negative_prompt']}")
+ print(f"Steps: {result['num_steps']}")
+ print(f"Guidance scale: {result['guidance_scale']}")
+ print(f"Resolution: {result['resolution']}")
+ print(f"Device: {result['device']}")
+ print('='*60)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+This main function:
+
+- Calls the remote function with `await`.
+- Creates a `generated_images` directory if it doesn't exist.
+- Decodes and saves the base64 image to disk.
+- Displays generation metadata.
+
+## Step 7: Run your first generation
+
+Run the application:
+
+```bash
+python image_generation.py
+```
+
+**First run output** (takes 2-3 minutes):
+
+```text
+Generating image with Stable Diffusion XL on Runpod GPU...
+This may take 1-2 minutes on first run (downloading model)...
+
+Creating endpoint: server_LiveServerless_a1b2c3d4
+Provisioning Serverless endpoint...
+Endpoint ready
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Downloading model weights from Hugging Face...
+Model loaded, generating image...
+Job completed, output received
+✓ Image saved to generated_images/sdxl_output.png
+
+============================================================
+GENERATION DETAILS
+============================================================
+Prompt: A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic
+Negative prompt: blurry, low quality, distorted, ugly
+Steps: 30
+Guidance scale: 7.5
+Resolution: 1024x1024
+Device: cuda
+============================================================
+```
+
+**Subsequent runs** (takes 30-40 seconds):
+
+```text
+Generating image with Stable Diffusion XL on Runpod GPU...
+
+Resource LiveServerless_a1b2c3d4 already exists, reusing.
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Job completed, output received
+✓ Image saved to generated_images/sdxl_output.png
+
+[Results appear]
+```
+
+Open `generated_images/sdxl_output.png` to see your generated image!
+
+
+The first run downloads ~7GB of model weights, which takes 1-2 minutes. Subsequent runs reuse the cached model and complete in 30-40 seconds.
+
+
+## Step 8: Experiment with different prompts
+
+Try various prompts to see SDXL's capabilities:
+
+```python
+async def main():
+ # Create output directory
+ output_dir = Path("generated_images")
+ output_dir.mkdir(exist_ok=True)
+
+ # Try different prompts
+ prompts = [
+ {
+ "prompt": "A cyberpunk city at night with neon lights, flying cars, rain, cinematic",
+ "negative": "blurry, low quality",
+ "filename": "cyberpunk_city.png"
+ },
+ {
+ "prompt": "A cute corgi puppy wearing a space suit, floating in space, highly detailed",
+ "negative": "distorted, ugly, bad anatomy",
+ "filename": "space_corgi.png"
+ },
+ {
+ "prompt": "An ancient wizard's study filled with books, potions, magical artifacts, candlelight",
+ "negative": "blurry, modern, plastic",
+ "filename": "wizard_study.png"
+ }
+ ]
+
+ for i, p in enumerate(prompts, 1):
+ print(f"\n{'='*60}")
+ print(f"Generating image {i}/{len(prompts)}")
+ print(f"Prompt: {p['prompt'][:50]}...")
+ print('='*60)
+
+ result = await generate_image(
+ prompt=p['prompt'],
+ negative_prompt=p['negative'],
+ num_steps=30,
+ guidance_scale=7.5
+ )
+
+ filename = output_dir / p['filename']
+ save_image(result["image_base64"], filename)
+        print(f"✓ Saved to {filename}\n")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Run it:
+
+```bash
+python image_generation.py
+```
+
+You'll see three different images generated sequentially on the same GPU worker. Each generation takes about 30-40 seconds after the first one.
+
+## Understanding generation parameters
+
+Let's explore how different parameters affect image quality:
+
+### Number of inference steps
+
+```python
+# Fast but lower quality (15-20 steps)
+result = await generate_image(prompt, num_steps=20)
+
+# Balanced (30 steps) - recommended
+result = await generate_image(prompt, num_steps=30)
+
+# High quality but slower (50 steps)
+result = await generate_image(prompt, num_steps=50)
+```
+
+**Effects**:
+- **15-20 steps**: Faster (15-20 seconds) but less refined details
+- **30 steps**: Good balance of quality and speed (30-40 seconds) - **recommended**
+- **50+ steps**: Diminishing returns, minimal quality improvement
+
+### Guidance scale
+
+```python
+# Low guidance - more creative, less faithful to prompt
+result = await generate_image(prompt, guidance_scale=5.0)
+
+# Medium guidance - balanced (recommended)
+result = await generate_image(prompt, guidance_scale=7.5)
+
+# High guidance - very faithful to prompt, may oversaturate
+result = await generate_image(prompt, guidance_scale=12.0)
+```
+
+**Effects**:
+- **3-5**: More artistic freedom, less literal interpretation
+- **7-10**: Balanced, follows prompt closely - **recommended**
+- **12+**: Very literal, may produce oversaturated or exaggerated images
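+
+Under the hood, `guidance_scale` is the weight in classifier-free guidance: the pipeline blends an unconditional noise prediction with the prompt-conditioned one. A scalar sketch of that blend (the prediction values are made up for illustration):
+
+```python
+def guided_prediction(uncond, cond, guidance_scale):
+    """Classifier-free guidance: push the prediction toward the prompt."""
+    return uncond + guidance_scale * (cond - uncond)
+
+uncond, cond = 1.0, 3.0  # hypothetical noise-prediction values
+
+print(guided_prediction(uncond, cond, 1.0))  # 3.0: just the conditioned prediction
+print(guided_prediction(uncond, cond, 7.5))  # 16.0: pulled much harder toward the prompt
+```
+
+This is also why very high scales can oversaturate: the blend extrapolates well past the conditioned prediction.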
+
+### Negative prompts
+
+Negative prompts tell the model what to avoid:
+
+```python
+# Good negative prompts for photorealistic images
+negative_prompt = "blurry, low quality, distorted, ugly, bad anatomy, watermark"
+
+# Good negative prompts for artistic images
+negative_prompt = "realistic, photograph, blurry, low quality"
+
+# Good negative prompts for portraits
+negative_prompt = "distorted face, bad anatomy, extra limbs, low quality"
+```
+
+Use negative prompts to:
+
+- Remove common artifacts ("distorted", "low quality").
+- Avoid unwanted styles ("cartoon", "3D render").
+- Fix common issues ("bad anatomy", "extra fingers").
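+
+If you reuse the same artifact terms across prompts, a small helper keeps them in one place. `build_negative_prompt` below is a hypothetical convenience, not part of Flash or diffusers:
+
+```python
+BASE_NEGATIVES = ["blurry", "low quality", "distorted"]
+
+def build_negative_prompt(*extra_terms):
+    """Combine base artifact terms with prompt-specific additions, skipping duplicates."""
+    terms = list(BASE_NEGATIVES)
+    for term in extra_terms:
+        if term not in terms:
+            terms.append(term)
+    return ", ".join(terms)
+
+print(build_negative_prompt("bad anatomy", "watermark"))
+# blurry, low quality, distorted, bad anatomy, watermark
+```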
+
+## Troubleshooting
+
+### Out of memory error
+
+**Issue**: `RuntimeError: CUDA out of memory`
+
+**Cause**: SDXL requires significant VRAM (16GB minimum)
+
+**Solutions**:
+1. Verify you're using 24GB GPUs:
+ ```python
+ gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24] # 24GB GPUs
+ ```
+
+2. Use half-precision (already in the example):
+ ```python
+ torch_dtype=torch.float16 # Half precision
+ ```
+
+3. If still failing, use 48GB GPUs:
+ ```python
+ gpus=[GpuGroup.AMPERE_48] # A40/A6000 with 48GB
+ ```
+
+### Model download fails
+
+**Issue**: `Error: Failed to download model from Hugging Face`
+
+**Solutions**:
+1. Increase execution timeout for first run:
+ ```python
+ gpu_config = LiveServerless(
+ name="image-generation",
+ executionTimeoutMs=600000 # 10 minutes for first download
+ )
+ ```
+
+2. Check Hugging Face Hub status at [status.huggingface.co](https://status.huggingface.co)
+
+3. Try a smaller model first to test connectivity:
+ ```python
+ model_id = "runwayml/stable-diffusion-v1-5" # Smaller SD 1.5
+ ```
+
+### Image quality is poor
+
+**Issue**: Generated images look blurry or low quality
+
+**Solutions**:
+1. Increase inference steps:
+ ```python
+ num_steps=40 # More steps = better quality
+ ```
+
+2. Adjust guidance scale:
+ ```python
+ guidance_scale=8.5 # Higher guidance
+ ```
+
+3. Improve your prompt:
+ ```python
+ prompt = "A detailed portrait, highly detailed, sharp focus, 8k, professional photography"
+ ```
+
+4. Add quality keywords to your prompt:
+ - "highly detailed"
+ - "sharp focus"
+ - "8k"
+ - "photorealistic"
+ - "professional"
+
+### Slow generation
+
+**Issue**: Image generation takes >60 seconds per image
+
+**Possible causes**:
+1. Worker scaled down (cold start)
+2. Model not cached
+3. Too many inference steps
+
+**Solutions**:
+1. Increase `idleTimeout` to keep workers active:
+ ```python
+ idleTimeout=30 # Keep active for 30 minutes
+ ```
+
+2. Reduce inference steps:
+ ```python
+ num_steps=20 # Faster but slightly lower quality
+ ```
+
+3. Set `workersMin=1` to always have a warm worker ready
+
+### Images look distorted or have artifacts
+
+**Issue**: Generated images have weird artifacts or distortions
+
+**Solutions**:
+1. Use negative prompts:
+ ```python
+ negative_prompt="distorted, ugly, bad anatomy, extra limbs, disfigured"
+ ```
+
+2. Adjust guidance scale (try 7-9 range):
+ ```python
+ guidance_scale=8.0
+ ```
+
+3. Increase inference steps for better refinement:
+ ```python
+ num_steps=35
+ ```
+
+## Next steps
+
+Now that you've built an image generation app with Flash, you can:
+
+### Try other Stable Diffusion models
+
+Explore different models from Hugging Face:
+
+```python
+# SDXL Turbo - 4x faster, 1 step generation
+model_id = "stabilityai/sdxl-turbo"
+
+# Stable Diffusion 1.5 - smaller, faster
+model_id = "runwayml/stable-diffusion-v1-5"
+
+# Stable Diffusion 2.1 - better at artistic styles
+model_id = "stabilityai/stable-diffusion-2-1"
+```
+
+### Add image-to-image generation
+
+Use an existing image as a starting point:
+
+```python
+import torch
+from diffusers import StableDiffusionXLImg2ImgPipeline
+
+# Load the img2img pipeline (assuming the same SDXL base weights and fp16 settings as above)
+pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0",
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to("cuda")
+
+# Generate variations of an existing image (init_image is a PIL image you provide)
+image = pipe(prompt, image=init_image, strength=0.75).images[0]
+```
+
+### Build a Flash app
+
+Convert your script to a production [Flash app](/flash/apps/overview):
+
+```bash
+flash init image-generation-app
+# Move your function to workers/gpu/endpoint.py
+# Add FastAPI routes for HTTP API
+flash deploy
+```
+
+### Optimize with network volumes
+
+Use [network volumes](/flash/managing-endpoints) to cache models across workers:
+
+```python
+from runpod_flash import LiveServerless, PodTemplate  # PodTemplate import path assumed
+
+config = LiveServerless(
+ name="image-generation",
+ networkVolumeId="vol_abc123", # Pre-loaded SDXL model
+ template=PodTemplate(containerDiskInGb=100)
+)
+```
+
+### Explore advanced features
+
+- **LoRA fine-tuning**: Customize SDXL for specific styles
+- **ControlNet**: Guide generation with edge maps, depth, or pose
+- **Inpainting**: Edit specific parts of images
+- **Upscaling**: Generate higher resolution images
+
+## Related resources
+
+- [Flash remote functions guide](/flash/remote-functions)
+- [Flash resource configuration](/flash/resource-configuration)
+- [Managing Flash endpoints](/flash/managing-endpoints)
+- [Hugging Face diffusers documentation](https://huggingface.co/docs/diffusers/index)
+- [Stable Diffusion XL model card](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+- [Prompt engineering guide](https://huggingface.co/docs/diffusers/using-diffusers/write_good_prompt)
diff --git a/tutorials/flash/text-generation-with-transformers.mdx b/tutorials/flash/text-generation-with-transformers.mdx
new file mode 100644
index 00000000..fda30664
--- /dev/null
+++ b/tutorials/flash/text-generation-with-transformers.mdx
@@ -0,0 +1,450 @@
+---
+title: "Generate text with Flash and transformers"
+sidebarTitle: "Generate text with Flash + transformers"
+description: "Learn how to use Flash with Hugging Face transformers to build a GPU-accelerated text generation application."
+---
+
+This tutorial shows you how to build a text generation application using Flash and Hugging Face's transformers library. You'll learn how to load a pretrained language model on a GPU worker and generate text from prompts.
+
+## What you'll learn
+
+In this tutorial you'll learn how to:
+
+- Install and use the Hugging Face transformers library with Flash.
+- Load pretrained models on remote GPU workers.
+- Move models to GPU for faster inference.
+- Configure text generation parameters like temperature and max length.
+- Return structured results with metadata.
+
+## Requirements
+
+- You've [created a Runpod account](/get-started/manage-accounts).
+- You've [created a Runpod API key](/get-started/api-keys).
+- You've installed [Python 3.10 or higher](https://www.python.org/downloads/).
+- You've completed the [Flash quickstart](/flash/quickstart) or are familiar with Flash basics.
+
+## What you'll build
+
+By the end of this tutorial, you'll have a working text generation application that:
+
+- Accepts text prompts as input.
+- Generates natural language completions using GPT-2.
+- Runs entirely on Runpod's GPU infrastructure.
+- Returns generated text with execution metadata.
+
+## Step 1: Set up your project
+
+Create a new directory for your project and set up a Python virtual environment:
+
+```bash
+mkdir flash-text-generation
+cd flash-text-generation
+python3 -m venv venv
+source venv/bin/activate
+```
+
+Install Flash:
+
+```bash
+pip install runpod-flash
+```
+
+Create a `.env` file with your Runpod API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+Replace `YOUR_API_KEY` with your actual API key from the [Runpod console](https://www.runpod.io/console/user/settings).
+
+## Step 2: Understand the Hugging Face transformers library
+
+[Hugging Face transformers](https://huggingface.co/docs/transformers/index) is a popular Python library for working with pretrained language models. It provides:
+
+- **Thousands of pretrained models**: GPT-2, BERT, T5, LLaMA, and many more
+- **Unified API**: Same code works across different model architectures
+- **Model hub integration**: Download models directly from [Hugging Face Hub](https://huggingface.co/models)
+- **Production-ready**: Used by companies and researchers worldwide
+
+For this tutorial, we'll use **GPT-2**, a 124M parameter language model from OpenAI. It's small enough to load quickly but powerful enough to generate coherent text.
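+
+The 124M parameter count also explains the ~500MB download you'll see later. A quick back-of-the-envelope:
+
+```python
+# GPT-2 small ships roughly 124M float32 parameters.
+params = 124e6
+bytes_per_param = 4  # float32 stores 4 bytes per parameter
+size_mb = params * bytes_per_param / 1e6
+
+print(f"~{size_mb:.0f} MB of weights")  # ~496 MB
+```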
+
+## Step 3: Create your project file
+
+Create a new file called `text_generation.py`:
+
+```bash
+touch text_generation.py
+```
+
+Open this file in your code editor. The following steps walk through building the text generation application.
+
+## Step 4: Add imports and configuration
+
+Add the necessary imports and Flash configuration:
+
+```python
+import asyncio
+from dotenv import load_dotenv
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+# Load environment variables from .env file
+load_dotenv()
+
+# Configuration for GPU workers
+gpu_config = LiveServerless(
+ name="text-generation",
+ gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24], # 24GB GPUs
+ workersMax=3,
+ idleTimeout=10
+)
+```
+
+**Configuration breakdown**:
+
+- **`name="text-generation"`**: Identifies your endpoint in the Runpod console
+- **`gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24]`**: Allows workers to use L4, A5000, RTX 3090, or RTX 4090 GPUs (all have 24GB VRAM)
+- **`workersMax=3`**: Allows up to 3 parallel workers for concurrent requests
+- **`idleTimeout=10`**: Keeps workers active for 10 minutes after last use (reduces cold starts)
+
+
+GPT-2 only requires about 2GB of VRAM, so 24GB GPUs are more than sufficient. For larger models like LLaMA or GPT-J, you might need 48GB or 80GB GPUs.
+
+
+## Step 5: Define the text generation function
+
+Add the remote function that will run on the GPU worker:
+
+```python
+@remote(
+ resource_config=gpu_config,
+ dependencies=["transformers", "torch", "accelerate"]
+)
+def generate_text(prompt, max_length=50):
+ """Generate text using a pretrained language model."""
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the GPT-2 model and tokenizer
+ model_name = "gpt2"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Move model to GPU if available
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ device_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
+ model = model.to(device)
+
+ # Tokenize the input prompt
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ # Generate text
+ with torch.no_grad():
+ outputs = model.generate(
+ **inputs,
+ max_length=max_length,
+ num_return_sequences=1,
+ temperature=0.7,
+ do_sample=True,
+ pad_token_id=tokenizer.eos_token_id
+ )
+
+ # Decode the generated tokens back to text
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ return {
+ "prompt": prompt,
+ "generated_text": generated_text,
+ "model_name": model_name,
+ "device": device,
+ "device_name": device_name,
+ "max_length": max_length
+ }
+```
+
+**Key concepts**:
+
+**1. Dependencies**: The function requires three packages:
+ - `transformers`: Hugging Face library for language models
+ - `torch`: PyTorch for GPU computation
+ - `accelerate`: Helper library for loading large models efficiently
+
+**2. Model loading**:
+ ```python
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ ```
+ These lines download and load the GPT-2 model from Hugging Face Hub. The first time this runs, it downloads ~500MB of model weights. Subsequent runs use the cached version.
+
+**3. GPU acceleration**:
+ ```python
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device)
+ ```
+ This moves the model to GPU for faster inference. On Runpod workers, `torch.cuda.is_available()` returns `True`.
+
+**4. Tokenization**:
+ ```python
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+ ```
+ Converts your text prompt into token IDs that the model understands. The `.to(device)` moves these tokens to GPU memory.
+
+**5. Generation parameters**:
+ - `max_length=50`: Maximum number of tokens to generate
+  - `temperature=0.7`: Controls randomness (lower values are more focused and deterministic, 1.0+ is very random)
+ - `do_sample=True`: Use sampling instead of greedy decoding for more diverse outputs
+ - `num_return_sequences=1`: Generate one completion per prompt
+
+**6. No gradient tracking**:
+ ```python
+ with torch.no_grad():
+ ```
+ Disables gradient computation, reducing memory usage and speeding up inference.
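+
+To see numerically what `temperature` does: the logits are divided by it before the softmax, so low values sharpen the distribution and high values flatten it. A self-contained sketch (the logit values are made up):
+
+```python
+import math
+
+def softmax_with_temperature(logits, temperature):
+    """Scale logits by 1/temperature, then normalize to probabilities."""
+    scaled = [l / temperature for l in logits]
+    m = max(scaled)  # subtract the max for numerical stability
+    exps = [math.exp(s - m) for s in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
+
+sharp = softmax_with_temperature(logits, 0.7)  # low temperature: top token dominates
+flat = softmax_with_temperature(logits, 1.5)   # high temperature: closer to uniform
+print([round(p, 3) for p in sharp])
+print([round(p, 3) for p in flat])
+```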
+
+## Step 6: Add the main function
+
+Create the main function to test your text generator:
+
+```python
+async def main():
+ print("Starting text generation on Runpod GPU...")
+
+ # Define a prompt
+ prompt = "The future of artificial intelligence is"
+
+ # Generate text
+ result = await generate_text(prompt, max_length=100)
+
+ # Display results
+ print("\n" + "="*60)
+ print("TEXT GENERATION RESULTS")
+ print("="*60)
+ print(f"\nPrompt: {result['prompt']}")
+ print(f"\nGenerated text:\n{result['generated_text']}")
+ print("\n" + "-"*60)
+ print(f"Model: {result['model_name']}")
+ print(f"Device: {result['device']}")
+ print(f"GPU: {result['device_name']}")
+ print(f"Max length: {result['max_length']} tokens")
+ print("="*60)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+This main function:
+
+- Calls the remote function with `await` (runs asynchronously).
+- Waits for the GPU worker to complete text generation.
+- Displays the results in a formatted output.
+
+## Step 7: Run your first generation
+
+Run the application:
+
+```bash
+python text_generation.py
+```
+
+**First run output** (takes 60-90 seconds):
+
+```text
+Starting text generation on Runpod GPU...
+Creating endpoint: server_LiveServerless_a1b2c3d4
+Provisioning Serverless endpoint...
+Endpoint ready
+Registering RunPod endpoint at https://api.runpod.ai/xvf32dan8rcilp
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Installing dependencies: transformers torch accelerate
+Downloading model weights...
+Job completed, output received
+
+============================================================
+TEXT GENERATION RESULTS
+============================================================
+
+Prompt: The future of artificial intelligence is
+
+Generated text:
+The future of artificial intelligence is bright and full of possibilities. With advancements in machine learning and deep learning, we're seeing AI systems that can understand natural language, recognize images, and even create art. The potential applications are endless, from healthcare to transportation to education.
+
+------------------------------------------------------------
+Model: gpt2
+Device: cuda
+GPU: NVIDIA GeForce RTX 4090
+Max length: 100 tokens
+============================================================
+```
+
+**Subsequent runs** (takes 2-5 seconds):
+
+```text
+Starting text generation on Runpod GPU...
+Resource LiveServerless_a1b2c3d4 already exists, reusing.
+Registering RunPod endpoint at https://api.runpod.ai/xvf32dan8rcilp
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Job completed, output received
+
+[Results appear immediately]
+```
+
+Notice the dramatic speed improvement on subsequent runs—the endpoint is already provisioned, dependencies are installed, and the model is cached.
+
+## Step 8: Experiment with different prompts
+
+Modify the main function to try different prompts:
+
+```python
+async def main():
+ print("Starting text generation on Runpod GPU...")
+
+ # Try multiple prompts
+ prompts = [
+ "Once upon a time in a distant galaxy",
+ "The secret to happiness is",
+ "In the year 2050, technology will"
+ ]
+
+ for prompt in prompts:
+ print(f"\n{'='*60}")
+ print(f"Generating for: {prompt}")
+ print('='*60)
+
+ result = await generate_text(prompt, max_length=80)
+ print(f"\n{result['generated_text']}\n")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Run it again:
+
+```bash
+python text_generation.py
+```
+
+You'll see three different completions generated sequentially on the same GPU worker.
+
+## Troubleshooting
+
+### Model download fails
+
+**Issue**: `Error: Failed to download model from Hugging Face`
+
+**Solutions**:
+1. Check internet connectivity from workers (rare issue on Runpod)
+2. Try a different model that might be available faster
+3. Increase execution timeout in configuration:
+ ```python
+ gpu_config = LiveServerless(
+ name="text-generation",
+ executionTimeoutMs=300000 # 5 minutes
+ )
+ ```
+
+### Out of memory error
+
+**Issue**: `RuntimeError: CUDA out of memory`
+
+**Solutions**:
+1. Use smaller models (GPT-2 instead of GPT-2 Large)
+2. Reduce `max_length` parameter
+3. Use larger GPUs:
+ ```python
+ gpus=[GpuGroup.AMPERE_48] # 48GB GPUs
+ ```
+
+### Slow generation
+
+**Issue**: Text generation takes >30 seconds per request
+
+**Possible causes**:
+1. Worker scaled down (cold start)
+2. Model not cached
+3. Large `max_length` value
+
+**Solutions**:
+1. Increase `idleTimeout` to keep workers active:
+ ```python
+ idleTimeout=30 # Keep active for 30 minutes
+ ```
+2. Set `workersMin=1` to always have a warm worker ready
+3. Reduce `max_length` to generate fewer tokens
+
+### Generation quality is poor
+
+**Issue**: Generated text is incoherent or repetitive
+
+**Solutions**:
+1. Adjust `temperature` (try 0.7-0.9)
+2. Add `top_p` and `top_k` sampling:
+ ```python
+ outputs = model.generate(
+ **inputs,
+ max_length=max_length,
+ temperature=0.8,
+ top_p=0.9,
+ top_k=50,
+ do_sample=True
+ )
+ ```
+3. Try a larger model (GPT-2 Medium or Large)
+
+## Next steps
+
+Now that you've built a text generation app with Flash, you can:
+
+### Explore other models
+
+Try different models from Hugging Face:
+
+```python
+# Larger general-purpose language model
+model_name = "facebook/opt-1.3b"
+
+# Code generation model
+model_name = "Salesforce/codegen-350M-mono"
+
+# Dialogue model
+model_name = "microsoft/DialoGPT-medium"
+```
+
+### Build a chat interface
+
+Extend your app to handle multi-turn conversations:
+
+```python
+@remote(resource_config=gpu_config, dependencies=["transformers", "torch"])
+def chat(conversation_history):
+    """Multi-turn chat with context."""
+    from transformers import AutoTokenizer, AutoModelForCausalLM
+
+    tokenizer = AutoTokenizer.from_pretrained("gpt2")
+    model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
+
+    # Concatenate conversation history into one prompt
+    prompt = "\n".join(conversation_history)
+    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+
+    # Generate a response and return only the newly generated tokens
+    outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, pad_token_id=tokenizer.eos_token_id)
+    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+```
+
+### Deploy as a Flash app
+
+Convert your script to a production [Flash app](/flash/apps/overview):
+
+```bash
+flash init text-generation-app
+# Move your function to workers/gpu/endpoint.py
+# Add FastAPI routes
+flash deploy
+```
+
+### Optimize performance
+
+- Use [network volumes](/flash/managing-endpoints) to cache models across workers.
+- Implement [request batching](/flash/remote-functions#parallel-execution) for higher throughput.
+- Try [quantized models](https://huggingface.co/docs/transformers/main_classes/quantization) for faster inference.
+
+## Related resources
+
+- [Flash remote functions guide](/flash/remote-functions)
+- [Flash resource configuration](/flash/resource-configuration)
+- [Managing Flash endpoints](/flash/managing-endpoints)
+- [Hugging Face transformers documentation](https://huggingface.co/docs/transformers/index)
+- [Hugging Face model hub](https://huggingface.co/models)