Merged
31 changes: 18 additions & 13 deletions 01_getting_started/01_hello_world/README.md
@@ -30,16 +30,14 @@ Server starts at **http://localhost:8888**

### 4. Test the API

Visit **http://localhost:8888/docs** for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.

```bash
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"message": "Hello GPU!"}'
```
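The same request can be sent from Python using only the standard library (a sketch mirroring the curl call above; the helper name is ours, not part of the SDK):

```python
import json
import urllib.request

def build_runsync_request(message: str,
                          url: str = "http://localhost:8888/gpu_worker/runsync"):
    # Build the same POST request as the curl command above.
    return urllib.request.Request(
        url,
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send it with: urllib.request.urlopen(build_runsync_request("Hello GPU!"))
```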

### Full CLI Documentation

For complete CLI usage including deployment, environment management, and troubleshooting:
@@ -58,14 +56,14 @@ Simple GPU-based serverless function that:
- Runs on any available GPU

The worker demonstrates:
- Remote execution with the `@Endpoint` decorator
- GPU resource configuration via `gpu=` parameter
- Automatic scaling via `workers=` parameter
- Local development and testing

## API Endpoints

QB (queue-based) endpoints are auto-generated from `@Endpoint` functions. Visit `/docs` for the full API schema.

### `gpu_hello`

@@ -100,7 +98,7 @@ Executes a simple GPU worker and returns system/GPU information.

```
01_hello_world/
├── gpu_worker.py # GPU worker with @Endpoint decorator
├── pyproject.toml # Project metadata
├── requirements.txt # Dependencies
├── .env.example # Environment variables template
@@ -110,16 +108,23 @@ Executes a simple GPU worker and returns system/GPU information.
## Key Concepts

### Remote Execution
The `@Endpoint` decorator transparently executes functions on serverless infrastructure:
- Code runs locally during development
- Automatically deploys to Runpod when configured
- Handles serialization and resource management

```python
from runpod_flash import Endpoint, GpuGroup

@Endpoint(name="my-worker", gpu=GpuGroup.ANY, workers=(0, 3))
async def my_function(data: dict) -> dict:
    return {"result": "processed"}
```

### Resource Scaling
The GPU worker scales to zero when idle:
- **workers=(0, 3)**: Scale from 0 to 3 workers
- **idle_timeout=5**: 5 minutes before scaling down
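As a toy illustration only (this is not Runpod's actual autoscaler), the bounds clamp a naive one-worker-per-queued-job target:

```python
def desired_workers(queue_depth: int, workers: tuple = (0, 3)) -> int:
    # Clamp a naive one-worker-per-queued-job target to the (min, max) bounds.
    lo, hi = workers
    return max(lo, min(hi, queue_depth))
```

With `workers=(0, 3)` an empty queue scales to zero; with `workers=(1, 3)` one worker stays warm.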

### GPU Detection
The worker uses PyTorch to detect and report GPU information:
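The detection code is collapsed in this view; a minimal sketch of what such logic can look like (assuming it mirrors the fields shown in `gpu_worker.py`, and degrading gracefully when torch is absent):

```python
import platform

def detect_gpu() -> dict:
    # Report CUDA availability and device details; fall back to CPU-only info without torch.
    try:
        import torch
        available = torch.cuda.is_available()
    except ImportError:
        torch, available = None, False
    info = {"cuda_available": available, "platform": platform.system()}
    if available:
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["gpu_count"] = torch.cuda.device_count()
    return info
```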
@@ -142,7 +147,7 @@ flash run

## Next Steps

- Customize GPU type: Change `GpuGroup.ANY` to a specific GPU (e.g. `GpuGroup.ADA_24`, `GpuGroup.AMPERE_80`)
- Add your own GPU-accelerated code
- Implement error handling and validation
- Deploy to production with `flash deploy`
27 changes: 12 additions & 15 deletions 01_getting_started/01_hello_world/gpu_worker.py
@@ -1,20 +1,17 @@
# GPU serverless worker -- detects available GPU hardware.
# Run with: flash run
# Test directly: python gpu_worker.py
from runpod_flash import Endpoint, GpuGroup


@Endpoint(
    name="01_01_gpu_worker",
    gpu=GpuGroup.ANY,
    workers=(0, 3),
    idle_timeout=5,
)
async def gpu_hello(input_data: dict) -> dict:
    """GPU worker that returns GPU hardware info."""
    import platform
    from datetime import datetime

@@ -25,7 +22,7 @@ async def gpu_hello(payload: dict) -> dict:
    gpu_count = torch.cuda.device_count()
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)

    message = input_data.get("message", "Hello from GPU worker!")

    return {
        "status": "success",
42 changes: 29 additions & 13 deletions 01_getting_started/02_cpu_worker/README.md
@@ -30,7 +30,7 @@ Server starts at **http://localhost:8888**

### 4. Test the API

Visit **http://localhost:8888/docs** for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.

```bash
curl -X POST http://localhost:8888/cpu_worker/runsync \
@@ -56,14 +56,14 @@ Simple CPU-based serverless function that:
- Runs on general-purpose CPU instances

The worker demonstrates:
- Remote execution with the `@Endpoint` decorator
- CPU resource configuration via `cpu=` parameter
- Automatic scaling via `workers=` parameter
- Lightweight API request handling

## API Endpoints

QB (queue-based) endpoints are auto-generated from `@Endpoint` functions. Visit `/docs` for the full API schema.

### `cpu_hello`

@@ -92,7 +92,7 @@ Executes a simple CPU worker and returns a greeting with system information.

```
02_cpu_worker/
├── cpu_worker.py # CPU worker with @Endpoint decorator
├── pyproject.toml # Project metadata
├── requirements.txt # Dependencies
├── .env.example # Environment variables template
@@ -102,23 +102,39 @@ Executes a simple CPU worker and returns a greeting with system information.
## Key Concepts

### Remote Execution
The `@Endpoint` decorator transparently executes functions on serverless infrastructure:
- Code runs locally during development
- Automatically deploys to Runpod when configured
- Handles serialization and resource management

```python
from runpod_flash import Endpoint

@Endpoint(name="my-worker", cpu="cpu3c-1-2", workers=(0, 3))
async def my_function(data: dict) -> dict:
    return {"result": "processed"}
```

### CPU Instance Types
Available CPU configurations:
- `CpuInstanceType.CPU3G_2_8`: 2 vCPU, 8GB RAM (General Purpose)
- `CpuInstanceType.CPU3C_4_8`: 4 vCPU, 8GB RAM (Compute Optimized)
- `CpuInstanceType.CPU5G_4_16`: 4 vCPU, 16GB RAM (Latest Gen)
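The string shorthands appear to follow the enum names (e.g. `CPU3C_4_8` ↔ `cpu3c-4-8`); a hypothetical helper for the conversion, assuming that pattern holds:

```python
def to_shorthand(enum_name: str) -> str:
    # Assumed mapping inferred from the examples above: CPU3C_4_8 -> "cpu3c-4-8".
    family, vcpu, ram = enum_name.split("_")
    return f"{family.lower()}-{vcpu}-{ram}"
```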

CPU type can be specified as an enum or a string shorthand:
```python
# Enum
@Endpoint(name="worker", cpu=CpuInstanceType.CPU3C_1_2)
async def worker_enum(data: dict) -> dict: ...

# String shorthand
@Endpoint(name="worker", cpu="cpu3c-1-2")
async def worker_str(data: dict) -> dict: ...
```

### Resource Scaling
The CPU worker scales to zero when idle:
- **workers=(0, 3)**: Scale from 0 to 3 workers
- **idle_timeout=5**: 5 minutes before scaling down

## Development

### Test Worker Locally
@@ -148,7 +164,7 @@ Compare with GPU workers when you need:

## Next Steps

- Customize CPU type: Change `"cpu3c-1-2"` to a different instance type
- Add request validation and error handling
- Integrate with databases or external APIs
- Deploy to production with `flash deploy`
27 changes: 12 additions & 15 deletions 01_getting_started/02_cpu_worker/cpu_worker.py
@@ -1,24 +1,21 @@
# CPU serverless worker -- lightweight processing without GPU.
# Run with: flash run
# Test directly: python cpu_worker.py
from runpod_flash import CpuInstanceType, Endpoint


@Endpoint(
    name="01_02_cpu_worker",
    cpu=CpuInstanceType.CPU3C_1_2,
    workers=(0, 3),
    idle_timeout=5,
)
async def cpu_hello(input_data: dict) -> dict:
    """CPU worker that returns a greeting."""
    import platform
    from datetime import datetime

    message = f"Hello, {input_data.get('name', 'Anonymous Panda')}!"

    return {
        "status": "success",
55 changes: 28 additions & 27 deletions 01_getting_started/03_mixed_workers/README.md
@@ -6,7 +6,7 @@ Learn the production pattern of combining CPU and GPU workers for cost-effective

- **Mixed worker architecture** - Combining CPU and GPU workers intelligently
- **Cost optimization** - Using GPU only when necessary
- **Pipeline orchestration** - Coordinating multiple worker types via a load-balanced endpoint
- **Production patterns** - Real-world ML service architecture

## Architecture
@@ -131,41 +131,42 @@ Total: $0.0019/sec
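The total is simply the sum of the three per-stage rates quoted below:

```python
# Approximate per-second prices quoted in this README.
preprocess_rate = 0.0002   # CPU preprocessing
inference_rate = 0.0015    # GPU inference
postprocess_rate = 0.0002  # CPU postprocessing

total_rate = preprocess_rate + inference_rate + postprocess_rate
print(f"${total_rate:.4f}/sec")
```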

### CPU Preprocessing Worker
```python
@Endpoint(
    name="preprocess_worker",
    cpu=CpuInstanceType.CPU3G_2_8,  # 2 vCPU, 8GB
    workers=(0, 10),
    idle_timeout=3,
)
async def preprocess_text(input_data: dict) -> dict: ...
```

**Cost:** ~$0.0002/sec
**Best for:** Validation, cleaning, tokenization
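The preprocessing body is not shown here; a hypothetical sketch of such a stage (validation and whitespace normalization only):

```python
def preprocess(payload: dict) -> dict:
    # Validate and normalize the input text before it ever reaches a GPU.
    text = payload.get("text", "")
    if not text.strip():
        return {"status": "error", "error": "empty text"}
    cleaned = " ".join(text.split())
    return {"status": "success", "text": cleaned, "token_count": len(cleaned.split())}
```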

### GPU Inference Worker
```python
@Endpoint(
    name="inference_worker",
    gpu=GpuGroup.ADA_24,  # RTX 4090
    workers=(0, 3),
    idle_timeout=5,
    dependencies=["torch"],
)
async def gpu_inference(input_data: dict) -> dict: ...
```

**Cost:** ~$0.0015/sec
**Best for:** ML model inference only

### CPU Postprocessing Worker
```python
@Endpoint(
    name="postprocess_worker",
    cpu=CpuInstanceType.CPU3G_2_8,  # 2 vCPU, 8GB
    workers=(0, 10),
    idle_timeout=3,
)
async def postprocess_results(input_data: dict) -> dict: ...
```

**Cost:** ~$0.0002/sec
@@ -176,7 +177,13 @@ postprocess_config = CpuLiveServerless(
The `/classify` load-balanced endpoint orchestrates all workers:

```python
from cpu_worker import postprocess_results, preprocess_text
from gpu_worker import gpu_inference
from runpod_flash import Endpoint

pipeline = Endpoint(name="classify_pipeline", cpu="cpu3c-1-2", workers=(1, 3))

@pipeline.post("/classify")
async def classify(text: str) -> dict:
    """Complete ML pipeline: CPU preprocess -> GPU inference -> CPU postprocess."""
    preprocess_result = await preprocess_text({"text": text})
```
@@ -234,15 +241,9 @@ For higher volumes, savings multiply significantly.

```python
try:
    preprocess_result = await preprocess_text(data)
    gpu_result = await gpu_inference(preprocess_result)
    final_result = await postprocess_results(gpu_result)
    return final_result
except Exception as e:
    logger.error(f"Pipeline failed: {e}")
```
@@ -251,13 +252,13 @@

### 2. Timeouts

Set appropriate timeouts for each stage via `execution_timeout_ms`:
```python
# CPU stages: short timeouts
@Endpoint(name="preprocess", cpu="cpu3c-1-2", execution_timeout_ms=30000)

# GPU stage: longer timeout
@Endpoint(name="inference", gpu=GpuGroup.ADA_24, execution_timeout_ms=120000)
```

### 3. Monitoring
Expand Down Expand Up @@ -299,7 +300,7 @@ Review worker usage:
### Slow Performance

- Increase CPU worker max count for preprocessing
- Check if GPU cold start is the issue (set `workers=(1, 3)` for always-warm)
- Consider caching preprocessed data

## Next Steps