165 changes: 165 additions & 0 deletions docs/cli-reference.md
@@ -98,6 +98,8 @@ madengine build [OPTIONS]
| `--target-archs` | `-a` | TEXT | `[]` | Target GPU architectures (e.g., gfx908,gfx90a,gfx942) |
| `--registry` | `-r` | TEXT | `None` | Docker registry to push images to |
| `--batch-manifest` | | TEXT | `None` | Input batch.json file for batch build mode |
| `--use-image` | | TEXT | `None` | Skip Docker build, use pre-built image. Omit value to auto-detect from model's `DOCKER_IMAGE_NAME` |
| `--build-on-compute` | | FLAG | `False` | Build Docker images on SLURM compute node instead of login node |
| `--additional-context` | `-c` | TEXT | `"{}"` | Additional context as JSON string |
| `--additional-context-file` | `-f` | TEXT | `None` | File containing additional context JSON |
| `--clean-docker-cache` | | FLAG | `False` | Rebuild images without using cache |
@@ -142,6 +144,16 @@ madengine build --tags model \

# Real-time output with verbose logging
madengine build --tags model --live-output --verbose

# Use pre-built image (skip Docker build)
madengine build --tags sglang_disagg \
--use-image lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x \
--additional-context-file slurm-config.json

# Build on SLURM compute node instead of login node
madengine build --tags model \
--build-on-compute \
--additional-context-file slurm-config.json
```

**Required Context for Build:**
@@ -170,6 +182,159 @@ When using `--batch-manifest`, provide a JSON file with selective build configuration:

See [Batch Build Guide](batch-build.md) for details.

**Pre-built Image Mode (`--use-image`):**

Skip Docker build and use an existing image from a registry or local Docker cache:

```bash
# Auto-detect image from model card's DOCKER_IMAGE_NAME env var
madengine build --tags sglang_disagg \
--use-image \
--additional-context-file config.json

# Explicitly specify image from Docker Hub
madengine build --tags sglang_disagg \
--use-image lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x \
--additional-context-file config.json

# Use image from NGC
madengine build --tags model \
--use-image nvcr.io/nvidia/pytorch:24.01-py3

# Use locally cached image
madengine build --tags model \
--use-image my-local-image:latest
```

**Image Resolution Priority:**
1. If `--use-image <name>` is specified, use that image
2. If `--use-image` (no value), auto-detect from model card's `DOCKER_IMAGE_NAME` env var
3. If no image found in model card, error with helpful suggestions
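
The priority above can be sketched as a small resolver (illustrative logic only, not madengine's actual implementation):

```shell
# Hypothetical sketch of the image resolution order; not madengine source.
resolve_image() {
  explicit="$1"     # value given to --use-image, may be empty
  card_image="$2"   # model card's DOCKER_IMAGE_NAME, may be empty
  if [ -n "$explicit" ]; then
    echo "$explicit"            # 1. explicit --use-image <name> wins
  elif [ -n "$card_image" ]; then
    echo "$card_image"          # 2. fall back to the model card
  else
    echo "error: no image given and model card has no DOCKER_IMAGE_NAME" >&2
    return 1                    # 3. fail with a helpful message
  fi
}
```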

**Multiple Models Warning:**
When using auto-detection with multiple models that have different `DOCKER_IMAGE_NAME` values, the first model's image is used and a warning is printed.

**Mutual Exclusivity:**
- `--use-image` cannot be used with `--registry` (push requires local build)
- `--use-image` cannot be used with `--build-on-compute` (skip build vs. build on compute)

**When to use `--use-image`:**
- Using official framework images (SGLang, vLLM, etc.)
- Image is pre-cached on compute nodes
- Testing without rebuilding
- CI/CD pipelines with external images

The generated manifest marks the image with `"prebuilt": true` and `"build_time": 0`.
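
A manifest entry for a pre-built image might look roughly like this (field names beyond `prebuilt` and `build_time` are illustrative):

```json
{
  "image": "lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x",
  "prebuilt": true,
  "build_time": 0
}
```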

**Build on Compute Node (`--build-on-compute`):**

Build the Docker image on a SLURM compute node, push it to a registry, and pull it in parallel on all nodes during the run phase:

```bash
# Build on compute node and push to registry (--registry REQUIRED)
madengine build --tags model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context-file slurm-config.json
```

**Required:** `--registry` must be specified with `--build-on-compute`.

**SLURM Config Priority:**
1. Model card's `slurm` section (base configuration)
2. `--additional-context` overrides (command line takes precedence)

If the model card already has `slurm` config, you only need to provide missing or override values:

```bash
# Model card has partition/time, just override reservation
madengine build --tags model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context '{"slurm": {"reservation": "my-res"}}'
```
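
For example, if the model card's `slurm` section already declares `partition` and `time`, the command above would yield a merged config along these lines (values illustrative):

```json
{
  "slurm": {
    "partition": "gpu",
    "time": "04:00:00",
    "reservation": "my-res"
  }
}
```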

**When to use `--build-on-compute`:**
- Login node has limited disk space or resources
- Build requires GPU access (e.g., AOT compilation)
- Login node policies prohibit heavy workloads
- Distributing images to many compute nodes (build once, pull everywhere)

**How it works:**

*Build Phase:*
1. Discovers model and merges SLURM config (model card + additional-context)
2. Submits build job to **1 compute node** via `sbatch --wait`
3. Builds Docker image on that node
4. Pushes image to registry
5. Generates manifest with registry image name

*Run Phase:*
1. Detects `built_on_compute: true` in manifest
2. Pulls image **in parallel on ALL nodes** via `srun docker pull`
3. Executes model script
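
The two phases above roughly correspond to these commands (flag and script names are assumptions for illustration, not madengine internals):

```shell
# Hypothetical sketch: print the commands each phase would issue.
build_phase_cmd() {
  partition="$1"
  # Build runs on exactly one compute node; --wait blocks until the job finishes
  echo "sbatch --wait --nodes=1 --partition=$partition build_image.sbatch"
}

run_phase_pull_cmd() {
  image="$1"
  # Pull happens in parallel on every allocated node
  echo 'srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES docker pull '"$image"
}

build_phase_cmd gpu
run_phase_pull_cmd docker.io/myorg/model:latest
```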

**Inside existing SLURM allocation:**

If you're already inside an `salloc` allocation, `--build-on-compute` uses `srun` directly instead of submitting a new job.

**Error Messages:**

If required SLURM fields are missing, specific errors are shown:
- Missing `partition`: "Add partition to model card's slurm section or via --additional-context"
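
For instance, a missing `partition` can be supplied through additional context (partition name illustrative):

```json
{
  "slurm": {
    "partition": "gpu"
  }
}
```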

---

**Multi-Node SLURM Launcher (`slurm_multi`):**

Models using the `slurm_multi` launcher (for multi-node distributed inference) **require** either `--registry` or `--use-image`:

```bash
# Option 1: Build and push to registry
madengine build --tags sglang_model \
--registry docker.io/myorg \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Option 2: Use pre-built image from registry
madengine build --tags sglang_model \
--use-image docker.io/myorg/sglang:latest

# Option 3: Build on compute and push
madengine build --tags sglang_model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context-file config.json
```

**Why this requirement?**

Multi-node SLURM jobs run on multiple compute nodes. Each node needs access to the Docker image:
- Local builds only exist on the login/build node
- Compute nodes cannot access locally built images
- Registry images enable parallel `docker pull` on all nodes

**Parallel Image Pull:**

During `madengine run`, images from a registry are automatically pulled in parallel on all allocated nodes:

```bash
srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES docker pull <image>
```

This ensures fast, consistent image availability across the cluster.

**Re-using Images:**

For subsequent runs with the same image, use `--use-image` to skip building:

```bash
# First run: build and push
madengine build --tags model --registry docker.io/myorg

# Subsequent runs: use pre-built image
madengine build --tags model --use-image docker.io/myorg/model:latest
```

---

### `run` - Execute Models
87 changes: 87 additions & 0 deletions docs/deployment.md
@@ -258,6 +258,39 @@ SLURM automatically provides:
- Network interface configuration
- Rank assignment via `$SLURM_PROCID`

### SLURM Allocation Detection

madengine automatically detects if you're running inside an existing SLURM allocation (via `salloc`):

```bash
# Allocate nodes interactively
salloc -N 3 -p gpu --gpus-per-node=8 -t 04:00:00

# madengine detects the allocation automatically
madengine run --manifest-file build_manifest.json
# Output: ✓ Detected existing SLURM allocation: Job 12345
# Allocation has 3 nodes available
```

**Behavior inside allocation:**
- Uses `srun` directly instead of `sbatch`
- Validates requested nodes ≤ available nodes
- Warns if using fewer nodes than allocated
- Skips job submission (already allocated)

**Build inside allocation:**

```bash
# Inside salloc session
madengine build --tags model --build-on-compute
# Uses srun instead of sbatch --wait
```

**Environment variables detected:**
- `SLURM_JOB_ID` - Indicates inside allocation
- `SLURM_NNODES` - Number of nodes available
- `SLURM_NODELIST` - List of allocated nodes
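
A minimal sketch of this detection check (assumed logic, not madengine source):

```shell
# Returns success when running inside a SLURM allocation.
inside_slurm_allocation() {
  # SLURM exports SLURM_JOB_ID inside salloc/sbatch allocations
  [ -n "${SLURM_JOB_ID:-}" ]
}

if inside_slurm_allocation; then
  echo "Detected existing SLURM allocation: Job $SLURM_JOB_ID (${SLURM_NNODES:-1} nodes)"
else
  echo "No active SLURM allocation; jobs will be submitted via sbatch"
fi
```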

### Monitoring

```bash
@@ -372,6 +405,60 @@ scancel -u $USER
}
```

### Baremetal Execution (slurm_multi)

For disaggregated inference workloads like SGLang Disaggregated, madengine supports baremetal execution where the model's `.slurm` script manages Docker containers directly:

```json
{
"slurm": {
"partition": "gpu",
"nodes": 3,
"gpus_per_node": 8,
"time": "04:00:00"
},
"distributed": {
"launcher": "slurm_multi",
"nnodes": 3,
"nproc_per_node": 8,
"sglang_disagg": {
"prefill_nodes": 1,
"decode_nodes": 1
}
}
}
```

**How baremetal execution works:**
1. madengine generates a wrapper script (not a Docker container)
2. The wrapper runs the model's `.slurm` script directly on baremetal
3. The `.slurm` script manages Docker containers via `srun`
4. Environment variables from `models.json` and `additional-context` are passed through
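
The generated wrapper can be pictured roughly like this (variable and file names are illustrative assumptions, not madengine's actual output):

```shell
# Hypothetical wrapper generator: emits a script that runs the model's
# .slurm file on baremetal, with context passed through as env vars.
make_wrapper() {
  image="$1"
  slurm_script="$2"
  cat <<EOF
#!/bin/sh
export MAD_MODEL_IMAGE="$image"   # image each node pulls via srun docker pull
exec bash "$slurm_script"         # .slurm script orchestrates containers itself
EOF
}

make_wrapper docker.io/myorg/sglang:latest scripts/sglang_disagg/run.slurm
```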

**When to use `slurm_multi`:**
- SGLang Disaggregated inference (proxy + prefill + decode nodes)
- Workloads requiring direct SLURM node control
- Custom Docker orchestration via `.slurm` scripts

**Registry Requirement:**

Models using `slurm_multi` **require** either `--registry` or `--use-image` during build:

```bash
# Option 1: Build and push to registry
madengine build --tags model --registry docker.io/myorg

# Option 2: Use pre-built image
madengine build --tags model --use-image

# Option 3: Build on compute and push
madengine build --tags model --build-on-compute --registry docker.io/myorg
```

This ensures all compute nodes can pull the image in parallel during `madengine run`.

See [Launchers Guide](launchers.md#7-sglang-disaggregated-new) for detailed configuration.

## Troubleshooting

### Kubernetes Issues
30 changes: 21 additions & 9 deletions docs/launchers.md
@@ -364,7 +364,7 @@ SGLang Disaggregated separates inference into specialized node pools:
```json
{
"distributed": {
"launcher": "sglang-disagg",
"launcher": "slurm_multi",
"nnodes": 5,
"nproc_per_node": 8,
"sglang_disagg": {
@@ -403,7 +403,7 @@ Override automatic split based on workload characteristics:
```json
{
"distributed": {
"launcher": "sglang-disagg",
"launcher": "slurm_multi",
"nnodes": 7,
"nproc_per_node": 8,
"sglang_disagg": {
@@ -436,6 +436,18 @@ Override automatic split based on workload characteristics:
- Ray cluster coordination
- No torchrun needed (manages own processes)

**Registry Requirement (SLURM)**:

Models using `slurm_multi` launcher **require** `--registry` or `--use-image` during build:

```bash
madengine build --tags model --registry docker.io/myorg
# OR
madengine build --tags model --use-image
```

This ensures all compute nodes can pull the image in parallel during `madengine run`.

**Environment Variables (K8s)**:
```bash
POD_INDEX=${JOB_COMPLETION_INDEX} # Pod index for role assignment
SGLANG_NODE_IPS="10.0.0.1,10.0.0.2,..."
```

**Examples**:
- K8s Minimal: `examples/k8s-configs/minimal/sglang-disagg-minimal.json`
- K8s Basic: `examples/k8s-configs/basic/sglang-disagg-multi-node-basic.json`
- K8s Custom: `examples/k8s-configs/basic/sglang-disagg-custom-split.json`
- SLURM Minimal: `examples/slurm-configs/minimal/sglang-disagg-minimal.json`
- SLURM Basic: `examples/slurm-configs/basic/sglang-disagg-multi-node.json`
- SLURM Custom: `examples/slurm-configs/basic/sglang-disagg-custom-split.json`
- K8s Minimal: `examples/k8s-configs/minimal/slurm-multi-minimal.json`
- K8s Basic: `examples/k8s-configs/basic/slurm-multi-multi-node-basic.json`
- K8s Custom: `examples/k8s-configs/basic/slurm-multi-custom-split.json`
- SLURM Minimal: `examples/slurm-configs/minimal/slurm-multi-minimal.json`
- SLURM Basic: `examples/slurm-configs/basic/slurm-multi-multi-node.json`
- SLURM Custom: `examples/slurm-configs/basic/slurm-multi-custom-split.json`

**Comparison: SGLang vs SGLang Disaggregated**:

@@ -681,7 +693,7 @@ SGLANG_NODE_RANK=${SLURM_PROCID}
```bash
Error: Unknown launcher type 'xyz'
```
Solution: Use one of: `torchrun`, `deepspeed`, `megatron`, `torchtitan`, `vllm`, `sglang`, `sglang-disagg`
Solution: Use one of: `torchrun`, `deepspeed`, `megatron`, `torchtitan`, `vllm`, `sglang`, `slurm_multi`

**2. Multi-Node Communication Fails**
```bash