165 changes: 165 additions & 0 deletions docs/cli-reference.md
@@ -98,6 +98,8 @@ madengine build [OPTIONS]
| `--target-archs` | `-a` | TEXT | `[]` | Target GPU architectures (e.g., gfx908,gfx90a,gfx942) |
| `--registry` | `-r` | TEXT | `None` | Docker registry to push images to |
| `--batch-manifest` | | TEXT | `None` | Input batch.json file for batch build mode |
| `--use-image` | | TEXT | `None` | Skip Docker build, use pre-built image. Omit value to auto-detect from model's `DOCKER_IMAGE_NAME` |
| `--build-on-compute` | | FLAG | `False` | Build Docker images on SLURM compute node instead of login node |
| `--additional-context` | `-c` | TEXT | `"{}"` | Additional context as JSON string |
| `--additional-context-file` | `-f` | TEXT | `None` | File containing additional context JSON |
| `--clean-docker-cache` | | FLAG | `False` | Rebuild images without using cache |
@@ -142,6 +144,16 @@ madengine build --tags model \

# Real-time output with verbose logging
madengine build --tags model --live-output --verbose

# Use pre-built image (skip Docker build)
madengine build --tags sglang_disagg \
--use-image lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x \
--additional-context-file slurm-config.json

# Build on SLURM compute node instead of login node
madengine build --tags model \
--build-on-compute \
--additional-context-file slurm-config.json
```

**Required Context for Build:**
@@ -170,6 +182,159 @@ When using `--batch-manifest`, provide a JSON file with selective build configuration:

See [Batch Build Guide](batch-build.md) for details.

**Pre-built Image Mode (`--use-image`):**

Skip Docker build and use an existing image from a registry or local Docker cache:

```bash
# Auto-detect image from model card's DOCKER_IMAGE_NAME env var
madengine build --tags sglang_disagg \
--use-image \
--additional-context-file config.json

# Explicitly specify image from Docker Hub
madengine build --tags sglang_disagg \
--use-image lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x \
--additional-context-file config.json

# Use image from NGC
madengine build --tags model \
--use-image nvcr.io/nvidia/pytorch:24.01-py3

# Use locally cached image
madengine build --tags model \
--use-image my-local-image:latest
```

**Image Resolution Priority:**
1. If `--use-image <name>` is specified, use that image
2. If `--use-image` (no value), auto-detect from model card's `DOCKER_IMAGE_NAME` env var
3. If no image found in model card, error with helpful suggestions
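
The priority above can be sketched as a small resolver (illustrative logic only, not madengine's actual implementation):

```shell
# Hypothetical sketch of the image resolution order; not madengine source.
resolve_image() {
  explicit="$1"     # value given to --use-image, may be empty
  card_image="$2"   # model card's DOCKER_IMAGE_NAME, may be empty
  if [ -n "$explicit" ]; then
    echo "$explicit"            # 1. explicit --use-image <name> wins
  elif [ -n "$card_image" ]; then
    echo "$card_image"          # 2. fall back to the model card
  else
    echo "error: no image given and model card has no DOCKER_IMAGE_NAME" >&2
    return 1                    # 3. fail with a helpful message
  fi
}
```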

**Multiple Models Warning:**
When using auto-detection with multiple models that have different `DOCKER_IMAGE_NAME` values, the first model's image is used and a warning is printed.

**Mutual Exclusivity:**
- `--use-image` cannot be used with `--registry` (push requires local build)
- `--use-image` cannot be used with `--build-on-compute` (skip build vs. build on compute)

**When to use `--use-image`:**
- Using official framework images (SGLang, vLLM, etc.)
- Image is pre-cached on compute nodes
- Testing without rebuilding
- CI/CD pipelines with external images

The generated manifest marks the image with `"prebuilt": true` and `"build_time": 0`.
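
A manifest entry for a pre-built image might look roughly like this (field names beyond `prebuilt` and `build_time` are illustrative):

```json
{
  "image": "lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x",
  "prebuilt": true,
  "build_time": 0
}
```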

**Build on Compute Node (`--build-on-compute`):**

Build the Docker image on a SLURM compute node, push it to a registry, and pull it in parallel on all nodes during the run phase:

```bash
# Build on compute node and push to registry (--registry REQUIRED)
madengine build --tags model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context-file slurm-config.json
```

**Required:** `--registry` must be specified with `--build-on-compute`.

**SLURM Config Priority:**
1. Model card's `slurm` section (base configuration)
2. `--additional-context` overrides (command line takes precedence)

If the model card already has `slurm` config, you only need to provide missing or override values:

```bash
# Model card has partition/time, just override reservation
madengine build --tags model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context '{"slurm": {"reservation": "my-res"}}'
```
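
For example, if the model card's `slurm` section already declares `partition` and `time`, the command above would yield a merged config along these lines (values illustrative):

```json
{
  "slurm": {
    "partition": "gpu",
    "time": "04:00:00",
    "reservation": "my-res"
  }
}
```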

**When to use `--build-on-compute`:**
- Login node has limited disk space or resources
- Build requires GPU access (e.g., AOT compilation)
- Login node policies prohibit heavy workloads
- Distributing images to many compute nodes (build once, pull everywhere)

**How it works:**

*Build Phase:*
1. Discovers model and merges SLURM config (model card + additional-context)
2. Submits build job to **1 compute node** via `sbatch --wait`
3. Builds Docker image on that node
4. Pushes image to registry
5. Generates manifest with registry image name

*Run Phase:*
1. Detects `built_on_compute: true` in manifest
2. Pulls image **in parallel on ALL nodes** via `srun docker pull`
3. Executes model script
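
The two phases above roughly correspond to these commands (flag and script names are assumptions for illustration, not madengine internals):

```shell
# Hypothetical sketch: print the commands each phase would issue.
build_phase_cmd() {
  partition="$1"
  # Build runs on exactly one compute node; --wait blocks until the job finishes
  echo "sbatch --wait --nodes=1 --partition=$partition build_image.sbatch"
}

run_phase_pull_cmd() {
  image="$1"
  # Pull happens in parallel on every allocated node
  echo 'srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES docker pull '"$image"
}

build_phase_cmd gpu
run_phase_pull_cmd docker.io/myorg/model:latest
```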

**Inside existing SLURM allocation:**

If you're already inside an `salloc` allocation, `--build-on-compute` uses `srun` directly instead of submitting a new job.

**Error Messages:**

If required SLURM fields are missing, specific errors are shown:
- Missing `partition`: "Add partition to model card's slurm section or via --additional-context"
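
For instance, a missing `partition` can be supplied through additional context (partition name illustrative):

```json
{
  "slurm": {
    "partition": "gpu"
  }
}
```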

---

**Multi-Node SLURM Launcher (`slurm_multi`):**

Models using the `slurm_multi` launcher (for multi-node distributed inference) **require** either `--registry` or `--use-image`:

```bash
# Option 1: Build and push to registry
madengine build --tags sglang_model \
--registry docker.io/myorg \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Option 2: Use pre-built image from registry
madengine build --tags sglang_model \
--use-image docker.io/myorg/sglang:latest

# Option 3: Build on compute and push
madengine build --tags sglang_model \
--build-on-compute \
--registry docker.io/myorg \
--additional-context-file config.json
```

**Why this requirement?**

Multi-node SLURM jobs run on multiple compute nodes. Each node needs access to the Docker image:
- Local builds only exist on the login/build node
- Compute nodes cannot access locally built images
- Registry images enable parallel `docker pull` on all nodes

**Parallel Image Pull:**

During `madengine run`, images from a registry are automatically pulled in parallel on all allocated nodes:

```bash
srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES docker pull <image>
```

This ensures fast, consistent image availability across the cluster.

**Re-using Images:**

For subsequent runs with the same image, use `--use-image` to skip building:

```bash
# First run: build and push
madengine build --tags model --registry docker.io/myorg

# Subsequent runs: use pre-built image
madengine build --tags model --use-image docker.io/myorg/model:latest
```

---

### `run` - Execute Models
87 changes: 87 additions & 0 deletions docs/deployment.md
@@ -258,6 +258,39 @@ SLURM automatically provides:
- Network interface configuration
- Rank assignment via `$SLURM_PROCID`

### SLURM Allocation Detection

madengine automatically detects if you're running inside an existing SLURM allocation (via `salloc`):

```bash
# Allocate nodes interactively
salloc -N 3 -p gpu --gpus-per-node=8 -t 04:00:00

# madengine detects the allocation automatically
madengine run --manifest-file build_manifest.json
# Output: ✓ Detected existing SLURM allocation: Job 12345
# Allocation has 3 nodes available
```

**Behavior inside allocation:**
- Uses `srun` directly instead of `sbatch`
- Validates requested nodes ≤ available nodes
- Warns if using fewer nodes than allocated
- Skips job submission (already allocated)

**Build inside allocation:**

```bash
# Inside salloc session
madengine build --tags model --build-on-compute
# Uses srun instead of sbatch --wait
```

**Environment variables detected:**
- `SLURM_JOB_ID` - Indicates inside allocation
- `SLURM_NNODES` - Number of nodes available
- `SLURM_NODELIST` - List of allocated nodes
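
A minimal sketch of this detection check (assumed logic, not madengine source):

```shell
# Returns success when running inside a SLURM allocation.
inside_slurm_allocation() {
  # SLURM exports SLURM_JOB_ID inside salloc/sbatch allocations
  [ -n "${SLURM_JOB_ID:-}" ]
}

if inside_slurm_allocation; then
  echo "Detected existing SLURM allocation: Job $SLURM_JOB_ID (${SLURM_NNODES:-1} nodes)"
else
  echo "No active SLURM allocation; jobs will be submitted via sbatch"
fi
```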

### Monitoring

```bash
@@ -372,6 +405,60 @@ scancel -u $USER
}
```

### Baremetal Execution (slurm_multi)

For disaggregated inference workloads like SGLang Disaggregated, madengine supports baremetal execution where the model's `.slurm` script manages Docker containers directly:

```json
{
"slurm": {
"partition": "gpu",
"nodes": 3,
"gpus_per_node": 8,
"time": "04:00:00"
},
"distributed": {
"launcher": "slurm_multi",
"nnodes": 3,
"nproc_per_node": 8,
"sglang_disagg": {
"prefill_nodes": 1,
"decode_nodes": 1
}
}
}
```

**How baremetal execution works:**
1. madengine generates a wrapper script (not a Docker container)
2. The wrapper runs the model's `.slurm` script directly on baremetal
3. The `.slurm` script manages Docker containers via `srun`
4. Environment variables from `models.json` and `additional-context` are passed through
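
The generated wrapper can be pictured roughly like this (variable and file names are illustrative assumptions, not madengine's actual output):

```shell
# Hypothetical wrapper generator: emits a script that runs the model's
# .slurm file on baremetal, with context passed through as env vars.
make_wrapper() {
  image="$1"
  slurm_script="$2"
  cat <<EOF
#!/bin/sh
export MAD_MODEL_IMAGE="$image"   # image each node pulls via srun docker pull
exec bash "$slurm_script"         # .slurm script orchestrates containers itself
EOF
}

make_wrapper docker.io/myorg/sglang:latest scripts/sglang_disagg/run.slurm
```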

**When to use `slurm_multi`:**
- SGLang Disaggregated inference (proxy + prefill + decode nodes)
- Workloads requiring direct SLURM node control
- Custom Docker orchestration via `.slurm` scripts

**Registry Requirement:**

Models using `slurm_multi` **require** either `--registry` or `--use-image` during build:

```bash
# Option 1: Build and push to registry
madengine build --tags model --registry docker.io/myorg

# Option 2: Use pre-built image
madengine build --tags model --use-image

# Option 3: Build on compute and push
madengine build --tags model --build-on-compute --registry docker.io/myorg
```

This ensures all compute nodes can pull the image in parallel during `madengine run`.

See [Launchers Guide](launchers.md#7-sglang-disaggregated-new) for detailed configuration.

## Troubleshooting

### Kubernetes Issues
30 changes: 21 additions & 9 deletions docs/launchers.md
@@ -364,7 +364,7 @@ SGLang Disaggregated separates inference into specialized node pools:
```json
{
"distributed": {
"launcher": "sglang-disagg",
"launcher": "slurm_multi",
"nnodes": 5,
"nproc_per_node": 8,
"sglang_disagg": {
@@ -403,7 +403,7 @@ Override automatic split based on workload characteristics:
```json
{
"distributed": {
"launcher": "sglang-disagg",
"launcher": "slurm_multi",
"nnodes": 7,
"nproc_per_node": 8,
"sglang_disagg": {
@@ -436,6 +436,18 @@ Override automatic split based on workload characteristics:
- Ray cluster coordination
- No torchrun needed (manages own processes)

**Registry Requirement (SLURM)**:

Models using `slurm_multi` launcher **require** `--registry` or `--use-image` during build:

```bash
madengine build --tags model --registry docker.io/myorg
# OR
madengine build --tags model --use-image
```

This ensures all compute nodes can pull the image in parallel during `madengine run`.

**Environment Variables (K8s)**:
```bash
POD_INDEX=${JOB_COMPLETION_INDEX} # Pod index for role assignment
SGLANG_NODE_IPS="10.0.0.1,10.0.0.2,..."
```

**Examples**:
- K8s Minimal: `examples/k8s-configs/minimal/sglang-disagg-minimal.json`
- K8s Basic: `examples/k8s-configs/basic/sglang-disagg-multi-node-basic.json`
- K8s Custom: `examples/k8s-configs/basic/sglang-disagg-custom-split.json`
- SLURM Minimal: `examples/slurm-configs/minimal/sglang-disagg-minimal.json`
- SLURM Basic: `examples/slurm-configs/basic/sglang-disagg-multi-node.json`
- SLURM Custom: `examples/slurm-configs/basic/sglang-disagg-custom-split.json`
- K8s Minimal: `examples/k8s-configs/minimal/slurm-multi-minimal.json`
- K8s Basic: `examples/k8s-configs/basic/slurm-multi-multi-node-basic.json`
- K8s Custom: `examples/k8s-configs/basic/slurm-multi-custom-split.json`
- SLURM Minimal: `examples/slurm-configs/minimal/slurm-multi-minimal.json`
- SLURM Basic: `examples/slurm-configs/basic/slurm-multi-multi-node.json`
- SLURM Custom: `examples/slurm-configs/basic/slurm-multi-custom-split.json`

**Comparison: SGLang vs SGLang Disaggregated**:

@@ -681,7 +693,7 @@ SGLANG_NODE_RANK=${SLURM_PROCID}
```bash
Error: Unknown launcher type 'xyz'
```
Solution: Use one of: `torchrun`, `deepspeed`, `megatron`, `torchtitan`, `vllm`, `sglang`, `sglang-disagg`
Solution: Use one of: `torchrun`, `deepspeed`, `megatron`, `torchtitan`, `vllm`, `sglang`, `slurm_multi`

**2. Multi-Node Communication Fails**
```bash