Add vLLM DSv4 FP8 MI355X benchmark (vllm#40889) #1188

Oseltamivir wants to merge 23 commits into main
Conversation
Add benchmark config for DeepSeek-V4-Pro FP8 on MI355X using vLLM with AITER-accelerated MLA decode from vllm-project/vllm#40889 (stacked on #40871 for base ROCm DSv4 support).

- New benchmark script that overlays PR #40889's Python-only changes (3 files) on top of an image containing #40871's compiled C++ kernels
- YAML config with TP=4 and TP=8, concurrency 4-64, for 1k1k and 8k1k
- Runner updated to try framework-specific script names first (e.g. dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, resolving the DSv4 SGLang/vLLM naming collision without renaming existing scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
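The framework-specific-then-generic script lookup can be sketched as a small shell function. This is a hypothetical illustration of the resolution order described above, not the runner's actual code; the function and directory names are assumptions.

```shell
# Try "<base>_<framework>.sh" first, then fall back to "<base>.sh".
# Prints the first name that exists in the given directory.
resolve_benchmark_script() {
  local dir="$1" base="$2" framework="$3"
  local candidate
  for candidate in "${base}_${framework}.sh" "${base}.sh"; do
    if [ -f "${dir}/${candidate}" ]; then
      printf '%s\n' "${candidate}"
      return 0
    fi
  done
  return 1
}
```

With both `dsv4_fp8_mi355x_vllm.sh` and `dsv4_fp8_mi355x.sh` present, the vLLM runner picks the framework-specific file, while existing generic scripts keep working unrenamed.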
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Much of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
The nightly doesn't contain #40871 yet. Switch to v0.19.1 as a stable base with a full ROCm toolchain, and rebuild vLLM from the PR branch (which includes both #40871's C++ kernels and #40889's AITER MLA decode) at runtime via pip install --no-build-isolation -e . Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PR branch imports mori which requires a newer torch/HIP than v0.19.1 ships. The nightly has the matching libs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vllm/vllm-openai-rocm:nightly image targets MI300X/MI325X and cannot enumerate MI355X GPUs, causing torch.accelerator.device_count() to return too few and tripping the DP rank bounds assertion. Switch to rocm/atom:rocm7.2.2 which has MI355X support, aiter with MLA decode, and PyTorch 2.10. Also drop TP=4 (model doesn't fit) and add --no-deps to protect the base image's pinned packages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
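The failure mode above reduces to a simple bounds check: the under-reporting device count makes the requested data-parallel size exceed what the runtime can see. The guard below is a hypothetical, torch-free illustration of that assertion, not vLLM's actual code.

```python
def check_dp_rank_bounds(visible_devices: int, dp_size: int) -> None:
    """Hypothetical guard mirroring the DP rank bounds assertion described above.

    On an image that cannot enumerate MI355X GPUs,
    torch.accelerator.device_count() under-reports, so dp_size exceeds
    visible_devices and the assertion fires.
    """
    assert dp_size <= visible_devices, (
        f"DP size {dp_size} exceeds {visible_devices} visible devices"
    )
```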
The ATOM image lacks setuptools-scm which vLLM's setup.py requires. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
--no-deps left vLLM runtime deps (cbor2 etc.) uninstalled. The ATOM image's plugin also causes a circular import when loaded by the PR branch's vLLM. Fix both: let pip resolve deps normally, and set VLLM_PLUGINS="" to skip the ATOM platform plugin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
--force-reinstall caused pip to re-download torch from PyPI (CUDA build), overwriting the ATOM image's ROCm torch and losing libtorch_hip.so. Without it, pip installs vLLM fresh and only adds missing deps without touching already-satisfied packages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the install: --no-deps for vLLM itself (builds C++ extensions against the image's ROCm torch), then install runtime deps from requirements/rocm.txt constrained by a pip freeze snapshot of the ROCm packages (torch, torchvision, aiter, triton). This prevents pip from replacing them with incompatible CUDA builds from PyPI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
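The constraints-file stage of that split install can be sketched as below. The `pip freeze` output is simulated here so the snippet is self-contained; in the real flow the generated file is then passed to pip (roughly `pip install --no-deps -e .` followed by `pip install -r requirements/rocm.txt -c <constraints file>`), and the version strings are placeholders, not the image's actual pins.

```shell
# Snapshot the image's pinned ROCm packages into a pip constraints file so a
# later dependency install cannot replace them with CUDA builds from PyPI.
workdir=$(mktemp -d)
printf '%s\n' \
  'torch==2.10.0+rocm7.2' \
  'torchvision==0.25.0+rocm7.2' \
  'aiter==0.1.5' \
  'triton==3.5.0' \
  'requests==2.32.3' > "$workdir/freeze.txt"   # stand-in for: pip freeze
grep -E '^(torch|torchvision|aiter|triton)==' "$workdir/freeze.txt" \
  > "$workdir/rocm-constraints.txt"
cat "$workdir/rocm-constraints.txt"
```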
The ATOM image's /triton-test/ build directory is cleaned up by the Dockerfile, leaving a stale editable install. pip chokes when resolving xgrammar's transitive deps through the missing path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ATOM image may have multiple packages installed from /triton-test/. Remove direct_url.json from any dist-info that references the cleaned-up build directory so pip's resolver doesn't follow the stale path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
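The direct_url.json cleanup can be sketched as a small helper that walks the dist-info directories and deletes any record pointing into the vanished build path. This is a hypothetical sketch of the approach described above; the function name and the use of a URL prefix match are assumptions.

```python
import json
import pathlib


def prune_stale_direct_urls(site_packages: pathlib.Path, stale_prefix: str) -> list[str]:
    """Remove direct_url.json files whose recorded source URL points into a
    build directory that no longer exists, so pip's resolver stops following it."""
    removed = []
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        direct_url = dist_info / "direct_url.json"
        if not direct_url.exists():
            continue
        url = json.loads(direct_url.read_text()).get("url", "")
        if url.startswith(stale_prefix):
            direct_url.unlink()
            removed.append(dist_info.name)
    return removed
```

For the ATOM image this would be called with the environment's site-packages path and a prefix like `file:///triton-test`.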
xgrammar's dependency chain resolves to a triton package that was editable-installed from /triton-test/ in the ATOM image build stage. That directory is cleaned up in the final image, so pip errors trying to process the path. xgrammar is not needed for serving benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
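Dropping xgrammar before the dependency install can be done by filtering the requirements file. The snippet below is a self-contained sketch with simulated file contents; in the real flow the input would be requirements/rocm.txt.

```shell
# Filter xgrammar (not needed for serving benchmarks) out of the requirements
# file so pip never walks its broken transitive dependency chain.
workdir=$(mktemp -d)
printf 'cbor2\nxgrammar>=0.1.0\nnumpy\n' > "$workdir/rocm.txt"   # simulated contents
grep -v '^xgrammar' "$workdir/rocm.txt" > "$workdir/rocm-filtered.txt"
cat "$workdir/rocm-filtered.txt"
```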
Summary

- Runner updated to try framework-specific script names first (e.g. dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, avoiding rename of existing scripts

Image dependency

The image must contain PR #40871's compiled C++ kernels (base DSv4 ROCm support). PR #40889 is Python-only (3 files) and is patched at runtime at a pinned SHA (b3a4a44). Update the image tag once #40871 merges into a nightly or release.

Test plan

- generate_sweep_configs.py test-config --config-keys dsv4-fp8-mi355x-vllm generates 20 valid matrix entries
- Existing configs (kimik2.5-int4-mi355x-vllm, dsv4-fp8-mi355x-sglang) still resolve correctly

🤖 Generated with Claude Code