Add vLLM DSv4 FP8 MI355X benchmark (vllm#40889) #1188

Oseltamivir wants to merge 23 commits into main
Conversation
Add benchmark config for DeepSeek-V4-Pro FP8 on MI355X using vLLM with AITER-accelerated MLA decode from vllm-project/vllm#40889 (stacked on #40871 for base ROCm DSv4 support).

- New benchmark script that overlays PR #40889's Python-only changes (3 files) on top of an image containing #40871's compiled C++ kernels
- YAML config with TP=4 and TP=8, concurrency 4-64, for 1k1k and 8k1k
- Runner updated to try framework-specific script names first (e.g. dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, resolving the DSv4 SGLang/vLLM naming collision without renaming existing scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
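The framework-specific-then-generic script lookup can be sketched as a small shell function. This is a hypothetical illustration of the resolution order described above, not the runner's actual code; the function and directory names are assumptions.

```shell
# Try "<base>_<framework>.sh" first, then fall back to "<base>.sh".
# Prints the first name that exists in the given directory.
resolve_benchmark_script() {
  local dir="$1" base="$2" framework="$3"
  local candidate
  for candidate in "${base}_${framework}.sh" "${base}.sh"; do
    if [ -f "${dir}/${candidate}" ]; then
      printf '%s\n' "${candidate}"
      return 0
    fi
  done
  return 1
}
```

With both `dsv4_fp8_mi355x_vllm.sh` and `dsv4_fp8_mi355x.sh` present, the vLLM runner picks the framework-specific file, while existing generic scripts keep working unrenamed.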
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Much of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
The nightly doesn't contain #40871 yet. Switch to v0.19.1 as a stable base with a full ROCm toolchain, and rebuild vLLM from the PR branch (which includes both #40871's C++ kernels and #40889's AITER MLA decode) at runtime via pip install --no-build-isolation -e . Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PR branch imports mori which requires a newer torch/HIP than v0.19.1 ships. The nightly has the matching libs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vllm/vllm-openai-rocm:nightly image targets MI300X/MI325X and cannot enumerate MI355X GPUs, causing torch.accelerator.device_count() to return too few and tripping the DP rank bounds assertion. Switch to rocm/atom:rocm7.2.2 which has MI355X support, aiter with MLA decode, and PyTorch 2.10. Also drop TP=4 (model doesn't fit) and add --no-deps to protect the base image's pinned packages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
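The failure mode above reduces to a simple bounds check: the under-reporting device count makes the requested data-parallel size exceed what the runtime can see. The guard below is a hypothetical, torch-free illustration of that assertion, not vLLM's actual code.

```python
def check_dp_rank_bounds(visible_devices: int, dp_size: int) -> None:
    """Hypothetical guard mirroring the DP rank bounds assertion described above.

    On an image that cannot enumerate MI355X GPUs,
    torch.accelerator.device_count() under-reports, so dp_size exceeds
    visible_devices and the assertion fires.
    """
    assert dp_size <= visible_devices, (
        f"DP size {dp_size} exceeds {visible_devices} visible devices"
    )
```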
The ATOM image lacks setuptools-scm which vLLM's setup.py requires. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
--no-deps left vLLM runtime deps (cbor2 etc.) uninstalled. The ATOM image's plugin also causes a circular import when loaded by the PR branch's vLLM. Fix both: let pip resolve deps normally, and set VLLM_PLUGINS="" to skip the ATOM platform plugin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
--force-reinstall caused pip to re-download torch from PyPI (CUDA build), overwriting the ATOM image's ROCm torch and losing libtorch_hip.so. Without it, pip installs vLLM fresh and only adds missing deps without touching already-satisfied packages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the install: --no-deps for vLLM itself (builds C++ extensions against the image's ROCm torch), then install runtime deps from requirements/rocm.txt constrained by a pip freeze snapshot of the ROCm packages (torch, torchvision, aiter, triton). This prevents pip from replacing them with incompatible CUDA builds from PyPI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
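The constraints-file stage of that split install can be sketched as below. The `pip freeze` output is simulated here so the snippet is self-contained; in the real flow the generated file is then passed to pip (roughly `pip install --no-deps -e .` followed by `pip install -r requirements/rocm.txt -c <constraints file>`), and the version strings are placeholders, not the image's actual pins.

```shell
# Snapshot the image's pinned ROCm packages into a pip constraints file so a
# later dependency install cannot replace them with CUDA builds from PyPI.
workdir=$(mktemp -d)
printf '%s\n' \
  'torch==2.10.0+rocm7.2' \
  'torchvision==0.25.0+rocm7.2' \
  'aiter==0.1.5' \
  'triton==3.5.0' \
  'requests==2.32.3' > "$workdir/freeze.txt"   # stand-in for: pip freeze
grep -E '^(torch|torchvision|aiter|triton)==' "$workdir/freeze.txt" \
  > "$workdir/rocm-constraints.txt"
cat "$workdir/rocm-constraints.txt"
```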
The ATOM image's /triton-test/ build directory is cleaned up by the Dockerfile, leaving a stale editable install. pip chokes when resolving xgrammar's transitive deps through the missing path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ATOM image may have multiple packages installed from /triton-test/. Remove direct_url.json from any dist-info that references the cleaned-up build directory so pip's resolver doesn't follow the stale path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
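The direct_url.json cleanup can be sketched as a small helper that walks the dist-info directories and deletes any record pointing into the vanished build path. This is a hypothetical sketch of the approach described above; the function name and the use of a URL prefix match are assumptions.

```python
import json
import pathlib


def prune_stale_direct_urls(site_packages: pathlib.Path, stale_prefix: str) -> list[str]:
    """Remove direct_url.json files whose recorded source URL points into a
    build directory that no longer exists, so pip's resolver stops following it."""
    removed = []
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        direct_url = dist_info / "direct_url.json"
        if not direct_url.exists():
            continue
        url = json.loads(direct_url.read_text()).get("url", "")
        if url.startswith(stale_prefix):
            direct_url.unlink()
            removed.append(dist_info.name)
    return removed
```

For the ATOM image this would be called with the environment's site-packages path and a prefix like `file:///triton-test`.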
xgrammar's dependency chain resolves to a triton package that was editable-installed from /triton-test/ in the ATOM image build stage. That directory is cleaned up in the final image, so pip errors trying to process the path. xgrammar is not needed for serving benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
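Dropping xgrammar before the dependency install can be done by filtering the requirements file. The snippet below is a self-contained sketch with simulated file contents; in the real flow the input would be requirements/rocm.txt.

```shell
# Filter xgrammar (not needed for serving benchmarks) out of the requirements
# file so pip never walks its broken transitive dependency chain.
workdir=$(mktemp -d)
printf 'cbor2\nxgrammar>=0.1.0\nnumpy\n' > "$workdir/rocm.txt"   # simulated contents
grep -v '^xgrammar' "$workdir/rocm.txt" > "$workdir/rocm-filtered.txt"
cat "$workdir/rocm-filtered.txt"
```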
Summary

- Runner updated to try framework-specific script names first (e.g. dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, avoiding rename of existing scripts

Image dependency

The image must contain PR #40871's compiled C++ kernels (base DSv4 ROCm support). PR #40889 is Python-only (3 files) and is patched at runtime at a pinned SHA (b3a4a44). Update the image tag once #40871 merges into a nightly or release.

Test plan

- generate_sweep_configs.py test-config --config-keys dsv4-fp8-mi355x-vllm generates 20 valid matrix entries
- Existing configs (kimik2.5-int4-mi355x-vllm, dsv4-fp8-mi355x-sglang) still resolve correctly

🤖 Generated with Claude Code