
Add vLLM DSv4 FP8 MI355X benchmark (vllm#40889)#1188

Open
Oseltamivir wants to merge 23 commits into main from dsv4-fp8-mi355x-vllm

Conversation

@Oseltamivir
Collaborator

Summary

  • Add vLLM benchmark for DeepSeek-V4-Pro FP8 on MI355X with AITER-accelerated MLA decode from vllm-project/vllm#40889 (stacked on #40871)
  • Update MI355X runner to resolve framework-specific script names (e.g. dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, avoiding a rename of existing scripts (see the sketch after this list)
  • YAML config: TP=4 and TP=8, concurrency 4–64, 1k1k and 8k1k
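A minimal sketch of the resolution order from the second bullet; the variable names and scripts directory are illustrative assumptions, not the runner's actual code:

```bash
# Try the framework-qualified script first, then fall back to the
# generic name, so existing scripts keep working without a rename.
model="dsv4_fp8_mi355x"
framework="vllm"
script="scripts/${model}_${framework}.sh"   # e.g. dsv4_fp8_mi355x_vllm.sh
[[ -f "$script" ]] || script="scripts/${model}.sh"
bash "$script"
```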

Image dependency

The image must contain the compiled C++ kernels from PR #40871 (base DSv4 ROCm support). PR #40889 is Python-only (3 files) and is overlaid at runtime from a pinned SHA (b3a4a44); a hedged sketch follows. Update the image tag once #40871 lands in a nightly or release.
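A sketch of what that runtime overlay could look like, assuming the image ships vLLM as a git checkout that already includes #40871; the checkout path is an assumption, and the three file paths are deliberately not hard-coded here:

```bash
# Overlay #40889's Python-only changes at the pinned SHA. Since the
# image's tree already contains #40871, the remaining diff against the
# pin should be exactly the files #40889 touches.
cd /opt/vllm                                  # assumed checkout location
git fetch https://github.com/vllm-project/vllm pull/40889/head
git diff --name-only HEAD b3a4a44 -- '*.py' \
  | xargs -r git checkout b3a4a44 --
```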

Test plan

  • generate_sweep_configs.py test-config --config-keys dsv4-fp8-mi355x-vllm generates 20 valid matrix entries (invocations sketched after this list)
  • Existing configs (kimik2.5-int4-mi355x-vllm, dsv4-fp8-mi355x-sglang) still resolve correctly
  • All 88 sweep config tests pass
  • E2E run on MI355X once an image with #40871 is available
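For reference, the first two bullets correspond to invocations like these; the repo-relative path is an assumption, and the flag syntax is taken from the first bullet:

```bash
# New key: should emit 20 valid matrix entries.
python generate_sweep_configs.py test-config --config-keys dsv4-fp8-mi355x-vllm
# Existing keys: should still resolve correctly.
python generate_sweep_configs.py test-config --config-keys kimik2.5-int4-mi355x-vllm
python generate_sweep_configs.py test-config --config-keys dsv4-fp8-mi355x-sglang
```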

🤖 Generated with Claude Code

Add benchmark config for DeepSeek-V4-Pro FP8 on MI355X using vLLM with
AITER-accelerated MLA decode from vllm-project/vllm#40889 (stacked on
#40871 for base ROCm DSv4 support).

- New benchmark script that overlays PR #40889's Python-only changes
  (3 files) on top of an image containing #40871's compiled C++ kernels
- YAML config with TP=4 and TP=8, concurrency 4-64, for 1k1k and 8k1k
- Runner updated to try framework-specific script names first (e.g.
  dsv4_fp8_mi355x_vllm.sh) with fallback to generic names, resolving
  the DSv4 SGLang/vLLM naming collision without renaming existing scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR against those docs first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Oseltamivir and others added 19 commits April 26, 2026 14:56
The nightly doesn't contain #40871 yet. Switch to v0.19.1 as a stable
base with full ROCm toolchain, and rebuild vLLM from the PR branch
(includes both #40871 C++ kernels and #40889 AITER MLA decode) at
runtime via pip install --no-build-isolation -e .

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
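A sketch of the rebuild this commit describes (later commits in this stack refine the install flags); the clone path is an assumption:

```bash
# Fetch the PR branch (stacked #40871 + #40889) and build its C++
# extensions against the image's ROCm torch.
git clone https://github.com/vllm-project/vllm /tmp/vllm && cd /tmp/vllm
git fetch origin pull/40889/head && git checkout FETCH_HEAD
pip install --no-build-isolation -e .
```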
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PR branch imports mori which requires a newer torch/HIP than
v0.19.1 ships. The nightly has the matching libs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vllm/vllm-openai-rocm:nightly image targets MI300X/MI325X and
cannot enumerate MI355X GPUs, causing torch.accelerator.device_count()
to return too few devices and trip the DP rank bounds assertion. Switch
to rocm/atom:rocm7.2.2, which has MI355X support, aiter with MLA decode,
and PyTorch 2.10. Also drop TP=4 (the model doesn't fit) and add
--no-deps to protect the base image's pinned packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ATOM image lacks setuptools-scm which vLLM's setup.py requires.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
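The fix implied above is a one-line install before the build step (the commit gives no version pin):

```bash
pip install setuptools-scm   # required by vLLM's setup.py, absent from the ATOM image
```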
--no-deps left vLLM runtime deps (cbor2 etc.) uninstalled. The ATOM
image's plugin also causes a circular import when loaded by the PR
branch's vLLM. Fix both: let pip resolve deps normally, and set
VLLM_PLUGINS="" to skip the ATOM platform plugin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
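A sketch of the second fix; VLLM_PLUGINS="" is quoted from the commit message, while the launch line is illustrative:

```bash
# An empty plugin list stops vLLM from importing the ATOM platform
# plugin, avoiding the circular import described above.
export VLLM_PLUGINS=""
vllm serve "$MODEL"   # $MODEL: the DeepSeek-V4-Pro checkpoint, set elsewhere by the benchmark script
```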
--force-reinstall caused pip to re-download torch from PyPI (CUDA
build), overwriting the ATOM image's ROCm torch and losing
libtorch_hip.so. Without it, pip installs vLLM fresh and only adds
missing deps without touching already-satisfied packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the install: --no-deps for vLLM itself (builds C++ extensions
against the image's ROCm torch), then install runtime deps from
requirements/rocm.txt constrained by a pip freeze snapshot of the
ROCm packages (torch, torchvision, aiter, triton). This prevents
pip from replacing them with incompatible CUDA builds from PyPI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
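A sketch of that split install, assuming the pinned set is exactly torch, torchvision, aiter, and triton (editable installs may not appear as pkg==ver in pip freeze and could need manual pinning):

```bash
# Snapshot the image's ROCm builds so pip cannot swap in CUDA wheels.
pip freeze | grep -E '^(torch|torchvision|aiter|triton)==' > /tmp/rocm-pins.txt
# Build vLLM itself against the image's torch, deps untouched.
pip install --no-build-isolation --no-deps -e .
# Resolve runtime deps, constrained to the snapshot.
pip install -r requirements/rocm.txt -c /tmp/rocm-pins.txt
```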
The ATOM image's /triton-test/ build directory is cleaned up by the
Dockerfile, leaving a stale editable install. pip chokes when resolving
xgrammar's transitive deps through the missing path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ATOM image may have multiple packages installed from /triton-test/.
Remove direct_url.json from any dist-info that references the cleaned-up
build directory so pip's resolver doesn't follow the stale path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
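One way to express that cleanup; the site-packages path is discovered rather than hard-coded, and matching on the /triton-test/ string is an assumption about how the stale references look:

```bash
# Delete direct_url.json from any dist-info that still points at the
# removed /triton-test/ build directory.
site=$(python3 -c 'import site; print(site.getsitepackages()[0])')
grep -l '/triton-test/' "$site"/*.dist-info/direct_url.json 2>/dev/null \
  | xargs -r rm -f
```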
xgrammar's dependency chain resolves to a triton package that was
editable-installed from /triton-test/ in the ATOM image build stage.
That directory is cleaned up in the final image, so pip errors trying
to process the path. xgrammar is not needed for serving benchmarks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
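The commit does not show the exact mechanism; one plausible sketch is filtering xgrammar out of the requirements before the constrained install, reusing the pin file from the split-install sketch above:

```bash
# Skip xgrammar (and its broken editable-triton edge) for serving benchmarks.
grep -v '^xgrammar' requirements/rocm.txt > /tmp/rocm-reqs.txt
pip install -r /tmp/rocm-reqs.txt -c /tmp/rocm-pins.txt
```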