SGLang MXFP8 on Ascend NPU Research

This repository (sglang_quant_eval) researches and implements MXFP8/MXFP4 quantization support for SGLang on Huawei Ascend NPU hardware.

🎯 Project Objective

Target: Adapt SGLang's quantization system to support Huawei Ascend NPU using MXFP8 and MXFP4 data formats.
Supported Models: Both standard LLMs (e.g., Qwen3, Qwen3.5, Llama, DeepSeek) via srt and Diffusion models (e.g., Wan2.2) via the multimodal_gen subsystem.
Related Issues: sgl-project/sglang#14424 (Diffusion), sgl-project/sglang#21584 (LLMs)

📁 Repository Structure

sglang/ - The core SGLang source code repository (submodule/clone) where the modifications will be made.
MindIE-SD/ - Huawei's MindIE-SD source code (submodule/clone), serving as a primary reference implementation for Ascend NPU MXFP8/FP8 operations (Diffusion).
vllm-ascend/ - vLLM backend code for Ascend (submodule/clone), serving as a primary reference for LLM MXFP adaptation.
sglang_mxfp8_ascend_research.md / _zh.md - Comprehensive research report, analysis, and implementation plan for the MXFP8 adaptation in English and Chinese.
README.md / README_zh.md - Project description and guide in English and Chinese.
CLAUDE.md - AI assistant system instructions and project context.
.agents/ & .claude/ - Custom agent skills and configurations that help AI assistants read the codebase and write Gitmoji commits.

🚀 Implementation Paths

Based on our research (detailed in the research report), there are two main paths for MXFP8 adaptation:

Offline Quantization (msmodelslim): Adapting SGLang to load pre-quantized MXFP8 weights produced by Huawei's msmodelslim tool. This involves adding an MXFP8 scheme to SGLang's existing msmodelslim scheme framework.
Online Quantization: Implementing dynamic MXFP8 quantization during inference directly from FP16/BF16 weights using --quantization mxfp8.

Both paths leverage core torch_npu APIs such as torch_npu.npu_dynamic_mx_quant and torch_npu.npu_quant_matmul.
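
To make the data format concrete, here is a minimal NumPy reference sketch of MX-style block quantization. It is not the torch_npu kernel and makes no claim about the signature or memory layout of torch_npu.npu_dynamic_mx_quant. It assumes the OCP Microscaling conventions: blocks of 32 elements along the last axis, one shared power-of-two (E8M0) scale per block, and FP8 E4M3 elements with a maximum magnitude of 448. All function names below are hypothetical and exist only for illustration.

```python
import numpy as np

BLOCK = 32          # elements sharing one scale (assumed, per OCP MX)
E4M3_MAX = 448.0    # largest finite E4M3 magnitude
E4M3_EMAX = 8       # exponent of the largest E4M3 binade (448 = 1.75 * 2**8)
E4M3_MIN_EXP = -6   # exponent of the smallest normal E4M3 value
MANT_BITS = 3       # E4M3 mantissa width


def round_to_e4m3(x):
    """Round values to the nearest E4M3-representable number, saturating at 448."""
    x = np.asarray(x, dtype=np.float64)
    mag = np.abs(x)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so floor(log2(mag)) = e - 1.
    _, e = np.frexp(mag)
    # Clamp so that all subnormals share the step of the smallest normal binade.
    exp = np.maximum(e - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANT_BITS)
    q = np.round(mag / step) * step  # np.round rounds half to even
    return np.sign(x) * np.minimum(q, E4M3_MAX)


def mxfp8_quantize(x):
    """Return (elements, shared_exponents) for blocks along the last axis."""
    x = np.asarray(x, dtype=np.float64)
    assert x.shape[-1] % BLOCK == 0, "last dimension must be a multiple of 32"
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    _, e = np.frexp(amax)
    # Shared exponent: floor(log2(amax)) - emax of the element format.
    shared_exp = np.where(amax > 0, e - 1 - E4M3_EMAX, 0)
    shared_exp = np.clip(shared_exp, -127, 127)  # E8M0 exponent range
    elems = round_to_e4m3(blocks / (2.0 ** shared_exp))
    scale_shape = x.shape[:-1] + (x.shape[-1] // BLOCK,)
    return elems.reshape(x.shape), shared_exp.reshape(scale_shape)


def mxfp8_dequantize(elems, shared_exp):
    """Reconstruct approximate values from elements and per-block exponents."""
    blocks = elems.reshape(-1, BLOCK)
    scale = 2.0 ** shared_exp.reshape(-1, 1)
    return (blocks * scale).reshape(elems.shape)


if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal((4, 64))
    q, exps = mxfp8_quantize(w)
    w_hat = mxfp8_dequantize(q, exps)
    print("scale tensor shape:", exps.shape)
    print("max abs error:", np.abs(w - w_hat).max())
```

A reference like this can serve as a numerical baseline when comparing the offline and online paths, since both should dequantize to values close to what the emulation produces, up to rounding-mode differences in the real kernels.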

💻 Environment Requirements

To develop and run the code in this repository, the following environment is required:

Hardware: Huawei Ascend NPU (e.g., Atlas 800I A2/A3)
Software: CANN >= 8.0.RC3 (required for npu_dynamic_mx_quant and MXFP8 support)
Dependencies: torch, torch_npu, and sglang dependencies.
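
Before starting development, it can help to confirm that the packages and operators named above are visible to Python. The following is a small sketch under stated assumptions: it only checks that torch, torch_npu, and sglang can be found, and that the installed torch_npu exposes attributes called npu_dynamic_mx_quant and npu_quant_matmul. It does not verify the CANN version or run anything on the NPU, and the helper name check_environment is hypothetical.

```python
import importlib
import importlib.util

REQUIRED_MODULES = ("torch", "torch_npu", "sglang")
REQUIRED_OPS = ("npu_dynamic_mx_quant", "npu_quant_matmul")


def check_environment():
    """Return a dict mapping each requirement to True/False."""
    report = {}
    for name in REQUIRED_MODULES:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None

    # Assume the operators are missing until proven otherwise.
    for op in REQUIRED_OPS:
        report[f"torch_npu.{op}"] = False

    if report["torch"] and report["torch_npu"]:
        try:
            torch_npu = importlib.import_module("torch_npu")
        except Exception:
            # Installed but not importable, e.g. CANN libraries not on the path.
            report["torch_npu"] = False
        else:
            for op in REQUIRED_OPS:
                report[f"torch_npu.{op}"] = hasattr(torch_npu, op)
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {requirement}")
```

On a machine without an Ascend software stack, the script simply reports the torch_npu entries as missing rather than failing.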

🔧 AI Agent Skills

This repository includes custom tools in .agents/skills to assist with development:

sglang-quant-lookup: Quickly find SGLang quantization implementation details.
npu-api-check: Analyze torch_npu API usage patterns.
compare-impl: Compare implementations between SGLang and MindIE-SD.
trace-quant-path: Trace the full code path for a quantization method in SGLang.
check-issue: Check the latest status of SGLang GitHub issues/PRs related to our work.
gitmoji_commit: Automatically generate Gitmoji-compliant commit messages.
