SemiAnalysisAI/microbench-blackwell

Blackwell Microbenchmarks

A collection of microbenchmarks for NVIDIA Blackwell (SM 100) GPUs, covering memory throughput, latency, tensor core (UMMA) performance, and HBM-resident elementwise throughput.

https://newsletter.semianalysis.com/p/dissecting-nvidia-blackwell-tensor

Benchmarks

Path                     Purpose
ldgsts_throughput/       LDGSTS HBM throughput
tma2d_throughput/        TMA 2D HBM throughput
ldgsts_latency/          LDGSTS latency
tma2d_latency/           TMA 2D latency
umma_throughput/         UMMA tensor-core throughput
umma_latency/            UMMA tensor-core latency
elementwise_throughput/  fp32 HBM-resident activation/elementwise throughput
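
For readers new to these metrics, the sketch below shows how throughput and latency figures of this kind are commonly derived from raw measurements. It is an illustration only, not code from this repository: the function names, parameters, and the 2-FLOPs-per-multiply-accumulate convention are assumptions, and the benchmarks here may count bytes, operations, or cycles differently.

```python
# Hypothetical helpers, not part of this repository: they only illustrate
# the arithmetic behind typical throughput and latency numbers.

def bandwidth_gb_per_s(bytes_moved: float, elapsed_s: float) -> float:
    """Achieved memory bandwidth in GB/s, taking 1 GB = 1e9 bytes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_moved / elapsed_s / 1e9


def matmul_tflops(m: int, n: int, k: int, iters: int, elapsed_s: float) -> float:
    """Matrix-multiply rate in TFLOP/s, counting one multiply-accumulate as 2 FLOPs."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 2.0 * m * n * k * iters / elapsed_s / 1e12


def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a latency measured in clock cycles to nanoseconds."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return cycles / clock_hz * 1e9
```

For example, under these assumptions, moving 8e9 bytes in one second corresponds to 8 GB/s, and a latency of 1000 cycles at a 1 GHz clock corresponds to 1000 ns.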

Acknowledgements

Compute for this project is generously sponsored by Nebius and Verda.
