Only add one LDS Barrier for one wave blockSizes by umangyadav · Pull Request #2253 · ROCm/rocMLIR

umangyadav · 2026-02-24T20:27:54Z

Motivation

Depends on #2250

For schedule=1, the pass generates the following program after barrier placement and before loop pipelining:

scf.for 
{
LDSBarrier (__bwd_barrier__)
GlobalLoad
DSWrite
LDSBarrier (__fwd_barrier__)
DSRead + MFMA
}

After loop pipelining and pushDownBarrier, it becomes:

scf.for {
GlobalLoad
LDSBarrier 
DSRead + MFMA
LDSBarrier 
DSWrite
}

Here the __bwd_barrier__ is required by a loop-carried dependency: it makes sure that all waves within the workgroup have finished issuing and completing their LDS reads before the next iteration of the for loop writes to the same LDS buffer again. Waves can be out of sync, so this barrier ensures that the reads from all waves have finished before anything writes into the same buffer.

For the special case of blockSize = 1xWaveSize, we don't need to wait for other waves because there is only one wave. Just issuing the DSReads is enough; we don't have to wait for them to finish before proceeding to the next iteration.

Therefore, for that case we can skip adding __bwd_barrier__.
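The effect on the loop body can be illustrated with a small standalone model. This is only a sketch: `isSingleWave` and `buildLoopBody` are hypothetical helpers that work on plain strings, not the MLIR-based code in RockPipeline.cpp, and the op names are copied from the listing above. The `<=` comparison is an assumption that covers blockSize = 1xWaveSize as well as smaller block sizes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true when the whole workgroup fits in one wave.
// blockSize == waveSize is the case named in the PR; smaller block sizes
// are assumed to behave the same way.
inline bool isSingleWave(int blockSize, int waveSize) {
  return blockSize <= waveSize;
}

// Builds the pre-pipelining loop body as a list of op names.
// The backward barrier is only emitted when more than one wave exists.
inline std::vector<std::string> buildLoopBody(int blockSize, int waveSize) {
  std::vector<std::string> body;
  if (!isSingleWave(blockSize, waveSize))
    body.push_back("LDSBarrier(__bwd_barrier__)");
  body.push_back("GlobalLoad");
  body.push_back("DSWrite");
  body.push_back("LDSBarrier(__fwd_barrier__)");
  body.push_back("DSRead + MFMA");
  return body;
}
```

With a wave size of 64, `buildLoopBody(64, 64)` yields four ops starting at GlobalLoad, while `buildLoopBody(256, 64)` keeps the leading backward barrier.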

Technical Details

Currently this is only enabled for a single for loop, and therefore only for GEMMs and Convs. Nested for loops may require additional analysis across the loops.

Test Plan

Added new tests with blockSize = 1xWaveSize

Copilot

Pull request overview

This PR implements an optimization that skips backward LDS barriers for single-wave GPU kernels with specific schedule versions (1 and 3) during loop pipelining. The motivation is that when a workgroup consists of only one wave (blockSize ≤ waveSize), the GPU's in-order instruction execution within the wave eliminates the need for explicit synchronization barriers between iterations.

Changes:

Added canSkipBackwardBarrierForOneWave() function to determine when backward barriers can be safely skipped
Modified placeBarriers() to accept the parent function and conditionally skip backward barriers for single-wave cases
Added comprehensive test coverage including unit tests and end-to-end tests for various scheduleVersions and data types
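The conditions listed in this overview and in the PR description can be combined into a single predicate. The sketch below is not the code from the PR: `KernelInfo`, its fields, and `canSkipBackwardBarrierSketch` are hypothetical names, and the real pass presumably reads these values from the function's attributes and the loop structure rather than from a plain struct.

```cpp
// Hypothetical summary of the properties the decision depends on.
struct KernelInfo {
  int blockSize;       // threads per workgroup
  int waveSize;        // threads per wave
  int scheduleVersion; // pipelining schedule version
  int loopDepth;       // 1 means a single, non-nested scf.for
};

// Returns true when the backward LDS barrier could be skipped under the
// conditions described in the PR: one wave per workgroup, schedule
// version 1 or 3, and a single (non-nested) for loop.
inline bool canSkipBackwardBarrierSketch(const KernelInfo &k) {
  const bool oneWave = k.blockSize <= k.waveSize;
  const bool supportedSchedule =
      k.scheduleVersion == 1 || k.scheduleVersion == 3;
  const bool singleLoop = k.loopDepth == 1;
  return oneWave && supportedSchedule && singleLoop;
}
```

Keeping the three checks as separate named booleans makes it easy to relax one of them later, for example once nested loops have been analyzed.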

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
mlir/lib/Dialect/Rock/Transforms/RockPipeline.cpp	Core implementation: added barrier optimization logic and modified placeBarriers signature
mlir/test/Dialect/Rock/test_rock_pipeline_wave_barrier.mlir	Unit tests covering single-wave (scheduleVersion 1,3), multi-wave, and different scheduleVersions
mlir/test/Dialect/Rock/rock-pipeline-early-exit.mlir	Added arch attribute to existing test function to ensure compatibility
mlir/test/e2e/GemmOneWaveBarrier.toml	E2E test for single-wave GEMM with scheduleVersion=1
mlir/test/e2e/GemmOneWaveBarrierDirectToLDS.toml	E2E test for single-wave GEMM with scheduleVersion=3 (DirectToLDS)
mlir/test/e2e/GemmOneWaveBarrierFp8.toml	E2E test for single-wave GEMM with fp8 data types
mlir/test/e2e/PrGemmOneWaveBarrier.toml	PR-specific test for single-wave GEMM optimization
mlir/test/e2e/PrGemmOneWaveBarrierDirectToLDS.toml	PR-specific test for DirectToLDS variant
mlir/test/e2e/*.cfg	Configuration files specifying hardware requirements for each test
mlir/test/e2e/CMakeLists.txt	Added new test suites to the build system



dhernandez0 · 2026-02-26T13:43:18Z

Do we need barriers at all if there's a single wave per workgroup?
Also, please can you check if the barrier actually exists in assembly anyway? I think we pass known_block_size (or something like that) so LLVM might be removing the barriers for us?

umangyadav · 2026-02-26T13:58:34Z

> Do we need barriers at all if there's a single wave per workgroup?
> Also, please can you check if the barrier actually exists in assembly anyway? I think we pass known_block_size (or something like that) so LLVM might be removing the barriers for us?

Yes, I don't think this would be necessary. I'll run some checks.
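One way to run such a check is to count barrier instructions in the generated assembly for a single-wave kernel. The helper below is a minimal sketch, assuming the workgroup barrier appears in the listing as an instruction whose mnemonic starts with `s_barrier`; `countBarrierInstructions` is a hypothetical name and is not part of rocMLIR.

```cpp
#include <sstream>
#include <string>

// Counts the lines of an assembly listing whose first token starts with
// "s_barrier". Lines that merely mention the mnemonic later, such as in a
// trailing comment, are not counted.
inline int countBarrierInstructions(const std::string &asmText) {
  std::istringstream in(asmText);
  std::string line;
  int count = 0;
  while (std::getline(in, line)) {
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos)
      continue; // blank line
    if (line.compare(first, 9, "s_barrier") == 0)
      ++count;
  }
  return count;
}
```

A count of zero for a kernel compiled without this change would suggest that the backend already drops the barriers for single-wave workgroups.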

Add single barrier for single wave kernels

5732954

umangyadav requested a review from causten as a code owner February 24, 2026 20:27

umangyadav self-assigned this Feb 24, 2026

umangyadav requested review from Copilot, dhernandez0, justinrosner, pabloantoniom and stefankoncarevic February 24, 2026 20:28

Copilot started reviewing on behalf of umangyadav February 24, 2026 20:31

just use CHECK

e083932

Copilot AI reviewed Feb 24, 2026


Apply suggestions from code review

25dd1ad

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

umangyadav marked this pull request as draft February 26, 2026 13:58

umangyadav wants to merge 3 commits into barrierFix from barrierForOneWave
