
Refactored reduction kernels #618

Open: Micky774 wants to merge 1 commit into dev from zain/ck/reduction-kernel-refactor


Micky774 commented Jun 8, 2026


Description

Merged similarly structured reduction kernels into templated kernels (one per reduction family) with reusable dispatch helpers, and improved logging.
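To illustrate what "reusable dispatch" can look like, here is a minimal host-side sketch of one common pattern, assuming the merged kernels take their variant flags as bool template parameters: a runtime bool is lifted into std::true_type / std::false_type so a single templated kernel can be instantiated for every combination. The names dispatch_bool, kernel_variant, and launch_variant are hypothetical and do not come from the PR; kernel_variant stands in for a real GPU kernel and only reports which instantiation was selected.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper: turns a runtime bool into std::true_type or
// std::false_type and calls f with it, so f can use the flag as a
// compile-time template argument.
template <typename F>
constexpr decltype(auto) dispatch_bool(bool flag, F&& f) {
    if (flag) {
        return std::forward<F>(f)(std::true_type{});
    }
    return std::forward<F>(f)(std::false_type{});
}

// Hypothetical stand-in for one merged, templated kernel. A real kernel
// would branch on these flags with `if constexpr`; here each instantiation
// just returns a distinct id so the selected variant is observable.
template <bool GroupMode, bool ReduceBoth>
constexpr int kernel_variant() {
    return (GroupMode ? 2 : 0) + (ReduceBoth ? 1 : 0);
}

// Hypothetical launcher: nest one dispatch_bool per template flag.
constexpr int launch_variant(bool group_mode, bool reduce_both) {
    return dispatch_bool(group_mode, [=](auto gm) {
        return dispatch_bool(reduce_both, [=](auto rb) {
            return kernel_variant<decltype(gm)::value, decltype(rb)::value>();
        });
    });
}
```

With n boolean flags this nesting instantiates all 2^n variants up front, which is the usual trade-off of this pattern: more compiled code, but every variant gets its branches resolved at compile time and the call site stays a single line.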

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

ck_fused_attn_utils.hpp

  • Add to_f32<T>() / from_f32<T>() device helpers to centralize bf16↔float casting.
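As a rough illustration of what centralized casting helpers of this shape might do, the following is a host-side sketch only. It assumes a hypothetical bf16_t type that stores the upper 16 bits of an IEEE-754 float and narrows with round-to-nearest-even; the PR's actual helpers are device functions operating on the library's own bf16 type, whose rounding behavior is not stated here.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical host-side bf16 stand-in: the high 16 bits of a float.
struct bf16_t {
    std::uint16_t bits;
};

// Widen a storage value to float (identity-like cast for non-bf16 types).
template <typename T>
float to_f32(T x) {
    if constexpr (std::is_same_v<T, bf16_t>) {
        // bf16 is the top half of a float, so widening is a 16-bit shift.
        std::uint32_t u = static_cast<std::uint32_t>(x.bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    } else {
        return static_cast<float>(x);
    }
}

// Narrow a float accumulator back to the storage type.
template <typename T>
T from_f32(float f) {
    if constexpr (std::is_same_v<T, bf16_t>) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
            // NaN: keep the sign and force a quiet-NaN payload so the
            // truncated value cannot collapse to infinity.
            return bf16_t{static_cast<std::uint16_t>((u >> 16) | 0x0040u)};
        }
        // Round to nearest, ties to even, on the 16 bits being dropped.
        u += 0x7FFFu + ((u >> 16) & 1u);
        return bf16_t{static_cast<std::uint16_t>(u >> 16)};
    } else {
        return static_cast<T>(f);
    }
}
```

For example, 1.0f + 3.0f/256.0f sits exactly halfway between two bf16 values and rounds up to the even neighbour, bits 0x3F82 (1.015625).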

ck_fused_attn_bwd.cpp

  • Merge 4 dk/dv kernels (dk_dv_reduce, dk_or_dv_reduce, and their *_thd variants) into one dkv_reduce<DataType, GroupMode, ReduceBoth>.
  • Merge 3 dbias kernels (dbias_reduce_{11ss,1hss,b1ss}) into one dbias_reduce<DataType, ReduceB, ReduceH>.
  • Add log_dkv_reduce, launch_dkv_reduce, and launch_dbias_reduce helpers, replacing the verbose per-launch logging and dispatch blocks in ck_attn_bwd.
  • Drop the unused s_kv kernel parameter.
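To make the dbias_reduce<DataType, ReduceB, ReduceH> merge concrete, here is a host-side float reference under our reading of the old kernel names: the full bias gradient has shape [b, h, s_q, s_k], and the 11ss, 1hss, and b1ss variants sum it over the batch and/or head dimensions to match a bias of that shape. This mapping is an assumption, and dbias_reduce_ref is a hypothetical name, not the PR's GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for one dbias_reduce instantiation (float only).
// Input: full bias gradient, row-major [b, h, s_q, s_k]. Output shape:
//   ReduceB && ReduceH   -> [1, 1, s_q, s_k]  (former 11ss kernel)
//   ReduceB && !ReduceH  -> [1, h, s_q, s_k]  (former 1hss kernel)
//   !ReduceB && ReduceH  -> [b, 1, s_q, s_k]  (former b1ss kernel)
template <bool ReduceB, bool ReduceH>
std::vector<float> dbias_reduce_ref(const std::vector<float>& grad,
                                    std::size_t b, std::size_t h,
                                    std::size_t s_q, std::size_t s_k) {
    static_assert(ReduceB || ReduceH, "at least one dimension must be reduced");
    assert(grad.size() == b * h * s_q * s_k);
    const std::size_t out_b = ReduceB ? 1 : b;
    const std::size_t out_h = ReduceH ? 1 : h;
    const std::size_t plane = s_q * s_k;
    std::vector<float> out(out_b * out_h * plane, 0.0f);
    for (std::size_t ib = 0; ib < b; ++ib) {
        for (std::size_t ih = 0; ih < h; ++ih) {
            // Reduced dimensions collapse to index 0 in the output.
            const std::size_t ob = ReduceB ? 0 : ib;
            const std::size_t oh = ReduceH ? 0 : ih;
            const float* src = grad.data() + (ib * h + ih) * plane;
            float* dst = out.data() + (ob * out_h + oh) * plane;
            for (std::size_t i = 0; i < plane; ++i) {
                dst[i] += src[i];
            }
        }
    }
    return out;
}
```

A reference like this can be useful in unit tests: run each template combination against the GPU kernel's output for small shapes and compare element-wise.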

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes


Labels

ci-level 3 (CI test level 3)


2 participants