Add generate_identity_sequences helper for common pattern #3588

Closed
tenpercent wants to merge 1 commit into tenpercent/old-ck-pack-rewrites from tenpercent/generate-identity-sequences

@tenpercent (Contributor) commented Jan 16, 2026

Summary

  • Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
  • Replace the common lambda pattern used in tensor descriptor transforms

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper reduces the number of template instantiations caused by per-site lambda types.

Why It Works

Each lambda expression creates a unique closure type. When passed to template functions like generate_tuple, this causes separate template instantiations at every call site. A named helper shares a single type across all uses.
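
The closure-type behavior described above can be checked at compile time. The following is a minimal sketch, not CK code: `Wrapper`, `wrap`, and `Identity` are hypothetical names standing in for a template that is parameterized on the functor type, the way generate_tuple is.

```cpp
#include <type_traits>

// Hypothetical template whose type depends on the functor type F.
template <typename F>
struct Wrapper
{
};

// Hypothetical stand-in for a function template such as generate_tuple.
template <typename F>
constexpr Wrapper<F> wrap(F)
{
    return {};
}

// A named functor: every use refers to the same type.
struct Identity
{
    template <typename T>
    constexpr T operator()(T x) const
    {
        return x;
    }
};

// Two textually identical lambdas at two "call sites".
const auto site_a_lambda = wrap([](auto x) { return x; });
const auto site_b_lambda = wrap([](auto x) { return x; });

// The same named functor at two "call sites".
const auto site_a_named = wrap(Identity{});
const auto site_b_named = wrap(Identity{});

// Each lambda expression has its own closure type, so wrap is instantiated twice.
static_assert(!std::is_same_v<decltype(site_a_lambda), decltype(site_b_lambda)>);

// The named functor yields one shared instantiation.
static_assert(std::is_same_v<decltype(site_a_named), decltype(site_b_named)>);
```

Both assertions hold under C++17 or later. The difference in instantiation count grows with the number of call sites, which is presumably why the change targets a pattern repeated across many files.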

Test Plan

  • Waiting for full CI

PR Stack

| # | PR    | Description                                       |
|---|-------|---------------------------------------------------|
| 1 | #3585 | sequence_gen with __make_integer_seq              |
| 2 | #3588 | generate_identity_sequences helper                |
| 3 | #3589 | Named functors in transform_tensor_descriptor     |
| 4 | #3590 | container_concat optimization                     |
| 5 | #3596 | O(1) pack expansion rewrites                      |
| 6 | #3600 | TensorDescriptor/TensorAdaptor lambda elimination |

Tracking issue: #3575

This adds an optimized helper for the common generate_tuple pattern:
generate_tuple([](auto i) { return Sequence<i.value>{}; }, N)

The new generate_identity_sequences<N>() function creates
Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without
requiring lambda instantiation at each call site.
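
One way such a helper could be written is sketched below. This is an illustration under stated assumptions, not the implementation from this PR: `Sequence` and `Tuple` are simplified stand-ins for the CK types (here `Tuple` is just an alias for `std::tuple`), and the `detail::make_identity_sequences` function is a hypothetical name.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Simplified stand-ins for CK's Sequence and Tuple (assumptions for this sketch).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
using Tuple = std::tuple<Ts...>;

namespace detail {
// Expand the index pack 0..N-1 into one single-element Sequence per index.
template <std::size_t... Is>
constexpr auto make_identity_sequences(std::index_sequence<Is...>)
{
    return Tuple<Sequence<static_cast<int>(Is)>...>{};
}
} // namespace detail

// Returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> with no lambda involved.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_sequences(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             Tuple<Sequence<0>, Sequence<1>, Sequence<2>>>);
static_assert(std::is_same_v<decltype(generate_identity_sequences<0>()), Tuple<>>);
```

Because the result type depends only on N, every call site with the same N reuses one instantiation.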

Updated 21 call sites across threadwise_tensor_slice_transfer,
wrapper utilities, and layout files to use the new helper.

Build time improvement: ~1.1% wall-clock (18.3s -> 18.1s)

@tenpercent force-pushed the tenpercent/old-ck-pack-rewrites branch from 57c8cb1 to 3d46680 on January 17, 2026 03:51

@tenpercent (Contributor, Author) commented:

Closing this PR as it has been merged with #3589 into the new PR #3628.

The combined PR includes all functionality from both PRs plus unit tests, and targets develop directly instead of being part of a stacked PR chain.

@tenpercent closed this Jan 22, 2026
shumway pushed a commit to ROCm/rocm-libraries that referenced this pull request Feb 28, 2026
… functors (#4828)

## Summary

- Add `generate_identity_sequences<N>()` helper that returns
`Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32
(92% reduction)

## Motivation

Multiple call sites use the
`generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})`
pattern. A named helper reduces lambda instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a
unique closure type, causing the function to be instantiated separately
for every call site. Named functors share a single type, so the compiler
reuses the same instantiation.

## Changes

### Part 1: generate_identity_sequences helper
- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate
template instantiations at every call site
- Named helper shares a single type across all uses

### Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace lambda in
`GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`
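
To illustrate what a named functor that unpacks a tuple of sequences and merges them might look like, here is a hedged sketch. The actual signature of `unpack_and_merge_sequences` is not shown in this PR text, so `merge_sequences_sketch`, the `operator+` overload, and the simplified `Sequence` type below are all hypothetical, and `std::apply` over `std::tuple` stands in for CK's own unpacking.

```cpp
#include <tuple>
#include <type_traits>

// Simplified stand-in for CK's Sequence (assumption for this sketch).
template <int... Is>
struct Sequence
{
};

// Hypothetical concatenation of two sequences.
template <int... As, int... Bs>
constexpr Sequence<As..., Bs...> operator+(Sequence<As...>, Sequence<Bs...>)
{
    return {};
}

// Hypothetical named functor: merges any number of sequences into one.
// Being a named type, it is the same type at every call site.
struct merge_sequences_sketch
{
    template <typename... Seqs>
    constexpr auto operator()(Seqs... seqs) const
    {
        // Left fold starting from the empty sequence.
        return (Sequence<>{} + ... + seqs);
    }
};

// Unpack a tuple of sequences into the functor with std::apply.
using Input  = std::tuple<Sequence<0, 1>, Sequence<2>, Sequence<3, 4>>;
using Merged = decltype(std::apply(merge_sequences_sketch{}, Input{}));

static_assert(std::is_same_v<Merged, Sequence<0, 1, 2, 3, 4>>);
```

A lambda with the same body would give each call site its own closure type, which is the cost the named functor avoids.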

## Test Plan

- [x] Added 7 unit tests:
  - 4 tests for `generate_identity_sequences`
  - 3 tests for `unpack_and_merge_sequences`
- [ ] Waiting for full CI

## Related PRs

This PR merges the functionality from:
- ROCm/composable_kernel#3588 (generate_identity_sequences helper)
- ROCm/composable_kernel#3589 (Named functors in
transform_tensor_descriptor)

Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

**Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and
ROCm/composable_kernel#3589, which can be closed once this is merged.

---
🔁 Imported from
[ROCm/composable_kernel#3628](ROCm/composable_kernel#3628)
🧑‍💻 Originally authored by @tenpercent

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
assistant-librarian bot pushed a commit that referenced this pull request Feb 28, 2026
Add generate_identity_sequences helper and replace lambdas
 with named functors (#4828)
