Replace lambdas with named functors in transform_tensor_descriptor #3589
Closed
tenpercent wants to merge 1 commit into tenpercent/generate-identity-sequences from
748497a to 885b80f
This was referenced Jan 16, 2026
885b80f to 0791bad
ef35913 to 7c37209
0791bad to b26ed88
Lambda expressions in transform_tensor_descriptor created unique template instantiations for each capture combination. This change replaces lambdas with named functor structs to reduce the instantiation count:

- Add merge_sequences_functor and unpack_and_merge_sequences helpers
- Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids
- Add generate_arithmetic_sequence_from_scan

Build analysis shows the instantiation count dropped from 388 to 32 (92% reduction).
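The mechanism behind this reduction can be illustrated with a small standalone sketch. It does not use composable_kernel: `instantiation_id`, `add_one_functor`, and the `call_site_*` functions are hypothetical names chosen for illustration, and the address of a function-local static is used as a proxy for "one template instantiation".

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a function template (like transform_tensor_descriptor)
// that is parameterized on the type of a callable. Each distinct F yields a
// separate instantiation, and each instantiation owns its own function-local
// static, so the static's address identifies which instantiation ran.
template <typename F>
const void* instantiation_id(F)
{
    static int tag = 0;
    return &tag;
}

// Hypothetical named functor standing in for an inline lambda.
struct add_one_functor
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two call sites that each spell out the same lambda body.
const void* call_site_a_lambda() { return instantiation_id([](int x) { return x + 1; }); }
const void* call_site_b_lambda() { return instantiation_id([](int x) { return x + 1; }); }

// The same two call sites using the named functor instead.
const void* call_site_a_functor() { return instantiation_id(add_one_functor{}); }
const void* call_site_b_functor() { return instantiation_id(add_one_functor{}); }

// Identical lambda bodies still have distinct closure types.
auto lambda_1 = [](int x) { return x + 1; };
auto lambda_2 = [](int x) { return x + 1; };
static_assert(!std::is_same_v<decltype(lambda_1), decltype(lambda_2)>);

// Run-time check of the instantiation claim.
void check_instantiation_sharing()
{
    assert(call_site_a_lambda() != call_site_b_lambda());   // two instantiations
    assert(call_site_a_functor() == call_site_b_functor()); // one shared instantiation
}
```

Even capture-free lambdas with identical bodies get distinct types, so the saving comes from reusing one named type across call sites rather than from the lambda bodies themselves.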
7c37209 to d7e7fbd
b26ed88 to 00849ac
This was referenced Jan 19, 2026
This was referenced Feb 10, 2026
shumway pushed a commit to ROCm/rocm-libraries that referenced this pull request Feb 28, 2026
… functors (#4828)

## Summary
- Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)

## Motivation
Multiple call sites use the `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.

## Changes
### Part 1: generate_identity_sequences helper
- Replaces the common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
- The named helper shares a single type across all uses

### Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

## Test Plan
- [x] Added 7 unit tests:
  - 4 tests for `generate_identity_sequences`
  - 3 tests for `unpack_and_merge_sequences`
- [ ] Waiting for full CI

## Related PRs
This PR merges the functionality from:
- ROCm/composable_kernel#3588 (generate_identity_sequences helper)
- ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor)

Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

**Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged.

---
🔁 Imported from ROCm/composable_kernel#3628
🧑💻 Originally authored by @tenpercent

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
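The two helpers named in the commit message above can be sketched in standard C++17. This is not the composable_kernel implementation: `std::integer_sequence` and `std::tuple` stand in for CK's `Sequence` and `Tuple`, everything lives in a hypothetical `sketch` namespace, `merge_two` and `merge_all` are illustrative helpers, and the real signatures may differ.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical stand-in for CK's Sequence<Is...>.
template <int... Is>
using Seq = std::integer_sequence<int, Is...>;

// Expands 0..N-1 into tuple<Seq<0>, ..., Seq<N-1>> inside a named template,
// so every caller asking for the same N names the same specialization.
template <int... Is>
constexpr auto generate_identity_sequences_impl(std::integer_sequence<int, Is...>)
{
    return std::tuple<Seq<Is>...>{};
}

template <int N>
constexpr auto generate_identity_sequences()
{
    return generate_identity_sequences_impl(std::make_integer_sequence<int, N>{});
}

// Concatenates two sequences.
template <int... As, int... Bs>
constexpr auto merge_two(Seq<As...>, Seq<Bs...>)
{
    return Seq<As..., Bs...>{};
}

// Merges any number of sequences; with no arguments the result is Seq<>.
constexpr auto merge_all() { return Seq<>{}; }

template <typename First, typename... Rest>
constexpr auto merge_all(First first, Rest... rest)
{
    return merge_two(first, merge_all(rest...));
}

// Named functor: unpacks a tuple of sequences and merges them into one.
struct unpack_and_merge_sequences
{
    template <typename... Seqs>
    constexpr auto operator()(std::tuple<Seqs...>) const
    {
        return merge_all(Seqs{}...);
    }
};

} // namespace sketch

static_assert(std::is_same_v<decltype(sketch::generate_identity_sequences<3>()),
                             std::tuple<sketch::Seq<0>, sketch::Seq<1>, sketch::Seq<2>>>);
static_assert(std::is_same_v<decltype(sketch::unpack_and_merge_sequences{}(
                                 std::tuple<sketch::Seq<0, 1>, sketch::Seq<>, sketch::Seq<2>>{})),
                             sketch::Seq<0, 1, 2>>);
```

Because both helpers are named templates rather than lambdas written at each call site, two callers that request `generate_identity_sequences<4>()` share one specialization, which is the reuse the commit relies on.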
assistant-librarian bot pushed a commit that referenced this pull request Feb 28, 2026
Add generate_identity_sequences helper and replace lambdas with named functors (#4828)
Summary
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)

Changes
- Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

Why It Works
Each lambda creates a unique closure type, causing `transform_tensor_descriptor` to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.

Test Plan

PR Stack
- `__make_integer_seq`

Tracking issue: #3575