Replace lambdas with named functors in transform_tensor_descriptor by tenpercent · Pull Request #3589 · ROCm/composable_kernel

tenpercent · 2026-01-16T06:12:14Z

Summary

Replace lambdas with named functors in transform_tensor_descriptor
Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Changes

Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
Use generate_identity_sequences in matrix_padder.hpp
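The PR does not show the helper's code. The sketch below assumes that `unpack_and_merge_sequences` takes a tuple of `Sequence`s and concatenates them into a single `Sequence` through a named functor, so no lambda closure type is involved. `Sequence`, `concat_two`, and the use of `std::tuple` in place of CK's `Tuple` are hypothetical stand-ins, not the library's definitions.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for CK's Sequence<Is...>; not the library's definition.
template <std::size_t... Is>
struct Sequence
{
};

// Hypothetical helper: concatenates two sequences at the type level.
template <std::size_t... As, std::size_t... Bs>
constexpr Sequence<As..., Bs...> concat_two(Sequence<As...>, Sequence<Bs...>)
{
    return {};
}

// Named functor: folds any number of Sequence arguments into one Sequence.
// Every caller names the same type, so its instantiations are shared.
struct merge_sequences_functor
{
    constexpr Sequence<> operator()() const { return {}; }

    template <typename S>
    constexpr S operator()(S s) const { return s; }

    template <typename S0, typename S1, typename... Rest>
    constexpr auto operator()(S0 s0, S1 s1, Rest... rest) const
    {
        // Merge the first two, then recurse on the shorter argument list.
        return (*this)(concat_two(s0, s1), rest...);
    }
};

// Unpacks a tuple of sequences and hands the elements to the named functor.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(const std::tuple<Seqs...>& seqs)
{
    return std::apply(merge_sequences_functor{}, seqs);
}
```

Because the callable passed to `std::apply` is always `merge_sequences_functor`, the only thing that varies between uses is the set of sequence types, not a per-call-site closure type.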

Why It Works

Each lambda creates a unique closure type, causing transform_tensor_descriptor to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
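A minimal, CK-independent illustration of the effect described above. `apply_to_three` is a hypothetical stand-in for a function template such as `transform_tensor_descriptor` whose template parameters include the callable's type; `square_a`, `square_b`, and `square_functor` are illustrative names.

```cpp
#include <type_traits>

// Hypothetical stand-in for a function template whose signature depends on F.
template <typename F>
constexpr int apply_to_three(F f)
{
    return f(3);
}

// Two textually identical lambdas: each lambda expression has its own closure type.
inline constexpr auto square_a = [](int x) { return x * x; };
inline constexpr auto square_b = [](int x) { return x * x; };
static_assert(!std::is_same_v<decltype(square_a), decltype(square_b)>,
              "each lambda expression yields a distinct closure type");

// Named functor: every use names the same type.
struct square_functor
{
    constexpr int operator()(int x) const { return x * x; }
};

// These two calls instantiate apply_to_three twice, once per closure type ...
constexpr int r_a = apply_to_three(square_a);
constexpr int r_b = apply_to_three(square_b);
// ... while both of these reuse the single apply_to_three<square_functor>.
constexpr int r_f1 = apply_to_three(square_functor{});
constexpr int r_f2 = apply_to_three(square_functor{});
```

All four calls compute the same value; the difference is only in how many specializations of `apply_to_three` the compiler has to generate.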

Test Plan

Waiting for full CI

PR Stack

| # | PR | Description |
|---|----|-------------|
| 1 | #3585 | sequence_gen with `__make_integer_seq` |
| 2 | #3588 | generate_identity_sequences helper |
| 3 | #3589 | Named functors in transform_tensor_descriptor |
| 4 | #3590 | container_concat optimization |
| 5 | #3596 | O(1) pack expansion rewrites |
| 6 | #3600 | TensorDescriptor/TensorAdaptor lambda elimination |

Tracking issue: #3575

Lambda expressions in transform_tensor_descriptor caused a separate template instantiation for every lambda, because each lambda expression has its own unique closure type. This change replaces the lambdas with named functor structs to reduce the instantiation count:

- Add merge_sequences_functor and unpack_and_merge_sequences helper
- Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids
- Add generate_arithmetic_sequence_from_scan

Build analysis shows the instantiation count dropped from 388 to 32 (92% reduction).
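The PR does not show how `convert_visible_to_hidden_id` and `convert_visible_ids_to_hidden_ids` are written. The sketch below assumes they look up each visible dimension id in a `Sequence` that maps visible dimensions to hidden dimension ids; that semantics, along with `Sequence` and `sequence_at`, is a hypothetical stand-in rather than CK's actual code. The point it illustrates is the pattern: a named functor applied through pack expansion instead of a lambda.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for CK's Sequence<Is...>.
template <std::size_t... Is>
struct Sequence
{
};

// Hypothetical helper: reads element I of a non-empty Sequence at compile time.
template <std::size_t I, std::size_t... Is>
constexpr std::size_t sequence_at(Sequence<Is...>)
{
    static_assert(I < sizeof...(Is), "index out of range");
    constexpr std::size_t values[] = {Is...};
    return values[I];
}

// Named functor (assumed semantics): maps one visible dimension id to its hidden
// id by indexing VisibleToHiddenMap, a Sequence indexed by visible id.
template <typename VisibleToHiddenMap>
struct convert_visible_to_hidden_id
{
    template <std::size_t VisibleId>
    constexpr std::size_t operator()(std::integral_constant<std::size_t, VisibleId>) const
    {
        return sequence_at<VisibleId>(VisibleToHiddenMap{});
    }
};

// Converts a whole Sequence of visible ids by expanding the pack through the
// same named functor type; no lambda closure type appears anywhere.
template <typename VisibleToHiddenMap, std::size_t... VisibleIds>
constexpr auto convert_visible_ids_to_hidden_ids(VisibleToHiddenMap, Sequence<VisibleIds...>)
{
    return Sequence<convert_visible_to_hidden_id<VisibleToHiddenMap>{}(
        std::integral_constant<std::size_t, VisibleIds>{})...>{};
}
```

With a map `Sequence<3, 4, 5, 6>`, the visible ids `Sequence<2, 0>` would convert to `Sequence<5, 3>` under these assumptions.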

tenpercent · 2026-01-22T00:20:44Z

Closing this PR as it has been merged with #3588 into the new PR #3628.

The combined PR includes all functionality from both PRs plus unit tests, and targets develop directly instead of being part of a stacked PR chain.

@tenpercent

… functors (#4828)

## Summary

- Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)

## Motivation

Multiple call sites use the `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.

## Changes

### Part 1: generate_identity_sequences helper

- Replaces the common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
- The named helper shares a single type across all uses

### Part 2: Named functors in transform_tensor_descriptor

- Add `unpack_and_merge_sequences` helper to replace the lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

## Test Plan

- [x] Added 7 unit tests:
  - 4 tests for `generate_identity_sequences`
  - 3 tests for `unpack_and_merge_sequences`
- [ ] Waiting for full CI

## Related PRs

This PR merges the functionality from:

- ROCm/composable_kernel#3588 (generate_identity_sequences helper)
- ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor)

Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

**Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged.

---

🔁 Imported from [ROCm/composable_kernel#3628](ROCm/composable_kernel#3628)
🧑‍💻 Originally authored by @tenpercent

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
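A minimal sketch of what `generate_identity_sequences<N>()` could look like, assuming `std::tuple` as a stand-in for CK's `Tuple`, a hypothetical minimal `Sequence`, and `std::make_index_sequence` for index generation; the actual helper may be built on CK's own sequence machinery instead. The result type depends only on `N`, so every call site asking for the same `N` shares one instantiation, unlike the `generate_tuple` lambda pattern it replaces.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for CK's Sequence<Is...>.
template <std::size_t... Is>
struct Sequence
{
};

namespace detail {
// Builds one single-element Sequence per index: Sequence<0>, Sequence<1>, ...
template <std::size_t... Is>
constexpr auto make_identity_tuple(std::index_sequence<Is...>)
{
    return std::tuple<Sequence<Is>...>{};
}
} // namespace detail

// Returns tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_tuple(std::make_index_sequence<N>{});
}
```

For example, `generate_identity_sequences<3>()` has type `std::tuple<Sequence<0>, Sequence<1>, Sequence<2>>`, and `N = 0` yields an empty tuple.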


tenpercent requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, vidyasagar-amd and vpietila-amd as code owners January 16, 2026 06:12

tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch 3 times, most recently from 748497a to 885b80f January 16, 2026 06:31

tenpercent marked this pull request as draft January 16, 2026 16:31

tenpercent mentioned this pull request Jan 16, 2026

Replace O(N) recursive sequence_map_inverse with O(1) pack expansion #3596

Closed


tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 885b80f to 0791bad January 16, 2026 20:16

tenpercent force-pushed the tenpercent/generate-identity-sequences branch from ef35913 to 7c37209 January 16, 2026 20:16

tenpercent mentioned this pull request Jan 16, 2026

Replace nested static_for lambdas with compile-time search helper #3600

Closed


tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 0791bad to b26ed88 January 17, 2026 03:37

tenpercent marked this pull request as ready for review January 17, 2026 03:41

tenpercent force-pushed the tenpercent/generate-identity-sequences branch from 7c37209 to d7e7fbd January 17, 2026 03:51

tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from b26ed88 to 00849ac January 17, 2026 03:51

This was referenced Jan 19, 2026

Add unit tests for template optimization helpers #3610

Closed

Add generate_identity_sequences helper and replace lambdas with named functors #3628

Closed

tenpercent closed this Jan 22, 2026

tenpercent mentioned this pull request Feb 23, 2026

Add generate_identity_sequences helper and replace lambdas with named functors ROCm/rocm-libraries#4828

Merged



tenpercent wants to merge 1 commit into tenpercent/generate-identity-sequences from mpodkory/transform-tensor-descriptor-optimization