@tsu-bin tsu-bin commented Dec 12, 2025

Suppose I have a small helper function that automatically creates a tiled_copy with an optimal thread tile layout, and I want this to happen at compile time. Here is an example.

template <int thr_num, int buf_rows, int buf_cols,
    template<class> class CopyOpType, class CopyAsType, class OrigType>
auto constexpr make_tiled_cp()
{
  constexpr auto thr_rows = std::min(thr_num, buf_rows);
  constexpr auto thr_cols = thr_num / thr_rows;

  auto thr_tile = make_layout(
      make_shape(Int<thr_rows>{}, Int<thr_cols>{}), LayoutLeft{});

  return make_tiled_copy(
          Copy_Atom<CopyOpType<CopyAsType>, OrigType>{},
          thr_tile,
          Layout<Shape<Int<buf_rows/thr_rows>, Int<buf_cols/thr_cols>>>{});
}

I call this helper function inside the kernel.

constexpr auto g2s_async_cp_q = make_tiled_cp<k_thr_num, CTA_Q, HEAD_DIM, SM80_CP_ASYNC_CACHEGLOBAL, cute::uint128_t, int8_t>();

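However, make_tiled_copy currently lacks the constexpr specifier, so the result of the helper cannot be used to initialize a constexpr variable like the one above. I think it would be helpful to add it.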
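
To illustrate the compile-time requirement without depending on CUTLASS, here is a minimal self-contained sketch of the same tile arithmetic. TilePlan and plan_tiles are hypothetical stand-ins, not CuTe APIs, and the values 128, 64, and 128 are illustrative assumptions rather than the real k_thr_num, CTA_Q, and HEAD_DIM. The point is only that every function on the call path must be constexpr for the result to initialize a constexpr variable.

```cpp
#include <algorithm>

// Hypothetical stand-in for the thread/value tile pair that the helper
// passes to make_tiled_copy. Not a CuTe type.
struct TilePlan {
  int thr_rows;  // threads along the row dimension
  int thr_cols;  // threads along the column dimension
  int val_rows;  // values per thread along rows
  int val_cols;  // values per thread along columns
};

// Same arithmetic as make_tiled_cp above, reduced to plain integers.
// If this function were not constexpr, the constexpr variable below
// would fail to compile.
template <int thr_num, int buf_rows, int buf_cols>
constexpr TilePlan plan_tiles() {
  constexpr int thr_rows = std::min(thr_num, buf_rows);
  constexpr int thr_cols = thr_num / thr_rows;

  // Added sanity checks (an assumption of this sketch, not in the helper).
  static_assert(thr_rows * thr_cols == thr_num,
                "thread count must factor into thr_rows x thr_cols");
  static_assert(buf_rows % thr_rows == 0 && buf_cols % thr_cols == 0,
                "buffer must divide evenly among threads");

  return TilePlan{thr_rows, thr_cols,
                  buf_rows / thr_rows, buf_cols / thr_cols};
}

// Illustrative values only: 128 threads, a 64 x 128 buffer.
constexpr TilePlan plan = plan_tiles<128, 64, 128>();
static_assert(plan.thr_rows == 64 && plan.thr_cols == 2,
              "thread tile is 64 x 2");
static_assert(plan.val_rows == 1 && plan.val_cols == 64,
              "each thread covers 1 x 64 values");
```

Because the checks are static_asserts, a mismatch between thread count and buffer shape shows up as a compile error instead of a wrong copy at run time.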

@hwu36 hwu36 merged commit 3d9de19 into NVIDIA:main Jan 3, 2026