[AIROCMLIR-551] Completely support as_underlying_shape and as_logical_shape#2265

Open
Mr-Anyone wants to merge 5 commits intodevelopfrom
pr-migraphx-as-underlying-logical

Conversation

@Mr-Anyone
Member

Motivation

Support transposed memory layouts and long strides in both as_underlying_shape and as_logical_shape.

Furthermore, in as_underlying_shape, broadcasting is supported.

Technical Details

This single PR essentially implements these two commits: #2198 and this one.

as_logical_shape is implemented as follows:

  1. Reshape from the flat memory layout to the memory-layout shape
  2. Check the stride permutation and invert it to match the logical shape
  3. If the memory-layout tensor has more dimensions than the logical shape, we have a long-stride layout, so we emit tensor.extract_slice to get the logical shape
  4. Broadcast if needed
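As a rough illustration of steps 1–2 (the function name, shapes, and strides below are invented for this sketch, not taken from the PR), a logical 2x3 shape stored with strides [1, 2] (column-major) could lower to something like:

```mlir
// Sketch only: logical shape 2x3 with strides [1, 2] (column-major),
// so the in-memory layout is a 3x2 tensor.
func.func @as_logical_sketch(%arg0: tensor<6xf32>) -> tensor<2x3xf32> {
  // Step 1: reshape the flat buffer into the memory-layout shape.
  %mem = tensor.expand_shape %arg0 [[0, 1]] output_shape [3, 2]
      : tensor<6xf32> into tensor<3x2xf32>
  // Step 2: invert the stride permutation to recover the logical shape.
  %init = tensor.empty() : tensor<2x3xf32>
  %logical = linalg.transpose ins(%mem : tensor<3x2xf32>)
      outs(%init : tensor<2x3xf32>) permutation = [1, 0]
  return %logical : tensor<2x3xf32>
}
```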

as_underlying_shape is implemented as follows:

  1. Reshape to the memory layout based on the stride permutation
  2. If there are long strides, emit a tensor.insert_slice into a tensor.empty
  3. If there is broadcasting, error out
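For the long-stride case (step 2), a minimal sketch with invented shapes: a logical 4xf32 viewed with stride 2 over an 8-element buffer could be written back roughly as:

```mlir
// Sketch only: logical 4xf32 with stride 2, i.e. every other element
// of an 8-element buffer; treat the memory layout as 4x2.
func.func @as_underlying_sketch(%logical: tensor<4xf32>) -> tensor<8xf32> {
  // Materialize the full memory-layout tensor (padding is uninitialized).
  %empty = tensor.empty() : tensor<4x2xf32>
  // Scatter the logical data into the strided positions.
  %filled = tensor.insert_slice %logical into %empty[0, 0] [4, 1] [1, 1]
      : tensor<4xf32> into tensor<4x2xf32>
  // Collapse back to the flat underlying shape.
  %flat = tensor.collapse_shape %filled [[0, 1]]
      : tensor<4x2xf32> into tensor<8xf32>
  return %flat : tensor<8xf32>
}
```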

Test Plan

Added e2e and lit tests.

Test Result

Both e2e and lit tests pass.

Submission Checklist

@Mr-Anyone Mr-Anyone requested a review from causten as a code owner March 2, 2026 21:15
@Mr-Anyone
Member Author

Mr-Anyone commented Mar 2, 2026

Also, note that the code structure is very similar to the one found in the TOSA pipeline, if that makes review easier.

// CHECK-DAG: %[[expanded:.*]] = tensor.expand_shape %[[arg1]]
// CHECK-DAG: %[[expanded_0:.*]] = tensor.expand_shape %[[arg0]]
// CHECK-DAG: linalg.sub ins(%[[expanded_0]], %[[expanded]] {{.*}})
// CHECK-DAG: linalg.sub ins(%[[arg0]], %[[arg1]] {{.*}})
Member Author

@Mr-Anyone Mr-Anyone Mar 2, 2026

This actually improves the previous IR. Notice that the input shape !migraphx.shaped<16xf32, 1> gets converted into tensor<16xf32>, which is the memory-layout shape. In that case, we don't emit tensor.expand_shape.

Before:

  func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
    %expanded = tensor.expand_shape %arg1 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
    %expanded_0 = tensor.expand_shape %arg0 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
    %0 = tensor.empty() : tensor<16xf32>
    %1 = linalg.sub ins(%expanded_0, %expanded : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
    %collapsed = tensor.collapse_shape %1 [[0]] : tensor<16xf32> into tensor<16xf32>
    return %collapsed : tensor<16xf32>
  }

After:

  func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
    %0 = tensor.empty() : tensor<16xf32>
    %1 = linalg.sub ins(%arg0, %arg1 : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
    return %1 : tensor<16xf32>
  }

ConversionPatternRewriter &rewriter) const {
/// The linalg-to-rock pass emits the following expression
/// for expanding the strides. We are matching the following IR:
/// clang-format off
Contributor

Just curious, but why do we need to turn clang-format off here? It doesn't look like the comment is too long...

Member Author

I think in this case it is ok. Usually, two lines of MLIR get merged into one if you don't turn off clang-format.

Comment on lines +167 to +168
/// %empty = tensor.empty() : ....
/// %inserted_slice = tensor.insert_slice %actual_data into %empty ...
Contributor

Is the expand strides case the only case that will produce an empty tensor + insert_slice? This seems quite broad, and potentially susceptible to pattern matching unwanted scenarios?

Member Author

This is indeed true, but my understanding is that empty tensor + insert_slice has the same semantics as rock.expand_strides: rock.expand_strides is equivalent to an extract_slice (of the result) plus a memcpy, which I think matches the semantics above.

I think we would have to emit an attribute during migraphx-to-linalg if we only wanted some cases of tensor.insert_slice to be lowered into rock.

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Statistic.h"
Contributor

Is this header file still needed?

Member Author

I just removed them.

@@ -0,0 +1,64 @@
// RUN: rocmlir-opt -split-input-file --migraphx-to-linalg -verify-diagnostics %s | FileCheck %s

// testcase taken from migraphx-to-tosa-non-contiguous-strides.mlir
Contributor

nit: no need to mention this here as this comment will no longer make sense once the Tosa path is removed. This probably makes more sense to have somewhere in the PR description.

RankedTensorType transposedType =
dyn_cast<RankedTensorType>(result.getType());
if (transposedType.getShape() != ArrayRef(slicingShape)) {
SmallVector<int64_t, 4> starts(permutation.size(), 0);
Contributor

Nit: Can we remove this unused variable?

Member Author

Weird, as there are no compiler warnings.

}

RankedTensorType transposedType =
dyn_cast<RankedTensorType>(result.getType());
Contributor

We should have a null check after using dyn_cast in case we get a null value.

if (isPermutationStandardForm(permutation)) {
// reshape into memory layout type
RankedTensorType memoryType = inType.asMemoryLayoutTensor();
if (result.getType() != inType.asMemoryLayoutTensor()) {
Contributor

Suggested change
if (result.getType() != inType.asMemoryLayoutTensor()) {
if (result.getType() != memoryType) {

migraphx::MIXRShapedType resultType = op.getResult().getType();
auto resultTensorType =
cast<RankedTensorType>(getTypeConverter()->convertType(resultType));
RankedTensorType inTensorType = dyn_cast<RankedTensorType>(in.getType());
Contributor

Same here: dyn_cast with no nullptr check.

@Mr-Anyone Mr-Anyone requested a review from justinrosner March 4, 2026 20:39