Add support for non-contiguous tensors from MIGraphX #2198
Conversation
Pull request overview
This PR adds support for non-contiguous (long stride) tensors in the MIGraphX to Rock compilation pipeline, addressing issue #2195. Previously, operations would fail with the error "writing to tensors with long strides or broadcasts is unsupported" when output tensors had padded strides.
Key changes:
- MIGraphXToTosa conversion now handles long strides by creating padded tensors and using tensor.insert_slice to place computed results (a sketch of the resulting IR follows this list)
- AlignTiling pass enhanced to convert memref.subview operations into equivalent Pad transforms for proper handling of non-contiguous memory layouts
- Error messaging improved to distinguish between unsupported broadcasts and now-supported long strides
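For concreteness, here is a minimal sketch of what the conversion emits for a long-stride output, adapted from the IR quoted later in this conversation (value names are illustrative; the shapes come from the tests in this PR):

// Input: a migraphx op whose result type carries long (padded) strides.
%1 = migraphx.sigmoid %0 : <4x24x24xf16, 576x24x1> -> <4x24x24xf16, 1152x24x1>
// After MIGraphXToTosa: the padding becomes an explicit expansion, followed
// by a flattening reshape to the underlying 4608-element buffer.
%s = tosa.sigmoid %t : (tensor<4x24x24xf16>) -> tensor<4x24x24xf16>
%e = tosa.custom %s {domain_name = "rocmlir", implementation_attrs = "", operator_name = "expand_strides"} : (tensor<4x24x24xf16>) -> tensor<4x48x24xf16>
%sh = tosa.const_shape {values = dense<4608> : tensor<1xindex>} : () -> !tosa.shape<1>
%r = tosa.reshape %e, %sh : (tensor<4x48x24xf16>, !tosa.shape<1>) -> tensor<4608xf16>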
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mlir/test/fusion/pr-e2e/mixr-non-contiguous-strides.mlir | E2E test verifying dot+sigmoid fusion with non-contiguous output strides (50% initialized memory) |
| mlir/test/Dialect/Rock/align-tiling-non-contiguous-strides.mlir | Test ensuring Pad transforms are created for subviews and memref.copy operations are eliminated |
| mlir/test/Conversion/MIGraphXToTosa/migraphx-to-tosa-non-contiguous-strides.mlir | Test verifying tensor.insert_slice and reshape generation for long-stride outputs |
| mlir/lib/Dialect/Rock/Transforms/AlignTiling.cpp | Implements subviewToPadTransform and buildViewChainWithSubviews to handle non-contiguous strides through pad transforms |
| mlir/lib/Conversion/MIGraphXToTosa/MIGraphXToTosa.cpp | Modified AsUnderlyingShapeConverter to create padded tensors with insert_slice for long-stride outputs |
| mlir/lib/Conversion/MIGraphXToTosa/MIGraphXToTosaPass.cpp | Adds TensorDialect to legal dialects to support tensor.insert_slice operations |
// Only half of the results will be correct since the non-contiguous strides
// in this example means that half of the memory is uninitialized.
// CHECK: relDiff = 0 : 2304/4608 (50.000000%)
nit: you can use prefill to set all values to 0?
What do you mean by use prefill here? Are you talking about making test changes? Or changes to MIGraphXToTosa?
Test changes. Could we use a prefill setting in this test so that the other values are initialized to zero?
Does rocmlir-gen currently have a flag that lets us do such a thing? I was looking around but couldn't see anything, so I'm not sure if you're aware of something that I'm not. If we don't have this feature yet, I can create a ticket for us to do this, as I think it would be quite useful in some cases (this one included).
Force-pushed from 65c6e1f to 7351030.
Code looks good to me. Only one comment and some nitpicks. Also, I find the term "non-packed strides" a bit weird. Isn't "non-contiguous tensors" more precise?
// CHECK: tosa.custom %[[SIGMOID]] {domain_name = "rocmlir", implementation_attrs = "", operator_name = "expand_strides"}
// CHECK-SAME: (tensor<4x24x24xf16>) -> tensor<4x48x24xf16>
// CHECK: tosa.reshape
// CHECK-SAME: -> tensor<4608xf16>
This is a bit confusing to me.
Originally the function returns !migraphx.shaped<4x24x24xf16, 1152x24x1>, but after migraphx-to-tosa your approach makes it return tensor<4608xf16>. Is that valid? We are changing not only !migraphx.shaped to tensor but, more importantly, the shape information. The former has 2304 elements while the latter has 2x. And the key problem I see is that the tensor does not explicitly say that there is a non-unit stride.
That could break in some cases, right?
- Is migraphx safe if we change the memory layout of the output tensor like this?
- Can you check what would happen if we have something else after the migraphx.sigmoid? Would it still work?
To answer your questions:
- I talked with Paul and MIGraphX is expecting the memory layout of the output to be in this state. Specifically they are looking for that strided pattern where the first 576 elements are the actual values and the second 576 are uninitialized padding. The larger tensor with 2304 extra elements is expected.
- I created some sample code to test this out:
...
%1 = migraphx.sigmoid %0 : <4x24x24xf16, 576x24x1> -> <4x24x24xf16, 1152x24x1>
%cst = migraphx.literal(dense<1.0> : tensor<1xf16>) : <1xf16, 1>
%2 = migraphx.multibroadcast %cst {out_dyn_dims = [], out_lens = [4, 24, 24]} : <1xf16, 1> -> <4x24x24xf16, 1152x24x1>
%3 = migraphx.add %1, %2 : <4x24x24xf16, 1152x24x1>, <4x24x24xf16, 1152x24x1> -> <4x24x24xf16, 1152x24x1>
return %3 : !migraphx.shaped<4x24x24xf16, 1152x24x1>

It looks like this generates the following IR after MIGraphXToTosa:
%8 = tosa.sigmoid %7 : (tensor<4x24x24xf16>) -> tensor<4x24x24xf16>
%9 = "tosa.const"() <{values = dense<1.000000e+00> : tensor<4x24x24xf16>}> : () -> tensor<4x24x24xf16>
%10 = tosa.add %8, %9 : (tensor<4x24x24xf16>, tensor<4x24x24xf16>) -> tensor<4x24x24xf16>
%11 = tosa.custom %10 {domain_name = "rocmlir", implementation_attrs = "", operator_name = "expand_strides"} : (tensor<4x24x24xf16>) -> tensor<4x48x24xf16>
%12 = tosa.const_shape {values = dense<4608> : tensor<1xindex>} : () -> !tosa.shape<1>
%13 = tosa.reshape %11, %12 : (tensor<4x48x24xf16>, !tosa.shape<1>) -> tensor<4608xf16>
return %13 : tensor<4608xf16>

The expand_strides gets pushed to after the add, which seems correct? I'm trying to think of other cases where this may break...
Okay, if that is what MIGraphX wants I think it makes sense. This might break in some cases, but if it ever does, fixing it is definitely not for this PR.
I'll update the title of this PR and the description!
auto outputType = cast<RankedTensorType>(op.getResult(0).getType());
// Allocate the destination tensor with the larger (padded) size
Value dest =
So CPU and GPU paths will be different here? CPU will lower it to tensor ops (mlir/lib/Conversion/RocmlirCustomTosaDecompose/RocmlirCustomTosaDecompose.cpp) and this will be the GPU path? Or am I getting confused?
Correct, TosaToRock will generate rock.expand_strides ops for the GPU path, and the CPU path will generate tensor ops in RocmlirCustomTosaDecompose. A sketch of the two paths follows.
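To make the divergence concrete, here is a rough sketch of both lowerings, assuming the shapes from the tests; the value names (%src, %dst, %vals, %empty) are illustrative and the exact op spellings in the passes may differ:

// GPU path (TosaToRock, shown here after bufferization): a dedicated rock op
// records the expansion on memrefs.
rock.expand_strides %src into %dst : memref<4x24x24xf16> into memref<4x48x24xf16>
// CPU path (RocmlirCustomTosaDecompose): plain tensor ops place the valid
// values into a larger uninitialized tensor.
%empty = tensor.empty() : tensor<4x48x24xf16>
%full = tensor.insert_slice %vals into %empty[0, 0, 0] [4, 24, 24] [1, 1, 1] : tensor<4x24x24xf16> into tensor<4x48x24xf16>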
// Verify element types match
if (inputType.getElementType() != outputType.getElementType())
  return emitOpError("input and output must have the same element type");
Is there any other restriction? What happens if more than one dimension is different?
More than one dimension being different is allowed. We also check that the output is >= the input in all dimensions and that each output dimension is a multiple of the corresponding input dimension (see the examples below).
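For instance, with hypothetical shapes that follow the restrictions described above:

// Accepted: two dimensions expanded at once, 4 -> 8 and 24 -> 48, and each
// output dimension is an exact multiple of the corresponding input dimension.
rock.expand_strides %in into %out : memref<4x24x24xf16> into memref<8x48x24xf16>
// Rejected: 24 -> 36 is larger but not a multiple of 24.
// rock.expand_strides %in into %bad : memref<4x24x24xf16> into memref<4x36x24xf16>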
%alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<4x48x24xf16>
rock.expand_strides %alloc_0 into %alloc_1 : memref<4x24x24xf16> into memref<4x48x24xf16>
// CHECK: %[[TRANSFORM:.*]] = rock.transform %alloc_1 {{.*}} memref<4x48x24xf16> to memref<4x24x24xf16>
// CHECK: memref.copy %alloc_0, %[[TRANSFORM]] : memref<4x24x24xf16> to memref<4x24x24xf16>
So the final IR ends up with two memref.copy ops, one from alloc_0 to alloc_1, then alloc_1 to arg2. Does the code handle that correctly? What happens if there are more memref.copy ops?
Would it work if the first arg to rock.expand_strides is a transform map as well?
The two copies are independent. The rock.expand_strides lowering to a memref.copy will always just populate the valid portion of the expanded buffer. The second copy is just moving the entire buffer, with the uninitialized gaps, to the output (this was already present in the pre-transform IR). If there are more memref.copy ops in the IR, that should not have an impact on the expand_strides lowering, since each one is independently lowered to transform + copy. Other memref.copy ops in the IR are unaffected; they're just separate memory operations that happen to operate on the same or related buffers.
As for the question about the input to expand_strides being a rock.transform, I don't think that should make a difference. expand_strides doesn't care where the input is coming from, as long as that input is a valid memref value. The memref.copy would copy from the transformed input (from the rock.transform) to the transformed/expanded output, as in the sketch below.
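A rough sketch of that scenario under this lowering; #map_in and #map_out stand in for the actual transform-map attributes (elided in the test's CHECK lines), and the buffer names are illustrative:

// Hypothetical: the expand_strides input is itself a transformed view.
%in_view = rock.transform %alloc by #map_in : memref<2304xf16> to memref<4x24x24xf16>
rock.expand_strides %in_view into %alloc_1 : memref<4x24x24xf16> into memref<4x48x24xf16>
// After LowerExpandStrides, the copy simply reads through the input view:
%out_view = rock.transform %alloc_1 by #map_out : memref<4x48x24xf16> to memref<4x24x24xf16>
memref.copy %in_view, %out_view : memref<4x24x24xf16> to memref<4x24x24xf16>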
Motivation
This PR adds support for non-contiguous (long stride) tensors in the MIGraphX -> Rock pipeline. Previously, ops would fail with the error "writing to tensors with long strides or broadcasts is unsupported" when the output tensor had padded strides.
This implements: #2195
Technical Details
MIGraphXToTosa Changes
Instead of rejecting non-contiguous output strides, we now handle them by creating a tosa.CustomOp for expand_strides.
TosaToRock (GPU lowering path)
Lower to a rock.expand_strides op.
RocmlirCustomTosaDecompose (CPU lowering path)
Lower to an empty larger tensor +
tensor.insert_sliceto put the values in the correct spots in the expanded memory layout. Note, if we went with this approach for the GPU path as well then it would create memref.sub_view ops that would need to be handled in AlignTiling (had this approach in a previous iteration of this PR).LowerExpandStrides
Created a new pass that runs after bufferization and lowers a rock.expand_strides to a rock.transform + memref.copy (the transform-plus-copy pattern shown in the sketch above).
Test Plan
Test Result
Submission Checklist