This is an archive of the discontinued LLVM Phabricator instance.

Differential D87781

Reorder linalg.conv indexing_maps loop order
ClosedPublic

Authored by asaadaldien on Sep 16 2020, 11:25 AM.

Details

Reviewers

nicolasvasilache
mravishankar

Commits

rG9b47525824df: Reorder linalg.conv indexing_maps loop order

Summary

Change the indexing maps so that linalg.conv iterates over (b, x0, x1, z0, z1, q, k) instead of (b, x0, x1, k, q, z0, z1) when evaluating the convolution expression:
Y[b, x0, x1, k] = sum(W[z0, z1, q, k] * X[b, x0 + z0, x1 + z1, q], z0, z1, q)

This allows LLVM's auto-vectorizer to work and gives better locality, resulting in significant performance improvements.
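One way to see the locality argument is to ask how far each operand moves in memory when only the innermost loop advances. The sketch below assumes dense row-major buffers, unit strides and dilations, and no padding. `ConvShape`, `Dim`, `Strides` and `stridesFor` are hypothetical names introduced for illustration and are not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model only: filter[Z0][Z1][Q][K],
// input[B][X0 + Z0 - 1][X1 + Z1 - 1][Q], output[B][X0][X1][K].
struct ConvShape {
  std::size_t B, X0, X1; // output batch and image sizes
  std::size_t Z0, Z1;    // window sizes
  std::size_t Q, K;      // input and output feature sizes
};

// Loop dimensions of the convolution.
enum class Dim { B, X0, X1, Z0, Z1, Q, K };

// Element stride of each operand when only one loop dimension advances by one.
struct Strides {
  std::size_t filter, input, output;
};

inline Strides stridesFor(const ConvShape &s, Dim d) {
  const std::size_t inH = s.X0 + s.Z0 - 1;
  const std::size_t inW = s.X1 + s.Z1 - 1;
  switch (d) {
  case Dim::B:
    return {0, inH * inW * s.Q, s.X0 * s.X1 * s.K};
  case Dim::X0:
    return {0, inW * s.Q, s.X1 * s.K};
  case Dim::X1:
    return {0, s.Q, s.K};
  case Dim::Z0:
    return {s.Z1 * s.Q * s.K, inW * s.Q, 0};
  case Dim::Z1:
    return {s.Q * s.K, s.Q, 0};
  case Dim::Q:
    return {s.K, 1, 0};
  case Dim::K:
    return {1, 0, 1};
  }
  return {0, 0, 0};
}
```

Under these assumptions, with z1 innermost (the old order) the filter is read with a stride of Q * K elements. With k innermost (the new order) the filter and output are contiguous and the input element is loop-invariant, which is the access pattern a loop vectorizer typically handles well.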

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
260 ms	linux > LLVM.Other::change-printer.ll
100 ms	windows > LLVM.Other::change-printer.ll

Event Timeline

asaadaldien created this revision. Sep 16 2020, 11:25 AM

Herald added a project: Restricted Project. Sep 16 2020, 11:25 AM

Herald added subscribers: tatianashp, msifontes, jurahul and 14 others.

asaadaldien requested review of this revision. Sep 16 2020, 11:25 AM

Herald added subscribers: limo1996, stephenneuendorffer. Sep 16 2020, 11:25 AM

Harbormaster completed remote builds in B71910: Diff 292282. Sep 16 2020, 11:53 AM

rebase.

Harbormaster completed remote builds in B71927: Diff 292337. Sep 16 2020, 2:31 PM

Looks good after comment is addressed.

mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
Line 311: Is this more than 80 characters? Please adjust this. Also, could you change the order of statements to this:
	SmallVector<StringRef, 8> iters;
	iters.reserve(...);
	iters.resize(npar - getNumOutputFeatureDimensions(), getParallelIteratorTypeName());
	iters.append(..);
	iters.append(..);
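The statement order requested in this inline comment can be sketched outside of MLIR as follows. This is only a model: `std::vector<std::string>` stands in for `SmallVector<StringRef, 8>`, the function name `iteratorTypes` and its parameters are hypothetical, and the string literals are placeholders for what the `get*IteratorTypeName()` helpers return. The append order mirrors the diff under review (reduction, then window, then the trailing output-feature parallel dims).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the op's iterator_types() logic, with no LLVM dependency.
inline std::vector<std::string> iteratorTypes(unsigned nPar, unsigned nRed,
                                              unsigned nWin,
                                              unsigned nOutFeat) {
  assert(nOutFeat <= nPar && "output feature dims are parallel dims");
  std::vector<std::string> iters;
  // Reserve once, up front, before any element is added.
  iters.reserve(nPar + nRed + nWin);
  // Leading parallel dims: batch and image dimensions.
  iters.resize(nPar - nOutFeat, std::string("parallel"));
  // Reduction and window dims, in the order the diff appends them.
  iters.insert(iters.end(), nRed, std::string("reduction"));
  iters.insert(iters.end(), nWin, std::string("window"));
  // Trailing parallel dims: output features, now the innermost loops.
  iters.insert(iters.end(), nOutFeat, std::string("parallel"));
  return iters;
}
```

For a 2-D convolution one would call `iteratorTypes(4, 1, 2, 1)`, which yields seven entries with the output-feature dimension last.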

This revision is now accepted and ready to land. Sep 18 2020, 6:29 AM

Putting a blocker to get some discussion here.

The original definition was following TF, it is not anymore.
Do we consider this to not be a problem ?

Putting the loops in the "right" order is independent from the op spec.
You should be able to just use tiling without any op definition change (tile by 1-1-1-1-1 + perm for scalar loop-level).

More generally, the way I am approaching this longer term is that this magic "embed the world op" would disappear in favor of more, smaller named ops.
Transformations would look like: named_op1 -> generic form -> named_op2 and I think a lot of these things can be automated from the TC form.
My overall concern is whether we are going towards relying more and more on an op with very fat one-off semantics.

I won't oppose it strongly for now but the conv op will most likely be a prime candidate for deletion when the infra is more mature, so I'd be cautious on putting a lot of effort on transformations / optimizations / perf results if they don't generalize to linalg.generic.

Also, the doc would need to be updated too.

This revision now requires changes to proceed. Sep 18 2020, 7:01 AM

In D87781#2281741, @nicolasvasilache wrote:

Putting a blocker to get some discussion here.

The original definition was following TF, it is not anymore.
Do we consider this to not be a problem ?

Putting the loops in the "right" order is independent from the op spec.
You should be able to just use tiling without any op definition change (tile by 1-1-1-1-1 + perm for scalar loop-level).

More generally, the way I am approaching this longer term is that this magic "embed the world op" would disappear in favor of more, smaller named ops.
Transformations would look like: named_op1 -> generic form -> named_op2 and I think a lot of these things can be automated from the TC form.
My overall concern is whether we are going towards relying more and more on an op with very fat one-off semantics.

I won't oppose it strongly for now but the conv op will most likely be a prime candidate for deletion when the infra is more mature, so I'd be cautious on putting a lot of effort on transformations / optimizations / perf results if they don't generalize to linalg.generic.

Also, the doc would need to be updated too.

Actually a better way to have handled is to just lower it to generic form and do the interchange while lowering it, i.e. use the LinalgLoopInterchangePattern on linalg.conv operation. The issue with this AFAICS is

1. Since the indexing maps and iterator types are not mutable for named ops (as it should be) the linalg.conv needs to be lowered to its generic form while interchanging loops.
2. To do (1) there needs to be a way to get the body of the generic op. For the "legacy" named ops, these are hard-coded when they are lowered to loops. Instead the named ops need an interface that can be used to generate the body. If we had that, then I would favor the approach of keeping the conv definition as is and reordering the loops as a transformation.
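What an interchange on the generic form does to the iterator types and indexing maps can be modelled in a few lines. This is a hypothetical model and not MLIR's LinalgLoopInterchangePattern: `GenericOpModel`, `IndexingMap` and `interchange` are names invented here, and each map result is simplified to a sum of loop dimensions (so strides, dilations and padding are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each result of an indexing map is a sum of loop dimensions,
// so {1, 5} stands for d1 + d5.
using IndexingMap = std::vector<std::vector<unsigned>>;

struct GenericOpModel {
  std::vector<std::string> iteratorTypes; // one entry per loop
  std::vector<IndexingMap> indexingMaps;  // one map per operand
};

// Interchange loops so that new loop i is old loop perm[i].
inline GenericOpModel interchange(const GenericOpModel &op,
                                  const std::vector<unsigned> &perm) {
  const std::size_t n = op.iteratorTypes.size();
  assert(perm.size() == n && "permutation must cover every loop");
  // inverse[oldDim] is the position of that loop after the interchange.
  std::vector<unsigned> inverse(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(perm[i] < n && "permutation entry out of range");
    inverse[perm[i]] = static_cast<unsigned>(i);
  }

  GenericOpModel result;
  result.iteratorTypes.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    result.iteratorTypes.push_back(op.iteratorTypes[perm[i]]);

  // Renumber every dimension used by every map; the op body is untouched.
  result.indexingMaps = op.indexingMaps;
  for (IndexingMap &map : result.indexingMaps)
    for (std::vector<unsigned> &expr : map)
      for (unsigned &dim : expr)
        dim = inverse[dim];
  return result;
}
```

In this model, going from (b, x0, x1, k, q, z0, z1) to (b, x0, x1, z0, z1, q, k) is the permutation {0, 1, 2, 5, 6, 4, 3}. The computation stays the same and only the dimension numbering in the maps changes, which is why the reordering can be treated as a transformation rather than a change to the op definition.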

In D87781#2281741, @nicolasvasilache wrote:

Putting a blocker to get some discussion here.

The original definition was following TF, it is not anymore.
Do we consider this to not be a problem ?

Which definition did we follow ? I am not familiar with explicit loop order def the only one I can see is the expression in linalg doc that doesn't imply the order, no ?

Putting the loops in the "right" order is independent from the op spec.
You should be able to just use tiling without any op definition change (tile by 1-1-1-1-1 + perm for scalar loop-level).

Maybe I am not following but this will require tiling the entire loop by 1 to reorder it ?

More generally, the way I am approaching this longer term is that this magic "embed the world op" would disappear in favor of more, smaller named ops.
Transformations would look like: named_op1 -> generic form -> named_op2 and I think a lot of these things can be automated from the TC form.
My overall concern is whether we are going towards relying more and more on an op with very fat one-off semantics.

I won't oppose it strongly for now but the conv op will most likely be a prime candidate for deletion when the infra is more mature, so I'd be cautious on putting a lot of effort on transformations / optimizations / perf results if they don't generalize to linalg.generic.

The reason I am adding this is to make it competitive with other alternatives; more specifically, I am working on Conv -> im2col + matmul for IREE. When convolving a 1x7x7x512xf32 input with a 3x3x512x512 filter (bottom layers of ResNet-50 with massive filter banks), the conv e2e CPU time is 1718 ms without this change and 28 ms with it, which is better than my current im2col implementation at ~60 ms/img.

Can you please elaborate more about don't generalize to linalg.generic.

Also, the doc would need to be updated too.

In D87781#2281813, @mravishankar wrote:

In D87781#2281741, @nicolasvasilache wrote:

Putting a blocker to get some discussion here.

The original definition was following TF, it is not anymore.
Do we consider this to not be a problem ?

Putting the loops in the "right" order is independent from the op spec.
You should be able to just use tiling without any op definition change (tile by 1-1-1-1-1 + perm for scalar loop-level).

More generally, the way I am approaching this longer term is that this magic "embed the world op" would disappear in favor of more, smaller named ops.
Transformations would look like: named_op1 -> generic form -> named_op2 and I think a lot of these things can be automated from the TC form.
My overall concern is whether we are going towards relying more and more on an op with very fat one-off semantics.

I won't oppose it strongly for now but the conv op will most likely be a prime candidate for deletion when the infra is more mature, so I'd be cautious on putting a lot of effort on transformations / optimizations / perf results if they don't generalize to linalg.generic.

Also, the doc would need to be updated too.

Actually a better way to have handled is to just lower it to generic form and do the interchange while lowering it, i.e. use the LinalgLoopInterchangePattern on linalg.conv operation. The issue with this AFAICS is

1. Since the indexing maps and iterator types are not mutable for named ops (as it should be) the linalg.conv needs to be lowered to its generic form while interchanging loops.
2. To do (1) there needs to be a way to get the body of the generic op. For the "legacy" named ops, these are hard-coded when they are lowered to loops. Instead the named ops need an interface that can be used to generate the body. If we had that, then I would favor the approach of keeping the conv definition as is and reordering the loops as a transformation.

+1 I am favoring this approach I think having named_ops to implement the same trait as generic ops (indexing_maps attributes, iterator_types attributes, body block) will make applying transformations to linalgOp fairly easy.

Thanks for discussing, some comments.

Actually a better way to have handled is to just lower it to generic form and do the interchange while lowering it, i.e. use the LinalgLoopInterchangePattern on linalg.conv operation.
...
Since the indexing maps and iterator types are not mutable for named ops (as it should be) the linalg.conv needs to be lowered to its generic form while interchanging loops.

Yes, this is what I meant with named -> generic -> named.
Additionally we would "match" back from generic to named (if we needed it).
Both emitter and matchers should be auto-generated.

Maybe I am not following but this will require tiling the entire loop by 1 to reorder it ?

For that very specific case yes, but lowering to linalg.generic and permuting there would be more general.

Instead the named ops need an interface that can be used to generate the body.

We have that already, see here
What is not there is the matcher going from generic to a known named form (which may not be strictly needed in this use case).
I expect this matcher to just use the regionBuilder and compare the regions while allowing for permutations to be a little more robust.

Which definition did we follow ? I am not familiar with explicit loop order def the only one I can see is the expression in linalg doc that doesn't imply the order, no ?

Indeed, the source of truth is just about layout not loop order.

Can you please elaborate more about don't generalize to linalg.generic.

Basically, assume you have named_op1 and you want to get to X + named_op2 (examples: transposes for layout, peeling of some loops to rewrite conv as matmul, canonicalize some 1s away, permute some iterators etc).
The claim is we will likely want to go through linalg.generic perform the transformation there and then come back to named form.
The emitter / matcher from named to generic forms should be auto-generated as sketched above.

+1 I am favoring this approach I think having named_ops to implement the same trait as generic ops (indexing_maps attributes, iterator_types attributes, body block) will make applying transformations to linalgOp fairly easy.

We already have that, which is what I was trying to nudge you to think about :)

In general, I'd just say: do what is the simplest for now, with the understanding that we may have to change when we unify things.
But don't get too attached to a particular impl, if it requires specific op knowledge and does not go through the generic path as things will evolve as we understand more.

Of course if you have cycles to explore the named_ops to implement the same trait as generic ops will make applying transformations to linalgOp fairly easy, that would be a very cool (and much needed) step forward :) !

Rescinding my request changes, I'll let you guys decide the priorities.

Thanks!

nicolasvasilache accepted this revision. Sep 18 2020, 8:37 AM

This revision is now accepted and ready to land. Sep 18 2020, 8:37 AM

It's a bit odd to see the loop ordering of a *lowering conversion* change because the output improves performance with a specific backend/compiler and for specific reasons. Moreover, that way, this would keep changing/evolving and be sensitive to common downstream paths and sensitive to LLVM's opt pipeline. In the absence of any target / scheduling info, the order to choose is typically expected to be just the most intuitive / canonical and shouldn't keep changing. You need an optimization mechanism if you need a better one.

bondhugula added inline comments. Sep 18 2020, 1:03 PM

mlir/test/Dialect/Linalg/tile_conv.mlir
Line 1: Why has the number of tile sizes increased here? If this is fixing another bug, please mention that in the summary.

In D87781#2282696, @bondhugula wrote:

It's a bit odd to see the loop ordering of a *lowering conversion* change because the output improves performance with a specific backend/compiler and for specific reasons. Moreover, that way, this would keep changing/evolving and be sensitive to common downstream paths and sensitive to LLVM's opt pipeline. In the absence of any target / scheduling info, the order to choose is typically expected to be just the most intuitive / canonical and shouldn't keep changing. You need an optimization mechanism if you need a better one.

The current lowering is expecting an NHWC o

In D87781#2281929, @nicolasvasilache wrote:

Thanks for discussing, some comments.

Actually a better way to have handled is to just lower it to generic form and do the interchange while lowering it, i.e. use the LinalgLoopInterchangePattern on linalg.conv operation.
...
Since the indexing maps and iterator types are not mutable for named ops (as it should be) the linalg.conv needs to be lowered to its generic form while interchanging loops.

Yes, this is what I meant with named -> generic -> named.
Additionally we would "match" back from generic to named (if we needed it).
Both emitter and matchers should be auto-generated.

Maybe I am not following but this will require tiling the entire loop by 1 to reorder it ?

For that very specific case yes, but lowering to linalg.generic and permuting there would be more general.

Instead the named ops need an interface that can be used to generate the body.

We have that already, see here
What is not there is the matcher going from generic to a known named form (which may not be strictly needed in this use case).
I expect this matcher to just use the regionBuilder and compare the regions while allowing for permutations to be a little more robust.

Which definition did we follow ? I am not familiar with explicit loop order def the only one I can see is the expression in linalg doc that doesn't imply the order, no ?

Indeed, the source of truth is just about layout not loop order.

Can you please elaborate more about don't generalize to linalg.generic.

Basically, assume you have named_op1 and you want to get to X + named_op2 (examples: transposes for layout, peeling of some loops to rewrite conv as matmul, canonicalize some 1s away, permute some iterators etc).
The claim is we will likely want to go through linalg.generic perform the transformation there and then come back to named form.
The emitter / matcher from named to generic forms should be auto-generated as sketched above.

+1 I am favoring this approach I think having named_ops to implement the same trait as generic ops (indexing_maps attributes, iterator_types attributes, body block) will make applying transformations to linalgOp fairly easy.

We already have that, which is what I was trying to nudge you to think about :)

In general, I'd just say: do what is the simplest for now, with the understanding that we may have to change when we unify things.
But don't get too attached to a particular impl, if it requires specific op knowledge and does not go through the generic path as things will evolve as we understand more.

Of course if you have cycles to explore the named_ops to implement the same trait as generic ops will make applying transformations to linalgOp fairly easy, that would be a very cool (and much needed) step forward :) !

Rescinding my request changes, I'll let you guys decide the priorities.

Thanks!

The more I think about global linalg transformations (something that can work on the base type linalgOp), the more I think this refactoring is important. I will sneak in cycles to do it =)

In D87781#2282696, @bondhugula wrote:

It's a bit odd to see the loop ordering of a *lowering conversion* change because the output improves performance with a specific backend/compiler and for specific reasons. Moreover, that way, this would keep changing/evolving and be sensitive to common downstream paths and sensitive to LLVM's opt pipeline. In the absence of any target / scheduling info, the order to choose is typically expected to be just the most intuitive / canonical and shouldn't keep changing. You need an optimization mechanism if you need a better one.

I agree with you that a lowering conversion shouldn't change because of specific perf/backend improvements. But this particular change takes care of the defaults (linear layouts, NHWC order, etc.), so it sets the natural iteration order (n, h, w, kh, kw, ci, co) for the most naive C++ multi-array expression:
Y[n, h, w, co] += W[kh, kw, ci, co] * X[n, h + kh, w + kw, ci];
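Written out as a full loop nest, that expression might look like the sketch below. It assumes dense row-major buffers flattened into `std::vector<float>`, unit strides and dilations, and no padding. The function name `convNhwcNaive` and its parameter list are illustrative and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shapes: X[N][H + KH - 1][W + KW - 1][CI], Wt[KH][KW][CI][CO], Y[N][H][W][CO].
// Y is accumulated into, so the caller is expected to initialize it.
inline void convNhwcNaive(const std::vector<float> &X,
                          const std::vector<float> &Wt, std::vector<float> &Y,
                          std::size_t N, std::size_t H, std::size_t W,
                          std::size_t KH, std::size_t KW, std::size_t CI,
                          std::size_t CO) {
  const std::size_t inH = H + KH - 1, inW = W + KW - 1;
  assert(X.size() == N * inH * inW * CI);
  assert(Wt.size() == KH * KW * CI * CO);
  assert(Y.size() == N * H * W * CO);
  // Loop order (n, h, w, kh, kw, ci, co): co is innermost, so Wt and Y are
  // walked contiguously and the X element is invariant in the inner loop.
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t h = 0; h < H; ++h)
      for (std::size_t w = 0; w < W; ++w)
        for (std::size_t kh = 0; kh < KH; ++kh)
          for (std::size_t kw = 0; kw < KW; ++kw)
            for (std::size_t ci = 0; ci < CI; ++ci)
              for (std::size_t co = 0; co < CO; ++co)
                Y[((n * H + h) * W + w) * CO + co] +=
                    Wt[((kh * KW + kw) * CI + ci) * CO + co] *
                    X[((n * inH + h + kh) * inW + w + kw) * CI + ci];
}
```

For example, a 3x3 single-channel input holding 1 through 9 convolved with a 2x2 filter of ones gives a 2x2 output whose entries are the four window sums.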

Comments...

This revision was landed with ongoing or failed builds. Sep 21 2020, 9:54 PM

Closed by commit rG9b47525824df: Reorder linalg.conv indexing_maps loop order (authored by asaadaldien).

This revision was automatically updated to reflect the committed changes.

asaadaldien added a commit: rG9b47525824df: Reorder linalg.conv indexing_maps loop order.

Harbormaster completed remote builds in B72469: Diff 293332. Sep 21 2020, 9:59 PM

antiagainst added a reverting change: D91796: Revert "Reorder linalg.conv indexing_maps loop order". Nov 19 2020, 6:39 AM

antiagainst added a reverting change: rG5b7bd89b3597: Revert "Reorder linalg.conv indexing_maps loop order". Nov 19 2020, 10:16 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td	11 lines
mlir/test/Dialect/Linalg/affine.mlir	12 lines
mlir/test/Dialect/Linalg/loops.mlir	97 lines
mlir/test/Dialect/Linalg/tile_conv.mlir	2 lines

Diff 292282

mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td

(302 lines hidden)
 ArrayAttr iterator_types() {
   unsigned nPar = getOutputShapedType(0).getRank();
   unsigned nRed = getNumInputFeatureDimensions();
   // Window loops are a special kind of reduction that is never tiled or
   // parallelized across; i.e. [zs] in the TF notation above whose number
   // match `xs` (i.e. 1 window loop per "image" dimension).
   // This may evolve in the future.
   unsigned nWin =
     nPar - getNumBatchDimensions() - getNumInputFeatureDimensions();
-  SmallVector<StringRef, 8> iters(nPar, getParallelIteratorTypeName());
+  SmallVector<StringRef, 8> iters(nPar - getNumOutputFeatureDimensions(), getParallelIteratorTypeName());
   iters.reserve(nPar + nRed + nWin);
   iters.append(nRed, getReductionIteratorTypeName());
   iters.append(nWin, getWindowIteratorTypeName());
+  iters.append(getNumOutputFeatureDimensions(), getParallelIteratorTypeName());
   return Builder(getContext()).getStrArrayAttr(iters);
 }

 // F(z0, ..., zN-1, q, k) *
 // I(b, x0 + z0 - pad_low_0, ..., xN-1 + zN-1 - pad_low_N-1, q)
 // -> O(b, x0, ..., xN-1, k)
 // for N equal to `nWindow`. If there is no padding attribute, it will be
 // ignored.
 ArrayAttr indexing_maps() {
   MLIRContext *context = getContext();
   auto nWin = getNumWindowLoops();
   assert(nWin > 0 && "expected at least one window dimension");
   unsigned idx = 0;
   // In the following, AffineDimExprs are indexed in loop order:
   // [ b, xs, k, q, zs]
   // parallels non-window reductions windows
   //
   // Parallel dims are exactly the dimensions indexing `output`:
   // output[b, x[0], ..., x[N-1], k]; i.e.
   // * batch dimensions (bs with #bs = 1 for now)
   // * "image" dimensions (xs with #xs = #zs = output_rank - #bs - #ks)
   // * output filter dimensions (ks with #ks = 1 for now)
   auto bs = makeAffineDimExprs(getNumBatchDimensions(), idx, context);
   auto xs = makeAffineDimExprs(nWin, idx, context);
-  auto ks = makeAffineDimExprs(
-    getNumOutputFeatureDimensions(), idx, context);
+  // Window reduction dims: sum_{z[0], ..., z[N-1], q}
+  auto zs = makeAffineDimExprs(nWin, idx, context);
   // Non-window reduction dim: sum_{z[0], ..., z[N-1], q}
   auto qs = makeAffineDimExprs(
     getNumInputFeatureDimensions(), idx, context);
-  // Window reduction dims: sum_{z[0], ..., z[N-1], q}
-  auto zs = makeAffineDimExprs(nWin, idx, context);
+  auto ks = makeAffineDimExprs(
+    getNumOutputFeatureDimensions(), idx, context);
   // Construct the weighedSum expression.
   auto ws = weightedPoolingInputIndex(*this, xs, zs);
   return Builder(getContext()).getAffineMapArrayAttr({
     // filter[z[0], ..., z[N-1], q, k]
     AffineMap::get(idx, 0, concat(concat(zs, qs), ks), context),
     // input[b,
     // x[0]*s[0] + d[0]*z[0] - pad_low[0],
     // ...
(468 lines hidden)

mlir/test/Dialect/Linalg/affine.mlir

(44 lines hidden)
 // CHECK: %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>) {
 // CHECK: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
 // CHECK: %[[Q:.*]] = dim %arg0, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
 // CHECK: %[[K:.*]] = dim %arg0, %c2 : memref<?x?x?xf32, #[[$strided3D]]>
 // CHECK: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
 // CHECK: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
 // CHECK: affine.for %{{.*}} = 0 to %[[B]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[X0]] {
-// CHECK: affine.for %{{.*}} = 0 to %[[K]] {
-// CHECK: affine.for %{{.*}} = 0 to %[[Q]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[Z0]] {
+// CHECK: affine.for %{{.*}} = 0 to %[[Q]] {
+// CHECK: affine.for %{{.*}} = 0 to %[[K]] {
 // CHECK: %[[SUM:.*]] = affine.apply #[[$stride2Dilation1]](%{{.*}}, %{{.*}})
 // No padding needed here; only affine loads.
 // CHECK-NEXT: affine.load
 // CHECK-NEXT: affine.load

 func @conv_padding(%arg0: memref<?x?x?x?xf32>,
 %arg1: memref<?x?x?x?xf32>,
 %arg2: memref<?x?x?x?xf32>) {
(11 lines hidden)
 // CHECK: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32>
 // CHECK: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32>
 // CHECK: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32>
 // CHECK: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32>
 // CHECK: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32>
 // CHECK: affine.for %{{.*}} = 0 to %[[B]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[X0]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[X1]] {
-// CHECK: affine.for %{{.*}} = 0 to %[[K]] {
-// CHECK: affine.for %{{.*}} = 0 to %[[Q]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[Z0]] {
 // CHECK: affine.for %{{.*}} = 0 to %[[Z1]] {
+// CHECK: affine.for %{{.*}} = 0 to %[[Q]] {
+// CHECK: affine.for %{{.*}} = 0 to %[[K]] {
 // CHECK: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
 // CHECK: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
 // CHECK: %[[IDX:.*]] = affine.max #[[$clampMinMap]](%[[SUM0]])
 // CHECK: %[[IDY:.*]] = affine.max #[[$clampMinMap]](%[[SUM1]])
 // Padded conv involves an affine.max in the memory access and this is not
 // allowed by affine.load. Use std.load in such cases.
 // CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>
 // CHECK: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32
(55 lines hidden)

mlir/test/Dialect/Linalg/loops.mlir

	Show First 20 Lines • Show All 276 Lines • ▼ Show 20 Lines
	// CHECKLOOP: %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>) {			// CHECKLOOP: %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>) {
	// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c1 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %[[K:.*]] = dim %arg0, %c2 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %[[K:.*]] = dim %arg0, %c2 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[B]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[B]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[X0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[X0]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[Q]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[Z0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[Z0]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[Q]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.}} = %{{.}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: %[[SUM:.]] = affine.apply #[[$stride2Dilation1]](%{{.}}, %{{.*}})			// CHECKLOOP: %[[SUM:.]] = affine.apply #[[$stride2Dilation1]](%{{.}}, %{{.*}})
	// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %[[SUM]], %{{.}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %[[SUM]], %{{.}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %{{.}} = mulf %{{.}}, %{{.*}} : f32			// CHECKLOOP: %{{.}} = mulf %{{.}}, %{{.*}} : f32
	// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: %{{.}} = load %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKLOOP: %{{.}} = addf %{{.}}, %{{.*}} : f32			// CHECKLOOP: %{{.}} = addf %{{.}}, %{{.*}} : f32
	// CHECKLOOP: store %{{.}}, %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKLOOP: store %{{.}}, %{{.}}[%{{.}}, %{{.}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>

	// CHECKPARALLEL-LABEL: func @conv_view3(			// CHECKPARALLEL-LABEL: func @conv_view3(
	// CHECKPARALLEL: %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>) {			// CHECKPARALLEL: %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.}}: memref<?x?x?xf32, #[[$strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[$strided3D]]>) {
	// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c1 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c2 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c2 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}) {			// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}) to (%[[B]], %[[X0]]) step (%{{.*}}, %{{.*}}) {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {			// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
				// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
				// CHECKPARALLEL: scf.parallel ({{.*}}) = (%{{.*}}) to (%[[K]]) step (%{{.*}}) {
	// CHECKPARALLEL: %[[SUM:.*]] = affine.apply #[[$stride2Dilation1]](%{{.*}}, %{{.*}})			// CHECKPARALLEL: %[[SUM:.*]] = affine.apply #[[$stride2Dilation1]](%{{.*}}, %{{.*}})
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM]], %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM]], %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>
	// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>			// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[$strided3D]]>

	func @conv_view4(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {			func @conv_view4(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {
	linalg.conv(%arg0, %arg1, %arg2) {dilations = [4, 5], strides = [2, 3]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>			linalg.conv(%arg0, %arg1, %arg2) {dilations = [4, 5], strides = [2, 3]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>
	return			return
	}			}
	// CHECKLOOP-LABEL: func @conv_view4(			// CHECKLOOP-LABEL: func @conv_view4(
	// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>) {			// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>) {
	// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: %[[SUM0:.*]] = affine.apply #[[$stride2Dilation4]](%{{.*}}, %{{.*}})			// CHECKLOOP: %[[SUM0:.*]] = affine.apply #[[$stride2Dilation4]](%{{.*}}, %{{.*}})
	// CHECKLOOP: %[[SUM1:.*]] = affine.apply #[[$stride3Dilation5]](%{{.*}}, %{{.*}})			// CHECKLOOP: %[[SUM1:.*]] = affine.apply #[[$stride3Dilation5]](%{{.*}}, %{{.*}})
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32			// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32			// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
	// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>

	// CHECKPARALLEL-LABEL: func @conv_view4(			// CHECKPARALLEL-LABEL: func @conv_view4(
	// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>) {			// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[$strided4D]]>) {
	// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {			// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]]) step (%{{.*}}, %{{.*}}, %{{.*}}) {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {			// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {			// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
				// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
				// CHECKPARALLEL: scf.parallel (%{{.*}}) = (%{{.*}}) to (%[[K]]) step (%{{.*}}) {
	// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #[[$stride2Dilation4]](%{{.*}}, %{{.*}})			// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #[[$stride2Dilation4]](%{{.*}}, %{{.*}})
	// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #[[$stride3Dilation5]](%{{.*}}, %{{.*}})			// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #[[$stride3Dilation5]](%{{.*}}, %{{.*}})
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>
	// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>			// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[$strided4D]]>

	func @conv_padding(%arg0: memref<?x?x?x?xf32>,			func @conv_padding(%arg0: memref<?x?x?x?xf32>,
	%arg1: memref<?x?x?x?xf32>,			%arg1: memref<?x?x?x?xf32>,
	%arg2: memref<?x?x?x?xf32>) {			%arg2: memref<?x?x?x?xf32>) {
	linalg.conv(%arg0, %arg1, %arg2) {dilations = [1, 1],			linalg.conv(%arg0, %arg1, %arg2) {dilations = [1, 1],
	padding = dense<[[0, 1], [1, 1]]> : tensor<2x2xi64>,			padding = dense<[[0, 1], [1, 1]]> : tensor<2x2xi64>,
	strides = [1, 1]} :			strides = [1, 1]} :
	memref<?x?x?x?xf32>, memref<?x?x?x?xf32>, memref<?x?x?x?xf32>			memref<?x?x?x?xf32>, memref<?x?x?x?xf32>, memref<?x?x?x?xf32>
	return			return
	}			}
	// CHECKLOOP-LABEL: func @conv_padding			// CHECKLOOP-LABEL: func @conv_padding
	// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {			// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {
	// CHECKLOOP: %[[ZERO:.*]] = constant 0.000000e+00 : f32			// CHECKLOOP: %[[ZERO:.*]] = constant 0.000000e+00 : f32
	// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32>
	// CHECKLOOP: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32>			// CHECKLOOP: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32>
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
	// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {			// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
				// CHECKLOOP: scf.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
	// CHECKLOOP: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})			// CHECKLOOP: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
	// CHECKLOOP: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})			// CHECKLOOP: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
	// CHECKLOOP: %[[IDX:.*]] = affine.max #[[$clampMinMap]](%[[SUM0]])			// CHECKLOOP: %[[IDX:.*]] = affine.max #[[$clampMinMap]](%[[SUM0]])
	// CHECKLOOP: %[[IDY:.*]] = affine.max #[[$clampMinMap]](%[[SUM1]])			// CHECKLOOP: %[[IDY:.*]] = affine.max #[[$clampMinMap]](%[[SUM1]])
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKLOOP: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32			// CHECKLOOP: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32			// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
	// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32			// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
	// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>

	// CHECKPARALLEL-LABEL: func @conv_padding			// CHECKPARALLEL-LABEL: func @conv_padding
	// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {			// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {
	// CHECKPARALLEL: %[[ZERO:.*]] = constant 0.000000e+00 : f32			// CHECKPARALLEL: %[[ZERO:.*]] = constant 0.000000e+00 : f32
	// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, %c0 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, %c1 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, %c2 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[K:.*]] = dim %arg0, %c3 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[B:.*]] = dim %arg1, %c0 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, %c1 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32>			// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, %c2 : memref<?x?x?x?xf32>
	// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {			// CHECKPARALLEL: scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]]) step (%{{.*}}, %{{.*}}, %{{.*}}) {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {			// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
	// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {			// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
				// CHECKPARALLEL: scf.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
				// CHECKPARALLEL: scf.parallel (%{{.*}}) = (%{{.*}}) to (%[[K]]) step (%{{.*}}) {
	// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})			// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
	// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})			// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
	// CHECKPARALLEL: %[[IDX:.*]] = affine.max #[[$clampMinMap]](%[[SUM0]])			// CHECKPARALLEL: %[[IDX:.*]] = affine.max #[[$clampMinMap]](%[[SUM0]])
	// CHECKPARALLEL: %[[IDY:.*]] = affine.max #[[$clampMinMap]](%[[SUM1]])			// CHECKPARALLEL: %[[IDY:.*]] = affine.max #[[$clampMinMap]](%[[SUM1]])
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
	// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32			// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
	// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>			// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>

	func @pooling_max(%arg0: memref<?x?xf32>,			func @pooling_max(%arg0: memref<?x?xf32>,
	%arg1: memref<?x?xi32>,			%arg1: memref<?x?xi32>,
	%arg2: memref<?x?xf32>) {			%arg2: memref<?x?xf32>) {
	linalg.pooling_max(%arg0, %arg1, %arg2) { strides = [2, 1] }:			linalg.pooling_max(%arg0, %arg1, %arg2) { strides = [2, 1] }:
	memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>			memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>
	return			return
	}			}

mlir/test/Dialect/Linalg/tile_conv.mlir

	// RUN: mlir-opt %s -linalg-tile="linalg-tile-sizes=2,3,0,0,4" | FileCheck %s -check-prefix=TILE-23004			// RUN: mlir-opt %s -linalg-tile="linalg-tile-sizes=2,3,0,0,0,4" | FileCheck %s -check-prefix=TILE-23004
				bondhugula: Why has the number of tile sizes increased here? If this is fixing another bug, please mention that in the summary.
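
One plausible reading of the tile-size change, offered as a sketch and not as the author's stated reason: `linalg-tile-sizes` is positional over the op's loops, and the summary moves the loop order from (b, x0, x1, k, q, z0, z1) to (b, x0, x1, z0, z1, q, k). Keeping the tile of 4 on `q` then needs one more leading zero. The helper `tiled_loops` below is hypothetical (it is not an MLIR API), and treating missing trailing sizes as 0 (untiled) is an assumption.

```python
# Loop orders as given in the revision summary.
OLD_ORDER = ["b", "x0", "x1", "k", "q", "z0", "z1"]
NEW_ORDER = ["b", "x0", "x1", "z0", "z1", "q", "k"]


def tiled_loops(order, tile_sizes):
    """Map positional tile sizes onto loop names (hypothetical helper).

    A size of 0 means "do not tile this loop". Loops beyond the end of
    tile_sizes are assumed to be untiled as well.
    """
    padded = list(tile_sizes) + [0] * (len(order) - len(tile_sizes))
    return {name: size for name, size in zip(order, padded) if size != 0}


# Old test: sizes 2,3,0,0,4 against the old loop order.
old = tiled_loops(OLD_ORDER, [2, 3, 0, 0, 4])

# Same sizes against the new loop order: the 4 now lands on z1, not q.
new_same_sizes = tiled_loops(NEW_ORDER, [2, 3, 0, 0, 4])

# Updated test: sizes 2,3,0,0,0,4 against the new loop order.
new = tiled_loops(NEW_ORDER, [2, 3, 0, 0, 0, 4])

print(old)
print(new_same_sizes)
print(new)
```

Under these assumptions the old and updated RUN lines tile the same three loops (b, x0, q), while leaving the sizes unchanged would tile z1 in place of q.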

	// TILE-23004-DAG: #[[$D0x30pS0x10:.*]] = affine_map<(d0) -> (d0 * 30)>			// TILE-23004-DAG: #[[$D0x30pS0x10:.*]] = affine_map<(d0) -> (d0 * 30)>
	// TILE-23004-DAG: #[[$S0x10p90D0x30pS1:.*]] = affine_map<(d0)[s0, s1] -> (s0 * 10 + 51, d0 * -30 + s1)>			// TILE-23004-DAG: #[[$S0x10p90D0x30pS1:.*]] = affine_map<(d0)[s0, s1] -> (s0 * 10 + 51, d0 * -30 + s1)>
	// TILE-23004-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>			// TILE-23004-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
	// TILE-23004-DAG: #[[$bound_map_4:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>			// TILE-23004-DAG: #[[$bound_map_4:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>

	func @conv(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {			func @conv(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {
	linalg.conv(%arg0, %arg1, %arg2) {dilations = [10, 20], strides = [30, 40]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>			linalg.conv(%arg0, %arg1, %arg2) {dilations = [10, 20], strides = [30, 40]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>