This is an archive of the discontinued LLVM Phabricator instance.


Differential D77678

[mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.
Closed · Public

Authored by mravishankar on Apr 7 2020, 1:50 PM.


Details

Reviewers

nicolasvasilache
antiagainst
hanchung
asaadaldien
ftynse
rriddle
bondhugula

Commits

rG03391df90ed1: [mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.

Summary

The outer parallel loops of a linalg operation are lowered to
loop.parallel, with the other loops lowered to loop.for. This brings the
lowering to loop.parallel on par with the loop.for lowering. In the future,
the reduction loops could also be lowered to loop.parallel.
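
A minimal sketch of the split described above, written as standalone C++ rather than against the MLIR API. The names `LoopPlan` and `planLoopNest` are hypothetical and exist only for illustration, and iterator types are modeled as plain strings. It assumes, as the `GenerateLoopNest` specialization in this diff appears to do, that only the leading run of "parallel" iterators is folded into a single multi-dimensional loop.parallel and every remaining loop becomes a loop.for.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of how a loop nest would be split.
struct LoopPlan {
  std::size_t numOuterParallel; // dimensions folded into one loop.parallel
  std::size_t numSequential;    // remaining loops, each emitted as loop.for
  std::size_t numLoopOps;       // total number of loop operations created
};

// Counts the leading "parallel" iterators. Everything after the first
// non-parallel iterator is treated as sequential, even if it is "parallel".
inline LoopPlan planLoopNest(const std::vector<std::string> &iteratorTypes) {
  std::size_t nOuterPar = 0;
  while (nOuterPar < iteratorTypes.size() &&
         iteratorTypes[nOuterPar] == "parallel")
    ++nOuterPar;
  const std::size_t nSeq = iteratorTypes.size() - nOuterPar;
  // One loop.parallel covers all outer parallel dimensions at once.
  const std::size_t nOps = nOuterPar ? nSeq + 1 : nSeq;
  return {nOuterPar, nSeq, nOps};
}
```

For a matmul-like op with iterator types parallel, parallel, reduction, this plan yields one two-dimensional loop.parallel wrapping one loop.for.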

Also add a utility function that returns the loops that are
created. This requires changing the EDSC builders to return the
created ops.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mravishankar created this revision. · Apr 7 2020, 1:50 PM

Herald added a reviewer: nicolasvasilache. · Apr 7 2020, 1:50 PM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, grosul1, Joonsoo and 12 others.

mravishankar added reviewers: antiagainst, hanchung, asaadaldien, ftynse, rriddle. · Apr 7 2020, 1:51 PM

mravishankar added a reviewer: bondhugula.

mravishankar removed subscribers: rriddle, antiagainst, nicolasvasilache.

Harbormaster failed remote builds in B52223: Diff 255795! · Apr 7 2020, 2:12 PM

The utility function part looks like quite a massive change for just the purpose of getting the xxxForOp from their respective induction variables.
Why is this a good tradeoff (also, it is not tested FWICT)?

Could it be separated from this revision and evaluated independently as it seems quite orthogonal to the part that handles more cases of Linalg -> ploops ?

In D77678#1968093, @nicolasvasilache wrote:

The utility function part looks like quite a massive change for just the purpose of getting the xxxForOp from their respective induction variables.
Why is this a good tradeoff (also, it is not tested FWICT)?

If I understand the comment correctly, are you suggesting not changing the EDSC builders to return the operation handles as well? I think all the induction variables are block arguments of the entry block of the loop operations. I can get the parent op of the block and not modify the EDSC builders. I started down the path of getting the op directly from the builders, and just finished it. I am fine with that approach as well. My initial thought was that getting the op from the induction variables is more of a workaround, but I don't have enough background in this to know if that is indeed the case.
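
A minimal sketch of the alternative discussed here: recovering the loop op from its induction variable by walking from the block argument to its owning block and then to that block's parent op. The structs below are hypothetical, heavily simplified stand-ins and not the MLIR classes. In MLIR itself the corresponding walk would, I believe, go through `BlockArgument::getOwner()` and `Block::getParentOp()`.

```cpp
// Hypothetical stand-ins for IR objects; not the MLIR API.
struct Op;

struct Block {
  Op *parentOp = nullptr; // the op whose region contains this block
};

struct BlockArgument {
  Block *owner = nullptr; // the block this argument belongs to
};

struct Op {
  const char *name = "";
  Block body; // entry block; its arguments model the induction variables
};

// Returns the op that owns the block defining `iv`, or nullptr if the
// argument is detached.
inline Op *getLoopFromInductionVar(const BlockArgument &iv) {
  return iv.owner ? iv.owner->parentOp : nullptr;
}
```

The trade-off raised in the thread is visible even in this toy form: the walk needs no change to the builders, but it relies on the convention that induction variables are arguments of the loop's entry block.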

Could it be separated from this revision and evaluated independently as it seems quite orthogonal to the part that handles more cases of Linalg -> ploops ?

Maybe. I want to do both lower linalg to ploops, and have the utility function that does that return the loops created as well, just like the tileLinalgOp method returns the resulting tiled op and loops created.

asaadaldien added inline comments. · Apr 7 2020, 4:14 PM

mlir/include/mlir/EDSC/Builders.h
222	(On Diff #255795)	Can we move this block back to line 79?

@mravishankar A general question on the direction this is taking: why are we even lowering all of this to loop.parallel and loop.for instead of affine.parallel and affine.for? The conversion from affine.* to loop.* is guaranteed to *always* succeed by design in all cases and it already does for affine.for -> loop.for. So you get your loop.for's for free (similarly affine.parallel -> loop.parallel). Looks like you are adding more and more code that is skipping a level of abstraction and introducing a parallel lowering path that would ultimately be redundant / subsumed. I feel this is taking the design and the infrastructure in the wrong direction, more so at each step. @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists on affine.for on loop.for in strictly less powerful ways and without a good reason. There may be a *few* things that you may just want to do on loop.for rather than on affine.for, but you could do that anyway even after having passed through the affine dialect.

On a less major note, is there an example here that can't be represented via the affine dialect straightaway - the way it is today? Even all your loop steps are one (the ones I can immediately tell from the test cases) - if there are some cases that you need that aren't, they could always be normalized to one via affine (without even needing grayboxes for the cases you have).
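
A minimal sketch of what "normalized to one" means for a loop step, in standalone C++. The function names are hypothetical, and a strictly positive step is assumed. A loop with step `s` is rewritten as a unit-step loop over the trip count, with the original induction variable recomputed as `lb + k * s`.

```cpp
#include <cstdint>
#include <vector>

// Number of iterations of `for (i = lb; i < ub; i += step)`.
// Assumes step > 0; returns 0 for an empty range.
inline std::int64_t tripCount(std::int64_t lb, std::int64_t ub,
                              std::int64_t step) {
  return ub > lb ? (ub - lb + step - 1) / step : 0;
}

// Original form: the induction variable advances by `step`.
inline std::vector<std::int64_t> stridedLoop(std::int64_t lb, std::int64_t ub,
                                             std::int64_t step) {
  std::vector<std::int64_t> ivs;
  for (std::int64_t i = lb; i < ub; i += step)
    ivs.push_back(i);
  return ivs;
}

// Normalized form: unit step over [0, tripCount), original value recomputed.
inline std::vector<std::int64_t> normalizedLoop(std::int64_t lb,
                                                std::int64_t ub,
                                                std::int64_t step) {
  std::vector<std::int64_t> ivs;
  const std::int64_t n = tripCount(lb, ub, step);
  for (std::int64_t k = 0; k < n; ++k)
    ivs.push_back(lb + k * step);
  return ivs;
}
```

Both forms visit the same values in the same order. When the step is a runtime value, the expression `lb + k * step` multiplies two non-constants, which is where the semi-affine maps mentioned later in the thread come in.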

bondhugula requested changes to this revision. · Apr 7 2020, 8:54 PM

bondhugula added inline comments.

mlir/test/Dialect/Linalg/loops.mlir
128	Is there a need to match all of the trailing 'step %{{.*}}'? You always print step right?

This revision now requires changes to proceed. · Apr 7 2020, 8:54 PM

FWIW, I entirely agree with @bondhugula 's sentiment here!

@bondhugula: Thanks for the comments. I certainly cannot think of a case in Linalg currently where the path of going from linalg -> affine -> loop will not be able to cover everything that is currently supported in Linalg. All I was trying to achieve with this change is to finish up the work that was started a while ago of lowering linalg to loop.parallel. From my perspective, loop.parallel and loop.for are in the same group. I certainly didn't intend to make a design or infrastructure level decision here.

I think it would be good to reach some consensus here on whether linalg should always be lowered to affine and then to the loop dialect, and on whether lowering linalg to loops directly is not recommended. As you mentioned, going from affine.for -> loop.for is always guaranteed, and so is going from affine.parallel -> loop.parallel. So we can always make the decision to lower from linalg to affine and then to loops for both the parallel and non-parallel version. My only concern is that if there exists a lowering from linalg to loops, there should also be a way to target loop.parallel, since without that you are dropping semantic information about the computation. This information is really important when lowering from the loop dialect to the GPU dialect (which was AFAIK one of the main motivations behind having loop.parallel in the first place).

@herhut is probably also interested in this and may have comments. I am assuming @nicolasvasilache will also provide his thoughts on this. FWIW, I am fine going through the affine dialect instead of directly going to the loop dialect. I hadn't given thought to using the affine dialect for my use cases, but I definitely don't see an issue with it.

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK, the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops. So if the code generation process is tiling linalg ops and then lowering the tiled ops to loops, you can end up in a situation where the outer loops are in the loop dialect but the inner loops are in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's just something that I haven't reasoned fully about for myself.

Side note: This change seems bigger than it is because it also makes some changes to EDSC which are probably not needed. Will try to remove those.

In D77678#1968714, @mravishankar wrote:

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK, the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed this out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).
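
A minimal sketch of one common way to express a runtime tile size with unit-step loops, in standalone C++. This is not necessarily the construction used in the linked post. The function name is hypothetical, and `n >= 0` and `tileSize > 0` are assumed. The outer loop runs over tile indices, and the inner bounds are computed from the tile index with a `min` to clamp the last, possibly partial, tile.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Visits [0, n) tile by tile, where `tileSize` is only known at run time.
// Returns the indices in the order they are visited.
inline std::vector<std::int64_t> visitTiled(std::int64_t n,
                                            std::int64_t tileSize) {
  std::vector<std::int64_t> visited;
  const std::int64_t numTiles = (n + tileSize - 1) / tileSize; // ceildiv
  for (std::int64_t t = 0; t < numTiles; ++t) {                // inter-tile
    const std::int64_t begin = t * tileSize;
    const std::int64_t end = std::min(begin + tileSize, n);    // clamp
    for (std::int64_t i = begin; i < end; ++i)                 // intra-tile
      visited.push_back(i);
  }
  return visited;
}
```

Note that `t * tileSize` is a product of a loop index and a runtime value, so in affine terms the bound is semi-affine rather than purely affine, which is the support question raised further down the thread.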

So if the code generation process is tiling linalg ops and then lowering the tiled ops to loops, you can end up in a situation where the outer loops are in the loop dialect but the inner loops are in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's just something that I haven't reasoned fully about for myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-convert-affine-to-std' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

In D77678#1968555, @bondhugula wrote:

. @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists on affine.for on loop.for in strictly less powerful ways and without a good reason. There may be a *few* things that you may just want to do on loop.for rather than on affine.for, but you could do that anyway even after having passed through the affine dialect.

I did think about this, and we even had a document back in the time when we had access to those ;) The discussion you want to have here is mostly independent of this patch, and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.

Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).

While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.

On a minor note, have you actually tried running the example you proposed in the linked forum post? :) There are many places where semi-affine maps are poorly supported, or not supported at all. Conversion to LLVM is one of them.

That being said, I was the one who has been arguing that Linalg lowering should go through affine when it can (and so was the design document). The problem is when it cannot, or when doing so would just increase the complexity of the system without visible benefits. Let's assume there are cases where it cannot (today, examples would be: using linalg ops with values that don't qualify as affine symbols, reductions that explicitly want to go through SSA values; tomorrow, we'll have sparse buffers). Then we need the possibility to emit at least some non-affine loops, which this patch contributes. Now, if there is one specific loop in a Linalg op that does not fit into affine, do we actually want a mix of affine and non-affine loops, or do we prefer a single non-affine loop nest that, e.g., preserves the idea of permutability that would be no longer discoverable by affine analysis. I can see value in having both options.

The actual duplication here is between Linalg->loop.for and Linalg->loop.parallel lowering, which I pointed out in one of the previous patches. Given that we have the lowering from loop.parallel to loop.for, we should remove the Linalg->loop.for and replace it with this. My recollection is that it was the plan, but it requires the lowering to loop.parallel to also support reductions, which this patch does not do.

Transformations on different kinds of loops are another question, unrelated to this patch. Again, I see value in removing affine restrictions or, conversely, having stricter restrictions such as hyper-rectangular, and separating the legality and profitability analysis (likely based on those restrictions) from the IR manipulation logic.

Herald added a subscriber: frgossen. · Apr 8 2020, 3:41 AM

Removing EDSC related changes to focus change only on lowering to
loop.parallel ops.

In D77678#1968752, @bondhugula wrote:

In D77678#1968714, @mravishankar wrote:

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK, the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed this out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).

Thanks for the pointer. As was done in that post, I just looked at the op definition and reached the conclusion about parametric tiling. I haven't worked with the affine dialect enough to know about such things. It's definitely something I want to look into in due course.

So if the code generation process is tiling linalg ops and then lowering the tiled ops to loops, you can end up in a situation where the outer loops are in the loop dialect but the inner loops are in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's just something that I haven't reasoned fully about for myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-lower-to-affine' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

Agreed (and said so earlier). It should be OK to mix loop.parallel/loop.for with affine.for/affine.parallel. But based on your post is it possible to generate affine.for/affine.parallel while tiling linalg ops as well? That way the same benefit of going to affine.for/affine.parallel would be available at the inter-tile loops as well.

In D77678#1969051, @ftynse wrote:

In D77678#1968555, @bondhugula wrote:

. @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists on affine.for on loop.for in strictly less powerful ways and without a good reason. There may be a *few* things that you may just want to do on loop.for rather than on affine.for, but you could do that anyway even after having passed through the affine dialect.

I did think about this, and we even had a document back in the time when we had access to those ;) The discussion you want to have here is mostly independent of this patch, and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.

Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).

While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.

Thanks @ftynse for the really useful background. I am certainly unaware of the discussion here. Would be really good if we could surface this back up on the discussion forum. But as you mentioned, I hope that this patch will be seen independent of that discussion. I am not trying to weigh the scales one way or the other, but rather just filling missing pieces where I can and when I need them.

The actual duplication here is between Linalg->loop.for and Linalg->loop.parallel lowering, which I pointed out in one of the previous patches. Given that we have the lowering from loop.parallel to loop.for, we should remove the Linalg->loop.for and replace it with this. My recollection is that it was the plan, but it requires the lowering to loop.parallel to also support reductions, which this patch does not do.

Agreed that lowering from linalg to loop.for should become redundant eventually, but right now the lowering to loop.parallel does not support reductions (apologies for misrepresenting earlier that I am "finishing" the linalg to loop.parallel lowering; there are a couple of cases missing). As it stands, with this patch itself we "can" remove the linalg -> loop.for lowering. For the unhandled cases that's the fallback used anyway. So there is no change in functionality by merging the lowering to loop.for and loop.parallel.

mlir/test/Dialect/Linalg/loops.mlir
128	Probably not. I didn't change what was already there, just changed the check-prefix. I would rather keep it as is.

Thanks Mahesh!

Re Linalg, affine and loop (structured control flow), different transformations can be done at different levels.
Determining where which transformation should be done is still a very open problem: just because you *can* perform transformation A on representation B does not necessarily mean you *should*.
Tradeoffs include complexity, scalability, maintainability, driving transformations, composability, deriving profitability metrics etc etc etc.
The interesting part IMO is being able to mix these different systems and essentially compose Halide-style, affine-style and Allen-Kennedy-style transformations, declaratively, and evaluate based on concrete data.

All this is completely orthogonal to the fact that we need to build the different paths.
Linalg -> affine -> loops / SCF is one valid path
Linalg -> loops / SCF is another valid path
Affine ->loops / SCF is another valid path
None should be discouraged.

Harbormaster completed remote builds in B52379: Diff 256053. · Apr 8 2020, 3:46 PM

In D77678#1969889, @mravishankar wrote:

In D77678#1968752, @bondhugula wrote:

In D77678#1968714, @mravishankar wrote:

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK, the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed this out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).

Thanks for the pointer. As was done in that post, I just looked at the op definition and reached the conclusion about parametric tiling. I haven't worked with the affine dialect enough to know about such things. It's definitely something I want to look into in due course.

So if the code generation process is tiling linalg ops and then lowering the tiled ops to loops, you can end up in a situation where the outer loops are in the loop dialect but the inner loops are in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's just something that I haven't reasoned fully about for myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-lower-to-affine' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

Agreed (and said so earlier). It should be OK to mix loop.parallel/loop.for with affine.for/affine.parallel. But based on your post is it possible to generate affine.for/affine.parallel while tiling linalg ops as well? That way the same benefit of going to affine.for/affine.parallel would be available at the inter-tile loops as well.

Yes, of course. That's exactly what I've been pointing out. loop.for/parallel is currently being unnecessarily used for all these cases where it is possible to just go through affine.for/affine.parallel. Even with more general computation (where dim/symbol restrictions get in the way) that you may need in the future, with affine.scope ops (renamed from grayboxes), you'd never need to use loop.for/if to lower any tensor/memref indexing computation. And when you lower to loop.for/if, you'd get what you want. Is there a need to maintain a non-unit symbolic step for loop.for's steps? Your bounds are already any index type SSA value. Won't your code be simplified if you just canonicalized to a unit stride? With semi-affine maps, even with the current support, you only get *more* than what you get with loop.for's.

Adding explicit instantiation of linalgLowerOpToLoops for all linalg
named ops.

Rebase

Harbormaster completed remote builds in B52835: Diff 256826. · Apr 11 2020, 11:57 PM

Harbormaster completed remote builds in B52836: Diff 256827. · Apr 12 2020, 12:29 AM

Harbormaster completed remote builds in B52838: Diff 256829. · Apr 12 2020, 1:02 AM

Rebase

Harbormaster completed remote builds in B52943: Diff 257006. · Apr 13 2020, 10:45 AM

Rebase

@bondhugula I am submitting this change now. Based on comments here there might be larger discussions to be had over how different dialects interact which is separate from this change. Please let me know if you have any other comments on the patch itself.

Rebase

This revision was not accepted when it landed; it landed in state Needs Review. · Apr 13 2020, 1:36 PM

Closed by commit rG03391df90ed1: [mlir][Linalg] Add loop.parallel lowering for all Linalg Ops. (authored by mravishankar).

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B52982: Diff 257082! · Apr 13 2020, 1:37 PM

Harbormaster failed remote builds in B52988: Diff 257087! · Apr 13 2020, 2:07 PM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h	7 lines
mlir/lib/Dialect/Linalg/Transforms/LinalgToLoops.cpp	228 lines
mlir/test/Dialect/Linalg/loops.mlir	890 lines
mlir/test/Dialect/Linalg/parallel_loops.mlir	24 lines

Diff 257098

mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h

LogicalResult tileLinalgOpAndSetMarker(PatternRewriter &rewriter, Operation *op,		LogicalResult tileLinalgOpAndSetMarker(PatternRewriter &rewriter, Operation *op,
ArrayRef<unsigned> permutation);		ArrayRef<unsigned> permutation);

/// Tiles `op` by `sizes`, fuses the producers of `operandIndicesToFuse` and		/// Tiles `op` by `sizes`, fuses the producers of `operandIndicesToFuse` and
/// sets the attribute `kLinalgTransformMarker` to `linalgMarker`.		/// sets the attribute `kLinalgTransformMarker` to `linalgMarker`.
LogicalResult tileAndFuseLinalgOpAndSetMarker(		LogicalResult tileAndFuseLinalgOpAndSetMarker(
PatternRewriter &rewriter, Operation *op, ArrayRef<int64_t> sizes,		PatternRewriter &rewriter, Operation *op, ArrayRef<int64_t> sizes,
ArrayRef<int64_t> operandIndicesToFuse, StringRef linalgMarker);		ArrayRef<int64_t> operandIndicesToFuse, StringRef linalgMarker);

		using LinalgLoops = SmallVector<Operation *, 4>;

		/// Emits a loop nest of with the proper body for `op`.
		template <typename LoopTy, typename ConcreteOp>
		Optional<LinalgLoops> linalgLowerOpToLoops(PatternRewriter &rewriter,
		Operation *op);

/// Emits a loop nest of `loop.for` with the proper body for `op`.		/// Emits a loop nest of `loop.for` with the proper body for `op`.
template <typename ConcreteOp>		template <typename ConcreteOp>
LogicalResult linalgOpToLoops(PatternRewriter &rewriter, Operation *op);		LogicalResult linalgOpToLoops(PatternRewriter &rewriter, Operation *op);

/// Emits a loop nest of `loop.parallel` with the proper body for `op`.		/// Emits a loop nest of `loop.parallel` with the proper body for `op`.
template <typename ConcreteOp>		template <typename ConcreteOp>
LogicalResult linalgOpToParallelLoops(PatternRewriter &rewriter, Operation *op);		LogicalResult linalgOpToParallelLoops(PatternRewriter &rewriter, Operation *op);


mlir/lib/Dialect/Linalg/Transforms/LinalgToLoops.cpp

// This struct is for factoring out the implementation and support template		// This struct is for factoring out the implementation and support template
// instantiations in the following 2 cases:		// instantiations in the following 2 cases:
// 1. Appending to a list of patterns via RewritePatternList.		// 1. Appending to a list of patterns via RewritePatternList.
// 2. Direct invocation via `linalgOpToLoops` and `linalgOpToAffineLoops`.		// 2. Direct invocation via `linalgOpToLoops` and `linalgOpToAffineLoops`.
// The implementation must work both in DRR and inside a RewritePattern. As a		// The implementation must work both in DRR and inside a RewritePattern. As a
// consequence, (1) it is only allowed to emit new ops if the match is		// consequence, (1) it is only allowed to emit new ops if the match is
// guaranteed to be a success, (2) it is not allowed erase/replace, and (3) an		// guaranteed to be a success, (2) it is not allowed erase/replace, and (3) an
// encompassing pattern must take care of the erasure logic.		// encompassing pattern must take care of the erasure logic.
template <typename LoopTy, typename IndexedValueTy, typename ConcreteOpTy>		template <typename LoopTy, typename ConcreteOpTy>
class LinalgOpToLoopsImpl {		class LinalgOpToLoopsImpl {
public:		public:
static LogicalResult doit(Operation *op, PatternRewriter &rewriter);		static Optional<LinalgLoops> doit(Operation *op, PatternRewriter &rewriter);
};		};

template <typename LoopTy>		namespace {
bool loweringIsAllowed(int numParallelLoops, int numLoops) {		/// Helper struct to generate the loop nest for the op. This factored out here
return true;		/// to be able to partially specialize this for different LoopTy.
		template <typename LoopTy, typename ConcreteOpTy>
		class GenerateLoopNest {
		public:
		using IndexedValueTy =
		typename std::conditional<std::is_same<LoopTy, AffineForOp>::value,
		AffineIndexedValue, StdIndexedValue>::type;
		static void doit(ConcreteOpTy linalgOp, ArrayRef<Value> loopRanges,
		MutableArrayRef<ValueHandle> allIvs) {
		SmallVector<ValueHandle *, 4> allPIvs =
		makeHandlePointers(MutableArrayRef<ValueHandle>(allIvs));

		GenericLoopNestRangeBuilder<LoopTy>(allPIvs, loopRanges)([&] {
		SmallVector<Value, 4> allIvValues(allIvs.begin(), allIvs.end());
		LinalgScopedEmitter<IndexedValueTy,
		ConcreteOpTy>::emitScalarImplementation(allIvValues,
		linalgOp);
		});
		}
		};

		/// Generates loops nest using loop.parallel. loop.parallel is only used for the
		/// outer parallel loops. All other loops are generated using loop.for
		/// operation.
		template <typename ConcreteOpTy>
		class GenerateLoopNest<loop::ParallelOp, ConcreteOpTy> {
		public:
		using IndexedValueTy = StdIndexedValue;

		static void doit(ConcreteOpTy linalgOp, ArrayRef<Value> loopRanges,
		MutableArrayRef<ValueHandle> allIvs) {
		// Only generate loop.parallel for outer consecutive "parallel"
		// iterator_types.
		// TODO(ravishankarm): Generate loop.parallel for all "parallel" iterator
		// types, not just the outer most ones. Also handle "reduction" iterator
		// types.
		auto nPar = linalgOp.getNumParallelLoops();
		auto nRed = linalgOp.getNumReductionLoops();
		auto nWin = linalgOp.getNumWindowLoops();
		auto nLoops = nPar + nRed + nWin;
		auto nOuterPar = linalgOp.iterator_types()
		.getValue()
		.take_while([](Attribute attr) {
		return attr.cast<StringAttr>().getValue() ==
		getParallelIteratorTypeName();
		})
		.size();
		// If there are no outer parallel loops, then number of loop ops is same as
		// the number of loops, and they are all loop.for ops.
		auto nLoopOps = (nOuterPar ? nLoops - nOuterPar + 1 : nLoops);
		SmallVector<ValueHandle *, 4> allPIvs =
		makeHandlePointers(MutableArrayRef<ValueHandle>(allIvs));

		SmallVector<OperationHandle, 4> allLoops(nLoopOps, OperationHandle());
		SmallVector<OperationHandle *, 4> allPLoops;
		allPLoops.reserve(allLoops.size());
		for (OperationHandle &loop : allLoops)
		allPLoops.push_back(&loop);

		ArrayRef<ValueHandle *> allPIvsRef(allPIvs);
		ArrayRef<OperationHandle *> allPLoopsRef(allPLoops);

		if (nOuterPar) {
		GenericLoopNestRangeBuilder<loop::ParallelOp>(
		allPIvsRef.take_front(nOuterPar),
		loopRanges.take_front(nOuterPar))([&] {
		GenericLoopNestRangeBuilder<loop::ForOp>(
		allPIvsRef.drop_front(nOuterPar),
		loopRanges.drop_front(nOuterPar))([&] {
		SmallVector<Value, 4> allIvValues(allIvs.begin(), allIvs.end());
		LinalgScopedEmitter<StdIndexedValue, ConcreteOpTy>::
		emitScalarImplementation(allIvValues, linalgOp);
		});
		});
		} else {
		// If there are no parallel loops then fallback to generating all loop.for
		// operations.
		GenericLoopNestRangeBuilder<loop::ForOp>(allPIvsRef, loopRanges)([&] {
		SmallVector<Value, 4> allIvValues(allIvs.begin(), allIvs.end());
		LinalgScopedEmitter<StdIndexedValue,
		ConcreteOpTy>::emitScalarImplementation(allIvValues,
		linalgOp);
		});
}		}
template <>
bool loweringIsAllowed<loop::ParallelOp>(int numParallelLoops, int numLoops) {
return numParallelLoops == numLoops;
}		}
		};
		} // namespace

		template <typename LoopTy, typename ConcreteOpTy>
		Optional<LinalgLoops>
		LinalgOpToLoopsImpl<LoopTy, ConcreteOpTy>::doit(Operation *op,
		PatternRewriter &rewriter) {
		using Impl = GenerateLoopNest<LoopTy, ConcreteOpTy>;
		using IndexedValueTy =
		typename GenerateLoopNest<LoopTy, ConcreteOpTy>::IndexedValueTy;

template <typename LoopTy, typename IndexedValueTy, typename ConcreteOpTy>		ScopedContext scope(rewriter, op->getLoc());
LogicalResult LinalgOpToLoopsImpl<LoopTy, IndexedValueTy, ConcreteOpTy>::doit(
Operation *op, PatternRewriter &rewriter) {
OpBuilder b(op);
ScopedContext scope(b, op->getLoc());

// The flattened loopToOperandRangesMaps is expected to be an invertible		// The flattened loopToOperandRangesMaps is expected to be an invertible
// permutation map (which is asserted in the inverse calculation).		// permutation map (which is asserted in the inverse calculation).
auto linalgOp = cast<ConcreteOpTy>(op);		auto linalgOp = cast<ConcreteOpTy>(op);
assert(linalgOp.hasBufferSemantics() &&		assert(linalgOp.hasBufferSemantics() &&
"expected linalg op with buffer semantics");		"expected linalg op with buffer semantics");
auto nPar = linalgOp.getNumParallelLoops();		auto nPar = linalgOp.getNumParallelLoops();
auto nRed = linalgOp.getNumReductionLoops();		auto nRed = linalgOp.getNumReductionLoops();
auto nWin = linalgOp.getNumWindowLoops();		auto nWin = linalgOp.getNumWindowLoops();
auto nLoops = nPar + nRed + nWin;		auto nLoops = nPar + nRed + nWin;
if (!loweringIsAllowed<LoopTy>(nPar, nLoops))
return failure();
auto mapsRange =		auto mapsRange =
linalgOp.indexing_maps().template getAsRange<AffineMapAttr>();		linalgOp.indexing_maps().template getAsRange<AffineMapAttr>();
auto maps =		auto maps =
functional::map([](AffineMapAttr a) { return a.getValue(); }, mapsRange);		functional::map([](AffineMapAttr a) { return a.getValue(); }, mapsRange);
auto invertedMap = inversePermutation(concatAffineMaps(maps));		auto invertedMap = inversePermutation(concatAffineMaps(maps));
if (!invertedMap) {		if (!invertedMap) {
LinalgScopedEmitter<IndexedValueTy, ConcreteOpTy>::emitScalarImplementation(		LinalgScopedEmitter<IndexedValueTy, ConcreteOpTy>::emitScalarImplementation(
{}, linalgOp);		{}, linalgOp);
return success();		return LinalgLoops();
}		}

SmallVector<ValueHandle, 4> allIvs(nLoops, ValueHandle(b.getIndexType()));		SmallVector<ValueHandle, 4> allIvs(nLoops,
SmallVector<ValueHandle *, 4> allPIvs =		ValueHandle(rewriter.getIndexType()));
makeHandlePointers(MutableArrayRef<ValueHandle>(allIvs));		auto loopRanges =
auto loopRanges = emitLoopRanges(scope.getBuilder(), scope.getLocation(),		emitLoopRanges(scope.getBuilder(), scope.getLocation(), invertedMap,
invertedMap, getViewSizes(b, linalgOp));		getViewSizes(rewriter, linalgOp));
assert(loopRanges.size() == allIvs.size());		assert(loopRanges.size() == allIvs.size());
		Impl::doit(linalgOp, loopRanges, allIvs);
GenericLoopNestRangeBuilder<LoopTy>(allPIvs, loopRanges)([&] {		// Number of loop ops might be different from the number of ivs since some
SmallVector<Value, 4> allIvValues(allIvs.begin(), allIvs.end());		// loops like affine.parallel and loop.parallel have multiple ivs.
LinalgScopedEmitter<IndexedValueTy, ConcreteOpTy>::emitScalarImplementation(		llvm::SetVector<Operation *> loopSet;
allIvValues, linalgOp);		for (ValueHandle &iv : allIvs) {
});		if (!iv.hasValue())
return success();		return {};
		// The induction variable is a block argument of the entry block of the
		// loop operation.
		BlockArgument ivVal = iv.getValue().dyn_cast<BlockArgument>();
		if (!ivVal)
		return {};
		loopSet.insert(ivVal.getOwner()->getParentOp());
		}
		LinalgLoops loops(loopSet.begin(), loopSet.end());
		return loops;
}		}

template <typename LoopType, typename IndexedValueType, typename ConcreteOp>		template <typename LoopType, typename ConcreteOp>
class LinalgRewritePattern : public RewritePattern {		class LinalgRewritePattern : public RewritePattern {
public:		public:
explicit LinalgRewritePattern(MLIRContext *context)		explicit LinalgRewritePattern(MLIRContext *context)
: RewritePattern(ConcreteOp::getOperationName(), 1, context) {}		: RewritePattern(ConcreteOp::getOperationName(), 1, context) {}

LogicalResult matchAndRewrite(Operation *op,		LogicalResult matchAndRewrite(Operation *op,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
using Impl = LinalgOpToLoopsImpl<LoopType, IndexedValueType, ConcreteOp>;		using Impl = LinalgOpToLoopsImpl<LoopType, ConcreteOp>;
if (failed(Impl::doit(op, rewriter)))		if (!Impl::doit(op, rewriter))
return failure();		return failure();
rewriter.eraseOp(op);		rewriter.eraseOp(op);
return success();		return success();
}		}
};		};

// Helper classes for type list expansion.		// Helper classes for type list expansion.
template <typename LoopType, typename IndexedValueType, typename... LinalgOps>		template <typename LoopType, typename... LinalgOps>
class RewritePatternList;		class RewritePatternList;

template <typename LoopType, typename IndexedValueType>		template <typename LoopType>
class RewritePatternList<LoopType, IndexedValueType> {		class RewritePatternList<LoopType> {
public:		public:
static void build(OwningRewritePatternList &patterns, MLIRContext *ctx) {}		static void build(OwningRewritePatternList &patterns, MLIRContext *ctx) {}
};		};

template <typename LoopType, typename IndexedValueType, typename ConcreteOp,		template <typename LoopType, typename ConcreteOp, typename... LinalgOps>
typename... LinalgOps>		class RewritePatternList<LoopType, ConcreteOp, LinalgOps...> {
class RewritePatternList<LoopType, IndexedValueType, ConcreteOp, LinalgOps...> {
public:		public:
static void build(OwningRewritePatternList &patterns, MLIRContext *ctx) {		static void build(OwningRewritePatternList &patterns, MLIRContext *ctx) {
patterns		patterns.insert<LinalgRewritePattern<LoopType, ConcreteOp>>(ctx);
.insert<LinalgRewritePattern<LoopType, IndexedValueType, ConcreteOp>>(		RewritePatternList<LoopType, LinalgOps...>::build(patterns, ctx);
ctx);
RewritePatternList<LoopType, IndexedValueType, LinalgOps...>::build(
patterns, ctx);
}		}
};		};

/// Populate the given list with patterns that convert from Linalg to LLVM.		/// Populate the given list with patterns that convert from Linalg to LLVM.
template <typename LoopType, typename IndexedValueType>		template <typename LoopType>
void FillRewritePatterns(OwningRewritePatternList &patterns, MLIRContext *ctx) {		void FillRewritePatterns(OwningRewritePatternList &patterns, MLIRContext *ctx) {
RewritePatternList<LoopType, IndexedValueType,		RewritePatternList<LoopType,
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.cpp.inc"		#include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.cpp.inc"
>::build(patterns, ctx);		>::build(patterns, ctx);
}		}

// Local folding pattern for AffineApplyOp that we can apply greedily.		// Local folding pattern for AffineApplyOp that we can apply greedily.
// This replaces AffineApplyOp by the proper value in cases where the associated		// This replaces AffineApplyOp by the proper value in cases where the associated
// map is trivial. A trivial map here is defined as a map with a single result		// map is trivial. A trivial map here is defined as a map with a single result
[27 unchanged lines not shown]
if (expr.dyn_cast<AffineDimExpr>() || expr.dyn_cast<AffineSymbolExpr>()) {		if (expr.dyn_cast<AffineDimExpr>() || expr.dyn_cast<AffineSymbolExpr>()) {
rewriter.replaceOp(op, op->getOperand(0));		rewriter.replaceOp(op, op->getOperand(0));
return success();		return success();
}		}
return failure();		return failure();
}		}
};		};
} // namespace		} // namespace

template <typename LoopType, typename IndexedValueType>		template <typename LoopType>
static void lowerLinalgToLoopsImpl(Operation *op, MLIRContext *context) {		static void lowerLinalgToLoopsImpl(Operation *op, MLIRContext *context) {
OwningRewritePatternList patterns;		OwningRewritePatternList patterns;
// Canonicalization and folding patterns applied greedily allow cleaning up		// Canonicalization and folding patterns applied greedily allow cleaning up
// the emitted IR on the fly.		// the emitted IR on the fly.
// TODO(ntv) fold view and subview ops?		// TODO(ntv) fold view and subview ops?
FillRewritePatterns<LoopType, IndexedValueType>(patterns, context);		FillRewritePatterns<LoopType>(patterns, context);
DimOp::getCanonicalizationPatterns(patterns, context);		DimOp::getCanonicalizationPatterns(patterns, context);
AffineApplyOp::getCanonicalizationPatterns(patterns, context);		AffineApplyOp::getCanonicalizationPatterns(patterns, context);
patterns.insert<FoldAffineOp>(context);		patterns.insert<FoldAffineOp>(context);
// Just apply the patterns greedily.		// Just apply the patterns greedily.
applyPatternsAndFoldGreedily(op, patterns);		applyPatternsAndFoldGreedily(op, patterns);
}		}

namespace {		namespace {
struct LowerToAffineLoops		struct LowerToAffineLoops
: public LinalgLowerToAffineLoopsBase<LowerToAffineLoops> {		: public LinalgLowerToAffineLoopsBase<LowerToAffineLoops> {
void runOnFunction() override {		void runOnFunction() override {
lowerLinalgToLoopsImpl<AffineForOp, AffineIndexedValue>(getFunction(),		lowerLinalgToLoopsImpl<AffineForOp>(getFunction(), &getContext());
&getContext());
}		}
};		};
struct LowerToLoops : public LinalgLowerToLoopsBase<LowerToLoops> {		struct LowerToLoops : public LinalgLowerToLoopsBase<LowerToLoops> {
void runOnFunction() override {		void runOnFunction() override {
lowerLinalgToLoopsImpl<loop::ForOp, StdIndexedValue>(getFunction(),		lowerLinalgToLoopsImpl<loop::ForOp>(getFunction(), &getContext());
&getContext());
}		}
};		};
struct LowerToParallelLoops		struct LowerToParallelLoops
: public LinalgLowerToParallelLoopsBase<LowerToParallelLoops> {		: public LinalgLowerToParallelLoopsBase<LowerToParallelLoops> {
void runOnFunction() override {		void runOnFunction() override {
lowerLinalgToLoopsImpl<loop::ParallelOp, StdIndexedValue>(getFunction(),		lowerLinalgToLoopsImpl<loop::ParallelOp>(getFunction(), &getContext());
&getContext());
}		}
};		};
} // namespace		} // namespace

std::unique_ptr<OperationPass<FuncOp>> mlir::createConvertLinalgToLoopsPass() {		std::unique_ptr<OperationPass<FuncOp>> mlir::createConvertLinalgToLoopsPass() {
return std::make_unique<LowerToLoops>();		return std::make_unique<LowerToLoops>();
}		}

std::unique_ptr<OperationPass<FuncOp>>		std::unique_ptr<OperationPass<FuncOp>>
mlir::createConvertLinalgToParallelLoopsPass() {		mlir::createConvertLinalgToParallelLoopsPass() {
return std::make_unique<LowerToParallelLoops>();		return std::make_unique<LowerToParallelLoops>();
}		}

std::unique_ptr<OperationPass<FuncOp>>		std::unique_ptr<OperationPass<FuncOp>>
mlir::createConvertLinalgToAffineLoopsPass() {		mlir::createConvertLinalgToAffineLoopsPass() {
return std::make_unique<LowerToAffineLoops>();		return std::make_unique<LowerToAffineLoops>();
}		}

		/// Emits a loop nest with the proper body for `op`.
		template <typename LoopTy, typename ConcreteOp>
		Optional<LinalgLoops>
		mlir::linalg::linalgLowerOpToLoops(PatternRewriter &rewriter, Operation *op) {
		return LinalgOpToLoopsImpl<LoopTy, ConcreteOp>::doit(op, rewriter);
		}

/// Emits a loop nest of `loop.for` with the proper body for `op`.		/// Emits a loop nest of `loop.for` with the proper body for `op`.
template <typename ConcreteOp>		template <typename ConcreteOp>
LogicalResult mlir::linalg::linalgOpToLoops(PatternRewriter &rewriter,		LogicalResult mlir::linalg::linalgOpToLoops(PatternRewriter &rewriter,
Operation *op) {		Operation *op) {
return LinalgOpToLoopsImpl<loop::ForOp, StdIndexedValue, ConcreteOp>::doit(		Optional<LinalgLoops> loops =
op, rewriter);		linalgLowerOpToLoops<loop::ForOp, ConcreteOp>(rewriter, op);
		return loops ? success() : failure();
}		}

/// Emits a loop nest of `affine.for` with the proper body for `op`.		/// Emits a loop nest of `affine.for` with the proper body for `op`.
template <typename ConcreteOp>		template <typename ConcreteOp>
LogicalResult mlir::linalg::linalgOpToAffineLoops(PatternRewriter &rewriter,		LogicalResult mlir::linalg::linalgOpToAffineLoops(PatternRewriter &rewriter,
Operation *op) {		Operation *op) {
return LinalgOpToLoopsImpl<AffineForOp, AffineIndexedValue, ConcreteOp>::doit(		Optional<LinalgLoops> loops =
op, rewriter);		linalgLowerOpToLoops<AffineForOp, ConcreteOp>(rewriter, op);
		return loops ? success() : failure();
}		}

/// Emits a loop nest of `loop.parallel` with the proper body for `op`.		/// Emits a loop nest of `loop.parallel` with the proper body for `op`.
template <typename ConcreteOp>		template <typename ConcreteOp>
LogicalResult mlir::linalg::linalgOpToParallelLoops(PatternRewriter &rewriter,		LogicalResult mlir::linalg::linalgOpToParallelLoops(PatternRewriter &rewriter,
Operation *op) {		Operation *op) {
return LinalgOpToLoopsImpl<loop::ParallelOp, StdIndexedValue,		Optional<LinalgLoops> loops =
ConcreteOp>::doit(op, rewriter);		linalgLowerOpToLoops<loop::ParallelOp, ConcreteOp>(rewriter, op);
		return loops ? success() : failure();
}		}

// TODO(ntv) Need to make these instantiations more future-proof to avoid the		// TODO(ntv) Need to make these instantiations more future-proof to avoid the
// need to update as soon as we add new ops.		// need to update as soon as we add new ops.
#define INSTANTIATE_LINALG_OP_TO_LOOPS(OP_TYPE) \		#define INSTANTIATE_LINALG_OP_TO_LOOPS(OP_TYPE) \
template LogicalResult mlir::linalg::linalgOpToLoops<OP_TYPE>( \		template LogicalResult mlir::linalg::linalgOpToLoops<OP_TYPE>( \
PatternRewriter & rewriter, Operation * op); \		PatternRewriter & rewriter, Operation * op); \
template LogicalResult mlir::linalg::linalgOpToAffineLoops<OP_TYPE>( \		template LogicalResult mlir::linalg::linalgOpToAffineLoops<OP_TYPE>( \
		PatternRewriter & rewriter, Operation * op); \
		template LogicalResult mlir::linalg::linalgOpToParallelLoops<OP_TYPE>( \
		PatternRewriter & rewriter, Operation * op); \
		template Optional<LinalgLoops> \
		mlir::linalg::linalgLowerOpToLoops<loop::ParallelOp, OP_TYPE>( \
PatternRewriter & rewriter, Operation * op);		PatternRewriter & rewriter, Operation * op);

INSTANTIATE_LINALG_OP_TO_LOOPS(CopyOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(CopyOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(FillOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(FillOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(DotOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(DotOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(MatvecOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(MatvecOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(MatmulOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(MatmulOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(ConvOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(ConvOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingMaxOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingMaxOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingMinOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingMinOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingSumOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(PoolingSumOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(GenericOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(GenericOp)
INSTANTIATE_LINALG_OP_TO_LOOPS(IndexedGenericOp)		INSTANTIATE_LINALG_OP_TO_LOOPS(IndexedGenericOp)

// TODO(pifon): Enable lowering to parallel loops for ops other than
// linalg.generic for now to be on the safe side.
template LogicalResult
mlir::linalg::linalgOpToParallelLoops<GenericOp>(PatternRewriter &rewriter,
Operation *op);
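
The loop-count comment in `GenerateLoopNest` and the `loopSet` logic in `doit` can be illustrated without any MLIR dependency. The following is only a sketch under stated assumptions: `countLoopOps` and `uniqueLoopOwners` are hypothetical helper names that do not appear in the patch, and strings stand in for the `Operation *` owners of the induction variables.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of loop ops created for a nest of `nLoops`
// loops whose first `nOuterPar` loops are parallel. The outer parallel loops
// share one multi-iv parallel op; every remaining loop gets its own op.
std::size_t countLoopOps(std::size_t nLoops, std::size_t nOuterPar) {
  assert(nOuterPar <= nLoops && "more outer parallel loops than loops");
  return nOuterPar ? nLoops - nOuterPar + 1 : nLoops;
}

// Hypothetical helper: given the owner of each induction variable, return the
// distinct owners in first-seen order (the behaviour a SetVector provides).
std::vector<std::string>
uniqueLoopOwners(const std::vector<std::string> &ivOwners) {
  std::vector<std::string> loops;
  for (const std::string &owner : ivOwners)
    if (std::find(loops.begin(), loops.end(), owner) == loops.end())
      loops.push_back(owner);
  return loops;
}
```

Under these assumptions, a matmul-like nest (three loops, two outer parallel) gives `countLoopOps(3, 2) == 2`, which matches the one `loop.parallel` plus one `loop.for` expected by the CHECKPARALLEL lines for `@matmul` in the test file.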

mlir/test/Dialect/Linalg/loops.mlir

// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck %s		// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck --check-prefix=CHECKLOOP %s
		// RUN: mlir-opt %s -convert-linalg-to-parallel-loops | FileCheck --check-prefix=CHECKPARALLEL %s

// Test that we can lower all the way to LLVM without crashing, don't check results here.		// Test that we can lower all the way to LLVM without crashing, don't check results here.
// RUN: mlir-opt %s --convert-linalg-to-llvm -o=/dev/null 2>&1		// RUN: mlir-opt %s --convert-linalg-to-llvm -o=/dev/null 2>&1

// CHECK-DAG: #[[strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>		// CHECKLOOP-DAG: #[[strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
// CHECK-DAG: #[[strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECKLOOP-DAG: #[[strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>		// CHECKLOOP-DAG: #[[strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
// CHECK-DAG: #[[strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>		// CHECKLOOP-DAG: #[[strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
// CHECK-DAG: #[[clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>		// CHECKLOOP-DAG: #[[clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>

// CHECK-DAG: #[[Stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECKLOOP-DAG: #[[Stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
// CHECK-DAG: #[[Stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>		// CHECKLOOP-DAG: #[[Stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>
// CHECK-DAG: #[[Stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>		// CHECKLOOP-DAG: #[[Stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>
// CHECK-DAG: #[[Stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>		// CHECKLOOP-DAG: #[[Stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>

		// CHECKPARALLEL-DAG: #[[strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
		// CHECKPARALLEL-DAG: #[[strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
		// CHECKPARALLEL-DAG: #[[strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
		// CHECKPARALLEL-DAG: #[[strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
		// CHECKPARALLEL-DAG: #[[clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>

		// CHECKPARALLEL-DAG: #[[Stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
		// CHECKPARALLEL-DAG: #[[Stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>
		// CHECKPARALLEL-DAG: #[[Stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>
		// CHECKPARALLEL-DAG: #[[Stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>


func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {		func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {
%c0 = constant 0 : index		%c0 = constant 0 : index
%c1 = constant 1 : index		%c1 = constant 1 : index
%A = view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>		%A = view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>
%B = view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>		%B = view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>
%C = view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>		%C = view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>
linalg.matmul(%A, %B, %C) : memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?xf32, offset: ?, strides: [?, 1]>		linalg.matmul(%A, %B, %C) : memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?xf32, offset: ?, strides: [?, 1]>
return		return
}		}
// CHECK-LABEL: func @matmul(%{{.*}}: memref<?xi8>,		// CHECKLOOP-LABEL: func @matmul(%{{.*}}: memref<?xi8>,
// CHECK-SAME: [[M:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[M:arg[0-9]+]]: index
// CHECK-SAME: [[N:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[N:arg[0-9]+]]: index
// CHECK-SAME: [[K:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[K:arg[0-9]+]]: index
// CHECK: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[M]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[M]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[N]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[N]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
// CHECK-DAG: %[[b:.*]] = load %[[B]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP-DAG: %[[b:.*]] = load %[[B]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
// CHECK-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK-DAG: %[[c:.*]] = load %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP-DAG: %[[c:.*]] = load %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
// CHECK-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32		// CHECKLOOP-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
// CHECK: store %[[res]], %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: store %[[res]], %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>

		// CHECKPARALLEL-LABEL: func @matmul(%{{.*}}: memref<?xi8>,
		// CHECKPARALLEL-SAME: [[M:arg[0-9]+]]: index
		// CHECKPARALLEL-SAME: [[N:arg[0-9]+]]: index
		// CHECKPARALLEL-SAME: [[K:arg[0-9]+]]: index
		// CHECKPARALLEL: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}) to (%[[M]], %[[N]]) step (%{{.*}}, %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
		// CHECKPARALLEL-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL-DAG: %[[b:.*]] = load %[[B]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL-DAG: %[[c:.*]] = load %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
		// CHECKPARALLEL: store %[[res]], %[[C]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>



func @matvec(%arg0: memref<?xi8>, %M: index, %N: index) {		func @matvec(%arg0: memref<?xi8>, %M: index, %N: index) {
%c0 = constant 0 : index		%c0 = constant 0 : index
%c1 = constant 1 : index		%c1 = constant 1 : index
%2 = view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>		%2 = view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32, offset: ?, strides: [?, 1]>
%3 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>		%3 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>
%4 = view %arg0[%c0][%N] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>		%4 = view %arg0[%c0][%N] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>
linalg.matvec(%2, %3, %4) : memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>		linalg.matvec(%2, %3, %4) : memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>
return		return
}		}
// CHECK-LABEL: func @matvec(%{{.*}}: memref<?xi8>,		// CHECKLOOP-LABEL: func @matvec(%{{.*}}: memref<?xi8>,
// CHECK-SAME: [[M:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[M:arg[0-9]+]]: index
// CHECK-SAME: [[K:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[K:arg[0-9]+]]: index
// CHECK: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
// CHECK: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[M]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[M]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
// CHECK-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK-DAG: %[[c:.*]] = load %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[c:.*]] = load %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32		// CHECKLOOP-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
// CHECK: store %[[res]], %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP: store %[[res]], %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>

		// CHECKPARALLEL-LABEL: func @matvec(%{{.*}}: memref<?xi8>,
		// CHECKPARALLEL-SAME: [[M:arg[0-9]+]]: index
		// CHECKPARALLEL-SAME: [[K:arg[0-9]+]]: index
		// CHECKPARALLEL: %[[A:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[B:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: %[[C:.*]] = std.view %{{.*}}[{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: loop.parallel (%{{.*}}) = (%{{.*}}) to (%[[M]]) step (%{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
		// CHECKPARALLEL-DAG: %[[a:.*]] = load %[[A]][%{{.*}}, %{{.*}}] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL-DAG: %[[c:.*]] = load %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
		// CHECKPARALLEL: store %[[res]], %[[C]][%{{.*}}] : memref<?xf32, #[[strided1D]]>


func @dot(%arg0: memref<?xi8>, %M: index) {		func @dot(%arg0: memref<?xi8>, %M: index) {
%c0 = constant 0 : index		%c0 = constant 0 : index
%c1 = constant 1 : index		%c1 = constant 1 : index
%1 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>		%1 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>
%2 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>		%2 = view %arg0[%c0][%M] : memref<?xi8> to memref<?xf32, offset: ?, strides: [1]>
%3 = view %arg0[][] : memref<?xi8> to memref<f32>		%3 = view %arg0[][] : memref<?xi8> to memref<f32>
linalg.dot(%1, %2, %3) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>, memref<f32>		linalg.dot(%1, %2, %3) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>, memref<f32>
return		return
}		}
// CHECK-LABEL: func @dot(%{{.*}}: memref<?xi8>,		// CHECKLOOP-LABEL: func @dot(%{{.*}}: memref<?xi8>,
// CHECK-SAME: [[K:arg[0-9]+]]: index		// CHECKLOOP-SAME: [[K:arg[0-9]+]]: index
// CHECK: %[[A:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[A:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
// CHECK: %[[B:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[B:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
// CHECK: %[[C:.*]] = std.view %{{.*}}[][] : memref<?xi8> to memref<f32>		// CHECKLOOP: %[[C:.*]] = std.view %{{.*}}[][] : memref<?xi8> to memref<f32>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
		bondhugula: Is there a need to match all of the trailing 'step %{{.*}}'? You always print step, right?
		mravishankar: Probably not. I didn't change what was already there, just changed the check-prefix. I would rather keep it as is.
// CHECK-DAG: %[[a:.*]] = load %[[A]][%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[a:.*]] = load %[[A]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK-DAG: %[[c:.*]] = load %[[C]][] : memref<f32>		// CHECKLOOP-DAG: %[[c:.*]] = load %[[C]][] : memref<f32>
// CHECK-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32		// CHECKLOOP-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
// CHECK: store %[[res]], %[[C]][] : memref<f32>		// CHECKLOOP: store %[[res]], %[[C]][] : memref<f32>

		// CHECKPARALLEL-LABEL: func @dot(%{{.*}}: memref<?xi8>,
		// CHECKPARALLEL-SAME: [[K:arg[0-9]+]]: index
		// CHECKPARALLEL: %[[A:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: %[[B:.*]] = std.view %{{.*}}[{{.*}}][{{.*}}] : memref<?xi8> to memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: %[[C:.*]] = std.view %{{.*}}[][] : memref<?xi8> to memref<f32>
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
		// CHECKPARALLEL-DAG: %[[a:.*]] = load %[[A]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[b:.*]] = load %[[B]][%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL-DAG: %[[c:.*]] = load %[[C]][] : memref<f32>
		// CHECKPARALLEL-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
		// CHECKPARALLEL: store %[[res]], %[[C]][] : memref<f32>


func @dot_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {		func @dot_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {
linalg.dot(%arg0, %arg1, %arg2) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>, memref<f32>		linalg.dot(%arg0, %arg1, %arg2) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>, memref<f32>
return		return
}		}
// CHECK-LABEL: func @dot_view(		// CHECKLOOP-LABEL: func @dot_view(
// CHECK: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<f32>) {		// CHECKLOOP: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<f32>) {
// CHECK: %[[K:.*]] = dim %arg0, 0 : memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[K:.*]] = dim %arg0, 0 : memref<?xf32, #[[strided1D]]>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK-DAG: %[[a:.*]] = load %arg0[%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[a:.*]] = load %arg0[%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[b:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP-DAG: %[[b:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK-DAG: %[[c:.*]] = load %{{.*}}[] : memref<f32>		// CHECKLOOP-DAG: %[[c:.*]] = load %{{.*}}[] : memref<f32>
// CHECK-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32		// CHECKLOOP-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
// CHECK: store %[[res]], %{{.*}}[] : memref<f32>		// CHECKLOOP: store %[[res]], %{{.*}}[] : memref<f32>

		// CHECKPARALLEL-LABEL: func @dot_view(
		// CHECKPARALLEL: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<f32>) {
		// CHECKPARALLEL: %[[K:.*]] = dim %arg0, 0 : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
		// CHECKPARALLEL-DAG: %[[a:.*]] = load %arg0[%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[b:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL-DAG: %[[inc:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL-DAG: %[[c:.*]] = load %{{.*}}[] : memref<f32>
		// CHECKPARALLEL-DAG: %[[res:.*]] = addf %[[c]], %[[inc]] : f32
		// CHECKPARALLEL: store %[[res]], %{{.*}}[] : memref<f32>

func @fill_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: f32) {		func @fill_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: f32) {
linalg.fill(%arg0, %arg1) : memref<?xf32, offset: ?, strides: [1]>, f32		linalg.fill(%arg0, %arg1) : memref<?xf32, offset: ?, strides: [1]>, f32
return		return
}		}
// CHECK-LABEL: func @fill_view(		// CHECKLOOP-LABEL: func @fill_view(
// CHECK: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: f32) {		// CHECKLOOP: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: f32) {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>

		// CHECKPARALLEL-LABEL: func @fill_view(
		// CHECKPARALLEL: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: f32) {
		// CHECKPARALLEL: loop.parallel (%{{.*}}) = (%{{.*}}) to (%{{.*}}) step (%{{.*}}) {
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>

func @fill_view0(%arg0: memref<f32>, %arg1: f32) {		func @fill_view0(%arg0: memref<f32>, %arg1: f32) {
linalg.fill(%arg0, %arg1) : memref<f32>, f32		linalg.fill(%arg0, %arg1) : memref<f32>, f32
return		return
}		}
// CHECK-LABEL: func @fill_view0(%{{.*}}: memref<f32>, %{{.*}}: f32) {		// CHECKLOOP-LABEL: func @fill_view0(%{{.*}}: memref<f32>, %{{.*}}: f32) {
// CHECK: store %{{.*}}, %{{.*}}[] : memref<f32>		// CHECKLOOP: store %{{.*}}, %{{.*}}[] : memref<f32>

		// CHECKPARALLEL-LABEL: func @fill_view0(%{{.*}}: memref<f32>, %{{.*}}: f32) {
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[] : memref<f32>

func @fill_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: f32) {		func @fill_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: f32) {
linalg.fill(%arg0, %arg1) : memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, f32		linalg.fill(%arg0, %arg1) : memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, f32
return		return
}		}
// CHECK-LABEL: func @fill_view3(		// CHECKLOOP-LABEL: func @fill_view3(
// CHECK: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: f32) {		// CHECKLOOP: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: f32) {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: func @fill_view3(
		// CHECKPARALLEL: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: f32) {
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%{{.*}}, %{{.*}}, %{{.*}}) step (%{{.*}}, %{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

func @copy_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>) {		func @copy_view(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>) {
linalg.copy(%arg0, %arg1) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>		linalg.copy(%arg0, %arg1) : memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>
return		return
}		}
// CHECK-LABEL: func @copy_view(		// CHECKLOOP-LABEL: func @copy_view(
// CHECK: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>) {		// CHECKLOOP: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>) {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: %[[L:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP: %[[L:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>
// CHECK: store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>		// CHECKLOOP: store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>

		// CHECKPARALLEL-LABEL: func @copy_view(
		// CHECKPARALLEL: %{{.*}}: memref<?xf32, #[[strided1D]]>, %{{.*}}: memref<?xf32, #[[strided1D]]>) {
		// CHECKPARALLEL: loop.parallel (%{{.*}}) = (%{{.*}}) to (%{{.*}}) step (%{{.*}}) {
		// CHECKPARALLEL: %[[L:.*]] = load %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>
		// CHECKPARALLEL: store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, #[[strided1D]]>

func @copy_view0(%arg0: memref<f32>, %arg1: memref<f32>) {		func @copy_view0(%arg0: memref<f32>, %arg1: memref<f32>) {
linalg.copy(%arg0, %arg1) : memref<f32>, memref<f32>		linalg.copy(%arg0, %arg1) : memref<f32>, memref<f32>
return		return
}		}
// CHECK-LABEL: func @copy_view0(%{{.*}}: memref<f32>, %{{.*}}: memref<f32>) {		// CHECKLOOP-LABEL: func @copy_view0(%{{.*}}: memref<f32>, %{{.*}}: memref<f32>) {
// CHECK: %{{.*}} = load %{{.*}}[] : memref<f32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[] : memref<f32>
// CHECK: store %{{.*}}, %{{.*}}[] : memref<f32>		// CHECKLOOP: store %{{.*}}, %{{.*}}[] : memref<f32>

		// CHECKPARALLEL-LABEL: func @copy_view0(%{{.*}}: memref<f32>, %{{.*}}: memref<f32>) {
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[] : memref<f32>
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[] : memref<f32>

func @copy_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {		func @copy_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {
linalg.copy(%arg0, %arg1) {inputPermutation = affine_map<(i, j, k) -> (i, k, j)>,		linalg.copy(%arg0, %arg1) {inputPermutation = affine_map<(i, j, k) -> (i, k, j)>,
outputPermutation = affine_map<(i, j, k) -> (k, j, i)>} :		outputPermutation = affine_map<(i, j, k) -> (k, j, i)>} :
memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}
// CHECK-LABEL: func @copy_view3		// CHECKLOOP-LABEL: func @copy_view3
// CHECK: (%{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {		// CHECKLOOP: (%{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
// CHECK: %[[L:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[L:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: store %[[L]], %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[L]], %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: func @copy_view3
		// CHECKPARALLEL: (%{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%{{.*}}, %{{.*}}, %{{.*}}) step (%{{.*}}, %{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: %[[L:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: store %[[L]], %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

func @conv_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {		func @conv_view3(%arg0: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {
linalg.conv(%arg0, %arg1, %arg2) {strides = [2]}: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		linalg.conv(%arg0, %arg1, %arg2) {strides = [2]}: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}
// CHECK-LABEL: func @conv_view3(		// CHECKLOOP-LABEL: func @conv_view3(
// CHECK: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {		// CHECKLOOP: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {
// CHECK: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[Q:.*]] = dim %arg0, 1 : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[Q:.*]] = dim %arg0, 1 : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[K:.*]] = dim %arg0, 2 : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[K:.*]] = dim %arg0, 2 : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
// CHECK: %[[SUM:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[SUM:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM]], %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM]], %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %{{.*}} = addf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
// CHECK: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: func @conv_view3(
		// CHECKPARALLEL: %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>, %{{.*}}: memref<?x?x?xf32, #[[strided3D]]>) {
		// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, 1 : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[K:.*]] = dim %arg0, 2 : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
		// CHECKPARALLEL: %[[SUM:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM]], %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?xf32, #[[strided3D]]>

func @conv_view4(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {		func @conv_view4(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {
linalg.conv(%arg0, %arg1, %arg2) {dilations = [4, 5], strides = [2, 3]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>		linalg.conv(%arg0, %arg1, %arg2) {dilations = [4, 5], strides = [2, 3]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>
return		return
}		}
// CHECK-LABEL: func @conv_view4(		// CHECKLOOP-LABEL: func @conv_view4(
// CHECK: %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>) {		// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>) {
// CHECK: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
// CHECK: %[[SUM0:.*]] = affine.apply #[[Stride2Dilation4]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[SUM0:.*]] = affine.apply #[[Stride2Dilation4]](%{{.*}}, %{{.*}})
// CHECK: %[[SUM1:.*]] = affine.apply #[[Stride3Dilation5]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[SUM1:.*]] = affine.apply #[[Stride3Dilation5]](%{{.*}}, %{{.*}})
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
// CHECK: %{{.*}} = addf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
// CHECK: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>		// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>

		// CHECKPARALLEL-LABEL: func @conv_view4(
		// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>, %{{.*}}: memref<?x?x?x?xf32, #[[strided4D]]>) {
		// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
		// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #[[Stride2Dilation4]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #[[Stride3Dilation5]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[SUM0]], %[[SUM1]], %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>
		// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32, #[[strided4D]]>


func @conv_padding(%arg0: memref<?x?x?x?xf32>,		func @conv_padding(%arg0: memref<?x?x?x?xf32>,
%arg1: memref<?x?x?x?xf32>,		%arg1: memref<?x?x?x?xf32>,
%arg2: memref<?x?x?x?xf32>) {		%arg2: memref<?x?x?x?xf32>) {
linalg.conv(%arg0, %arg1, %arg2) {dilations = [1, 1],		linalg.conv(%arg0, %arg1, %arg2) {dilations = [1, 1],
padding = dense<[[0, 1], [1, 1]]> : tensor<2x2xi64>,		padding = dense<[[0, 1], [1, 1]]> : tensor<2x2xi64>,
strides = [1, 1]} :		strides = [1, 1]} :
memref<?x?x?x?xf32>, memref<?x?x?x?xf32>, memref<?x?x?x?xf32>		memref<?x?x?x?xf32>, memref<?x?x?x?xf32>, memref<?x?x?x?xf32>
return		return
}		}
// CHECK-LABEL: func @conv_padding		// CHECKLOOP-LABEL: func @conv_padding
// CHECK: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {		// CHECKLOOP: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {
// CHECK: %[[ZERO:.*]] = constant 0.000000e+00 : f32		// CHECKLOOP: %[[ZERO:.*]] = constant 0.000000e+00 : f32
// CHECK: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32>
// CHECK: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32>
// CHECK: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32>
// CHECK: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32>
// CHECK: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32>
// CHECK: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32>
// CHECK: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[B]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[X0]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[X1]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[K]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
// CHECK: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})		// CHECKLOOP: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
// CHECK: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})		// CHECKLOOP: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
// CHECK: %[[IDX:.*]] = affine.max #[[clampMinMap]](%[[SUM0]])		// CHECKLOOP: %[[IDX:.*]] = affine.max #[[clampMinMap]](%[[SUM0]])
// CHECK: %[[IDY:.*]] = affine.max #[[clampMinMap]](%[[SUM1]])		// CHECKLOOP: %[[IDY:.*]] = affine.max #[[clampMinMap]](%[[SUM1]])
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>
// CHECK: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
// CHECK: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
// CHECK: %{{.*}} = addf %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
// CHECK: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>		// CHECKLOOP: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>

		// CHECKPARALLEL-LABEL: func @conv_padding
		// CHECKPARALLEL: %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>, %{{.*}}: memref<?x?x?x?xf32>) {
		// CHECKPARALLEL: %[[ZERO:.*]] = constant 0.000000e+00 : f32
		// CHECKPARALLEL: %[[Z0:.*]] = dim %arg0, 0 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[Z1:.*]] = dim %arg0, 1 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[Q:.*]] = dim %arg0, 2 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[K:.*]] = dim %arg0, 3 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[B:.*]] = dim %arg1, 0 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[X0:.*]] = dim %arg2, 1 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %[[X1:.*]] = dim %arg2, 2 : memref<?x?x?x?xf32>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) to (%[[B]], %[[X0]], %[[X1]], %[[K]]) step (%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Q]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Z0]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[Z1]] step %{{.*}} {
		// CHECKPARALLEL: %[[SUM0:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[SUM1:.*]] = affine.apply #{{.*}}(%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[IDX:.*]] = affine.max #[[clampMinMap]](%[[SUM0]])
		// CHECKPARALLEL: %[[IDY:.*]] = affine.max #[[clampMinMap]](%[[SUM1]])
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %[[IDX]], %[[IDY]], %{{.*}}] : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %{{.*}} = select %{{.*}}, %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %{{.*}} = mulf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>
		// CHECKPARALLEL: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: store %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] : memref<?x?x?x?xf32>

func @pooling_max(%arg0: memref<?x?xf32>,		func @pooling_max(%arg0: memref<?x?xf32>,
%arg1: memref<?x?xi32>,		%arg1: memref<?x?xi32>,
%arg2: memref<?x?xf32>) {		%arg2: memref<?x?xf32>) {
linalg.pooling_max(%arg0, %arg1, %arg2) { strides = [2, 1] }:		linalg.pooling_max(%arg0, %arg1, %arg2) { strides = [2, 1] }:
memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>		memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>
return		return
}		}
// CHECK-LABEL: func @pooling_max		// CHECKLOOP-LABEL: func @pooling_max
// CHECK: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>		// CHECKLOOP: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
// CHECK: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>		// CHECKLOOP: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
// CHECK: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>		// CHECKLOOP: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
// CHECK: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>		// CHECKLOOP: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
// CHECK: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
// CHECK: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
// CHECK: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32
// CHECK: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

		// CHECKPARALLEL-LABEL: func @pooling_max
		// CHECKPARALLEL: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
		// CHECKPARALLEL: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
		// CHECKPARALLEL: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
		// CHECKPARALLEL: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}) to (%[[OX]], %[[OY]]) step (%{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
		// CHECKPARALLEL: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
		// CHECKPARALLEL: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

func @pooling_min(%arg0: memref<?x?xf32>,		func @pooling_min(%arg0: memref<?x?xf32>,
%arg1: memref<?x?xi32>,		%arg1: memref<?x?xi32>,
%arg2: memref<?x?xf32>) {		%arg2: memref<?x?xf32>) {
linalg.pooling_min(%arg0, %arg1, %arg2) { strides = [2, 1] }:		linalg.pooling_min(%arg0, %arg1, %arg2) { strides = [2, 1] }:
memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>		memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>
return		return
}		}
// CHECK-LABEL: func @pooling_min		// CHECKLOOP-LABEL: func @pooling_min
// CHECK: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>		// CHECKLOOP: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
// CHECK: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>		// CHECKLOOP: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
// CHECK: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>		// CHECKLOOP: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
// CHECK: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>		// CHECKLOOP: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
// CHECK: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
// CHECK: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>		// CHECKLOOP: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
// CHECK: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32		// CHECKLOOP: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32
// CHECK: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

		// CHECKPARALLEL-LABEL: func @pooling_min
		// CHECKPARALLEL: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
		// CHECKPARALLEL: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
		// CHECKPARALLEL: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
		// CHECKPARALLEL: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}) to (%[[OX]], %[[OY]]) step (%{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
		// CHECKPARALLEL: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
		// CHECKPARALLEL: %{{.*}} = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
		// CHECKPARALLEL: %[[RES:.*]] = select %{{.*}}, %{{.*}}, %{{.*}} : f32
		// CHECKPARALLEL: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

func @pooling_sum(%arg0: memref<?x?xf32>,		func @pooling_sum(%arg0: memref<?x?xf32>,
%arg1: memref<?x?xi32>,		%arg1: memref<?x?xi32>,
%arg2: memref<?x?xf32>) {		%arg2: memref<?x?xf32>) {
linalg.pooling_sum(%arg0, %arg1, %arg2) { strides = [2, 1] }:		linalg.pooling_sum(%arg0, %arg1, %arg2) { strides = [2, 1] }:
memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>		memref<?x?xf32>, memref<?x?xi32>, memref<?x?xf32>
return		return
}		}
// CHECK-LABEL: func @pooling_sum		// CHECKLOOP-LABEL: func @pooling_sum
// CHECK: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>		// CHECKLOOP: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
// CHECK: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>		// CHECKLOOP: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
// CHECK: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>		// CHECKLOOP: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
// CHECK: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>		// CHECKLOOP: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[OY]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
// CHECK: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {		// CHECKLOOP: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
// CHECK: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})		// CHECKLOOP: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
// CHECK: %[[RHS:.*]] = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>		// CHECKLOOP: %[[RHS:.*]] = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
// CHECK: %[[LHS:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: %[[LHS:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
// CHECK: %[[RES:.*]] = addf %[[LHS]], %[[RHS]] : f32		// CHECKLOOP: %[[RES:.*]] = addf %[[LHS]], %[[RHS]] : f32
// CHECK: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>		// CHECKLOOP: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

		// CHECKPARALLEL-LABEL: func @pooling_sum
		// CHECKPARALLEL: %[[WX:.*]] = dim %arg1, 0 : memref<?x?xi32>
		// CHECKPARALLEL: %[[WY:.*]] = dim %arg1, 1 : memref<?x?xi32>
		// CHECKPARALLEL: %[[OX:.*]] = dim %arg2, 0 : memref<?x?xf32>
		// CHECKPARALLEL: %[[OY:.*]] = dim %arg2, 1 : memref<?x?xf32>
		// CHECKPARALLEL: loop.parallel (%{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}) to (%[[OX]], %[[OY]]) step (%{{.*}}, %{{.*}}) {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WX]] step %{{.*}} {
		// CHECKPARALLEL: loop.for %{{.*}} = %{{.*}} to %[[WY]] step %{{.*}} {
		// CHECKPARALLEL: %[[IX:.*]] = affine.apply #[[Stride2Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[IY:.*]] = affine.apply #[[Stride1Dilation1]](%{{.*}}, %{{.*}})
		// CHECKPARALLEL: %[[RHS:.*]] = load %{{.*}}[%[[IX]], %[[IY]]] : memref<?x?xf32>
		// CHECKPARALLEL: %[[LHS:.*]] = load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>
		// CHECKPARALLEL: %[[RES:.*]] = addf %[[LHS]], %[[RHS]] : f32
		// CHECKPARALLEL: store %[[RES]], %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32>

func @foo(%0: f32, %1: f32, %2: f32) -> (f32, f32) {		func @foo(%0: f32, %1: f32, %2: f32) -> (f32, f32) {
%f0 = constant 0.0 : f32		%f0 = constant 0.0 : f32
return %f0, %f0 : f32, f32		return %f0, %f0 : f32, f32
}		}
#accesses = [		#accesses = [
affine_map<(i, j, k) -> (i, j)>,		affine_map<(i, j, k) -> (i, j)>,
affine_map<(i, j, k) -> (i, j, k)>,		affine_map<(i, j, k) -> (i, j, k)>,
affine_map<(i, j, k) -> (i, k, j)>		affine_map<(i, j, k) -> (i, k, j)>
]		]
#trait = {		#trait = {
args_in = 1,		args_in = 1,
args_out = 2,		args_out = 2,
iterator_types = ["parallel", "parallel", "parallel"],		iterator_types = ["parallel", "parallel", "parallel"],
indexing_maps = #accesses,		indexing_maps = #accesses,
fun = @foo,		fun = @foo,
library_call = "some_external_function_name_1",		library_call = "some_external_function_name_1",
doc = "B(i,j,k), C(i,k,j) = foo(A(i, j), B(i,j,k), C(i,k,j))"		doc = "B(i,j,k), C(i,k,j) = foo(A(i, j), B(i,j,k), C(i,k,j))"
}		}
func @generic_function(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {		func @generic_function(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {
linalg.generic #trait %arg0, %arg1, %arg2:		linalg.generic #trait %arg0, %arg1, %arg2:
memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}
// CHECK-LABEL: @foo		// CHECKLOOP-LABEL: @foo
// CHECK-LABEL: @generic_function		// CHECKLOOP-LABEL: @generic_function
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: loop.for %[[k:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[k:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[res:.*]]:2 = call @foo(%[[a]], %[[b]], %[[c]]) : (f32, f32, f32) -> (f32, f32)		// CHECKLOOP: %[[res:.*]]:2 = call @foo(%[[a]], %[[b]], %[[c]]) : (f32, f32, f32) -> (f32, f32)
// CHECK: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: @foo
		// CHECKPARALLEL-LABEL: @generic_function
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]], %[[k:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[res:.*]]:2 = call @foo(%[[a]], %[[b]], %[[c]]) : (f32, f32, f32) -> (f32, f32)
		// CHECKPARALLEL: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

#trait2 = {		#trait2 = {
args_in = 1,		args_in = 1,
args_out = 2,		args_out = 2,
iterator_types = ["parallel", "parallel", "parallel"],		iterator_types = ["parallel", "parallel", "parallel"],
indexing_maps = #accesses,		indexing_maps = #accesses,
library_call = "some_external_function_name_2",		library_call = "some_external_function_name_2",
doc = "B(i,j,k), C(i,k,j) = foo(A(i, j), B(i,j,k), C(i,k,j))"		doc = "B(i,j,k), C(i,k,j) = foo(A(i, j), B(i,j,k), C(i,k,j))"
}		}
func @generic_region(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {		func @generic_region(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>, %arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, %arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {
linalg.generic #trait2 %arg0, %arg1, %arg2 {		linalg.generic #trait2 %arg0, %arg1, %arg2 {
^bb0(%a: f32, %b: f32, %c: f32):		^bb0(%a: f32, %b: f32, %c: f32):
%d = mulf %a, %b : f32		%d = mulf %a, %b : f32
%e = addf %c, %d : f32		%e = addf %c, %d : f32
linalg.yield %d, %e : f32, f32		linalg.yield %d, %e : f32, f32
}: memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		}: memref<?x?xf32, offset: ?, strides: [?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>, memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}
// CHECK-LABEL: @generic_region		// CHECKLOOP-LABEL: @generic_region
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: loop.for %[[k:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[k:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[d:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP: %[[d:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK: %[[e:.*]] = addf %[[c]], %[[d]] : f32		// CHECKLOOP: %[[e:.*]] = addf %[[c]], %[[d]] : f32
// CHECK: store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: @generic_region
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]], %[[k:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[d:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL: %[[e:.*]] = addf %[[c]], %[[d]] : f32
		// CHECKPARALLEL: store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

func @indexed_foo(%i: index, %j: index, %k: index, %0: f32, %1: f32, %2: f32) -> (f32, f32) {		func @indexed_foo(%i: index, %j: index, %k: index, %0: f32, %1: f32, %2: f32) -> (f32, f32) {
%i_int = index_cast %i: index to i32		%i_int = index_cast %i: index to i32
%i_float = sitofp %i_int : i32 to f32		%i_float = sitofp %i_int : i32 to f32
return %i_float, %i_float : f32, f32		return %i_float, %i_float : f32, f32
}		}
#trait3 = {		#trait3 = {
args_in = 1,		args_in = 1,
Show All 9 Lines	func @indexed_generic_function(
%arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,		%arg1: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,
%arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {		%arg2: memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>) {
linalg.indexed_generic #trait3 %arg0, %arg1, %arg2:		linalg.indexed_generic #trait3 %arg0, %arg1, %arg2:
memref<?x?xf32, offset: ?, strides: [?, 1]>,		memref<?x?xf32, offset: ?, strides: [?, 1]>,
memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,		memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,
memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}
// CHECK-LABEL: @indexed_foo		// CHECKLOOP-LABEL: @indexed_foo
// CHECK-LABEL: @indexed_generic_function		// CHECKLOOP-LABEL: @indexed_generic_function
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: loop.for %[[k:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[k:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>		// CHECKLOOP: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
// CHECK: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: %[[res:.*]]:2 = call @indexed_foo(%[[i]], %[[j]], %[[k]], %[[a]], %[[b]], %[[c]]) : (index, index, index, f32, f32, f32) -> (f32, f32)		// CHECKLOOP: %[[res:.*]]:2 = call @indexed_foo(%[[i]], %[[j]], %[[k]], %[[a]], %[[b]], %[[c]]) : (index, index, index, f32, f32, f32) -> (f32, f32)
// CHECK: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
// CHECK: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>		// CHECKLOOP: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

		// CHECKPARALLEL-LABEL: @indexed_foo
		// CHECKPARALLEL-LABEL: @indexed_generic_function
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]], %[[k:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, #[[strided2D]]>
		// CHECKPARALLEL: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: %[[res:.*]]:2 = call @indexed_foo(%[[i]], %[[j]], %[[k]], %[[a]], %[[b]], %[[c]]) : (index, index, index, f32, f32, f32) -> (f32, f32)
		// CHECKPARALLEL: store %[[res]]#0, %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, #[[strided3D]]>
		// CHECKPARALLEL: store %[[res]]#1, %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, #[[strided3D]]>

#trait4 = {		#trait4 = {
args_in = 1,		args_in = 1,
args_out = 2,		args_out = 2,
iterator_types = ["parallel", "parallel", "parallel"],		iterator_types = ["parallel", "parallel", "parallel"],
indexing_maps = #accesses,		indexing_maps = #accesses,
library_call = "some_external_function_name_2",		library_call = "some_external_function_name_2",
doc = "B(i,j,k), C(i,k,j) = foo(A(i, j) * B(i,j,k), i * j * k + C(i,k,j))"		doc = "B(i,j,k), C(i,k,j) = foo(A(i, j) * B(i,j,k), i * j * k + C(i,k,j))"
Show All 14 Lines	^bb0(%i: index, %j: index, %k: index, %a: f32, %b: f32, %c: f32):
%result_2 = addf %c, %ijk_float : f32		%result_2 = addf %c, %ijk_float : f32
linalg.yield %result_1, %result_2 : f32, f32		linalg.yield %result_1, %result_2 : f32, f32
}: memref<?x?xf32, offset: ?, strides: [?, 1]>,		}: memref<?x?xf32, offset: ?, strides: [?, 1]>,
memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,		memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>,
memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>		memref<?x?x?xf32, offset: ?, strides: [?, ?, 1]>
return		return
}		}

// CHECK-LABEL: @indexed_generic_region		// CHECKLOOP-LABEL: @indexed_generic_region
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: loop.for %[[k:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[k:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]]		// CHECKLOOP: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]]
// CHECK: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]]		// CHECKLOOP: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]]
// CHECK: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]]		// CHECKLOOP: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]]
// CHECK: %[[result_1:.*]] = mulf %[[a]], %[[b]] : f32		// CHECKLOOP: %[[result_1:.*]] = mulf %[[a]], %[[b]] : f32
// CHECK: %[[ij:.*]] = addi %[[i]], %[[j]] : index		// CHECKLOOP: %[[ij:.*]] = addi %[[i]], %[[j]] : index
// CHECK: %[[ijk:.*]] = addi %[[ij]], %[[k]] : index		// CHECKLOOP: %[[ijk:.*]] = addi %[[ij]], %[[k]] : index
// CHECK: %[[ijk_int:.*]] = index_cast %[[ijk]] : index to i32		// CHECKLOOP: %[[ijk_int:.*]] = index_cast %[[ijk]] : index to i32
// CHECK: %[[ijk_float:.*]] = sitofp %[[ijk_int]] : i32 to f32		// CHECKLOOP: %[[ijk_float:.*]] = sitofp %[[ijk_int]] : i32 to f32
// CHECK: %[[result_2:.*]] = addf %[[c]], %[[ijk_float]] : f32		// CHECKLOOP: %[[result_2:.*]] = addf %[[c]], %[[ijk_float]] : f32
// CHECK: store %[[result_1]], %{{.*}}[%[[i]], %[[j]], %[[k]]]		// CHECKLOOP: store %[[result_1]], %{{.*}}[%[[i]], %[[j]], %[[k]]]
// CHECK: store %[[result_2]], %{{.*}}[%[[i]], %[[k]], %[[j]]]		// CHECKLOOP: store %[[result_2]], %{{.*}}[%[[i]], %[[k]], %[[j]]]

		// CHECKPARALLEL-LABEL: @indexed_generic_region
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]], %[[k:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %{{.*}}[%[[i]], %[[j]]]
		// CHECKPARALLEL: %[[b:.*]] = load %{{.*}}[%[[i]], %[[j]], %[[k]]]
		// CHECKPARALLEL: %[[c:.*]] = load %{{.*}}[%[[i]], %[[k]], %[[j]]]
		// CHECKPARALLEL: %[[result_1:.*]] = mulf %[[a]], %[[b]] : f32
		// CHECKPARALLEL: %[[ij:.*]] = addi %[[i]], %[[j]] : index
		// CHECKPARALLEL: %[[ijk:.*]] = addi %[[ij]], %[[k]] : index
		// CHECKPARALLEL: %[[ijk_int:.*]] = index_cast %[[ijk]] : index to i32
		// CHECKPARALLEL: %[[ijk_float:.*]] = sitofp %[[ijk_int]] : i32 to f32
		// CHECKPARALLEL: %[[result_2:.*]] = addf %[[c]], %[[ijk_float]] : f32
		// CHECKPARALLEL: store %[[result_1]], %{{.*}}[%[[i]], %[[j]], %[[k]]]
		// CHECKPARALLEL: store %[[result_2]], %{{.*}}[%[[i]], %[[k]], %[[j]]]

// -----		// -----

#broadcast_access = [		#broadcast_access = [
affine_map<(i, j) -> ()>,		affine_map<(i, j) -> ()>,
affine_map<(i, j) -> (i, j)>		affine_map<(i, j) -> (i, j)>
]		]

Show All 9 Lines
{		{
linalg.generic #trait_broadcast %arg0, %arg1 {		linalg.generic #trait_broadcast %arg0, %arg1 {
^bb(%a: f32, %b: f32) :		^bb(%a: f32, %b: f32) :
linalg.yield %a : f32		linalg.yield %a : f32
} : memref<f32>, memref<3x4xf32>		} : memref<f32>, memref<3x4xf32>
return		return
}		}

// CHECK-LABEL: @generic_op_zero_rank		// CHECKLOOP-LABEL: @generic_op_zero_rank
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<f32>		// CHECKLOOP-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<f32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xf32>		// CHECKLOOP-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xf32>
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %[[ARG0]][]		// CHECKLOOP: %[[a:.*]] = load %[[ARG0]][]
// CHECK: store %[[a]], %[[ARG1]][%[[i]], %[[j]]]		// CHECKLOOP: store %[[a]], %[[ARG1]][%[[i]], %[[j]]]

		// CHECKPARALLEL-LABEL: @generic_op_zero_rank
		// CHECKPARALLEL-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<f32>
		// CHECKPARALLEL-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xf32>
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %[[ARG0]][]
		// CHECKPARALLEL: store %[[a]], %[[ARG1]][%[[i]], %[[j]]]

func @indexed_generic_op_zero_rank(%arg0: memref<i32>, %arg1: memref<3x4xi32>)		func @indexed_generic_op_zero_rank(%arg0: memref<i32>, %arg1: memref<3x4xi32>)
{		{
linalg.indexed_generic #trait_broadcast %arg0, %arg1 {		linalg.indexed_generic #trait_broadcast %arg0, %arg1 {
^bb(%i: index, %j: index, %a: i32, %b: i32) :		^bb(%i: index, %j: index, %a: i32, %b: i32) :
%ij = addi %i, %j : index		%ij = addi %i, %j : index
%ij_int = index_cast %ij : index to i32		%ij_int = index_cast %ij : index to i32
%result = addi %a, %ij_int : i32		%result = addi %a, %ij_int : i32
linalg.yield %result : i32		linalg.yield %result : i32
} : memref<i32>, memref<3x4xi32>		} : memref<i32>, memref<3x4xi32>
return		return
}		}

// CHECK-LABEL: @indexed_generic_op_zero_rank		// CHECKLOOP-LABEL: @indexed_generic_op_zero_rank
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<i32>		// CHECKLOOP-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<i32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xi32>		// CHECKLOOP-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xi32>
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: loop.for %[[j:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[j:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %[[ARG0]][		// CHECKLOOP: %[[a:.*]] = load %[[ARG0]][
// CHECK: %[[ij:.*]] = addi %[[i]], %[[j]] : index		// CHECKLOOP: %[[ij:.*]] = addi %[[i]], %[[j]] : index
// CHECK: %[[ij_int:.*]] = index_cast %[[ij]] : index to i32		// CHECKLOOP: %[[ij_int:.*]] = index_cast %[[ij]] : index to i32
// CHECK: %[[result:.*]] = addi %[[a]], %[[ij_int]] : i32		// CHECKLOOP: %[[result:.*]] = addi %[[a]], %[[ij_int]] : i32
// CHECK: store %[[result]], %[[ARG1]][%[[i]], %[[j]]]		// CHECKLOOP: store %[[result]], %[[ARG1]][%[[i]], %[[j]]]

		// CHECKPARALLEL-LABEL: @indexed_generic_op_zero_rank
		// CHECKPARALLEL-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<i32>
		// CHECKPARALLEL-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<3x4xi32>
		// CHECKPARALLEL: loop.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]])
		// CHECKPARALLEL: %[[a:.*]] = load %[[ARG0]][
		// CHECKPARALLEL: %[[ij:.*]] = addi %[[i]], %[[j]] : index
		// CHECKPARALLEL: %[[ij_int:.*]] = index_cast %[[ij]] : index to i32
		// CHECKPARALLEL: %[[result:.*]] = addi %[[a]], %[[ij_int]] : i32
		// CHECKPARALLEL: store %[[result]], %[[ARG1]][%[[i]], %[[j]]]

#reduce_1D_access = [		#reduce_1D_access = [
affine_map<(i) -> (i)>,		affine_map<(i) -> (i)>,
affine_map<(i) -> ()>		affine_map<(i) -> ()>
]		]

#trait_reduce_1D = {		#trait_reduce_1D = {
args_in = 1,		args_in = 1,
args_out = 1,		args_out = 1,
indexing_maps = #reduce_1D_access,		indexing_maps = #reduce_1D_access,
iterator_types = ["reduction"],		iterator_types = ["reduction"],
library_call = "some_reduce_external_fn"		library_call = "some_reduce_external_fn"
}		}

func @generic_op_1D_reduce(%arg0: memref<?xf32>, %arg1: memref<f32>)		func @generic_op_1D_reduce(%arg0: memref<?xf32>, %arg1: memref<f32>)
{		{
linalg.generic #trait_reduce_1D %arg0, %arg1 {		linalg.generic #trait_reduce_1D %arg0, %arg1 {
^bb(%a: f32, %b: f32) :		^bb(%a: f32, %b: f32) :
%0 = addf %a, %b : f32		%0 = addf %a, %b : f32
linalg.yield %0 : f32		linalg.yield %0 : f32
} : memref<?xf32>, memref<f32>		} : memref<?xf32>, memref<f32>
return		return
}		}
// CHECK-LABEL: @generic_op_1D_reduce		// CHECKLOOP-LABEL: @generic_op_1D_reduce
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>		// CHECKLOOP-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>		// CHECKLOOP-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %[[ARG0]][%[[i]]]		// CHECKLOOP: %[[a:.*]] = load %[[ARG0]][%[[i]]]
// CHECK: %[[b:.*]] = load %[[ARG1]][]		// CHECKLOOP: %[[b:.*]] = load %[[ARG1]][]
// CHECK: %[[c:.*]] = addf %[[a]], %[[b]] : f32		// CHECKLOOP: %[[c:.*]] = addf %[[a]], %[[b]] : f32
// CHECK: store %[[c]], %[[ARG1]][]		// CHECKLOOP: store %[[c]], %[[ARG1]][]

		// CHECKPARALLEL-LABEL: @generic_op_1D_reduce
		// CHECKPARALLEL-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>
		// CHECKPARALLEL-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>
		// CHECKPARALLEL: loop.for %[[i:.*]] = {{.*}}
		// CHECKPARALLEL: %[[a:.*]] = load %[[ARG0]][%[[i]]]
		// CHECKPARALLEL: %[[b:.*]] = load %[[ARG1]][]
		// CHECKPARALLEL: %[[c:.*]] = addf %[[a]], %[[b]] : f32
		// CHECKPARALLEL: store %[[c]], %[[ARG1]][]


#reduce_init_1D_access = [		#reduce_init_1D_access = [
affine_map<(i) -> (i)>,		affine_map<(i) -> (i)>,
affine_map<(i) -> ()>,		affine_map<(i) -> ()>,
affine_map<(i) -> ()>		affine_map<(i) -> ()>
]		]

Show All 14 Lines	^bb(%i : index, %a: f32, %b: f32, %c: f32) :
%0 = constant 0 : index		%0 = constant 0 : index
%1 = cmpi "eq", %0, %i : index		%1 = cmpi "eq", %0, %i : index
%2 = select %1, %b, %c : f32		%2 = select %1, %b, %c : f32
%3 = addf %a, %2 : f32		%3 = addf %a, %2 : f32
linalg.yield %3 : f32		linalg.yield %3 : f32
} : memref<?xf32>, memref<f32>, memref<f32>		} : memref<?xf32>, memref<f32>, memref<f32>
return		return
}		}
// CHECK-LABEL: @indexed_generic_op_1D_reduce		// CHECKLOOP-LABEL: @indexed_generic_op_1D_reduce
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>		// CHECKLOOP-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>		// CHECKLOOP-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<f32>		// CHECKLOOP-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<f32>
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: %[[a:.*]] = load %[[ARG0]][%[[i]]]		// CHECKLOOP: %[[a:.*]] = load %[[ARG0]][%[[i]]]
// CHECK: %[[b:.*]] = load %[[ARG1]][]		// CHECKLOOP: %[[b:.*]] = load %[[ARG1]][]
// CHECK: %[[c:.*]] = load %[[ARG2]][]		// CHECKLOOP: %[[c:.*]] = load %[[ARG2]][]
// CHECK: %[[d:.*]] = select %{{.*}}, %[[b]], %[[c]]		// CHECKLOOP: %[[d:.*]] = select %{{.*}}, %[[b]], %[[c]]
// CHECK: %[[e:.*]] = addf %[[a]], %[[d]]		// CHECKLOOP: %[[e:.*]] = addf %[[a]], %[[d]]
// CHECK: store %[[e]], %[[ARG2]][]		// CHECKLOOP: store %[[e]], %[[ARG2]][]

		// CHECKPARALLEL-LABEL: @indexed_generic_op_1D_reduce
		// CHECKPARALLEL-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?xf32>
		// CHECKPARALLEL-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<f32>
		// CHECKPARALLEL-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<f32>
		// CHECKPARALLEL: loop.for %[[i:.*]] = {{.*}}
		// CHECKPARALLEL: %[[a:.*]] = load %[[ARG0]][%[[i]]]
		// CHECKPARALLEL: %[[b:.*]] = load %[[ARG1]][]
		// CHECKPARALLEL: %[[c:.*]] = load %[[ARG2]][]
		// CHECKPARALLEL: %[[d:.*]] = select %{{.*}}, %[[b]], %[[c]]
		// CHECKPARALLEL: %[[e:.*]] = addf %[[a]], %[[d]]
		// CHECKPARALLEL: store %[[e]], %[[ARG2]][]

#trait_const_fill = {		#trait_const_fill = {
args_in = 0,		args_in = 0,
args_out = 1,		args_out = 1,
indexing_maps = [affine_map<(i) -> (i)>],		indexing_maps = [affine_map<(i) -> (i)>],
iterator_types = ["parallel"],		iterator_types = ["parallel"],
library_call = "some_external_fn"		library_call = "some_external_fn"
}		}
func @generic_const_init(%arg0: memref<?xf32>) {		func @generic_const_init(%arg0: memref<?xf32>) {
%cst = constant 1.0 : f32		%cst = constant 1.0 : f32
linalg.generic #trait_const_fill %arg0 {		linalg.generic #trait_const_fill %arg0 {
^bb0(%arg1: f32): // no predecessors		^bb0(%arg1: f32): // no predecessors
linalg.yield %cst : f32		linalg.yield %cst : f32
}: memref<?xf32>		}: memref<?xf32>
return		return
}		}
// CHECK-LABEL: @generic_const_init		// CHECKLOOP-LABEL: @generic_const_init
// CHECK-SAME: %[[ARG0:.*]]: memref<?xf32>		// CHECKLOOP-SAME: %[[ARG0:.*]]: memref<?xf32>
// CHECK: %[[CONST:.*]] = constant 1.000000e+00 : f32		// CHECKLOOP: %[[CONST:.*]] = constant 1.000000e+00 : f32
// CHECK: loop.for %[[i:.*]] = {{.*}}		// CHECKLOOP: loop.for %[[i:.*]] = {{.*}}
// CHECK: store %[[CONST]], %[[ARG0]]		// CHECKLOOP: store %[[CONST]], %[[ARG0]]

		// CHECKPARALLEL-LABEL: @generic_const_init
		// CHECKPARALLEL-SAME: %[[ARG0:.*]]: memref<?xf32>
		// CHECKPARALLEL: %[[CONST:.*]] = constant 1.000000e+00 : f32
		// CHECKPARALLEL: loop.parallel (%[[i:.*]])
		// CHECKPARALLEL: store %[[CONST]], %[[ARG0]]

mlir/test/Dialect/Linalg/parallel_loops.mlir

	Show All 26 Lines
	// CHECK: %[[SUM_ELEM:.*]] = load %[[SUM]][%[[I]], %[[J]]]			// CHECK: %[[SUM_ELEM:.*]] = load %[[SUM]][%[[I]], %[[J]]]
	// CHECK: %[[SUM:.*]] = addf %[[LHS_ELEM]], %[[RHS_ELEM]] : f32			// CHECK: %[[SUM:.*]] = addf %[[LHS_ELEM]], %[[RHS_ELEM]] : f32
	// CHECK: store %[[SUM]], %{{.*}}[%[[I]], %[[J]]]			// CHECK: store %[[SUM]], %{{.*}}[%[[I]], %[[J]]]
	// CHECK: loop.yield			// CHECK: loop.yield

	// -----			// -----

	#accesses = [			#accesses = [
	affine_map<(m, n) -> (m, n)>,			affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>,
	affine_map<(m, n) -> (m)>			affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
	]			]
	#trait = {			#trait = {
	args_in = 1,			args_in = 1,
	args_out = 1,			args_out = 1,
	iterator_types = ["parallel", "reduction"],			iterator_types = ["parallel", "parallel", "reduction", "parallel"],
	indexing_maps = #accesses			indexing_maps = #accesses
	}			}

	func @do_not_lower_reduce(%A: memref<2x4xf32>, %B: memref<2xf32>) {			func @lower_outer_parallel(%A: memref<?x?x?x?xf32>, %B: memref<?x?x?xf32>) {
	linalg.generic #trait %A, %B {			linalg.generic #trait %A, %B {
	^bb0(%a: f32, %b: f32):			^bb0(%a: f32, %b: f32):
	linalg.yield %a: f32			linalg.yield %a: f32
	} : memref<2x4xf32>, memref<2xf32>			} : memref<?x?x?x?xf32>, memref<?x?x?xf32>
	return			return
	}			}
	// CHECK-LABEL: @do_not_lower_reduce			// CHECK-LABEL: @lower_outer_parallel
	// CHECK: linalg.generic			// CHECK-DAG: %[[C0:.*]] = constant 0
				// CHECK-DAG: %[[C1:.*]] = constant 1
				// CHECK-DAG: %[[D0:.*]] = dim %{{.*}}, 0
				// CHECK-DAG: %[[D1:.*]] = dim %{{.*}}, 1
				// CHECK-DAG: %[[D2:.*]] = dim %{{.*}}, 2
				// CHECK-DAG: %[[D3:.*]] = dim %{{.*}}, 3
				// CHECK: loop.parallel (%[[IV0:.*]], %[[IV1:.*]]) = (%[[C0]], %[[C0]]) to (%[[D0]], %[[D1]]) step (%[[C1]], %[[C1]])
				// CHECK: loop.for %[[IV2:.*]] = %[[C0]] to %[[D2]] step %[[C1]]
				// CHECK: loop.for %[[IV3:.*]] = %[[C0]] to %[[D3]] step %[[C1]]
				// CHECK: load %{{.*}}[%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
				// CHECK: store %{{.*}}, %{{.*}}[%[[IV0]], %[[IV1]], %[[IV3]]]
				No newline at end of file
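
The rule the new test exercises (outer parallel loops become one loop.parallel, every remaining loop becomes a loop.for) can be sketched outside MLIR. This is only an illustration of the behaviour described in the summary, not the C++ in LinalgToLoops.cpp; the function name lowered_loop_nest and the tuple representation are hypothetical.

```python
from itertools import takewhile


def lowered_loop_nest(iterator_types):
    """Model the loop nest produced for a list of Linalg iterator types.

    Assumption (taken from the revision summary): only the leading run of
    "parallel" iterators is grouped into a single loop.parallel; every
    iterator after that, parallel or not, gets its own loop.for.
    Returns a list of (op_name, number_of_induction_variables) tuples.
    """
    # Count the leading "parallel" entries; stop at the first other kind.
    num_outer_parallel = sum(
        1 for _ in takewhile(lambda kind: kind == "parallel", iterator_types)
    )

    nest = []
    if num_outer_parallel > 0:
        # One multi-dimensional loop.parallel covers all outer parallel dims.
        nest.append(("loop.parallel", num_outer_parallel))
    # Each remaining dimension is lowered to its own sequential loop.for.
    nest.extend(("loop.for", 1) for _ in iterator_types[num_outer_parallel:])
    return nest


if __name__ == "__main__":
    # Iterator types of @lower_outer_parallel in parallel_loops.mlir.
    print(lowered_loop_nest(["parallel", "parallel", "reduction", "parallel"]))
    # Iterator types of @generic_op_1D_reduce in loops.mlir.
    print(lowered_loop_nest(["reduction"]))
```

For @lower_outer_parallel this gives one two-dimensional loop.parallel followed by two loop.for ops, matching the IV0/IV1 and IV2/IV3 CHECK lines above; the trailing "parallel" iterator stays sequential because it sits inside the reduction loop.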

Revision Contents

Diff 257098
