This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Add loop.parallel lowering for all Linalg Ops.
ClosedPublic

Authored by mravishankar on Apr 7 2020, 1:50 PM.

Details

Summary

The outer parallel loops of a linalg operation are lowered to
loop.parallel, with the remaining loops lowered to loop.for. This brings
the loop.parallel lowering on par with the loop.for lowering. In the
future the reduction loop could also be lowered to loop.parallel.
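
As a sketch of the intended output (not taken verbatim from the patch's tests), consider linalg.matmul, whose iterator types are (parallel, parallel, reduction); the values %M, %N, %K, %c0 and %c1 are assumed to be defined by the usual dim and constant ops:

```mlir
// Input (iterator types: parallel, parallel, reduction):
linalg.matmul(%A, %B, %C) : memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>

// Output: the two outer parallel loops become a single loop.parallel;
// the reduction dimension %k remains a loop.for.
loop.parallel (%i, %j) = (%c0, %c0) to (%M, %N) step (%c1, %c1) {
  loop.for %k = %c0 to %K step %c1 {
    %a = load %A[%i, %k] : memref<?x?xf32>
    %b = load %B[%k, %j] : memref<?x?xf32>
    %c = load %C[%i, %j] : memref<?x?xf32>
    %p = mulf %a, %b : f32
    %s = addf %c, %p : f32
    store %s, %C[%i, %j] : memref<?x?xf32>
  }
}
```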

Also add a utility function that returns the loops that are
created. This requires a change to the EDSC builders to return the
created ops.

Event Timeline

mravishankar created this revision. Apr 7 2020, 1:50 PM

The utility function part looks like quite a massive change for just the purpose of getting the xxxForOp from their respective induction variables.
Why is this a good tradeoff (also, it is not tested FWICT)?

Could it be separated from this revision and evaluated independently as it seems quite orthogonal to the part that handles more cases of Linalg -> ploops ?

The utility function part looks like quite a massive change for just the purpose of getting the xxxForOp from their respective induction variables.
Why is this a good tradeoff (also, it is not tested FWICT)?

If I understand the comment correctly, are you suggesting not changing the EDSC builders to return the operation handles as well? I think all the induction variables are block arguments of the entry block of the loop operations, so I can get the parent op of the block and not modify the EDSC builders. I started down the path of getting the op directly from the builders and just finished it, but I am fine with that approach as well. My initial thought was that getting the op from the induction variables is more of a workaround (WAR), but I don't have enough background here to know whether that is indeed the case.

Could it be separated from this revision and evaluated independently as it seems quite orthogonal to the part that handles more cases of Linalg -> ploops ?

Maybe. I want to do both: lower linalg to ploops, and have the utility function that does so also return the loops created, just like the tileLinalgOp method returns the resulting tiled op and the loops created.

asaadaldien added inline comments. Apr 7 2020, 4:14 PM
mlir/include/mlir/EDSC/Builders.h
222

Can we move this block back to line 79?

bondhugula added a subscriber: andydavis1. (Edited) Apr 7 2020, 8:51 PM

@mravishankar A general question on the direction this is taking: why are we even lowering all of this to loop.parallel and loop.for instead of affine.parallel and affine.for? The conversion from affine.* to loop.* is guaranteed to *always* succeed by design in all cases, and it already does for affine.for -> loop.for. So you get your loop.for's for free (similarly affine.parallel -> loop.parallel). It looks like you are adding more and more code that skips a level of abstraction and introduces a parallel lowering path that would ultimately be redundant / subsumed. I feel this is taking the design and the infrastructure in the wrong direction, more so at each step. @mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists for affine.for onto loop.for, in strictly less powerful ways and without a good reason. There may be a *few* things that you may want to do on loop.for rather than on affine.for, but you could do those anyway after having passed through the affine dialect.
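
For concreteness, a minimal sketch of that conversion on a single loop (assuming %N is a valid affine symbol and the affine-to-std lowering of the time):

```mlir
affine.for %i = 0 to %N step 2 {
  // ... body ...
}

// After lowering, the bound and step become explicit index values:
%c0 = constant 0 : index
%c2 = constant 2 : index
loop.for %i = %c0 to %N step %c2 {
  // ... body ...
}
```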

On a less major note, is there an example here that can't be represented via the affine dialect straightaway - the way it is today? Even all your loop steps are one (at least the ones I can immediately tell from the test cases) - if there are some cases you need that aren't, they could always be normalized to one via affine (without even needing grayboxes for the cases you have).

bondhugula requested changes to this revision. Apr 7 2020, 8:54 PM
bondhugula added inline comments.
mlir/test/Dialect/Linalg/loops.mlir
128

Is there a need to match all of the trailing 'step %{{.*}}'? You always print the step, right?

This revision now requires changes to proceed. Apr 7 2020, 8:54 PM
mehdi_amini added a comment. (Edited) Apr 7 2020, 9:45 PM

FWIW, I entirely agree with @bondhugula's sentiment here!

@bondhugula: Thanks for the comments. I certainly cannot think of a case in Linalg today where the path of going from linalg -> affine -> loop would not cover everything that is currently supported in Linalg. All I was trying to achieve with this change was to finish the work started a while ago of lowering linalg to loop.parallel. From my perspective, loop.parallel and loop.for are in the same group. I certainly didn't intend to make a design- or infrastructure-level decision here.

I think it would be good to reach some consensus here on whether linalg should always be lowered to affine and then to the loop dialect, and whether lowering linalg to loops directly is discouraged. As you mentioned, going from affine.for -> loop.for is always guaranteed, and so is going from affine.parallel -> loop.parallel, so we can always decide to lower from linalg to affine and then to loops for both the parallel and non-parallel versions. My only concern is that if there exists a lowering from linalg to loops, there should also be a way to target loop.parallel, since without it you are dropping semantic information about the computation. This information is really important when lowering from the loop dialect to the GPU dialect (which, AFAIK, was one of the main motivations for having loop.parallel in the first place).
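
To illustrate what is dropped: the matmul from the sketch above, lowered only to loop.for, becomes three structurally identical sequential loops (hypothetical values as before):

```mlir
loop.for %i = %c0 to %M step %c1 {
  loop.for %j = %c0 to %N step %c1 {
    loop.for %k = %c0 to %K step %c1 {
      // ... same scalar body as in the matmul sketch ...
    }
  }
}
```

Nothing here marks %i and %j as parallel, so a later loop-to-GPU mapping would have to re-derive that fact.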

Adding @herhut, who is probably also interested in this and has comments. I am assuming @nicolasvasilache will also provide his thoughts. FWIW, I am fine going through the affine dialect instead of going directly to the loop dialect. I hadn't given thought to using the affine dialect for my use cases, but I definitely don't see an issue with it.

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops. So if the code-generation process tiles linalg ops and then lowers the tiled ops to loops, you can end up with the outer loops in the loop dialect but the inner loops in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's something I haven't fully reasoned about myself.

Side note: This change seems bigger than it is cause it is also doing some changes to EDSC which are probably not needed. Will try to remove those.

bondhugula added a comment. (Edited) Apr 8 2020, 12:17 AM

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).
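
A minimal sketch of such an encoding (not taken verbatim from the linked post; %N is a hypothetical problem size and %ts a runtime tile size, both used as affine symbols, which makes expressions like d0 * s0 semi-affine):

```mlir
#ntiles = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
#lb = affine_map<(d0)[s0] -> (d0 * s0)>
#ub = affine_map<(d0)[s0, s1] -> (d0 * s0 + s0, s1)>

affine.for %ii = 0 to #ntiles()[%N, %ts] {
  // One parametric tile; the min clamps the last tile to %N.
  affine.for %i = #lb(%ii)[%ts] to min #ub(%ii)[%ts, %N] {
    // ... tile body ...
  }
}
```

(As @ftynse notes below, semi-affine maps are still poorly supported in some lowerings, so treat this as a representational argument rather than a working pipeline.)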

So if the code-generation process tiles linalg ops and then lowers the tiled ops to loops, you can end up with the outer loops in the loop dialect but the inner loops in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's something I haven't fully reasoned about myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-convert-affine-to-std' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!
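
A sketch of such a mix (hypothetical values; the runtime step %ts is fine for loop.parallel, while the inner constant-bound loop stays affine):

```mlir
loop.parallel (%i) = (%c0) to (%ub) step (%ts) {
  affine.for %j = 0 to 128 {
    // ... body ...
  }
}
```

The affine-to-std conversion rewrites only the affine.for, leaving a pure loop-dialect nest.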

ftynse added a comment. Apr 8 2020, 3:41 AM

@mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists for affine.for onto loop.for, in strictly less powerful ways and without a good reason. There may be a *few* things that you may want to do on loop.for rather than on affine.for, but you could do those anyway after having passed through the affine dialect.

I did think about this, and we even had a document, back in the time when we had access to those ;) The discussion you want to have here is mostly independent of this patch and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.

Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).

While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.

On a minor note, have you actually tried running the example you proposed in the linked forum post? :) There are many places where semi-affine maps are poorly supported, or not supported at all. Conversion to LLVM is one of them.

That being said, I was the one who has been arguing that Linalg lowering should go through affine when it can (and so did the design document). The problem is when it cannot, or when doing so would just increase the complexity of the system without visible benefit. Let's assume there are cases where it cannot (today, examples would be: using linalg ops with values that don't qualify as affine symbols, or reductions that explicitly want to go through SSA values; tomorrow, we'll have sparse buffers). Then we need the ability to emit at least some non-affine loops, which this patch contributes. Now, if one specific loop in a Linalg op does not fit into affine, do we actually want a mix of affine and non-affine loops, or do we prefer a single non-affine loop nest that, e.g., preserves the idea of permutability that would no longer be discoverable by affine analysis? I can see value in having both options.

The actual duplication here is between the Linalg->loop.for and Linalg->loop.parallel lowerings, which I pointed out in one of the previous patches. Given that we have the lowering from loop.parallel to loop.for, we should remove Linalg->loop.for and replace it with this. My recollection is that this was the plan, but it requires the lowering to loop.parallel to also support reductions, which this patch does not do.

Transformations on different kinds of loops are another question, unrelated to this patch. Again, I see value in removing affine restrictions or, conversely, having stricter restrictions such as hyper-rectangular, and separating the legality and profitability analysis (likely based on those restrictions) from the IR manipulation logic.

mravishankar marked an inline comment as done.

Removing EDSC-related changes to focus the change only on lowering to loop.parallel ops.

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).

Thanks for the pointer. As was done in that post, I had just looked at the op definition and reached that conclusion about parametric tiling. I haven't worked with the affine dialect enough to know about such things; it's definitely something I want to look into in due course.

So if the code-generation process tiles linalg ops and then lowers the tiled ops to loops, you can end up with the outer loops in the loop dialect but the inner loops in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's something I haven't fully reasoned about myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-convert-affine-to-std' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

Agreed (and said so earlier). It should be OK to mix loop.parallel/loop.for with affine.for/affine.parallel. But based on your post, is it possible to generate affine.for/affine.parallel while tiling linalg ops as well? That way the same benefit of going to affine.for/affine.parallel would be available for the inter-tile loops as well.

@mehdi_amini, @nicolasvasilache, @andydavis1 - has there been any thought and a clear design direction on this? If you go down this path, you'd be forced to duplicate even more of the infrastructure that exists for affine.for onto loop.for, in strictly less powerful ways and without a good reason. There may be a *few* things that you may want to do on loop.for rather than on affine.for, but you could do those anyway after having passed through the affine dialect.

I did think about this, and we even had a document, back in the time when we had access to those ;) The discussion you want to have here is mostly independent of this patch and pertains to the motivation for having the loop dialect in the first place. We had that discussion when the dialect was introduced.

Loop dialect was split out from Linalg, where the loop-related ops had been introduced to remove some of the affine constraints that were irrelevant and/or constraining for Linalg's use case. One of the constraints is the need for what I call "affine provenance", i.e. the set of rules spread out in the code that define which SSA values are allowed to be used as dimensions or as symbols in affine constructs. Supporting non-constant steps can be seen as a consequence of lifting those constraints. Linalg had (and still has) a forward-looking design that accounted for things like non-dense buffers and custom types. Plumbing all that through the affine machinery is hard (trust me, I tried).

While one can, in many cases, wiggle their way out of the representation problem, like you suggest with parametric steps, the question of whether one should remains pertinent. It's a complexity trade-off question. We can introduce extra operations and affine maps to model non-constant steps, call this an "affine idiom for parametric steps" and try to discover it when we reason about steps. We can introduce another idiom for another case that doesn't fit affine (let's take indirect accesses). And so on. This introduces extra complexity to the IR and to the code that manipulates it. What's the counterpart? Linalg-based flow does not intend to run affine transformations, so we cannot claim we pay the complexity price for having better optimization. We can spare some lowering code by... writing some other lowering code with more complex abstractions.

Thanks @ftynse for the really useful background; I was certainly unaware of the discussion here. It would be really good to surface this back up on the discussion forum. But as you mentioned, I hope this patch can be seen as independent of that discussion. I am not trying to weigh the scales one way or the other, but rather just filling in missing pieces where I can and when I need them.

The actual duplication here is between the Linalg->loop.for and Linalg->loop.parallel lowerings, which I pointed out in one of the previous patches. Given that we have the lowering from loop.parallel to loop.for, we should remove Linalg->loop.for and replace it with this. My recollection is that this was the plan, but it requires the lowering to loop.parallel to also support reductions, which this patch does not do.

Agreed that the lowering from linalg to loop.for should eventually become redundant, but right now the lowering to loop.parallel does not support reductions (apologies for misrepresenting earlier that I am "finishing" the linalg to loop.parallel lowering; there are a couple of cases missing). As it stands, we "can" remove the linalg -> loop.for lowering with this patch itself: for the unhandled cases that's the fallback used anyway, so there is no change in functionality from merging the lowerings to loop.for and loop.parallel.

mlir/test/Dialect/Linalg/loops.mlir
128

Probably not. I didn't change what was already there, just the check-prefix. I would rather keep it as is.

nicolasvasilache accepted this revision. Apr 8 2020, 10:51 AM

Thanks Mahesh!

Re Linalg, affine and loop (structured control flow), different transformations can be done at different levels.
Determining where a given transformation should be done is still a very open problem: just because you *can* perform transformation A on representation B does not necessarily mean you *should*.
Tradeoffs include complexity, scalability, maintainability, driving transformations, composability, deriving profitability metrics, etc.
The interesting part IMO is being able to mix these different systems and essentially compose Halide-style, affine-style and Allen-Kennedy-style transformations, declaratively, and evaluate based on concrete data.

All this is completely orthogonal to the fact that we need to build the different paths.
Linalg -> affine -> loops / SCF is one valid path.
Linalg -> loops / SCF is another valid path.
Affine -> loops / SCF is another valid path.
None should be discouraged.

bondhugula added a comment. (Edited) Apr 10 2020, 7:38 PM

Another point, off the top of my head, if the recommendation is to go through the affine dialect: there is already a mechanism to generate loop.parallel when tiling linalg operations. AFAIK the tile size can be dynamic, and therefore cannot be expressed using affine.parallel loops.

I've pointed out a couple of times that this isn't accurate - you can represent non-constant tile sizes using either affine.parallel or affine.for (https://llvm.discourse.group/t/beginner-q-help-with-loops-affine-linalg/707/4).

Thanks for the pointer. As was done in that post, I had just looked at the op definition and reached that conclusion about parametric tiling. I haven't worked with the affine dialect enough to know about such things; it's definitely something I want to look into in due course.

So if the code-generation process tiles linalg ops and then lowers the tiled ops to loops, you can end up with the outer loops in the loop dialect but the inner loops in the affine dialect. I am not sure there is an issue with that, because eventually you can lower the affine loops to the loop dialect, but it's something I haven't fully reasoned about myself.

Second, there is no issue with using a mix of affine and loop dialect ops - '-convert-affine-to-std' should be able to handle it by design. From a mix of affine.for and loop.for, it'll take you to just loop.for's. Please file a bug report if it doesn't!

Agreed (and said so earlier). It should be OK to mix loop.parallel/loop.for with affine.for/affine.parallel. But based on your post, is it possible to generate affine.for/affine.parallel while tiling linalg ops as well? That way the same benefit of going to affine.for/affine.parallel would be available for the inter-tile loops as well.

Yes, of course. That's exactly what I've been pointing out: loop.for/parallel is currently being unnecessarily used in all these cases where it is possible to just go through affine.for/affine.parallel. Even for the more general computation you may need in the future (where dim/symbol restrictions get in the way), with affine.scope ops (renamed from grayboxes) you'd never need to use loop.for/if to lower any tensor/memref indexing computation, and when you lower to loop.for/if, you'd get what you want. Is there a need to maintain a non-unit symbolic step on loop.for? Your bounds are already arbitrary index-typed SSA values. Wouldn't your code be simplified if you just canonicalized to a unit stride? With semi-affine maps, even with the current support, you get strictly *more* than what you get with loop.for's.
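
A sketch of that canonicalization on a single loop (hypothetical values; the ceildiv is spelled out with std ops and assumes a positive step):

```mlir
// Before: non-unit symbolic step.
loop.for %i = %lb to %ub step %s {
  // ... body using %i ...
}

// After: unit stride; %i is reconstructed inside the body.
%range = subi %ub, %lb : index
%sm1   = subi %s, %c1 : index
%tmp   = addi %range, %sm1 : index
%count = divi_signed %tmp, %s : index   // ceildiv(%ub - %lb, %s)
loop.for %ii = %c0 to %count step %c1 {
  %off = muli %ii, %s : index
  %i   = addi %lb, %off : index
  // ... body using %i ...
}
```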

Adding explicit instantiation of linalgLowerOpToLoops for all linalg
named ops.

@bondhugula I am submitting this change now. Based on the comments here, there might be larger discussions to be had about how different dialects interact, which are separate from this change. Please let me know if you have any other comments on the patch itself.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 13 2020, 1:36 PM
This revision was automatically updated to reflect the committed changes.