This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

mlir/
  docs/
    Dialects/
      Affine.md (11 inline comments)
    Traits.md (4 inline comments)
  include/mlir/
    Dialect/
      Affine/IR/
        AffineOps.h
        AffineOps.td
      Shape/IR/
        ShapeOps.td
    IR/
      BuiltinOps.td
      OpBase.td
      OpDefinition.h
  lib/
    Analysis/
      AffineAnalysis.cpp
    Dialect/Affine/
      IR/
        AffineOps.cpp (2 inline comments)
      Transforms/
        AffineParallelize.cpp
  test/
    Conversion/AffineToStandard/
      lower-affine.mlir
    Dialect/
      Affine/
        canonicalize.mlir
        invalid.mlir
        ops.mlir
      Linalg/
        comprehensive-module-bufferize.mlir
        fusion-indexed.mlir
        fusion-pattern.mlir
        fusion-sequence.mlir
        fusion-tensor-pattern.mlir
        fusion.mlir
        hoist-padding.mlir
        loops.mlir
        pad-and-hoist.mlir
        reshape_fusion.mlir
        tile-and-fuse-on-tensors.mlir
        tile-and-fuse-tensors.mlir
        tile-conv.mlir
        tile-indexed.mlir
        tile-tensors.mlir
        tile.mlir
      SCF/
        for-loop-peeling.mlir
      SparseTensor/
        sparse_vector_peeled.mlir
    lib/Dialect/Test/
      TestDialect.cpp
      TestOps.td

Differential D112891

[MLIR][Affine] Replace AffineScope by its complement trait
Needs Revision · Public

Authored by bondhugula on Oct 31 2021, 12:05 PM.

Details

Reviewers

mehdi_amini
rriddle
antiagainst
aartbik
jpienaar
nicolasvasilache
ftynse
dcaballe

Summary

Replace AffineScope trait by its complement trait. Introduce
ExtendsAffineScope trait that is the complement of AffineScope and
remove the latter.

This change does not bring any additional representational power but
changes the "default" trait with respect to ops creating new affine
scopes.

Any region holding op now starts an affine scope. Ops like affine.for,
affine.if, affine.parallel further extend the affine scope either
started or extended by their enclosing op. affine.for,
affine.parallel, and affine.if now have the ExtendsAffineScope
trait.

As a result of this change:

- No additional trait is needed for a region-holding op (like the various FuncOps, scf.for, scf.if, scf.execute_region, etc.) to allow affine load/store/for/if ops in a larger set of cases; values defined in their bodies (at the top level) would be valid symbols for the affine ops.
- Ops outside the affine dialect can add the ExtendsAffineScope trait in order to have their block arguments treated as dimensional identifiers.

Clean up documentation and code comments.

This change also has the benign effect of dimensions becoming symbols in
several contexts (since arbitrary region-holding ops are able to define
symbols at their top level): hence the several mechanical updates to
test cases.

Loosely speaking, letting all region-holding ops start an affine scope
by default leads to more "symbols", and admits affine IR in a
significantly larger set of use cases.

Some additional discussion:
https://llvm.discourse.group/t/should-linalg-indexed-generic-allow-for-affine-operations-on-its-body/2889/14
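The scoping rule described in the summary can be illustrated with a toy model. The sketch below is not MLIR code: `ToyOp` and `enclosingScopeOp` are hypothetical names introduced only for illustration, and the only behavior modelled is "an op without the ExtendsAffineScope trait starts a scope; an op with it extends the scope of its parent".

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an operation. This is NOT mlir::Operation; every name
// here is hypothetical and exists only for illustration.
struct ToyOp {
  std::string name;
  bool extendsAffineScope; // models the proposed ExtendsAffineScope trait
  const ToyOp *parent;     // enclosing region-holding op, nullptr at the top
};

// Walks up from `op` and returns the nearest enclosing op that does not
// carry the trait, i.e. the op that starts the affine scope `op` lives in.
// Returns nullptr when no enclosing op starts a scope.
const ToyOp *enclosingScopeOp(const ToyOp &op) {
  for (const ToyOp *p = op.parent; p != nullptr; p = p->parent)
    if (!p->extendsAffineScope)
      return p;
  return nullptr;
}
```

For a nest func > scf.for > affine.for > affine.load, the model resolves the load to scf.for rather than func, which matches the summary's claim that values defined at the top level of an scf.for body could serve as symbols for affine ops nested inside it.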

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bondhugula created this revision. Oct 31 2021, 12:05 PM

Herald added a reviewer: rriddle. Oct 31 2021, 12:05 PM

Herald added a reviewer: antiagainst.

Herald added a reviewer: aartbik.

Herald added a reviewer: jpienaar.

Herald added subscribers: Groverkss, wenzhicui, wrengr and 22 others.

bondhugula requested review of this revision. Oct 31 2021, 12:05 PM

Herald added a reviewer: nicolasvasilache. Oct 31 2021, 12:05 PM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

bondhugula added a reviewer: ftynse. Oct 31 2021, 12:08 PM

bondhugula edited the summary of this revision.

Nice!

This LGTM, but it is worth having someone like @ftynse and/or @dcaballe to look at this as well.

mlir/docs/Dialects/Affine.md
75	Should we leave an empty line between the title and the paragraph?
83	Can this be any constant-like operation?

Extra comments.

Harbormaster completed remote builds in B131644: Diff 383671. Oct 31 2021, 12:18 PM

bondhugula added a reviewer: dcaballe. Oct 31 2021, 12:18 PM

Harbormaster completed remote builds in B131641: Diff 383667. Oct 31 2021, 12:59 PM

Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.

Can you highlight the specific parts that require changes in the Linalg tests ?
I won't start reviewing before later this week but this change smells funny.
I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.
The only relevant rule there remains that "dims compose" and "symbols concatenate".

Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?

In D112891#3099227, @nicolasvasilache wrote:

> Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.
>
> Can you highlight the specific parts that require changes in the Linalg tests ?
> I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.

I've covered this in the commit summary. More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

> Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?

This is unrelated to this patch itself and we should have this discussion on discourse.

bondhugula added inline comments. Nov 1 2021, 7:43 AM

mlir/docs/Dialects/Affine.md
83	I guess any constant-like operation that generates an `index` type. I'll update the line and the check. Thanks.

bondhugula added inline comments. Nov 1 2021, 7:44 AM

mlir/docs/Dialects/Affine.md
83	Actually, the check is already fine (it's only looking for a constant-like operation); the doc needs an update.

In D112891#3100275, @bondhugula wrote:

> In D112891#3099227, @nicolasvasilache wrote:
>
>> Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.
>>
>> Can you highlight the specific parts that require changes in the Linalg tests ?
>> I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.
>
> I've covered this in the commit summary.

You've covered that it happens but not why it happens, the why is important.

> More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

Ok, so the proper answer to my question is that affine.apply canonicalization has assumptions related to AffineScope that behave differently under this patch.
This actually happens in canonicalizePromotedSymbols.
I had to introduce this in 071ca8da918a5aed4758c4b4e27b946663adce58 as a counterpart to promoteComposedSymbolsAsDims.
The rationale at the time was to reduce the mess created by the multi-result + chains of AffineApplyOp design which had its own separate implementation that did not agree with AffineMap::compose.
As time passed, things got better and all this technical debt could be cleaned up at long last.

I think it is time to drop this AffineScope-dependent rewrite from affine.apply canonicalization and make it into a separate opt-in pattern (if you still need it, I'd be fine just dropping it altogether).
In any case, this revision should not change any Linalg test.

>> Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?
>
> This is unrelated to this patch itself and we should have this discussion on discourse.

It is related to this patch because it is a test of separation of concerns: there is a clear issue atm.
Deciding whether / when to make this move is a topic for discourse indeed.

This revision now requires changes to proceed. Nov 2 2021, 3:21 AM

In D112891#3102418, @nicolasvasilache wrote:

> In D112891#3100275, @bondhugula wrote:
>
>> In D112891#3099227, @nicolasvasilache wrote:
>>
>>> Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.
>>>
>>> Can you highlight the specific parts that require changes in the Linalg tests ?
>>> I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.
>>
>> I've covered this in the commit summary.
>
> You've covered that it happens but not why it happens, the why is important.
>
>> More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.
>
> Ok, so the proper answer to my question is that affine.apply canonicalization has assumptions related to AffineScope that behave differently under this patch.
> This actually happens in canonicalizePromotedSymbols.

Since the affine.apply has dimensional and symbolic operands, it is expected to behave differently when the surrounding affine scope changes. This patch changes the default trait and so the canonicalization will lead to a different result. I thought this was pretty clear from the commit summary and the explanation above.

> AffineScope-dependent rewrite from affine.apply canonicalization and make it into a separate opt-in pattern (if you still need it, I'd be fine just dropping it altogether).

In case you don't need an affine scope dependent affine.apply, you'll have to consider using a new "affine.apply" op or another reusable mechanism to achieve that. It's not appropriate to expect the author of this patch to do that for you -- feel free to design and do that yourself. It would also be unreasonable to expect this patch to wait until then. This revision is limited to changing the default trait.

> In any case, this revision should not change any Linalg test.

Since Linalg uses affine.apply, this would impact its tests just like it impacts other tests. I am going to only update the test cases here and limit the revision to changing the default trait.
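The operand shuffle debated in this exchange (a value moving from a dim position to a symbol position once it qualifies as a symbol) can be sketched with a toy model. Everything below is an assumption made for illustration: `ToyOperands` and `promoteDimsToSymbols` are hypothetical names, operands are plain strings, and the sketch only tracks operand lists. The rewrite the thread attributes to canonicalizePromotedSymbols also has to remap the affine map itself, which is not modelled here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy operand list of an affine.apply-like op: dim operands first, then
// symbol operands. Hypothetical type, not part of MLIR.
struct ToyOperands {
  std::vector<std::string> dims;
  std::vector<std::string> symbols;
};

// Moves every dim operand that the predicate accepts as a valid symbol to
// the end of the symbol list; the remaining dims keep their relative order.
// `isValidSymbol` stands in for whatever scope-dependent check is in force.
ToyOperands promoteDimsToSymbols(
    const ToyOperands &in,
    const std::function<bool(const std::string &)> &isValidSymbol) {
  ToyOperands out;
  out.symbols = in.symbols;
  for (const std::string &value : in.dims) {
    if (isValidSymbol(value))
      out.symbols.push_back(value);
    else
      out.dims.push_back(value);
  }
  return out;
}
```

Running the same operand list through a stricter and a more permissive predicate yields different results, which is the mechanism behind the test updates: the IR is unchanged, but a scope rule that accepts more values as symbols changes where the canonical form places them.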

nicolasvasilache added inline comments. Nov 2 2021, 3:18 PM

mlir/docs/Dialects/Affine.md
78	Is this actually always true? Imagine calling a func from GPU and passing it threadIdx.x as an argument. Is it safe to consider this a symbol ? I can imagine we could create examples where the dependence analysis would be corrupted?
mlir/docs/Traits.md
268	Nice illustration.

mehdi_amini added inline comments. Nov 2 2021, 3:29 PM

mlir/docs/Dialects/Affine.md
78	Why wouldn't it be safe to consider this a symbol? How is it different from regular CPU functions and function parameters in general?

I think this goes in the right direction and only have one concern about two versions of isValidDim being seemingly inconsistent for block arguments, plus a couple of documentation suggestions.

IMO, changes to Linalg tests are indeed mechanical and thus should not be problematic. It is worth verifying that the expected canonicalizations still apply in Linalg pipelines though (@nicolasvasilache may have some end-to-end tests). Linalg only uses affine.apply for expression simplification purposes and I would expect canonicalizations to apply equally well regardless of operands being dimensions or symbols. Skimming through the affine maps in those tests, everything looks fine. Certainly, there is the compose vs. concatenate difference that @nicolasvasilache points out, but it should not be problematic either as long as we are also running the "operand deduplication" canonicalization for affine.apply that can remove unused dimensions.

I tend to agree that the discussion on affine.apply/min/max being factored is at least tangentially relevant to this patch. They both concern canonicalization of affine.apply/min/max and it might be surprising, although not incorrect, that the canonical form of these operations may be different depending on enclosing operations. We can consider this too surprising to be always desirable and thus factor out simplification of affine.apply/min/max based on affine value categorization rules into a separate pass.

mlir/docs/Dialects/Affine.md
78	I think this description can benefit from an example of how polyhedral analyses are supposed to reason about nested affine scopes, maybe link to the example in the trait documentation. Treating any value defined at the affine scope level as a symbol looks safe to me as is, provided that we don't attempt any nested scope interaction. Regarding the GPU example with threadIdx.x as function argument, it is not a problem as long as we are not including the "virtual loop" corresponding to threadIdx.x into the analysis. We can analyze and transform code within a single "iteration" with fixed threadIdx.x. This connects to the above: we don't reason across scopes, at least not implicitly. FWIW, fixing thread ids as symbols is exactly what PPCG does in GPU code generation.
89	Nit: now that `dim` also takes the position of the dimension as a value, I suppose we should also require that dimension to be a valid symbol?
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
272	Looking at this, I realize that it is worth discussing block arguments in the documentation. So far, it only mentions region arguments, i.e. arguments of the entry block. IMO, block arguments can be treated similarly to op results, that is, they are valid symbols if the region that contains the block is attached to an operation without the `ExtendsAffineScope` trait. And valid dimensions in any case. I don't know if we want to allow operations with the `ExtendsAffineScope` trait to have more than one block; maybe we can stay conservative for now and have a verifier check on the trait.
296–305	I'm not sure how this connects to the `isValidDim(Value)` overload that says `BlockArgument`s are always valid dimensions; it looks like a contradiction unless I am missing something.

In D112891#3108641, @ftynse wrote:

> I tend to agree that the discussion on affine.apply/min/max being factored is at least tangentially relevant to this patch. They both concern canonicalization of affine.apply/min/max and it might be surprising, although not incorrect, that the canonical form of these operations may be different depending on enclosing operations. We can consider this too surprising to be always desirable and thus factor out simplification of affine.apply/min/max based on affine value categorization rules into a separate pass.

Yes, at least giving a good effort at independently factoring out the simplification rule into a separate pass / pattern that is applied with additional control seems like the constructive step forward.
There are still things we, collectively, do not fully understand and my Linalg gatekeeper bell rings when I see such changes: time to take the flashlight and see what that noise is about.

mlir/docs/Dialects/Affine.md
78	@mehdi_amini it is a bit tricky and I am not sure, which is why I ask: `threadId` is a symbol from the point of view of a single thread (i.e. it is exactly one of the values in `[0, numThreads)`), but it is not a symbol from the point of view of the process (i.e. the union of all threads, where it takes all values in `[0, numThreads)`). I have put this duality to good use in the past, but it was always clear that we were done with parallelization and dependence analysis. Here we cannot rely on such an assumption. I imagine we could construct examples where dependence analysis could be messed up by this duality?

ftynse added inline comments. Nov 4 2021, 8:49 AM

mlir/docs/Dialects/Affine.md
78	Only if your dependence analysis reasons across different scopes, which it should not under this patch IIUC. It sees the nested scope as an essentially opaque op and should not attempt reasoning about its regions, just assume it can access anything. (Later, we should be able to write a smarter analysis that overapproximates the internals.) That's why I asked for a clarification in the doc as to what happens in the case of nested scopes. If it only reasons inside one scope, you should never have thread id as alternatively symbol or dimension.

mehdi_amini added inline comments. Nov 4 2021, 5:49 PM

mlir/docs/Dialects/Affine.md
78	Ah, I see what you were asking about originally now, Nicolas; thanks Alex for presenting the "cross scope" problem. In the proposed model I would see the iteration space offered by the GPU grid of threads as a non-affine scope here: the affine scope is always the view of a single thread when you get to this level. Note that this is also a problem with a simple affine loop nest with a function call: the function boundary acts as a blocker for the affine scope definition. On the other hand, the `gpu.launch` with the region form can likely express the entire loop nest I think?

Seeing whether we can revive this in light of recent interest in: https://discourse.llvm.org/t/affineif-inside-a-linalg-operations/64544

> More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

We have seen this type of intrusive simplification in multiple other places through the codebase. Each and every time, the solution has been to not make this a blanket canonicalization but opt-in via more control.
It would make sense that an affine-canonicalize pass would make these foldings.

Another potential direction is to apply the intrusive canonicalization only to the uses that are immediately within an ExtendsAffineScope (I haven't analyzed the full implication of this yet, so maybe less relevant).

mlir/docs/Traits.md
278	It would be nice to discuss the relationship between affine scope and analyses / optimizations. In particular, what if op11 and op12 contain unknown side-effecting ops that one cannot summarize: what is the granularity at which e.g. dependence analysis occurs? My intuition is that only AS1 is analyzable and transformable. In particular, if `op_with_extends_affine_scope_trait` is actually `affine.parallel`, the IR may be invalid, UB or racy by construction depending on how you want to define this.

Herald added a project: Restricted Project. Aug 22 2022, 1:34 AM

Herald added subscribers: anlunx, bzcheeseman, sdasgup3.

mehdi_amini added inline comments. Aug 22 2022, 2:46 AM

mlir/docs/Traits.md
278	With an `affine.parallel`, could we just consider that the "unknown side-effects" in op11/op12 are guaranteed to not conflict or there is UB because of the race condition?

nicolasvasilache added inline comments. Aug 23 2022, 3:17 AM

mlir/docs/Traits.md
278	This may be a bridge too far though .. in my mind `racy by construction` can very well conflict but is not `UB`. Lock-free like algorithms where multiple threads compute and commit the same results or RMW-like atomics are both important things to be able to represent.

But atomic updates aren’t races :)
That said it may be tricky for affine analysis consistency then?

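Revision Contents

Path	Size
mlir/docs/Dialects/Affine.md	59 lines
mlir/docs/Traits.md	55 lines
mlir/include/mlir/Dialect/Affine/IR/AffineOps.h	28 lines
mlir/include/mlir/Dialect/Affine/IR/AffineOps.td	23 lines
mlir/include/mlir/Dialect/Shape/IR/ShapeOps.td	4 lines
mlir/include/mlir/IR/BuiltinOps.td	7 lines
mlir/include/mlir/IR/OpBase.td	4 lines
mlir/include/mlir/IR/OpDefinition.h	16 lines
mlir/lib/Analysis/AffineAnalysis.cpp	6 lines
mlir/lib/Dialect/Affine/IR/AffineOps.cpp	118 lines
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp	2 lines
mlir/test/Conversion/AffineToStandard/lower-affine.mlir	2 lines
mlir/test/Dialect/Affine/canonicalize.mlir	19 lines
mlir/test/Dialect/Affine/invalid.mlir	41 lines
mlir/test/Dialect/Affine/ops.mlir	63 lines
mlir/test/Dialect/Linalg/comprehensive-module-bufferize.mlir	2 lines
mlir/test/Dialect/Linalg/fusion-indexed.mlir	17 lines
mlir/test/Dialect/Linalg/fusion-pattern.mlir	73 lines
mlir/test/Dialect/Linalg/fusion-sequence.mlir	21 lines
mlir/test/Dialect/Linalg/fusion-tensor-pattern.mlir	16 lines
mlir/test/Dialect/Linalg/fusion.mlir	12 lines
mlir/test/Dialect/Linalg/hoist-padding.mlir	38 lines
mlir/test/Dialect/Linalg/loops.mlir	28 lines
mlir/test/Dialect/Linalg/pad-and-hoist.mlir	16 lines
mlir/test/Dialect/Linalg/reshape_fusion.mlir	20 lines
mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir	59 lines
mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir	58 lines
mlir/test/Dialect/Linalg/tile-conv.mlir	16 lines
mlir/test/Dialect/Linalg/tile-indexed.mlir	22 lines
mlir/test/Dialect/Linalg/tile-tensors.mlir	12 lines
mlir/test/Dialect/Linalg/tile.mlir	60 lines
mlir/test/Dialect/SCF/for-loop-peeling.mlir	40 lines
mlir/test/Dialect/SparseTensor/sparse_vector_peeled.mlir	4 lines
mlir/test/lib/Dialect/Test/TestDialect.cpp	16 lines
mlir/test/lib/Dialect/Test/TestOps.td	9 lines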

Diff 383667

mlir/docs/Dialects/Affine.md

[54 lines not shown]
 #affine_map2to3 = affine_map<(d0, d1)[s0] -> (d0, d1 + s0, d1 - s0)>
 // Binds %N to the s0 symbol in affine_map2to3.
 %x = memref.alloc()[%N] : memref<40x50xf32, #affine_map2to3>
 ```

 ### Restrictions on Dimensions and Symbols

 The affine dialect imposes certain restrictions on dimension and symbolic
-identifiers to enable powerful analysis and transformation. An SSA value's use
-can be bound to a symbolic identifier if that SSA value is either 1. a region
-argument for an op with trait `AffineScope` (eg. `FuncOp`), 2. a value defined
-at the top level of an `AffineScope` op (i.e., immediately enclosed by the
-latter), 3. a value that dominates the `AffineScope` op enclosing the value's
-use, 4. the result of a
-[`constant` operation](Standard.md/#stdconstant-constantop), 5. the result of an
-[`affine.apply` operation](#affineapply-affineapplyop) that recursively takes as
-arguments any valid symbolic identifiers, or 6. the result of a
-[`dim` operation](MemRef.md/#memrefdim-mlirmemrefdimop) on either a memref that
-is an argument to a `AffineScope` op or a memref where the corresponding
-dimension is either static or a dynamic one in turn bound to a valid symbol.
-Note: if the use of an SSA value is not contained in any op with the
-`AffineScope` trait, only the rules 4-6 can be applied.
-
-Note that as a result of rule (3) above, symbol validity is sensitive to the
-location of the SSA use. Dimensions may be bound not only to anything that a
-symbol is bound to, but also to induction variables of enclosing
-[`affine.for`](#affinefor-affineforop) and
+identifiers to enable analysis and transformation. Any region-holding operation
+that does not have the [ExtendsAffineScope](../Traits.md#ExtendsAffineScope)
+trait starts an affine scope. An affine scope is a unit of IR (operations)
+used for affine/polyhedral optimization purposes. An affine scope is started by
+any region-holding operation without the `ExtendsAffineScope` trait. For more
+details, see [ExtendsAffineScope](../Traits.md#ExtendsAffineScope).
+
+The `affine.for`, `affine.parallel`, and
+`affine.if` ops extend an affine scope started or in turn extended by their
+parent operation.
+
+#### Symbolic SSA values
+An SSA value's use can be bound to a symbolic identifier if that SSA value is
+either:
+
+1. a region argument for an op without the
+[ExtendsAffineScope](../Traits.md#ExtendsAffineScope) trait (eg. `FuncOp`),
+2. a value defined at the top level of an op that starts an affine scope (i.e.,
+whose definition is immediately enclosed by such an op),
+3. a value that dominates the affine scope that the value's use is part of,
+4. the result of a [`constant` operation](Standard.md/#stdconstant-constantop),
+5. the result of an [`affine.apply` operation](#affineapply-affineapplyop) that
+recursively takes as arguments any valid symbolic identifiers, or
+6. the result of a [dim operation](MemRef.md/#memrefdim-mlirmemrefdimop) on
+either a memref that is a region argument to an op starting an affine scope
+or a memref where the corresponding dimension is either static or a dynamic
+one in turn bound to a valid symbol.
+
+Note: if the use of an SSA value is not contained in any op that starts a new
+affine scope, only the rules 4-6 can be applied.
+
+Note that as a result of rule (3) above, the validity to use a value as a
+symbols is sensitive to the location of the SSA use, or more precisely, the
+affine scope such a use is part of.
+
+#### Dimensional SSA values
+Dimensions may be bound not only to anything that a symbol is bound to, but
+also to region arguments of any operation with the
+[ExtendsAffineScope](../Traits.md#ExtendsAffineScope]) trait. This includes
+induction variables of enclosing [`affine.for`](#affinefor-affineforop) and
 [`affine.parallel`](#affineparallel-affineparallelop) operations, and the result
 of an [`affine.apply` operation](#affineapply-affineapplyop) (which recursively
 may use other dimensions and symbols).

 ### Affine Expressions

 Syntax:

[350 lines not shown]

mlir/docs/Traits.md

[182 lines not shown]
 MLIR provides a suite of traits that provide various functionalities that are
 common across many different operations. Below is a list of some key traits that
 may be used directly by any dialect. The format of the header for each trait
 section goes as follows:

 * `Header`
 - (`C++ class` -- `ODS class`(if applicable))

-### AffineScope
-
-* `OpTrait::AffineScope` -- `AffineScope`
-
-This trait is carried by region holding operations that define a new scope for
-the purposes of polyhedral optimization and the affine dialect in particular.
-Any SSA values of 'index' type that either dominate such operations, or are
-defined at the top-level of such operations, or appear as region arguments for
-such operations automatically become valid symbols for the polyhedral scope
-defined by that operation. As a result, such SSA values could be used as the
-operands or index operands of various affine dialect operations like affine.for,
-affine.load, and affine.store. The polyhedral scope defined by an operation with
-this trait includes all operations in its region excluding operations that are
-nested inside of other operations that themselves have this trait.
-
 ### AutomaticAllocationScope

 * `OpTrait::AutomaticAllocationScope` -- `AutomaticAllocationScope`

 This trait is carried by region holding operations that define a new scope for
 automatic allocation. Such allocations are automatically freed when control is
 transferred back from the regions of such operations. As an example, allocations
 performed by
[34 lines not shown]
 establishes a set of properties that allow reasoning about / converting between
 scalar/vector/tensor code. These same properties allow blanket implementations
 of various analyses/transformations for all `ElementwiseMappable` ops.

 Note: Not all ops that are "elementwise" in some abstract sense satisfy this
 trait. In particular, broadcasting behavior is not allowed. See the comments on
	`OpTrait::ElementwiseMappable` for the precise requirements.			`OpTrait::ElementwiseMappable` for the precise requirements.

				### ExtendsAffineScope

				* `OpTrait::ExtendsAffineScope` -- `ExtendsAffineScope`

				This trait is carried by region-holding operations that further extend an
				`affine scope`. An affine scope is a unit of IR (operations) used for
				affine/polyhedral optimization purposes. An affine scope is started by any
				region-holding operation without the `ExtendsAffineScope` trait. An operation
				with this trait further extends such an affine scope started or extended by its
				parent operation to its own blocks. An affine scope does not extend further
				when a region-holding operation without the `ExtendsAffineScope` trait is
				encountered. Any region arguments of an operation with the `ExtendsAffineScope`
				trait are valid [dimensional
				identifiers](Dialects/Affine.md#restrictions-on-dimensions-and-symbols) for that
				affine scope.

				The affine scope defined by a region-holding operation without this trait
				includes all operations in its region excluding any operations that
				are nested inside of other region-holding operations that themselves do not have
				this trait. An example is shown below: `AS0` and `AS1` are affine scopes.

				```
				region_op() { // Starts a new affine scope `AS0`.
				  op_with_extends_affine_scope_trait(%a, %b) { // AS0
				    op1 // AS0
				    op2 // AS0
				    op3() { // AS0
				nicolasvasilache: Nice illustration.
				      op11 // New scope AS1 started by op3. // AS1
				      op12 // AS1
				    }
				    affine.for %i = 0 to %N { // AS0
				      affine.for %i = 0 to %N { // AS0
				        op22 // AS0
				      }
				    }
				  }
				}
				nicolasvasilache: It would be nice to discuss the relationship between affine scope and analyses / optimizations. In particular, what if op11 and op12 contain unknown side-effecting ops that one cannot summarize: what is the granularity at which, e.g., dependence analysis occurs? My intuition is that only AS1 is analyzable and transformable. In particular, if `op_with_extends_affine_scope_trait` is actually `affine.parallel`, the IR may be invalid, UB, or racy by construction, depending on how you want to define this.
				mehdi_amini: With an `affine.parallel`, could we just consider that the "unknown side-effects" in op11/op12 are guaranteed not to conflict, or else there is UB because of the race condition?
				nicolasvasilache: This may be a bridge too far though... in my mind `racy by construction` can very well conflict but is not `UB`. Lock-free-like algorithms where multiple threads compute and commit the same results, or RMW-like atomics, are both important things to be able to represent.
				```
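The scope rule described for `ExtendsAffineScope` can be modeled outside of MLIR as a walk up the parent chain that stops at the first op without the trait. The sketch below is only an illustration: `ToyOp`, `getAffineScopeStart`, and `checkExampleNesting` are hypothetical names invented here and are not part of MLIR or of this patch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a region-holding operation; not mlir::Operation.
struct ToyOp {
  std::string name;
  const ToyOp *parent;     // Enclosing op, or nullptr at the top level.
  bool extendsAffineScope; // Models the `ExtendsAffineScope` trait.

  ToyOp(std::string name, const ToyOp *parent = nullptr,
        bool extendsAffineScope = false)
      : name(std::move(name)), parent(parent),
        extendsAffineScope(extendsAffineScope) {}
};

// Returns the op that starts the affine scope containing `op`: the closest
// enclosing op that does not carry the trait. Returns nullptr when `op` has
// no enclosing op.
const ToyOp *getAffineScopeStart(const ToyOp &op) {
  const ToyOp *curr = op.parent;
  while (curr && curr->extendsAffineScope)
    curr = curr->parent;
  return curr;
}

// Rebuilds the nesting of the AS0/AS1 illustration and checks a few ops.
void checkExampleNesting() {
  ToyOp regionOp("region_op");
  ToyOp extOp("op_with_extends_affine_scope_trait", &regionOp, true);
  ToyOp op1("op1", &extOp);
  ToyOp op3("op3", &extOp); // No trait: starts AS1 for its own body.
  ToyOp op11("op11", &op3);
  ToyOp outerFor("affine.for", &extOp, true);
  ToyOp innerFor("affine.for", &outerFor, true);
  ToyOp op22("op22", &innerFor);

  assert(getAffineScopeStart(op1) == &regionOp);  // AS0
  assert(getAffineScopeStart(op3) == &regionOp);  // op3 itself is in AS0
  assert(getAffineScopeStart(op11) == &op3);      // AS1
  assert(getAffineScopeStart(op22) == &regionOp); // AS0
}
```

In this model `op22` resolves to `region_op` because both enclosing `affine.for` ops and the op with the trait extend the scope, while `op11` resolves to `op3`, which has no trait and therefore starts `AS1`.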

	### Function-Like			### Function-Like

	* `OpTrait::FunctionLike`			* `OpTrait::FunctionLike`

	This trait provides APIs for operations that behave like functions. In			This trait provides APIs for operations that behave like functions. In
	particular:			particular:

	- Ops must be symbols, i.e. also have the `Symbol` trait;			- Ops must be symbols, i.e. also have the `Symbol` trait;
	▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/Affine/IR/AffineOps.h

Show All 18 Lines
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/Interfaces/LoopLikeInterface.h"		#include "mlir/Interfaces/LoopLikeInterface.h"

namespace mlir {		namespace mlir {
class AffineApplyOp;		class AffineApplyOp;
class AffineBound;		class AffineBound;
class AffineValueMap;		class AffineValueMap;

/// A utility function to check if a value is defined at the top level of an		/// Checks if a value is a top level value of an op that starts an affine scope.
/// op with trait `AffineScope` or is a region argument for such an op. A value		/// If the value is defined in an unlinked region, it is conservatively assumed
/// of index type defined at the top level is always a valid symbol for all its		/// not to be top-level. A value of index type defined at the top level is
/// uses.		/// always a valid symbol for affine purposes.
bool isTopLevelValue(Value value);		bool isTopLevelValue(Value value);

/// AffineDmaStartOp starts a non-blocking DMA operation that transfers data		/// AffineDmaStartOp starts a non-blocking DMA operation that transfers data
/// from a source memref to a destination memref. The source and destination		/// from a source memref to a destination memref. The source and destination
/// memref need not be of the same dimensionality, but need to have the same		/// memref need not be of the same dimensionality, but need to have the same
/// elemental type. The operands include the source and destination memref's		/// elemental type. The operands include the source and destination memref's
/// each followed by its indices, size of the data transfer in terms of the		/// each followed by its indices, size of the data transfer in terms of the
/// number of elements (of the elemental type of the memref), a tag memref with		/// number of elements (of the elemental type of the memref), a tag memref with
▲ Show 20 Lines • Show All 274 Lines • ▼ Show 20 Lines	public:
static StringRef getTagMapAttrName() { return "tag_map"; }		static StringRef getTagMapAttrName() { return "tag_map"; }
static ParseResult parse(OpAsmParser &parser, OperationState &result);		static ParseResult parse(OpAsmParser &parser, OperationState &result);
void print(OpAsmPrinter &p);		void print(OpAsmPrinter &p);
LogicalResult verify();		LogicalResult verify();
LogicalResult fold(ArrayRef<Attribute> cstOperands,		LogicalResult fold(ArrayRef<Attribute> cstOperands,
SmallVectorImpl<OpFoldResult> &results);		SmallVectorImpl<OpFoldResult> &results);
};		};

/// Returns true if the given Value can be used as a dimension id in the region		/// Returns true if the given value can be used as a dimension id at all its
/// of the closest surrounding op that has the trait `AffineScope`.		/// use sites. This is true iff it meets one of the following conditions:
		/// *) It is valid as a symbol.
		/// *) It is a region argument of an op with the `ExtendsAffineScope` trait
		///    (e.g., an induction variable of an affine.for or affine.parallel).
		/// *) It is the result of an affine.apply operation with dimension id
		///    arguments.
bool isValidDim(Value value);		bool isValidDim(Value value);

/// Returns true if the given Value can be used as a dimension id in `region`,		/// Returns true if the `value` can be used as a dimension id in the affine
/// i.e., for all its uses in `region`.		/// scope that begins at `region`.
bool isValidDim(Value value, Region *region);		bool isValidDim(Value value, Region *region);

/// Returns true if the given value can be used as a symbol in the region of the		/// Returns true if `value` can be used as a symbol at all its use sites.
/// closest surrounding op that has the trait `AffineScope`.
bool isValidSymbol(Value value);		bool isValidSymbol(Value value);

/// Returns true if the given Value can be used as a symbol for `region`, i.e.,		/// Returns true if the given Value can be used as a symbol in the affine
/// for all its uses in `region`.		/// scope that begins at `region`.
bool isValidSymbol(Value value, Region *region);		bool isValidSymbol(Value value, Region *region);

/// Parses dimension and symbol list. `numDims` is set to the number of		/// Parses dimension and symbol list. `numDims` is set to the number of
/// dimensions in the list parsed.		/// dimensions in the list parsed.
ParseResult parseDimAndSymbolList(OpAsmParser &parser,		ParseResult parseDimAndSymbolList(OpAsmParser &parser,
SmallVectorImpl<Value> &operands,		SmallVectorImpl<Value> &operands,
unsigned &numDims);		unsigned &numDims);

▲ Show 20 Lines • Show All 118 Lines • Show Last 20 Lines
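The three conditions listed in the new `isValidDim` comment can be read as a small recursive predicate. The following is a simplified model under stated assumptions, not the MLIR implementation: `ToyKind`, `ToyValue`, `isValidSymbolToy`, and `isValidDimToy` are hypothetical names, and symbol validity is reduced to the "top-level value" case only, whereas the real rules may admit more cases.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of an index-typed value; not an MLIR type.
enum class ToyKind { RegionArgument, AffineApplyResult, OtherOpResult };

struct ToyValue {
  ToyKind kind;
  // True if the op whose region directly holds this value (as a region
  // argument or as a result defined in that region) carries the
  // `ExtendsAffineScope` trait. False means the value is top-level.
  bool enclosingOpExtendsAffineScope;
  // Operands, only meaningful for AffineApplyResult.
  std::vector<const ToyValue *> operands;
};

// Simplifying assumption: only top-level values are treated as symbols.
bool isValidSymbolToy(const ToyValue &v) {
  return !v.enclosingOpExtendsAffineScope;
}

bool isValidDimToy(const ToyValue &v) {
  // 1) Anything valid as a symbol is valid as a dimension.
  if (isValidSymbolToy(v))
    return true;
  // 2) Region arguments of ops with the trait, such as induction variables.
  if (v.kind == ToyKind::RegionArgument)
    return true;
  // 3) Results of affine.apply whose operands are all valid dimensions.
  if (v.kind == ToyKind::AffineApplyResult) {
    for (const ToyValue *operand : v.operands)
      if (!isValidDimToy(*operand))
        return false;
    return true;
  }
  return false;
}
```

Under this model an induction variable of `affine.for` is a dimension but not a symbol, while a region argument of an op without the trait (for example an `scf.for` induction variable) is top-level for the scope that op starts and so counts as a symbol.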

mlir/include/mlir/Dialect/Affine/IR/AffineOps.td

Show First 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	def AffineApplyOp : Affine_Op<"apply", [NoSideEffect]> {

let extraClassDeclaration = [{		let extraClassDeclaration = [{
/// Returns the affine map to be applied by this operation.		/// Returns the affine map to be applied by this operation.
AffineMap getAffineMap() { return map(); }		AffineMap getAffineMap() { return map(); }

/// Returns the affine value map computed from this operation.		/// Returns the affine value map computed from this operation.
AffineValueMap getAffineValueMap();		AffineValueMap getAffineValueMap();

/// Returns true if the result of this operation can be used as dimension id		/// Returns true if the result of this operation can be used as a dimension id
/// in the region of the closest surrounding op with trait AffineScope.		/// at any of its use sites.
bool isValidDim();		bool isValidDim();

/// Returns true if the result of this operation can be used as dimension id		/// Returns true if the result of this operation can be used as dimension id
/// within 'region', i.e., for all its uses with `region`.		/// within the affine scope starting with 'region', i.e., for all its uses
		/// in that affine scope.
bool isValidDim(Region *region);		bool isValidDim(Region *region);

/// Returns true if the result of this operation is a symbol in the region		/// Returns true if the result of this operation is a valid symbol for all
/// of the closest surrounding op that has the trait AffineScope.		/// of its uses.
bool isValidSymbol();		bool isValidSymbol();

/// Returns true if the result of this operation is a symbol for all its		/// Returns true if the result of this operation is a valid symbol for all its
/// uses in `region`.		/// uses in the affine scope starting at `region`.
bool isValidSymbol(Region *region);		bool isValidSymbol(Region *region);

operand_range getMapOperands() { return getOperands(); }		operand_range getMapOperands() { return getOperands(); }
}];		}];

let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasFolder = 1;		let hasFolder = 1;
}		}

def AffineForOp : Affine_Op<"for",		def AffineForOp : Affine_Op<"for",
[ImplicitAffineTerminator, RecursiveSideEffects,		[ExtendsAffineScope, ImplicitAffineTerminator, RecursiveSideEffects,
DeclareOpInterfaceMethods<LoopLikeOpInterface>]> {		DeclareOpInterfaceMethods<LoopLikeOpInterface>]> {
let summary = "for operation";		let summary = "for operation";
let description = [{		let description = [{
Syntax:		Syntax:

```		```
operation ::= `affine.for` ssa-id `=` lower-bound `to` upper-bound		operation ::= `affine.for` ssa-id `=` lower-bound `to` upper-bound
(`step` integer-literal)? `{` op* `}`		(`step` integer-literal)? `{` op* `}`
▲ Show 20 Lines • Show All 221 Lines • ▼ Show 20 Lines	let extraClassDeclaration = [{
bool matchingBoundOperandList();		bool matchingBoundOperandList();
}];		}];

let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasFolder = 1;		let hasFolder = 1;
}		}

def AffineIfOp : Affine_Op<"if",		def AffineIfOp : Affine_Op<"if",
[ImplicitAffineTerminator, RecursiveSideEffects,		[ExtendsAffineScope, ImplicitAffineTerminator,
NoRegionArguments]> {		RecursiveSideEffects, NoRegionArguments]> {
let summary = "if-then-else operation";		let summary = "if-then-else operation";
let description = [{		let description = [{
Syntax:		Syntax:

```		```
operation ::= `affine.if` if-op-cond `{` op* `}` (`else` `{` op* `}`)?		operation ::= `affine.if` if-op-cond `{` op* `}` (`else` `{` op* `}`)?
if-op-cond ::= integer-set-attr dim-and-symbol-use-list		if-op-cond ::= integer-set-attr dim-and-symbol-use-list
```		```
▲ Show 20 Lines • Show All 240 Lines • ▼ Show 20 Lines	let description = [{

```mlir		```mlir
%0 = affine.max (d0) -> (1000, d0 + 512) (%i0) : index		%0 = affine.max (d0) -> (1000, d0 + 512) (%i0) : index
```		```
}];		}];
}		}

def AffineParallelOp : Affine_Op<"parallel",		def AffineParallelOp : Affine_Op<"parallel",
[ImplicitAffineTerminator, RecursiveSideEffects,		[ExtendsAffineScope, ImplicitAffineTerminator, RecursiveSideEffects,
DeclareOpInterfaceMethods<LoopLikeOpInterface>, MemRefsNormalizable]> {		DeclareOpInterfaceMethods<LoopLikeOpInterface>, MemRefsNormalizable]> {
let summary = "multi-index parallel band operation";		let summary = "multi-index parallel band operation";
let description = [{		let description = [{
The "affine.parallel" operation represents a hyper-rectangular affine		The "affine.parallel" operation represents a hyper-rectangular affine
parallel band, defining zero or more SSA values for its induction variables.		parallel band, defining zero or more SSA values for its induction variables.
It has one region capturing the parallel band body. The induction variables		It has one region capturing the parallel band body. The induction variables
are represented as arguments of this region. These SSA values always have		are represented as arguments of this region. These SSA values always have
type index, which is the size of the machine word. The strides, represented		type index, which is the size of the machine word. The strides, represented
▲ Show 20 Lines • Show All 431 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/Shape/IR/ShapeOps.td

Show First 20 Lines • Show All 1,019 Lines • ▼ Show 20 Lines	def Shape_CstrRequireOp : Shape_Op<"cstr_require", []> {
let hasFolder = 1;		let hasFolder = 1;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Shape collection ops.		// Shape collection ops.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def Shape_FunctionLibraryOp : Shape_Op<"function_library",		def Shape_FunctionLibraryOp : Shape_Op<"function_library",
[AffineScope, IsolatedFromAbove, NoRegionArguments, SymbolTable, Symbol,		[IsolatedFromAbove, NoRegionArguments, SymbolTable, Symbol, NoTerminator,
NoTerminator, SingleBlock]> {		SingleBlock]> {
let summary = "Represents shape functions and corresponding ops";		let summary = "Represents shape functions and corresponding ops";
let description = [{		let description = [{
Represents a list of shape functions and the ops whose shape transfer		Represents a list of shape functions and the ops whose shape transfer
functions they represent.		functions they represent.

Example:		Example:

```mlir		```mlir
Show All 29 Lines

mlir/include/mlir/IR/BuiltinOps.td

Show All 26 Lines
class Builtin_Op<string mnemonic, list<OpTrait> traits = []> :		class Builtin_Op<string mnemonic, list<OpTrait> traits = []> :
Op<Builtin_Dialect, mnemonic, traits>;		Op<Builtin_Dialect, mnemonic, traits>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// FuncOp		// FuncOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def FuncOp : Builtin_Op<"func", [		def FuncOp : Builtin_Op<"func", [
AffineScope, AutomaticAllocationScope, CallableOpInterface, FunctionLike,		AutomaticAllocationScope, CallableOpInterface, FunctionLike, IsolatedFromAbove,
IsolatedFromAbove, Symbol		Symbol
]> {		]> {
let summary = "An operation with a name containing a single `SSACFG` region";		let summary = "An operation with a name containing a single `SSACFG` region";
let description = [{		let description = [{
Operations within the function cannot implicitly capture values defined		Operations within the function cannot implicitly capture values defined
outside of the function, i.e. Functions are `IsolatedFromAbove`. All		outside of the function, i.e. Functions are `IsolatedFromAbove`. All
external references must use function arguments or attributes that establish		external references must use function arguments or attributes that establish
a symbolic connection (e.g. symbols referenced by name via a string		a symbolic connection (e.g. symbols referenced by name via a string
attribute like SymbolRefAttr). An external function declaration (used when		attribute like SymbolRefAttr). An external function declaration (used when
▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	]> {
let verifier = [{ return ::verify(*this); }];		let verifier = [{ return ::verify(*this); }];
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ModuleOp		// ModuleOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def ModuleOp : Builtin_Op<"module", [		def ModuleOp : Builtin_Op<"module", [
AffineScope, IsolatedFromAbove, NoRegionArguments, SymbolTable, Symbol,		IsolatedFromAbove, NoRegionArguments, SymbolTable, Symbol, OpAsmOpInterface
OpAsmOpInterface
] # GraphRegionNoTerminator.traits> {		] # GraphRegionNoTerminator.traits> {
let summary = "A top level container operation";		let summary = "A top level container operation";
let description = [{		let description = [{
A `module` represents a top-level container operation. It contains a single		A `module` represents a top-level container operation. It contains a single
[graph region](../LangRef.md#control-flow-and-ssacfg-regions) containing a single block		[graph region](../LangRef.md#control-flow-and-ssacfg-regions) containing a single block
which can contain any operations and does not have a terminator. Operations		which can contain any operations and does not have a terminator. Operations
within this region cannot implicitly capture values defined outside the module,		within this region cannot implicitly capture values defined outside the module,
i.e. Modules are [IsolatedFromAbove](../Traits.md#isolatedfromabove). Modules have		i.e. Modules are [IsolatedFromAbove](../Traits.md#isolatedfromabove). Modules have
▲ Show 20 Lines • Show All 103 Lines • Show Last 20 Lines

mlir/include/mlir/IR/OpBase.td

	Show First 20 Lines • Show All 1,910 Lines • ▼ Show 20 Lines

	// These classes are used to define operation specific traits.			// These classes are used to define operation specific traits.
	class NativeOpTrait<string name> : NativeTrait<name, "Op">, OpTrait;			class NativeOpTrait<string name> : NativeTrait<name, "Op">, OpTrait;
	class ParamNativeOpTrait<string prop, string params>			class ParamNativeOpTrait<string prop, string params>
	: ParamNativeTrait<prop, params, "Op">, OpTrait;			: ParamNativeTrait<prop, params, "Op">, OpTrait;
	class GenInternalOpTrait<string prop> : GenInternalTrait<prop, "Op">, OpTrait;			class GenInternalOpTrait<string prop> : GenInternalTrait<prop, "Op">, OpTrait;
	class PredOpTrait<string descr, Pred pred> : PredTrait<descr, pred>, OpTrait;			class PredOpTrait<string descr, Pred pred> : PredTrait<descr, pred>, OpTrait;

	// Op defines an affine scope.			// Op extends an affine scope. See `Traits.md#ExtendsAffineScope`.
	def AffineScope : NativeOpTrait<"AffineScope">;			def ExtendsAffineScope : NativeOpTrait<"ExtendsAffineScope">;
	// Op defines an automatic allocation scope.			// Op defines an automatic allocation scope.
	def AutomaticAllocationScope : NativeOpTrait<"AutomaticAllocationScope">;			def AutomaticAllocationScope : NativeOpTrait<"AutomaticAllocationScope">;
	// Op supports operand broadcast behavior.			// Op supports operand broadcast behavior.
	def ResultsBroadcastableShape :			def ResultsBroadcastableShape :
	NativeOpTrait<"ResultsBroadcastableShape">;			NativeOpTrait<"ResultsBroadcastableShape">;
	// X op Y == Y op X			// X op Y == Y op X
	def Commutative : NativeOpTrait<"IsCommutative">;			def Commutative : NativeOpTrait<"IsCommutative">;
	// op op X == op X			// op op X == op X
	▲ Show 20 Lines • Show All 1,091 Lines • Show Last 20 Lines

mlir/include/mlir/IR/OpDefinition.h

	Show First 20 Lines • Show All 1,167 Lines • ▼ Show 20 Lines
	class IsIsolatedFromAbove			class IsIsolatedFromAbove
	: public TraitBase<ConcreteType, IsIsolatedFromAbove> {			: public TraitBase<ConcreteType, IsIsolatedFromAbove> {
	public:			public:
	static LogicalResult verifyTrait(Operation *op) {			static LogicalResult verifyTrait(Operation *op) {
	return impl::verifyIsIsolatedFromAbove(op);			return impl::verifyIsIsolatedFromAbove(op);
	}			}
	};			};

	/// A trait of region holding operations that defines a new scope for polyhedral			/// A trait of region-holding operations that further extends an `affine scope`.
	/// optimization purposes. Any SSA values of 'index' type that either dominate			/// An affine scope is used for affine/polyhedral optimization purposes and is
	/// such an operation or are used at the top-level of such an operation			/// started by any region-holding operation without the `ExtendsAffineScope`
	/// automatically become valid symbols for the polyhedral scope defined by that			/// trait. An operation with the `ExtendsAffineScope` trait further extends the
	/// operation. For more details, see `Traits.md#AffineScope`.			/// affine scope started or extended by its parent operation to its own blocks.
				/// The affine scope doesn't extend further when a region-holding operation
				/// without this trait is encountered. Any region arguments of an operation with
				/// the `ExtendsAffineScope` trait are valid `dimensional` identifiers for the
				/// affine scope. For more details, see `Traits.md#ExtendsAffineScope`.
	template <typename ConcreteType>			template <typename ConcreteType>
	class AffineScope : public TraitBase<ConcreteType, AffineScope> {			class ExtendsAffineScope : public TraitBase<ConcreteType, ExtendsAffineScope> {
	public:			public:
	static LogicalResult verifyTrait(Operation *op) {			static LogicalResult verifyTrait(Operation *op) {
	static_assert(!ConcreteType::template hasTrait<ZeroRegion>(),			static_assert(!ConcreteType::template hasTrait<ZeroRegion>(),
	"expected operation to have one or more regions");			"expected operation to have one or more regions");
	return success();			return success();
	}			}
	};			};

	▲ Show 20 Lines • Show All 719 Lines • Show Last 20 Lines

mlir/lib/Analysis/AffineAnalysis.cpp

	Show All 26 Lines
	#include "llvm/ADT/TypeSwitch.h"			#include "llvm/ADT/TypeSwitch.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	#define DEBUG_TYPE "affine-analysis"			#define DEBUG_TYPE "affine-analysis"

	using namespace mlir;			using namespace mlir;

	using llvm::dbgs;

	/// Get the value that is being reduced by `pos`-th reduction in the loop if			/// Get the value that is being reduced by `pos`-th reduction in the loop if
	/// such a reduction can be performed by affine parallel loops. This assumes			/// such a reduction can be performed by affine parallel loops. This assumes
	/// floating-point operations are commutative. On success, `kind` will be the			/// floating-point operations are commutative. On success, `kind` will be the
	/// reduction kind suitable for use in affine parallel loop builder. If the			/// reduction kind suitable for use in affine parallel loop builder. If the
	/// reduction is not supported, returns null.			/// reduction is not supported, returns null.
	static Value getSupportedReduction(AffineForOp forOp, unsigned pos,			static Value getSupportedReduction(AffineForOp forOp, unsigned pos,
	AtomicRMWKind &kind) {			AtomicRMWKind &kind) {
	SmallVector<Operation *> combinerOps;			SmallVector<Operation *> combinerOps;
	▲ Show 20 Lines • Show All 227 Lines • ▼ Show 20 Lines
	}			}

	/// Returns Block common to 'srcAccess.opInst' and 'dstAccess.opInst'.			/// Returns Block common to 'srcAccess.opInst' and 'dstAccess.opInst'.
	static Block *getCommonBlock(const MemRefAccess &srcAccess,			static Block *getCommonBlock(const MemRefAccess &srcAccess,
	const MemRefAccess &dstAccess,			const MemRefAccess &dstAccess,
	const FlatAffineValueConstraints &srcDomain,			const FlatAffineValueConstraints &srcDomain,
	unsigned numCommonLoops) {			unsigned numCommonLoops) {
	// Get the chain of ancestor blocks to the given `MemRefAccess` instance. The			// Get the chain of ancestor blocks to the given `MemRefAccess` instance. The
	// search terminates when either an op with the `AffineScope` trait or			// search terminates when either an op that starts an affine scope or
	// `endBlock` is reached.			// `endBlock` is reached.
	auto getChainOfAncestorBlocks = [&](const MemRefAccess &access,			auto getChainOfAncestorBlocks = [&](const MemRefAccess &access,
	SmallVector<Block *, 4> &ancestorBlocks,			SmallVector<Block *, 4> &ancestorBlocks,
	Block *endBlock = nullptr) {			Block *endBlock = nullptr) {
	Block *currBlock = access.opInst->getBlock();			Block *currBlock = access.opInst->getBlock();
	// Loop terminates when the currBlock is nullptr or equals to the endBlock,			// Loop terminates when the currBlock is nullptr or equals to the endBlock,
	// or its parent operation holds an affine scope.			// or its parent operation holds an affine scope.
	while (currBlock && currBlock != endBlock &&			while (currBlock && currBlock != endBlock &&
	!currBlock->getParentOp()->hasTrait<OpTrait::AffineScope>()) {			currBlock->getParentOp()->hasTrait<OpTrait::ExtendsAffineScope>()) {
	ancestorBlocks.push_back(currBlock);			ancestorBlocks.push_back(currBlock);
	currBlock = currBlock->getParentOp()->getBlock();			currBlock = currBlock->getParentOp()->getBlock();
	}			}
	};			};

	if (numCommonLoops == 0) {			if (numCommonLoops == 0) {
	Block *block = srcAccess.opInst->getBlock();			Block *block = srcAccess.opInst->getBlock();
	while (!llvm::isa<FuncOp>(block->getParentOp())) {			while (!llvm::isa<FuncOp>(block->getParentOp())) {
	▲ Show 20 Lines • Show All 371 Lines • Show Last 20 Lines
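The condition change in `getChainOfAncestorBlocks` (from "parent lacks `AffineScope`" to "parent has `ExtendsAffineScope`") can be checked on a toy IR. This is a minimal sketch, assuming every block has a parent op; `ToyOp`, `ToyBlock`, and the free function below are hypothetical stand-ins and do not use MLIR types.

```cpp
#include <cassert>
#include <vector>

struct ToyBlock;

// Hypothetical stand-in for an operation.
struct ToyOp {
  ToyBlock *block;         // Block containing this op; nullptr at top level.
  bool extendsAffineScope; // Models the `ExtendsAffineScope` trait.
};

// Hypothetical stand-in for a block; assumed to always have a parent op.
struct ToyBlock {
  ToyOp *parentOp; // Op whose region holds this block.
};

// Collects `start` and its ancestor blocks, innermost first. The walk stops
// at `endBlock`, at the top level, or at a block whose parent op starts an
// affine scope (that is, does not carry the trait); that block is excluded.
std::vector<ToyBlock *> getChainOfAncestorBlocks(ToyBlock *start,
                                                 ToyBlock *endBlock = nullptr) {
  std::vector<ToyBlock *> chain;
  ToyBlock *curr = start;
  while (curr && curr != endBlock && curr->parentOp->extendsAffineScope) {
    chain.push_back(curr);
    curr = curr->parentOp->block;
  }
  return chain;
}
```

For a function body containing two nested `affine.for` loops, starting from the innermost loop body yields the two loop bodies and stops before the function body, since the function op does not carry the trait.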

mlir/lib/Dialect/Affine/IR/AffineOps.cpp

Show All 24 Lines

using namespace mlir;		using namespace mlir;

#define DEBUG_TYPE "affine-analysis"		#define DEBUG_TYPE "affine-analysis"

#include "mlir/Dialect/Affine/IR/AffineOpsDialect.cpp.inc"		#include "mlir/Dialect/Affine/IR/AffineOpsDialect.cpp.inc"

/// A utility function to check if a value is defined at the top level of		/// A utility function to check if a value is defined at the top level of
/// `region` or is an argument of `region`. A value of index type defined at the		/// `region` or is an argument of `region`.
/// top level of a `AffineScope` region is always a valid symbol for all
/// uses in that region.
static bool isTopLevelValue(Value value, Region *region) {		static bool isTopLevelValue(Value value, Region *region) {
if (auto arg = value.dyn_cast<BlockArgument>())		if (auto arg = value.dyn_cast<BlockArgument>())
return arg.getParentRegion() == region;		return arg.getParentRegion() == region;
return value.getDefiningOp()->getParentRegion() == region;		return value.getDefiningOp()->getParentRegion() == region;
}		}

/// Checks if `value` known to be a legal affine dimension or symbol in `src`		/// Checks if `value` known to be a legal affine dimension or symbol in `src`
/// region remains legal if the operation that uses it is inlined into `dest`		/// region remains legal if the operation that uses it is inlined into `dest`
▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	struct AffineInlinerInterface : public DialectInlinerInterface {
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

/// Returns true if the given region 'src' can be inlined into the region		/// Returns true if the given region 'src' can be inlined into the region
/// 'dest' that is attached to an operation registered to the current dialect.		/// 'dest' that is attached to an operation registered to the current dialect.
/// 'wouldBeCloned' is set if the region is cloned into its new location		/// 'wouldBeCloned' is set if the region is cloned into its new location
/// rather than moved, indicating there may be other users.		/// rather than moved, indicating there may be other users.
bool isLegalToInline(Region dest, Region src, bool wouldBeCloned,		bool isLegalToInline(Region dest, Region src, bool wouldBeCloned,
BlockAndValueMapping &valueMapping) const final {		BlockAndValueMapping &valueMapping) const final {
// We can inline into affine loops and conditionals if this doesn't break		// We can inline into affine loops, conditionals, and such other ops with
// affine value categorization rules.		// the trait `ExtendsAffineScope` if this doesn't break affine value
		// categorization rules.
Operation *destOp = dest->getParentOp();		Operation *destOp = dest->getParentOp();
if (!isa<AffineParallelOp, AffineForOp, AffineIfOp>(destOp))		if (!destOp->hasTrait<OpTrait::ExtendsAffineScope>())
return false;		return false;

// Multi-block regions cannot be inlined into affine constructs, all of		// Multi-block regions cannot be inlined into affine constructs, all of
// which require single-block regions.		// which require single-block regions.
if (!llvm::hasSingleElement(*src))		if (!llvm::hasSingleElement(*src))
return false;		return false;

// Side-effecting operations that the affine dialect cannot understand		// Side-effecting operations that the affine dialect cannot understand
Show All 25 Lines	bool isLegalToInline(Region dest, Region src, bool wouldBeCloned,

return true;		return true;
}		}

/// Returns true if the given operation 'op', that is registered to this		/// Returns true if the given operation 'op', that is registered to this
/// dialect, can be inlined into the given region, false otherwise.		/// dialect, can be inlined into the given region, false otherwise.
bool isLegalToInline(Operation *op, Region *region, bool wouldBeCloned,		bool isLegalToInline(Operation *op, Region *region, bool wouldBeCloned,
BlockAndValueMapping &valueMapping) const final {		BlockAndValueMapping &valueMapping) const final {
// Always allow inlining affine operations into a region that is marked as		// Always allow inlining affine operations into any region. Region-holding
// affine scope, or into affine loops and conditionals. There are some edge		// operations without the `ExtendsAffineScope` trait always start a new
// cases when inlining into affine structures, but that is handled in the		// affine scope, and so it's legal to inline into them. Those with the
// other 'isLegalToInline' hook above.		// `ExtendsAffineScope` trait cannot always be inlined into, but that is
Operation *parentOp = region->getParentOp();		// handled in the other `isLegalToInline` hook above.
return parentOp->hasTrait<OpTrait::AffineScope>() ||		return true;
isa<AffineForOp, AffineParallelOp, AffineIfOp>(parentOp);
}		}

/// Affine regions should be analyzed recursively.		/// Affine regions should be analyzed recursively.
bool shouldAnalyzeRecursively(Operation *op) const final { return true; }		bool shouldAnalyzeRecursively(Operation *op) const final { return true; }
};		};
} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
[... 11 lines not shown ...]
/// Materialize a single constant operation from a given attribute value with		/// Materialize a single constant operation from a given attribute value with
/// the desired resultant type.		/// the desired resultant type.
Operation *AffineDialect::materializeConstant(OpBuilder &builder,		Operation *AffineDialect::materializeConstant(OpBuilder &builder,
Attribute value, Type type,		Attribute value, Type type,
Location loc) {		Location loc) {
return builder.create<arith::ConstantOp>(loc, type, value);		return builder.create<arith::ConstantOp>(loc, type, value);
}		}

/// A utility function to check if a value is defined at the top level of an		/// Checks if a value is a top level value of an op that starts an affine scope.
/// op with trait `AffineScope`. If the value is defined in an unlinked region,		/// If the value is defined in an unlinked region, it is conservatively assumed
/// conservatively assume it is not top-level. A value of index type defined at		/// not to be top-level. A value of index type defined at the top level is
/// the top level is always a valid symbol.		/// always a valid symbol for affine purposes.
bool mlir::isTopLevelValue(Value value) {		bool mlir::isTopLevelValue(Value value) {
if (auto arg = value.dyn_cast<BlockArgument>()) {		if (auto arg = value.dyn_cast<BlockArgument>()) {
		Operation *parentOp = arg.getOwner()->getParentOp();
		// The value can't be a block argument owned by an op extending an affine
		// scope -- in the latter case, it can only be a dimensional value.
// The block owning the argument may be unlinked, e.g. when the surrounding		// The block owning the argument may be unlinked, e.g. when the surrounding
// region has not yet been attached to an Op, at which point the parent Op		// region has not yet been attached to an op, at which point the parent op
// is null.		// is null.
Operation *parentOp = arg.getOwner()->getParentOp();		return parentOp && !parentOp->hasTrait<OpTrait::ExtendsAffineScope>();
return parentOp && parentOp->hasTrait<OpTrait::AffineScope>();
}		}
// The defining Op may live in an unlinked block so its parent Op may be null.		// The defining op when it exists has to have a parent op that starts an
		// affine scope. The defining op may live in an unlinked region's block so its
		// parent op may be null.
Operation *parentOp = value.getDefiningOp()->getParentOp();		Operation *parentOp = value.getDefiningOp()->getParentOp();
return parentOp && parentOp->hasTrait<OpTrait::AffineScope>();		return parentOp && !parentOp->hasTrait<OpTrait::ExtendsAffineScope>();
}		}

/// Returns the closest region enclosing `op` that is held by an operation with		/// Returns the closest region enclosing `op` that is held by an operation that
/// trait `AffineScope`; `nullptr` if there is no such region.		/// starts an affine scope; `nullptr` if there is no such region.
// TODO: getAffineScope should be publicly exposed for affine passes/utilities.
static Region *getAffineScope(Operation *op) {		static Region *getAffineScope(Operation *op) {
auto *curOp = op;		Operation *curOp = op;
while (auto *parentOp = curOp->getParentOp()) {		while (auto *parentOp = curOp->getParentOp()) {
if (parentOp->hasTrait<OpTrait::AffineScope>())		if (!parentOp->hasTrait<OpTrait::ExtendsAffineScope>())
return curOp->getParentRegion();		return curOp->getParentRegion();
curOp = parentOp;		curOp = parentOp;
}		}
return nullptr;		return nullptr;
}		}

// A Value can be used as a dimension id iff it meets one of the following		// A Value can be used as a dimension id iff it meets one of the following
// conditions:		// conditions:
// *) It is valid as a symbol.		// 1) It is valid as a symbol.
// *) It is an induction variable.		// 2) It is a region argument on an op with the trait `ExtendsAffineScope`
		// (e.g., the induction variable of an affine.for/affine.parallel).
// *) It is the result of affine apply operation with dimension id arguments.		// 3) It is the result of an affine apply operation with dimension id arguments.
bool mlir::isValidDim(Value value) {		bool mlir::isValidDim(Value value) {
// The value must be an index type.		// The value must be an index type.
if (!value.getType().isIndex())		if (!value.getType().isIndex())
return false;		return false;

if (auto *defOp = value.getDefiningOp())		// Conditions (1) and (2) above imply any block argument would be a valid
return isValidDim(value, getAffineScope(defOp));		// dimensional identifier.
		if (value.isa<BlockArgument>())
		ftynse (inline comment): Looking at this, I realize that it is worth discussing block arguments in the documentation. So far, it only mentions region arguments, i.e. arguments of the entry block. IMO, block arguments can be treated similarly to op results, that is, they are valid symbols if the region that contains the block is attached to an operation without the `ExtendsAffineScope` trait. And valid dimensions in any case. I don't know if we want to allow operations with the `ExtendsAffineScope` trait to have more than one block, maybe we can stay conservative for now and have a verifier check on the trait.
		return true;

// This value has to be a block argument for an op that has the		// If defined by an op, the value has to be a valid dim for the affine scope
// `AffineScope` trait or for an affine.for or affine.parallel.		// its definition is part of.
auto *parentOp = value.cast<BlockArgument>().getOwner()->getParentOp();		return isValidDim(value, getAffineScope(value.getDefiningOp()));
return parentOp && (parentOp->hasTrait<OpTrait::AffineScope>() ||
isa<AffineForOp, AffineParallelOp>(parentOp));
}		}

// Value can be used as a dimension id iff it meets one of the following		/// Returns true if the given Value can be used as a dimension id in the affine
// conditions:		/// scope starting at `region`, i.e., for all its uses in such an affine
// *) It is valid as a symbol.		/// scope. This is true if the value meets one of the following conditions:
// *) It is an induction variable.		// *) It is valid as a symbol for `region`.
// *) It is the result of an affine apply operation with dimension id operands.		// *) It is a region argument of an op with the `ExtendsAffineScope` trait
		// (e.g., an induction variable of an affine.for or affine.parallel).
		// *) It is the result of an affine apply operation with dimensional operands.
bool mlir::isValidDim(Value value, Region *region) {		bool mlir::isValidDim(Value value, Region *region) {
// The value must be an index type.		// The value must be an index type.
if (!value.getType().isIndex())		if (!value.getType().isIndex())
return false;		return false;

// All valid symbols are okay.		// All valid symbols are okay.
if (isValidSymbol(value, region))		if (isValidSymbol(value, region))
return true;		return true;

auto *op = value.getDefiningOp();		Operation *op = value.getDefiningOp();
if (!op) {		if (!op) {
// This value has to be a block argument for an affine.for or an		// This value has to be a block argument for an op that extends the affine
// affine.parallel.		// scope of `region`. An unlinked region's block arguments are not valid
		// dimension ids.
auto *parentOp = value.cast<BlockArgument>().getOwner()->getParentOp();		auto *parentOp = value.cast<BlockArgument>().getOwner()->getParentOp();
return isa<AffineForOp, AffineParallelOp>(parentOp);		if (!parentOp || !parentOp->hasTrait<OpTrait::ExtendsAffineScope>())
		return false;
		return getAffineScope(parentOp) == region;
}		}
		ftynse (inline comment): I'm not sure how this connects to `isValidDim(Value)` overload that says `BlockArgument`s are always valid dimensions, looks like a contradiction unless I am missing something.
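To make the concern about the two `isValidDim` overloads concrete, here is a minimal toy sketch. It does not use the MLIR API: `ToyOp`, `ToyBlockArg`, `scopeStarter`, `isValidDimAnyBlockArg`, and `isValidDimInScope` are hypothetical names invented for illustration, each op is assumed to hold a single region, all values are assumed to be of index type, and the scoped check models only the block-argument branch reached after the symbol check has already failed.

```cpp
#include <cassert>

// Toy stand-in for an operation; NOT the real mlir::Operation.
struct ToyOp {
  const ToyOp *parent = nullptr;    // enclosing op, null at the top
  bool extendsAffineScope = false;  // models the `ExtendsAffineScope` trait
};

// Toy stand-in for an index-typed block argument.
struct ToyBlockArg {
  const ToyOp *owner = nullptr;     // null models an unlinked region
};

// Walks up from `op` and returns the closest ancestor that starts an affine
// scope (i.e. lacks the trait), or null if there is none. This mirrors the
// parent walk in getAffineScope, with the scope identified by its op.
const ToyOp *scopeStarter(const ToyOp *op) {
  for (const ToyOp *p = op ? op->parent : nullptr; p; p = p->parent)
    if (!p->extendsAffineScope)
      return p;
  return nullptr;
}

// Models the single-argument overload in this diff: every block argument of
// index type is accepted as a dimension.
bool isValidDimAnyBlockArg(const ToyBlockArg &) { return true; }

// Models the block-argument branch of the two-argument overload: the owner
// must exist, must extend an affine scope, and that scope must be `scope`.
bool isValidDimInScope(const ToyBlockArg &arg, const ToyOp *scope) {
  if (!arg.owner || !arg.owner->extendsAffineScope)
    return false;
  return scopeStarter(arg.owner) == scope;
}
```

Under these assumptions, an induction variable of a loop nested directly in a function-like op is accepted by both checks, whereas an argument of an unlinked block is accepted by the unscoped check and rejected by the scoped one, which is the kind of mismatch the comment above points at.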

// Affine apply operation is ok if all of its operands are ok.		// Affine apply operation is ok if all of its operands are ok.
if (auto applyOp = dyn_cast<AffineApplyOp>(op))		if (auto applyOp = dyn_cast<AffineApplyOp>(op))
return applyOp.isValidDim(region);		return applyOp.isValidDim(region);
// The dim op is okay if its operand memref/tensor is defined at the top		// The dim op is okay if its operand memref/tensor is a valid symbol for
// level.		// `region`.
if (auto dimOp = dyn_cast<memref::DimOp>(op))		if (auto dimOp = dyn_cast<memref::DimOp>(op))
return isTopLevelValue(dimOp.source());		return isTopLevelValue(dimOp.source());
if (auto dimOp = dyn_cast<tensor::DimOp>(op))		if (auto dimOp = dyn_cast<tensor::DimOp>(op))
return isTopLevelValue(dimOp.source());		return isTopLevelValue(dimOp.source());
return false;		return false;
}		}

/// Returns true if the 'index' dimension of the `memref` defined by		/// Returns true if the 'index' dimension of the `memref` defined by
/// `memrefDefOp` is a statically shaped one or defined using a valid symbol		/// `memrefDefOp` is a statically shaped one or defined using a valid symbol
/// for `region`.		/// for `region`.
template <typename AnyMemRefDefOp>		template <typename AnyMemRefDefOp>
static bool isMemRefSizeValidSymbol(AnyMemRefDefOp memrefDefOp, unsigned index,		static bool isMemRefSizeValidSymbol(AnyMemRefDefOp memrefDefOp, unsigned index,
Region *region) {		Region *region) {
auto memRefType = memrefDefOp.getType();		auto memRefType = memrefDefOp.getType();
// Statically shaped.		// Statically shaped.
if (!memRefType.isDynamicDim(index))		if (!memRefType.isDynamicDim(index))
return true;		return true;
// Get the position of the dimension among dynamic dimensions;		// Get the position of the dimension among dynamic dimensions;
unsigned dynamicDimPos = memRefType.getDynamicDimIndex(index);		unsigned dynamicDimPos = memRefType.getDynamicDimIndex(index);
return isValidSymbol(*(memrefDefOp.getDynamicSizes().begin() + dynamicDimPos),		return isValidSymbol(*(memrefDefOp.getDynamicSizes().begin() + dynamicDimPos),
region);		region);
}		}

/// Returns true if the result of the dim op is a valid symbol for `region`.		/// Returns true if the result of the dim op is a valid symbol for the affine
		/// scope starting at `region`.
template <typename OpTy>		template <typename OpTy>
static bool isDimOpValidSymbol(OpTy dimOp, Region *region) {		static bool isDimOpValidSymbol(OpTy dimOp, Region *region) {
// The dim op is okay if its source is defined at the top level.		// The dim op is okay if its source is defined at the top level.
if (isTopLevelValue(dimOp.source()))		if (isTopLevelValue(dimOp.source(), region))
return true;		return true;

// Conservatively handle remaining BlockArguments as non-valid symbols.		// Conservatively handle remaining BlockArguments as non-valid symbols.
// E.g. scf.for iterArgs.		// E.g. scf.for iterArgs.
if (dimOp.source().template isa<BlockArgument>())		if (dimOp.source().template isa<BlockArgument>())
return false;		return false;

// The dim op is also okay if its operand memref is a view/subview whose		// The dim op is also okay if its operand memref is a view/subview whose
// corresponding size is a valid symbol.		// corresponding size is a valid symbol.
Optional<int64_t> index = dimOp.getConstantIndex();		Optional<int64_t> index = dimOp.getConstantIndex();
assert(index.hasValue() &&		assert(index.hasValue() &&
"expect only `dim` operations with a constant index");		"expect only `dim` operations with a constant index");
int64_t i = index.getValue();		int64_t i = index.getValue();
return TypeSwitch<Operation *, bool>(dimOp.source().getDefiningOp())		return TypeSwitch<Operation *, bool>(dimOp.source().getDefiningOp())
.Case<memref::ViewOp, memref::SubViewOp, memref::AllocOp>(		.Case<memref::ViewOp, memref::SubViewOp, memref::AllocOp>(
[&](auto op) { return isMemRefSizeValidSymbol(op, i, region); })		[&](auto op) { return isMemRefSizeValidSymbol(op, i, region); })
.Default([](Operation *) { return false; });		.Default([](Operation *) { return false; });
}		}

// A value can be used as a symbol (at all its use sites) iff it meets one of		// A value can be used as a symbol (at all its use sites) iff it meets one of
// the following conditions:		// the following conditions:
// *) It is a constant.		// *) It is a constant.
// *) Its defining op or block arg appearance is immediately enclosed by an op		// *) Its defining op or block arg appearance is immediately enclosed by an op
// with `AffineScope` trait.		// that starts an affine scope.
// *) It is the result of an affine.apply operation with symbol operands.		// *) It is the result of an affine.apply operation with symbol operands.
// *) It is a result of the dim op on a memref whose corresponding size is a		// *) It is a result of the dim op on a memref whose corresponding size is a
// valid symbol.		// valid symbol.
bool mlir::isValidSymbol(Value value) {		bool mlir::isValidSymbol(Value value) {
if (!value)		if (!value)
return false;		return false;

// The value must be an index type.		// The value must be an index type.
if (!value.getType().isIndex())		if (!value.getType().isIndex())
return false;		return false;

// Check that the value is a top level value.		// Check that the value is a top level value.
if (isTopLevelValue(value))		if (isTopLevelValue(value))
return true;		return true;

if (auto *defOp = value.getDefiningOp())		if (auto *defOp = value.getDefiningOp())
return isValidSymbol(value, getAffineScope(defOp));		return isValidSymbol(value, getAffineScope(defOp));

return false;		return false;
}		}

/// A value can be used as a symbol for `region` iff it meets one of the		/// A value can be used as a symbol in the affine scope that begins at `region`
/// following conditions:		/// iff it meets one of the following conditions:
/// *) It is a constant.		/// *) It is a constant.
/// *) It is the result of an affine apply operation with symbol arguments.		/// *) It is the result of an affine apply operation with symbol arguments.
/// *) It is a result of the dim op on a memref whose corresponding size is		/// *) It is a result of the dim op on a memref whose corresponding size is
/// a valid symbol.		/// a valid symbol.
/// *) It is defined at the top level of 'region' or is its argument.		/// *) It is defined at the top level of 'region' or is its argument.
/// *) It dominates `region`'s parent op.		/// *) It dominates `region`'s parent op.
/// If `region` is null, conservatively assume the symbol definition scope does		/// If `region` is null, conservatively assume the symbol definition scope does
/// not exist and only accept the values that would be symbols regardless of		/// not exist and only accept the values that would be symbols regardless of
/// the surrounding region structure, i.e. the first three cases above.		/// the surrounding region structure, i.e. the first three cases above.
bool mlir::isValidSymbol(Value value, Region *region) {		bool mlir::isValidSymbol(Value value, Region *region) {
// The value must be an index type.		// The value must be an index type.
if (!value.getType().isIndex())		if (!value.getType().isIndex())
return false;		return false;

// A top-level value is a valid symbol.		// A top-level value is a valid symbol.
if (region && ::isTopLevelValue(value, region))		if (region && ::isTopLevelValue(value, region))
return true;		return true;

auto *defOp = value.getDefiningOp();		auto *defOp = value.getDefiningOp();
if (!defOp) {		if (!defOp) {
// A block argument that is not a top-level value is a valid symbol if it		// A block argument that is not a top-level value is a valid symbol if it
// dominates region's parent op.		// dominates `region`'s parent op.
Operation *regionOp = region ? region->getParentOp() : nullptr;		Operation *regionOp = region ? region->getParentOp() : nullptr;
if (regionOp && !regionOp->hasTrait<OpTrait::IsIsolatedFromAbove>())		if (regionOp && !regionOp->hasTrait<OpTrait::IsIsolatedFromAbove>())
if (auto *parentOpRegion = region->getParentOp()->getParentRegion())		if (auto *parentOpRegion = regionOp->getParentRegion())
return isValidSymbol(value, parentOpRegion);		return isValidSymbol(value, parentOpRegion);
return false;		return false;
}		}

// Constant operation is ok.		// Constant operation is ok.
Attribute operandCst;		Attribute operandCst;
if (matchPattern(defOp, m_Constant(&operandCst)))		if (matchPattern(defOp, m_Constant(&operandCst)))
return true;		return true;
[... 133 lines not shown ...]	return llvm::all_of(getOperands(),
[](Value op) { return mlir::isValidDim(op); });		[](Value op) { return mlir::isValidDim(op); });
}		}

// The result of the affine apply operation can be used as a dimension id if all		// The result of the affine apply operation can be used as a dimension id if all
// its operands are valid dimension ids with the parent operation of `region`		// its operands are valid dimension ids with the parent operation of `region`
// defining the polyhedral scope for symbols.		// defining the polyhedral scope for symbols.
bool AffineApplyOp::isValidDim(Region *region) {		bool AffineApplyOp::isValidDim(Region *region) {
return llvm::all_of(getOperands(),		return llvm::all_of(getOperands(),
[&](Value op) { return ::isValidDim(op, region); });		[&](Value op) { return mlir::isValidDim(op, region); });
}		}

// The result of the affine apply operation can be used as a symbol if all its		// The result of the affine apply operation can be used as a symbol if all its
// operands are symbols.		// operands are symbols.
bool AffineApplyOp::isValidSymbol() {		bool AffineApplyOp::isValidSymbol() {
return llvm::all_of(getOperands(),		return llvm::all_of(getOperands(),
[](Value op) { return mlir::isValidSymbol(op); });		[](Value op) { return mlir::isValidSymbol(op); });
}		}
[... 3,033 lines not shown ...]

mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp

[... 57 lines not shown ...]	f.walk<WalkOrder::PreOrder>([&](AffineForOp loop) {
if (isLoopParallel(loop, parallelReductions ? &reductions : nullptr))		if (isLoopParallel(loop, parallelReductions ? &reductions : nullptr))
parallelizableLoops.push_back({loop, std::move(reductions)});		parallelizableLoops.push_back({loop, std::move(reductions)});
});		});

for (const ParallelizationCandidate &candidate : parallelizableLoops) {		for (const ParallelizationCandidate &candidate : parallelizableLoops) {
unsigned numParentParallelOps = 0;		unsigned numParentParallelOps = 0;
AffineForOp loop = candidate.loop;		AffineForOp loop = candidate.loop;
for (Operation *op = loop->getParentOp();		for (Operation *op = loop->getParentOp();
op != nullptr && !op->hasTrait<OpTrait::AffineScope>();		op && op->hasTrait<OpTrait::ExtendsAffineScope>();
op = op->getParentOp()) {		op = op->getParentOp()) {
if (isa<AffineParallelOp>(op))		if (isa<AffineParallelOp>(op))
++numParentParallelOps;		++numParentParallelOps;
}		}

if (numParentParallelOps < maxNested) {		if (numParentParallelOps < maxNested) {
if (failed(affineParallelize(loop, candidate.reductions))) {		if (failed(affineParallelize(loop, candidate.reductions))) {
LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] failed to parallelize\n"		LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] failed to parallelize\n"
[... 12 lines not shown ...]

mlir/test/Conversion/AffineToStandard/lower-affine.mlir

	[... 366 lines not shown ...]

	// CHECK-LABEL: func @loop_min_max			// CHECK-LABEL: func @loop_min_max
	// CHECK-NEXT: %[[c0:.*]] = arith.constant 0 : index			// CHECK-NEXT: %[[c0:.*]] = arith.constant 0 : index
	// CHECK-NEXT: %[[c42:.*]] = arith.constant 42 : index			// CHECK-NEXT: %[[c42:.*]] = arith.constant 42 : index
	// CHECK-NEXT: %[[c1:.*]] = arith.constant 1 : index			// CHECK-NEXT: %[[c1:.*]] = arith.constant 1 : index
	// CHECK-NEXT: for %{{.*}} = %[[c0]] to %[[c42]] step %[[c1]] {			// CHECK-NEXT: for %{{.*}} = %[[c0]] to %[[c42]] step %[[c1]] {
	// CHECK-NEXT: %[[cm1:.*]] = arith.constant -1 : index			// CHECK-NEXT: %[[cm1:.*]] = arith.constant -1 : index
	// CHECK-NEXT: %[[a:.*]] = arith.muli %{{.*}}, %[[cm1]] : index			// CHECK-NEXT: %[[a:.*]] = arith.muli %{{.*}}, %[[cm1]] : index
	// CHECK-NEXT: %[[b:.*]] = arith.addi %[[a]], %{{.*}} : index			// CHECK-NEXT: %[[b:.*]] = arith.addi %{{.*}}, %[[a]] : index
	// CHECK-NEXT: %[[c:.*]] = arith.cmpi sgt, %{{.*}}, %[[b]] : index			// CHECK-NEXT: %[[c:.*]] = arith.cmpi sgt, %{{.*}}, %[[b]] : index
	// CHECK-NEXT: %[[d:.*]] = select %[[c]], %{{.*}}, %[[b]] : index			// CHECK-NEXT: %[[d:.*]] = select %[[c]], %{{.*}}, %[[b]] : index
	// CHECK-NEXT: %[[c10:.*]] = arith.constant 10 : index			// CHECK-NEXT: %[[c10:.*]] = arith.constant 10 : index
	// CHECK-NEXT: %[[e:.*]] = arith.addi %{{.*}}, %[[c10]] : index			// CHECK-NEXT: %[[e:.*]] = arith.addi %{{.*}}, %[[c10]] : index
	// CHECK-NEXT: %[[f:.*]] = arith.cmpi slt, %{{.*}}, %[[e]] : index			// CHECK-NEXT: %[[f:.*]] = arith.cmpi slt, %{{.*}}, %[[e]] : index
	// CHECK-NEXT: %[[g:.*]] = select %[[f]], %{{.*}}, %[[e]] : index			// CHECK-NEXT: %[[g:.*]] = select %[[f]], %{{.*}}, %[[e]] : index
	// CHECK-NEXT: %[[c1_0:.*]] = arith.constant 1 : index			// CHECK-NEXT: %[[c1_0:.*]] = arith.constant 1 : index
	// CHECK-NEXT: for %{{.*}} = %[[d]] to %[[g]] step %[[c1_0]] {			// CHECK-NEXT: for %{{.*}} = %[[d]] to %[[g]] step %[[c1_0]] {
	[... 523 lines not shown ...]

mlir/test/Dialect/Affine/canonicalize.mlir

[... 420 lines not shown ...]	affine.for %i0 = 1 to 100 {
// CHECK-DAG: {{.*}} = affine.apply #[[$symbolic_semi_affine]](%{{.*}})[%{{.*}}]		// CHECK-DAG: {{.*}} = affine.apply #[[$symbolic_semi_affine]](%{{.*}})[%{{.*}}]
memref.store %f1, %A[%2] : memref<?xf32>		memref.store %f1, %A[%2] : memref<?xf32>
}		}
return		return
}		}

// -----		// -----

		// CHECK-LABEL: func @symbol_or_dim
		func @symbol_or_dim() {
		%c0 = arith.constant 0 : index
		%c1 = arith.constant 1 : index
		%c100 = arith.constant 100 : index
		affine.for %i = 0 to 100 {
		scf.for %j = %c0 to %c100 step %c1 {
		// %j should be a symbol here since it's part of the affine scope started
		// by the above scf.for.
		%s = affine.apply affine_map<(d0) -> (2 * d0)>(%j)
		// CHECK: affine.apply #{{.*}}()[%{{.*}}]
		"test.foo"(%s) : (index) -> ()
		}
		}
		return
		}

		// -----

// CHECK: #[[$MAP0:.*]] = affine_map<()[s0] -> (0, s0)>		// CHECK: #[[$MAP0:.*]] = affine_map<()[s0] -> (0, s0)>
// CHECK: #[[$MAP1:.*]] = affine_map<()[s0] -> (100, s0)>		// CHECK: #[[$MAP1:.*]] = affine_map<()[s0] -> (100, s0)>

// CHECK-LABEL: func @constant_fold_bounds(%arg0: index) {		// CHECK-LABEL: func @constant_fold_bounds(%arg0: index) {
func @constant_fold_bounds(%N : index) {		func @constant_fold_bounds(%N : index) {
// CHECK: arith.constant 3 : index		// CHECK: arith.constant 3 : index
// CHECK-NEXT: "foo"() : () -> index		// CHECK-NEXT: "foo"() : () -> index
%c9 = arith.constant 9 : index		%c9 = arith.constant 9 : index
[... 540 lines not shown ...]

mlir/test/Dialect/Affine/invalid.mlir

[... 45 lines not shown ...]	affine.for %n0 = 0 to 7 {
// expected-error@+1 {{operand cannot be used as a dimension id}}		// expected-error@+1 {{operand cannot be used as a dimension id}}
affine.for %n1 = #map(%dim)[%arg] to 7 {		affine.for %n1 = #map(%dim)[%arg] to 7 {
}		}
}		}
return		return
}		}

// -----		// -----
func @affine_load_invalid_dim(%M : memref<10xi32>) {
"unknown"() ({
^bb0(%arg: index):
affine.load %M[%arg] : memref<10xi32>
// expected-error@-1 {{index must be a dimension or symbol identifier}}
br ^bb1
^bb1:
br ^bb1
}) : () -> ()
return
}

// -----

#map0 = affine_map<(d0)[s0] -> (d0 + s0)>		#map0 = affine_map<(d0)[s0] -> (d0 + s0)>

func @affine_for_lower_bound_invalid_sym() {		func @affine_for_lower_bound_invalid_sym() {
affine.for %i0 = 0 to 7 {		affine.for %i0 = 0 to 7 {
// expected-error@+1 {{operand cannot be used as a symbol}}		// expected-error@+1 {{operand cannot be used as a symbol}}
affine.for %n0 = #map0(%i0)[%i0] to 7 {		affine.for %n0 = #map0(%i0)[%i0] to 7 {
}		}
[... 53 lines not shown ...]	affine.for %n0 = 0 to 7 {
// expected-error@+1 {{operand cannot be used as a symbol}}		// expected-error@+1 {{operand cannot be used as a symbol}}
affine.if #set0(%dim)[%n0] {}		affine.if #set0(%dim)[%n0] {}
}		}
return		return
}		}

// -----		// -----

		// Test symbol and dim constraints for ops with ExtendsAffineScope trait.

		// CHECK-LABEL: func @valid_symbol_affine_scope
		func @valid_symbol_affine_scope(%n : index, %A : memref<?xindex>) {
		// Any region holding op starts an affine scope unless it has an
		// `ExtendsAffineScope` trait in which case it extends the affine scope started
		// by its parent op.
		"test.foo"() ({
		%c1 = arith.constant 1 : index
		%l = arith.subi %n, %c1 : index
		// %l, %n are valid symbols
		test.affine_scope_extend {
		// %d is a valid dimensional identifier.
		^bb0(%d : index):
		// %d is a valid dimensional identifier.
		%s = affine.load %A[%d] : memref<?xindex>
		// However, %s isn't.
		// expected-error@+1 {{index must be a dimension or symbol identifier}}
		affine.load %A[%s] : memref<?xindex>
		"terminate"() : () -> ()
		}
		"terminate"() : () -> ()
		}) : () -> ()
		return
		}

		// -----

func @affine_store_missing_l_square(%C: memref<4096x4096xf32>) {		func @affine_store_missing_l_square(%C: memref<4096x4096xf32>) {
%9 = arith.constant 0.0 : f32		%9 = arith.constant 0.0 : f32
// expected-error@+1 {{expected '['}}		// expected-error@+1 {{expected '['}}
affine.store %9, %C : memref<4096x4096xf32>		affine.store %9, %C : memref<4096x4096xf32>
return		return
}		}

// -----		// -----
[... 230 lines not shown ...]

mlir/test/Dialect/Affine/ops.mlir

[... 88 lines not shown ...]	func @affine_max(%arg0 : index, %arg1 : index, %arg2 : index) {
%2 = affine.max affine_map<()[s0, s1] -> (s0 - s1, 11)> ()[%arg1, %arg2]		%2 = affine.max affine_map<()[s0, s1] -> (s0 - s1, 11)> ()[%arg1, %arg2]
// CHECK: affine.max #[[$MAP3]]()		// CHECK: affine.max #[[$MAP3]]()
%3 = affine.max affine_map<()[] -> (77, 78, 79)> ()[]		%3 = affine.max affine_map<()[] -> (77, 78, 79)> ()[]
return		return
}		}

// -----		// -----

func @valid_symbols(%arg0: index, %arg1: index, %arg2: index) {		func @valid_symbols(%arg0: index, %arg1: index, %arg2: index, %M: memref<10xi32>) {
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%0 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>		%0 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
affine.for %arg3 = 0 to %arg2 step 768 {		affine.for %arg3 = 0 to %arg2 step 768 {
%13 = memref.dim %0, %c1 : memref<?x?xf32>		%13 = memref.dim %0, %c1 : memref<?x?xf32>
affine.for %arg4 = 0 to %13 step 264 {		affine.for %arg4 = 0 to %13 step 264 {
%18 = memref.dim %0, %c0 : memref<?x?xf32>		%18 = memref.dim %0, %c0 : memref<?x?xf32>
%20 = memref.subview %0[%c0, %c0][%18,%arg4][%c1,%c1] : memref<?x?xf32>		%20 = memref.subview %0[%c0, %c0][%18,%arg4][%c1,%c1] : memref<?x?xf32>
to memref<?x?xf32, offset : ?, strides : [?, ?]>		to memref<?x?xf32, offset : ?, strides : [?, ?]>
%24 = memref.dim %20, %c0 : memref<?x?xf32, offset : ?, strides : [?, ?]>		%24 = memref.dim %20, %c0 : memref<?x?xf32, offset : ?, strides : [?, ?]>
affine.for %arg5 = 0 to %24 step 768 {		affine.for %arg5 = 0 to %24 step 768 {
"foo"() : () -> ()		"foo"() : () -> ()
}		}
}		}
}		}
		"test.unknown"() ({
		^bb0(%arg: index):
		// %arg is a valid symbolic identifier.
		affine.load %M[%arg] : memref<10xi32>
		"test.yield"() : () -> ()
		}) : () -> ()
return		return
}		}

// -----		// -----

// Test symbol constraints for ops with AffineScope trait.		// Test ops with ExtendsAffineScope trait.

// CHECK-LABEL: func @valid_symbol_affine_scope		// CHECK-LABEL: func @valid_symbol_affine_scope
func @valid_symbol_affine_scope(%n : index, %A : memref<?xf32>) {		func @valid_symbol_affine_scope(%n : index, %A : memref<?xf32>) {
test.affine_scope {		// Any region holding op starts an affine scope unless it has an
		// `ExtendsAffineScope` trait in which case it extends the affine scope started
		// by its parent op.
		"test.foo"() ({
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%l = arith.subi %n, %c1 : index		%l = arith.subi %n, %c1 : index
// %l, %n are valid symbols since test.affine_scope defines a new affine		// %l, %n are valid symbols.
// scope.
affine.for %i = %l to %n {		affine.for %i = %l to %n {
%m = arith.subi %l, %i : index
test.affine_scope {
// %m and %n are valid symbols.
affine.for %j = %m to %n {
%v = affine.load %A[%n - 1] : memref<?xf32>
affine.store %v, %A[%n - 1] : memref<?xf32>
}		}
		test.affine_scope_extend {
		// %d is a valid dimensional identifier.
		^bb0(%d : index):
		affine.load %A[%d] : memref<?xf32>
"terminate"() : () -> ()		"terminate"() : () -> ()
}		}
}
"terminate"() : () -> ()		"terminate"() : () -> ()
		}) : () -> ()
		return
		}

		// -----

		// Test dim/symbol rules involving memref dim ops.

		func @valid_memref_dim_symbols(%M : index) {
		%c0 = arith.constant 0 : index
		scf.execute_region {
		%A = memref.alloc() : memref<8xf32>
		affine.for %i = 0 to 100 {
		// %N is a valid symbol. It's defined at the top-level of a region-holding
		// op that doesn't have the ExtendsAffineScope trait.
		%N = memref.dim %A, %c0 : memref<8xf32>
		affine.for %j = 0 to %N {
		}
		}
		scf.yield
		}
		"test.foo"() ({
		%A = arith.constant dense<0.0> : tensor<8xf32>
		affine.for %i = 0 to 100 {
		// %N is a valid symbol. It's defined at the top-level of a region-holding
		// op that doesn't have the ExtendsAffineScope trait.
		%N = tensor.dim %A, %c0 : tensor<8xf32>
		affine.for %j = 0 to %N {
		}
}		}
		}) : () -> ()
return		return
}		}

// -----		// -----

// Test the fact that module op always provides an affine scope.		// Test the fact that module op always starts an affine scope.

%idx = "test.foo"() : () -> (index)		%idx = "test.foo"() : () -> (index)
"test.func"() ({		"test.func"() ({
^bb0(%A : memref<?xf32>):		^bb0(%A : memref<?xf32>):
affine.load %A[%idx] : memref<?xf32>		affine.load %A[%idx] : memref<?xf32>
"terminate"() : () -> ()		"terminate"() : () -> ()
}) : () -> ()		}) : () -> ()


mlir/test/Dialect/Linalg/comprehensive-module-bufferize.mlir

	// -----			// -----

	func private @some_use(memref<?xf32>)			func private @some_use(memref<?xf32>)

	#TILE_MAP = affine_map<(d0)[s0] -> (3, -d0 + s0)>			#TILE_MAP = affine_map<(d0)[s0] -> (3, -d0 + s0)>

	// CHECK-DAG: #[[$DYN_0D_MAP:.*]] = affine_map<()[s0] -> (s0)>			// CHECK-DAG: #[[$DYN_0D_MAP:.*]] = affine_map<()[s0] -> (s0)>
	// CHECK-DAG: #[[$DYN_1D_MAP:.*]] = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>			// CHECK-DAG: #[[$DYN_1D_MAP:.*]] = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>
	// CHECK-DAG: #[[$TILE_MAP:.*]] = affine_map<(d0)[s0] -> (3, -d0 + s0)>			// CHECK-DAG: #[[$TILE_MAP:.*]] = affine_map<()[s0, s1] -> (3, s0 - s1)>

	// CHECK: func @tiled_dot(			// CHECK: func @tiled_dot(
	// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: memref<?xf32, #[[$DYN_1D_MAP]]>			// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: memref<?xf32, #[[$DYN_1D_MAP]]>
	// CHECK-SAME: %[[B:[a-zA-Z0-9]*]]: memref<?xf32, #[[$DYN_1D_MAP]]>			// CHECK-SAME: %[[B:[a-zA-Z0-9]*]]: memref<?xf32, #[[$DYN_1D_MAP]]>
	// CHECK-SAME: %[[c:[a-zA-Z0-9]*]]: memref<f32, #[[$DYN_0D_MAP]]>			// CHECK-SAME: %[[c:[a-zA-Z0-9]*]]: memref<f32, #[[$DYN_0D_MAP]]>
	func @tiled_dot(%A: tensor<?xf32>, %B: tensor<?xf32>, %c: tensor<f32> {linalg.inplaceable = true},			func @tiled_dot(%A: tensor<?xf32>, %B: tensor<?xf32>, %c: tensor<f32> {linalg.inplaceable = true},
	%effecting: memref<?xf32>) -> tensor<f32> {			%effecting: memref<?xf32>) -> tensor<f32> {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index

mlir/test/Dialect/Linalg/fusion-indexed.mlir

linalg.generic {
ins(%A_view : memref<?x?xindex, #map>)		ins(%A_view : memref<?x?xindex, #map>)
outs(%B_view : memref<?x?xindex, #map>) {		outs(%B_view : memref<?x?xindex, #map>) {
^bb0(%a: index, %b: index):		^bb0(%a: index, %b: index):
linalg.yield %a : index		linalg.yield %a : index
}		}
}		}
return		return
}		}
// CHECK: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECK: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
// CHECK-LABEL: func @fuse_indexed_producer		// CHECK-LABEL: func @fuse_indexed_producer
// CHECK: scf.parallel ([[I:%.*]], [[J:%.*]]) =		// CHECK: scf.parallel ([[I:%.*]], [[J:%.*]]) =
// CHECK: linalg.generic		// CHECK: linalg.generic
// CHECK: [[idx0:%.*]] = linalg.index 0 : index		// CHECK: %[[idx0:.*]] = linalg.index 0 : index
// CHECK: [[i_new:%.*]] = affine.apply [[$MAP]]([[idx0]], [[J]])		// CHECK: [[i_new:%.*]] = affine.apply [[$MAP]]()[%[[idx0]], [[J]]]
// CHECK: [[idx1:%.*]] = linalg.index 1 : index		// CHECK: %[[idx1:.*]] = linalg.index 1 : index
// CHECK: [[j_new:%.*]] = affine.apply [[$MAP]]([[idx1]], [[I]])		// CHECK: [[j_new:%.*]] = affine.apply [[$MAP]]()[%[[idx1]], [[I]]]
// CHECK: [[sum:%.*]] = arith.addi [[i_new]], [[j_new]] : index		// CHECK: [[sum:%.*]] = arith.addi [[i_new]], [[j_new]] : index
// CHECK: linalg.yield [[sum]] : index		// CHECK: linalg.yield [[sum]] : index
// CHECK: linalg.generic		// CHECK: linalg.generic

// -----		// -----

#map = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>		#map = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
func @fuse_indexed_producer_tiled_second_dim_only(%A: memref<?x?xindex>,		func @fuse_indexed_producer_tiled_second_dim_only(%A: memref<?x?xindex>,
Show All 25 Lines	linalg.generic {
ins(%A_view : memref<?x?xindex, #map>)		ins(%A_view : memref<?x?xindex, #map>)
outs(%B_view : memref<?x?xindex, #map>) {		outs(%B_view : memref<?x?xindex, #map>) {
^bb0(%a: index, %b: index):		^bb0(%a: index, %b: index):
linalg.yield %a : index		linalg.yield %a : index
}		}
}		}
return		return
}		}
// CHECK: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECK: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
// CHECK-LABEL: func @fuse_indexed_producer_tiled_second_dim_only		// CHECK-LABEL: func @fuse_indexed_producer_tiled_second_dim_only
// CHECK: scf.parallel ([[J:%.*]]) =		// CHECK: scf.parallel ([[J:%.*]]) =
// CHECK: linalg.generic		// CHECK: linalg.generic
// CHECK: [[idx0:%.*]] = linalg.index 0 : index		// CHECK: [[idx0:%.*]] = linalg.index 0 : index
// CHECK: [[idx1:%.*]] = linalg.index 1 : index		// CHECK: %[[idx1:.*]] = linalg.index 1 : index
// CHECK: [[j_new:%.*]] = affine.apply [[$MAP]]([[idx1]], [[J]])		// CHECK: [[j_new:%.*]] = affine.apply [[$MAP]]()[%[[idx1]], [[J]]]
// CHECK: [[sum:%.*]] = arith.addi [[idx0]], [[j_new]] : index		// CHECK: [[sum:%.*]] = arith.addi [[idx0]], [[j_new]] : index
// CHECK: linalg.yield [[sum]] : index		// CHECK: linalg.yield [[sum]] : index
// CHECK: linalg.generic		// CHECK: linalg.generic

mlir/test/Dialect/Linalg/fusion-pattern.mlir

// RUN: mlir-opt %s -test-linalg-fusion-transform-patterns -canonicalize -cse -split-input-file | FileCheck %s		// RUN: mlir-opt %s -test-linalg-fusion-transform-patterns -canonicalize -cse -split-input-file | FileCheck %s

module {		module {
func @basic_fusion(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>,		func @basic_fusion(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>,
%arg2: memref<?x?xf32>) {		%arg2: memref<?x?xf32>) {
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
linalg.fill(%cst, %arg2) : f32, memref<?x?xf32>		linalg.fill(%cst, %arg2) : f32, memref<?x?xf32>
linalg.matmul {__internal_linalg_transform__ = "basic_fusion"}		linalg.matmul {__internal_linalg_transform__ = "basic_fusion"}
ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>)		ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg2 : memref<?x?xf32>)		outs(%arg2 : memref<?x?xf32>)
return		return
}		}
}		}

// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (32, -d0 + s0)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (32, s0 - s1)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (64, -d0 + s0)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1] -> (64, s0 - s1)>
// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK-DAG: #[[MAP3:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK-DAG: #[[MAP4:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 32, -d0 + s1)>		// CHECK-DAG: #[[MAP4:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 32, -s1 + s2)>
// CHECK-DAG: #[[MAP5:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 64, -d0 + s1)>		// CHECK-DAG: #[[MAP5:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 64, -s1 + s2)>
// CHECK: func @basic_fusion		// CHECK: func @basic_fusion
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index		// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index		// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index		// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32		// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32
// CHECK-DAG: linalg.fill(%[[CST]], %[[ARG2]])		// CHECK-DAG: linalg.fill(%[[CST]], %[[ARG2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion_original"		// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion_original"
// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG0]], %[[C0]]		// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG0]], %[[C0]]
// CHECK-DAG: %[[N:.+]] = memref.dim %[[ARG1]], %[[C1]]		// CHECK-DAG: %[[N:.+]] = memref.dim %[[ARG1]], %[[C1]]
// CHECK: scf.parallel (%[[IV0:.+]], %[[IV1:.+]]) =		// CHECK: scf.parallel (%[[IV0:.+]], %[[IV1:.+]]) =
// CHECK-SAME: to (%[[M]], %[[N]])		// CHECK-SAME: to (%[[M]], %[[N]])
// CHECK-SAME: step (%[[C32]], %[[C64]]) {		// CHECK-SAME: step (%[[C32]], %[[C64]]) {
// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]]()[%[[M]], %[[IV0]]]
// CHECK: %[[K:.+]] = memref.dim %[[ARG0]], %[[C1]]		// CHECK: %[[K:.+]] = memref.dim %[[ARG0]], %[[C1]]
// CHECK: %[[SV1:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]		// CHECK: %[[SV1:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[K]]]		// CHECK-SAME: [%[[TILE_M]], %[[K]]]
// CHECK: %[[K_2:.+]] = memref.dim %[[ARG1]], %[[C0]]		// CHECK: %[[K_2:.+]] = memref.dim %[[ARG1]], %[[C0]]
// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP2]](%[[IV1]])[%[[N]]]		// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP2]]()[%[[N]], %[[IV1]]]
// CHECK: %[[SV2:.+]] = memref.subview %[[ARG1]][0, %[[IV1]]]		// CHECK: %[[SV2:.+]] = memref.subview %[[ARG1]][0, %[[IV1]]]
// CHECK-SAME: %[[K_2]], %[[TILE_N]]		// CHECK-SAME: %[[K_2]], %[[TILE_N]]
// CHECK: %[[SV3:.+]] = memref.subview %[[ARG2]][%[[IV0]], %[[IV1]]]		// CHECK: %[[SV3:.+]] = memref.subview %[[ARG2]][%[[IV0]], %[[IV1]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]
// CHECK: %[[M_2:.+]] = memref.dim %[[ARG2]], %[[C0]]		// CHECK: %[[M_2:.+]] = memref.dim %[[ARG2]], %[[C0]]
// CHECK: %[[N_2:.+]] = memref.dim %[[ARG2]], %[[C1]]		// CHECK: %[[N_2:.+]] = memref.dim %[[ARG2]], %[[C1]]
// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP4]](%[[IV0]])[%[[M_2]], %[[M]]]		// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP4]]()[%[[M_2]], %[[IV0]], %[[M]]]
// CHECK: %[[TILE_N_3:.+]] = affine.min #[[MAP5]](%[[IV1]])[%[[N_2]], %[[N]]]		// CHECK: %[[TILE_N_3:.+]] = affine.min #[[MAP5]]()[%[[N_2]], %[[IV1]], %[[N]]]
// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG2]][%[[IV0]], %[[IV1]]]		// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG2]][%[[IV0]], %[[IV1]]]
// CHECK-SAME: [%[[TILE_M_3]], %[[TILE_N_3]]]		// CHECK-SAME: [%[[TILE_M_3]], %[[TILE_N_3]]]
// CHECK: linalg.fill(%[[CST]], %[[SV3_2]])		// CHECK: linalg.fill(%[[CST]], %[[SV3_2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion_producer"		// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion_producer"
// CHECK: scf.for %[[IV2:.+]] = %[[C0]] to %[[K]] step %[[C16]] {		// CHECK: scf.for %[[IV2:.+]] = %[[C0]] to %[[K]] step %[[C16]] {
// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP3]](%[[IV2]])[%[[K]]]		// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP3]]()[%[[K]], %[[IV2]]]
// CHECK: %[[SV4:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]		// CHECK: %[[SV4:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]
// CHECK: %[[SV5:.+]] = memref.subview %[[SV2]][%[[IV2]], 0]		// CHECK: %[[SV5:.+]] = memref.subview %[[SV2]][%[[IV2]], 0]
// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion"		// CHECK-SAME: __internal_linalg_transform__ = "after_basic_fusion"
// CHECK-SAME: ins(%[[SV4]], %[[SV5]]		// CHECK-SAME: ins(%[[SV4]], %[[SV5]]
// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)		// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)
Show All 12 Lines	func @rhs_fusion(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>,
linalg.copy(%arg1, %arg2) : memref<?x?xf32>, memref<?x?xf32>		linalg.copy(%arg1, %arg2) : memref<?x?xf32>, memref<?x?xf32>
linalg.fill(%cst, %arg3) : f32, memref<?x?xf32>		linalg.fill(%cst, %arg3) : f32, memref<?x?xf32>
linalg.matmul {__internal_linalg_transform__ = "rhs_fusion"}		linalg.matmul {__internal_linalg_transform__ = "rhs_fusion"}
ins(%arg0, %arg2 : memref<?x?xf32>, memref<?x?xf32>)		ins(%arg0, %arg2 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg3 : memref<?x?xf32>)		outs(%arg3 : memref<?x?xf32>)
return		return
}		}
}		}
// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (64, -d0 + s0)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (64, s0 - s1)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (32, -d0 + s0)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 64, -s1 + s2)>
// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK-DAG: #[[MAP3:.+]] = affine_map<()[s0, s1] -> (32, s0 - s1)>
// CHECK-DAG: #[[MAP4:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 64, -d0 + s1)>		// CHECK-DAG: #[[MAP4:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>

// CHECK: func @rhs_fusion		// CHECK: func @rhs_fusion
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index		// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index		// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index		// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32		// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32
// CHECK-DAG: linalg.copy(%[[ARG1]], %[[ARG2]])		// CHECK-DAG: linalg.copy(%[[ARG1]], %[[ARG2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion_original"		// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion_original"
// CHECK-DAG: %[[N:.+]] = memref.dim %[[ARG2]], %[[C1]]		// CHECK-DAG: %[[N:.+]] = memref.dim %[[ARG2]], %[[C1]]
// CHECK: scf.parallel (%[[IV0:.+]]) =		// CHECK: scf.parallel (%[[IV0:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[N]]) step (%[[C64]]) {		// CHECK-SAME: (%[[C0]]) to (%[[N]]) step (%[[C64]]) {
// CHECK: %[[K:.+]] = memref.dim %[[ARG2]], %[[C0]]		// CHECK: %[[K:.+]] = memref.dim %[[ARG2]], %[[C0]]
// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[N]]]		// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP0]]()[%[[N]], %[[IV0]]]
// CHECK: %[[SV1:.+]] = memref.subview %[[ARG2]][0, %[[IV0]]]		// CHECK: %[[SV1:.+]] = memref.subview %[[ARG2]][0, %[[IV0]]]
// CHECK-SAME: [%[[K]], %[[TILE_N]]]		// CHECK-SAME: [%[[K]], %[[TILE_N]]]
// CHECK: %[[M:.+]] = memref.dim %[[ARG3]], %[[C0]]		// CHECK: %[[M:.+]] = memref.dim %[[ARG3]], %[[C0]]
// CHECK: %[[SV2:.+]] = memref.subview %[[ARG3]][0, %[[IV0]]]		// CHECK: %[[SV2:.+]] = memref.subview %[[ARG3]][0, %[[IV0]]]
// CHECK-SAME: [%[[M]], %[[TILE_N]]		// CHECK-SAME: [%[[M]], %[[TILE_N]]
// CHECK: %[[N_3:.+]] = memref.dim %[[ARG1]], %[[C1]]		// CHECK: %[[N_3:.+]] = memref.dim %[[ARG1]], %[[C1]]
// CHECK: %[[K_2:.+]] = memref.dim %[[ARG1]], %[[C0]]		// CHECK: %[[K_2:.+]] = memref.dim %[[ARG1]], %[[C0]]
// CHECK: %[[TILE_N_3:.+]] = affine.min #[[MAP4]](%[[IV0]])[%[[N_3]], %[[N]]]		// CHECK: %[[TILE_N_3:.+]] = affine.min #[[MAP2]]()[%[[N_3]], %[[IV0]], %[[N]]]
// CHECK: %[[SV3:.+]] = memref.subview %[[ARG1]][0, %[[IV0]]]		// CHECK: %[[SV3:.+]] = memref.subview %[[ARG1]][0, %[[IV0]]]
// CHECK-SAME: [%[[K_2]], %[[TILE_N_3]]]		// CHECK-SAME: [%[[K_2]], %[[TILE_N_3]]]
// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG2]][0, %[[IV0]]]		// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG2]][0, %[[IV0]]]
// CHECK-SAME: [%[[K]], %[[TILE_N_3]]]		// CHECK-SAME: [%[[K]], %[[TILE_N_3]]]
// CHECK: linalg.copy(%[[SV3]], %[[SV3_2]])		// CHECK: linalg.copy(%[[SV3]], %[[SV3_2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion_producer"		// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion_producer"
// CHECK-NOT: linalg.fill		// CHECK-NOT: linalg.fill
// CHECK-DAG: %[[M_2:.+]] = memref.dim %[[ARG0]], %[[C0]]		// CHECK-DAG: %[[M_2:.+]] = memref.dim %[[ARG0]], %[[C0]]
// CHECK-DAG: %[[K_2:.+]] = memref.dim %[[ARG0]], %[[C1]]		// CHECK-DAG: %[[K_2:.+]] = memref.dim %[[ARG0]], %[[C1]]
// CHECK: scf.parallel (%[[IV1:.+]]) =		// CHECK: scf.parallel (%[[IV1:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[M_2]]) step (%[[C32]]) {		// CHECK-SAME: (%[[C0]]) to (%[[M_2]]) step (%[[C32]]) {
// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K_2]] step %[[C16]] {		// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K_2]] step %[[C16]] {
// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP2]](%[[IV1]])[%[[M_2]]]		// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP3]]()[%[[M_2]], %[[IV1]]]
// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP3]](%[[IV2]])[%[[K_2]]]		// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP4]]()[%[[K_2]], %[[IV2]]]
// CHECK: %[[SV4:.+]] = memref.subview %[[ARG0]][%[[IV1]], %[[IV2]]]		// CHECK: %[[SV4:.+]] = memref.subview %[[ARG0]][%[[IV1]], %[[IV2]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]
// CHECK: %[[SV5:.+]] = memref.subview %[[SV1]][%[[IV2]], 0]		// CHECK: %[[SV5:.+]] = memref.subview %[[SV1]][%[[IV2]], 0]
// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]
// CHECK: %[[SV6:.+]] = memref.subview %[[SV2]][%[[IV1]], 0]		// CHECK: %[[SV6:.+]] = memref.subview %[[SV2]][%[[IV1]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion"		// CHECK-SAME: __internal_linalg_transform__ = "after_rhs_fusion"
Show All 16 Lines	func @two_operand_fusion(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>,
linalg.copy(%arg0, %arg1) : memref<?x?xf32>, memref<?x?xf32>		linalg.copy(%arg0, %arg1) : memref<?x?xf32>, memref<?x?xf32>
linalg.fill(%cst, %arg3) : f32, memref<?x?xf32>		linalg.fill(%cst, %arg3) : f32, memref<?x?xf32>
linalg.matmul {__internal_linalg_transform__ = "two_operand_fusion"}		linalg.matmul {__internal_linalg_transform__ = "two_operand_fusion"}
ins(%arg1, %arg2 : memref<?x?xf32>, memref<?x?xf32>)		ins(%arg1, %arg2 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg3 : memref<?x?xf32>)		outs(%arg3 : memref<?x?xf32>)
return		return
}		}
}		}
// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (32, -d0 + s0)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (32, s0 - s1)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 32, -s1 + s2)>
// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (64, -d0 + s0)>		// CHECK-DAG: #[[MAP3:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK-DAG: #[[MAP4:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 32, -d0 + s1)>		// CHECK-DAG: #[[MAP4:.+]] = affine_map<()[s0, s1] -> (64, s0 - s1)>
// CHECK: func @two_operand_fusion		// CHECK: func @two_operand_fusion
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index		// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index		// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index		// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32		// CHECK-DAG: %[[CST:.+]] = arith.constant 0.0{{.*}} : f32
// CHECK: linalg.copy(%[[ARG0]], %[[ARG1]])		// CHECK: linalg.copy(%[[ARG0]], %[[ARG1]])
// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_original"		// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_original"
// CHECK: linalg.fill(%[[CST]], %[[ARG3]])		// CHECK: linalg.fill(%[[CST]], %[[ARG3]])
// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_original"		// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_original"
// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG1]], %[[C0]]		// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG1]], %[[C0]]
// CHECK: scf.parallel (%[[IV0:.+]]) =		// CHECK: scf.parallel (%[[IV0:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[M]]) step (%[[C32]]) {		// CHECK-SAME: (%[[C0]]) to (%[[M]]) step (%[[C32]]) {
// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]]()[%[[M]], %[[IV0]]]
// CHECK: %[[K:.+]] = memref.dim %[[ARG1]], %[[C1]]		// CHECK: %[[K:.+]] = memref.dim %[[ARG1]], %[[C1]]
// CHECK: %[[SV1:.+]] = memref.subview %[[ARG1]][%[[IV0]], 0]		// CHECK: %[[SV1:.+]] = memref.subview %[[ARG1]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[K]]]		// CHECK-SAME: [%[[TILE_M]], %[[K]]]
// CHECK: %[[N:.+]] = memref.dim %[[ARG3]], %[[C1]]		// CHECK: %[[N:.+]] = memref.dim %[[ARG3]], %[[C1]]
// CHECK: %[[SV2:.+]] = memref.subview %[[ARG3]][%[[IV0]], 0]		// CHECK: %[[SV2:.+]] = memref.subview %[[ARG3]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[N]]]		// CHECK-SAME: [%[[TILE_M]], %[[N]]]
// CHECK: %[[M_2:.+]] = memref.dim %[[ARG3]], %[[C0]]		// CHECK: %[[M_2:.+]] = memref.dim %[[ARG3]], %[[C0]]
// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP4]](%[[IV0]])[%[[M_2]], %[[M]]]		// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP2]]()[%[[M_2]], %[[IV0]], %[[M]]]
// CHECK: %[[SV2_2:.+]] = memref.subview %[[ARG3]][%[[IV0]], 0]		// CHECK: %[[SV2_2:.+]] = memref.subview %[[ARG3]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_3]], %[[N]]]		// CHECK-SAME: [%[[TILE_M_3]], %[[N]]]
// CHECK: %[[M_3:.+]] = memref.dim %[[ARG0]], %[[C0]]		// CHECK: %[[M_3:.+]] = memref.dim %[[ARG0]], %[[C0]]
// CHECK: %[[TILE_M_4:.+]] = affine.min #[[MAP4]](%[[IV0]])[%[[M_3]], %[[M]]]		// CHECK: %[[TILE_M_4:.+]] = affine.min #[[MAP2]]()[%[[M_3]], %[[IV0]], %[[M]]]
// CHECK: %[[K_3:.+]] = memref.dim %[[ARG0]], %[[C1]]		// CHECK: %[[K_3:.+]] = memref.dim %[[ARG0]], %[[C1]]
// CHECK: %[[SV3:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]		// CHECK: %[[SV3:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_4]], %[[K_3]]]		// CHECK-SAME: [%[[TILE_M_4]], %[[K_3]]]
// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG1]][%[[IV0]], 0]		// CHECK: %[[SV3_2:.+]] = memref.subview %[[ARG1]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_4]], %[[K]]]		// CHECK-SAME: [%[[TILE_M_4]], %[[K]]]
// CHECK: linalg.copy(%[[SV3]], %[[SV3_2]])		// CHECK: linalg.copy(%[[SV3]], %[[SV3_2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_producer"		// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_producer"
// CHECK: linalg.fill(%[[CST]], %[[SV2_2]])		// CHECK: linalg.fill(%[[CST]], %[[SV2_2]])
// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_producer"		// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion_producer"
// CHECK-DAG: %[[N_2:.+]] = memref.dim %[[ARG2]], %[[C1]]		// CHECK-DAG: %[[N_2:.+]] = memref.dim %[[ARG2]], %[[C1]]
// CHECK: scf.parallel (%[[IV1:.+]]) =		// CHECK: scf.parallel (%[[IV1:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[N_2]]) step (%[[C64]]) {		// CHECK-SAME: (%[[C0]]) to (%[[N_2]]) step (%[[C64]]) {
// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K]] step %[[C16]] {		// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K]] step %[[C16]] {
// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP2]](%[[IV2]])[%[[K]]]		// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP3]]()[%[[K]], %[[IV2]]]
// CHECK: %[[SV4:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]		// CHECK: %[[SV4:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]
// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP3]](%[[IV1]])[%[[N_2]]]		// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP4]]()[%[[N_2]], %[[IV1]]]
// CHECK: %[[SV5:.+]] = memref.subview %[[ARG2]][%[[IV2]], %[[IV1]]]		// CHECK: %[[SV5:.+]] = memref.subview %[[ARG2]][%[[IV2]], %[[IV1]]]
// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]
// CHECK: %[[SV6:.+]] = memref.subview %[[SV2]][0, %[[IV1]]]		// CHECK: %[[SV6:.+]] = memref.subview %[[SV2]][0, %[[IV1]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion"		// CHECK-SAME: __internal_linalg_transform__ = "after_two_operand_fusion"
// CHECK-SAME: ins(%[[SV4]], %[[SV5]]		// CHECK-SAME: ins(%[[SV4]], %[[SV5]]
// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)		// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)
Show All 13 Lines	func @matmul_fusion(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>,
linalg.matmul ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>)		linalg.matmul ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg2 : memref<?x?xf32>)		outs(%arg2 : memref<?x?xf32>)
linalg.matmul {__internal_linalg_transform__ = "lhs_fusion"}		linalg.matmul {__internal_linalg_transform__ = "lhs_fusion"}
ins(%arg2, %arg3 : memref<?x?xf32>, memref<?x?xf32>)		ins(%arg2, %arg3 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg4 : memref<?x?xf32>)		outs(%arg4 : memref<?x?xf32>)
return		return
}		}
}		}
// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (32, -d0 + s0)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (32, s0 - s1)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 32, -s1 + s2)>
// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (64, -d0 + s0)>		// CHECK-DAG: #[[MAP3:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK-DAG: #[[MAP4:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 32, -d0 + s1)>		// CHECK-DAG: #[[MAP4:.+]] = affine_map<()[s0, s1] -> (64, s0 - s1)>

// CHECK: func @matmul_fusion		// CHECK: func @matmul_fusion
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index		// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index		// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index		// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_original"		// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_original"
// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG2]], %[[C0]]		// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG2]], %[[C0]]
// CHECK: scf.parallel (%[[IV0:.+]]) =		// CHECK: scf.parallel (%[[IV0:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[M]]) step (%[[C32]]) {		// CHECK-SAME: (%[[C0]]) to (%[[M]]) step (%[[C32]]) {
// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]]()[%[[M]], %[[IV0]]]
// CHECK: %[[K2:.+]] = memref.dim %[[ARG2]], %[[C1]]		// CHECK: %[[K2:.+]] = memref.dim %[[ARG2]], %[[C1]]
// CHECK: %[[SV1:.+]] = memref.subview %[[ARG2]][%[[IV0]], 0]		// CHECK: %[[SV1:.+]] = memref.subview %[[ARG2]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[K2]]]		// CHECK-SAME: [%[[TILE_M]], %[[K2]]]
// CHECK: %[[N:.+]] = memref.dim %[[ARG4]], %[[C1]]		// CHECK: %[[N:.+]] = memref.dim %[[ARG4]], %[[C1]]
// CHECK: %[[SV2:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]		// CHECK: %[[SV2:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[N]]]		// CHECK-SAME: [%[[TILE_M]], %[[N]]]
// CHECK: %[[M_3:.+]] = memref.dim %[[ARG0]], %[[C0]]		// CHECK: %[[M_3:.+]] = memref.dim %[[ARG0]], %[[C0]]
// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP4]](%[[IV0]])[%[[M_3]], %[[M]]]		// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP2]]()[%[[M_3]], %[[IV0]], %[[M]]]
// CHECK: %[[K1:.+]] = memref.dim %[[ARG0]], %[[C1]]		// CHECK: %[[K1:.+]] = memref.dim %[[ARG0]], %[[C1]]
// CHECK: %[[SV3:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]		// CHECK: %[[SV3:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_3]], %[[K1]]]		// CHECK-SAME: [%[[TILE_M_3]], %[[K1]]]
// CHECK: %[[SV1_2:.+]] = memref.subview %[[ARG2]][%[[IV0]], 0]		// CHECK: %[[SV1_2:.+]] = memref.subview %[[ARG2]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_3]], %[[K2]]]		// CHECK-SAME: [%[[TILE_M_3]], %[[K2]]]
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_producer"		// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_producer"
// CHECK-SAME: ins(%[[SV3]], %[[ARG1]]		// CHECK-SAME: ins(%[[SV3]], %[[ARG1]]
// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32>)		// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32>)
// CHECK-SAME: outs(%[[SV1_2]] : memref<?x?xf32, #[[MAP1]]>)		// CHECK-SAME: outs(%[[SV1_2]] : memref<?x?xf32, #[[MAP1]]>)
// CHECK: %[[N_2:.+]] = memref.dim %[[ARG3]], %[[C1]]		// CHECK: %[[N_2:.+]] = memref.dim %[[ARG3]], %[[C1]]
// CHECK: scf.parallel (%[[IV1:.+]]) =		// CHECK: scf.parallel (%[[IV1:.+]]) =
// CHECK-SAME: (%[[C0]]) to (%[[N_2]]) step (%[[C64]]) {		// CHECK-SAME: (%[[C0]]) to (%[[N_2]]) step (%[[C64]]) {
// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K2]] step %[[C16]] {		// CHECK-NEXT: scf.for %[[IV2:.+]] = %[[C0]] to %[[K2]] step %[[C16]] {
// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP2]](%[[IV2]])[%[[K2]]]		// CHECK: %[[TILE_K:.+]] = affine.min #[[MAP3]]()[%[[K2]], %[[IV2]]]
// CHECK: %[[SV6:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]		// CHECK: %[[SV6:.+]] = memref.subview %[[SV1]][0, %[[IV2]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_K]]]
// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP3]](%[[IV1]])[%[[N_2]]]		// CHECK: %[[TILE_N:.+]] = affine.min #[[MAP4]]()[%[[N_2]], %[[IV1]]]
// CHECK: %[[SV7:.+]] = memref.subview %[[ARG3]][%[[IV2]], %[[IV1]]]		// CHECK: %[[SV7:.+]] = memref.subview %[[ARG3]][%[[IV2]], %[[IV1]]]
// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_K]], %[[TILE_N]]]
// CHECK: %[[SV8:.+]] = memref.subview %[[SV2]][0, %[[IV1]]]		// CHECK: %[[SV8:.+]] = memref.subview %[[SV2]][0, %[[IV1]]]
// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]		// CHECK-SAME: [%[[TILE_M]], %[[TILE_N]]]
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion"		// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion"
// CHECK-SAME: ins(%[[SV6]], %[[SV7]]		// CHECK-SAME: ins(%[[SV6]], %[[SV7]]
// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)		// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32, #[[MAP1]]>)

mlir/test/Dialect/Linalg/fusion-sequence.mlir

linalg.matmul ins(%0, %arg2 : memref<?x?xf32>, memref<?x?xf32>)
outs(%1 : memref<?x?xf32>)		outs(%1 : memref<?x?xf32>)
linalg.fill(%cst, %arg4) : f32, memref<?x?xf32>		linalg.fill(%cst, %arg4) : f32, memref<?x?xf32>
linalg.matmul ins(%1, %arg3 : memref<?x?xf32>, memref<?x?xf32>)		linalg.matmul ins(%1, %arg3 : memref<?x?xf32>, memref<?x?xf32>)
outs(%arg4 : memref<?x?xf32>)		outs(%arg4 : memref<?x?xf32>)
return		return
}		}
}		}

// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 16, -d0 + s1)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 16, -s1 + s2)>
// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (-d0 + s0, 16)>


// CHECK: func @sequence_of_matmul		// CHECK: func @sequence_of_matmul
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: memref<?x?xf32>		// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: memref<?x?xf32>
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index		// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG0]], %[[C0]]		// CHECK-DAG: %[[M:.+]] = memref.dim %[[ARG0]], %[[C0]]
// CHECK-DAG: %[[N1:.+]] = memref.dim %[[ARG1]], %[[C1]]		// CHECK-DAG: %[[N1:.+]] = memref.dim %[[ARG1]], %[[C1]]
// CHECK-DAG: %[[N2:.+]] = memref.dim %[[ARG2]], %[[C1]]		// CHECK-DAG: %[[N2:.+]] = memref.dim %[[ARG2]], %[[C1]]
// CHECK: %[[ALLOC1:.+]] = memref.alloc(%[[M]], %[[N1]])		// CHECK: %[[ALLOC1:.+]] = memref.alloc(%[[M]], %[[N1]])
// CHECK: %[[ALLOC2:.+]] = memref.alloc(%[[M]], %[[N2]])		// CHECK: %[[ALLOC2:.+]] = memref.alloc(%[[M]], %[[N2]])
// CHECK: scf.parallel (%[[IV0:.+]]) = (%[[C0]]) to (%[[M]])		// CHECK: scf.parallel (%[[IV0:.+]]) = (%[[C0]]) to (%[[M]])
// CHECK-SAME: step (%[[C16]]) {		// CHECK-SAME: step (%[[C16]]) {
// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M:.+]] = affine.min #[[MAP0]]()[%[[M]], %[[IV0]]]
// CHECK: %[[SV_ALLOC3:.+]] = memref.subview %[[ALLOC2]][%[[IV0]], 0]		// CHECK: %[[SV_ALLOC3:.+]] = memref.subview %[[ALLOC2]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[N2]]]		// CHECK-SAME: [%[[TILE_M]], %[[N2]]]
// CHECK: %[[N3:.+]] = memref.dim %[[ARG4]], %[[C1]]		// CHECK: %[[N3:.+]] = memref.dim %[[ARG4]], %[[C1]]
// CHECK: %[[SV_ARG4:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]		// CHECK: %[[SV_ARG4:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M]], %[[N3]]]		// CHECK-SAME: [%[[TILE_M]], %[[N3]]]
// CHECK: %[[M_2:.+]] = memref.dim %[[ARG4]], %[[C0]]		// CHECK: %[[M_2:.+]] = memref.dim %[[ARG4]], %[[C0]]
// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP2]](%[[IV0]])[%[[M_2]], %[[M]]]		// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP2]]()[%[[M_2]], %[[IV0]], %[[M]]]
// CHECK: %[[SV_ARG4_2:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]		// CHECK: %[[SV_ARG4_2:.+]] = memref.subview %[[ARG4]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_3]], %[[N3]]]		// CHECK-SAME: [%[[TILE_M_3]], %[[N3]]]
// CHECK: %[[TILE_M_4:.+]] = affine.min #[[MAP3]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M_4:.+]] = affine.min #[[MAP3]]()[%[[M]], %[[IV0]]]
// CHECK: %[[SV_ALLOC1:.+]] = memref.subview %[[ALLOC1]][%[[IV0]], 0]		// CHECK: %[[SV_ALLOC1:.+]] = memref.subview %[[ALLOC1]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_4]], %[[N1]]]		// CHECK-SAME: [%[[TILE_M_4]], %[[N1]]]
// CHECK: %[[SV_ALLOC2:.+]] = memref.subview %[[ALLOC2]][%[[IV0]], 0]		// CHECK: %[[SV_ALLOC2:.+]] = memref.subview %[[ALLOC2]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_4]], %[[N2]]]		// CHECK-SAME: [%[[TILE_M_4]], %[[N2]]]
// CHECK: %[[TILE_M_5:.+]] = affine.min #[[MAP2]](%[[IV0]])[%[[M]], %[[M]]]		// CHECK: %[[TILE_M_5:.+]] = affine.min #[[MAP2]]()[%[[M]], %[[IV0]], %[[M]]]
// CHECK: %[[N0:.+]] = memref.dim %[[ARG0]], %[[C1]]		// CHECK: %[[N0:.+]] = memref.dim %[[ARG0]], %[[C1]]
// CHECK: %[[SV_ARG0:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]		// CHECK: %[[SV_ARG0:.+]] = memref.subview %[[ARG0]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_5]], %[[N0]]]		// CHECK-SAME: [%[[TILE_M_5]], %[[N0]]]
// CHECK: %[[SV_ALLOC4:.+]] = memref.subview %[[ALLOC1]][%[[IV0]], 0]		// CHECK: %[[SV_ALLOC4:.+]] = memref.subview %[[ALLOC1]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_5]], %[[N1]]]		// CHECK-SAME: [%[[TILE_M_5]], %[[N1]]]
// CHECK: linalg.fill(%{{.+}}, %[[SV_ALLOC1]])		// CHECK: linalg.fill(%{{.+}}, %[[SV_ALLOC1]])
// CHECK: linalg.matmul ins(%[[SV_ARG0]], %[[ARG1]]		// CHECK: linalg.matmul ins(%[[SV_ARG0]], %[[ARG1]]
// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32>)		// CHECK-SAME: : memref<?x?xf32, #[[MAP1]]>, memref<?x?xf32>)
func @tensor_matmul_fusion(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>,
%1 = linalg.matmul ins(%0, %arg3 : tensor<?x?xf32>, tensor<?x?xf32>)		%1 = linalg.matmul ins(%0, %arg3 : tensor<?x?xf32>, tensor<?x?xf32>)
outs(%arg4 : tensor<?x?xf32>) -> tensor<?x?xf32> // [M, N1] * [N1, N2]		outs(%arg4 : tensor<?x?xf32>) -> tensor<?x?xf32> // [M, N1] * [N1, N2]
%2 = linalg.matmul ins(%1, %arg5 : tensor<?x?xf32>, tensor<?x?xf32>)		%2 = linalg.matmul ins(%1, %arg5 : tensor<?x?xf32>, tensor<?x?xf32>)
outs(%arg6 : tensor<?x?xf32>) -> tensor<?x?xf32> // [M, N2] * [N2, N3]		outs(%arg6 : tensor<?x?xf32>) -> tensor<?x?xf32> // [M, N2] * [N2, N3]
return %2 : tensor<?x?xf32>		return %2 : tensor<?x?xf32>
}		}
}		}

// CHECK: #[[MAP0:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 16, -d0 + s1)>		// CHECK: #[[MAP1:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 16, -s1 + s2)>

// CHECK: func @tensor_matmul_fusion(		// CHECK: func @tensor_matmul_fusion(
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG5:[a-zA-Z0-9_]+]]: tensor<?x?xf32>		// CHECK-SAME: %[[ARG5:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
// CHECK-SAME: %[[ARG6:[a-zA-Z0-9_]+]]: tensor<?x?xf32>) -> tensor<?x?xf32> {		// CHECK-SAME: %[[ARG6:[a-zA-Z0-9_]+]]: tensor<?x?xf32>) -> tensor<?x?xf32> {
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK: %[[M:.+]] = tensor.dim %[[ARG0]], %c0 : tensor<?x?xf32>		// CHECK: %[[M:.+]] = tensor.dim %[[ARG0]], %c0 : tensor<?x?xf32>
// CHECK: %[[R0:.+]] = scf.for %[[IV0:[a-zA-Z0-9_]+]] =		// CHECK: %[[R0:.+]] = scf.for %[[IV0:[a-zA-Z0-9_]+]] =
// CHECK-SAME: iter_args(%[[ARG8:.+]] = %[[ARG6]]) -> (tensor<?x?xf32>) {		// CHECK-SAME: iter_args(%[[ARG8:.+]] = %[[ARG6]]) -> (tensor<?x?xf32>) {
// CHECK: %[[TILE_M_1:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[M]]]		// CHECK: %[[TILE_M_1:.+]] = affine.min #[[MAP0]]()[%[[M]], %[[IV0]]]
// CHECK: %[[N3:.+]] = tensor.dim %[[ARG8]], %[[C1]]		// CHECK: %[[N3:.+]] = tensor.dim %[[ARG8]], %[[C1]]
// CHECK: %[[STARG6:.+]] = tensor.extract_slice %[[ARG8]][%[[IV0]], 0]		// CHECK: %[[STARG6:.+]] = tensor.extract_slice %[[ARG8]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_1]], %[[N3]]]		// CHECK-SAME: [%[[TILE_M_1]], %[[N3]]]
// CHECK: %[[TILE_M_2:.+]] = affine.min #[[MAP1]](%[[IV0]])[%[[M]], %[[M]]]		// CHECK: %[[TILE_M_2:.+]] = affine.min #[[MAP1]]()[%[[M]], %[[IV0]], %[[M]]]
// CHECK: %[[N2:.+]] = tensor.dim %[[ARG4]], %[[C1]]		// CHECK: %[[N2:.+]] = tensor.dim %[[ARG4]], %[[C1]]
// CHECK: %[[STARG4:.+]] = tensor.extract_slice %[[ARG4]][%[[IV0]], 0]		// CHECK: %[[STARG4:.+]] = tensor.extract_slice %[[ARG4]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_2]], %[[N2]]]		// CHECK-SAME: [%[[TILE_M_2]], %[[N2]]]
// CHECK: %[[N0:.+]] = tensor.dim %[[ARG0]], %[[C1]]		// CHECK: %[[N0:.+]] = tensor.dim %[[ARG0]], %[[C1]]
// CHECK: %[[STARG0:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], 0]		// CHECK: %[[STARG0:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], 0]
// CHECK-SAME: [%[[TILE_M_2]], %[[N0]]]		// CHECK-SAME: [%[[TILE_M_2]], %[[N0]]]
// CHECK: %[[N1:.+]] = tensor.dim %[[ARG2]], %[[C1]]		// CHECK: %[[N1:.+]] = tensor.dim %[[ARG2]], %[[C1]]
// CHECK: %[[STARG2:.+]] = tensor.extract_slice %[[ARG2]][%[[IV0]], 0]		// CHECK: %[[STARG2:.+]] = tensor.extract_slice %[[ARG2]][%[[IV0]], 0]

mlir/test/Dialect/Linalg/fusion-tensor-pattern.mlir

	// RUN: mlir-opt %s -test-linalg-tensor-fusion-transform-patterns -resolve-shaped-type-result-dims -canonicalize -cse --split-input-file | FileCheck %s			// RUN: mlir-opt %s -test-linalg-tensor-fusion-transform-patterns -resolve-shaped-type-result-dims -canonicalize -cse --split-input-file | FileCheck %s
	// RUN: mlir-opt %s -test-linalg-tiled-loop-fusion-transform-patterns -resolve-shaped-type-result-dims -canonicalize -cse --split-input-file | FileCheck %s --check-prefix=TLOOP			// RUN: mlir-opt %s -test-linalg-tiled-loop-fusion-transform-patterns -resolve-shaped-type-result-dims -canonicalize -cse --split-input-file | FileCheck %s --check-prefix=TLOOP

	module {			module {
	func @matmul_fusion(%A: tensor<?x?xf32>, %B: tensor<?x?xf32>,			func @matmul_fusion(%A: tensor<?x?xf32>, %B: tensor<?x?xf32>,
	%AB_init: tensor<?x?xf32>, %C: tensor<?x?xf32>,			%AB_init: tensor<?x?xf32>, %C: tensor<?x?xf32>,
	%ABC_init: tensor<?x?xf32>) -> tensor<?x?xf32> {			%ABC_init: tensor<?x?xf32>) -> tensor<?x?xf32> {
	%AB = linalg.matmul ins(%A, %B : tensor<?x?xf32>, tensor<?x?xf32>)			%AB = linalg.matmul ins(%A, %B : tensor<?x?xf32>, tensor<?x?xf32>)
	outs(%AB_init : tensor<?x?xf32>) -> tensor<?x?xf32> // <MxN1> <N1xN2>			outs(%AB_init : tensor<?x?xf32>) -> tensor<?x?xf32> // <MxN1> <N1xN2>
	%ABC = linalg.matmul {__internal_linalg_transform__ = "lhs_fusion"}			%ABC = linalg.matmul {__internal_linalg_transform__ = "lhs_fusion"}
	ins(%AB, %C : tensor<?x?xf32>, tensor<?x?xf32>)			ins(%AB, %C : tensor<?x?xf32>, tensor<?x?xf32>)
	outs(%ABC_init : tensor<?x?xf32>) -> tensor<?x?xf32> // <MxN2> <N2xN3>			outs(%ABC_init : tensor<?x?xf32>) -> tensor<?x?xf32> // <MxN2> <N2xN3>
	return %ABC : tensor<?x?xf32>			return %ABC : tensor<?x?xf32>
	}			}
	}			}
	// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0] -> (32, -d0 + s0)>			// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (32, s0 - s1)>
	// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>			// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
	// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0] -> (64, -d0 + s0)>			// CHECK-DAG: #[[MAP3:.+]] = affine_map<()[s0, s1] -> (64, s0 - s1)>
	// CHECK-DAG: #[[MAP5:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 32, -d0 + s1)>			// CHECK-DAG: #[[MAP5:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 32, -s1 + s2)>

	// CHECK: func @matmul_fusion			// CHECK: func @matmul_fusion
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: tensor<?x?xf32>			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
	// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: tensor<?x?xf32>			// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
	// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: tensor<?x?xf32>			// CHECK-SAME: %[[ARG4:[a-zA-Z0-9_]+]]: tensor<?x?xf32>

	// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
	// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index			// CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
	// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index			// CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
	// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index			// CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
	// CHECK-DAG: %[[M:.+]] = tensor.dim %[[ARG0]], %[[C0]]			// CHECK-DAG: %[[M:.+]] = tensor.dim %[[ARG0]], %[[C0]]
	// CHECK: %[[RESULT:.+]] = scf.for %[[IV0:[a-zA-Z0-9]+]] =			// CHECK: %[[RESULT:.+]] = scf.for %[[IV0:[a-zA-Z0-9]+]] =
	// CHECK-SAME: %[[C0]] to %[[M]] step %[[C32]]			// CHECK-SAME: %[[C0]] to %[[M]] step %[[C32]]
	// CHECK-SAME: iter_args(%[[ARG6:.+]] = %[[ARG4]]) -> (tensor<?x?xf32>) {			// CHECK-SAME: iter_args(%[[ARG6:.+]] = %[[ARG4]]) -> (tensor<?x?xf32>) {
	// CHECK: %[[TILE_M_2:.+]] = affine.min #[[MAP1]](%[[IV0]])[%[[M]]]			// CHECK: %[[TILE_M_2:.+]] = affine.min #[[MAP1]]()[%[[M]], %[[IV0]]]
	// CHECK: %[[N3:.+]] = tensor.dim %[[ARG6]], %[[C1]]			// CHECK: %[[N3:.+]] = tensor.dim %[[ARG6]], %[[C1]]
	// CHECK: %[[ST_ARG6:.+]] = tensor.extract_slice %[[ARG6]][%[[IV0]], 0]			// CHECK: %[[ST_ARG6:.+]] = tensor.extract_slice %[[ARG6]][%[[IV0]], 0]
	// CHECK-SAME: [%[[TILE_M_2]], %[[N3]]]			// CHECK-SAME: [%[[TILE_M_2]], %[[N3]]]
	// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP5]](%[[IV0]])[%[[M]], %[[M]]]			// CHECK: %[[TILE_M_3:.+]] = affine.min #[[MAP5]]()[%[[M]], %[[IV0]], %[[M]]]
	// CHECK: %[[N1:.+]] = tensor.dim %[[ARG0]], %[[C1]]			// CHECK: %[[N1:.+]] = tensor.dim %[[ARG0]], %[[C1]]
	// CHECK: %[[ST_ARG0:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], 0]			// CHECK: %[[ST_ARG0:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], 0]
	// CHECK-SAME: [%[[TILE_M_3]], %[[N1]]]			// CHECK-SAME: [%[[TILE_M_3]], %[[N1]]]
	// CHECK: %[[N2_2:.+]] = tensor.dim %[[ARG2]], %[[C1]]			// CHECK: %[[N2_2:.+]] = tensor.dim %[[ARG2]], %[[C1]]
	// CHECK: %[[ST_ARG2:.+]] = tensor.extract_slice %[[ARG2]][%[[IV0]], 0]			// CHECK: %[[ST_ARG2:.+]] = tensor.extract_slice %[[ARG2]][%[[IV0]], 0]
	// CHECK-SAME: [%[[TILE_M_3]], %[[N2_2]]]			// CHECK-SAME: [%[[TILE_M_3]], %[[N2_2]]]
	// CHECK: %[[LHS:.+]] = linalg.matmul			// CHECK: %[[LHS:.+]] = linalg.matmul
	// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_producer"			// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion_producer"
	// CHECK-SAME: ins(%[[ST_ARG0]], %[[ARG1]] : tensor<?x?xf32>, tensor<?x?xf32>)			// CHECK-SAME: ins(%[[ST_ARG0]], %[[ARG1]] : tensor<?x?xf32>, tensor<?x?xf32>)
	// CHECK-SAME: outs(%[[ST_ARG2]] : tensor<?x?xf32>)			// CHECK-SAME: outs(%[[ST_ARG2]] : tensor<?x?xf32>)
	// CHECK: %[[N2:.+]] = tensor.dim %[[ARG1]], %[[C1]]			// CHECK: %[[N2:.+]] = tensor.dim %[[ARG1]], %[[C1]]
	// CHECK: %[[N3_2:.+]] = tensor.dim %[[ARG3]], %[[C1]]			// CHECK: %[[N3_2:.+]] = tensor.dim %[[ARG3]], %[[C1]]
	// CHECK: %[[YIELD0:.+]] = scf.for %[[IV1:[a-zA-Z0-9]+]] =			// CHECK: %[[YIELD0:.+]] = scf.for %[[IV1:[a-zA-Z0-9]+]] =
	// CHECK-SAME: %[[C0]] to %[[N3_2]] step %[[C64]]			// CHECK-SAME: %[[C0]] to %[[N3_2]] step %[[C64]]
	// CHECK-SAME: iter_args(%[[ARG8:.+]] = %[[ST_ARG6]]) -> (tensor<?x?xf32>) {			// CHECK-SAME: iter_args(%[[ARG8:.+]] = %[[ST_ARG6]]) -> (tensor<?x?xf32>) {
	// CHECK: %[[YIELD1:.+]] = scf.for %[[IV2:[a-zA-Z0-9]+]] =			// CHECK: %[[YIELD1:.+]] = scf.for %[[IV2:[a-zA-Z0-9]+]] =
	// CHECK-SAME: %[[C0]] to %[[N2]] step %[[C16]]			// CHECK-SAME: %[[C0]] to %[[N2]] step %[[C16]]
	// CHECK-SAME: iter_args(%[[ARG10:.+]] = %[[ARG8]]) -> (tensor<?x?xf32>) {			// CHECK-SAME: iter_args(%[[ARG10:.+]] = %[[ARG8]]) -> (tensor<?x?xf32>) {
	// CHECK: %[[TILE_N2:.+]] = affine.min #[[MAP2]](%[[IV2]])[%[[N2]]]			// CHECK: %[[TILE_N2:.+]] = affine.min #[[MAP2]]()[%[[N2]], %[[IV2]]]
	// CHECK: %[[ST_LHS:.+]] = tensor.extract_slice %[[LHS]][0, %[[IV2]]]			// CHECK: %[[ST_LHS:.+]] = tensor.extract_slice %[[LHS]][0, %[[IV2]]]
	// CHECK-SAME: [%[[TILE_M_3]], %[[TILE_N2]]]			// CHECK-SAME: [%[[TILE_M_3]], %[[TILE_N2]]]
	// CHECK: %[[TILE_N3:.+]] = affine.min #[[MAP3]](%[[IV1]])[%[[N3_2]]]			// CHECK: %[[TILE_N3:.+]] = affine.min #[[MAP3]]()[%[[N3_2]], %[[IV1]]]
	// CHECK: %[[ST_ARG3:.+]] = tensor.extract_slice %[[ARG3]][%[[IV2]], %[[IV1]]]			// CHECK: %[[ST_ARG3:.+]] = tensor.extract_slice %[[ARG3]][%[[IV2]], %[[IV1]]]
	// CHECK-SAME: [%[[TILE_N2]], %[[TILE_N3]]]			// CHECK-SAME: [%[[TILE_N2]], %[[TILE_N3]]]
	// CHECK: %[[M_4:.+]] = tensor.dim %[[ARG10]], %[[C0]]			// CHECK: %[[M_4:.+]] = tensor.dim %[[ARG10]], %[[C0]]
	// CHECK: %[[ST_ARG4:.+]] = tensor.extract_slice %[[ARG10]][0, %[[IV1]]]			// CHECK: %[[ST_ARG4:.+]] = tensor.extract_slice %[[ARG10]][0, %[[IV1]]]
	// CHECK-SAME: [%[[M_4]], %[[TILE_N3]]]			// CHECK-SAME: [%[[M_4]], %[[TILE_N3]]]
	// CHECK: %[[ST_RESULT:.+]] = linalg.matmul			// CHECK: %[[ST_RESULT:.+]] = linalg.matmul
	// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion"			// CHECK-SAME: __internal_linalg_transform__ = "after_lhs_fusion"
	// CHECK-SAME: ins(%[[ST_LHS]], %[[ST_ARG3]]			// CHECK-SAME: ins(%[[ST_LHS]], %[[ST_ARG3]]

mlir/test/Dialect/Linalg/fusion.mlir

scf.for %arg6 = %c0 to %0 step %c3 {
memref<?x?xf32, offset: ?, strides: [?, ?]>)		memref<?x?xf32, offset: ?, strides: [?, ?]>)
outs(%8 : memref<?x?xf32, offset: ?, strides: [?, ?]>)		outs(%8 : memref<?x?xf32, offset: ?, strides: [?, ?]>)
}		}
}		}
}		}
return %E : memref<?x?xf32, offset: 0, strides: [?, ?]>		return %E : memref<?x?xf32, offset: 0, strides: [?, ?]>
}		}

// CHECK-DAG: #[[BOUND_2_MAP:.+]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>		// CHECK-DAG: #[[BOUND_2_MAP:.+]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
// CHECK-DAG: #[[BOUND_2_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 2, -d0 + s1)>		// CHECK-DAG: #[[BOUND_2_MAP_2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 2, -s1 + s2)>
// CHECK-DAG: #[[BOUND_4_MAP:.+]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>		// CHECK-DAG: #[[BOUND_4_MAP:.+]] = affine_map<()[s0, s1] -> (4, s0 - s1)>
// CHECK: func @f5		// CHECK: func @f5
// CHECK-SAME: (%[[A:.*]]:{{.*}}, %[[B:.*]]:{{.*}}, %[[C:.*]]:{{.*}}, %[[D:.*]]:{{.*}}, %[[E:.*]]:{{.*}})		// CHECK-SAME: (%[[A:.*]]:{{.*}}, %[[B:.*]]:{{.*}}, %[[C:.*]]:{{.*}}, %[[D:.*]]:{{.*}}, %[[E:.*]]:{{.*}})
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[A_0:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>		// CHECK-DAG: %[[A_0:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>
// CHECK-DAG: %[[B_1:.*]] = memref.dim %[[B]], %[[C1]] : memref<?x?xf32, #[[$strided2D]]>		// CHECK-DAG: %[[B_1:.*]] = memref.dim %[[B]], %[[C1]] : memref<?x?xf32, #[[$strided2D]]>
// CHECK-DAG: %[[C_0:.*]] = memref.dim %[[C]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>		// CHECK-DAG: %[[C_0:.*]] = memref.dim %[[C]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>
// CHECK-DAG: %[[D_0:.*]] = memref.dim %[[D]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>		// CHECK-DAG: %[[D_0:.*]] = memref.dim %[[D]], %[[C0]] : memref<?x?xf32, #[[$strided2D]]>
// CHECK-DAG: %[[D_1:.*]] = memref.dim %[[D]], %[[C1]] : memref<?x?xf32, #[[$strided2D]]>		// CHECK-DAG: %[[D_1:.*]] = memref.dim %[[D]], %[[C1]] : memref<?x?xf32, #[[$strided2D]]>
// CHECK-DAG: %[[B_00:.*]] = memref.subview %[[B]][0, 0]{{.*}}		// CHECK-DAG: %[[B_00:.*]] = memref.subview %[[B]][0, 0]{{.*}}
// CHECK: scf.for %[[I:.*]] = %{{.*}} to %[[D_0]] step %{{.*}} {		// CHECK: scf.for %[[I:.*]] = %{{.*}} to %[[D_0]] step %{{.*}} {
// CHECK: %[[BOUND_2_C0:.+]] = affine.min #[[BOUND_2_MAP]](%[[I]])[%[[C_0]]]		// CHECK: %[[BOUND_2_C0:.+]] = affine.min #[[BOUND_2_MAP]]()[%[[C_0]], %[[I]]]
// CHECK: %[[C_I0:.*]] = memref.subview %[[C]][%[[I]], 0] [%[[BOUND_2_C0]]		// CHECK: %[[C_I0:.*]] = memref.subview %[[C]][%[[I]], 0] [%[[BOUND_2_C0]]
// CHECK: %[[BOUND_ID_C0:.+]] = affine.min #[[BOUND_2_MAP_2]](%[[I]])[%[[A_0]], %[[C_0]]]		// CHECK: %[[BOUND_ID_C0:.+]] = affine.min #[[BOUND_2_MAP_2]]()[%[[A_0]], %[[I]], %[[C_0]]]
// CHECK: %[[A_I0:.*]] = memref.subview %[[A]][%[[I]], 0]		// CHECK: %[[A_I0:.*]] = memref.subview %[[A]][%[[I]], 0]
// CHECK: %[[C_I0_OUT:.*]] = memref.subview %[[C]][%[[I]], 0] [%[[BOUND_ID_C0]]		// CHECK: %[[C_I0_OUT:.*]] = memref.subview %[[C]][%[[I]], 0] [%[[BOUND_ID_C0]]
// CHECK: scf.for %[[J:.*]] = %{{.*}} to %[[B_1]] step %{{.*}} {		// CHECK: scf.for %[[J:.*]] = %{{.*}} to %[[B_1]] step %{{.*}} {
// CHECK: %[[E_IJ:.*]] = memref.subview %[[E]][%[[I]], %[[J]]]		// CHECK: %[[E_IJ:.*]] = memref.subview %[[E]][%[[I]], %[[J]]]
// CHECK: scf.for %[[K:.*]] = %{{.*}} to %[[D_1]] step %{{.*}} {		// CHECK: scf.for %[[K:.*]] = %{{.*}} to %[[D_1]] step %{{.*}} {
// CHECK: %[[D_IK:.*]] = memref.subview %[[D]][%[[I]], %[[K]]] [2, 4]		// CHECK: %[[D_IK:.*]] = memref.subview %[[D]][%[[I]], %[[K]]] [2, 4]
// CHECK: %[[B_KJ:.*]] = memref.subview %[[B]][%[[K]], %[[J]]]		// CHECK: %[[B_KJ:.*]] = memref.subview %[[B]][%[[K]], %[[J]]]
// CHECK: %[[BOUND_4_B1:.*]] = affine.min #[[BOUND_4_MAP]](%[[K]])[%[[B_1]]]		// CHECK: %[[BOUND_4_B1:.*]] = affine.min #[[BOUND_4_MAP]]()[%[[B_1]], %[[K]]]
// CHECK: %[[B_0K:.*]] = memref.subview %[[B]][0, %[[K]]]		// CHECK: %[[B_0K:.*]] = memref.subview %[[B]][0, %[[K]]]
// CHECK: %[[D_IK_OUT:.+]] = memref.subview %[[D]][%[[I]], %[[K]]] [%[[BOUND_2_C0]], %[[BOUND_4_B1]]]		// CHECK: %[[D_IK_OUT:.+]] = memref.subview %[[D]][%[[I]], %[[K]]] [%[[BOUND_2_C0]], %[[BOUND_4_B1]]]
// CHECK: linalg.matmul ins(%[[A_I0]], %[[B_00]]{{.*}} outs(%[[C_I0_OUT]]		// CHECK: linalg.matmul ins(%[[A_I0]], %[[B_00]]{{.*}} outs(%[[C_I0_OUT]]
// CHECK: linalg.matmul ins(%[[C_I0]], %[[B_0K]]{{.*}} outs(%[[D_IK_OUT]]		// CHECK: linalg.matmul ins(%[[C_I0]], %[[B_0K]]{{.*}} outs(%[[D_IK_OUT]]
// CHECK: linalg.matmul ins(%[[D_IK]], %[[B_KJ]]{{.*}} outs(%[[E_IJ]]		// CHECK: linalg.matmul ins(%[[D_IK]], %[[B_KJ]]{{.*}} outs(%[[E_IJ]]

// -----		// -----


mlir/test/Dialect/Linalg/hoist-padding.mlir

// Specific structural checks are performed on 2-level hoisting		// Specific structural checks are performed on 2-level hoisting
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=2 -canonicalize | FileCheck %s		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=2 -canonicalize | FileCheck %s

// IR verification is performed on [0-6]-level hoisting		// IR verification is performed on [0-6]-level hoisting
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=0 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=0 | FileCheck %s --check-prefix=VERIFIER-ONLY
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=1 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=1 | FileCheck %s --check-prefix=VERIFIER-ONLY
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=3 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=3 | FileCheck %s --check-prefix=VERIFIER-ONLY
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=4 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=4 | FileCheck %s --check-prefix=VERIFIER-ONLY
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=5 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=5 | FileCheck %s --check-prefix=VERIFIER-ONLY
// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=6 | FileCheck %s --check-prefix=VERIFIER-ONLY		// RUN: mlir-opt %s -split-input-file -test-linalg-transform-patterns=test-hoist-padding=6 | FileCheck %s --check-prefix=VERIFIER-ONLY

// CHECK-DAG: #[[$DIV3:[0-9a-z]+]] = affine_map<(d0) -> (d0 ceildiv 3)>
// CHECK-DAG: #[[$DIV4:[0-9a-z]+]] = affine_map<(d0) -> (d0 ceildiv 4)>
// CHECK-DAG: #[[$DIVS3:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 3)>		// CHECK-DAG: #[[$DIVS3:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 3)>
// CHECK-DAG: #[[$DIVS4:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 4)>		// CHECK-DAG: #[[$DIVS4:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 4)>
#map0 = affine_map<(d0)[s0] -> (2, -d0 + s0)>		#map0 = affine_map<(d0)[s0] -> (2, -d0 + s0)>
#map1 = affine_map<(d0)[s0] -> (4, -d0 + s0)>		#map1 = affine_map<(d0)[s0] -> (4, -d0 + s0)>
#map2 = affine_map<(d0)[s0] -> (3, -d0 + s0)>		#map2 = affine_map<(d0)[s0] -> (3, -d0 + s0)>
#map3 = affine_map<(d0, d1) -> (2, d0 - d1)>		#map3 = affine_map<(d0, d1) -> (2, d0 - d1)>
#map4 = affine_map<(d0, d1) -> (3, d0 - d1)>		#map4 = affine_map<(d0, d1) -> (3, d0 - d1)>

func @matmul_tensors(

// CHECK: scf.for %[[I:[0-9a-z]+]] =		// CHECK: scf.for %[[I:[0-9a-z]+]] =
// First padded tensor is MxKx2x4 under loop M so Kx2x4		// First padded tensor is MxKx2x4 under loop M so Kx2x4
// CHECK: %[[SZpad0_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[dK]]]		// CHECK: %[[SZpad0_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[dK]]]
// CHECK: linalg.init_tensor [%[[SZpad0_K]], 2, 4] : tensor<?x2x4xf32>		// CHECK: linalg.init_tensor [%[[SZpad0_K]], 2, 4] : tensor<?x2x4xf32>
// 1-D loop		// 1-D loop
// CHECK: %[[A:.*]] = scf.for %[[J1:[0-9a-z]+]] =		// CHECK: %[[A:.*]] = scf.for %[[J1:[0-9a-z]+]] =
// Iteration count along J1		// Iteration count along J1
// CHECK: %[[IDXpad0_K:[0-9]+]] = affine.apply #[[$DIV4]](%[[J1]])		// CHECK: %[[IDXpad0_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[J1]]]
// CHECK: tensor.extract_slice %{{.*}} [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: tensor.extract_slice %{{.*}} [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: linalg.pad_tensor %{{.*}}		// CHECK: linalg.pad_tensor %{{.*}}
// CHECK: : tensor<?x?xf32> to tensor<2x4xf32>		// CHECK: : tensor<?x?xf32> to tensor<2x4xf32>
// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%[[IDXpad0_K]], 0, 0]		// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%[[IDXpad0_K]], 0, 0]
// CHECK-SAME: [1, 2, 4] [1, 1, 1] : tensor<2x4xf32> into tensor<?x2x4xf32>		// CHECK-SAME: [1, 2, 4] [1, 1, 1] : tensor<2x4xf32> into tensor<?x2x4xf32>
// Second tensor is KxN but loop order is (M, N, K) so padded tensor is NxKx4x3		// Second tensor is KxN but loop order is (M, N, K) so padded tensor is NxKx4x3
// CHECK: %[[SZpad1_N:[0-9]+]] = affine.apply #[[$DIVS3]]()[%[[dN]]]		// CHECK: %[[SZpad1_N:[0-9]+]] = affine.apply #[[$DIVS3]]()[%[[dN]]]
// CHECK: %[[SZpad1_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[dK]]]		// CHECK: %[[SZpad1_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[dK]]]
// CHECK: linalg.init_tensor [%[[SZpad1_N]], %[[SZpad1_K]], 4, 3] : tensor<?x?x4x3xf32>		// CHECK: linalg.init_tensor [%[[SZpad1_N]], %[[SZpad1_K]], 4, 3] : tensor<?x?x4x3xf32>
// 2-D loop		// 2-D loop
// CHECK: %[[B:.*]] = scf.for %[[K2:[0-9a-z]+]] =		// CHECK: %[[B:.*]] = scf.for %[[K2:[0-9a-z]+]] =
// Iteration count along K2		// Iteration count along K2
// CHECK: %[[IDXpad1_K:[0-9]+]] = affine.apply #[[$DIV3]](%[[K2]])		// CHECK: %[[IDXpad1_K:[0-9]+]] = affine.apply #[[$DIVS3]]()[%[[K2]]]
// CHECK: scf.for %[[J2:[0-9a-z]+]] =		// CHECK: scf.for %[[J2:[0-9a-z]+]] =
// Iteration count along J2		// Iteration count along J2
// CHECK: %[[IDXpad1_N:[0-9]+]] = affine.apply #[[$DIV4]](%[[J2]])		// CHECK: %[[IDXpad1_N:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[J2]]]
// CHECK: tensor.extract_slice %{{.*}} [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: tensor.extract_slice %{{.*}} [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: linalg.pad_tensor %{{.*}}		// CHECK: linalg.pad_tensor %{{.*}}
// CHECK: : tensor<?x?xf32> to tensor<4x3xf32>		// CHECK: : tensor<?x?xf32> to tensor<4x3xf32>
// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%[[IDXpad1_K]], %[[IDXpad1_N]], 0, 0]		// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%[[IDXpad1_K]], %[[IDXpad1_N]], 0, 0]
// CHECK-SAME: [1, 1, 4, 3] [1, 1, 1, 1] : tensor<4x3xf32> into tensor<?x?x4x3xf32>		// CHECK-SAME: [1, 1, 4, 3] [1, 1, 1, 1] : tensor<4x3xf32> into tensor<?x?x4x3xf32>
// 2-D loop		// 2-D loop
// CHECK: scf.for %[[J:[0-9a-zA-Z]+]]		// CHECK: scf.for %[[J:[0-9a-zA-Z]+]]
// CHECK: scf.for %[[K:[0-9a-zA-Z]+]]		// CHECK: scf.for %[[K:[0-9a-zA-Z]+]]
// Iteration count along K		// Iteration count along K
// CHECK: %[[IDXpad0_K:[0-9]+]] = affine.apply #[[$DIV4]](%[[K]])		// CHECK: %[[IDXpad0_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[K]]]
// CHECK: %[[stA:.*]] = tensor.extract_slice %[[A]][%[[IDXpad0_K]], 0, 0] [1, 2, 4] [1, 1, 1] :		// CHECK: %[[stA:.*]] = tensor.extract_slice %[[A]][%[[IDXpad0_K]], 0, 0] [1, 2, 4] [1, 1, 1] :
// CHECK-SAME: tensor<?x2x4xf32> to tensor<2x4xf32>		// CHECK-SAME: tensor<?x2x4xf32> to tensor<2x4xf32>
// Iteration count along J		// Iteration count along J
// CHECK: %[[IDXpad1_N:[0-9]+]] = affine.apply #[[$DIV3]](%[[J]])		// CHECK: %[[IDXpad1_N:[0-9]+]] = affine.apply #[[$DIVS3]]()[%[[J]]]
// Iteration count along K		// Iteration count along K
// CHECK: %[[IDXpad1_K:[0-9]+]] = affine.apply #[[$DIV4]](%[[K]])		// CHECK: %[[IDXpad1_K:[0-9]+]] = affine.apply #[[$DIVS4]]()[%[[K]]]
// CHECK: %[[stB:.*]] = tensor.extract_slice %[[B]][%[[IDXpad1_N]], %[[IDXpad1_K]], 0, 0] [1, 1, 4, 3] [1, 1, 1, 1] :		// CHECK: %[[stB:.*]] = tensor.extract_slice %[[B]][%[[IDXpad1_N]], %[[IDXpad1_K]], 0, 0] [1, 1, 4, 3] [1, 1, 1, 1] :
// CHECK-SAME: tensor<?x?x4x3xf32> to tensor<4x3xf32>		// CHECK-SAME: tensor<?x?x4x3xf32> to tensor<4x3xf32>
// CHECK: %[[stC:.*]] = linalg.pad_tensor %{{.*}}		// CHECK: %[[stC:.*]] = linalg.pad_tensor %{{.*}}
// CHECK: : tensor<?x?xf32> to tensor<2x3xf32>		// CHECK: : tensor<?x?xf32> to tensor<2x3xf32>
// CHECK: linalg.matmul ins(%[[stA]], %[[stB]] : tensor<2x4xf32>, tensor<4x3xf32>)		// CHECK: linalg.matmul ins(%[[stA]], %[[stB]] : tensor<2x4xf32>, tensor<4x3xf32>)
// CHECK-SAME: outs(%[[stC]] : tensor<2x3xf32>) -> tensor<2x3xf32>		// CHECK-SAME: outs(%[[stC]] : tensor<2x3xf32>) -> tensor<2x3xf32>
%3 = scf.for %arg3 = %c0 to %0 step %c2 iter_args(%arg4 = %arg2) -> (tensor<?x?xf32>) {		%3 = scf.for %arg3 = %c0 to %0 step %c2 iter_args(%arg4 = %arg2) -> (tensor<?x?xf32>) {
%4 = scf.for %arg5 = %c0 to %2 step %c3 iter_args(%arg6 = %arg4) -> (tensor<?x?xf32>) {		%4 = scf.for %arg5 = %c0 to %2 step %c3 iter_args(%arg6 = %arg4) -> (tensor<?x?xf32>) {
[41 lines not shown]
scf.yield %4 : tensor<?x?xf32>		scf.yield %4 : tensor<?x?xf32>
}		}
return %3 : tensor<?x?xf32>		return %3 : tensor<?x?xf32>
}		}

// -----		// -----


// CHECK-DAG: #[[$MIN_REST8:[0-9a-z]+]] = affine_map<(d0)[s0] -> (8, -d0 + s0)>		// CHECK-DAG: #[[$MIN_REST8:[0-9a-z]+]] = affine_map<()[s0, s1] -> (8, s0 - s1)>
// CHECK-DAG: #[[$MIN_REST4:[0-9a-z]+]] = affine_map<(d0, d1) -> (4, d0 - d1)>		// CHECK-DAG: #[[$MIN_REST4:[0-9a-z]+]] = affine_map<()[s0, s1] -> (4, s0 - s1)>
// CHECK-DAG: #[[$MIN_REST2:[0-9a-z]+]] = affine_map<(d0, d1) -> (2, d0 - d1)>		// CHECK-DAG: #[[$MIN_REST2:[0-9a-z]+]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
// CHECK-DAG: #[[$DIV4:[0-9a-z]+]] = affine_map<(d0) -> (d0 ceildiv 4)>		// CHECK-DAG: #[[$DIVS4:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 4)>
// CHECK-DAG: #[[$DIV2:[0-9a-z]+]] = affine_map<(d0) -> (d0 ceildiv 2)>		// CHECK-DAG: #[[$DIV2:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 2)>
#map0 = affine_map<(d0)[s0] -> (8, -d0 + s0)>		#map0 = affine_map<(d0)[s0] -> (8, -d0 + s0)>
#map1 = affine_map<(d0, d1) -> (4, d0 - d1)>		#map1 = affine_map<(d0, d1) -> (4, d0 - d1)>
#map2 = affine_map<(d0, d1) -> (2, d0 - d1)>		#map2 = affine_map<(d0, d1) -> (2, d0 - d1)>

// CHECK-LABEL: func @dot		// CHECK-LABEL: func @dot
// VERIFIER-ONLY-LABEL: func @dot		// VERIFIER-ONLY-LABEL: func @dot
func @dot(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<f32>)		func @dot(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<f32>)
-> tensor<f32>		-> tensor<f32>
{		{
%c8 = arith.constant 8 : index		%c8 = arith.constant 8 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%1 = tensor.dim %arg0, %c0 : tensor<?xf32>		%1 = tensor.dim %arg0, %c0 : tensor<?xf32>
%2 = tensor.dim %arg0, %c0 : tensor<?xf32>		%2 = tensor.dim %arg0, %c0 : tensor<?xf32>
%3 = tensor.dim %arg1, %c0 : tensor<?xf32>		%3 = tensor.dim %arg1, %c0 : tensor<?xf32>

// CHECK: scf.for %[[I:[0-9a-z]+]] =		// CHECK: scf.for %[[I:[0-9a-z]+]] =
//		//
// CHECK: %[[MR8:.*]] = affine.min #[[$MIN_REST8]](%[[I]])		// CHECK: %[[MR8:.*]] = affine.min #[[$MIN_REST8]]()[%{{.*}}, %[[I]]]
// CHECK: %[[D0:.*]] = affine.apply #[[$DIV4]](%[[MR8]])		// CHECK: %[[D0:.*]] = affine.apply #[[$DIVS4]]()[%[[MR8]]]
// Init tensor and pack.		// Init tensor and pack.
// CHECK: %[[INIT_PACKED_A:.*]] = linalg.init_tensor [%[[D0]], 2, 2] : tensor<?x2x2xf32>		// CHECK: %[[INIT_PACKED_A:.*]] = linalg.init_tensor [%[[D0]], 2, 2] : tensor<?x2x2xf32>
// CHECK: %[[CAST_INIT_PACKED_A:.*]] = tensor.cast %[[INIT_PACKED_A]] : tensor<?x2x2xf32> to tensor<?x?x2xf32>		// CHECK: %[[CAST_INIT_PACKED_A:.*]] = tensor.cast %[[INIT_PACKED_A]] : tensor<?x2x2xf32> to tensor<?x?x2xf32>
// CHECK: %[[PACKED_A:.*]] = scf.for %[[II:[0-9a-z]+]] = {{.*}} iter_args(%{{.*}} = %[[CAST_INIT_PACKED_A]]) -> (tensor<?x?x2xf32>) {		// CHECK: %[[PACKED_A:.*]] = scf.for %[[II:[0-9a-z]+]] = {{.*}} iter_args(%{{.*}} = %[[CAST_INIT_PACKED_A]]) -> (tensor<?x?x2xf32>) {
// CHECK: scf.for %[[III:[0-9a-z]+]] =		// CHECK: scf.for %[[III:[0-9a-z]+]] =
// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%{{.*}}, %{{.*}}, 0] [1, 1, 2] [1, 1, 1] : tensor<2xf32> into tensor<?x?x2xf32>		// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%{{.*}}, %{{.*}}, 0] [1, 1, 2] [1, 1, 1] : tensor<2xf32> into tensor<?x?x2xf32>
//		//
// CHECK: %[[D0_2:.*]] = affine.apply #[[$DIV4]](%[[MR8]])		// CHECK: %[[D0_2:.*]] = affine.apply #[[$DIVS4]]()[%[[MR8]]]
// Init tensor and pack.		// Init tensor and pack.
// CHECK: %[[INIT_PACKED_B:.*]] = linalg.init_tensor [%[[D0_2]], 2, 2] : tensor<?x2x2xf32>		// CHECK: %[[INIT_PACKED_B:.*]] = linalg.init_tensor [%[[D0_2]], 2, 2] : tensor<?x2x2xf32>
// CHECK: %[[CAST_INIT_PACKED_B:.*]] = tensor.cast %[[INIT_PACKED_B]] : tensor<?x2x2xf32> to tensor<?x?x2xf32>		// CHECK: %[[CAST_INIT_PACKED_B:.*]] = tensor.cast %[[INIT_PACKED_B]] : tensor<?x2x2xf32> to tensor<?x?x2xf32>
// CHECK: %[[PACKED_B:.*]] = scf.for %[[II_2:[0-9a-z]+]] = {{.*}} iter_args(%{{.*}} = %[[CAST_INIT_PACKED_B]]) -> (tensor<?x?x2xf32>) {		// CHECK: %[[PACKED_B:.*]] = scf.for %[[II_2:[0-9a-z]+]] = {{.*}} iter_args(%{{.*}} = %[[CAST_INIT_PACKED_B]]) -> (tensor<?x?x2xf32>) {
// CHECK: scf.for %[[III_2:[0-9a-z]+]] =		// CHECK: scf.for %[[III_2:[0-9a-z]+]] =
// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%{{.*}}, %{{.*}}, 0] [1, 1, 2] [1, 1, 1] : tensor<2xf32> into tensor<?x?x2xf32>		// CHECK: tensor.insert_slice %{{.*}} into %{{.*}}[%{{.*}}, %{{.*}}, 0] [1, 1, 2] [1, 1, 1] : tensor<2xf32> into tensor<?x?x2xf32>
// Compute.		// Compute.
// CHECK: scf.for %[[II_3:[0-9a-z]+]] =		// CHECK: scf.for %[[II_3:[0-9a-z]+]] =
// CHECK: scf.for %[[III_3:[0-9a-z]+]] = {{.*}} iter_args(%[[C:.*]] = %{{.*}}) -> (tensor<f32>) {		// CHECK: scf.for %[[III_3:[0-9a-z]+]] = {{.*}} iter_args(%[[C:.*]] = %{{.*}}) -> (tensor<f32>) {
// CHECK: %[[IDX0:.*]] = affine.apply #[[$DIV4]](%[[II_3]])		// CHECK: %[[IDX0:.*]] = affine.apply #[[$DIVS4]]()[%[[II_3]]]
// CHECK: %[[IDX1:.*]] = affine.apply #[[$DIV2]](%[[III_3]])		// CHECK: %[[IDX1:.*]] = affine.apply #[[$DIV2]]()[%[[III_3]]]
// CHECK: %[[A:.*]] = tensor.extract_slice %[[PACKED_A]][%[[IDX0]], %[[IDX1]], 0] [1, 1, 2] [1, 1, 1] : tensor<?x?x2xf32> to tensor<2xf32>		// CHECK: %[[A:.*]] = tensor.extract_slice %[[PACKED_A]][%[[IDX0]], %[[IDX1]], 0] [1, 1, 2] [1, 1, 1] : tensor<?x?x2xf32> to tensor<2xf32>
// CHECK: %[[IDX0_2:.*]] = affine.apply #[[$DIV4]](%[[II_3]])		// CHECK: %[[IDX0_2:.*]] = affine.apply #[[$DIVS4]]()[%[[II_3]]]
// CHECK: %[[IDX1_2:.*]] = affine.apply #[[$DIV2]](%[[III_3]])		// CHECK: %[[IDX1_2:.*]] = affine.apply #[[$DIV2]]()[%[[III_3]]]
// CHECK: %[[B:.*]] = tensor.extract_slice %[[PACKED_B]][%[[IDX0_2]], %[[IDX1_2]], 0] [1, 1, 2] [1, 1, 1] : tensor<?x?x2xf32> to tensor<2xf32>		// CHECK: %[[B:.*]] = tensor.extract_slice %[[PACKED_B]][%[[IDX0_2]], %[[IDX1_2]], 0] [1, 1, 2] [1, 1, 1] : tensor<?x?x2xf32> to tensor<2xf32>
// CHECK: linalg.dot ins(%[[A]], %[[B]] : tensor<2xf32>, tensor<2xf32>) outs(%[[C]] : tensor<f32>) -> tensor<f32>		// CHECK: linalg.dot ins(%[[A]], %[[B]] : tensor<2xf32>, tensor<2xf32>) outs(%[[C]] : tensor<f32>) -> tensor<f32>

%4 = scf.for %arg3 = %c0 to %1 step %c8 iter_args(%arg4 = %arg2) -> (tensor<f32>) {		%4 = scf.for %arg3 = %c0 to %1 step %c8 iter_args(%arg4 = %arg2) -> (tensor<f32>) {
%5 = affine.min #map0(%arg3)[%2]		%5 = affine.min #map0(%arg3)[%2]
%6 = tensor.extract_slice %arg0[%arg3] [%5] [1] : tensor<?xf32> to tensor<?xf32>		%6 = tensor.extract_slice %arg0[%arg3] [%5] [1] : tensor<?xf32> to tensor<?xf32>
%7 = affine.min #map0(%arg3)[%3]		%7 = affine.min #map0(%arg3)[%3]
%8 = tensor.extract_slice %arg1[%arg3] [%7] [1] : tensor<?xf32> to tensor<?xf32>		%8 = tensor.extract_slice %arg1[%arg3] [%7] [1] : tensor<?xf32> to tensor<?xf32>
[89 lines not shown]

mlir/test/Dialect/Linalg/loops.mlir

	// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck %s			// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck %s
	// RUN: mlir-opt %s -convert-linalg-to-parallel-loops | FileCheck --check-prefix=CHECKPARALLEL %s			// RUN: mlir-opt %s -convert-linalg-to-parallel-loops | FileCheck --check-prefix=CHECKPARALLEL %s

	// Test that we can lower all the way to LLVM without crashing, don't check results here.			// Test that we can lower all the way to LLVM without crashing, don't check results here.
	// RUN: mlir-opt %s -convert-linalg-to-loops -convert-linalg-to-llvm -o=/dev/null 2>&1			// RUN: mlir-opt %s -convert-linalg-to-loops -convert-linalg-to-llvm -o=/dev/null 2>&1

	// CHECK-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// CHECK-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
	// CHECK-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// CHECK-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
	// CHECK-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>			// CHECK-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
	// CHECK-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>			// CHECK-DAG: #[[$stride1Dilation1:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>

	// CHECKPARALLEL-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// CHECKPARALLEL-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
	// CHECKPARALLEL-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// CHECKPARALLEL-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
	// CHECKPARALLEL-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>			// CHECKPARALLEL-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
	// CHECKPARALLEL-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>			// CHECKPARALLEL-DAG: #[[$stride1Dilation1:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>

	func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {			func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%c1 = arith.constant 1 : index			%c1 = arith.constant 1 : index
	%A = memref.view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32>			%A = memref.view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32>
	%B = memref.view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32>			%B = memref.view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32>
	%C = memref.view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32>			%C = memref.view %arg0[%c0][%M, %N] : memref<?xi8> to memref<?x?xf32>
	linalg.matmul ins(%A, %B: memref<?x?xf32>, memref<?x?xf32>)			linalg.matmul ins(%A, %B: memref<?x?xf32>, memref<?x?xf32>)
	[675 lines not shown]
	// CHECK-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>			// CHECK-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>
	// CHECK-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>			// CHECK-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>
	// CHECK: %[[c0:.*]] = arith.constant 0 : index			// CHECK: %[[c0:.*]] = arith.constant 0 : index
	// CHECK: %[[c1:.*]] = arith.constant 1 : index			// CHECK: %[[c1:.*]] = arith.constant 1 : index
	// CHECK: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?xf32>			// CHECK: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?xf32>
	// CHECK: %[[dim1:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?xf32>			// CHECK: %[[dim1:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?xf32>
	// CHECK: scf.for %[[b:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {			// CHECK: scf.for %[[b:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {
	// CHECK: scf.for %[[m:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECK: scf.for %[[m:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[b]], %[[m]])			// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[b]], %[[m]]]
	// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]]] : memref<?xf32>			// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]]] : memref<?xf32>
	// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[m]]] : memref<?xf32>			// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[m]]] : memref<?xf32>
	// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[b]]] : memref<?xf32>			// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[b]]] : memref<?xf32>
	// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECK: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>			// CHECK: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>

	// CHECKPARALLEL-LABEL: @conv1d_no_symbols			// CHECKPARALLEL-LABEL: @conv1d_no_symbols
	// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>			// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>
	// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>			// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>
	// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>			// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>
	// CHECKPARALLEL: %[[c0:.*]] = arith.constant 0 : index			// CHECKPARALLEL: %[[c0:.*]] = arith.constant 0 : index
	// CHECKPARALLEL: %[[c1:.*]] = arith.constant 1 : index			// CHECKPARALLEL: %[[c1:.*]] = arith.constant 1 : index
	// CHECKPARALLEL: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?xf32>			// CHECKPARALLEL: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?xf32>
	// CHECKPARALLEL: %[[dim1:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?xf32>			// CHECKPARALLEL: %[[dim1:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?xf32>
	// CHECKPARALLEL: scf.parallel (%[[b:.*]]) = (%[[c0]]) to (%[[dim1]]) step (%[[c1]]) {			// CHECKPARALLEL: scf.parallel (%[[b:.*]]) = (%[[c0]]) to (%[[dim1]]) step (%[[c1]]) {
	// CHECKPARALLEL: scf.for %[[m:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[m:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[b]], %[[m]])			// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[m]], %[[b]]]
	// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]]] : memref<?xf32>			// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]]] : memref<?xf32>
	// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[m]]] : memref<?xf32>			// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[m]]] : memref<?xf32>
	// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[b]]] : memref<?xf32>			// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[b]]] : memref<?xf32>
	// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>			// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>


	[11 lines not shown]
	// CHECK: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?x?xf32>			// CHECK: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?x?xf32>
	// CHECK: %[[dim1:.*]] = memref.dim %[[arg1]], %[[c1]] : memref<?x?xf32>			// CHECK: %[[dim1:.*]] = memref.dim %[[arg1]], %[[c1]] : memref<?x?xf32>
	// CHECK: %[[dim2:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?xf32>			// CHECK: %[[dim2:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?xf32>
	// CHECK: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?xf32>			// CHECK: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?xf32>
	// CHECK: scf.for %[[arg3:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {			// CHECK: scf.for %[[arg3:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {
	// CHECK: scf.for %[[arg4:.*]] = %[[c0]] to %[[dim3]] step %[[c1]] {			// CHECK: scf.for %[[arg4:.*]] = %[[c0]] to %[[dim3]] step %[[c1]] {
	// CHECK: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECK: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECK: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {			// CHECK: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {
	// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg3]], %[[arg5]])			// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg3]], %[[arg5]]]
	// CHECK: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg4]], %[[arg6]])			// CHECK: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg4]], %[[arg6]]]
	// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]]] : memref<?x?xf32>			// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]]] : memref<?x?xf32>

	// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[arg5]], %[[arg6]]] : memref<?x?xf32>			// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[arg5]], %[[arg6]]] : memref<?x?xf32>
	// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>			// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>

	// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECK: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>			// CHECK: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>

	// CHECKPARALLEL-LABEL: @conv2d_no_symbols			// CHECKPARALLEL-LABEL: @conv2d_no_symbols
	// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>			// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>
	// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>			// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>
	// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>			// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>
	// CHECKPARALLEL: %[[c0:.*]] = arith.constant 0 : index			// CHECKPARALLEL: %[[c0:.*]] = arith.constant 0 : index
	// CHECKPARALLEL: %[[c1:.*]] = arith.constant 1 : index			// CHECKPARALLEL: %[[c1:.*]] = arith.constant 1 : index
	// CHECKPARALLEL: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?x?xf32>			// CHECKPARALLEL: %[[dim0:.*]] = memref.dim %[[arg1]], %[[c0]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[dim1:.*]] = memref.dim %[[arg1]], %[[c1]] : memref<?x?xf32>			// CHECKPARALLEL: %[[dim1:.*]] = memref.dim %[[arg1]], %[[c1]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[dim2:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?xf32>			// CHECKPARALLEL: %[[dim2:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?xf32>			// CHECKPARALLEL: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?xf32>
	// CHECKPARALLEL: scf.parallel (%[[arg3:.*]], %[[arg4:.*]]) = (%[[c0]], %[[c0]]) to (%[[dim2]], %[[dim3]]) step (%[[c1]], %[[c1]]) {			// CHECKPARALLEL: scf.parallel (%[[arg3:.*]], %[[arg4:.*]]) = (%[[c0]], %[[c0]]) to (%[[dim2]], %[[dim3]]) step (%[[c1]], %[[c1]]) {
	// CHECKPARALLEL: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECKPARALLEL: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {
	// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg3]], %[[arg5]])			// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg5]], %[[arg3]]]
	// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg4]], %[[arg6]])			// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg6]], %[[arg4]]]
	// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]]] : memref<?x?xf32>			// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[arg5]], %[[arg6]]] : memref<?x?xf32>			// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[arg5]], %[[arg6]]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>			// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>
	// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>			// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]]] : memref<?x?xf32>


	[17 lines not shown]
	// CHECK: %[[dim4:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?x?xf32>			// CHECK: %[[dim4:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?x?xf32>
	// CHECK: %[[dim5:.*]] = memref.dim %[[arg2]], %[[c2]] : memref<?x?x?xf32>			// CHECK: %[[dim5:.*]] = memref.dim %[[arg2]], %[[c2]] : memref<?x?x?xf32>
	// CHECK: scf.for %[[arg3:.*]] = %[[c0]] to %[[dim3]] step %[[c1]] {			// CHECK: scf.for %[[arg3:.*]] = %[[c0]] to %[[dim3]] step %[[c1]] {
	// CHECK: scf.for %[[arg4:.*]] = %[[c0]] to %[[dim4]] step %[[c1]] {			// CHECK: scf.for %[[arg4:.*]] = %[[c0]] to %[[dim4]] step %[[c1]] {
	// CHECK: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim5]] step %[[c1]] {			// CHECK: scf.for %[[arg5:.*]] = %[[c0]] to %[[dim5]] step %[[c1]] {
	// CHECK: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECK: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECK: scf.for %[[arg7:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {			// CHECK: scf.for %[[arg7:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {
	// CHECK: scf.for %[[arg8:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {			// CHECK: scf.for %[[arg8:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {
	// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg3]], %[[arg6]])			// CHECK: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg3]], %[[arg6]]]
	// CHECK: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg4]], %[[arg7]])			// CHECK: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg4]], %[[arg7]]]
	// CHECK: %[[aff3:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg5]], %[[arg8]])			// CHECK: %[[aff3:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg5]], %[[arg8]]]
	// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>			// CHECK: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>

	// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[arg6]], %[[arg7]], %[[arg8]]] : memref<?x?x?xf32>			// CHECK: %[[va:.*]] = memref.load %[[arg1]][%[[arg6]], %[[arg7]], %[[arg8]]] : memref<?x?x?xf32>
	// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>			// CHECK: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>

	// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECK: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECK: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECK: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>			// CHECK: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>
	[10 lines not shown]
	// CHECKPARALLEL: %[[dim2:.*]] = memref.dim %[[arg1]], %[[c2]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[dim2:.*]] = memref.dim %[[arg1]], %[[c2]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[dim3:.*]] = memref.dim %[[arg2]], %[[c0]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[dim4:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[dim4:.*]] = memref.dim %[[arg2]], %[[c1]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[dim5:.*]] = memref.dim %[[arg2]], %[[c2]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[dim5:.*]] = memref.dim %[[arg2]], %[[c2]] : memref<?x?x?xf32>
	// CHECKPARALLEL: scf.parallel (%[[arg3:.*]], %[[arg4:.*]], %[[arg5:.*]]) = (%[[c0]], %[[c0]], %[[c0]]) to (%[[dim3]], %[[dim4]], %[[dim5]]) step (%[[c1]], %[[c1]], %[[c1]]) {			// CHECKPARALLEL: scf.parallel (%[[arg3:.*]], %[[arg4:.*]], %[[arg5:.*]]) = (%[[c0]], %[[c0]], %[[c0]]) to (%[[dim3]], %[[dim4]], %[[dim5]]) step (%[[c1]], %[[c1]], %[[c1]]) {
	// CHECKPARALLEL: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[arg6:.*]] = %[[c0]] to %[[dim0]] step %[[c1]] {
	// CHECKPARALLEL: scf.for %[[arg7:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[arg7:.*]] = %[[c0]] to %[[dim1]] step %[[c1]] {
	// CHECKPARALLEL: scf.for %[[arg8:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {			// CHECKPARALLEL: scf.for %[[arg8:.*]] = %[[c0]] to %[[dim2]] step %[[c1]] {
	// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg3]], %[[arg6]])			// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg6]], %[[arg3]]]
	// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg4]], %[[arg7]])			// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg7]], %[[arg4]]]
	// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stride1Dilation1]](%[[arg5]], %[[arg8]])			// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stride1Dilation1]]()[%[[arg8]], %[[arg5]]]
	// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[vb:.*]] = memref.load %[[arg0]][%[[aff]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[arg6]], %[[arg7]], %[[arg8]]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[va:.*]] = memref.load %[[arg1]][%[[arg6]], %[[arg7]], %[[arg8]]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>			// CHECKPARALLEL: %[[vc:.*]] = memref.load %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>
	// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32			// CHECKPARALLEL: %[[inc:.*]] = arith.mulf %[[vb]], %[[va]] : f32
	// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32			// CHECKPARALLEL: %[[res:.*]] = arith.addf %[[vc]], %[[inc]] : f32
	// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>			// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[arg3]], %[[arg4]], %[[arg5]]] : memref<?x?x?xf32>

	// -----			// -----
	[23 lines not shown]

mlir/test/Dialect/Linalg/pad-and-hoist.mlir

// RUN: mlir-opt %s -test-linalg-transform-patterns="test-pad-pattern pack-paddings=1,1,0 hoist-paddings=2,1,0" -cse -canonicalize -split-input-file | FileCheck %s		// RUN: mlir-opt %s -test-linalg-transform-patterns="test-pad-pattern pack-paddings=1,1,0 hoist-paddings=2,1,0" -cse -canonicalize -split-input-file | FileCheck %s
// RUN: mlir-opt %s -test-linalg-transform-patterns="test-pad-pattern pack-paddings=1,1,0 hoist-paddings=4,3,0" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=CHECK-DOUBLE		// RUN: mlir-opt %s -test-linalg-transform-patterns="test-pad-pattern pack-paddings=1,1,0 hoist-paddings=4,3,0" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=CHECK-DOUBLE

// CHECK-DAG: #[[MAP0:[0-9a-z]+]] = affine_map<(d0) -> (5, -d0 + 24)>		// CHECK-DAG: #[[MAP0:[0-9a-z]+]] = affine_map<()[s0] -> (5, -s0 + 24)>
// CHECK-DAG: #[[MAP1:[0-9a-z]+]] = affine_map<(d0) -> (8, -d0 + 12)>		// CHECK-DAG: #[[MAP1:[0-9a-z]+]] = affine_map<()[s0] -> (8, -s0 + 12)>
// CHECK-DAG: #[[DIV6:[0-9a-z]+]] = affine_map<(d0) -> (d0 ceildiv 6)>		// CHECK-DAG: #[[DIV6:[0-9a-z]+]] = affine_map<()[s0] -> (s0 ceildiv 6)>
#map0 = affine_map<(d0) -> (5, -d0 + 24)>		#map0 = affine_map<(d0) -> (5, -d0 + 24)>
#map1 = affine_map<(d0) -> (8, -d0 + 12)>		#map1 = affine_map<(d0) -> (8, -d0 + 12)>
#map2 = affine_map<(d0) -> (7, -d0 + 25)>		#map2 = affine_map<(d0) -> (7, -d0 + 25)>

// CHECK: single_tiling		// CHECK: single_tiling
// CHECK-DOUBLE: single_tiling		// CHECK-DOUBLE: single_tiling

// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>		// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>
[14 lines not shown; enclosing context: func @single_tiling(%arg0: tensor<24x12xf32>,]
%c5 = arith.constant 5 : index		%c5 = arith.constant 5 : index

// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =
%0 = scf.for %arg3 = %c0 to %c24 step %c5 iter_args(%arg4 = %arg2) -> (tensor<24x25xf32>) {		%0 = scf.for %arg3 = %c0 to %c24 step %c5 iter_args(%arg4 = %arg2) -> (tensor<24x25xf32>) {

// Packing the first input operand for all values of IV2 (IV2x5x6).		// Packing the first input operand for all values of IV2 (IV2x5x6).
// CHECK: = linalg.init_tensor [2, 5, 6]		// CHECK: = linalg.init_tensor [2, 5, 6]
// CHECK: %[[PT0:.*]] = scf.for %[[P0IV2:[0-9a-z]+]] =		// CHECK: %[[PT0:.*]] = scf.for %[[P0IV2:[0-9a-z]+]] =
// CHECK: %[[PIDX0:.*]] = affine.apply #[[DIV6]](%[[P0IV2]])		// CHECK: %[[PIDX0:.*]] = affine.apply #[[DIV6]]()[%[[P0IV2]]]
// CHECK: %[[TS0:.*]] = affine.min #[[MAP0]](%[[IV0]])		// CHECK: %[[TS0:.*]] = affine.min #[[MAP0]]()[%[[IV0]]]
// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]
// CHECK-SAME: %[[IV0]], %[[P0IV2]]		// CHECK-SAME: %[[IV0]], %[[P0IV2]]
// CHECK-SAME: %[[TS0]], 6		// CHECK-SAME: %[[TS0]], 6
// CHECK: %[[V0:.*]] = arith.subi %[[C5]], %[[TS0]]		// CHECK: %[[V0:.*]] = arith.subi %[[C5]], %[[TS0]]
// CHECK: %[[T1:.*]] = linalg.pad_tensor %[[T0]] nofold {{.*}} high[%[[V0]]		// CHECK: %[[T1:.*]] = linalg.pad_tensor %[[T0]] nofold {{.*}} high[%[[V0]]
// CHECK: %[[T2:.*]] = tensor.insert_slice %[[T1:.*]] into %{{.*}}[%[[PIDX0]], 0, 0]		// CHECK: %[[T2:.*]] = tensor.insert_slice %[[T1:.*]] into %{{.*}}[%[[PIDX0]], 0, 0]
// CHECK: scf.yield %[[T2:.*]]		// CHECK: scf.yield %[[T2:.*]]

// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =
%1 = scf.for %arg5 = %c0 to %c25 step %c7 iter_args(%arg6 = %arg4) -> (tensor<24x25xf32>) {		%1 = scf.for %arg5 = %c0 to %c25 step %c7 iter_args(%arg6 = %arg4) -> (tensor<24x25xf32>) {

// Packing the second input operand for all values of IV2 (IV2x6x8).		// Packing the second input operand for all values of IV2 (IV2x6x8).
// CHECK: = linalg.init_tensor [2, 6, 8]		// CHECK: = linalg.init_tensor [2, 6, 8]
// CHECK: %[[PT1:.*]] = scf.for %[[P1IV2:[0-9a-z]+]] =		// CHECK: %[[PT1:.*]] = scf.for %[[P1IV2:[0-9a-z]+]] =
// CHECK: %[[PIDX1:.*]] = affine.apply #[[DIV6]](%[[P1IV2]])		// CHECK: %[[PIDX1:.*]] = affine.apply #[[DIV6]]()[%[[P1IV2]]]
// CHECK: %[[TS1:.*]] = affine.min #[[MAP1]](%[[IV1]])		// CHECK: %[[TS1:.*]] = affine.min #[[MAP1]]()[%[[IV1]]]
// CHECK: %[[T3:.*]] = tensor.extract_slice %[[ARG1]]		// CHECK: %[[T3:.*]] = tensor.extract_slice %[[ARG1]]
// CHECK-SAME: %[[P1IV2]], %[[IV1]]		// CHECK-SAME: %[[P1IV2]], %[[IV1]]
// CHECK-SAME: 6, %[[TS1]]		// CHECK-SAME: 6, %[[TS1]]
// CHECK: %[[V1:.*]] = arith.subi %[[C8]], %[[TS1]]		// CHECK: %[[V1:.*]] = arith.subi %[[C8]], %[[TS1]]
// CHECK: %[[T4:.*]] = linalg.pad_tensor %[[T3]] nofold {{.*}} high[%[[C0]], %[[V1]]		// CHECK: %[[T4:.*]] = linalg.pad_tensor %[[T3]] nofold {{.*}} high[%[[C0]], %[[V1]]
// CHECK: %[[T5:.*]] = tensor.insert_slice %[[T4:.*]] into %{{.*}}[%[[PIDX1]], 0, 0]		// CHECK: %[[T5:.*]] = tensor.insert_slice %[[T4:.*]] into %{{.*}}[%[[PIDX1]], 0, 0]
// CHECK: scf.yield %[[T5:.*]]		// CHECK: scf.yield %[[T5:.*]]

// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] = {{.*}} iter_args(%[[ARG4:.*]] =		// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] = {{.*}} iter_args(%[[ARG4:.*]] =
%2 = scf.for %arg7 = %c0 to %c12 step %c6 iter_args(%arg8 = %arg6) -> (tensor<24x25xf32>) {		%2 = scf.for %arg7 = %c0 to %c12 step %c6 iter_args(%arg8 = %arg6) -> (tensor<24x25xf32>) {
%3 = affine.min #map0(%arg3)		%3 = affine.min #map0(%arg3)
// Index the packed operands.		// Index the packed operands.
// CHECK-DAG: %[[IDX:.*]] = affine.apply #[[DIV6]](%[[IV2]])		// CHECK-DAG: %[[IDX:.*]] = affine.apply #[[DIV6]]()[%[[IV2]]]
// CHECK-DAG: %[[T6:.*]] = tensor.extract_slice %[[PT0]][%[[IDX]]		// CHECK-DAG: %[[T6:.*]] = tensor.extract_slice %[[PT0]][%[[IDX]]
// CHECK-DAG: %[[T7:.*]] = tensor.extract_slice %[[PT1]][%[[IDX]]		// CHECK-DAG: %[[T7:.*]] = tensor.extract_slice %[[PT1]][%[[IDX]]
%4 = tensor.extract_slice %arg0[%arg3, %arg7] [%3, 6] [1, 1] : tensor<24x12xf32> to tensor<?x6xf32>		%4 = tensor.extract_slice %arg0[%arg3, %arg7] [%3, 6] [1, 1] : tensor<24x12xf32> to tensor<?x6xf32>
%5 = affine.min #map1(%arg5)		%5 = affine.min #map1(%arg5)
%6 = tensor.extract_slice %arg1[%arg7, %arg5] [6, %5] [1, 1] : tensor<12x25xf32> to tensor<6x?xf32>		%6 = tensor.extract_slice %arg1[%arg7, %arg5] [6, %5] [1, 1] : tensor<12x25xf32> to tensor<6x?xf32>

// Pad the output operand without setting the nofold attribute.		// Pad the output operand without setting the nofold attribute.
// CHECK-DAG: %[[T8:.*]] = tensor.extract_slice %[[ARG4]][%[[IV0]], %[[IV1]]		// CHECK-DAG: %[[T8:.*]] = tensor.extract_slice %[[ARG4]][%[[IV0]], %[[IV1]]

mlir/test/Dialect/Linalg/reshape_fusion.mlir

Show First 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	^bb0(%arg3: i32, %arg4: i32, %s: i32):
%6 = arith.index_cast %idx2 : index to i32		%6 = arith.index_cast %idx2 : index to i32
%7 = arith.addi %5, %6 : i32		%7 = arith.addi %5, %6 : i32
linalg.yield %7 : i32		linalg.yield %7 : i32
} -> tensor<?x?x?xi32>		} -> tensor<?x?x?xi32>
return %1 : tensor<?x?x?xi32>		return %1 : tensor<?x?x?xi32>
}		}

// Only check the body in the indexed version of the test.		// Only check the body in the indexed version of the test.
// CHECK: #[[MAP:.+]] = affine_map<(d0, d1) -> (d0 + d1 * 4)>		// CHECK: #[[MAP:.+]] = affine_map<()[s0, s1] -> (s0 + s1 * 4)>
// CHECK: func @indexed_consumer_reshape_producer_fusion		// CHECK: func @indexed_consumer_reshape_producer_fusion
// CHECK: linalg.generic		// CHECK: linalg.generic
// CHECK: ^{{.*}}(		// CHECK: ^{{.*}}(
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: i32, %[[ARG4:[a-zA-Z0-9]+]]: i32,		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: i32, %[[ARG4:[a-zA-Z0-9]+]]: i32,
// CHECK-SAME: %[[ARG8:[a-zA-Z0-9]+]]: i32)		// CHECK-SAME: %[[ARG8:[a-zA-Z0-9]+]]: i32)
// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index		// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index
// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index		// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index
// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index		// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index
// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index		// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index
// CHECK-DAG: %[[T3:.+]] = affine.apply #[[MAP]](%[[IDX1]], %[[IDX0]])		// CHECK-DAG: %[[T3:.+]] = affine.apply #[[MAP]]()[%[[IDX1]], %[[IDX0]]]
// CHECK: %[[T4:.+]] = arith.muli %[[ARG3]], %[[ARG4]]		// CHECK: %[[T4:.+]] = arith.muli %[[ARG3]], %[[ARG4]]
// CHECK: %[[T5:.+]] = arith.index_cast %[[T3]]		// CHECK: %[[T5:.+]] = arith.index_cast %[[T3]]
// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]]		// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]]
// CHECK: %[[T7:.+]] = arith.index_cast %[[IDX2]]		// CHECK: %[[T7:.+]] = arith.index_cast %[[IDX2]]
// CHECK: %[[T8:.+]] = arith.addi %[[T6]], %[[T7]]		// CHECK: %[[T8:.+]] = arith.addi %[[T6]], %[[T7]]
// CHECK: %[[T9:.+]] = arith.index_cast %[[IDX3]]		// CHECK: %[[T9:.+]] = arith.index_cast %[[IDX3]]
// CHECK: %[[T10:.+]] = arith.addi %[[T8]], %[[T9]]		// CHECK: %[[T10:.+]] = arith.addi %[[T8]], %[[T9]]
// CHECK: linalg.yield %[[T10]]		// CHECK: linalg.yield %[[T10]]
Show All 21 Lines	^bb0(%arg3: i32, %arg4: i32, %s: i32): // no predecessors
linalg.yield %5 : i32		linalg.yield %5 : i32
} -> tensor<?x?xi32>		} -> tensor<?x?xi32>
%1 = linalg.tensor_expand_shape %0 [[0], [1, 2, 3]] :		%1 = linalg.tensor_expand_shape %0 [[0], [1, 2, 3]] :
tensor<?x?xi32> into tensor<?x?x4x5xi32>		tensor<?x?xi32> into tensor<?x?x4x5xi32>
return %1 : tensor<?x?x4x5xi32>		return %1 : tensor<?x?x4x5xi32>
}		}

// Only check the body in the indexed version of the test.		// Only check the body in the indexed version of the test.
// CHECK: #[[MAP:.+]] = affine_map<(d0, d1, d2) -> (d0 + d1 * 5 + d2 * 20)>		// CHECK: #[[MAP:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * 5 + s2 * 20)>
// CHECK: func @indexed_producer_reshape_consumer_fusion		// CHECK: func @indexed_producer_reshape_consumer_fusion
// CHECK: linalg.generic		// CHECK: linalg.generic
// CHECK: ^{{.*}}(		// CHECK: ^{{.*}}(
// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: i32, %[[ARG4:[a-zA-Z0-9]+]]: i32,		// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: i32, %[[ARG4:[a-zA-Z0-9]+]]: i32,
// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: i32)		// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: i32)
// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index		// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index
// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index		// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index
// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index		// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index
// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index		// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index
// CHECK-DAG: %[[T3:.+]] = affine.apply #[[MAP]](%[[IDX3]], %[[IDX2]], %[[IDX1]])		// CHECK-DAG: %[[T3:.+]] = affine.apply #[[MAP]]()[%[[IDX3]], %[[IDX2]], %[[IDX1]]]
// CHECK: %[[T4:.+]] = arith.muli %[[ARG3]], %[[ARG4]]		// CHECK: %[[T4:.+]] = arith.muli %[[ARG3]], %[[ARG4]]
// CHECK: %[[T5:.+]] = arith.index_cast %[[IDX0]]		// CHECK: %[[T5:.+]] = arith.index_cast %[[IDX0]]
// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]]		// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]]
// CHECK: %[[T7:.+]] = arith.index_cast %[[T3]]		// CHECK: %[[T7:.+]] = arith.index_cast %[[T3]]
// CHECK: %[[T8:.+]] = arith.addi %[[T6]], %[[T7]]		// CHECK: %[[T8:.+]] = arith.addi %[[T6]], %[[T7]]
// CHECK: linalg.yield %[[T8]]		// CHECK: linalg.yield %[[T8]]

// -----		// -----
Show All 26 Lines	%d = linalg.tensor_expand_shape %c [[0, 1], [2], [3, 4, 5]]
: tensor<6x4x210xi32> into tensor<2x3x4x5x6x7xi32>		: tensor<6x4x210xi32> into tensor<2x3x4x5x6x7xi32>
return %d : tensor<2x3x4x5x6x7xi32>		return %d : tensor<2x3x4x5x6x7xi32>
}		}


// CHECK-DAG: #[[MAP5:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d3, d4, d0, d1, d5)>		// CHECK-DAG: #[[MAP5:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d3, d4, d0, d1, d5)>
// CHECK-DAG: #[[MAP6:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d3, d4, d5)>		// CHECK-DAG: #[[MAP6:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d3, d4, d5)>
// CHECK-DAG: #[[MAP7:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d5, d2, d3, d4)>		// CHECK-DAG: #[[MAP7:.+]] = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d5, d2, d3, d4)>
// CHECK-DAG: #[[MAP8:.+]] = affine_map<(d0, d1) -> (d0 + d1 * 3)>		// CHECK-DAG: #[[MAP8:.+]] = affine_map<()[s0, s1] -> (s0 + s1 * 3)>
// CHECK-DAG: #[[MAP9:.+]] = affine_map<(d0, d1, d2) -> (d0 + d1 * 7 + d2 * 42)>		// CHECK-DAG: #[[MAP9:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * 7 + s2 * 42)>
// CHECK: func @reshape_as_consumer_permutation		// CHECK: func @reshape_as_consumer_permutation
// CHECK-SAME: %[[ARG0:.+]]: tensor<210x6x4xi32>		// CHECK-SAME: %[[ARG0:.+]]: tensor<210x6x4xi32>
// CHECK-SAME: %[[ARG1:.+]]: tensor<210x4xi32>		// CHECK-SAME: %[[ARG1:.+]]: tensor<210x4xi32>
// CHECK-DAG: %[[T1:.+]] = linalg.tensor_expand_shape %[[ARG0]]		// CHECK-DAG: %[[T1:.+]] = linalg.tensor_expand_shape %[[ARG0]]
// CHECK-SAME: [0, 1, 2], [3, 4], [5]		// CHECK-SAME: [0, 1, 2], [3, 4], [5]
// CHECK-DAG: %[[T2:.+]] = linalg.tensor_expand_shape %[[ARG1]]		// CHECK-DAG: %[[T2:.+]] = linalg.tensor_expand_shape %[[ARG1]]
// CHECK-SAME: [0, 1, 2], [3]		// CHECK-SAME: [0, 1, 2], [3]
// CHECK-DAG: %[[T0:.+]] = linalg.init_tensor [2, 3, 4, 5, 6, 7]		// CHECK-DAG: %[[T0:.+]] = linalg.init_tensor [2, 3, 4, 5, 6, 7]
// CHECK: %[[T4:.+]] = linalg.generic		// CHECK: %[[T4:.+]] = linalg.generic
// CHECK-SAME: indexing_maps = [#[[MAP5]], #[[MAP6]], #[[MAP7]]]		// CHECK-SAME: indexing_maps = [#[[MAP5]], #[[MAP6]], #[[MAP7]]]
// CHECK-SAME: ins(%[[T1]], %[[T2]] : tensor<5x6x7x2x3x4xi32>, tensor<5x6x7x4xi32>)		// CHECK-SAME: ins(%[[T1]], %[[T2]] : tensor<5x6x7x2x3x4xi32>, tensor<5x6x7x4xi32>)
// CHECK-SAME: outs(%[[T0]] : tensor<2x3x4x5x6x7xi32>)		// CHECK-SAME: outs(%[[T0]] : tensor<2x3x4x5x6x7xi32>)
// CHECK: ^{{.+}}(		// CHECK: ^{{.+}}(
// CHECK-SAME: %[[ARG8:[a-zA-Z0-9]+]]: i32, %[[ARG9:[a-zA-Z0-9]+]]: i32,		// CHECK-SAME: %[[ARG8:[a-zA-Z0-9]+]]: i32, %[[ARG9:[a-zA-Z0-9]+]]: i32,
// CHECK-SAME: %[[ARG10:[a-zA-Z0-9]+]]: i32)		// CHECK-SAME: %[[ARG10:[a-zA-Z0-9]+]]: i32)
// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index		// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index
// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index		// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index
// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index		// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index
// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index		// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index
// CHECK-DAG: %[[IDX4:.+]] = linalg.index 4 : index		// CHECK-DAG: %[[IDX4:.+]] = linalg.index 4 : index
// CHECK-DAG: %[[IDX5:.+]] = linalg.index 5 : index		// CHECK-DAG: %[[IDX5:.+]] = linalg.index 5 : index
// CHECK-DAG: %[[T5:.+]] = affine.apply #[[MAP8]](%[[IDX1]], %[[IDX0]])		// CHECK-DAG: %[[T5:.+]] = affine.apply #[[MAP8]]()[%[[IDX1]], %[[IDX0]]]
// CHECK-DAG: %[[T6:.+]] = affine.apply #[[MAP9]](%[[IDX4]], %[[IDX3]], %[[IDX2]])		// CHECK-DAG: %[[T6:.+]] = affine.apply #[[MAP9]]()[%[[IDX4]], %[[IDX3]], %[[IDX2]]]
// CHECK-DAG: %[[T7:.+]] = arith.addi %[[ARG8]], %[[ARG9]]		// CHECK-DAG: %[[T7:.+]] = arith.addi %[[ARG8]], %[[ARG9]]
// CHECK: %[[T8:.+]] = arith.index_cast %[[T5]]		// CHECK: %[[T8:.+]] = arith.index_cast %[[T5]]
// CHECK: %[[T9:.+]] = arith.addi %[[T7]], %[[T8]]		// CHECK: %[[T9:.+]] = arith.addi %[[T7]], %[[T8]]
// CHECK: %[[T10:.+]] = arith.index_cast %[[T6]]		// CHECK: %[[T10:.+]] = arith.index_cast %[[T6]]
// CHECK: %[[T11:.+]] = arith.addi %[[T9]], %[[T10]]		// CHECK: %[[T11:.+]] = arith.addi %[[T9]], %[[T10]]
// CHECK: %[[T12:.+]] = arith.index_cast %[[IDX5]]		// CHECK: %[[T12:.+]] = arith.index_cast %[[IDX5]]
// CHECK: %[[T13:.+]] = arith.addi %[[T11]], %[[T12]]		// CHECK: %[[T13:.+]] = arith.addi %[[T11]], %[[T12]]

Show All 22 Lines	^bb0(%arg1: i32, %s: i32): // no predecessors
%7 = arith.addi %5, %6 : i32		%7 = arith.addi %5, %6 : i32
linalg.yield %7 : i32		linalg.yield %7 : i32
} -> tensor<264x?x4xi32>		} -> tensor<264x?x4xi32>
return %1 : tensor<264x?x4xi32>		return %1 : tensor<264x?x4xi32>
}		}

// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>		// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>		// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1) -> (d0 + d1 * 8)>		// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1] -> (s0 + s1 * 8)>
// CHECK: @reshape_as_producer_projected_permutation		// CHECK: @reshape_as_producer_projected_permutation
// CHECK-SAME: %[[ARG0:.+]]: tensor<33x8x?xi32>		// CHECK-SAME: %[[ARG0:.+]]: tensor<33x8x?xi32>
// CHECK: %[[RES:.+]] = linalg.generic		// CHECK: %[[RES:.+]] = linalg.generic
// CHECK-SAME: indexing_maps = [#[[MAP0]], #[[MAP1]]]		// CHECK-SAME: indexing_maps = [#[[MAP0]], #[[MAP1]]]
// CHECK-SAME: ins(%[[ARG0]] : tensor<33x8x?xi32>)		// CHECK-SAME: ins(%[[ARG0]] : tensor<33x8x?xi32>)
// CHECK: ^{{.+}}(		// CHECK: ^{{.+}}(
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: i32,		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: i32,
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: i32)		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: i32)
// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index		// CHECK-DAG: %[[IDX0:.+]] = linalg.index 0 : index
// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index		// CHECK-DAG: %[[IDX1:.+]] = linalg.index 1 : index
// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index		// CHECK-DAG: %[[IDX2:.+]] = linalg.index 2 : index
// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index		// CHECK-DAG: %[[IDX3:.+]] = linalg.index 3 : index
// CHECK-DAG: %[[T0:.+]] = affine.apply #[[MAP2]](%[[IDX1]], %[[IDX0]])		// CHECK-DAG: %[[T0:.+]] = affine.apply #[[MAP2]]()[%[[IDX1]], %[[IDX0]]]
// CHECK: %[[T1:.+]] = arith.index_cast %[[T0]] : index to i32		// CHECK: %[[T1:.+]] = arith.index_cast %[[T0]] : index to i32
// CHECK: %[[T2:.+]] = arith.addi %[[ARG1]], %[[T1]] : i32		// CHECK: %[[T2:.+]] = arith.addi %[[ARG1]], %[[T1]] : i32
// CHECK: %[[T3:.+]] = arith.index_cast %[[IDX2]] : index to i32		// CHECK: %[[T3:.+]] = arith.index_cast %[[IDX2]] : index to i32
// CHECK: %[[T4:.+]] = arith.addi %[[T2]], %[[T3]] : i32		// CHECK: %[[T4:.+]] = arith.addi %[[T2]], %[[T3]] : i32
// CHECK: %[[T5:.+]] = arith.index_cast %[[IDX3]] : index to i32		// CHECK: %[[T5:.+]] = arith.index_cast %[[IDX3]] : index to i32
// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]] : i32		// CHECK: %[[T6:.+]] = arith.addi %[[T4]], %[[T5]] : i32
// CHECK: linalg.yield %[[T6]] : i32		// CHECK: linalg.yield %[[T6]] : i32
// CHECK: %[[RES2:.+]] = linalg.tensor_collapse_shape %[[RES]]		// CHECK: %[[RES2:.+]] = linalg.tensor_collapse_shape %[[RES]]

mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir

// RUN: mlir-opt %s -linalg-tile-and-fuse-tensor-ops="tile-sizes=5,4,7 tile-interchange=1,0,2" -cse -split-input-file | FileCheck %s		// RUN: mlir-opt %s -linalg-tile-and-fuse-tensor-ops="tile-sizes=5,4,7 tile-interchange=1,0,2" -cse -split-input-file | FileCheck %s

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0) -> (5, -d0 + 24)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> (5, -s0 + 24)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0) -> (7, -d0 + 12)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (7, -s0 + 12)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1) -> (d0, -d1 + 24)>		// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1] -> (s0, -s1 + 24)>
// CHECK-DAG: #[[MAP3:.*]] = affine_map<(d0, d1) -> (d0, -d1 + 12)>		// CHECK-DAG: #[[MAP3:.*]] = affine_map<()[s0, s1] -> (s0, -s1 + 12)>

// CHECK: fuse_input		// CHECK: fuse_input
// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>		// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>
builtin.func @fuse_input(%arg0: tensor<24x12xf32>,		builtin.func @fuse_input(%arg0: tensor<24x12xf32>,
%arg1: tensor<12x25xf32>,		%arg1: tensor<12x25xf32>,
%arg2: tensor<24x25xf32>) -> tensor<24x25xf32> {		%arg2: tensor<24x25xf32>) -> tensor<24x25xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c12 = arith.constant 12 : index		%c12 = arith.constant 12 : index
%c25 = arith.constant 25 : index		%c25 = arith.constant 25 : index
%c24 = arith.constant 24 : index		%c24 = arith.constant 24 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
%0 = linalg.fill(%cst, %arg0) : f32, tensor<24x12xf32> -> tensor<24x12xf32>		%0 = linalg.fill(%cst, %arg0) : f32, tensor<24x12xf32> -> tensor<24x12xf32>

// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =
// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =
// CHECK: %[[TS1:.*]] = affine.min #[[MAP0]](%[[IV1]])		// CHECK: %[[TS1:.*]] = affine.min #[[MAP0]]()[%[[IV1]]]
// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =
// CHECK: %[[TS2:.*]] = affine.min #[[MAP1]](%[[IV2]])		// CHECK: %[[TS2:.*]] = affine.min #[[MAP1]]()[%[[IV2]]]

// Tile both input operand dimensions.		// Tile both input operand dimensions.
// CHECK: %[[UB1:.*]] = affine.min #[[MAP2]](%[[TS1]], %[[IV1]])		// CHECK: %[[UB1:.*]] = affine.min #[[MAP2]]()[%[[TS1]], %[[IV1]]]
// CHECK: %[[UB2:.*]] = affine.min #[[MAP3]](%[[TS2]], %[[IV2]])		// CHECK: %[[UB2:.*]] = affine.min #[[MAP3]]()[%[[TS2]], %[[IV2]]]
// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]
// CHECK-SAME: %[[IV1]], %[[IV2]]		// CHECK-SAME: %[[IV1]], %[[IV2]]
// CHECK-SAME: %[[UB1]], %[[UB2]]		// CHECK-SAME: %[[UB1]], %[[UB2]]
// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])		// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
// CHECK: %{{.*}} = linalg.matmul ins(%[[T1]]		// CHECK: %{{.*}} = linalg.matmul ins(%[[T1]]
%1 = linalg.matmul ins(%0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%arg2 : tensor<24x25xf32>) -> tensor<24x25xf32>		%1 = linalg.matmul ins(%0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%arg2 : tensor<24x25xf32>) -> tensor<24x25xf32>
return %1 : tensor<24x25xf32>		return %1 : tensor<24x25xf32>
}		}

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0) -> (5, -d0 + 24)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> (5, -s0 + 24)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0) -> (4, -d0 + 25)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (4, -s0 + 25)>

// CHECK: fuse_output		// CHECK: fuse_output
// CHECK-SAME: %[[ARG2:[0-9a-zA-Z]*]]: tensor<24x25xf32>		// CHECK-SAME: %[[ARG2:[0-9a-zA-Z]*]]: tensor<24x25xf32>
builtin.func @fuse_output(%arg0: tensor<24x12xf32>,		builtin.func @fuse_output(%arg0: tensor<24x12xf32>,
%arg1: tensor<12x25xf32>,		%arg1: tensor<12x25xf32>,
%arg2: tensor<24x25xf32>) -> tensor<24x25xf32> {		%arg2: tensor<24x25xf32>) -> tensor<24x25xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c12 = arith.constant 12 : index		%c12 = arith.constant 12 : index
%c25 = arith.constant 25 : index		%c25 = arith.constant 25 : index
%c24 = arith.constant 24 : index		%c24 = arith.constant 24 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
%0 = linalg.fill(%cst, %arg2) : f32, tensor<24x25xf32> -> tensor<24x25xf32>		%0 = linalg.fill(%cst, %arg2) : f32, tensor<24x25xf32> -> tensor<24x25xf32>

// Update the iteration argument of the outermost tile loop.		// Update the iteration argument of the outermost tile loop.
// CHECK: scf.for %[[IV0:.*]] = {{.*}} iter_args(%[[ARG3:.*]] = %[[ARG2]]		// CHECK: scf.for %[[IV0:.*]] = {{.*}} iter_args(%[[ARG3:.*]] = %[[ARG2]]
// CHECK: scf.for %[[IV1:.*]] = {{.*}} iter_args(%[[ARG4:.*]] = %[[ARG3]]		// CHECK: scf.for %[[IV1:.*]] = {{.*}} iter_args(%[[ARG4:.*]] = %[[ARG3]]
// CHECK: %[[TS1:.*]] = affine.min #[[MAP0]](%[[IV1]])		// CHECK: %[[TS1:.*]] = affine.min #[[MAP0]]()[%[[IV1]]]
// CHECK: %[[TS0:.*]] = affine.min #[[MAP1]](%[[IV0]])		// CHECK: %[[TS0:.*]] = affine.min #[[MAP1]]()[%[[IV0]]]

// Tile both output operand dimensions.		// Tile both output operand dimensions.
// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG4]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG4]]
// CHECK-SAME: %[[IV1]], %[[IV0]]		// CHECK-SAME: %[[IV1]], %[[IV0]]
// CHECK-SAME: %[[TS1]], %[[TS0]]		// CHECK-SAME: %[[TS1]], %[[TS0]]
// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])		// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
// CHECK: scf.for %[[IV2:.*]] = {{.*}} iter_args(%[[ARG5:.*]] = %[[T1]]		// CHECK: scf.for %[[IV2:.*]] = {{.*}} iter_args(%[[ARG5:.*]] = %[[T1]]
// CHECK: %{{.*}} = linalg.matmul {{.*}} outs(%[[ARG5]]		// CHECK: %{{.*}} = linalg.matmul {{.*}} outs(%[[ARG5]]
%1 = linalg.matmul ins(%arg0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%0 : tensor<24x25xf32>) -> tensor<24x25xf32>		%1 = linalg.matmul ins(%arg0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%0 : tensor<24x25xf32>) -> tensor<24x25xf32>
return %1 : tensor<24x25xf32>		return %1 : tensor<24x25xf32>
}		}

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0) -> (4, -d0 + 25)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> (4, -s0 + 25)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0) -> (7, -d0 + 12)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (7, -s0 + 12)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1) -> (d0, -d1 + 25)>		// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1] -> (s0, -s1 + 25)>
// CHECK-DAG: #[[MAP3:.*]] = affine_map<(d0, d1) -> (d0, -d1 + 12)>		// CHECK-DAG: #[[MAP3:.*]] = affine_map<()[s0, s1] -> (s0, -s1 + 12)>
#map0 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>		#map0 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>		#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>

// CHECK: fuse_reduction		// CHECK: fuse_reduction
// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<12x25xf32>		// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<12x25xf32>
// CHECK-SAME: %[[ARG3:[0-9a-zA-Z]*]]: tensor<12x7x25xf32>		// CHECK-SAME: %[[ARG3:[0-9a-zA-Z]*]]: tensor<12x7x25xf32>
builtin.func @fuse_reduction(%arg0: tensor<24x12xf32>,		builtin.func @fuse_reduction(%arg0: tensor<24x12xf32>,
%arg1: tensor<12x25xf32>,		%arg1: tensor<12x25xf32>,
%arg2: tensor<24x25xf32>,		%arg2: tensor<24x25xf32>,
%arg3: tensor<12x7x25xf32>) -> tensor<24x25xf32> {		%arg3: tensor<12x7x25xf32>) -> tensor<24x25xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c12 = arith.constant 12 : index		%c12 = arith.constant 12 : index
%c25 = arith.constant 25 : index		%c25 = arith.constant 25 : index
%c24 = arith.constant 24 : index		%c24 = arith.constant 24 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%0 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "reduction", "parallel"]} ins(%arg3 : tensor<12x7x25xf32>) outs(%arg1 : tensor<12x25xf32>) {		%0 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "reduction", "parallel"]} ins(%arg3 : tensor<12x7x25xf32>) outs(%arg1 : tensor<12x25xf32>) {
^bb0(%arg4: f32, %arg5: f32): // no predecessors		^bb0(%arg4: f32, %arg5: f32): // no predecessors
%2 = arith.addf %arg4, %arg5 : f32		%2 = arith.addf %arg4, %arg5 : f32
linalg.yield %2 : f32		linalg.yield %2 : f32
} -> tensor<12x25xf32>		} -> tensor<12x25xf32>

// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =
// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =
// CHECK: %[[TS0:.*]] = affine.min #[[MAP0]](%[[IV0]])		// CHECK: %[[TS0:.*]] = affine.min #[[MAP0]]()[%[[IV0]]]
// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =
// CHECK: %[[TS2:.*]] = affine.min #[[MAP1]](%[[IV2]])		// CHECK: %[[TS2:.*]] = affine.min #[[MAP1]]()[%[[IV2]]]
// CHECK: %[[UB2:.*]] = affine.min #[[MAP3]](%[[TS2]], %[[IV2]])		// CHECK: %[[UB2:.*]] = affine.min #[[MAP3]]()[%[[TS2]], %[[IV2]]]
// CHECK: %[[UB0:.*]] = affine.min #[[MAP2]](%[[TS0]], %[[IV0]])		// CHECK: %[[UB0:.*]] = affine.min #[[MAP2]]()[%[[TS0]], %[[IV0]]]

// Tile only the parallel dimensions but not the reduction dimension.		// Tile only the parallel dimensions but not the reduction dimension.
// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG3]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG3]]
// CHECK-SAME: %[[IV2]], 0, %[[IV0]]		// CHECK-SAME: %[[IV2]], 0, %[[IV0]]
// CHECK-SAME: %[[UB2]], 7, %[[UB0]]		// CHECK-SAME: %[[UB2]], 7, %[[UB0]]
// CHECK: %[[T1:.*]] = tensor.extract_slice %[[ARG1]]		// CHECK: %[[T1:.*]] = tensor.extract_slice %[[ARG1]]
// CHECK-SAME: %[[IV2]], %[[IV0]]		// CHECK-SAME: %[[IV2]], %[[IV0]]
// CHECK-SAME: %[[UB2]], %[[UB0]]		// CHECK-SAME: %[[UB2]], %[[UB0]]
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	builtin.func @fuse_input_and_output(%arg0: tensor<24x12xf32>,
// CHECK: %[[T3:.*]] = linalg.fill(%{{.*}}, %[[T2]])		// CHECK: %[[T3:.*]] = linalg.fill(%{{.*}}, %[[T2]])
// CHECK: %{{.*}} = linalg.matmul ins(%[[T3]], {{.*}} outs(%[[ARG5]]		// CHECK: %{{.*}} = linalg.matmul ins(%[[T3]], {{.*}} outs(%[[ARG5]]
%2 = linalg.matmul ins(%0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%1 : tensor<24x25xf32>) -> tensor<24x25xf32>		%2 = linalg.matmul ins(%0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>) outs(%1 : tensor<24x25xf32>) -> tensor<24x25xf32>
return %2 : tensor<24x25xf32>		return %2 : tensor<24x25xf32>
}		}

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>
#map0 = affine_map<(d0, d1) -> (d1, d0)>		#map0 = affine_map<(d0, d1) -> (d1, d0)>

// CHECK: fuse_indexed		// CHECK: fuse_indexed
// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<12x25xi32>		// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<12x25xi32>
builtin.func @fuse_indexed(%arg0: tensor<24x12xi32>,		builtin.func @fuse_indexed(%arg0: tensor<24x12xi32>,
%arg1: tensor<12x25xi32>,		%arg1: tensor<12x25xi32>,
%arg2: tensor<24x25xi32>) -> tensor<24x25xi32> {		%arg2: tensor<24x25xi32>) -> tensor<24x25xi32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
Show All 14 Lines	builtin.func @fuse_indexed(%arg0: tensor<24x12xi32>,
// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =
// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =		// CHECK: scf.for %[[IV2:[0-9a-zA-Z]*]] =

// Shift the indexes by the slice offsets and swap the offsets due to the transposed indexing map.		// Shift the indexes by the slice offsets and swap the offsets due to the transposed indexing map.
// CHECK: %[[T1:.*]] = tensor.extract_slice %[[ARG1]]		// CHECK: %[[T1:.*]] = tensor.extract_slice %[[ARG1]]
// CHECK-SAME: %[[IV2]], %[[IV0]]		// CHECK-SAME: %[[IV2]], %[[IV0]]
// CHECK: linalg.generic {{.*}} outs(%[[T1]]		// CHECK: linalg.generic {{.*}} outs(%[[T1]]
// CHECK: %[[IDX0:.*]] = linalg.index 0		// CHECK: %[[IDX0:.*]] = linalg.index 0
// CHECK: %[[IDX0_SHIFTED:.*]] = affine.apply #[[MAP0]](%[[IDX0]], %[[IV0]])		// CHECK: %[[IDX0_SHIFTED:.*]] = affine.apply #[[MAP0]]()[%[[IV0]], %[[IDX0]]]
// CHECK: %[[IDX1:.*]] = linalg.index 1		// CHECK: %[[IDX1:.*]] = linalg.index 1
// CHECK: %[[IDX1_SHIFTED:.*]] = affine.apply #[[MAP0]](%[[IDX1]], %[[IV2]])		// CHECK: %[[IDX1_SHIFTED:.*]] = affine.apply #[[MAP0]]()[%[[IV2]], %[[IDX1]]]
// CHECK: %{{.*}} = arith.addi %[[IDX0_SHIFTED]], %[[IDX1_SHIFTED]]		// CHECK: %{{.*}} = arith.addi %[[IDX0_SHIFTED]], %[[IDX1_SHIFTED]]
%1 = linalg.matmul ins(%arg0, %0 : tensor<24x12xi32>, tensor<12x25xi32>) outs(%arg2 : tensor<24x25xi32>) -> tensor<24x25xi32>		%1 = linalg.matmul ins(%arg0, %0 : tensor<24x12xi32>, tensor<12x25xi32>) outs(%arg2 : tensor<24x25xi32>) -> tensor<24x25xi32>
return %1 : tensor<24x25xi32>		return %1 : tensor<24x25xi32>
}		}

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (8, -d0 - d1 + 17)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1] -> (8, -s0 - s1 + 17)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1, d2) -> (d0, -d1 - d2 + 17)>		// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1, s2] -> (s2, -s0 - s1 + 17)>

#map0 = affine_map<(d0, d1) -> (d0, d0 + d1)>		#map0 = affine_map<(d0, d1) -> (d0, d0 + d1)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>		#map1 = affine_map<(d0, d1) -> (d0, d1)>

// CHECK: fuse_non_rectangular		// CHECK: fuse_non_rectangular
// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<10x17xf32>		// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<10x17xf32>
func @fuse_non_rectangular(%arg0: tensor<10x17xf32>,		func @fuse_non_rectangular(%arg0: tensor<10x17xf32>,
%arg1: tensor<10x8xf32>) -> tensor<10x8xf32> {		%arg1: tensor<10x8xf32>) -> tensor<10x8xf32> {

// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
// CHECK-DAG: %[[C5:.*]] = arith.constant 5 : index		// CHECK-DAG: %[[C5:.*]] = arith.constant 5 : index
// CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index		// CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index
// CHECK-DAG: %[[C10:.*]] = arith.constant 10 : index		// CHECK-DAG: %[[C10:.*]] = arith.constant 10 : index
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
%0 = linalg.fill(%cst, %arg0) : f32, tensor<10x17xf32> -> tensor<10x17xf32>		%0 = linalg.fill(%cst, %arg0) : f32, tensor<10x17xf32> -> tensor<10x17xf32>

// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] = %[[C0]] to %[[C8]] step %[[C4]]		// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] = %[[C0]] to %[[C8]] step %[[C4]]
// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] = %[[C0]] to %[[C10]] step %[[C5]]		// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] = %[[C0]] to %[[C10]] step %[[C5]]

// Compute producer on a hyper rectangular bounding box. Along the second dimension,		// Compute producer on a hyper rectangular bounding box. Along the second dimension,
// the offset is set to the sum of the induction variables, and the upper bound		// the offset is set to the sum of the induction variables, and the upper bound
// to either 8 (tile size) or 17 (sum of max indices (9+7) then + 1) minus the		// to either 8 (tile size) or 17 (sum of max indices (9+7) then + 1) minus the
// induction variables.		// induction variables.
// CHECK: %[[SUM:.*]] = affine.apply #[[MAP0]](%[[IV1]], %[[IV0]]		// CHECK: %[[SUM:.*]] = affine.apply #[[MAP0]]()[%[[IV1]], %[[IV0]]
// CHECK: %[[TS1:.*]] = affine.min #[[MAP1]](%[[IV1]], %[[IV0]]		// CHECK: %[[TS1:.*]] = affine.min #[[MAP1]]()[%[[IV1]], %[[IV0]]
// CHECK: %[[UB1:.*]] = affine.min #[[MAP2]](%[[TS1]], %[[IV1]], %[[IV0]]		// CHECK: %[[UB1:.*]] = affine.min #[[MAP2]]()[%[[IV1]], %[[IV0]], %[[TS1]]
// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]
// CHECK-SAME: %[[IV1]], %[[SUM]]		// CHECK-SAME: %[[IV1]], %[[SUM]]
// CHECK-SAME: , %[[UB1]]		// CHECK-SAME: , %[[UB1]]
// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])		// CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
%1 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel"]} ins(%0 : tensor<10x17xf32>) outs(%arg1 : tensor<10x8xf32>) {		%1 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel"]} ins(%0 : tensor<10x17xf32>) outs(%arg1 : tensor<10x8xf32>) {
^bb0(%arg2: f32, %arg3: f32): // no predecessors		^bb0(%arg2: f32, %arg3: f32): // no predecessors
%2 = arith.addf %arg2, %arg3 : f32		%2 = arith.addf %arg2, %arg3 : f32
linalg.yield %2 : f32		linalg.yield %2 : f32
} -> tensor<10x8xf32>		} -> tensor<10x8xf32>
return %1 : tensor<10x8xf32>		return %1 : tensor<10x8xf32>
}		}

mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir

Show All 24 Lines	%4 = scf.for %arg5 = %c0 to %2 step %c3 iter_args(%arg6 = %arg4) -> (tensor<?x?xf32>) {
}		}
scf.yield %5 : tensor<?x?xf32>		scf.yield %5 : tensor<?x?xf32>
}		}
scf.yield %4 : tensor<?x?xf32>		scf.yield %4 : tensor<?x?xf32>
}		}
return %3 : tensor<?x?xf32>		return %3 : tensor<?x?xf32>
}		}

// CHECK: #[[BOUND2_MAP:.+]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>		// CHECK: #[[BOUND2_MAP:.+]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
// CHECK: #[[BOUND4_MAP:.+]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>		// CHECK: #[[BOUND4_MAP:.+]] = affine_map<()[s0, s1] -> (4, s0 - s1)>

// CHECK: func @matmul_tensors(		// CHECK: func @matmul_tensors(
// CHECK-SAME: %[[A:[0-9a-z]*]]: tensor<?x?xf32>		// CHECK-SAME: %[[A:[0-9a-z]*]]: tensor<?x?xf32>
// CHECK-SAME: %[[B:[0-9a-z]*]]: tensor<?x?xf32>		// CHECK-SAME: %[[B:[0-9a-z]*]]: tensor<?x?xf32>
// CHECK-SAME: %[[C:[0-9a-z]*]]: tensor<?x?xf32>		// CHECK-SAME: %[[C:[0-9a-z]*]]: tensor<?x?xf32>

// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[dA0:.*]] = tensor.dim %[[A]], %[[C0]] : tensor<?x?xf32>		// CHECK-DAG: %[[dA0:.*]] = tensor.dim %[[A]], %[[C0]] : tensor<?x?xf32>
// CHECK-DAG: %[[dA1:.*]] = tensor.dim %[[A]], %[[C1]] : tensor<?x?xf32>		// CHECK-DAG: %[[dA1:.*]] = tensor.dim %[[A]], %[[C1]] : tensor<?x?xf32>
// CHECK-DAG: %[[dB0:.*]] = tensor.dim %[[B]], %[[C0]] : tensor<?x?xf32>		// CHECK-DAG: %[[dB0:.*]] = tensor.dim %[[B]], %[[C0]] : tensor<?x?xf32>
// CHECK-DAG: %[[dB1:.*]] = tensor.dim %[[B]], %[[C1]] : tensor<?x?xf32>		// CHECK-DAG: %[[dB1:.*]] = tensor.dim %[[B]], %[[C1]] : tensor<?x?xf32>
// CHECK: scf.for %[[I:[0-9a-z]*]]		// CHECK: scf.for %[[I:[0-9a-z]*]]
// CHECK: %[[sizeA0:.*]] = affine.min #[[BOUND2_MAP]](%[[I]])[%[[dA0]]]		// CHECK: %[[sizeA0:.*]] = affine.min #[[BOUND2_MAP]]()[%[[dA0]], %[[I]]]
// CHECK: %[[stA:.*]] = tensor.extract_slice %[[A]][%[[I]], 0] [%[[sizeA0]], %[[dA1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: %[[stA:.*]] = tensor.extract_slice %[[A]][%[[I]], 0] [%[[sizeA0]], %[[dA1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
// CHECK-NEXT: scf.for %[[J:[0-9a-z]*]]		// CHECK-NEXT: scf.for %[[J:[0-9a-z]*]]
// CHECK-NEXT: scf.for %[[K:[0-9a-z]*]] {{.*}} iter_args(%[[RES:[0-9a-z]*]]		// CHECK-NEXT: scf.for %[[K:[0-9a-z]*]] {{.*}} iter_args(%[[RES:[0-9a-z]*]]
// CHECK-DAG: %[[stB1:.*]] = tensor.extract_slice %[[B]][%[[K]], %[[J]]] [4, 3] [1, 1] : tensor<?x?xf32> to tensor<4x3xf32>		// CHECK-DAG: %[[stB1:.*]] = tensor.extract_slice %[[B]][%[[K]], %[[J]]] [4, 3] [1, 1] : tensor<?x?xf32> to tensor<4x3xf32>
// CHECK-DAG: %[[stF:.*]] = tensor.extract_slice %[[RES]][%[[I]], %[[J]]] [2, 3] [1, 1] : tensor<?x?xf32> to tensor<2x3xf32>		// CHECK-DAG: %[[stF:.*]] = tensor.extract_slice %[[RES]][%[[I]], %[[J]]] [2, 3] [1, 1] : tensor<?x?xf32> to tensor<2x3xf32>
//		//
// slices of the producing matmul.		// slices of the producing matmul.
// CHECK: %[[sizeB1:.*]] = affine.min #[[BOUND4_MAP]](%[[K]])[%[[dB1]]]		// CHECK: %[[sizeB1:.*]] = affine.min #[[BOUND4_MAP]]()[%[[dB1]], %[[K]]]
// CHECK: %[[stB2:.*]] = tensor.extract_slice %[[B]][0, %[[K]]] [%[[dB0]], %[[sizeB1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: %[[stB2:.*]] = tensor.extract_slice %[[B]][0, %[[K]]] [%[[dB0]], %[[sizeB1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: %[[stC:.*]] = tensor.extract_slice %[[C]][%[[I]], %[[K]]] [%[[sizeA0]], %[[sizeB1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: %[[stC:.*]] = tensor.extract_slice %[[C]][%[[I]], %[[K]]] [%[[sizeA0]], %[[sizeB1]]] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: %[[stD:.*]] = linalg.matmul ins(%[[stA]], %[[stB2]] : tensor<?x?xf32>, tensor<?x?xf32>) outs(%[[stC]] : tensor<?x?xf32>) -> tensor<?x?xf32>		// CHECK: %[[stD:.*]] = linalg.matmul ins(%[[stA]], %[[stB2]] : tensor<?x?xf32>, tensor<?x?xf32>) outs(%[[stC]] : tensor<?x?xf32>) -> tensor<?x?xf32>
// CHECK: %[[CAST:.*]] = tensor.cast %[[stD]] : tensor<?x?xf32> to tensor<2x4xf32>		// CHECK: %[[CAST:.*]] = tensor.cast %[[stD]] : tensor<?x?xf32> to tensor<2x4xf32>
// CHECK-NEXT: %[[stG:.*]] = linalg.matmul ins(%[[CAST]], %[[stB1]] : tensor<2x4xf32>, tensor<4x3xf32>) outs(%[[stF]] : tensor<2x3xf32>) -> tensor<2x3xf32>		// CHECK-NEXT: %[[stG:.*]] = linalg.matmul ins(%[[CAST]], %[[stB1]] : tensor<2x4xf32>, tensor<4x3xf32>) outs(%[[stF]] : tensor<2x3xf32>) -> tensor<2x3xf32>
// CHECK-NEXT: tensor.insert_slice %[[stG]] into %[[RES]][%[[I]], %[[J]]]		// CHECK-NEXT: tensor.insert_slice %[[stG]] into %[[RES]][%[[I]], %[[J]]]

// -----		// -----
[40 lines hidden]	%for1 = scf.for %iv1 = %c0 to %c112 step %c16 iter_args(%arg1 = %arg0) -> tensor<1x112x112x32xf32> {
}		}
scf.yield %for2 : tensor<1x112x112x32xf32>		scf.yield %for2 : tensor<1x112x112x32xf32>
}		}
scf.yield %for1 : tensor<1x112x112x32xf32>		scf.yield %for1 : tensor<1x112x112x32xf32>
}		}
return %for0 : tensor<1x112x112x32xf32>		return %for0 : tensor<1x112x112x32xf32>
}		}

// CHECK: #[[MAP0:.+]] = affine_map<(d0) -> (d0 * 2)>		// CHECK: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 2)>
// CHECK: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>		// CHECK: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>

// CHECK: func @conv_tensors_static		// CHECK: func @conv_tensors_static
// CHECK-SAME: (%[[INPUT:.+]]: tensor<1x225x225x3xf32>, %[[FILTER:.+]]: tensor<3x3x3x32xf32>, %[[ELEM:.+]]: tensor<1x112x112x32xf32>)		// CHECK-SAME: (%[[INPUT:.+]]: tensor<1x225x225x3xf32>, %[[FILTER:.+]]: tensor<3x3x3x32xf32>, %[[ELEM:.+]]: tensor<1x112x112x32xf32>)

// CHECK: %[[INIT:.+]] = linalg.init_tensor [1, 112, 112, 32] : tensor<1x112x112x32xf32>		// CHECK: %[[INIT:.+]] = linalg.init_tensor [1, 112, 112, 32] : tensor<1x112x112x32xf32>
// CHECK-NEXT: %[[FILL:.+]] = linalg.fill(%cst, %[[INIT]]) : f32, tensor<1x112x112x32xf32> -> tensor<1x112x112x32xf32>		// CHECK-NEXT: %[[FILL:.+]] = linalg.fill(%cst, %[[INIT]]) : f32, tensor<1x112x112x32xf32> -> tensor<1x112x112x32xf32>

// CHECK-NEXT: scf.for %[[IV0:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG0:.+]] = %[[FILL]])		// CHECK-NEXT: scf.for %[[IV0:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG0:.+]] = %[[FILL]])
// CHECK-NEXT: %[[OFFSET_H:.+]] = affine.apply #[[MAP0]](%[[IV0]])		// CHECK-NEXT: %[[OFFSET_H:.+]] = affine.apply #[[MAP0]]()[%[[IV0]]]
// CHECK-NEXT: scf.for %[[IV1:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG1:.+]] = %[[ARG0]])		// CHECK-NEXT: scf.for %[[IV1:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG1:.+]] = %[[ARG0]])
// CHECK-NEXT: %[[OFFSET_W:.+]] = affine.apply #[[MAP0]](%[[IV1]])		// CHECK-NEXT: %[[OFFSET_W:.+]] = affine.apply #[[MAP0]]()[%[[IV1]]]
// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %arg0[0, %[[OFFSET_H]], %[[OFFSET_W]], 0] [1, 17, 33, 3] [1, 1, 1, 1] : tensor<1x225x225x3xf32> to tensor<1x17x33x3xf32>		// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %arg0[0, %[[OFFSET_H]], %[[OFFSET_W]], 0] [1, 17, 33, 3] [1, 1, 1, 1] : tensor<1x225x225x3xf32> to tensor<1x17x33x3xf32>
// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG2:.+]] = %[[ARG1]])		// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %{{.+}} step %{{.+}} iter_args(%[[ARG2:.+]] = %[[ARG1]])
// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>		// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>
// CHECK-NEXT: %[[ST_ARG2:.+]] = tensor.extract_slice %[[ARG2]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>		// CHECK-NEXT: %[[ST_ARG2:.+]] = tensor.extract_slice %[[ARG2]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>
// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV2]]] [3, 3, 3, 4] [1, 1, 1, 1] : tensor<3x3x3x32xf32> to tensor<3x3x3x4xf32>		// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV2]]] [3, 3, 3, 4] [1, 1, 1, 1] : tensor<3x3x3x32xf32> to tensor<3x3x3x4xf32>
// CHECK-NEXT: %[[ST_FILL:.+]] = tensor.extract_slice %[[FILL]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>		// CHECK-NEXT: %[[ST_FILL:.+]] = tensor.extract_slice %[[FILL]][0, %[[IV0]], %[[IV1]], %[[IV2]]] [1, 8, 16, 4] [1, 1, 1, 1] : tensor<1x112x112x32xf32> to tensor<1x8x16x4xf32>
// CHECK-NEXT: %[[ST_CONV:.+]] = linalg.conv_2d_nhwc_hwcf		// CHECK-NEXT: %[[ST_CONV:.+]] = linalg.conv_2d_nhwc_hwcf
// CHECK-SAME: ins(%[[ST_INPUT]], %[[ST_FILTER]] : tensor<1x17x33x3xf32>, tensor<3x3x3x4xf32>)		// CHECK-SAME: ins(%[[ST_INPUT]], %[[ST_FILTER]] : tensor<1x17x33x3xf32>, tensor<3x3x3x4xf32>)
[60 lines hidden]	%for1 = scf.for %iv1 = %c0 to %oh step %c16 iter_args(%arg1 = %arg0) -> tensor<?x?x?x?xf32> {
}		}
scf.yield %for2 : tensor<?x?x?x?xf32>		scf.yield %for2 : tensor<?x?x?x?xf32>
}		}
scf.yield %for1 : tensor<?x?x?x?xf32>		scf.yield %for1 : tensor<?x?x?x?xf32>
}		}
return %for0 : tensor<?x?x?x?xf32>		return %for0 : tensor<?x?x?x?xf32>
}		}

// CHECK: #[[BOUND8_MAP:.+]] = affine_map<(d0)[s0] -> (8, -d0 + s0)>		// CHECK: #[[BOUND8_MAP:.+]] = affine_map<()[s0, s1] -> (8, s0 - s1)>
// CHECK: #[[BOUND8_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 8, -d0 + s1)>		// CHECK: #[[BOUND8_MAP_2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 8, -s1 + s2)>
// CHECK: #[[BOUND16_MAP:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>		// CHECK: #[[BOUND16_MAP:.+]] = affine_map<()[s0, s1] -> (16, s0 - s1)>
// CHECK: #[[X2_MAP:.+]] = affine_map<(d0) -> (d0 * 2)>		// CHECK: #[[X2_MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
// CHECK: #[[INPUT_BOUND:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * 2 + s0 - 2, d1 * -2 + s0 + s1 * 2 - 2)>		// CHECK: #[[INPUT_BOUND:.+]] = affine_map<()[s0, s1, s2, s3] -> (s0 * 2 + s1 - 2, s1 + s2 * 2 - s3 * 2 - 2)>
// CHECK: #[[BOUND16_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 16, -d0 + s1)>		// CHECK: #[[BOUND16_MAP_2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 16, -s1 + s2)>
// CHECK: #[[BOUND4_MAP:.+]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>		// CHECK: #[[BOUND4_MAP:.+]] = affine_map<()[s0, s1] -> (4, s0 - s1)>
// CHECK: #[[BOUND2_MAP:.+]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>		// CHECK: #[[BOUND2_MAP:.+]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
// CHECK: #[[BOUND4_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 4, -d0 + s1)>		// CHECK: #[[BOUND4_MAP_2:.+]] = affine_map<()[s0, s1, s2] -> (s0 - s1, 4, -s1 + s2)>
// CHECK: #[[BOUND2_MAP_2:.+]] = affine_map<(d0, d1)[s0, s1] -> (-d0 + s0, 2, -d1 + s1)>		// CHECK: #[[BOUND2_MAP_2:.+]] = affine_map<()[s0, s1, s2, s3] -> (s0 - s1, 2, s2 - s3)>

// CHECK: func @conv_tensors_dynamic		// CHECK: func @conv_tensors_dynamic
// CHECK-SAME: (%[[INPUT]]: tensor<?x?x?x?xf32>, %[[FILTER]]: tensor<?x?x?x?xf32>, %[[ELEM]]: tensor<?x?x?x?xf32>)		// CHECK-SAME: (%[[INPUT]]: tensor<?x?x?x?xf32>, %[[FILTER]]: tensor<?x?x?x?xf32>, %[[ELEM]]: tensor<?x?x?x?xf32>)

// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index		// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
// CHECK-DAG: %[[C2:.+]] = arith.constant 2 : index		// CHECK-DAG: %[[C2:.+]] = arith.constant 2 : index
// CHECK-DAG: %[[C3:.+]] = arith.constant 3 : index		// CHECK-DAG: %[[C3:.+]] = arith.constant 3 : index
[11 lines hidden]
// CHECK-DAG: %[[FILTER_IC:.+]] = tensor.dim %[[FILTER]], %[[C2]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[FILTER_IC:.+]] = tensor.dim %[[FILTER]], %[[C2]] : tensor<?x?x?x?xf32>
// CHECK-DAG: %[[FILTER_OC:.+]] = tensor.dim %[[FILTER]], %[[C3]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[FILTER_OC:.+]] = tensor.dim %[[FILTER]], %[[C3]] : tensor<?x?x?x?xf32>
// CHECK-DAG: %[[INPUT_N:.+]] = tensor.dim %[[INPUT]], %[[C0]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[INPUT_N:.+]] = tensor.dim %[[INPUT]], %[[C0]] : tensor<?x?x?x?xf32>
// CHECK-DAG: %[[INPUT_C:.+]] = tensor.dim %[[INPUT]], %[[C3]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[INPUT_C:.+]] = tensor.dim %[[INPUT]], %[[C3]] : tensor<?x?x?x?xf32>
// CHECK-DAG: %[[FILL_H:.+]] = tensor.dim %[[FILL]], %[[C1]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[FILL_H:.+]] = tensor.dim %[[FILL]], %[[C1]] : tensor<?x?x?x?xf32>
// CHECK-DAG: %[[FILL_W:.+]] = tensor.dim %[[FILL]], %[[C2]] : tensor<?x?x?x?xf32>		// CHECK-DAG: %[[FILL_W:.+]] = tensor.dim %[[FILL]], %[[C2]] : tensor<?x?x?x?xf32>

// CHECK: scf.for %[[IV0:.+]] = %{{.+}} to %[[ELEM_N]] step %{{.+}} iter_args(%{{.+}} = %[[FILL]])		// CHECK: scf.for %[[IV0:.+]] = %{{.+}} to %[[ELEM_N]] step %{{.+}} iter_args(%{{.+}} = %[[FILL]])
// CHECK-NEXT: %[[SIZE_ELEM_N:.+]] = affine.min #[[BOUND8_MAP]](%[[IV0]])[%[[ELEM_N]]]		// CHECK-NEXT: %[[SIZE_ELEM_N:.+]] = affine.min #[[BOUND8_MAP]]()[%[[ELEM_N]], %[[IV0]]]
// CHECK-NEXT: %[[SIZE_INPUT_N:.+]] = affine.min #[[BOUND8_MAP_2]](%[[IV0]])[%[[INPUT_N]], %[[ELEM_N]]]		// CHECK-NEXT: %[[SIZE_INPUT_N:.+]] = affine.min #[[BOUND8_MAP_2]]()[%[[INPUT_N]], %[[IV0]], %[[ELEM_N]]]
// CHECK-NEXT: scf.for %[[IV1:.+]] = %{{.+}} to %[[ELEM_OH]]		// CHECK-NEXT: scf.for %[[IV1:.+]] = %{{.+}} to %[[ELEM_OH]]
// CHECK-NEXT: %[[SIZE_ELEM_OH:.+]] = affine.min #[[BOUND16_MAP]](%[[IV1]])[%[[ELEM_OH]]]		// CHECK-NEXT: %[[SIZE_ELEM_OH:.+]] = affine.min #[[BOUND16_MAP]]()[%[[ELEM_OH]], %[[IV1]]]
// CHECK-NEXT: %[[OFFSET_OH:.+]] = affine.apply #[[X2_MAP]](%[[IV1]])		// CHECK-NEXT: %[[OFFSET_OH:.+]] = affine.apply #[[X2_MAP]]()[%[[IV1]]]
// CHECK-NEXT: %[[SIZE_INPUT_H:.+]] = affine.min #[[INPUT_BOUND]](%[[SIZE_ELEM_OH]], %[[IV1]])[%[[FILTER_H]], %[[FILL_H]]]		// CHECK-NEXT: %[[SIZE_INPUT_H:.+]] = affine.min #[[INPUT_BOUND]]()[%[[SIZE_ELEM_OH]], %[[FILTER_H]], %[[FILL_H]], %[[IV1]]]
// CHECK-NEXT: %[[SIZE_ELEM_OH_2:.+]] = affine.min #[[BOUND16_MAP_2]](%[[IV1]])[%[[FILL_H]], %[[ELEM_OH]]]		// CHECK-NEXT: %[[SIZE_ELEM_OH_2:.+]] = affine.min #[[BOUND16_MAP_2]]()[%[[FILL_H]], %[[IV1]], %[[ELEM_OH]]]
// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %[[ELEM_OW]]		// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %[[ELEM_OW]]
// CHECK-NEXT: %[[SIZE_ELEM_OW:.+]] = affine.min #[[BOUND4_MAP]](%[[IV2]])[%[[ELEM_OW]]]		// CHECK-NEXT: %[[SIZE_ELEM_OW:.+]] = affine.min #[[BOUND4_MAP]]()[%[[ELEM_OW]], %[[IV2]]]
// CHECK-NEXT: %[[SIZE_ELEM_OC:.+]] = affine.min #[[BOUND2_MAP]](%[[IV2]])[%[[ELEM_OC]]]		// CHECK-NEXT: %[[SIZE_ELEM_OC:.+]] = affine.min #[[BOUND2_MAP]]()[%[[ELEM_OC]], %[[IV2]]]
// CHECK-NEXT: %[[OFFSET_OW:.+]] = affine.apply #[[X2_MAP]](%[[IV2]])		// CHECK-NEXT: %[[OFFSET_OW:.+]] = affine.apply #[[X2_MAP]]()[%[[IV2]]]
// CHECK-NEXT: %[[SIZE_INPUT_W:.+]] = affine.min #[[INPUT_BOUND]](%[[SIZE_ELEM_OW]], %[[IV2]])[%[[FILTER_W]], %[[FILL_W]]]		// CHECK-NEXT: %[[SIZE_INPUT_W:.+]] = affine.min #[[INPUT_BOUND]]()[%[[SIZE_ELEM_OW]], %[[FILTER_W]], %[[FILL_W]], %[[IV2]]]
// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %[[INPUT]][%[[IV0]], %[[OFFSET_OH]], %[[OFFSET_OW]], 0]		// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %[[INPUT]][%[[IV0]], %[[OFFSET_OH]], %[[OFFSET_OW]], 0]
// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_INPUT_H]], %[[SIZE_INPUT_W]], %[[INPUT_C]]]		// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_INPUT_H]], %[[SIZE_INPUT_W]], %[[INPUT_C]]]
// CHECK-NEXT: %[[SIZE_ELEM_OW_2:.+]] = affine.min #[[BOUND4_MAP_2]](%[[IV2]])[%[[FILL_W]], %[[ELEM_OW]]]		// CHECK-NEXT: %[[SIZE_ELEM_OW_2:.+]] = affine.min #[[BOUND4_MAP_2]]()[%[[FILL_W]], %[[IV2]], %[[ELEM_OW]]]
// CHECK-NEXT: scf.for %[[IV3:.+]] = %{{.+}} to %[[ELEM_OC]] step %{{.+}} iter_args(%[[ARG:[a-z0-9]+]]		// CHECK-NEXT: scf.for %[[IV3:.+]] = %{{.+}} to %[[ELEM_OC]] step %{{.+}} iter_args(%[[ARG:[a-z0-9]+]]
// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]		// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]		// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]
// CHECK-NEXT: %[[ST_ARG:.+]] = tensor.extract_slice %[[ARG]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]		// CHECK-NEXT: %[[ST_ARG:.+]] = tensor.extract_slice %[[ARG]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]		// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]
// CHECK-NEXT: %[[SIZE_ELEM_OC_2:.+]] = affine.min #[[BOUND2_MAP_2]](%[[IV3]], %[[IV2]])[%[[FILTER_OC]], %[[ELEM_OC]]]		// CHECK-NEXT: %[[SIZE_ELEM_OC_2:.+]] = affine.min #[[BOUND2_MAP_2]]()[%[[FILTER_OC]], %[[IV3]], %[[ELEM_OC]], %[[IV2]]]
// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV3]]]		// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV3]]]
// CHECK-SAME: [%[[FILTER_H]], %[[FILTER_W]], %[[FILTER_IC]], %[[SIZE_ELEM_OC_2]]]		// CHECK-SAME: [%[[FILTER_H]], %[[FILTER_W]], %[[FILTER_IC]], %[[SIZE_ELEM_OC_2]]]
// CHECK-NEXT: %[[ST_FILL:.+]] = tensor.extract_slice %[[FILL]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]		// CHECK-NEXT: %[[ST_FILL:.+]] = tensor.extract_slice %[[FILL]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_ELEM_OH_2]], %[[SIZE_ELEM_OW_2]], %[[SIZE_ELEM_OC_2]]]		// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_ELEM_OH_2]], %[[SIZE_ELEM_OW_2]], %[[SIZE_ELEM_OC_2]]]
// CHECK-NEXT: %[[ST_CONV:.+]] = linalg.conv_2d_nhwc_hwcf		// CHECK-NEXT: %[[ST_CONV:.+]] = linalg.conv_2d_nhwc_hwcf
// CHECK-SAME: ins(%[[ST_INPUT]], %[[ST_FILTER]] : tensor<?x?x?x?xf32>, tensor<?x?x?x?xf32>)		// CHECK-SAME: ins(%[[ST_INPUT]], %[[ST_FILTER]] : tensor<?x?x?x?xf32>, tensor<?x?x?x?xf32>)
// CHECK-SAME: outs(%[[ST_FILL]] : tensor<?x?x?x?xf32>) -> tensor<?x?x?x?xf32>		// CHECK-SAME: outs(%[[ST_FILL]] : tensor<?x?x?x?xf32>) -> tensor<?x?x?x?xf32>
// CHECK-NEXT: %[[ST_ADD:.+]] = linalg.generic		// CHECK-NEXT: %[[ST_ADD:.+]] = linalg.generic
[66 lines hidden]

mlir/test/Dialect/Linalg/tile-conv.mlir

	// RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3" | FileCheck %s			// RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3" | FileCheck %s

	// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 1, -d0 + s0 + s1 - 1)>			// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1, s2] -> (s0 + 1, s0 + s1 - s2 - 1)>
	// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 2, -d0 + s0 + s1 - 1)>			// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1, s2] -> (s0 + 2, s0 + s1 - s2 - 1)>
	// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// CHECK-DAG: #[[MAP3:.*]] = affine_map<(d0)[s0] -> (3, -d0 + s0)>			// CHECK-DAG: #[[MAP3:.*]] = affine_map<()[s0, s1] -> (3, s0 - s1)>

	func @conv(%arg0 : memref<?x?xf32>, %arg1 : memref<?x?xf32>, %arg2 : memref<?x?xf32>) {			func @conv(%arg0 : memref<?x?xf32>, %arg1 : memref<?x?xf32>, %arg2 : memref<?x?xf32>) {
	linalg.conv_2d ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>) outs(%arg2 : memref<?x?xf32>)			linalg.conv_2d ins(%arg0, %arg1 : memref<?x?xf32>, memref<?x?xf32>) outs(%arg2 : memref<?x?xf32>)
	return			return
	}			}

	// CHECK: func @conv			// CHECK: func @conv
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
	// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index			// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
	// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index			// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
	// CHECK-DAG: %[[T0:.*]] = memref.dim %[[ARG1]], %[[C0]]			// CHECK-DAG: %[[T0:.*]] = memref.dim %[[ARG1]], %[[C0]]
	// CHECK-DAG: %[[T1:.*]] = memref.dim %[[ARG1]], %[[C1]]			// CHECK-DAG: %[[T1:.*]] = memref.dim %[[ARG1]], %[[C1]]
	// CHECK-DAG: %[[T2:.*]] = memref.dim %[[ARG2]], %[[C0]]			// CHECK-DAG: %[[T2:.*]] = memref.dim %[[ARG2]], %[[C0]]
	// CHECK-DAG: %[[T3:.*]] = memref.dim %[[ARG2]], %[[C1]]			// CHECK-DAG: %[[T3:.*]] = memref.dim %[[ARG2]], %[[C1]]
	// CHECK: scf.for %[[ARG3:.*]] = %[[C0]] to %[[T2]] step %[[C2]]			// CHECK: scf.for %[[ARG3:.*]] = %[[C0]] to %[[T2]] step %[[C2]]
	// CHECK: scf.for %[[ARG4:.*]] = %[[C0]] to %[[T3]] step %[[C3]]			// CHECK: scf.for %[[ARG4:.*]] = %[[C0]] to %[[T3]] step %[[C3]]
	// CHECK: %[[T4:.*]] = affine.min #[[MAP0]](%[[ARG3]])[%[[T0]], %[[T2]]]			// CHECK: %[[T4:.*]] = affine.min #[[MAP0]]()[%[[T0]], %[[T2]], %[[ARG3]]]
	// CHECK: %[[T5:.*]] = affine.min #[[MAP1]](%[[ARG4]])[%[[T1]], %[[T3]]]			// CHECK: %[[T5:.*]] = affine.min #[[MAP1]]()[%[[T1]], %[[T3]], %[[ARG4]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]] [%[[T4]], %[[T5]]]			// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]] [%[[T4]], %[[T5]]]
	// CHECK: %[[T6:.*]] = affine.min #[[MAP2]](%[[ARG3]])[%[[T2]]			// CHECK: %[[T6:.*]] = affine.min #[[MAP2]]()[%[[T2]], %[[ARG3]]]
	// CHECK: %[[T7:.*]] = affine.min #[[MAP3]](%[[ARG4]])[%[[T3]]]			// CHECK: %[[T7:.*]] = affine.min #[[MAP3]]()[%[[T3]], %[[ARG4]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[ARG4]]] [%[[T6]], %[[T7]]]			// CHECK: %[[SV2:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[ARG4]]] [%[[T6]], %[[T7]]]
	// CHECK: linalg.conv_2d			// CHECK: linalg.conv_2d
	// CHECK-SAME: ins(%[[SV1]], %[[ARG1]]			// CHECK-SAME: ins(%[[SV1]], %[[ARG1]]
	// CHECK-SAME: outs(%[[SV2]]			// CHECK-SAME: outs(%[[SV2]]
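
The CHECK updates in this file follow one pattern: the loop induction variable moves from a dim operand to a trailing symbol operand, as in MAP2 going from `(d0)[s0] -> (2, -d0 + s0)` to `()[s0, s1] -> (2, s0 - s1)`. A minimal Python sketch, assuming `affine.min` simply takes the minimum of the map results, suggests the two forms compute the same tile size. The function names below are hypothetical stand-ins and not part of MLIR.

```python
# Illustrative only: plain-Python stand-ins for the two affine.min forms
# checked in tile-conv.mlir. All function names here are hypothetical.

def min_old(iv, dim):
    """Old form: affine_map<(d0)[s0] -> (2, -d0 + s0)> applied as (iv)[dim]."""
    d0, s0 = iv, dim
    return min(2, -d0 + s0)


def min_new(dim, iv):
    """New form: affine_map<()[s0, s1] -> (2, s0 - s1)> applied as ()[dim, iv]."""
    s0, s1 = dim, iv
    return min(2, s0 - s1)


def forms_agree(max_dim=16, step=2):
    """Compare both forms on every tile start iv = 0, step, ... below dim."""
    return all(
        min_old(iv, dim) == min_new(dim, iv)
        for dim in range(max_dim + 1)
        for iv in range(0, dim, step)
    )


print(forms_agree())  # prints True
```

Only the operand kind and position change between the two forms; the value each `affine.min` yields is the same, which is consistent with the summary's statement that the change adds no representational power.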

mlir/test/Dialect/Linalg/tile-indexed.mlir

	// RUN: mlir-opt %s -linalg-tile="tile-sizes=10,25" -split-input-file | FileCheck %s -check-prefix=TILE-10n25			// RUN: mlir-opt %s -linalg-tile="tile-sizes=10,25" -split-input-file | FileCheck %s -check-prefix=TILE-10n25
	// RUN: mlir-opt %s -linalg-tile="tile-sizes=25,0" -split-input-file | FileCheck %s -check-prefix=TILE-25n0			// RUN: mlir-opt %s -linalg-tile="tile-sizes=25,0" -split-input-file | FileCheck %s -check-prefix=TILE-25n0
	// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,25" -split-input-file | FileCheck %s -check-prefix=TILE-0n25			// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,25" -split-input-file | FileCheck %s -check-prefix=TILE-0n25

	func @indexed_vector(%arg0: memref<50xindex>) {			func @indexed_vector(%arg0: memref<50xindex>) {
	linalg.generic {indexing_maps = [affine_map<(i) -> (i)>],			linalg.generic {indexing_maps = [affine_map<(i) -> (i)>],
	iterator_types = ["parallel"]}			iterator_types = ["parallel"]}
	outs(%arg0 : memref<50xindex>) {			outs(%arg0 : memref<50xindex>) {
	^bb0(%a: index):			^bb0(%a: index):
	%i = linalg.index 0 : index			%i = linalg.index 0 : index
	linalg.yield %i : index			linalg.yield %i : index
	}			}
	return			return
	}			}
	// TILE-10n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>			// TILE-10n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// TILE-10n25-LABEL: func @indexed_vector			// TILE-10n25-LABEL: func @indexed_vector
	// TILE-10n25: %[[C10:.*]] = arith.constant 10 : index			// TILE-10n25: %[[C10:.*]] = arith.constant 10 : index
	// TILE-10n25: scf.for %[[J:.*]] = {{.*}} step %[[C10]]			// TILE-10n25: scf.for %[[J:.*]] = {{.*}} step %[[C10]]
	// TILE-10n25: linalg.generic			// TILE-10n25: linalg.generic
	// TILE-10n25: %[[I:.*]] = linalg.index 0 : index			// TILE-10n25: %[[I:.*]] = linalg.index 0 : index
	// TILE-10n25: %[[NEW_I:.*]] = affine.apply [[$MAP]](%[[I]], %[[J]])			// TILE-10n25: %[[NEW_I:.*]] = affine.apply [[$MAP]]()[%[[I]], %[[J]]]
	// TILE-10n25: linalg.yield %[[NEW_I]] : index			// TILE-10n25: linalg.yield %[[NEW_I]] : index

	// TILE-25n0-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>			// TILE-25n0-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// TILE-25n0-LABEL: func @indexed_vector			// TILE-25n0-LABEL: func @indexed_vector
	// TILE-25n0: %[[C25:.*]] = arith.constant 25 : index			// TILE-25n0: %[[C25:.*]] = arith.constant 25 : index
	// TILE-25n0: scf.for %[[J:.*]] = {{.*}} step %[[C25]]			// TILE-25n0: scf.for %[[J:.*]] = {{.*}} step %[[C25]]
	// TILE-25n0: linalg.generic			// TILE-25n0: linalg.generic
	// TILE-25n0: %[[I:.*]] = linalg.index 0 : index			// TILE-25n0: %[[I:.*]] = linalg.index 0 : index
	// TILE-25n0: %[[NEW_I:.*]] = affine.apply [[$MAP]](%[[I]], %[[J]])			// TILE-25n0: %[[NEW_I:.*]] = affine.apply [[$MAP]]()[%[[I]], %[[J]]]
	// TILE-25n0: linalg.yield %[[NEW_I]] : index			// TILE-25n0: linalg.yield %[[NEW_I]] : index

	// TILE-0n25-LABEL: func @indexed_vector			// TILE-0n25-LABEL: func @indexed_vector
	// TILE-0n25-NOT: scf.for %[[J:.*]] = {{.*}} step %			// TILE-0n25-NOT: scf.for %[[J:.*]] = {{.*}} step %
	// TILE-0n25: linalg.generic			// TILE-0n25: linalg.generic

	// -----			// -----

	func @indexed_matrix(%arg0: memref<50x50xindex>) {			func @indexed_matrix(%arg0: memref<50x50xindex>) {
	linalg.generic {indexing_maps = [affine_map<(i, j) -> (i, j)>],			linalg.generic {indexing_maps = [affine_map<(i, j) -> (i, j)>],
	iterator_types = ["parallel", "parallel"]}			iterator_types = ["parallel", "parallel"]}
	outs(%arg0 : memref<50x50xindex>) {			outs(%arg0 : memref<50x50xindex>) {
	^bb0(%a: index):			^bb0(%a: index):
	%i = linalg.index 0 : index			%i = linalg.index 0 : index
	%j = linalg.index 1 : index			%j = linalg.index 1 : index
	%sum = arith.addi %i, %j : index			%sum = arith.addi %i, %j : index
	linalg.yield %sum : index			linalg.yield %sum : index
	}			}
	return			return
	}			}
	// TILE-10n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>			// TILE-10n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// TILE-10n25-LABEL: func @indexed_matrix			// TILE-10n25-LABEL: func @indexed_matrix
	// TILE-10n25-DAG: %[[C25:.*]] = arith.constant 25 : index			// TILE-10n25-DAG: %[[C25:.*]] = arith.constant 25 : index
	// TILE-10n25-DAG: %[[C10:.*]] = arith.constant 10 : index			// TILE-10n25-DAG: %[[C10:.*]] = arith.constant 10 : index
	// TILE-10n25: scf.for %[[K:.*]] = {{.*}} step %[[C10]]			// TILE-10n25: scf.for %[[K:.*]] = {{.*}} step %[[C10]]
	// TILE-10n25: scf.for %[[L:.*]] = {{.*}} step %[[C25]]			// TILE-10n25: scf.for %[[L:.*]] = {{.*}} step %[[C25]]
	// TILE-10n25: linalg.generic			// TILE-10n25: linalg.generic
	// TILE-10n25: %[[I:.*]] = linalg.index 0 : index			// TILE-10n25: %[[I:.*]] = linalg.index 0 : index
	// TILE-10n25: %[[NEW_I:.*]] = affine.apply [[$MAP]](%[[I]], %[[K]])			// TILE-10n25: %[[NEW_I:.*]] = affine.apply [[$MAP]]()[%[[I]], %[[K]]]
	// TILE-10n25: %[[J:.*]] = linalg.index 1 : index			// TILE-10n25: %[[J:.*]] = linalg.index 1 : index
	// TILE-10n25: %[[NEW_J:.*]] = affine.apply [[$MAP]](%[[J]], %[[L]])			// TILE-10n25: %[[NEW_J:.*]] = affine.apply [[$MAP]]()[%[[J]], %[[L]]]
	// TILE-10n25: %[[SUM:.*]] = arith.addi %[[NEW_I]], %[[NEW_J]] : index			// TILE-10n25: %[[SUM:.*]] = arith.addi %[[NEW_I]], %[[NEW_J]] : index
	// TILE-10n25: linalg.yield %[[SUM]] : index			// TILE-10n25: linalg.yield %[[SUM]] : index

	// TILE-25n0-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>			// TILE-25n0-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// TILE-25n0-LABEL: func @indexed_matrix			// TILE-25n0-LABEL: func @indexed_matrix
	// TILE-25n0: %[[C25:.*]] = arith.constant 25 : index			// TILE-25n0: %[[C25:.*]] = arith.constant 25 : index
	// TILE-25n0: scf.for %[[L:.*]] = {{.*}} step %[[C25]]			// TILE-25n0: scf.for %[[L:.*]] = {{.*}} step %[[C25]]
	// TILE-25n0: linalg.generic			// TILE-25n0: linalg.generic
	// TILE-25n0: %[[I:.*]] = linalg.index 0 : index			// TILE-25n0: %[[I:.*]] = linalg.index 0 : index
	// TILE-25n0: %[[NEW_I:.*]] = affine.apply [[$MAP]](%[[I]], %[[L]])			// TILE-25n0: %[[NEW_I:.*]] = affine.apply [[$MAP]]()[%[[I]], %[[L]]]
	// TILE-25n0: %[[J:.*]] = linalg.index 1 : index			// TILE-25n0: %[[J:.*]] = linalg.index 1 : index
	// TILE-25n0: %[[SUM:.*]] = arith.addi %[[NEW_I]], %[[J]] : index			// TILE-25n0: %[[SUM:.*]] = arith.addi %[[NEW_I]], %[[J]] : index
	// TILE-25n0: linalg.yield %[[SUM]] : index			// TILE-25n0: linalg.yield %[[SUM]] : index

	// TILE-0n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<(d0, d1) -> (d0 + d1)>			// TILE-0n25-DAG: [[$MAP:#[a-zA-Z0-9_]*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// TILE-0n25-LABEL: func @indexed_matrix			// TILE-0n25-LABEL: func @indexed_matrix
	// TILE-0n25: %[[C25:.*]] = arith.constant 25 : index			// TILE-0n25: %[[C25:.*]] = arith.constant 25 : index
	// TILE-0n25: scf.for %[[L:.*]] = {{.*}} step %[[C25]]			// TILE-0n25: scf.for %[[L:.*]] = {{.*}} step %[[C25]]
	// TILE-0n25: linalg.generic			// TILE-0n25: linalg.generic
	// TILE-0n25: %[[I:.*]] = linalg.index 0 : index			// TILE-0n25: %[[I:.*]] = linalg.index 0 : index
	// TILE-0n25: %[[J:.*]] = linalg.index 1 : index			// TILE-0n25: %[[J:.*]] = linalg.index 1 : index
	// TILE-0n25: %[[NEW_J:.*]] = affine.apply [[$MAP]](%[[J]], %[[L]])			// TILE-0n25: %[[NEW_J:.*]] = affine.apply [[$MAP]]()[%[[J]], %[[L]]]
	// TILE-0n25: %[[SUM:.*]] = arith.addi %[[I]], %[[NEW_J]] : index			// TILE-0n25: %[[SUM:.*]] = arith.addi %[[I]], %[[NEW_J]] : index
	// TILE-0n25: linalg.yield %[[SUM]] : index			// TILE-0n25: linalg.yield %[[SUM]] : index

mlir/test/Dialect/Linalg/tile-tensors.mlir

	[127 lines hidden]
	// TLOOP-SAME: to (%[[ARG_0_X]], %[[ARG_0_Y]], %[[ARG_0_Z]])			// TLOOP-SAME: to (%[[ARG_0_X]], %[[ARG_0_Y]], %[[ARG_0_Z]])
	// TLOOP-SAME: step (%[[C2]], %[[C3]], %[[C4]])			// TLOOP-SAME: step (%[[C2]], %[[C3]], %[[C4]])
	// TLOOP-SAME: ins (%{{.*}} = %[[ARG_0]]: [[TY]], %{{.*}} = %[[ARG_1]]: [[TY]])			// TLOOP-SAME: ins (%{{.*}} = %[[ARG_0]]: [[TY]], %{{.*}} = %[[ARG_1]]: [[TY]])
	// TLOOP-SAME: outs (%{{.*}} = %[[INIT]]: [[TY]])			// TLOOP-SAME: outs (%{{.*}} = %[[INIT]]: [[TY]])
	// TLOOP-SAME: distribution["block_x", "block_y", "none"] {			// TLOOP-SAME: distribution["block_x", "block_y", "none"] {

	// -----			// -----

	// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0) -> (d0 + 3)>			// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 3)>
	// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0) -> (d0 + 4)>			// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0] -> (s0 + 4)>

	// CHECK: fold_extract_slice			// CHECK: fold_extract_slice
	// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<?x128xf32>			// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<?x128xf32>
	// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<?x42xf32>			// CHECK-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<?x42xf32>
	func @fold_extract_slice(			func @fold_extract_slice(
	%arg0 : tensor<?x128xf32>, %arg1 : tensor<?x42xf32>, %arg2 : tensor<?x42x?xf32>) -> tensor<?x42xf32> {			%arg0 : tensor<?x128xf32>, %arg1 : tensor<?x42xf32>, %arg2 : tensor<?x42x?xf32>) -> tensor<?x42xf32> {

	// CHECK: %[[C0:.*]] = arith.constant 0			// CHECK: %[[C0:.*]] = arith.constant 0
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index

	// CHECK: %[[DIM:.*]] = tensor.dim %[[ARG1]], %[[C0]]			// CHECK: %[[DIM:.*]] = tensor.dim %[[ARG1]], %[[C0]]
	%0 = tensor.dim %arg1, %c0 : tensor<?x42xf32>			%0 = tensor.dim %arg1, %c0 : tensor<?x42xf32>
	%1 = tensor.extract_slice %arg0[3, 4] [%0, 42] [1, 1] : tensor<?x128xf32> to tensor<?x42xf32>			%1 = tensor.extract_slice %arg0[3, 4] [%0, 42] [1, 1] : tensor<?x128xf32> to tensor<?x42xf32>

	// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =			// CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =
	// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =			// CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =

	// Fold the existing extract slice op into the one created by the tiling.			// Fold the existing extract slice op into the one created by the tiling.
	// CHECK: %[[SIZE0:.*]] = affine.min #[[MAP0]](%[[IV0]])[%[[DIM]]			// CHECK: %[[SIZE0:.*]] = affine.min #[[MAP0]]()[%[[DIM]], %[[IV0]]]
	// CHECK: %[[OFF0:.*]] = affine.apply #[[MAP1]](%[[IV0]]			// CHECK: %[[OFF0:.*]] = affine.apply #[[MAP1]]()[%[[IV0]]]
	// CHECK: %[[OFF1:.*]] = affine.apply #[[MAP2]](%[[IV1]]			// CHECK: %[[OFF1:.*]] = affine.apply #[[MAP2]]()[%[[IV1]]]
	// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]			// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]
	// CHECK-SAME: %[[OFF0]], %[[OFF1]]			// CHECK-SAME: %[[OFF0]], %[[OFF1]]
	// CHECK-SAME: %[[SIZE0]], 3			// CHECK-SAME: %[[SIZE0]], 3
	// CHECK-SAME: 1, 1			// CHECK-SAME: 1, 1
	// CHECK: {{.*}} = linalg.generic {{.*}} ins(%[[T0]]			// CHECK: {{.*}} = linalg.generic {{.*}} ins(%[[T0]]
	%2 = linalg.generic			%2 = linalg.generic
	{indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1)>,			{indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1)>,
	affine_map<(d0, d1, d2) -> (d0, d1, d2)>,			affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
	[11 lines hidden]

mlir/test/Dialect/Linalg/tile.mlir

	// RUN: mlir-opt %s -linalg-tile="tile-sizes=2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-2			// RUN: mlir-opt %s -linalg-tile="tile-sizes=2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-2
	// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-02			// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-02
	// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,0,2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-002			// RUN: mlir-opt %s -linalg-tile="tile-sizes=0,0,2" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-002
	// RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3,4" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-234			// RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3,4" -mlir-disable-threading=true | FileCheck %s -check-prefix=TILE-234

	// TILE-2-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// TILE-2-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
	// TILE-02-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// TILE-02-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
	// TILE-002-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// TILE-002-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
	// TILE-234-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>			// TILE-234-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>

	// TILE-2-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// TILE-2-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
	// TILE-02-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// TILE-02-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
	// TILE-002-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// TILE-002-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
	// TILE-234-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>			// TILE-234-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>

	// TILE-2-DAG: #[[$bound_map:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// TILE-2-DAG: #[[$bound_map:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// TILE-02-DAG: #[[$bound_map:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// TILE-02-DAG: #[[$bound_map:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// TILE-002-DAG: #[[$bound_map:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// TILE-002-DAG: #[[$bound_map:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// TILE-234-DAG: #[[$bound_map_2:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>			// TILE-234-DAG: #[[$bound_map_2:.*]] = affine_map<()[s0, s1] -> (2, s0 - s1)>
	// TILE-234-DAG: #[[$bound_map_3:.*]] = affine_map<(d0)[s0] -> (3, -d0 + s0)>			// TILE-234-DAG: #[[$bound_map_3:.*]] = affine_map<()[s0, s1] -> (3, s0 - s1)>
	// TILE-234-DAG: #[[$bound_map_4:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>			// TILE-234-DAG: #[[$bound_map_4:.*]] = affine_map<()[s0, s1] -> (4, s0 - s1)>

	// TILE-2-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>			// TILE-2-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>
	// TILE-02-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>			// TILE-02-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>
	// TILE-234-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>			// TILE-234-DAG: #[[$stride_99_1_layout_map:.*]] = affine_map<(d0, d1)[s0] -> (d0 * 99 + s0 + d1)>

	func @matmul(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>,			func @matmul(%arg0: memref<?x?xf32, offset: ?, strides: [?, 1]>,
	%arg1: memref<?x?xf32, offset: ?, strides: [?, 1]>,			%arg1: memref<?x?xf32, offset: ?, strides: [?, 1]>,
	%arg2: memref<?x?xf32, offset: ?, strides: [?, 1]>) {			%arg2: memref<?x?xf32, offset: ?, strides: [?, 1]>) {
	linalg.matmul			linalg.matmul
	ins(%arg0, %arg1: memref<?x?xf32, offset: ?, strides: [?, 1]>,			ins(%arg0, %arg1: memref<?x?xf32, offset: ?, strides: [?, 1]>,
	memref<?x?xf32, offset: ?, strides: [?, 1]>)			memref<?x?xf32, offset: ?, strides: [?, 1]>)
	outs(%arg2: memref<?x?xf32, offset: ?, strides: [?, 1]>)			outs(%arg2: memref<?x?xf32, offset: ?, strides: [?, 1]>)
	return			return
	}			}
	// TILE-2-LABEL: func @matmul(			// TILE-2-LABEL: func @matmul(
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[K]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[K]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[szK:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szK:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: linalg.matmul ins(%[[sAi]]{{.*}} outs(%[[sCi]]			// TILE-2: linalg.matmul ins(%[[sAi]]{{.*}} outs(%[[sCi]]

	// TILE-02-LABEL: func @matmul(			// TILE-02-LABEL: func @matmul(
	// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-02: %[[N:.*]] = memref.dim %arg1, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[N:.*]] = memref.dim %arg1, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: scf.for %[[J:.*]] = %{{.*}} to %[[N]] step %{{.*}} {			// TILE-02: scf.for %[[J:.*]] = %{{.*}} to %[[N]] step %{{.*}} {
	// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]			// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]]()[%[[N]], %[[J]]]
	// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[K]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[K]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szK:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]			// TILE-02: %[[szK:.*]] = affine.min #[[$bound_map]]()[%[[N]], %[[J]]]
	// TILE-02: %[[sCj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[sCj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: linalg.matmul ins(%{{.*}}, %[[sBj]]{{.*}} outs(%[[sCj]]			// TILE-02: linalg.matmul ins(%{{.*}}, %[[sBj]]{{.*}} outs(%[[sCj]]

	// TILE-002-LABEL: func @matmul(			// TILE-002-LABEL: func @matmul(
	// TILE-002-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-002-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-002-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-002-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-002: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-002: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-002: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]			// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]]()[%[[ubK]], %[[K]]]
	// TILE-002: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[K]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[K]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]			// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]]()[%[[ubK]], %[[K]]]
	// TILE-002: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: %[[sBj:.*]] = memref.subview %{{.*}}[%[[K]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[sBj:.*]] = memref.subview %{{.*}}[%[[K]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: linalg.matmul ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}			// TILE-002: linalg.matmul ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}

	// TILE-234-LABEL: func @matmul(			// TILE-234-LABEL: func @matmul(
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index			// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index
	// TILE-234-DAG: %[[C4:.*]] = arith.constant 4 : index			// TILE-234-DAG: %[[C4:.*]] = arith.constant 4 : index
	// TILE-234: %[[ubM:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubM:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[ubN:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubN:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[ubM]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[ubM]] step %{{.*}} {
	// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[ubN]] step %{{.*}} {			// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[ubN]] step %{{.*}} {
	// TILE-234: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-234: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[ubM]], %[[I]]]
	// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]			// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]]()[%[[ubK]], %[[K]]]
	// TILE-234: %[[sAik:.*]] = memref.subview %{{.*}}[%[[I]], %[[K]]] [%[[szM]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sAik:.*]] = memref.subview %{{.*}}[%[[I]], %[[K]]] [%[[szM]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]			// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]]()[%[[ubK]], %[[K]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]]()[%[[ubN]], %[[J]]]
	// TILE-234: %[[sBkj:.*]] = memref.subview %{{.*}}[%[[K]], %[[J]]] [%[[szK]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sBkj:.*]] = memref.subview %{{.*}}[%[[K]], %[[J]]] [%[[szK]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[ubM]], %[[I]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]]()[%[[ubN]], %[[J]]]
	// TILE-234: %[[sCij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sCij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	//			//
	// TILE-234: linalg.matmul ins(%[[sAik]], %[[sBkj]]{{.*}} outs(%[[sCij]]			// TILE-234: linalg.matmul ins(%[[sAik]], %[[sBkj]]{{.*}} outs(%[[sCij]]

	// When the buffer shapes are known at compile time, it is possible to avoid			// When the buffer shapes are known at compile time, it is possible to avoid
	// the "min" in subview size computation. This test uses buffer sizes divisible			// the "min" in subview size computation. This test uses buffer sizes divisible
	// by respective tile sizes (M=10 divisible by 2, N=12 divisible by 2 and 3,			// by respective tile sizes (M=10 divisible by 2, N=12 divisible by 2 and 3,
	// K=16 divisible by 2 and 4).			// K=16 divisible by 2 and 4).
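
A rough illustration of the comment above, as a Python sketch that is not part of the patch (the helper name `tile_sizes` is hypothetical): when the extent is divisible by the tile size, `min(tile, extent - iv)` equals the tile size on every iteration, so the `affine.min` in the subview size computation is redundant.

```python
def tile_sizes(extent, tile):
    """Sizes min(tile, extent - iv) seen by a loop that steps by `tile`."""
    return [min(tile, extent - iv) for iv in range(0, extent, tile)]


# Static shapes quoted in the test comment: M=10, N=12, K=16.
print(tile_sizes(10, 2))  # every entry equals the tile size
print(tile_sizes(12, 3))
print(tile_sizes(16, 4))
# A non-divisible extent still needs the min for the last, partial tile.
print(tile_sizes(10, 4))
```
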
	// TILE-2-LABEL: func @matvec(			// TILE-2-LABEL: func @matvec(
	// TILE-2-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-2-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-2-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-2-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-2-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-2-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[szN:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szN:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: linalg.matvec ins(%[[sAi]], %{{.*}} outs(%[[sCi]]			// TILE-2: linalg.matvec ins(%[[sAi]], %{{.*}} outs(%[[sCi]]

	// TILE-02-LABEL: func @matvec(			// TILE-02-LABEL: func @matvec(
	// TILE-02-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-02-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-02-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {			// TILE-02: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {
	// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]			// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]]()[%[[K]], %[[J]]]
	// TILE-02: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]			// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]]()[%[[K]], %[[J]]]
	// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-02: linalg.matvec ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}			// TILE-02: linalg.matvec ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}

	// TILE-002-LABEL: func @matvec(			// TILE-002-LABEL: func @matvec(
	// TILE-002-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-002-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-002-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-002-NOT: scf.for			// TILE-002-NOT: scf.for

	// TILE-234-LABEL: func @matvec(			// TILE-234-LABEL: func @matvec(
	// TILE-234-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-234-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-234-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index			// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index
	// TILE-234: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {			// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[M]], %[[I]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]]()[%[[K]], %[[J]]]
	// TILE-234: %[[sAij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sAij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]]()[%[[K]], %[[J]]]
	// TILE-234: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[M]], %[[I]]]
	// TILE-234: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	//			//
	// TILE-234: linalg.matvec ins(%[[sAij]], %[[sBj]]{{.*}} outs(%[[sCi]]			// TILE-234: linalg.matvec ins(%[[sAij]], %[[sBj]]{{.*}} outs(%[[sCi]]

	func @dot(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {			func @dot(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {
	linalg.dot			linalg.dot
	ins(%arg0, %arg1: memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>)			ins(%arg0, %arg1: memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>)
	outs(%arg2: memref<f32>)			outs(%arg2: memref<f32>)
	return			return
	}			}
	// TILE-2-LABEL: func @dot(			// TILE-2-LABEL: func @dot(
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]]()[%[[M]], %[[I]]]
	// TILE-2: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(			// TILE-2: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(

	// TILE-02-LABEL: func @dot(			// TILE-02-LABEL: func @dot(
	// TILE-02-NOT: scf.for			// TILE-02-NOT: scf.for

	// TILE-002-LABEL: func @dot(			// TILE-002-LABEL: func @dot(
	// TILE-002-NOT: scf.for			// TILE-002-NOT: scf.for

	// TILE-234-LABEL: func @dot(			// TILE-234-LABEL: func @dot(
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[ubK]], %[[I]]]
	// TILE-234: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]]()[%[[ubK]], %[[I]]]
	// TILE-234: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(			// TILE-234: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(

	func @fill_static(%arg0: memref<127x99xf32>, %arg1: f32) {			func @fill_static(%arg0: memref<127x99xf32>, %arg1: f32) {
	linalg.fill(%arg1, %arg0) : f32, memref<127x99xf32>			linalg.fill(%arg1, %arg0) : f32, memref<127x99xf32>
	return			return
	}			}
	// TILE-2-LABEL: func @fill_static			// TILE-2-LABEL: func @fill_static
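
The updated `bound_map` checks in tile.mlir turn the loop induction variable from a dimension operand into a symbol operand and reorder the operands. The Python sketch below (illustrative only; the function names are hypothetical and not part of the patch) models both forms of the map and checks that they yield the same tile size `min(tile, M - I)`.

```python
def bound_map_old(d0, s0, tile=2):
    # Models affine_map<(d0)[s0] -> (tile, -d0 + s0)>, applied as (%I)[%M].
    return min(tile, -d0 + s0)


def bound_map_new(s0, s1, tile=2):
    # Models affine_map<()[s0, s1] -> (tile, s0 - s1)>, applied as ()[%M, %I].
    return min(tile, s0 - s1)


M = 7  # arbitrary dynamic extent chosen for the illustration
for I in range(0, M, 2):
    # Operand order differs between the two forms, but the value is the same.
    assert bound_map_old(I, M) == bound_map_new(M, I)
    print(I, bound_map_new(M, I))
```
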

mlir/test/Dialect/SCF/for-loop-peeling.mlir

// RUN: mlir-opt %s -for-loop-peeling -canonicalize -split-input-file | FileCheck %s		// RUN: mlir-opt %s -for-loop-peeling -canonicalize -split-input-file | FileCheck %s
// RUN: mlir-opt %s -for-loop-peeling=skip-partial=false -canonicalize -split-input-file | FileCheck %s -check-prefix=CHECK-NO-SKIP		// RUN: mlir-opt %s -for-loop-peeling=skip-partial=false -canonicalize -split-input-file | FileCheck %s -check-prefix=CHECK-NO-SKIP

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0] -> (-d0 + s0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1] -> (-s0 + s1)>
// CHECK: func @fully_dynamic_bounds(		// CHECK: func @fully_dynamic_bounds(
// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index		// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index
// CHECK: %[[C0_I32:.*]] = arith.constant 0 : i32		// CHECK: %[[C0_I32:.*]] = arith.constant 0 : i32
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[LB]], %[[UB]], %[[STEP]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[LB]], %[[UB]], %[[STEP]]]
// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[LB]] to %[[NEW_UB]]		// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[LB]] to %[[NEW_UB]]
// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {		// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {
// CHECK: %[[CAST:.*]] = arith.index_cast %[[STEP]] : index to i32		// CHECK: %[[CAST:.*]] = arith.index_cast %[[STEP]] : index to i32
// CHECK: %[[ADD:.*]] = arith.addi %[[ACC]], %[[CAST]] : i32		// CHECK: %[[ADD:.*]] = arith.addi %[[ACC]], %[[CAST]] : i32
// CHECK: scf.yield %[[ADD]]		// CHECK: scf.yield %[[ADD]]
// CHECK: }		// CHECK: }
// CHECK: %[[RESULT:.*]] = scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]]		// CHECK: %[[RESULT:.*]] = scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]]
// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC2:.*]] = %[[LOOP]]) -> (i32) {		// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC2:.*]] = %[[LOOP]]) -> (i32) {
// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]](%[[IV2]])[%[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[IV2]], %[[UB]]]
// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = arith.addi %[[ACC2]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = arith.addi %[[ACC2]], %[[CAST2]]
// CHECK: scf.yield %[[ADD2]]		// CHECK: scf.yield %[[ADD2]]
// CHECK: }		// CHECK: }
// CHECK: return %[[RESULT]]		// CHECK: return %[[RESULT]]
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @fully_dynamic_bounds(%lb : index, %ub: index, %step: index) -> i32 {		func @fully_dynamic_bounds(%lb : index, %ub: index, %step: index) -> i32 {
%c0 = arith.constant 0 : i32		%c0 = arith.constant 0 : i32
%r = scf.for %iv = %lb to %ub step %step		%r = scf.for %iv = %lb to %ub step %step
scf.yield %0 : i32		scf.yield %0 : i32
}		}
return %r : i32		return %r : i32
}		}
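
The CHECK lines for `@fully_dynamic_bounds` describe how peeling splits the loop: `MAP0` computes a new upper bound `s1 - (s1 - s0) mod s2` from `[LB, UB, STEP]`, and `MAP1` computes the size left over in the remainder loop as `UB - IV2`. A minimal Python sketch of that arithmetic follows, assuming a positive step and `ub >= lb`; the helper name `peel` is hypothetical and this is not the pass implementation.

```python
def peel(lb, ub, step):
    """Split [lb, ub) into full-step iterations and a remainder loop."""
    # Models MAP0: ()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2) on [lb, ub, step].
    new_ub = ub - (ub - lb) % step
    main = list(range(lb, new_ub, step))   # every iteration covers `step`
    rest = list(range(new_ub, ub, step))   # at most one partial iteration
    # Models MAP1: the remainder iteration covers ub - iv elements.
    rem_sizes = [ub - iv for iv in rest]
    return new_ub, main, rest, rem_sizes


new_ub, main, rest, rem_sizes = peel(0, 10, 4)
print(new_ub, main, rest, rem_sizes)
# The accumulated sizes cover the whole range, as in the test's iter_args sum.
assert len(main) * 4 + sum(rem_sizes) == 10
```
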

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0] -> (-d0 + s0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1] -> (-s0 + s1)>
// CHECK: func @dynamic_upper_bound(		// CHECK: func @dynamic_upper_bound(
// CHECK-SAME: %[[UB:.*]]: index		// CHECK-SAME: %[[UB:.*]]: index
// CHECK-DAG: %[[C0_I32:.*]] = arith.constant 0 : i32		// CHECK-DAG: %[[C0_I32:.*]] = arith.constant 0 : i32
// CHECK-DAG: %[[C4_I32:.*]] = arith.constant 4 : i32		// CHECK-DAG: %[[C4_I32:.*]] = arith.constant 4 : i32
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]
// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]]		// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]]
// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {		// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {
// CHECK: %[[ADD:.*]] = arith.addi %[[ACC]], %[[C4_I32]] : i32		// CHECK: %[[ADD:.*]] = arith.addi %[[ACC]], %[[C4_I32]] : i32
// CHECK: scf.yield %[[ADD]]		// CHECK: scf.yield %[[ADD]]
// CHECK: }		// CHECK: }
// CHECK: %[[RESULT:.*]] = scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]]		// CHECK: %[[RESULT:.*]] = scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]]
// CHECK-SAME: step %[[C4]] iter_args(%[[ACC2:.*]] = %[[LOOP]]) -> (i32) {		// CHECK-SAME: step %[[C4]] iter_args(%[[ACC2:.*]] = %[[LOOP]]) -> (i32) {
// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]](%[[IV2]])[%[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[IV2]], %[[UB]]]
// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = arith.addi %[[ACC2]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = arith.addi %[[ACC2]], %[[CAST2]]
// CHECK: scf.yield %[[ADD2]]		// CHECK: scf.yield %[[ADD2]]
// CHECK: }		// CHECK: }
// CHECK: return %[[RESULT]]		// CHECK: return %[[RESULT]]
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @dynamic_upper_bound(%ub : index) -> i32 {		func @dynamic_upper_bound(%ub : index) -> i32 {
%c0_i32 = arith.constant 0 : i32		%c0_i32 = arith.constant 0 : i32
%lb = arith.constant 0 : index		%lb = arith.constant 0 : index
%step = arith.constant 4 : index		%step = arith.constant 4 : index
%r = scf.for %iv = %lb to %ub step %step		%r = scf.for %iv = %lb to %ub step %step
iter_args(%arg = %c0_i32) -> i32 {		iter_args(%arg = %c0_i32) -> i32 {
%s = affine.min #map(%ub, %iv)[%step]		%s = affine.min #map(%ub, %iv)[%step]
%casted = arith.index_cast %s : index to i32		%casted = arith.index_cast %s : index to i32
%0 = arith.addi %arg, %casted : i32		%0 = arith.addi %arg, %casted : i32
scf.yield %0 : i32		scf.yield %0 : i32
}		}
return %r : i32		return %r : i32
}		}
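
The CHECK lines above describe the peeled form of @dynamic_upper_bound: a main loop up to NEW_UB = (UB floordiv 4) * 4 that adds the constant 4, followed by a remainder loop that adds UB - IV2. A minimal C++ sketch of that arithmetic, assuming lb = 0, ub >= 0 and step > 0 (the function names are illustrative and not part of the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Original loop: each iteration adds min(step, ub - iv), mirroring the
// affine.min in the test input.
int64_t originalSum(int64_t ub, int64_t step) {
  int64_t acc = 0;
  for (int64_t iv = 0; iv < ub; iv += step)
    acc += std::min(step, ub - iv);
  return acc;
}

// Peeled form: the main loop runs to newUb = (ub floordiv step) * step and
// adds the constant step; the remainder loop adds ub - iv.
int64_t peeledSum(int64_t ub, int64_t step) {
  int64_t newUb = (ub / step) * step; // equals floordiv for non-negative ub
  int64_t acc = 0;
  for (int64_t iv = 0; iv < newUb; iv += step)
    acc += step;
  for (int64_t iv = newUb; iv < ub; iv += step)
    acc += ub - iv;
  return acc;
}

// Asserts that both forms agree for every upper bound in [0, maxUb].
void checkPeelingEquivalence(int64_t maxUb, int64_t step) {
  for (int64_t ub = 0; ub <= maxUb; ++ub)
    assert(originalSum(ub, step) == peeledSum(ub, step));
}
```

For ub = 10 and step = 4, both forms yield 10: the main loop contributes 4 + 4 and the single remainder iteration contributes 2.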

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0] -> (-d0 + s0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1] -> (-s0 + s1)>
// CHECK: func @no_loop_results(		// CHECK: func @no_loop_results(
// CHECK-SAME: %[[UB:.*]]: index, %[[MEMREF:.*]]: memref<i32>		// CHECK-SAME: %[[UB:.*]]: index, %[[MEMREF:.*]]: memref<i32>
// CHECK-DAG: %[[C4_I32:.*]] = arith.constant 4 : i32		// CHECK-DAG: %[[C4_I32:.*]] = arith.constant 4 : i32
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]
// CHECK: scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]] step %[[C4]] {		// CHECK: scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]] step %[[C4]] {
// CHECK: %[[LOAD:.*]] = memref.load %[[MEMREF]][]		// CHECK: %[[LOAD:.*]] = memref.load %[[MEMREF]][]
// CHECK: %[[ADD:.*]] = arith.addi %[[LOAD]], %[[C4_I32]] : i32		// CHECK: %[[ADD:.*]] = arith.addi %[[LOAD]], %[[C4_I32]] : i32
// CHECK: memref.store %[[ADD]], %[[MEMREF]]		// CHECK: memref.store %[[ADD]], %[[MEMREF]]
// CHECK: }		// CHECK: }
// CHECK: scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]] step %[[C4]] {		// CHECK: scf.for %[[IV2:.*]] = %[[NEW_UB]] to %[[UB]] step %[[C4]] {
// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]](%[[IV2]])[%[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[IV2]], %[[UB]]]
// CHECK: %[[LOAD2:.*]] = memref.load %[[MEMREF]][]		// CHECK: %[[LOAD2:.*]] = memref.load %[[MEMREF]][]
// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = arith.index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = arith.addi %[[LOAD2]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = arith.addi %[[LOAD2]], %[[CAST2]]
// CHECK: memref.store %[[ADD2]], %[[MEMREF]]		// CHECK: memref.store %[[ADD2]], %[[MEMREF]]
// CHECK: }		// CHECK: }
// CHECK: return		// CHECK: return
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @no_loop_results(%ub : index, %d : memref<i32>) {		func @no_loop_results(%ub : index, %d : memref<i32>) {
[12 lines not shown]

// -----		// -----

// Test rewriting of affine.min/max ops. Make sure that more general cases than		// Test rewriting of affine.min/max ops. Make sure that more general cases than
// the ones above are successfully rewritten. Also make sure that the pattern		// the ones above are successfully rewritten. Also make sure that the pattern
// does not rewrite ops that should not be rewritten.		// does not rewrite ops that should not be rewritten.

// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 1)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 1)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1 - 1)>		// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1, s2] -> (s0, s1 - s2 - 1)>
// CHECK-DAG: #[[MAP3:.*]] = affine_map<(d0)[s0, s1, s2] -> (s0, -d0 + s1, s2)>		// CHECK-DAG: #[[MAP3:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0, s1 - s2, s3)>
// CHECK-DAG: #[[MAP4:.*]] = affine_map<()[s0] -> (-s0)>		// CHECK-DAG: #[[MAP4:.*]] = affine_map<()[s0] -> (-s0)>
// CHECK-DAG: #[[MAP5:.*]] = affine_map<(d0)[s0] -> (-d0 + s0)>		// CHECK-DAG: #[[MAP5:.*]] = affine_map<()[s0, s1] -> (-s0 + s1)>
// CHECK-DAG: #[[MAP6:.*]] = affine_map<(d0)[s0] -> (-d0 + s0 + 1)>		// CHECK-DAG: #[[MAP6:.*]] = affine_map<()[s0, s1] -> (-s0 + s1 + 1)>
// CHECK-DAG: #[[MAP7:.*]] = affine_map<(d0)[s0] -> (-d0 + s0 - 1)>		// CHECK-DAG: #[[MAP7:.*]] = affine_map<()[s0, s1] -> (-s0 + s1 - 1)>
// CHECK-DAG: #[[MAP8:.*]] = affine_map<(d0)[s0] -> (d0 - s0)>		// CHECK-DAG: #[[MAP8:.*]] = affine_map<()[s0, s1] -> (s0 - s1)>
// CHECK: func @test_affine_op_rewrite(		// CHECK: func @test_affine_op_rewrite(
// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index,		// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index,
// CHECK-SAME: %[[MEMREF:.*]]: memref<?xindex>, %[[SOME_VAL:.*]]: index		// CHECK-SAME: %[[MEMREF:.*]]: memref<?xindex>, %[[SOME_VAL:.*]]: index
// CHECK: scf.for %[[IV:.*]] = %[[LB]] to %{{.*}} step %[[STEP]] {		// CHECK: scf.for %[[IV:.*]] = %[[LB]] to %{{.*}} step %[[STEP]] {
// (affine.min folded away)		// (affine.min folded away)
// CHECK: memref.store %[[STEP]]		// CHECK: memref.store %[[STEP]]
// (affine.min folded away)		// (affine.min folded away)
// CHECK: memref.store %[[STEP]]		// CHECK: memref.store %[[STEP]]
// CHECK: %[[RES2:.*]] = affine.apply #[[MAP1]]()[%[[STEP]]]		// CHECK: %[[RES2:.*]] = affine.apply #[[MAP1]]()[%[[STEP]]]
// CHECK: memref.store %[[RES2]]		// CHECK: memref.store %[[RES2]]
// CHECK: %[[RES3:.*]] = affine.min #[[MAP2]](%[[IV]])[%[[STEP]], %[[UB]]]		// CHECK: %[[RES3:.*]] = affine.min #[[MAP2]]()[%[[STEP]], %[[UB]], %[[IV]]]
// CHECK: memref.store %[[RES3]]		// CHECK: memref.store %[[RES3]]
// CHECK: %[[RES4:.*]] = affine.min #[[MAP3]](%[[IV]])[%[[STEP]], %[[UB]], %[[SOME_VAL]]]		// CHECK: %[[RES4:.*]] = affine.min #[[MAP3]]()[%[[STEP]], %[[UB]], %[[IV]], %[[SOME_VAL]]]
// CHECK: memref.store %[[RES4]]		// CHECK: memref.store %[[RES4]]
// CHECK: %[[RES5:.*]] = affine.apply #[[MAP4]]()[%[[STEP]]]		// CHECK: %[[RES5:.*]] = affine.apply #[[MAP4]]()[%[[STEP]]]
// CHECK: memref.store %[[RES5]]		// CHECK: memref.store %[[RES5]]
// CHECK: }		// CHECK: }
// CHECK: scf.for %[[IV2:.*]] = {{.*}} to %[[UB]] step %[[STEP]] {		// CHECK: scf.for %[[IV2:.*]] = {{.*}} to %[[UB]] step %[[STEP]] {
// CHECK: %[[RES_IF_0:.*]] = affine.apply #[[MAP5]](%[[IV2]])[%[[UB]]]		// CHECK: %[[RES_IF_0:.*]] = affine.apply #[[MAP5]]()[%[[IV2]], %[[UB]]]
// CHECK: memref.store %[[RES_IF_0]]		// CHECK: memref.store %[[RES_IF_0]]
// CHECK: %[[RES_IF_1:.*]] = affine.apply #[[MAP6]](%[[IV2]])[%[[UB]]]		// CHECK: %[[RES_IF_1:.*]] = affine.apply #[[MAP6]]()[%[[IV2]], %[[UB]]]
// CHECK: memref.store %[[RES_IF_1]]		// CHECK: memref.store %[[RES_IF_1]]
// CHECK: %[[RES_IF_2:.*]] = affine.apply #[[MAP6]](%[[IV2]])[%[[UB]]]		// CHECK: %[[RES_IF_2:.*]] = affine.apply #[[MAP6]]()[%[[IV2]], %[[UB]]]
// CHECK: memref.store %[[RES_IF_2]]		// CHECK: memref.store %[[RES_IF_2]]
// CHECK: %[[RES_IF_3:.*]] = affine.apply #[[MAP7]](%[[IV2]])[%[[UB]]]		// CHECK: %[[RES_IF_3:.*]] = affine.apply #[[MAP7]]()[%[[IV2]], %[[UB]]]
// CHECK: memref.store %[[RES_IF_3]]		// CHECK: memref.store %[[RES_IF_3]]
// CHECK: %[[RES_IF_4:.*]] = affine.min #[[MAP3]](%[[IV2]])[%[[STEP]], %[[UB]], %[[SOME_VAL]]]		// CHECK: %[[RES_IF_4:.*]] = affine.min #[[MAP3]]()[%[[STEP]], %[[UB]], %[[IV2]], %[[SOME_VAL]]]
// CHECK: memref.store %[[RES_IF_4]]		// CHECK: memref.store %[[RES_IF_4]]
// CHECK: %[[RES_IF_5:.*]] = affine.apply #[[MAP8]](%[[IV2]])[%[[UB]]]		// CHECK: %[[RES_IF_5:.*]] = affine.apply #[[MAP8]]()[%[[IV2]], %[[UB]]]
// CHECK: memref.store %[[RES_IF_5]]		// CHECK: memref.store %[[RES_IF_5]]
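
The only change in the expected output above is that the loop induction variable moves from a dimension operand to a symbol operand, presumably because scf.for now starts an affine scope and its induction variable therefore qualifies as a symbol. The sketch below, with illustrative function names that are not part of the patch, evaluates the old and new forms of MAP5 and MAP2 to show that they compute the same values once the operands are reordered to match:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// MAP5, old form: affine_map<(d0)[s0] -> (-d0 + s0)>, operands (%IV2)[%UB].
int64_t map5Old(int64_t d0, int64_t s0) { return -d0 + s0; }

// MAP5, new form: affine_map<()[s0, s1] -> (-s0 + s1)>, operands ()[%IV2, %UB].
int64_t map5New(int64_t s0, int64_t s1) { return -s0 + s1; }

// MAP2 under affine.min, old form: (d0)[s0, s1] -> (s0, -d0 + s1 - 1),
// operands (%IV)[%STEP, %UB].
int64_t minMap2Old(int64_t d0, int64_t s0, int64_t s1) {
  return std::min(s0, -d0 + s1 - 1);
}

// MAP2 under affine.min, new form: ()[s0, s1, s2] -> (s0, s1 - s2 - 1),
// operands ()[%STEP, %UB, %IV].
int64_t minMap2New(int64_t s0, int64_t s1, int64_t s2) {
  return std::min(s0, s1 - s2 - 1);
}

// Asserts that old and new forms agree for one choice of iv, step and ub.
void checkSameValues(int64_t iv, int64_t step, int64_t ub) {
  assert(map5Old(iv, ub) == map5New(iv, ub));
  assert(minMap2Old(iv, step, ub) == minMap2New(step, ub, iv));
}
```

Note that in the new MAP2 the induction variable is the third symbol (s2), so the operand list changes from (%IV)[%STEP, %UB] to ()[%STEP, %UB, %IV].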
#map0 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map0 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
#map1 = affine_map<(d0, d1)[s0] -> (d0 - d1 + 1, s0)>		#map1 = affine_map<(d0, d1)[s0] -> (d0 - d1 + 1, s0)>
#map2 = affine_map<(d0, d1)[s0] -> (s0 + 1, d0 - d1 + 1)>		#map2 = affine_map<(d0, d1)[s0] -> (s0 + 1, d0 - d1 + 1)>
#map3 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1 - 1)>		#map3 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1 - 1)>
#map4 = affine_map<(d0, d1, d2)[s0] -> (s0, d0 - d1, d2)>		#map4 = affine_map<(d0, d1, d2)[s0] -> (s0, d0 - d1, d2)>
#map5 = affine_map<(d0, d1)[s0] -> (-s0, -d0 + d1)>		#map5 = affine_map<(d0, d1)[s0] -> (-s0, -d0 + d1)>
func @test_affine_op_rewrite(%lb : index, %ub: index,		func @test_affine_op_rewrite(%lb : index, %ub: index,
[82 lines not shown]

mlir/test/Dialect/SparseTensor/sparse_vector_peeled.mlir

[12 lines not shown]	indexing_maps = [
affine_map<(i) -> (i)>, // b		affine_map<(i) -> (i)>, // b
affine_map<(i) -> (i)> // x (out)		affine_map<(i) -> (i)> // x (out)
],		],
iterator_types = ["parallel"],		iterator_types = ["parallel"],
doc = "x(i) = a(i) * b(i)"		doc = "x(i) = a(i) * b(i)"
}		}

// CHECK-DAG: #[[$map0:.*]] = affine_map<()[s0, s1] -> (s0 + ((-s0 + s1) floordiv 16) * 16)>		// CHECK-DAG: #[[$map0:.*]] = affine_map<()[s0, s1] -> (s0 + ((-s0 + s1) floordiv 16) * 16)>
// CHECK-DAG: #[[$map1:.*]] = affine_map<(d0)[s0] -> (-d0 + s0)>		// CHECK-DAG: #[[$map1:.*]] = affine_map<()[s0, s1] -> (-s0 + s1)>
// CHECK-LABEL: func @mul_s		// CHECK-LABEL: func @mul_s
// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[c16:.*]] = arith.constant 16 : index		// CHECK-DAG: %[[c16:.*]] = arith.constant 16 : index
// CHECK: %[[p:.*]] = memref.load %{{.*}}[%[[c0]]] : memref<?xi32>		// CHECK: %[[p:.*]] = memref.load %{{.*}}[%[[c0]]] : memref<?xi32>
// CHECK: %[[a:.*]] = arith.extui %[[p]] : i32 to i64		// CHECK: %[[a:.*]] = arith.extui %[[p]] : i32 to i64
// CHECK: %[[q:.*]] = arith.index_cast %[[a]] : i64 to index		// CHECK: %[[q:.*]] = arith.index_cast %[[a]] : i64 to index
// CHECK: %[[r:.*]] = memref.load %{{.*}}[%[[c1]]] : memref<?xi32>		// CHECK: %[[r:.*]] = memref.load %{{.*}}[%[[c1]]] : memref<?xi32>
// CHECK: %[[b:.*]] = arith.extui %[[r]] : i32 to i64		// CHECK: %[[b:.*]] = arith.extui %[[r]] : i32 to i64
// CHECK: %[[s:.*]] = arith.index_cast %[[b]] : i64 to index		// CHECK: %[[s:.*]] = arith.index_cast %[[b]] : i64 to index
// CHECK: %[[boundary:.*]] = affine.apply #[[$map0]]()[%[[q]], %[[s]]]		// CHECK: %[[boundary:.*]] = affine.apply #[[$map0]]()[%[[q]], %[[s]]]
// CHECK: scf.for %[[i:.*]] = %[[q]] to %[[boundary]] step %[[c16]] {		// CHECK: scf.for %[[i:.*]] = %[[q]] to %[[boundary]] step %[[c16]] {
// CHECK: %[[mask:.*]] = vector.constant_mask [16] : vector<16xi1>		// CHECK: %[[mask:.*]] = vector.constant_mask [16] : vector<16xi1>
// CHECK: %[[li:.*]] = vector.load %{{.*}}[%[[i]]] : memref<?xi32>, vector<16xi32>		// CHECK: %[[li:.*]] = vector.load %{{.*}}[%[[i]]] : memref<?xi32>, vector<16xi32>
// CHECK: %[[zi:.*]] = arith.extui %[[li]] : vector<16xi32> to vector<16xi64>		// CHECK: %[[zi:.*]] = arith.extui %[[li]] : vector<16xi32> to vector<16xi64>
// CHECK: %[[la:.*]] = vector.load %{{.*}}[%[[i]]] : memref<?xf32>, vector<16xf32>		// CHECK: %[[la:.*]] = vector.load %{{.*}}[%[[i]]] : memref<?xf32>, vector<16xf32>
// CHECK: %[[lb:.*]] = vector.gather %{{.*}}[%[[c0]]] [%[[zi]]], %[[mask]], %{{.*}} : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32> into vector<16xf32>		// CHECK: %[[lb:.*]] = vector.gather %{{.*}}[%[[c0]]] [%[[zi]]], %[[mask]], %{{.*}} : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32> into vector<16xf32>
// CHECK: %[[m:.*]] = arith.mulf %[[la]], %[[lb]] : vector<16xf32>		// CHECK: %[[m:.*]] = arith.mulf %[[la]], %[[lb]] : vector<16xf32>
// CHECK: vector.scatter %{{.*}}[%[[c0]]] [%[[zi]]], %[[mask]], %[[m]] : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32>		// CHECK: vector.scatter %{{.*}}[%[[c0]]] [%[[zi]]], %[[mask]], %[[m]] : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32>
// CHECK: }		// CHECK: }
// CHECK: scf.for %[[i2:.*]] = %[[boundary]] to %[[s]] step %[[c16]] {		// CHECK: scf.for %[[i2:.*]] = %[[boundary]] to %[[s]] step %[[c16]] {
// CHECK: %[[sub:.*]] = affine.apply #[[$map1]](%[[i2]])[%[[s]]]		// CHECK: %[[sub:.*]] = affine.apply #[[$map1]]()[%[[i2]], %[[s]]]
// CHECK: %[[mask2:.*]] = vector.create_mask %[[sub]] : vector<16xi1>		// CHECK: %[[mask2:.*]] = vector.create_mask %[[sub]] : vector<16xi1>
// CHECK: %[[li2:.*]] = vector.maskedload %{{.*}}[%[[i2]]], %[[mask2]], %{{.*}} : memref<?xi32>, vector<16xi1>, vector<16xi32> into vector<16xi32>		// CHECK: %[[li2:.*]] = vector.maskedload %{{.*}}[%[[i2]]], %[[mask2]], %{{.*}} : memref<?xi32>, vector<16xi1>, vector<16xi32> into vector<16xi32>
// CHECK: %[[zi2:.*]] = arith.extui %[[li2]] : vector<16xi32> to vector<16xi64>		// CHECK: %[[zi2:.*]] = arith.extui %[[li2]] : vector<16xi32> to vector<16xi64>
// CHECK: %[[la2:.*]] = vector.maskedload %{{.*}}[%[[i2]]], %[[mask2]], %{{.*}} : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>		// CHECK: %[[la2:.*]] = vector.maskedload %{{.*}}[%[[i2]]], %[[mask2]], %{{.*}} : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
// CHECK: %[[lb2:.*]] = vector.gather %{{.*}}[%[[c0]]] [%[[zi2]]], %[[mask2]], %{{.*}} : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32> into vector<16xf32>		// CHECK: %[[lb2:.*]] = vector.gather %{{.*}}[%[[c0]]] [%[[zi2]]], %[[mask2]], %{{.*}} : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32> into vector<16xf32>
// CHECK: %[[m2:.*]] = arith.mulf %[[la2]], %[[lb2]] : vector<16xf32>		// CHECK: %[[m2:.*]] = arith.mulf %[[la2]], %[[lb2]] : vector<16xf32>
// CHECK: vector.scatter %{{.*}}[%[[c0]]] [%[[zi2]]], %[[mask2]], %[[m2]] : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32>		// CHECK: vector.scatter %{{.*}}[%[[c0]]] [%[[zi2]]], %[[mask2]], %[[m2]] : memref<1024xf32>, vector<16xi64>, vector<16xi1>, vector<16xf32>
// CHECK: }		// CHECK: }
[12 lines not shown]

mlir/test/lib/Dialect/Test/TestDialect.cpp

[624 lines not shown]	static void print(OpAsmPrinter &p, GraphRegionOp op) {
p.printRegion(op.getRegion(), /*printEntryBlockArgs=*/false);		p.printRegion(op.getRegion(), /*printEntryBlockArgs=*/false);
}		}

RegionKind GraphRegionOp::getRegionKind(unsigned index) {		RegionKind GraphRegionOp::getRegionKind(unsigned index) {
return RegionKind::Graph;		return RegionKind::Graph;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Test AffineScopeOp
//===----------------------------------------------------------------------===//

static ParseResult parseAffineScopeOp(OpAsmParser &parser,
OperationState &result) {
// Parse the body region, and reuse the operand info as the argument info.
Region *body = result.addRegion();
return parser.parseRegion(body, /*arguments=*/{}, /*argTypes=*/{});
}

static void print(OpAsmPrinter &p, AffineScopeOp op) {
p << "test.affine_scope ";
p.printRegion(op.getRegion(), /*printEntryBlockArgs=*/false);
}

//===----------------------------------------------------------------------===//
// Test parser.		// Test parser.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static ParseResult parseParseIntegerLiteralOp(OpAsmParser &parser,		static ParseResult parseParseIntegerLiteralOp(OpAsmParser &parser,
OperationState &result) {		OperationState &result) {
if (parser.parseOptionalColon())		if (parser.parseOptionalColon())
return success();		return success();
uint64_t numResults;		uint64_t numResults;
[448 lines not shown]

mlir/test/lib/Dialect/Test/TestOps.td

[1,570 lines not shown]	let description = [{
Test op that defines a graph region.		Test op that defines a graph region.
}];		}];

let regions = (region AnyRegion:$region);		let regions = (region AnyRegion:$region);
let parser = [{ return ::parse$cppClass(parser, result); }];		let parser = [{ return ::parse$cppClass(parser, result); }];
let printer = [{ return ::print(p, *this); }];		let printer = [{ return ::print(p, *this); }];
}		}

def AffineScopeOp : TEST_Op<"affine_scope", [AffineScope]> {		def ExtendAffineScopeOp : TEST_Op<"affine_scope_extend", [ExtendsAffineScope]> {
let summary = "affine scope operation";		let summary = "an operation that extends an affine scope";
let description = [{		let description = [{
Test op that defines a new affine scope.		Test op that extends an affine scope created by its ancestor op chain.
}];		}];

let regions = (region SizedRegion<1>:$region);		let regions = (region SizedRegion<1>:$region);
let parser = [{ return ::parse$cppClass(parser, result); }];		let assemblyFormat = "$region attr-dict";
let printer = [{ return ::print(p, *this); }];
}		}
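
The replacement test op is described as extending "an affine scope created by its ancestor op chain". One way to read that: starting from an op, walk up its parents, skipping every op that carries ExtendsAffineScope; the first ancestor without the trait is the op that starts the scope. A toy model of that walk, using a hypothetical ToyOp struct rather than the real MLIR Operation API, and not taken from the patch:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an operation; this is not the MLIR Operation API.
struct ToyOp {
  std::string name;
  bool extendsAffineScope; // models the ExtendsAffineScope trait
  const ToyOp *parent;     // enclosing op, or nullptr at the top level
};

// Returns the closest enclosing op that starts an affine scope, i.e. the
// nearest ancestor without the extends-scope flag; nullptr if there is none.
const ToyOp *getEnclosingAffineScope(const ToyOp &op) {
  const ToyOp *cur = op.parent;
  while (cur && cur->extendsAffineScope)
    cur = cur->parent;
  return cur;
}

// Builds func > scf.for > affine.for > test.affine_scope_extend > affine.load
// and checks that the load's scope is the scf.for, not the func.
bool exampleNesting() {
  ToyOp func{"func", false, nullptr};
  ToyOp scfFor{"scf.for", false, &func};
  ToyOp affineFor{"affine.for", true, &scfFor};
  ToyOp extend{"test.affine_scope_extend", true, &affineFor};
  ToyOp load{"affine.load", false, &extend};
  return getEnclosingAffineScope(load) == &scfFor &&
         getEnclosingAffineScope(scfFor) == &func;
}
```

In this model the default is that a region-holding op starts a scope, and only ops that opt in via the flag are transparent to the walk, which matches the direction of the trait inversion in this revision.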

def WrappingRegionOp : TEST_Op<"wrapping_region",		def WrappingRegionOp : TEST_Op<"wrapping_region",
[SingleBlockImplicitTerminator<"TestReturnOp">]> {		[SingleBlockImplicitTerminator<"TestReturnOp">]> {
let summary = "wrapping region operation";		let summary = "wrapping region operation";
let description = [{		let description = [{
Test op wrapping another op in a region, to test calling		Test op wrapping another op in a region, to test calling
parseGenericOperation from the custom parser.		parseGenericOperation from the custom parser.
[779 lines not shown]

Diff 383667
