Add a transformation to tile reduction ops into a parallel operation
followed by a merge operation. This is equivalent to the existing
reduction splitting transformation, but it uses loops instead of
higher-dimensional linalg ops.
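To make the shape of the rewrite concrete, here is a scalar C++ analogy (illustrative only, not the generated IR): the reduction becomes an identity-initialized accumulator with tile-size many lanes, a tiled loop doing an elementwise (parallelizable) partial reduction into those lanes, and a final merge reduction over the lanes.

```cpp
// Scalar analogy of the transformation (illustrative only, not the patch's IR).
#include <numeric>
#include <vector>

float tiledSum(const std::vector<float> &x, size_t tileSize) {
  // Init with the reduction identity (0 for add).
  std::vector<float> partial(tileSize, 0.0f);
  // The tiled loop: each iteration does an elementwise, parallel-friendly
  // partial reduction of one tile into the accumulator lanes.
  for (size_t i = 0; i < x.size(); i += tileSize)
    for (size_t j = 0; j < tileSize && i + j < x.size(); ++j)
      partial[j] += x[i + j];
  // The merge step: a small final reduction over the partial results.
  return std::accumulate(partial.begin(), partial.end(), 0.0f);
}
```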
Wow! Awesome! I was just going to accept this, but realized I have one question on further reading. There seems to be an implicit assumption, shared across the different interface methods, that there is a single reduction dimension. Maybe it's better to be explicit and make the dimension (or list of dimensions) being tiled an explicit argument to the interface methods. That would remove the implicit assumption running through the different methods.
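For illustration, a rough sketch of what making the reduction dimensions explicit could look like for the three methods discussed later in this review; the return types and exact parameter lists here are assumptions, not the patch's actual signatures.

```cpp
// Hypothetical C++ signatures only; the real interface methods in the patch
// may differ in return types and parameters.
struct PartialReductionTilingMethods {
  // Create the tensor holding the identity/neutral values of the reduction.
  virtual Operation *generateInitialTensorForPartialReduction(
      OpBuilder &b, Location loc, ArrayRef<OpFoldResult> sizes,
      ArrayRef<int> reductionDims) = 0;
  // Emit the tiled op computing a partial reduction into `init`.
  virtual Operation *tileToPartialReduction(
      OpBuilder &b, Location loc, ValueRange init,
      ArrayRef<OpFoldResult> offsets, ArrayRef<OpFoldResult> sizes,
      ArrayRef<int> reductionDims) = 0;
  // Combine the partial results into the final reduced value.
  virtual Operation *mergeReductions(OpBuilder &b, Location loc,
                                     ValueRange partialResults,
                                     ArrayRef<int> reductionDims) = 0;
  virtual ~PartialReductionTilingMethods() = default;
};
```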
mlir/include/mlir/Interfaces/TilingInterface.td
163 | Is there a reason the interface method has to return a tensor? I think returning the identity element should be enough. I am not opposed to doing this, and it might actually be more general, but I am trying to understand the minimum requirement the interface should expect ops to implement.
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
273 | If I am reading this correctly, the setting of the identity element can be moved into the tileReductionUsingSCFForOp method (related to the comment above).
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
434 | I think SCF already has a dependence on the Arith dialect, so you can replace this with getValueOrCreateConstantIndexOp (see the sketch below).
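For reference, a minimal sketch of the suggested helper from the Arith dialect utilities (the header path varies across MLIR versions); it creates an arith.constant of index type only when the OpFoldResult holds an attribute, and otherwise returns the existing Value.

```cpp
// Sketch only; getValueOrCreateConstantIndexOp is the Arith utility suggested
// above, the surrounding helper name is illustrative.
#include "mlir/Dialect/Arith/Utils/Utils.h"

SmallVector<Value> toIndexValues(OpBuilder &b, Location loc,
                                 ArrayRef<OpFoldResult> ofrs) {
  SmallVector<Value> values;
  for (OpFoldResult ofr : ofrs)
    values.push_back(getValueOrCreateConstantIndexOp(b, loc, ofr));
  return values;
}
```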
Yes, I can do that. I was trying to make the interface more forward-looking, since I think we should be able to support multiple reductions at some point, but I agree it is a bit odd. I'll change the interfaces for now.
mlir/include/mlir/Interfaces/TilingInterface.td
163 | If we return the identity element, we still need to create the op generating the tensor, and that op depends on the op we are tiling (for linalg we would generate a linalg.fill, but not for other ops). Where would you do that? (A rough sketch of the linalg side follows below.)
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
273 | Getting the reduction op requires code that depends on linalg, so I'm not sure how we could do that without adding more interface functions.
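For context, a rough sketch of how the linalg side can build that tensor: create an empty tensor of the partial-result shape and fill it with the reduction's neutral element. Helper and variable names here are illustrative, not the exact code in this patch.

```cpp
// Illustrative sketch only. The neutral element (0 for add, 1 for mul, ...)
// is assumed to be provided as a Value; older MLIR would use linalg.init_tensor
// instead of tensor.empty.
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"

static Value buildInitTensor(OpBuilder &b, Location loc,
                             ArrayRef<OpFoldResult> partialShape,
                             Type elementType, Value neutralElement) {
  // tensor.empty of the partial-reduction shape.
  Value empty = b.create<tensor::EmptyOp>(loc, partialShape, elementType);
  // linalg.fill writing the neutral element into every position.
  return b.create<linalg::FillOp>(loc, neutralElement, empty)->getResult(0);
}
```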
Flyby comment: it looks like this differential is essentially doing what is being done in https://github.com/iree-org/iree/pull/10537/files#diff-e105ec843efed97f5884ad26c85561c72967f0546029bd772335634f23553c31, which means I can remove that PR and use your new interface. You seem to have gotten rid of the control for the inserted index. Do you no longer need that for general-purpose reduction tiling?
Yes, this could be a potential way to replace that and generalize it by supporting dynamic shapes. The control for the index can be added to this if needed. Right now the code just makes a decision based on the reduction loop index, but for more flexibility there should indeed be a control.
I think this op should _not_ take care of multiple reduction dimensions. There is already a separate transform op or transformation (that Thomas wrote earlier) that combines multiple contiguous reduction dimensions, so we can use that to combine the sequence of innermost reductions into one. If the reduction dimension is not innermost, then there is no need to apply the technique, as parallel tiling will already create tiles that can be reduced in parallel.
In other words, I think this differential should be accepted as is, with comments (and checks) stating that only the innermost dimension may be reduced using this transformation.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | Why are mergeReductions, tileToPartialReduction, and generateInitialTensorForPartialReduction part of the tiling interface? They work only for Linalg, don't they?
Move the reduction dimension detection into the common code and address other review comments.
Thanks Mahesh. As discussed, I moved the code detecting the reduction dimension into the common code, but creating the tensor is still done by the interface function.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | The idea is that ops other than linalg could implement this interface in the future. For instance, we did have to do reduction splitting for the linalgext topK op, and that could be an op implementing this interface. Also, since we need to reuse helpers in the tiling code, this lets us do so without adding a dependency between the tiling code and linalg.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | Is this really tiling the operation? Yes, at some level it moves the computation to working on tiles, but those tiles are not "normal" tiles in the sense of all other operations in this interface. Instead, these methods allow distributing a reduction over multiple threads. So I see how having these methods implemented allows applying this transformation to anything reduction-like, but they are not useful otherwise. Maybe have a StructuredReductionOp interface?
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315 | Why is omitPartialTileCheck true here? Do we statically know that the dimension tiles evenly?
mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38 | Why is this valid? What happens if ARG0 does not have a size that is divisible by 5?
mlir/include/mlir/Interfaces/TilingInterface.td
189 | Could we make this an ArrayRef<unsigned>? Also, I have been led to believe that the recommended C++ usage is to use int types unless you are doing bit manipulation.
200 | I don't see a reason why this needs to be moved out of TilingInterface into any other interface. If an op doesn't support this kind of tiling, it just does not implement these methods. Also, I think there is a final state where we can combine tileToPartialReduction and getTiledImplementation, but that isn't clear yet, so a separate method is added for now. It is meant to handle tiling of loops that are reductions, which is fine. Having more interfaces just adds strange dependencies, or results in redundant interface methods across interfaces.
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315 | The loop implementation already handles partial tiles; that is accounted for in the sizes. There is no need for an op implementation to account for it.
mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38 | This is a good point. I'd expect an affine.min to be generated by generateTiledLoopNest for this case.
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315 | Yes, this is handled by generateTileLoopNest, so the sizes are already adjusted.
mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38 | Good catch, there was a bug in the size used for the insert. It is fixed now and the size is based on an affine.min as expected (a generic sketch of the clamping follows below).
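For readers following along, a generic sketch of the usual boundary-tile clamping pattern (not the patch's exact code): the size of the last, possibly partial, tile is min(tileSize, upperBound - iv), emitted as an affine.min.

```cpp
// Generic sketch: size = min(tileSize, ub - iv). The namespace of AffineMinOp
// and the exact helper used by the patch may differ across MLIR versions.
#include "mlir/Dialect/Affine/IR/AffineOps.h"

static Value createBoundedTileSize(OpBuilder &b, Location loc, Value iv,
                                   Value upperBound, Value tileSize) {
  AffineExpr d0, d1, s0;
  bindDims(b.getContext(), d0, d1);
  bindSymbols(b.getContext(), s0);
  // Map (iv, ub)[tileSize] -> (tileSize, ub - iv); affine.min keeps the smaller.
  AffineMap minMap = AffineMap::get(/*dimCount=*/2, /*symbolCount=*/1,
                                    {s0, d1 - d0}, b.getContext());
  return b.create<AffineMinOp>(loc, minMap, ValueRange{iv, upperBound, tileSize});
}
```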
Address more review comments.
mlir/include/mlir/Interfaces/TilingInterface.td
189 | Changed it to ArrayRef<int>.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | I am worried that this change modifies TilingInterface substantially without a proper RFC, and it is not clear whether these methods should be part of the interface at all.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | This is a strict, opt-in addition to the TilingInterface, so it does not impact anyone who is using TilingInterface today; I don't really think it needs an RFC. Can you give more substantial reasoning as to why this should not be part of the TilingInterface? It reuses the implementation of these methods in other operations, as well as parts of the implementation of tiling using the interface. That is a strong signal to me that it indeed belongs here. I don't see a reason to block this patch. If it turns out to muddle separation of concerns we can revisit that, but to me this is already layered correctly.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | These methods are needed only for a subset of tileable operations. Adding it all here is similar to adding everything to the LinalgStructuredOp interface and then spending a lot of time extracting the DPS interface out of it, because Linalg ops are just a subset of DPS ops.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | I don't think they are equivalent. DPS was deemed not a subset of LinalgStructuredOp but rather a superset; that's why it was pulled out. As you say, this is needed only for a subset of tileable operations. If an operation cannot support this way of tiling, it just doesn't implement those interface methods; then you cannot apply this transformation to those ops, and that is exactly how it should work.
mlir/include/mlir/Interfaces/TilingInterface.td
200 | The code is all moved into linalg. This requires exposing more SCF helpers.
I am not on board with exposing these utility functions. Layering this into TilingInterface is the right approach, as it avoids exposing these methods, which are really implementation details. Let's hold off until we can reach a consensus.
Based on the discussions, I moved the methods to a separate interface alongside TilingInterface. Please take another look.
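A hypothetical usage sketch of how a transform might check for both interfaces before tiling a reduction; the name of the second interface below is an assumption for illustration, not necessarily the one used in this patch.

```cpp
// Hypothetical sketch: the op must implement both the tiling interface and the
// new partial-reduction interface (name assumed here) for this transform to apply.
static LogicalResult checkInterfaces(Operation *op) {
  auto tilingOp = dyn_cast<TilingInterface>(op);
  auto reductionOp = dyn_cast<PartialReductionOpInterface>(op);
  if (!tilingOp || !reductionOp)
    return failure();
  return success();
}
```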
A bunch of suggestions for improving the code and its legibility, but nothing worth delaying this further.
Nice stuff, thank you @ThomasRaoux!
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
652 | Maybe document that your transform here overallocates for the sequential case (i.e. you could have only a tensor<5xf32> and iteratively reduce within the loop) but that this also works for the parallel case? Do you expect this to be configurable in the future, or to always use the larger allocation? Edit: I realize the regular tiling to scf.for should do the "smaller allocation" version. So now I am wondering whether we see an advantage doing it this way in the sequential case (i.e. is this related to @dcaballe's vectorization comments)?
mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
143 | Better docs here please.
153 | nit: merged
154 | super nit: column
160 | super nit: extra newline
mlir/include/mlir/Interfaces/TilingInterface.td
159 | I like this new interface; not all ops can support partial-reduction tiling without quite a bit more effort (e.g. gather/scatter/sort).
162 | "to tile reductions using partial reductions + merge"?
169 | nit: "methode" is French :) (here and below)
179 | nit: reductionDims (plural) since you take an ArrayRef?
189 | Our default type is int64_t everywhere; any concrete reason to use int? (I don't think the memory usage for a few ints justifies int.)
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
256 | OpBuilder::InsertionGuard guard(b); in anything that takes a builder reference, otherwise you are leaking insertion point changes across function boundaries, which creates bugs that are very hard to track (see the sketch after this list).
263 | int64_t everywhere, all the time, please (unless doing bit manipulations).
276 | I don't see a new map here?
292 | Please use ShapedType::isDynamic.
310 | Insertion guard.
317 | Can we add a proper AffineMap.h/cpp helper, AffineMap::injectResult(int64_t resultPos, AffineExpr newResult), and just use it directly: newMaps.back() = newMaps.back().injectResult(...);
318 | Some comment here re injecting the split dimension: (e0 .. ek .. en) -> (e0 .. ek, split_dim, ek+1 .. en)
327 | Better comments please: Step 1. ...
333 | Please add a top-of-the-function // TODO: SubsetExtractOpInterface
355 | Insertion guard.
365 | I have a personal preference for fewer loops / less code under loops: SmallVector<StringRef> reductionIteratorTypes(intermRank, getParallelIteratorTypeName()); reductionIteratorTypes[dimToMerge] = getReductionIteratorTypeName(); AffineMap outputMap = AffineMap::multiDimSomethingMap().dropResult(dimToMerge); // or the equivalent APIs
367 | unsigned -> int64_t everywhere please.
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
432 | Interesting, we don't have a good way of enforcing that an interface implies another.
446 | Can we just use an llvm::count and llvm::any_of/all_of? int64_t numReductionDims = llvm::count/count_if(lambda); if (numReductionDims != 1) error; if (llvm::any_of(lambda)) return b.notifyMatchFailure(op, "only reduction dimensions can have non zero tile sizes"); (see the sketch after this list)
467 | return b.notifyMatchFailure(...) for better debugging.
470 | Can you add a TODO at the func declaration that we should use ArrayRef<OpFoldResult> instead of ArrayRef<Value>?
475 | Something is fishy between this insertion point and the next: either the next one crashes or this one does nothing.
485 | You need to wrap this into something else to get a real OpFoldResult.
493 | Is the else ever legitimate?
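A compact sketch of the coding patterns suggested above (insertion guard, llvm::count_if/any_of, notifyMatchFailure); the function and variable names are hypothetical, and iterator types are shown as strings, which may differ by MLIR version.

```cpp
// Hypothetical helper, not the patch's code: keep insertion-point changes local,
// count reduction dims with count_if, and report failures via notifyMatchFailure.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Sequence.h"

static LogicalResult checkReductionTileSizes(RewriterBase &b, Operation *op,
                                             ArrayRef<StringRef> iteratorTypes,
                                             ArrayRef<int64_t> tileSizes) {
  // Keep any insertion point changes local to this helper.
  OpBuilder::InsertionGuard guard(b);

  // Exactly one reduction dimension is supported.
  int64_t numReductionDims = llvm::count_if(
      iteratorTypes, [](StringRef it) { return it == "reduction"; });
  if (numReductionDims != 1)
    return b.notifyMatchFailure(op, "expected a single reduction dimension");

  // Only the reduction dimension may carry a non-zero tile size.
  // (Assumes tileSizes.size() == iteratorTypes.size().)
  if (llvm::any_of(llvm::seq<int64_t>(0, tileSizes.size()), [&](int64_t i) {
        return tileSizes[i] != 0 && iteratorTypes[i] != "reduction";
      }))
    return b.notifyMatchFailure(
        op, "only reduction dimensions can have non zero tile sizes");
  return success();
}
```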
Thanks for refactoring. Having this in a separate interface addresses my concerns about the partial interface. Did not look at the deep details but also do not want to block this.
Nicolas already seems to have reviewed in great detail! The overall structure looks fine to me. Thanks @ThomasRaoux for going through the different iterations to reach convergence.
mlir/include/mlir/Interfaces/TilingInterface.td
159 | Agreed! sort can actually support it, I think :)
Address review comments.
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
652 | I'm adding more doc. In general, the only case where we would want to allocate less is when the reduction size is smaller than the tile size. This could become an affine.min, but I'm not sure we would ever want that. We can iterate if needed.
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
256 | Even if I don't move the insertion point?
276 | Yes, the comment was outdated. Rewrote it.
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
432 | Makes sense, added a comment.
475 | Oops, yeah, removed this code.
493 | I don't think there is; I removed the if.