This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Add reduction tiling transformation
ClosedPublic

Authored by ThomasRaoux on Oct 24 2022, 2:05 AM.

Details

Summary

Add a transformation to tile reduction ops into a parallel operation
followed by a merge operation. This is equivalent to the existing
reduction spliting transformation but using loops instead of using
higher dimensions linalg.

Diff Detail

Event Timeline

ThomasRaoux created this revision.Oct 24 2022, 2:05 AM
ThomasRaoux requested review of this revision.Oct 24 2022, 2:05 AM
mravishankar requested changes to this revision.Oct 24 2022, 9:35 AM

Wow! Awesome! I was just going to accept this, but realized I have one question on further reading. There seems to be an implicit assumption about there being a single reduction dimension through the different interface methods. Maybe its better to be more explicit, and just make the dimension (or list of dimensions) that is being tiled be an explicit argument to the interface methods. That will remove that implicit assumption through the different methods.

mlir/include/mlir/Interfaces/TilingInterface.td
163

Is there are reason the interface method has to return a tensor. I think if it returns the identity element that should be enough? I am not opposed to doing this.. This might actually be more general, but trying to understand what is the minimum requirement that the interface should expect ops to implement.

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
273

If I am reading this correctly, the setting of the identity element can be moved into the tileReductionUsingSCFForOp method (related to the comment above)

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
424

I think SCF already has a depence on Arith dialect. I think you can replace this with getValueOrCreateConstantIndexOp.

This revision now requires changes to proceed.Oct 24 2022, 9:35 AM

Wow! Awesome! I was just going to accept this, but realized I have one question on further reading. There seems to be an implicit assumption about there being a single reduction dimension through the different interface methods. Maybe its better to be more explicit, and just make the dimension (or list of dimensions) that is being tiled be an explicit argument to the interface methods. That will remove that implicit assumption through the different methods.

Yes I can do that. I was trying to make the interface more forward looking as I think we should be able to support multiple reductions at some point but I agree it is a bit odd. I'll change the interfaces for now.

mlir/include/mlir/Interfaces/TilingInterface.td
163

If we return the identity element we still need to create op generating the tensor which would be dependent on the op we are tiling. (for linalg we would generate linalg.fill but not for other ops). Where would you do that?

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
273

Getting the reductionOp requires code depending on linalg so I'm not sure how we could do that without adding more interface functions.

Flyby comment: looks like this differential is essentially doing what is being done in https://github.com/iree-org/iree/pull/10537/files#diff-e105ec843efed97f5884ad26c85561c72967f0546029bd772335634f23553c31, which means I can remove that PR and use your new interface. You seem to have gotten rid of the control for inserted index. Do you no longer need that for general purpose reduction tiling?

Flyby comment: looks like this differential is essentially doing what is being done in https://github.com/iree-org/iree/pull/10537/files#diff-e105ec843efed97f5884ad26c85561c72967f0546029bd772335634f23553c31, which means I can remove that PR and use your new interface. You seem to have gotten rid of the control for inserted index. Do you no longer need that for general purpose reduction tiling?

Yes it could be a potential way to replace this and generalize it by supporting dynamic shapes. The control index can be added to this if needed. Right now the code just makes a decision based on the reduction loop index but for more flexibility there should be a control indeed.

vmurali added a comment.EditedOct 24 2022, 11:51 AM

Wow! Awesome! I was just going to accept this, but realized I have one question on further reading. There seems to be an implicit assumption about there being a single reduction dimension through the different interface methods. Maybe its better to be more explicit, and just make the dimension (or list of dimensions) that is being tiled be an explicit argument to the interface methods. That will remove that implicit assumption through the different methods.

I think this OP should _not_ take care of multiple reduction dimensions. There's already a separate transformation op or transformation (that Thomas wrote earlier) that combines multiple contiguous reduction dimensions - so we can use that to combine the sequence of innermost reductions into one. If the reduction dimension is not innermost, then there's no need to apply the technique as parallel tiling will already create tiles that can be reduced in parallel.

In other words, I think this differential should be accepted as is, with comments (and checks) denoting that only innermost dimension must be reduced using this transformation.

pifon2a added inline comments.Oct 25 2022, 2:51 AM
mlir/include/mlir/Interfaces/TilingInterface.td
200

why are mergeReductions, tileToPartialReduction, generateInitialTensorForPartialReduction a part of tiling interface? They work only for Linalg, don't they?

Hardcode84 added inline comments.
mlir/lib/Dialect/Linalg/Utils/Utils.cpp
1064

maybe return Optional<Attribute> or FailureOr<Attribute> to force users to check validity?

1079

Did you mean getLargest(semantic, false)? Also, why getLargest and not getInf?

Move the reduction detection dimension in the common code and address other review comments.

more review comments

ThomasRaoux marked an inline comment as done.Oct 25 2022, 9:20 PM

Wow! Awesome! I was just going to accept this, but realized I have one question on further reading. There seems to be an implicit assumption about there being a single reduction dimension through the different interface methods. Maybe its better to be more explicit, and just make the dimension (or list of dimensions) that is being tiled be an explicit argument to the interface methods. That will remove that implicit assumption through the different methods.

Thanks Mahesh. As discussed I moved the code detecting the reduction dimension to the common code but creating the tensor is still done by the interface function.

mlir/include/mlir/Interfaces/TilingInterface.td
200

The idea is that other ops than linalg could implement this interface in the future. For instance we did have to do reduction splitting for linalgext topK and that could be an op that would implement this interface. Also since we need to re-use helpers in the tiling code this allows us to do that without adding dependency between the tiling code and linalg.

ThomasRaoux added inline comments.Oct 25 2022, 9:20 PM
mlir/lib/Dialect/Linalg/Utils/Utils.cpp
1064

Changed it.

1079

You are right this is a bug. Since I'm only moving this code here I would rather not add the bug fix as part of this patch. I'll send another patch to fix this independently of this change,

herhut requested changes to this revision.Oct 26 2022, 4:27 AM
herhut added a subscriber: herhut.
herhut added inline comments.
mlir/include/mlir/Interfaces/TilingInterface.td
200

Is this really tiling the operation? Yes, at some level it moves the computation to working on tiles but those tiles are not "normal" tiles in the sense of all other operations in this interface. Instead, these methods allow to "distribute" a reduction over multiple threads. So I see how having these methods implemented allows to apply this transformation to anything that is like a reduction but they are not useful otherwise. Maybe have a StructuredReductionOp interface?

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315

Why is omitPartialTileCheck true here? Do we statically know that the dimension tiles?

mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38

Why is this valid? What happens if ARG0 does not have a size that is divisible by 5?

This revision now requires changes to proceed.Oct 26 2022, 4:27 AM
mravishankar added inline comments.Oct 26 2022, 7:33 AM
mlir/include/mlir/Interfaces/TilingInterface.td
189

Could we make this an ArrayRef<unsigned>. Also I have been lead to believe that recommended C++ usage is to use int types unless you are doing bit manipulation.

200

I dont see a reason why this needs to be moved out of TilingInterface into any other interface... If an op doesnt support this kind of tiling then it just does not implement these methods. Also I think there is a final state where we can combine tileToPartialReduction and getTiledImplementation, but that isnt clear yet. So there is a separate method added for now. It is meant to handle tiling of loops that are reductions, which are fine. Haing more interfaces just adds strange dependencies, or results in having redundant interface methods across interfaces.

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315

The implementation of the loops already handles the partial tile handling. That is accounted for in the sizes. There is no need for an op implementation to account for it.

mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38

This is a good point. I'd expect there to be an affine.min generated by generateTiledLoopNest for this case....

Fix bug in insert_slice size

ThomasRaoux added inline comments.Oct 26 2022, 9:38 AM
mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
315

Yes, this is handled by generateTileLoopNest so the sizes are already adjusted.

mlir/test/Dialect/Linalg/transform-tile-reduction.mlir
38

good catch, there was a bug in the size used to insert. It is fixes now and the size is based on affine_min as expected.

Address more review comments.

mlir/include/mlir/Interfaces/TilingInterface.td
189

Changed it to ArrayRef<int>

Fix alignments

pifon2a requested changes to this revision.Oct 26 2022, 11:00 AM
pifon2a added inline comments.
mlir/include/mlir/Interfaces/TilingInterface.td
200

I am worried that this change modifies TilingInterface substantially without a proper RFC and it is not clear whether these methods should be a part of the interface at all.

This revision now requires changes to proceed.Oct 26 2022, 11:00 AM
mravishankar accepted this revision.Oct 26 2022, 11:14 AM
mravishankar added inline comments.
mlir/include/mlir/Interfaces/TilingInterface.td
200

This is a strict addition to the TilingInterface, and is an opt-in. So does not impact anyone who is using TilingInterface. So I dont really think it needs an RFC. Can you give more substantial reasoning as to why this should not be part of the TilingInterface? It is reusing the implementation of these methods in other operations, as well as reusing parts of the implementation of tiling using the interface. That is a strong signal to me that it indeed belongs here. I dont see a reason to block this patch. If indeed it turns out to be muddling separation of concerns we can revisit that, but to me this is already layered correctly.

pifon2a added inline comments.Oct 26 2022, 11:24 AM
mlir/include/mlir/Interfaces/TilingInterface.td
200

These methods are needed only for a subset of tileable operations. Adding it all here is similar to adding everything to LinalgStructuredOp interface and then spending a lot of time extracting DPS interface out of it, because LinalgOps are just a subset of DPS ops.

mravishankar added inline comments.Oct 26 2022, 11:39 AM
mlir/include/mlir/Interfaces/TilingInterface.td
200

I don't think they are equivalent. The DPS was deemed not a subset of LinalgStructuredOp, but rather a superset. That's why it was pulled out. As you say this is needed only for a subset of tileable operations. If an operation cannot support this way of Tiling, it just doesn't implement those interface methods. Then you cannot apply this transformation on those ops, and that is exactly how it should work.

Move code out of TilingInterface

ThomasRaoux added inline comments.Oct 27 2022, 12:05 PM
mlir/include/mlir/Interfaces/TilingInterface.td
200

The code is moved all into linalg. This requires exposing more scf helpers.

pifon2a accepted this revision.Oct 27 2022, 12:06 PM
mravishankar requested changes to this revision.Oct 27 2022, 1:05 PM

I am not on board with exposing these utility functions. The layering of this going into TilingInterface is the right one which will avoid exposing these methods that are really implementation details. Lets hold off till we can reach a consensus.

This revision now requires changes to proceed.Oct 27 2022, 1:05 PM

Moved new interface method to a separate interface outside of TilingInterface

Based on discussions I moved the methods to a separate interface in parallel of TilingInterface. Please take another look.

nicolasvasilache accepted this revision.Nov 2 2022, 4:35 AM

Bunch of suggestions for improving code and legitbility but nothing worth delaying more.

Nice stuff, thank you @ThomasRaoux !

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
652

Maybe document that your transform here overallocates for the sequential case (i.e. you could only have a tensor<5xf32> and iteratively reduce within the loop) but that this also works for the parallel case?

Do you expect this to be configurable in the future or to always use the larger allocation?

Edit: I realize the regular tiling to scf.for should do the "smaller allocation version". So now I am wondering whether we see an advantage doing it this way in the sequential case ? (i.e. is this related to @dcaballe 's vectorization comments ?)

mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
143

better docs here plz

153

nit: merged

154

super nit: column

160

super nit: extra newline

mlir/include/mlir/Interfaces/TilingInterface.td
189

Our default type is int64_t everywhere, any concrete reason to use int (I don't think the memory usage for a few ints jusifies int)?

213

I like this new interface, not all ops can support partial-reduction tiling without quite more effort (e.g. gather/scatter/sort).
But a big class of ops does support this out of the box.

216

to tile reductions using partial reductions + merge ?

223

nit: methode is French :) (here and below)

233

nit: reductionDims (plural) since you take an ArrayRef ?

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
412

OpBuilder::InsertionGuard guard(b);

In anything that takes a builder reference, otherwise you're leaking insertion point changes across function boundaries which creates bugs that are very hard to track.

419

int64_t everywhere, all the time, plz (unless doing bit manipulations).
If you hit a size_t or unsigned you don't control, please static_cast<int64_t>

432

I don't see a new map here ?

448

Please use ShapedType::isDynamic.
We should be close to be able to make ShapedType::kDynamicSize private now.

466

insertion guard

473

Can we add a proper AffineMap.h/cpp helper:

AffineMap::injectResult(int64_t resultPos, AffineExpr newResult)

and just use that directly:

newMaps.back()  = newMaps.back() .injectResult(...);
474

Some comment here re injecting the split dimension (e0 .. ek .. en) -> (e0 .. ek, split_dim, ek+1 .. en)

483

Better comments plz:

Step 1. ...
Step 1. a. ...

489

Plz add a top-of-the-function // TODO: SubsetExtractOpInterface

511

insertion guard

521

I have a personal preference for fewer loops / less code under loops:

SmallVector<StringRef> reductionIteratorTypes(intermRank, getParallelIteratorTypeName());
reductionIteratorTypes[dimToMerge] = `();
AffineMap outputMap = AffineMap::multiDimSomethingMap().drop_result(dimToMerge); // or the equivalent APIs
523

unsigned -> int64_t everywhere plz

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
422

Interesting, we don't have a good way of enforcing that an interface implies another.
Can you add a top-level comment that this is assumed?

436

Can we just use an llvm::count and llvm::any_of/all_of ?
The readability improvements would greatly trump the performance gains of fusion..

int64_t numReductionDims = llvm::count/count_if(lambda);
if (numReductionDims != 1) error;

if (llvm::any_of(lambda))
  return b.notifyMatchFailure(op, "only reduction dimensions can have non zero tile sizes");
457

return b.notifyMatchFailure( for better debugging

460

Can you add a TODO at the func declaration that we should use ArrayRef<OpFoldResult> instead of ArrayRef<Value>?

465

something's fishy between this insertion point and the next .. either the next one crashes or this one does nothing.

475

You need to wrap this into something else to get a real OpFoldResult.
However, with the current broken createOrFold API you can never avoir emitting the constant op ..

483

Is the else ever legitimate ?
This needs a comment with a counter-example.

herhut accepted this revision.Nov 2 2022, 10:00 AM

Thanks for refactoring. Having this in a separate interface addresses my concerns about the partial interface. Did not look at the deep details but also do not want to block this.

mravishankar accepted this revision.Nov 2 2022, 11:58 AM

Nicolas already seems to have reviewed in great detail! The overall structure looks fine to me. Thanks @ThomasRaoux for going through the different iterations to reach convergence.

mlir/include/mlir/Interfaces/TilingInterface.td
213

Agreed! sort can actually support it I think :)

This revision is now accepted and ready to land.Nov 2 2022, 11:58 AM
ThomasRaoux marked 17 inline comments as done.

Address review comments.

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
652

I'm adding more doc, in general the only case where we would want to allocate less is if the reduction size is smaller than the tile size. This could become an affin.min but I'm not sure we would ever want that. We can iterate if needed.

mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp
412

even if I don't move the insertion point?
Added to all of those right now

432

Yes the comment was outdated. Rewrote it.

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
422

Makes sense, added a comment.

465

oops yeah removed this code

483

I don't think there is, I removed the if

This revision was automatically updated to reflect the committed changes.