Page MenuHomePhabricator

[mlir] Add an interface to allow operations to specify how they can be tiled.

Authored by mravishankar on Jul 20 2021, 3:29 PM.



An interface to allow for tiling of operations is introduced
here. Using this interface a tiling algorithm is implemented that
tiles an operation that implements that interface.

The current tiling algorithm uses the same options as Linalg ops and
uses a similar control mechanism. To not leak these options outside of
Linalg, the developed tiling algorithm is currently places within
Linalg Dialect.

The test ops defined to check the tiling algorithm implement the
interface as an ExternalModel. Also implemented is the tiling of
tensor.extract_slice using a similar approach.

Diff Detail

Event Timeline

mravishankar created this revision.Jul 20 2021, 3:29 PM
mravishankar requested review of this revision.Jul 20 2021, 3:29 PM

Cmake file fix.

Fix Cmake builds.

springerm added inline comments.



I think there could be a bit more documentation around these and the method description. In particular, what could've helped me understand this faster (without looking at the code):

  • getTiledImplementation generates the ExtractSlice ops (extracting tiles of size sizes at position offsets).
  • The caller of getTiledImplementation is responsible for writing back the tile result to the output tensor via InsertSlice. The result offset and size are calculated by getTiledImplementation.
  • The caller of getTiledImplementation is responsible for generating the loop nest.
  • Maybe rename sizes to tileSizes. Should resultOffsets etc. be called outputOffsets?

Can be deleted. These helper functions are in StaticValueUtils.


Just an implementation detail, but I would mention somewhere that this function generates just one loop and then calls itself recursively.


if body could be a separate static helper function. Sth like insertTiledOpIntoOutput.


Why not insert into ret.results directly?


What is a "buffer"? Is this needed to support memref and tensor outputs? (Probably not because the above code always generates an InsertSlice, so only tensor works...) A comment would be helpful.


Not sure what this means.


There's getAsOpFoldResult in StaticValueUtils.


why are non-parallel loop iterators different?


Not exactly the same, but you could probably use getConstantIntValue from StaticValueUtils.


nit: getDimValue uses Value()

nicolasvasilache requested changes to this revision.Aug 19 2021, 12:05 PM
nicolasvasilache added inline comments.

This looks quite similar to

I am wary of adding a parallel tiling implementation that has not yet proven to compose with the other transformations we want (e.g. fusion, padding, distribution).

I would be much more supportive of seeing the tiling interface integrated into the existing flow.
In other words, there isn't room for multiple implementations of the same thing: either we replace the existing one completely in a single shot or we incrementally refactor it to use the interface.


Part of the exercise here is to split the LinalgStructureOpInterface into the parts that are useful for tiling.
This should come with a matching deletion on the LinalgStructuredOpInterface side, otherwise this is duplication.

If there are layering issues that make this difficult, we don't want them to hit us halfway into some future refactoring.


+1 this definitely needs to be heavily documented.


This is not just tiling but tile and distributed fused into a single implementation.
Besides the other points I made in other places re. duplication of functionality and overlap, I also have deeper composability and evolution concerns with the current impl and API.


Why are we talking about distribution here?
This should be done in a composable way tiling is one thing, distribution is another and they should compose.


I am having trouble understanding the purpose of this file.
You propose a new tiling interface op that can generalize in the future, this is great.
I do not understand why this interface is not hooked up to the existing tiling passes.

Even if this is meant as a transient state, this introduces 3 extra "one-off" tiling implementations + extra patterns:

  • TestFullSizeOutputTileTilingInterface::getTiledImplementation
  • TestMixedReduceParallelTilingInterface::getTiledImplementation
  • InsertSliceTilingInterface::getTiledImplementation

History shows that this will turn into more dead code in the end (either this file or the existing tiling).
I have not yet seen evidence that we want to buy into these implementations of tiling.

The interface is one thing that I support as we previously discussed, the usage of the interface as proposed in this revision not so much.

This revision now requires changes to proceed.Aug 19 2021, 12:05 PM
mravishankar marked 9 inline comments as done.

Rebase and address a few comments.

mravishankar abandoned this revision.Aug 19 2021, 4:25 PM

Based on comments and offline discussion, going to abandon this change. Will try to find a way to land the interface without the tiling algorithm.


Yes it does. I agree that having two tiling algorithms is not ideal. I think integrating the interface into the existing Linalg tiling algorithm is going to be a more gradual process. My plan was to land a prototype and then evolve the existing tiling algorithm making one of them eventually redundant.
That being said, I am happy to not land the tiling algorithm implemented in this patch. So will abandon this patch and keep this as a WIP. Will try to see if I can land the itnerface by using it for tiling the linalg.pad_tensor that exists (I dont know how it is implemented, but would be good to use this interface for that and see what I can learn from this).


I suspect there is going to be some hairy issues to resolve. So might be worth landing the interface first and then making structured ops implement more and more of this interface (and removing them from structured ops interface).


Added some more documentation. More of an FYI. This patch is not meant to be landed at this point (based on feedback).


Added comments about this. Thanks!


The existing tiling algorithm in Linalg is also doing tile + distribute at the same time (unless it has been changed from the last time I saw this). So this isnt doing anything different. From the last time we talked about this (ages ago), tile + distribute could happen at the same time since distribution is only a matter of changing lower bound and step of the loop. Maybe there is a change in the thought here. I am all for composable steps, but this was just keeping the same functionality as the existing tiling algorithm


Changed it now, so this much simpler. After using this a bit, it is very awkward for the tiling algorithm to do these inserts. Its much cleaner for the interface method to do this. So the change has been refactored to reflect that. In any case, not going to land this patch, so the details of the tiling algorithm might be immaterial.


Unnecessary now.


(see above)


No, this works for both tensors and buffers. The above insert slice is generated only when the tiled operation has a result value (that happens only when there operands are tensor types). The structure of the loop generated is also different for tensors and buffers. So again here, the operation not returning a value is used as a proxy to say that this is tiling on tensors or tiling on buffers.
ANyway, we can revisit this since this algorithm is not going to be landed as is.


THis has to do with distribution. Right now the algorithm here does not handle tiling the reduction dimension. THe reduction is a bit more involved since I am not assuming that this is a commutative/associative operation. Maybe reduction is the wrong terminology and we need a new iterator type called "sequential" to truly capture the op semantics. TBD. (Again moot since this algorithm is not landing)


Agreed. To repeat from above. I wasnt sure I wanted to change the existing tiling algorithm to use the interface in one-shot. So tried to stage it with a separate tiling algorithm that uses the interface. Lets talk about how we can stage this process.