The structured op splitting transformation is conceptually similar to
tiling in the sense that it decomposes the iteration space of the
original op into several parts. Therefore, it is possible to implement
it using the TilingInterface to operate on iteration spaces and their
parts. However, the implementation also requires to pass updated input
operands, which is not supported by the interface, so the implementation
currently remains Linalg-specific.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/Linalg/Transforms/Split.cpp | ||
---|---|---|
108–109 | It wasn't clear to me from the documentation if the offset supplied to getTiledImplementation is relative to the iteration space as returned by the interface or should also include the offset along the dimension in the space (which may be non-zero in the interface). Please advise. |
mlir/lib/Dialect/Linalg/Transforms/Split.cpp | ||
---|---|---|
108–109 | Good point! let me throw the question right back at you. What would make more sense? The current implementation of how the TilingInterface is used is the if the iterations space is (lb0, ub0, step) the offset that is passed to getTiledImplementation is lb0 + tileSize * tileID . So it is not "relative to iteration space". But that can be changed, and indeed now is the time to change it. If relative makes more sense then we can change it, but if it is relative you also need the iteration space to get the absolute position (the same is true in reverse though. If you want relative, you need the iteration space). |
mlir/lib/Dialect/Linalg/Transforms/Split.cpp | ||
---|---|---|
108–109 | I have a very slight preference for it to be relative. Two reasons: (1) this matches the conceptual model of the subsetting (insert/extractslice, subview) operations in which the offset/stride is relative to the operand that may be a subset itself; (2) it somehow feels that the client of the TilingInterface would rarely care about absolute positions. But, in the end, I am fine with both ways as long as there is a note in the documentation that says which one it is :) |
However, the implementation also requires to pass updated input
operands, which is not supported by the interface, so the implementation
currently remains Linalg-specific.
That is true because the patch is trying to update the operation in place from op -> (split + modified op). If the implementation is changes to op -> (split1 + split2) and you replace all uses of op with result of split2 this doesnt have to be Linalg specific?
Overall this looks good to me. Just waiting on the above question.
mlir/lib/Dialect/Linalg/Transforms/Split.cpp | ||
---|---|---|
108–109 | Had an offline discussion on this. Staying with absolute documents the current state of the implementation. There is argument to be made for both, so leaving it as absolute for now until we have more data on whether relative or absolute is better. |
It doesn't preserve the modified op. The issue is as follows: given linalg.op ins(%inputs) outs(%outputs), calling getTiledImplementation with tileDestOperands=true will produce extract_slice from %outputs and feed them to the newly created op. When creating the second split, it will _still_ take the original %outputs whereas it should actually take the values updated by the first split. Hence the inplace update that makes the original op (passed to getTiledImplementation) take the results of the first split instead of %outputs. The original op is then replaced by the results of the second split so it does not persist. The interface takes the list of destination values, but they are only used for inserting partial results.
The DestinationStyleInterface proposed upstream would make this Linalg-independent because we would know which of the operands are destination operands that must be replaced. So would changing getDestinationOperands to return a list of OpOperand instead of Value so that we can update the operands.
Thanks for the explanation. Ill accept this so that you can land it, but I have more questions.
- If the slices you are taking are disjoint then you dont need to pass the result of the first split into second split right. All you need to do is take the tensor.insert_slice after the first split as the dest for the second tensor.insert_slice? If the requirement here is that the splits can be overlapping, then I understand how the getTiledImplementation does not help, but I am unclear of the use case of creating overlapping splits this way.
- Maybe the use of destination passing style interface will help. The current implementation of TilingInterface explicitly avoids making the assumption that we are using destination passing style operation. This is why the getDestinationOperands takes an OpBuilder as an argument. An operation can "build" the SSA value to use. When experimenting with IREE I had used the interface to implement tiling for tensor.extract_slice and tensor.insert_slice where that approach was useful. (I dont do this anymore in IREE cause it turned out to be just a silly thing to do, but a useful experiment with TilingInterface nevertheless to prove that this could be done). Returning a list of OpOperands for getDestinationOperands itself seems to be an assumption that the operation is in destination passing style. I want to not make that decision at the outset, and wait to see if this is indeed needed.
Also, I have been rethinking whether we even need the getDestinationOperands. It seems it has nothing to do with tiling at all and is an artifact of scf.for (i.e. destructive updates). I am thinking of dropping it and see what hell breaks loose.
This would confuse bufferization in the case of dynamic sizes, i.e., when it cannot prove inserts are disjoint. Try running it on https://gist.github.com/ftynse/d377db5cd8ab75870f1d8d74eb5fee46
- Maybe the use of destination passing style interface will help. The current implementation of TilingInterface explicitly avoids making the assumption that we are using destination passing style operation. This is why the getDestinationOperands takes an OpBuilder as an argument. An operation can "build" the SSA value to use. When experimenting with IREE I had used the interface to implement tiling for tensor.extract_slice and tensor.insert_slice where that approach was useful. (I dont do this anymore in IREE cause it turned out to be just a silly thing to do, but a useful experiment with TilingInterface nevertheless to prove that this could be done). Returning a list of OpOperands for getDestinationOperands itself seems to be an assumption that the operation is in destination passing style. I want to not make that decision at the outset, and wait to see if this is indeed needed.
Also, I have been rethinking whether we even need the getDestinationOperands. It seems it has nothing to do with tiling at all and is an artifact of scf.for (i.e. destructive updates). I am thinking of dropping it and see what hell breaks loose.
Here, I need the destination operand information to make that update work, and I can get it from either LinalgOpInterface or the proposed upstream interface. This is fine, tiling interface is for tiling, other transformations can have additional requirements. Not sure about the general case, I think as long as the particular implementation of tiling is able to insert partial results back somehow, it should be fine.
This would confuse bufferization in the case of dynamic sizes, i.e., when it cannot prove inserts are disjoint. Try running it on https://gist.github.com/ftynse/d377db5cd8ab75870f1d8d74eb5fee46
I think this is a missing part of bufferization. If during construction I know that there are things that are inplace-able, maybe some annotations on the op for bufferization to use would solve the issue. Anyway, dont need to solve it now...
It wasn't clear to me from the documentation if the offset supplied to getTiledImplementation is relative to the iteration space as returned by the interface or should also include the offset along the dimension in the space (which may be non-zero in the interface). Please advise.