This revision allows the base Linalg tiling pattern to optionally require padding to
a constant bounding shape.
When requested, a simple analysis is performed, similar to buffer promotion.
A temporary linalg.simple_pad op is added to model padding for the purpose of
connecting the dots. This will be replaced by a more fleshed out linalg.pad_tensor
op when it is available.
In the meantime, this temporary op serves the purpose of exhibiting the necessary
properties required from a more fleshed out pad op, to compose with transformations
properly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
mlir/include/mlir/Interfaces/ViewLikeInterface.h | ||
---|---|---|
25 | This typedef pollutes the entire mlir namespace. |
mlir/include/mlir/Interfaces/ViewLikeInterface.h | ||
---|---|---|
25 | I can just use OpFoldResult. |
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td | ||
---|---|---|
479–483 | It is still helpful to document the requirements and conventions of this op, instead of it being just a placeholder with unknown semantics. | |
488–490 | Just drop the verifier field. | |
mlir/include/mlir/Interfaces/ViewLikeInterface.h | ||
17–22 | Leftover unnecessary changes. | |
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp | ||
116 | We usually pass builder by-reference. Do you need a copy for some reason? | |
126 | Nit: SmallVector can now infer the default number of stack elements, so if there is no reason to require 4 specifically, it can be just omitted. | |
133 | size.get<Attribute> (there's no need to dyn_cast after a check). | |
138–139 | I'm not a fan of emitting warnings _inside_ transformations. IMO, errors and warnings are there for the validity state of the op in general, not for its suitability to a particular transformation. Remarks may be fine (the user will have to ask for them explicitly to see them). rewriter.notifyMatchFailure() is probably preferred. | |
150 | Write this comment? :) |
This feels like it is promotion on tensors with the linalg.simple_pad (for now and linalg.pad later) being used instead of the alloc(...) ; copy(...) method. So the tensor "promotion" and buffer promotion are living in entirely different places. Ill go deeper into this code, but my read of the tests informs this comment.
There are similarities and differences.
Promotion is guaranteed to alloc and copy but is not guaranteed to create static shapes.
This in contrast has no guarantees of alloc/copy and is guaranteed to create static shapes.
This is meant to be used mostly as an enabler for vectorization as well as packing.
Packing is more closely related to promotion with the big difference that we keep SSA use-def chains and hoisting is significantly simpler to implement.
In any case I expect we'll need to experiment before we can come to a conclusion, this gives us the ability to experiment.
Promotion is guaranteed to alloc and copy but is not guaranteed to create static shapes.
This in contrast has no guarantees of alloc/copy and is guaranteed to create static shapes.
That seems to be a small difference. I agree that promotion itself doesn't gaurantee static shape but it does do a best effort (and actually in IREE it always ends up being static shaped if the tile size is static). It should be easy to extend promotion to abort if it doesnt figure out the static sizes when "promotion" is applied to ops with tensor semantics.
It should be easy to extend promotion to abort if it doesnt figure out the static sizes when "promotion" is applied to ops with tensor semantics.
The fact that promotion tries to create static buffers is very much due to history and needs to go away.
Here is what happens concretely today to inject static sizes in the memory pipeline:
- Promotion creates alloc + fill + copy after some best effort local analysis to try and create a static buffer.
- Fingers are crossed so that vectorization succeeds.
- After vectorization, LinalgCopyVTRForwardingPattern and LinalgCopyVTWForwardingPattern are applied with a best effort dependence analysis to undo 1. and replace by vector.transfer.
- Later masked/unmasked vector.transfer splitting is performed to isolate parts.
In contrast, thanks to the progress on transformations on tensors we can:
- Tile on tensors with padding at the level(s) we are interested in.
- Hoist, as we wish to create and amortize packing.
- Decide which packed tensors materialize in memory, which ones turn into simple subview + control flow and which ones turn into vector transfers.
The implications of padding at the tiling on tensor level are far reaching as you have already seen in IREE, phases are being reordered and Promotion is a bystanding casualty.
See the followup https://reviews.llvm.org/D95243 to get a better idea how things look like after padding + packing.
I would not be surprised that Promotion just gets deleted / becomes part of some bufferization in a not too distant future: the ease with which we can perform transformations thanks to SSA use-def chains is just a no-brainer.
I am not disputing that this is useful, but at least within IREE, I am not sure we can actually set a static tile size at the time of tiling. In IREE with the linalg on tensors path the tile size has to by dynamic during tile and distribute since it is done at the Flow dialect level where no backend specific information is available. The static tile size picked while tiling might not be one that is best for the backend that is targeted (information that is available only at the HAL level). So making this padding as part of the tiling transformation might not work out for IREE. My understanding then is that if we separate out the padding parts from the tiling transformation itself then it has the same constraints/restrictions as promotion, but with the advantage of SSA use-def chains.
I am not disputing that this is useful, but at least within IREE, I am not sure we can actually set a static tile size at the time of tiling. In IREE with the linalg on tensors path the tile size has to by dynamic during tile and distribute since it is done at the Flow dialect level where no backend specific information is available. The static tile size picked while tiling might not be one that is best for the backend that is targeted (information that is available only at the HAL level). So making this padding as part of the tiling transformation might not work out for IREE. My understanding then is that if we separate out the padding parts from the tiling transformation itself then it has the same constraints/restrictions as promotion, but with the advantage of SSA use-def chains.
In the particular context of IREE, I see 2 things we can do easily:
- Pad at the level of tile and fuse and vectorize on tensors at the HAL level to enable vectorization (i.e. at the k^-th level of padding)
- If we have a need for it, this is not hard to generalize to "pad to the next multiple of X" where X can either be a constant or an SSA value.
The fact that the option exists to pad does not mean you have to do it at the same time as "tile-and-distribute."
I wouldn't pattern match this first implementation to how general the approach is: we also want to use it in the context of sparse, so I am really not scared of some dynamic values here and there.
After we connect this to linalg.pad_tensors and have gained some experience, we can decide where we want to push this as we will also need better types to support more advanced behaviors.
Still, I don't see promotion evolving to support tensors, which is the suggestion I seemed to get from your previous comment?
THanks Nicolas for the explanation. Code looks fine to me. Still think that this is effectively adding promotion during tiling and doing this on tensors only is strange. Just trying to see if there is a more commonality between buffers and tensors so as to reuse things instead of completely new paths. Lets land and experiment. I understand how this would subsume promotion though.
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td | ||
---|---|---|
478 | Do we still need this. I think the linalg.pad operation landed already | |
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp | ||
183 | Cant we just use shapedOutputOperands for this? Maybe just add a method for getting the shaped output types? |
Do we still need this. I think the linalg.pad operation landed already