Refactor the original code to rewrite a PadTensorOp into a
sequence of InitTensorOp, FillOp and InsertSliceOp without
vectorization by default. GenericPadTensorOpVectorizationPattern
provides a customized OptimizeCopyFn to vectorize the
copying step.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you add an integration test analogous to mlir/test/Integration/Dialect/Linalg/CPU/test-subtensor-insert.mlir ?
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp | ||
---|---|---|
223 ↗ | (On Diff #355925) | nit: Prefer PadTensorOp::Adaptor |
237 ↗ | (On Diff #355925) | nit: use llvm::hasSingleElement |
256 ↗ | (On Diff #355925) | nit: Use . at the end of the comment. Same with the other comments. |
mlir/test/Dialect/Linalg/bufferize.mlir | ||
270 ↗ | (On Diff #355925) | can you give meaningful names to the CHECK values? |
Note that I asked for a similar rewrite in https://reviews.llvm.org/D102804 but it was eventually not done.
I am wondering whether the pattern you add here should be more generally be the rewrite on tensors I mentioned and then just let existing bufferization kick in?
@cathyzhyi I copyhacked your code into a tensor rewrite here: https://reviews.llvm.org/D105317
It seems fine modulo canonicalizations missing on memref::DimOp + tensor (which also interferes with tensor::DimOp that @springerm is looking at).
Feel free to take whatever makes sense to you and refactor / land.
Depending on where you go with this I'll adapt.
Will pick it up again tomorrow.
Thanks for pushing this!
@nicolasvasilache seems with the rewrite pattern version the integration test passed but there are some failures in the unit test. Will need to take a closer look tmr. Here is the unit test.
func @pad_tensor(%arg0: tensor<4x?x2x?xf32>, %arg1: index) -> tensor<4x?x?x?xf32> { %cst = constant 0.0 : f32 %out = linalg.pad_tensor %arg0 low[0, 0, %arg1, 0] high[0, 0, 0, %arg1] { ^bb0(%gen_arg1: index, %gen_arg2: index, %gen_arg3: index, %gen_arg4: index): // no predecessors linalg.yield %cst : f32 } : tensor<4x?x2x?xf32> to tensor<4x?x?x?xf32> return %out : tensor<4x?x?x?xf32> }
The functionality of this pattern is already provided by GenericPadTensorOpVectorizationPattern. It also translates a PadTensorOp into InitTensorOp + FillOp + InsertSliceOp. However, GenericPadTensorOpVectorizationPattern also does a bit more: It tries to generate vectorized alternatives to FillOp and InsertSliceOp. Only if there is not enough static type information, it generates FillOp and InsertSliceOp.
See https://reviews.llvm.org/D103679 for details. (The pattern was extended in subsequent commits, so check on Github for the most recent version.)
Can you run the vectorization pass or is vectorization not desired in your use case? If you want just InitTensorOp + FillOp + InsertSliceOp and no vectorization, I think there should be a way to avoid duplicating that functionality.
It seems the version @springerm has for vectorization is already fitting the bill and would need to be split out from the vectorization part.
@springerm , indeed if we get to bufferization and still have this form, we should bufferize without relying on vectorization.
After looking at this in more detail, the easiest way would be to add a boolean (template) parameter to GenericPadTensorOpVectorizationPattern, which enables or disables vectorization (and maybe rename the pattern). Then register the pattern in applyEnablingTransformations. However, then we would have a dependency of ComprehensiveBufferize.cpp on Vectorization.cpp. Not sure if that is a good idea. I don't see a better way of factoring out common functionality.
Alternatively, just duplicate the pattern (without vectorization) as this revision does at the moment. With the above code suggestion, we would be duplicating around 50 lines of code. Maybe not too bad.
@nicolasvasilache What do you think?
mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp | ||
---|---|---|
2490–2507 ↗ | (On Diff #356084) | Can be replaced with: Value paddingValue = op.getConstantPaddingValue(); if (!paddingValue) return failure(); |
Indeed, so what I have in my local branch is quite close:
- the FillOp is not a "tryVectorize" but just a "createFillOrGenerateOp", it is unconditionally always there
- I just deleted the copy vectorization.
It seems this could be refactored and moved to Linalg/Transforms/Transforms.h and take a lambda for the copy.
By default it would take nothing (to just lower to other tensor ops).
The vectorization pattern would just reconfigure the pattern to add its own copy vectorization mechanism.
@silvas @nicolasvasilache @springerm Any suggestion on which pass to put this pattern in? There is a LinalgGeneralizationPass which currently only applies to patterns lowering to Linalg.generic.
The pattern will be used by whichever passes need it, such as ComprehensiveBufferize, Bufferize. (you should be able to just add this pattern into Bufferize.cpp in your original patch and let dialect conversion just work.
Oh, sorry, I thought you meant which woudl use it. It would be nice if it was in a generic helper file. You can probably put it next to PadTensorOpTransformationPattern for consistency.
move the new pattern's source code next to PadTensorOpTransformationPattern for
consistency
Nice refactoring!
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp | ||
---|---|---|
689–691 | can be deleted |
This has reached consensus, I suspect @cathyzhyi does not yet have write privilege to github, committing on her behalf.
can be deleted