This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Support tensor.pad bufferization
Abandoned · Public

Authored by tpopp on Apr 4 2022, 6:07 AM.

Details

Summary

tensor.pad can be represented as an initialization of the result
with the padding values, followed by a strided insert of the input into
the output. To honor the invariants enforced by bufferization, the pad
operation creates bufferized forms of both of these steps at the same
time.
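
For illustration, here is a minimal sketch of that decomposition on a small static example. All shapes, constants, and value names below are illustrative assumptions, not taken from the patch:

    // The op being bufferized:
    //   %padded = tensor.pad %src low[1, 1] high[2, 2] {
    //   ^bb0(%i: index, %j: index):
    //     tensor.yield %pad_val : f32
    //   } : tensor<4x4xf32> to tensor<7x7xf32>

    // Sketch of the bufferized form: fill, then a strided copy into the interior.
    func.func @pad_sketch(%src: memref<4x4xf32>, %pad_val: f32) -> memref<7x7xf32> {
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c7 = arith.constant 7 : index
      // Step 1: initialize the whole result buffer with the padding value.
      %alloc = memref.alloc() : memref<7x7xf32>
      scf.parallel (%i, %j) = (%c0, %c0) to (%c7, %c7) step (%c1, %c1) {
        memref.store %pad_val, %alloc[%i, %j] : memref<7x7xf32>
      }
      // Step 2: strided insert of the source into the interior of the result.
      %dst = memref.subview %alloc[1, 1] [4, 4] [1, 1]
          : memref<7x7xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
      memref.copy %src, %dst
          : memref<4x4xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
      return %alloc : memref<7x7xf32>
    }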

Diff Detail

Event Timeline

tpopp created this revision. Apr 4 2022, 6:07 AM
Herald added a project: Restricted Project. Apr 4 2022, 6:07 AM
tpopp requested review of this revision. Apr 4 2022, 6:07 AM
pifon2a accepted this revision. Apr 4 2022, 9:23 AM
pifon2a added inline comments.
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
746

nit: loc?

754–760

What about making this function part of the extraClassDeclaration for PadOp?

760

Cache the results of getMixedLowPad and getMixedHighPad, since they each construct a new vector?

771

super-nit: "maybeResult" and "result" are a bit confusing here. What about just calling it allocOr and alloc?

783

upperBounds.reserve(rank)?

785

nit: for (int i = 0, nextDynamicIndex = 0; i < rank; i++)?

873

nit: remove empty line?

This revision is now accepted and ready to land. Apr 4 2022, 9:23 AM
springerm added inline comments. Apr 4 2022, 9:29 AM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
867

The buffer of padOp.source() is not modified, so this can return false.

876

bufferize() always allocates a brand new buffer, right? In that case, always return {} in this function.

springerm accepted this revision. Apr 4 2022, 9:41 AM
springerm added inline comments.
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

See comment below. It would be cleaner to query the loop bounds with ReifyRankedShapedTypeOpInterface. This can be done in a separate commit, or add a TODO.

mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp
43

Why is this needed?

mlir/test/Dialect/Tensor/bufferize.mlir
379–382

Try this with a test case that has a dynamic result dim size. I think the bufferization will fail.

PadOp should implement ReifyRankedShapedTypeOpInterface, which it does not currently. Once it does, the dynamic bufferization test case should pass.

(Can be done in a separate commit.)
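
For reference, such a dynamic test case might look roughly like this (hypothetical function name and pad amounts, not taken from the patch):

    func.func @tensor_pad_dynamic(%t: tensor<?x?xf32>, %l: index, %h: index,
                                  %pad: f32) -> tensor<?x?xf32> {
      %0 = tensor.pad %t low[%l, 0] high[0, %h] {
      ^bb0(%i: index, %j: index):
        tensor.yield %pad : f32
      } : tensor<?x?xf32> to tensor<?x?xf32>
      return %0 : tensor<?x?xf32>
    }

Bufferizing this requires computing the dynamic result dimension sizes, which is what ReifyRankedShapedTypeOpInterface provides.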

springerm added inline comments. Apr 4 2022, 11:47 PM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

You can try rebasing on D123108, which implements ReifyRankedShapedTypeOpInterface. That should simplify the index computation part.

bondhugula requested changes to this revision. Apr 5 2022, 12:31 AM
bondhugula added subscribers: ftynse, bondhugula.
bondhugula added inline comments.
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

As a general comment, I'm not sure why scf.parallel ops are being generated here instead of affine.parallel + affine.store -- these are trivially affine. And if you then run affine -> scf on this, you'll get exactly this scf output. So you have a strictly better path lowering this through affine.

I notice there are other lowerings that have chosen this as well, and that's just because the counter-pattern was missed during review and kept propagating.

CC: @ftynse @mehdi_amini
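
For illustration, the fill step in both forms (a sketch; %alloc, %cst, and the index constants are assumed to be defined nearby, and the static 7x7 shape is illustrative):

    // What the lowering in this patch emits:
    scf.parallel (%i, %j) = (%c0, %c0) to (%c7, %c7) step (%c1, %c1) {
      memref.store %cst, %alloc[%i, %j] : memref<7x7xf32>
    }

    // The trivially affine equivalent; running -lower-affine on it
    // produces exactly the scf.parallel form above:
    affine.parallel (%i, %j) = (0, 0) to (7, 7) {
      affine.store %cst, %alloc[%i, %j] : memref<7x7xf32>
    }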

This revision now requires changes to proceed. Apr 5 2022, 12:31 AM

springerm added inline comments. Apr 5 2022, 1:34 AM
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

E.g., tensor.generate also lowers to scf.parallel. I don't think there was any particular reason to implement it that way; when I last touched that bufferization pattern, I just did not think of lowering to affine and then to SCF.

springerm added inline comments. Apr 5 2022, 3:10 AM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

Just noticed that the op already implements the interface via an external model: mlir::tensor::registerInferTypeOpInterfaceExternalModels.

bondhugula added inline comments. Apr 5 2022, 7:52 PM
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

tensor -> affine -> scf would be the natural lowering and de-abstraction here. Lowering directly from tensor -> scf, whenever tensor -> affine is possible, is just the wrong direction and strictly worse in all possible ways. We shouldn't propagate this further, and we should fix the rest; they were just missed during review.

To follow on Uday's comment, this should not generate loops at all.
In linalg, you'd generate linalg.fill + subview + linalg.copy and later reuse the lowering to whichever loop flavor you want.

It seems like you are missing a memref.fill or memref.memset abstraction.
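
For illustration, a sketch of that linalg-level structure on a static example (shapes and names are illustrative, not from the patch):

    %alloc = memref.alloc() : memref<7x7xf32>
    // Fill the whole result with the padding value; loops are only
    // materialized later by whichever linalg lowering is chosen.
    linalg.fill ins(%pad_val : f32) outs(%alloc : memref<7x7xf32>)
    // Copy the source into the interior through a strided view.
    %interior = memref.subview %alloc[1, 1] [4, 4] [1, 1]
        : memref<7x7xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
    linalg.copy ins(%src : memref<4x4xf32>)
                outs(%interior : memref<4x4xf32, strided<[7, 1], offset: 8>>)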

nicolasvasilache requested changes to this revision. Apr 6 2022, 1:32 AM
tpopp added a comment. May 12 2022, 1:19 AM

Sorry for delaying this for so long. I am instead proposing a TensorToLinalgPass here: https://reviews.llvm.org/D125384

I believe that bufferization should try to stay simple: tensor ops that access data are converted to the corresponding memref ops, and other operations just have their types converted from tensors to memrefs.

For lowerings like tensor.pad that involve additional control flow, as discussed here, I believe the lowering should not be considered part of bufferization; it should instead be a more principled lowering at another point in time. This separates the concerns of tensor operations that are closely tied to the tensor type from operations that are there for convenience but would be just as relevant somewhere like linalg.

tpopp abandoned this revision. May 23 2022, 7:12 AM