
[mlir][linalg] Extend linalg vectorization to support non-identity input maps
ClosedPublic

Authored by ThomasRaoux on Mar 12 2021, 9:33 AM.

Details

Summary

This handles cases where the linalg input needs to be transposed.
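A minimal sketch of the kind of op this enables (the function name, shapes, and maps are illustrative, not taken from the patch's tests):

```mlir
// The first input is indexed with a non-identity (transposed) map:
// element (i, j) of the output reads %A[j, i].
#transposed = affine_map<(i, j) -> (j, i)>
#identity   = affine_map<(i, j) -> (i, j)>

func @transpose_add(%A: memref<8x4xf32>, %B: memref<4x8xf32>,
                    %C: memref<4x8xf32>) {
  linalg.generic {indexing_maps = [#transposed, #identity, #identity],
                  iterator_types = ["parallel", "parallel"]}
      ins(%A, %B : memref<8x4xf32>, memref<4x8xf32>)
      outs(%C : memref<4x8xf32>) {
    ^bb0(%a: f32, %b: f32, %c: f32):
      %0 = addf %a, %b : f32
      linalg.yield %0 : f32
  }
  return
}
```

Before this change, vectorization required the input indexing maps to be (minor) identities; here the `#transposed` map on `%A` must be handled explicitly.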

Diff Detail

Event Timeline

ThomasRaoux created this revision.Mar 12 2021, 9:33 AM
ThomasRaoux requested review of this revision.Mar 12 2021, 9:33 AM
asaadaldien added inline comments.Mar 12 2021, 11:04 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
350

If it's isElementwise && isMinorIdentity we still need to broadcast. Can we push this down and merge the two paths by inserting transposes if !isMinorIdentity?

ThomasRaoux added inline comments.Mar 12 2021, 11:10 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
350


Correct, I was considering merging those two cases. That would simplify the vectorization slightly; however, it could generate less efficient code: right now the broadcasting is done lazily, so if some operations don't need to be expanded to all the output dimensions, vectorization may pick a vector of smaller rank.
I'm not sure how likely this case is in practice, so if people think it is not worth the extra complexity I can merge those two cases.

Note that this could also be optimized away by canonicalizations later on. I don't have a strong opinion, so I can change it based on what you, Nicolas, and Aart think.
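The lazy-broadcast point can be sketched as follows (illustrative IR with assumed shapes, not from the patch; the eager alternative would materialize the full-rank vector at the read):

```mlir
// %bias varies only along the innermost output dimension, so the
// lazy path keeps it as a rank-1 vector<8xf32> and broadcasts only
// where a full-rank value is actually needed, instead of eagerly
// producing a vector<4x8xf32> up front.
%bias  = vector.transfer_read %b[%c0], %f0 : memref<8xf32>, vector<8xf32>
%full  = vector.transfer_read %a[%c0, %c0], %f0
           : memref<4x8xf32>, vector<4x8xf32>
%bcast = vector.broadcast %bias : vector<8xf32> to vector<4x8xf32>
%sum   = addf %bcast, %full : vector<4x8xf32>
```

If every consumer of `%bias` were also rank-1, the broadcast would never be emitted at all, which is the efficiency the lazy scheme preserves.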

asaadaldien accepted this revision.Mar 12 2021, 12:52 PM
asaadaldien added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
276

Nit: add a comment that this is broadcasting into the innermost dims.

350

If one of the operands can be vectorized at a lower rank, that means all its consumers are at most at that same lower rank. In other words, we can split the linalg.generic into two independent ops operating in the lower-rank and higher-rank domains.
IMO we can keep it this way; this is more general and the added complexity isn't much =)
Let's wait for @nicolasvasilache's and @aartbik's opinion.

This revision is now accepted and ready to land.Mar 12 2021, 12:52 PM
nicolasvasilache requested changes to this revision.Mar 15 2021, 3:04 AM

The vector.transfer_read / write operations support expressing permutation and broadcast semantics as part of the permutation_map.
The transformation you are doing here looks like the more general machinery needed to rewrite that permutation_map into vector.transpose + broadcast.

The tradeoff is that vector.transfer + permutation can lower better, as a loop + memory-indexing reordering.
Introducing transpose + broadcast at the time of Linalg -> Vector is probably lowering too quickly.
OTOH we could also think of implementing some of this as canonicalizations of vector.transfer + transpose + broadcast.

This also has to be thought of in the context of vector.shape_cast and future scalable vectors.

So bottom line: I am conflicted.
This looks more generally applicable, and writing it in 2 steps, i.e. 1) Linalg -> vector.transfer + permutation, followed by 2) better support for the permutation_map when lowering vector.transfer, better satisfies my intuition.
OTOH it is also possible that the permutation_map in vector.transfer is more complicated to use, but this has to be weighed against the alternative.

Can you put some extra time to chat on my calendar tomorrow so we can go a bit deeper with higher bandwidth?
I'll put a blocker in the meantime because this sits at the intersection of a bunch of different things.
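The two representations being weighed can be sketched roughly as follows (hedged illustration; the maps and shapes here are assumptions, not the patch's output):

```mlir
// (a) Encode the reordering in the transfer itself: the
//     permutation_map transposes the two memref dimensions
//     while reading.
%v = vector.transfer_read %A[%c0, %c0], %f0
       {permutation_map = affine_map<(d0, d1) -> (d1, d0)>}
       : memref<8x4xf32>, vector<4x8xf32>

// (b) Read with the minor identity map and reorder in vector land
//     with an explicit transpose.
%r = vector.transfer_read %A[%c0, %c0], %f0
       : memref<8x4xf32>, vector<8x4xf32>
%t = vector.transpose %r, [1, 0] : vector<8x4xf32> to vector<4x8xf32>
```

Form (a) keeps the reordering attached to the memory access, where a later lowering can fold it into loop order and indexing; form (b) commits to a vector-shuffling implementation at Linalg -> Vector time.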

mlir/test/Dialect/Linalg/vectorization.mlir
346

The args_in/args_out attributes are not needed anymore.

This revision now requires changes to proceed.Mar 15 2021, 3:04 AM

Use transfer_read maps instead of emitting directly broadcast/transpose

ThomasRaoux marked an inline comment as done.Mar 16 2021, 2:17 PM


As discussed, I switched to using an affine map on the vector transfer instead. Once we agree on this review, I'll prepare another patch to lower those transfer ops with affine maps.
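With that change, the vectorizer can emit the Linalg indexing map directly as the transfer's permutation_map (an illustrative result for a transposed input, not the patch's exact test output):

```mlir
// The linalg indexing map (i, j) -> (j, i) becomes the
// permutation_map of the read, so no explicit vector.transpose
// is emitted at vectorization time; lowering the transfer decides
// how to realize the reordering.
%in = vector.transfer_read %A[%c0, %c0], %f0
        {permutation_map = affine_map<(d0, d1) -> (d1, d0)>}
        : memref<8x4xf32>, vector<4x8xf32>
vector.transfer_write %in, %C[%c0, %c0]
        : vector<4x8xf32>, memref<4x8xf32>
```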

nicolasvasilache added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
258

// -> ///

275

// -> ///

This revision is now accepted and ready to land.Mar 18 2021, 10:12 AM
This revision was landed with ongoing or failed builds.Mar 18 2021, 12:33 PM
This revision was automatically updated to reflect the committed changes.