This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Use 'vector.flat_transpose' for 2-D 'vector.transpose'
Closed, Public

Authored by aartbik on May 28 2020, 6:12 PM.

Details

Summary

Progressive lowering of vector.transpose into an operation that
is closer to an intrinsic, and thus the hardware ISA. Currently
under the common vector transform testing flag, as we prepare
to deploy this transformation in the LLVM lowering pipeline.
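
For illustration, the effect of a flat transpose on a 2-D matrix stored as a 1-D vector can be sketched in plain C++. This is a minimal sketch, not the MLIR lowering itself: the function name `flatTranspose` is hypothetical, and the row-major layout is an assumption (the actual op's layout convention may differ).

```cpp
#include <cstddef>
#include <vector>

// Transpose a rows x columns matrix that is stored as a flat 1-D vector.
// Assumption: row-major storage; this is only an illustration.
std::vector<float> flatTranspose(const std::vector<float> &in,
                                 std::size_t rows, std::size_t columns) {
  std::vector<float> out(in.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < columns; ++c)
      // Element (r, c) of the input becomes element (c, r) of the output,
      // which has `columns` rows and `rows` columns.
      out[c * rows + r] = in[r * columns + c];
  return out;
}
```

Operating on the flattened form means a single call covers the whole matrix, which is what brings the operation close to a matrix intrinsic.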

Event Timeline

aartbik created this revision. May 28 2020, 6:12 PM
aartbik marked an inline comment as done. May 28 2020, 6:14 PM
aartbik added inline comments.
mlir/test/Dialect/Vector/vector-flat-transforms.mlir:6

sending this out for some early discussion; this hits a phase ordering issue

I propose to move the shapecast lowering a bit later, so we can fold them first and only lower them when they cannot be eliminated

mlir/include/mlir/Dialect/Vector/VectorOps.h:67

Can we use a struct VectorTransposeLowering to keep it consistent with the one above (even if there are really only 2 options for now)?

mlir/test/Dialect/Vector/vector-flat-transforms.mlir:6

I think this should be resolved in a separate revision and this is fine for now.

Note that this is more an order of visitation problem. Given:

%a = shape_cast %0
%b = shape_cast %a

where the shape casts can fold, if %a is visited before %b then it will be expanded.
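
The visitation-order effect can be reproduced with a toy model. This is a minimal sketch, not MLIR's rewrite driver: the `Op` struct, the `Kind` enum, and the function names are all hypothetical, and "lowering" is modeled as just marking the op.

```cpp
#include <string>
#include <vector>

enum class Kind { Value, ShapeCast, Lowered, Erased };

struct Op {
  Kind kind;
  int operand;      // index of the defining op, -1 if none
  std::string type; // result type, treated as an opaque string
};

// True if any op that has not been erased still uses `def`.
static bool hasLiveUsers(const std::vector<Op> &ops, int def) {
  for (const Op &op : ops)
    if (op.kind != Kind::Erased && op.operand == def)
      return true;
  return false;
}

// Visit ops in `order`; fold cast-of-cast round trips, lower everything else.
// Returns how many shape casts ended up being lowered.
int countLoweredCasts(std::vector<Op> ops, const std::vector<int> &order) {
  int lowered = 0;
  for (int idx : order) {
    if (ops[idx].kind != Kind::ShapeCast)
      continue;
    int src = ops[idx].operand;
    bool foldable = ops[src].kind == Kind::ShapeCast &&
                    ops[ops[src].operand].type == ops[idx].type;
    if (foldable) {
      ops[idx].kind = Kind::Erased; // uses are forwarded to the original value
      if (!hasLiveUsers(ops, src))
        ops[src].kind = Kind::Erased; // the inner cast is now dead
      continue;
    }
    ops[idx].kind = Kind::Lowered; // expanded, no longer foldable
    ++lowered;
  }
  return lowered;
}

// %0, %a = shape_cast %0, %b = shape_cast %a (back to the type of %0).
std::vector<Op> makeChain() {
  return {{Kind::Value, -1, "vector<2x2xf32>"},
          {Kind::ShapeCast, 0, "vector<4xf32>"},
          {Kind::ShapeCast, 1, "vector<2x2xf32>"}};
}
```

In this model, `countLoweredCasts(makeChain(), {1, 2})` lowers both casts, while `countLoweredCasts(makeChain(), {2, 1})` lowers none because the pair folds away first.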

I have seen this type of behavior a bunch of times in different places (albeit not involving folding + canonicalization + lowering IIRC).

Seems like ShapeCast should have a canonicalizer / canonicalization pattern (hasCanonicalizer=1) with a separate match and rewrite.
Then ShapeCastLowering could query that on all its uses and fail to lower if any foldable use is left.
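
A minimal sketch of that guard, again as a toy model rather than actual MLIR code: the lowering declines to fire while any user could still fold with the op, and a simple sweep-until-fixpoint loop stands in for the real driver. All names here (`Op`, `foldsWithProducer`, `hasFoldableUser`, `lowerWithGuard`) are hypothetical.

```cpp
#include <string>
#include <vector>

enum class Kind { Value, ShapeCast, Lowered, Erased };

struct Op {
  Kind kind;
  int operand;      // index of the defining op, -1 if none
  std::string type; // result type, treated as an opaque string
};

// True if op `idx` is a cast of a cast that restores the original type.
bool foldsWithProducer(const std::vector<Op> &ops, int idx) {
  if (ops[idx].kind != Kind::ShapeCast)
    return false;
  int src = ops[idx].operand;
  return ops[src].kind == Kind::ShapeCast &&
         ops[ops[src].operand].type == ops[idx].type;
}

// True if some user of `def` could still fold with it.
bool hasFoldableUser(const std::vector<Op> &ops, int def) {
  for (int u = 0; u < static_cast<int>(ops.size()); ++u)
    if (ops[u].operand == def && foldsWithProducer(ops, u))
      return true;
  return false;
}

// Sweep in definition order until nothing changes; return the lowered count.
int lowerWithGuard(std::vector<Op> ops) {
  int lowered = 0;
  for (bool changed = true; changed;) {
    changed = false;
    for (int idx = 0; idx < static_cast<int>(ops.size()); ++idx) {
      if (ops[idx].kind != Kind::ShapeCast)
        continue;
      if (foldsWithProducer(ops, idx)) {
        int src = ops[idx].operand;
        ops[idx].kind = Kind::Erased;
        bool srcUsed = false;
        for (const Op &op : ops)
          srcUsed |= op.kind != Kind::Erased && op.operand == src;
        if (!srcUsed)
          ops[src].kind = Kind::Erased; // inner cast is dead after the fold
        changed = true;
        continue;
      }
      if (hasFoldableUser(ops, idx))
        continue; // guard: fail to lower while a foldable use is left
      ops[idx].kind = Kind::Lowered;
      ++lowered;
      changed = true;
    }
  }
  return lowered;
}
```

With the guard, the `%a`/`%b` chain above is eliminated even when `%a` is visited first, and casts that cannot fold are still lowered.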

In other words, this type of pattern ordering can be resolved by finer-grained case disjunction.
However, this can still "miss folding at a distance": consider a chain of transposes that are lowered in some arbitrary order, introducing reshapes.
Folding opportunities would only appear if consecutive transposes are lowered before any newly introduced shape_cast is visited.
It seems the worklist-based algorithm would handle this ordering naturally, but I imagine we can construct more intricate cases where this would not be true?

Pinging @rriddle to see if there are more idiomatic ways of doing this, if this should be integrated in the rewriter itself (i.e. delay pattern application if any operand has folding opportunities), or something else.

aartbik updated this revision to Diff 267411. May 29 2020, 4:41 PM
aartbik marked 4 inline comments as done.

made option an enum

aartbik added inline comments.
mlir/include/mlir/Dialect/Vector/VectorOps.h:67

made this an enum
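
A rough sketch of what such an enum-based option could look like. Only the name `VectorTransposeLowering` comes from this review; the enumerators, the `VectorLoweringOptions` struct, and `chooseTransposeLowering` are hypothetical and do not claim to match the committed code.

```cpp
#include <string>

// Hypothetical enumerators: the review only says the option became an enum.
enum class VectorTransposeLowering { EltWise, Flat };

// Hypothetical options struct with a chainable setter.
struct VectorLoweringOptions {
  VectorTransposeLowering transposeLowering = VectorTransposeLowering::EltWise;
  VectorLoweringOptions &setTransposeLowering(VectorTransposeLowering l) {
    transposeLowering = l;
    return *this;
  }
};

// Pick a lowering strategy: the flat form only applies to 2-D transposes.
std::string chooseTransposeLowering(const VectorLoweringOptions &opts,
                                    int rank) {
  if (opts.transposeLowering == VectorTransposeLowering::Flat && rank == 2)
    return "vector.flat_transpose";
  return "elementwise";
}
```

An enum leaves room for further strategies later without changing the option's type.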

mlir/test/Dialect/Vector/vector-flat-transforms.mlir:6

yes, sounds good fixing this later somehow; the TODO reminds us to follow up :-)

ftynse accepted this revision. Jun 2 2020, 7:51 AM
ftynse added inline comments.
mlir/test/Dialect/Vector/vector-flat-transforms.mlir:6

The canonicalizer doesn't run in any pattern rewriting (it can in Nicolas's multi-level driver if you configure it that way); folding is run in both the greedy rewriter and the dialect converter. However, folding does not recurse upwards on operands of the given op, which seems to be what you need here. In dialect conversion, there's a TODO comment related to potentially folding any operation it visits even if it is considered legal. Maybe that will help.

This revision is now accepted and ready to land. Jun 2 2020, 7:51 AM
Herald added a project: Restricted Project. Jun 2 2020, 7:51 AM
nicolasvasilache accepted this revision. Jun 3 2020, 2:33 PM
This revision was automatically updated to reflect the committed changes.