Progressive lowering of vector.transpose into an operation that
is closer to an intrinsic, and thus the hardware ISA. Currently
under the common vector transform testing flag, as we prepare to
deploy this transformation in the LLVM lowering pipeline.
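
For readers unfamiliar with the idea, here is a minimal sketch of what a "flat" 2-D transpose computes, i.e. a transpose expressed on a 1-D buffer plus explicit row and column counts, which is the general shape matrix-style intrinsics tend to take. This is not the code from this revision: the function name flatTranspose is hypothetical, and the row-major layout is an assumption made for illustration only (the target intrinsic may use a different layout).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: transpose a rows x cols matrix
// stored row-major in a flat 1-D buffer. The result is a cols x rows matrix,
// also stored row-major in a flat buffer.
std::vector<int> flatTranspose(const std::vector<int> &in, std::size_t rows,
                               std::size_t cols) {
  assert(in.size() == rows * cols && "buffer size must match the shape");
  std::vector<int> out(in.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c]; // element (r, c) moves to (c, r)
  return out;
}
```

With this layout assumption, a 2x3 input {1, 2, 3, 4, 5, 6} becomes the 3x2 result {1, 4, 2, 5, 3, 6}.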
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/test/Dialect/Vector/vector-flat-transforms.mlir | ||
---|---|---|
6 | Sending this out for some early discussion. This hits a phase ordering issue: I propose to move the shapecast lowering a bit later, so we can fold the shape casts first and only lower them when they cannot be eliminated. |
mlir/include/mlir/Dialect/Vector/VectorOps.h | ||
---|---|---|
67 | Can we use a struct VectorTransposeLowering to keep it consistent with the one above (even if there are really only 2 options atm)? | |
mlir/test/Dialect/Vector/vector-flat-transforms.mlir | ||
6 | I think this should be resolved in a separate revision and this is fine for now. Note that this is more an order-of-visitation problem. Given: %a = shape_cast %0; %b = shape_cast %a, where the shape casts can fold, if %a is visited before %b then it will be expanded. I have seen this type of behavior a bunch of times in different places (albeit not involving folding + canonicalization + lowering IIRC). Seems like ShapeCast should have a canonicalizer / canonicalization pattern (hasCanonicalizer=1) with a separate match and rewrite. In other words, this type of pattern ordering can be resolved by finer-grained case disjunction. Pinging @rriddle to see if there are more idiomatic ways of doing this, if this should be integrated in the rewriter itself (i.e. delay pattern application if any operand has folding opportunities), or something else. |
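
The order-of-visitation effect described in the comment above can be reproduced with a small toy model. This is a sketch under stated assumptions, not MLIR's rewriter: Kind, Op, makeChain and runToyDriver are hypothetical names, each op has a single operand and a single use, and "lowering" is modeled as simply marking the op.

```cpp
#include <cassert>
#include <vector>

// Toy IR: every op has at most one operand, referenced by index.
enum class Kind { Source, ShapeCast, Lowered, Erased };

struct Op {
  Kind kind;
  int operand; // index of the producing op, -1 if there is none
};

// Builds: %0 = source; %a = shape_cast %0; %b = shape_cast %a
std::vector<Op> makeChain() {
  return {{Kind::Source, -1}, {Kind::ShapeCast, 0}, {Kind::ShapeCast, 1}};
}

// Visits ops in the given order. A shape_cast whose producer is still a
// shape_cast is folded into a single cast (the producer is erased, assuming
// it has no other use); the visited cast is then lowered. Returns how many
// casts ended up being lowered.
int runToyDriver(std::vector<Op> ops, const std::vector<int> &order) {
  int lowered = 0;
  for (int idx : order) {
    Op &op = ops[idx];
    if (op.kind != Kind::ShapeCast)
      continue; // already lowered, erased, or not a cast
    assert(op.operand >= 0 && "a shape_cast must have a producer");
    Op &producer = ops[op.operand];
    if (producer.kind == Kind::ShapeCast) {
      op.operand = producer.operand; // fold cast-of-cast into one cast
      producer.kind = Kind::Erased;
    }
    op.kind = Kind::Lowered;
    ++lowered;
  }
  return lowered;
}
```

In this model, visiting %a before %b (order {1, 2}) lowers both casts because the folding opportunity is gone by the time %b is reached, while visiting %b first (order {2, 1}) folds the chain and lowers only one cast.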
mlir/test/Dialect/Vector/vector-flat-transforms.mlir | ||
---|---|---|
6 | The canonicalizer doesn't run in any pattern rewriting (it can in Nicolas's multi-level driver if you configure it that way); folding is run in both the greedy rewriter and the dialect converter. However, folding does not recurse upwards on operands of the given op, which seems to be what you need here. In dialect conversion, there's a TODO comment related to potentially folding any operation it visits even if it is considered legal. Maybe that will help. |