This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Improve lowering of extract_strided_slice (and friends like shape_cast)
ClosedPublic

Authored by aartbik on Aug 6 2020, 3:44 PM.

Details

Summary

Using a shuffle for the last recursive step in progressive lowering not only
yields much more compact IR, but also produces more efficient code (since the
backend is no longer confused by subvector aliasing for longer vectors).

E.g. the following

%f = vector.shape_cast %v0: vector<1024xf32> to vector<32x32xf32>

yields much better x86-64 code that runs 3x faster than the original.
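
To make the idea concrete, here is a minimal Python model (not MLIR, and not the code from this revision) of how a 1-D strided slice can be expressed as a single shuffle whose mask lists the selected lanes, and how a 1-D to 2-D shape_cast can then be built from one such slice per row. The helper names shuffle, strided_slice_mask, extract_strided_slice_1d and shape_cast_1d_to_2d are hypothetical, and the assumption that shape_cast is decomposed into row-sized slices is an illustration rather than a statement about the exact patterns in the patch.

```python
def shuffle(v1, v2, mask):
    """Model a two-operand vector shuffle: each mask entry indexes into v1 ++ v2."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def strided_slice_mask(offset, size, stride=1):
    """Lane indices selected by a 1-D strided slice (hypothetical helper)."""
    return list(range(offset, offset + size * stride, stride))


def extract_strided_slice_1d(v, offset, size, stride=1):
    """Express the 1-D slice as one shuffle of v with itself.

    Only lanes of the first operand are ever selected, so passing v twice
    is just a way to satisfy the two-operand shuffle form.
    """
    return shuffle(v, v, strided_slice_mask(offset, size, stride))


def shape_cast_1d_to_2d(v, rows, cols):
    """Assumed decomposition: one row-sized slice (hence one shuffle) per row."""
    assert len(v) == rows * cols
    return [extract_strided_slice_1d(v, r * cols, cols) for r in range(rows)]


# Mirrors the shapes in the example above: vector<1024xf32> to vector<32x32xf32>.
v0 = list(range(1024))
f = shape_cast_1d_to_2d(v0, 32, 32)
```

In this model each row of the result comes from exactly one shuffle over the source vector, rather than from a chain of per-element extracts and inserts, which is the compactness the summary refers to.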

Event Timeline

aartbik created this revision. Aug 6 2020, 3:44 PM
aartbik requested review of this revision. Aug 6 2020, 3:44 PM
aartbik retitled this revision from [mlir] [VectorOps] Improve lowering of vector.extract_strided_slice (and friends like shape_cast) to [mlir] [VectorOps] Improve lowering of extract_strided_slice (and friends like shape_cast). Aug 6 2020, 3:53 PM
aartbik edited the summary of this revision.
This revision is now accepted and ready to land. Aug 6 2020, 4:48 PM
aartbik updated this revision to Diff 283776. Aug 6 2020, 5:27 PM

trigger tests

nicolasvasilache accepted this revision. Aug 7 2020, 2:38 AM

Great, thanks for fixing this perf bug!

This revision was landed with ongoing or failed builds. Aug 7 2020, 9:21 AM
This revision was automatically updated to reflect the committed changes.