Page MenuHomePhabricator

[mlir][X86Vector] Add specialized vector.transpose lowering patterns for AVX2
ClosedPublic

Authored by nicolasvasilache on Nov 6 2021, 9:19 AM.

Details

Summary

This revision adds an implementation of 2-D vector.transpose for 4x8 and 8x8 for
AVX2 and surfaces it to the Linalg level of control.

Diff Detail

Event Timeline

nicolasvasilache requested review of this revision.Nov 6 2021, 9:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2021, 9:19 AM
dcaballe accepted this revision.Nov 8 2021, 12:52 PM

Thanks, Nicolas! LGTM.

mlir/include/mlir/Dialect/X86Vector/Transforms.h
100

nit: avx2 -> AVX2 for consistency

mlir/lib/Dialect/Linalg/Transforms/LinalgStrategyPasses.cpp
299

what happened here? Should we remove it?

mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
92

Just a philosophical comment on whether we want to promote this way of describing masks. Even though these are AVX2 specific intrinsics, this mask format is so misleading if you compare it with how the shuffle mask is represented in MLIR and LLVM... I guess, it will facilitate the portability of code based on AVX2 intrinsics. That's an important point.

This revision is now accepted and ready to land.Nov 8 2021, 12:52 PM

Some fly by comments

mlir/include/mlir/Dialect/X86Vector/Transforms.h
28

how about making these unsigned
(which also makes your subsequent overflow asserts correct, since this would pass for negative numbers ;-)

mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
10

The filename seems to imply it is specific to AVX target and the transpose op.
Perhaps make the L9 and L1 description more consistent with that?

82

I am guessing this is there to avoid "unused variable" errors in no NDEBUG mode that removes the asserts?

168

int64_t for i, since m is that type?

182

int64_t for i, since m is that type?

nicolasvasilache marked 8 inline comments as done.Nov 10 2021, 6:26 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Linalg/Transforms/LinalgStrategyPasses.cpp
299

oops, thanks!

mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
82

yup.

92

In the end, masks lower to a single i8 and anyone can use 0xCE.
However I will strongly push back against writing the compiler this way because it is hyper-obfuscating (in fact, even C intrinsics have an MM_SHUFFLE macro to try and improve things).
Depending on the instruction / instrinsic the 8bits are interpreted in various confusing ways..

My position is that the alternative is so terrible, that we should absolutely promote this way of describing masks.
Translating existing code may be trickier but it will also force anyone doing this translation understand what it is doing.

nicolasvasilache marked 3 inline comments as done.

Address comments.

aartbik accepted this revision.Nov 10 2021, 9:18 AM

still LGTM, thanks!

mehdi_amini added inline comments.Nov 10 2021, 9:22 AM
mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
82

The #ifndef NDEBUG alternative would also eliminate the two uniquing calls / lookup in prod here (the optimizer won't remove these side-effecting get())

Rebase & address.

This revision was landed with ongoing or failed builds.Nov 10 2021, 11:39 PM
This revision was automatically updated to reflect the committed changes.