This revision adds an implementation of 2-D vector.transpose for 4x8 and 8x8 for
AVX2 and surfaces it to the Linalg level of control.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks, Nicolas! LGTM.
mlir/include/mlir/Dialect/X86Vector/Transforms.h | ||
---|---|---|
100 | nit: avx2 -> AVX2 for consistency | |
mlir/lib/Dialect/Linalg/Transforms/LinalgStrategyPasses.cpp | ||
299 | what happened here? Should we remove it? | |
mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp | ||
92 | Just a philosophical comment on whether we want to promote this way of describing masks. Even though these are AVX2 specific intrinsics, this mask format is so misleading if you compare it with how the shuffle mask is represented in MLIR and LLVM... I guess, it will facilitate the portability of code based on AVX2 intrinsics. That's an important point. |
Some fly by comments
mlir/include/mlir/Dialect/X86Vector/Transforms.h | ||
---|---|---|
28 | how about making these unsigned | |
mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp | ||
10 | The filename seems to imply it is specific to AVX target and the transpose op. | |
82 | I am guessing this is there to avoid "unused variable" errors in no NDEBUG mode that removes the asserts? | |
168 | int64_t for i, since m is that type? | |
182 | int64_t for i, since m is that type? |
mlir/lib/Dialect/Linalg/Transforms/LinalgStrategyPasses.cpp | ||
---|---|---|
299 | oops, thanks! | |
mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp | ||
82 | yup. | |
92 | In the end, masks lower to a single i8 and anyone can use 0xCE. My position is that the alternative is so terrible, that we should absolutely promote this way of describing masks. |
mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp | ||
---|---|---|
82 | The #ifndef NDEBUG alternative would also eliminate the two uniquing calls / lookup in prod here (the optimizer won't remove these side-effecting get()) |
how about making these unsigned
(which also makes your subsequent overflow asserts correct, since this would pass for negative numbers ;-)