Page MenuHomePhabricator

[AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
ClosedPublic

Authored by craig.topper on Dec 15 2016, 1:13 PM.

Details

Summary

This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper retitled this revision from to [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions.
craig.topper updated this object.
craig.topper added reviewers: RKSimon, zvi, delena, spatel.
craig.topper added a subscriber: llvm-commits.

Accidentally dropped an important change simplifyX86vpermv to mask a different number of bits based on element count.

RKSimon added inline comments.Dec 16 2016, 2:49 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2165 ↗(On Diff #81710)

Move the mask code to a helper function? Isn't it likely we'll need it again for future avx512 intrinsic combines?

Moved masking to a helper function for future reuse.

delena added inline comments.Dec 17 2016, 8:39 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1049 ↗(On Diff #81834)

do we have a good code at the end after all mask shuffles?

I think in the worst case we may end up selecting a shuffle with large element size and end up being unable to combine the mask into the instruction. But we've likely removed a constant pool load so that may still be a win.

RKSimon edited edge metadata.Dec 19 2016, 1:30 AM

Has the time come to add VPERMV/VPERMV3 support to combineBitcastForMaskedOp?

Simon, you mean turning other shuffles into VPERMV/VPERMV3? This patch would tend to prevent usage of VPERMV/VPERMV3 in favor of other shuffles. But may breaking masking so to recover masking we'd need to turn those other shuffles in to VPERMV/VPERMV3.

RKSimon accepted this revision.Dec 21 2016, 9:16 AM
RKSimon edited edge metadata.

Simon, you mean turning other shuffles into VPERMV/VPERMV3? This patch would tend to prevent usage of VPERMV/VPERMV3 in favor of other shuffles. But may breaking masking so to recover masking we'd need to turn those other shuffles in to VPERMV/VPERMV3.

I meant using combineBitcastForMaskedOp to rescale VPERMV/VPERMV3 constant shuffle masks (if legal) so that they can combine with a select mask if its there - so a vpermv <8 x i64> becomes <16 x i32> etc.

Anyway, this would be a separate task and shouldn't have that much effect on this, but would allow us to relax the strict bail out at the start of combineX86ShuffleChain.

LGTM

This revision is now accepted and ready to land.Dec 21 2016, 9:16 AM
This revision was automatically updated to reflect the committed changes.