This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
ClosedPublic

Authored by andreadb on Sep 29 2015, 9:51 AM.

Details

Summary

This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Added test x86-pshufb.ll.

Please let me know if ok to submit.

Thanks,
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 35990.Sep 29 2015, 9:51 AM
andreadb retitled this revision from to [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant. .
andreadb updated this object.
andreadb added reviewers: RKSimon, spatel, qcolombet.
andreadb added a subscriber: llvm-commits.
RKSimon edited edge metadata.Sep 29 2015, 10:20 AM

Query about AVX2 instruction.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1184 ↗(On Diff #35990)

I don't think this is correct - AVX2 pshufb doesn't reuse the lower 128-bit lane it uses its own lane but with 'local' index values (0-15 representing the 16-31 of a shufflevector),

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=shuffle_epi8&expand=4700

andreadb added inline comments.Sep 29 2015, 10:26 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1184 ↗(On Diff #35990)

Right. This is wrong.
I will upload a new patch.

Thanks Simon.

andreadb updated this revision to Diff 36019.Sep 29 2015, 11:57 AM
andreadb edited edge metadata.

Patch updated.
Fixed the AVX2 byte shuffle mask computation (thanks Simon!).

Please let me know if ok to commit.

-Andrea

spatel added inline comments.Sep 29 2015, 1:09 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1185–1194 ↗(On Diff #36019)

If you use an int8_t instead of APInt for Index, does this become simpler?

Indexes[I] = (Index < 0) ? NumElts : Index & 0xF;
andreadb added inline comments.Sep 29 2015, 1:32 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1185–1194 ↗(On Diff #36019)

Yep. I will change it. Thanks.

Bigcheese added a subscriber: Bigcheese.

CCing chandler as he spent quite a while on shuffles.

andreadb updated this revision to Diff 36080.Sep 30 2015, 4:00 AM

Patch Updated.

Simplified code as suggested by Sanjay.

Ok to commit?

RKSimon accepted this revision.Sep 30 2015, 6:31 AM
RKSimon edited edge metadata.

A couple of minor things but otherwise LGTM. Thanks Andrea!

lib/Transforms/InstCombine/InstCombineCalls.cpp
1197 ↗(On Diff #36080)

I'm worried this is hardcoded to the AVX2 '32 element' implementation (so if the AVX512 intrinsic ever gets supported here...)

You could possibly merge this into the code above:

Indexes[I] = (I & ~0xF) + ((Index < 0) ? NumElts : (Index & 0xF))

test/Transforms/InstCombine/x86-pshufb.ll
2 ↗(On Diff #36080)

You have a decent range of tests here already - the only addition I can think of would be to show that some 'strange value' pshufb indices (i.e. not -128 or 0-15 or 0-31) get correctly masked based on their bits.

This revision is now accepted and ready to land.Sep 30 2015, 6:31 AM
andreadb added inline comments.Sep 30 2015, 6:40 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1197 ↗(On Diff #36080)

Yes, this is done specifically to support the AVX2 implementation.
However, I don't think we can merge that into the code above because we also have to handle ConstantAggregateZero pshufb masks.

test/Transforms/InstCombine/x86-pshufb.ll
2 ↗(On Diff #36080)

Sure, I will add extra tests for it.

This revision was automatically updated to reflect the committed changes.