This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
ClosedPublic

Authored by RKSimon on Oct 9 2014, 3:54 AM.

Details

Summary

This patch builds on http://reviews.llvm.org/D5598 to improve byte shift/rotation shuffles on pre-SSSE3 (palignr) targets.

I've also made use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr with a zero register.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 14647.Oct 9 2014, 3:54 AM
RKSimon retitled this revision from to [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: chandlerc, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
chandlerc added inline comments.Oct 12 2014, 6:32 PM
lib/Target/X86/X86ISelLowering.cpp
7545 ↗(On Diff #14647)

Please use the zeroable computation routine that the rest of the vector shuffle lowering uses. There are lots of other patterns that show up which this doesn't necessarily catch.

RKSimon updated this revision to Diff 15085.Oct 17 2014, 10:38 AM

Updated patch for SSE2 byte shifts/rotations.

As requested, I've re-implemented the byte shift shuffles using the zeroable computation routine. I've split the byte shift shuffle detection into a new function lowerVectorShuffleAsByteShift() as it makes it easier to do the zeroable element matching - lowerVectorShuffleAsByteRotate now only adds the basic PSRLDQ/PSLLDQ/POR pattern for pre-SSSE3 rotates.

I also made a fix to ISD::isBuildVectorAllZeros (and ISD::isBuildVectorAllOnes whilst I was there) so that the zero detection can properly see through multiple bitcast operations.

chandlerc added inline comments.Oct 17 2014, 11:11 AM
lib/Target/X86/X86ISelLowering.cpp
7451–7456 ↗(On Diff #15085)

The rest of this comment is now misleading.

I think it should now say something like "It does not check for the profitability of lowering either as PALIGNR or PSRLDQ/PSLLDQ/POR, only whether the mask is valid to lower in that form."

7550–7552 ↗(On Diff #15085)

Rather than this, I would suggest testing for SSSE3 prior to calling this in the routines where it is only profitable with SSSE3. That localizes the profitability breakdown to those routines which are already broken out by element size.

7610–7616 ↗(On Diff #15085)

The formatting here isn't right. Can you use clang-format? The curly at the least should be on the prior line.

7634–7641 ↗(On Diff #15085)

The bitwise ands here are weird, and I think you'll need some comments explaining the nature of your search strategy here.

7655 ↗(On Diff #15085)

It would be really nice to write this logic once in a lambda, and run it over a reversed mask.

RKSimon updated this revision to Diff 15249.Oct 22 2014, 9:42 AM

Updated patch for SSE2 byte shifts/rotations.

The patch now (re)uses a lambda to detect the sequential 'shifted' shuffle mask - I've split the shift and zeroable matching sections of the byte-shift routine to make it easier to understand/maintain.

Also, tidied up comments and pushed the byte-rotate profitability check up into the calling functions.

ping

Chandler - are you happy with the changes I made to the patch? Thanks, Simon.

RKSimon updated this revision to Diff 16116.Nov 12 2014, 2:31 PM
RKSimon set the repository for this revision to rL LLVM.

rebased against trunk

andreadb accepted this revision.Nov 18 2014, 8:10 AM
andreadb edited edge metadata.

Hi Simon,
For what is worth, the patch LGTM. Chandler - do you have any final comments?

This revision is now accepted and ready to land.Nov 18 2014, 8:10 AM
RKSimon closed this revision.Nov 19 2014, 2:07 AM
RKSimon updated this revision to Diff 16369.

Closed by commit rL222340 (authored by @RKSimon).