This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permute
Closed, Public

Authored by RKSimon on Jun 8 2016, 12:38 PM.

Details

Summary

This patch allows target shuffles to be combined into single-input immediate permute instructions ((V)PSHUFD/VPERMILPD/VPERMILPS). This allows more general pattern matching than we currently do, and improves the likelihood of memory folding compared to the existing patterns, which tend to reuse the input in multiple arguments.
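The immediate encodings these single-input permutes use can be sketched as below. This is a minimal Python illustration, not LLVM's implementation: the function names and the `UNDEF = -1` sentinel are hypothetical, and undef elements are simply kept in place, although any selector would be legal for them.

```python
UNDEF = -1  # hypothetical sentinel for "don't care" mask elements


def pshufd_imm(mask):
    """Encode a 4 x 32-bit single-input mask as a PSHUFD/VPERMILPS imm8.

    Result element i is taken from source element mask[i]; each selector
    occupies two bits, with element 0 in bits [1:0].
    """
    if len(mask) != 4:
        raise ValueError("expected four 32-bit elements (one 128-bit lane)")
    imm = 0
    for i, m in enumerate(mask):
        if m == UNDEF:
            m = i  # any selector is legal for an undef element; keep it in place
        if not 0 <= m < 4:
            raise ValueError(f"mask element {m} does not come from the single input")
        imm |= m << (2 * i)
    return imm


def vpermilpd_imm(mask):
    """Encode a 2- or 4-element f64 single-input mask as a VPERMILPD imm8.

    Bit i picks the low (0) or high (1) double of the 128-bit lane that
    holds result element i, so the mask may not cross lanes.
    """
    n = len(mask)
    if n not in (2, 4):
        raise ValueError("expected 2 or 4 double-precision elements")
    imm = 0
    for i, m in enumerate(mask):
        if m == UNDEF:
            m = i  # keep undef elements in place
        if not 0 <= m < n:
            raise ValueError(f"mask element {m} does not come from the single input")
        if m // 2 != i // 2:
            raise ValueError("VPERMILPD's immediate form cannot cross 128-bit lanes")
        imm |= (m & 1) << i
    return imm
```

For example, `pshufd_imm([3, 2, 1, 0])` gives `0x1B`, the familiar element-reversing PSHUFD immediate, while `vpermilpd_imm([1, 0, 3, 2])` gives `0b0101`, swapping the two doubles in each 128-bit lane.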

Further permute instructions ((V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD) may be added in the future, but it has proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW are already handled quite well in combineTargetShuffle, so removing some of that code may allow us to perform more of the combining in one place without duplication.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 60082 on Jun 8 2016, 12:38 PM.
RKSimon retitled this revision from to [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permute.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, ab, spatel, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
RKSimon updated this revision to Diff 61211 on Jun 19 2016, 12:01 PM.

Rebased. Added an extra check that forces the use of 256-bit float/double unary shuffles on AVX1 targets (since AVX1 doesn't have any 256-bit integer shuffles).
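The AVX1 restriction can be illustrated with a hypothetical lowering helper for 8 x 32-bit single-input masks. It assumes an AVX target (so 256-bit vectors are legal), represents the subtarget by a single `has_avx2` flag, and returns mnemonic strings rather than real LLVM opcodes; it is a sketch of the rule described above, not the patch's code.

```python
UNDEF = -1  # hypothetical sentinel for "don't care" mask elements


def lower_v8x32_unary_permute(mask, is_integer, has_avx2):
    """Pick an immediate permute for an 8 x 32-bit single-input mask.

    Returns (mnemonic, imm8), or None if no single immediate permute fits.
    """
    if len(mask) != 8:
        raise ValueError("expected an 8-element mask")
    # VPSHUFD ymm and VPERMILPS ymm both apply one imm8 to each 128-bit lane,
    # so both halves of the mask must follow the same in-lane pattern.
    lane = []
    for i in range(4):
        lo, hi = mask[i], mask[i + 4]
        if lo != UNDEF and not 0 <= lo < 4:
            return None  # crosses lanes or reads a second input
        if hi != UNDEF and not 4 <= hi < 8:
            return None
        if lo != UNDEF and hi != UNDEF and hi - 4 != lo:
            return None  # the two lanes want different patterns
        if lo != UNDEF:
            lane.append(lo)
        elif hi != UNDEF:
            lane.append(hi - 4)
        else:
            lane.append(i)  # fully undef: any selector works
    imm = sum(m << (2 * i) for i, m in enumerate(lane))
    # AVX1 has no 256-bit integer shuffles (VPSHUFD ymm needs AVX2), so
    # integer vectors fall back to the floating-point VPERMILPS there.
    if is_integer and has_avx2:
        return ("vpshufd", imm)
    return ("vpermilps", imm)
```

For example, the mask `[1, 0, 3, 2, 5, 4, 7, 6]` on an integer vector yields `("vpermilps", 0xB1)` when `has_avx2` is false and `("vpshufd", 0xB1)` when it is true.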

RKSimon updated this revision to Diff 61609 on Jun 22 2016, 2:31 PM.

Fixed an issue that caused an unnecessary assertion on some AVX2 cross-lane shuffles.

ab edited edge metadata on Jun 22 2016, 3:12 PM.

Code LGTM, but I'm not sure I have a clear picture; questions inline.

lib/Target/X86/X86ISelLowering.cpp
96–1

This looks like independent goodness; maybe extract that out?

test/CodeGen/X86/vector-shuffle-128-v2.ll
2

I'm surprised by this and other changes; isn't the combine for shuffle chains? (It does look better for folding, though; just trying to understand.)

test/CodeGen/X86/vector-shuffle-128-v4.ll
3

In particular, this looks slightly more expensive according to Agner Fog's Intel tables (for the folded variants).

RKSimon updated this revision to Diff 61699 on Jun 23 2016, 10:54 AM.
RKSimon edited edge metadata.

Updated to prefer binary shuffles (mainly unpck) over permutes; although this will prevent some folding, it shouldn't affect register pressure. We don't handle i32 unpcks, as pshufd typically has similar performance. The other changes you see from unary shuffles (e.g. movddup to pshufd) are typically because the target shuffle combine is a bit more ruthless at looking through bitcasts, which should hopefully reduce domain stalls.
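The preference described in this update (which the reviewer later suggested reverting) can be sketched as a check for the unary unpack patterns, i.e. UNPCKL*/UNPCKH* with the input passed as both operands. The helper names, the element-type strings and the `UNDEF = -1` sentinel are hypothetical; this only illustrates the stated policy, skipping i32 so that PSHUFD handles those masks.

```python
UNDEF = -1  # hypothetical sentinel for "don't care" mask elements

# Masks produced by UNPCKL*(v, v) / UNPCKH*(v, v), keyed by element count:
# the low-half unpack duplicates the low elements, the high-half the high ones.
UNARY_UNPACKS = {
    2: {"unpcklo": [0, 0], "unpckhi": [1, 1]},
    4: {"unpcklo": [0, 0, 1, 1], "unpckhi": [2, 2, 3, 3]},
}


def matches(mask, pattern):
    """True if every defined mask element equals the pattern element."""
    return len(mask) == len(pattern) and all(
        m == UNDEF or m == p for m, p in zip(mask, pattern))


def choose_unary_shuffle(mask, elt_type):
    """Prefer unpacking the input with itself over an immediate permute,
    except for i32 elements, where the permute (PSHUFD) is kept."""
    if elt_type != "i32":
        for name, pattern in UNARY_UNPACKS.get(len(mask), {}).items():
            if matches(mask, pattern):
                return name
    return "permute"
```

Masks that match neither unpack pattern, such as a full reversal, still fall through to the immediate permute.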

RKSimon added inline comments on Jun 23 2016, 10:56 AM.
lib/Target/X86/X86ISelLowering.cpp
24576–24578

I can't separate this easily: annoyingly, it is only useful once the permute support in this patch is added, and without it this patch causes regressions.

ab added a comment on Jun 24 2016, 1:50 PM.

> Updated to prefer binary shuffles (mainly unpck) over permutes; although this will prevent some folding, it shouldn't affect register pressure.

The permutes are indeed better because they either enable folding or have similar performance to the binary shuffles (at least on the recent Intel CPUs I looked at). If there's no other reason, revert to the previous patch? That one LGTM.
Very sorry I wasn't clear; was just curious!

> We don't handle i32 unpcks, as pshufd typically has similar performance. The other changes you see from unary shuffles (e.g. movddup to pshufd) are typically because the target shuffle combine is a bit more ruthless at looking through bitcasts, which should hopefully reduce domain stalls.

Sounds good, thanks!

test/CodeGen/X86/vector-shuffle-128-v2.ll
0

Extra NOTE

RKSimon updated this revision to Diff 61963 on Jun 27 2016, 7:25 AM.

Updated based on Ahmed's comments. OK to commit?

ab accepted this revision on Jun 27 2016, 8:09 AM.
ab edited edge metadata.

Yup yup, LGTM; thanks again.

This revision is now accepted and ready to land (Jun 27 2016, 8:09 AM).
This revision was automatically updated to reflect the committed changes.