This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)
ClosedPublic

Authored by RKSimon on Nov 8 2018, 9:22 AM.

Details

Summary

This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.

The code works fine for 256 and 512-bit vectors as well (although its currently limited to in-line shuffles), but I'm seeing a number of regressions (mainly we'd prefer blend+permute in many cases) that need addressing before enabling on anything but v16i8.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Nov 8 2018, 9:22 AM

Does this change the code generated for rgbcmyk here https://godbolt.org/z/cot3xT I filed the original PR based on what happened from trying to vectorize it for sse4.2 which we don't currently do, but I think the two vblendvbs in the avx2 output are similar.

Does this change the code generated for rgbcmyk here https://godbolt.org/z/cot3xT I filed the original PR based on what happened from trying to vectorize it for sse4.2 which we don't currently do, but I think the two vblendvbs in the avx2 output are similar.

At the moment I haven't added this to v32i8 shuffle lowering as its still causing some regressions (a lot of the lowering patterns in lowerV32I8VectorShuffle are hidden in sub functions making it more difficult to use lowerVectorShuffleAsByteRotateAndPermute in the right cicrumstances). I could add it but I'd have to bail out if I see certain patterns - I'll update the patch for you to see the effect.

craig.topper added inline comments.Nov 8 2018, 10:34 PM
lib/Target/X86/X86ISelLowering.cpp
10495 ↗(On Diff #173183)

Should this be NumLaneElts instead of NumElts?

10496 ↗(On Diff #173183)

If the min/max is inside the lane, isn't Range1.first always less than or equal to Range1.second? And isn't Range1.first always less than or equal to Range1.second unless the input is unused by the shuffle? Which would mean the input is undef?

RKSimon updated this revision to Diff 173308.Nov 9 2018, 6:00 AM

Add 256-bit vector support.

The ymm support requires a minor hack that prevents lowerVectorShuffleAsBlendAndPermute from lowering if the blend mask wouldn't be an immediate (i.e. PBLENDVB), it'll instead try to lower using lowerVectorShuffleAsUNPCKAndPermute and then lowerVectorShuffleAsByteRotateAndPermute before falling back on lowerVectorShuffleAsBlendAndPermute again. This helps the code from https://godbolt.org/z/cot3xT

craig.topper added inline comments.Nov 12 2018, 10:17 AM
lib/Target/X86/X86ISelLowering.cpp
10495 ↗(On Diff #173183)

I think this comment was addressed?

10496 ↗(On Diff #173183)

I dont' think i saw an answer for this

10170 ↗(On Diff #173308)

If the elements could be widened wouldn't they have already been widened in lowerVectorShuffle?

RKSimon marked 2 inline comments as done.Nov 12 2018, 10:25 AM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
10496 ↗(On Diff #173183)

Yes, I can drop the Range1.first <= Range1.second condition

10170 ↗(On Diff #173308)

You can have cases where the blend is wider than the permute (PBLENDD + PSHUFB etc.)

craig.topper added inline comments.Nov 12 2018, 10:52 AM
lib/Target/X86/X86ISelLowering.cpp
10170 ↗(On Diff #173308)

Oh right that makes sense. BlendMask is different than Mask

RKSimon updated this revision to Diff 173732.Nov 12 2018, 12:40 PM

Removed the Range1.first <= Range1.second condition

This revision is now accepted and ready to land.Nov 12 2018, 1:08 PM
This revision was automatically updated to reflect the committed changes.