This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vectorize i64 ASHR operations
ClosedPublic

Authored by RKSimon on Jul 22 2015, 4:20 PM.

Details

Summary

This patch vectorizes the v2i64/v4i64 ASHR shift operations - the only remaining integer vector shifts that are still transferred to/from the scalar unit to be performed.
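
For illustration only (a hypothetical example, not part of the patch), this is the kind of C++ loop whose vectorized form produces the v2i64/v4i64 ashr operations the new lowering handles:

  #include <cstdint>
  #include <cstddef>

  // Each iteration is a 64-bit arithmetic shift right; once the loop is
  // vectorized, the shifts become v2i64/v4i64 ashr operations.
  void ashr_in_place(int64_t *v, std::size_t n, unsigned s) {
    for (std::size_t i = 0; i < n; ++i)
      v[i] >>= s; // signed >> compiles to an arithmetic shift on x86 targets
  }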

Note: The poor code gen for the X32 tests will be improved by D11327.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 30419. Jul 22 2015, 4:20 PM
RKSimon retitled this revision to [X86][SSE] Vectorize i64 ASHR operations.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, delena, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
RKSimon updated this revision to Diff 30634. Jul 25 2015, 7:18 AM

rebased.

qcolombet added inline comments. Jul 28 2015, 9:35 AM
lib/Target/X86/X86ISelLowering.cpp
17457 ↗(On Diff #30634)

It wasn't immediately clear to me that s>> and u>> referred to signed and unsigned shifts.
Use lshr and ashr instead, as in LLVM IR (or the SelectionDAG name variants if you prefer).
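
For reference (an illustrative sketch of the naming only, not code from the patch): lshr shifts in zero bits, while ashr shifts in copies of the sign bit.

  #include <cstdint>

  // lshr (u>>): vacated high bits are filled with zeros.
  uint64_t lshr(uint64_t x, unsigned s) { return x >> s; }

  // ashr (s>>): vacated high bits are filled with the sign bit.
  // (Mainstream x86 compilers implement signed >> as an arithmetic shift.)
  int64_t ashr(int64_t x, unsigned s) { return x >> s; }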

test/CodeGen/X86/vector-shift-ashr-128.ll
27 ↗(On Diff #30634)

Is this sequence actually better?

I guess the GPR-to-vector and vector-to-GPR copies are expensive enough that it is.
Just double-checking.

44 ↗(On Diff #30634)

Same question here (this time I guess avoiding pextr is the win).

Thanks Quentin - if it's not too much trouble, could you please check the sibling patch to this one (D11327)?

lib/Target/X86/X86ISelLowering.cpp
17457 ↗(On Diff #30634)

No problem.

test/CodeGen/X86/vector-shift-ashr-128.ll
27 ↗(On Diff #30634)

It's very target-dependent - on my old Penryn, avoiding GPR/SSE transfers is by far the best option. Jaguar/Sandy Bridge don't care that much at the 64-bit integer level (it will probably come down to register pressure issues). Haswell has AVX2, so we can do per-lane v2i64 logical shifts, which is where this patch really flies. In 32-bit mode it's always best to avoid trying to do (split) 64-bit shifts on GPRs (see D11327).
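
For the per-lane AVX2 case, a sketch of the underlying idea (my reconstruction of the standard lshr-plus-fixup identity, assuming shift amounts below 64 - not the patch's exact output): ashr(x, s) == (lshr(x, s) ^ m) - m, where m = lshr(0x8000000000000000, s).

  #include <immintrin.h>
  #include <cstdint>

  // Per-lane v2i64 ashr synthesized from AVX2 variable logical shifts.
  __m128i ashr_v2i64(__m128i x, __m128i amt) { // assumes each lane of amt < 64
    const __m128i sign = _mm_set1_epi64x(INT64_MIN);
    __m128i m = _mm_srlv_epi64(sign, amt);        // per-lane shifted sign mask
    __m128i r = _mm_srlv_epi64(x, amt);           // per-lane logical shift
    return _mm_sub_epi64(_mm_xor_si128(r, m), m); // restore the sign bits
  }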

Overall, the general per-lane v2i64 ashr lowering is the weakest improvement; the splat and constant cases gain a lot more.
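
A sketch of the splat case on plain SSE2 (the same identity with a single shift amount; again my reconstruction, not the patch's exact output):

  #include <emmintrin.h>
  #include <cstdint>

  // Splat v2i64 ashr: one logical shift, one xor, one subtract.
  __m128i ashr_v2i64_splat(__m128i x, int s) { // assumes 0 <= s < 64
    const __m128i sign = _mm_set1_epi64x(INT64_MIN);
    __m128i m = _mm_srli_epi64(sign, s);          // shifted sign mask
    __m128i r = _mm_srli_epi64(x, s);             // logical shift of both lanes
    return _mm_sub_epi64(_mm_xor_si128(r, m), m); // sign-extend each lane
  }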

RKSimon updated this revision to Diff 30889.Jul 29 2015, 3:53 AM

Updated pseudocode comments to use LLVM IR. Also updated the comment that I copied this from.

qcolombet accepted this revision. Jul 29 2015, 10:06 AM
qcolombet edited edge metadata.

Hi Simon,

LGTM.

Thanks,
-Quentin

This revision is now accepted and ready to land. Jul 29 2015, 10:06 AM
This revision was automatically updated to reflect the committed changes.