This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Support v4i32 rotations (PR37426)
ClosedPublic

Authored by RKSimon on May 16 2018, 8:34 AM.

Details

Summary

As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations, as the upper 32 bits of each multiply result will contain the 'wrapped' bits of the rotation.

v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
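
The multiply trick from the summary can be modelled in scalar C++ as a rough sketch. It is not the code from this patch: the names `rotl32_via_mul` and `rotl_v4i32_model` are hypothetical and used only for illustration. The sketch assumes the rotate amount is reduced modulo 32, and it multiplies each lane by `1 << n` so that the low half of the 64-bit product holds the shifted bits and the high half holds the bits that wrapped around. PMULUDQ only multiplies the even 32-bit lanes of each operand, so a real lowering has to handle the odd lanes separately; the model below ignores that and treats each lane independently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rotate one 32-bit lane left by n using a widening
// 32x32->64 multiply, mirroring what a single PMULUDQ lane computes.
inline std::uint32_t rotl32_via_mul(std::uint32_t x, unsigned n) {
  n &= 31;  // assumption: rotate amounts are taken modulo 32
  std::uint64_t prod =
      static_cast<std::uint64_t>(x) * (std::uint64_t{1} << n);
  std::uint32_t lo = static_cast<std::uint32_t>(prod);        // x << n
  std::uint32_t hi = static_cast<std::uint32_t>(prod >> 32);  // wrapped bits
  return lo | hi;  // hi is 0 when n == 0, so the value is unchanged
}

// Hypothetical 4-lane model of a v4i32 rotate with per-lane amounts.
inline std::array<std::uint32_t, 4> rotl_v4i32_model(
    const std::array<std::uint32_t, 4>& v,
    const std::array<std::uint32_t, 4>& amt) {
  std::array<std::uint32_t, 4> r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = rotl32_via_mul(v[i], amt[i]);
  return r;
}
```

For example, `rotl32_via_mul(0x12345678u, 8)` forms the product `0x1234567800`, whose low half `0x34567800` and high half `0x12` OR together to give the rotated value.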

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. May 16 2018, 8:34 AM
RKSimon added inline comments. May 16 2018, 8:36 AM
test/CodeGen/X86/vector-rotate-128.ll
1599

It might be better to take the cost of 2 loads to remove the 2 x PSHUFD and fold them directly into the PMULUDQs? There is a small increase in code size.

xbolva00 added inline comments.
test/CodeGen/X86/vector-rotate-128.ll
1599

+1 for PMULUDQ

craig.topper added inline comments. May 16 2018, 5:19 PM
lib/Target/X86/X86ISelLowering.cpp
23739

Combine these two ifs into one condition?

xbolva00 added inline comments. May 16 2018, 5:21 PM
lib/Target/X86/X86ISelLowering.cpp
23739

Yes

RKSimon updated this revision to Diff 147314. May 17 2018, 7:24 AM

Merged ifs()

This revision is now accepted and ready to land. May 20 2018, 6:24 PM
This revision was automatically updated to reflect the committed changes.