This is an archive of the discontinued LLVM Phabricator instance.

[X86] Expose the various _rot intrinsics on non-MS platforms
ClosedPublic

Authored by mkuper on Aug 23 2015, 6:38 AM.

Details

Summary

_rotl, _rotwl and _lrotl (and their right-shift counterparts) are official x86 intrinsics, and should be supported regardless of environment.
This is in contrast to _rotl8, _rotl16, and _rotl64 which are MS-specific.

Note that the MS documentation for _lrotl is different from the Intel documentation. Intel explicitly documents it as a 64-bit rotate, while for MS, since sizeof(unsigned long) for MSVC is 4, a 32-bit rotate is clearly implied.
Compare:
https://msdn.microsoft.com/en-us/library/a0w705h5.aspx
vs.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=rot&techs=Other&expand=3193

Note that this doesn't change the implementations of these intrinsics, which are currently pretty awful.
We only manage to match the 32-bit versions to a rotate, and even then, still have the "and" and the control flow in place. That should be dealt with separately.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 32925.Aug 23 2015, 6:38 AM
mkuper retitled this revision from to [X86] Expose the various _rot intrinsics on non-MS platforms.
mkuper updated this object.
mkuper added reviewers: majnemer, rnk.
mkuper added a subscriber: cfe-commits.
rnk accepted this revision.Aug 24 2015, 9:14 AM
rnk edited edge metadata.

This looks good.

As a larger issue, LLVM fast isel definitely won't pattern match this series of shifts and selects to rotl at -O0. There are some users who want branchless constant time rotates regardless of optimization level (https://llvm.org/bugs/show_bug.cgi?id=24226). My thinking is that the shifts are more analyzable to LLVM than an intrinsic, so we should leave these intrinsics alone and tell such users to use inline asm if they need these kinds of low-level guarantees. It still isn't very satisfactory. =/

test/CodeGen/x86-rot-intrinsics.c
9–11 ↗(On Diff #32925)

Any reason not to use -ffreestanding to deal with this on the Linux side of the test like we do for windows?

This revision is now accepted and ready to land.Aug 24 2015, 9:14 AM
silvas added a subscriber: silvas.Aug 24 2015, 11:01 PM
In D12271#231204, @rnk wrote:

This looks good.

As a larger issue, LLVM fast isel definitely won't pattern match this series of shifts and selects to rotl at -O0. There are some users who want branchless constant time rotates regardless of optimization level (https://llvm.org/bugs/show_bug.cgi?id=24226). My thinking is that the shifts are more analyzable to LLVM than an intrinsic, so we should leave these intrinsics alone and tell such users to use inline asm if they need these kinds of low-level guarantees. It still isn't very satisfactory. =/

I don't remember anything in PR24226 about "regardless of optimization level". In particular, code with "constant time" requirements won't tolerate -O0 codegen anyway because of the large amount of memory accesses into the stack -- the timing of such operations will depend on nasty things like cache line aliasing and such, which has similar problems to branches.

This revision was automatically updated to reflect the committed changes.