This is an archive of the discontinued LLVM Phabricator instance.

[ms] Add __shiftleft128 / __shiftright128 intrinsics
ClosedPublic

Authored by thakis on Jul 20 2018, 9:32 AM.

Details

Summary

Carefully match the pattern matched by ISel so that this produces shld / shrd (unless Subtarget->isSHLDSlow() is true).

Thanks to Craig Topper for providing the LLVM IR pattern that gets successfully matched.

Fixes PR37755.

Diff Detail

Event Timeline

thakis created this revision.Jul 20 2018, 9:32 AM

@spatel, should this ultimately use funnel shift?

Here are the IR patterns for this that work. Not sure if we can do this directly in C, we need a 128 bit type, but maybe we can emit it from CGBuiltin.cpp?

define i64 @__shiftleft128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = shl i128 %d, %e
  %g = lshr i128 %f, 64
  %h = trunc i128 %g to i64
  ret i64 %h
}

define i64 @__shiftright128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = lshr i128 %d, %e
  %g = trunc i128 %f to i64
  ret i64 %g
}

We have __int128. If you think hitting the pattern is preferable to inline asm, I can try to give that a try, either via C or via CGBuiltin.cpp.

I'd prefer the pattern over inline assembly. It'll give us more flexibility in the backend if we should be using some other instruction on different targets.

Here are the IR patterns for this that work. Not sure if we can do this directly in C, we need a 128 bit type, but maybe we can emit it from CGBuiltin.cpp?

define i64 @__shiftleft128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = shl i128 %d, %e
  %g = lshr i128 %f, 64
  %h = trunc i128 %g to i64
  ret i64 %h
}

define i64 @__shiftright128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = lshr i128 %d, %e
  %g = trunc i128 %f to i64
  ret i64 %g
}

I’m not at my dev machine, but this is exactly the definition of funnel shift, no? Unless that got reverted, adding/modifying clang builtins was the next step in the plan for those intrinsics. We probably need some backend work to match the variable shift version, but shift-by constant should already work.

thakis updated this revision to Diff 156596.Jul 20 2018, 1:33 PM
thakis edited the summary of this revision. (Show Details)

Now with C builtins which get nicely optimized.

@spatel, yes its exactly funnel shift. I wasn't sure if we were ready for clang to create it yet or not. Can we let this go as is and change it to funnel shift once we have the variable case fixed in the backend?

Isn't implementing this in plain old C the nicest approach anyhow, even once funnel shift exists?

The only weird thing that I can really think of with the C version is that the 'and' on the shift amount might get hoisted out of a loop and not get dropped during isel.

craig.topper accepted this revision.Jul 20 2018, 2:04 PM

LGTM. I'm inclined to let this go in now since we have a requested use for it. We can change it to funnel shift once we're confident in the backend.

This revision is now accepted and ready to land.Jul 20 2018, 2:04 PM

Isn't implementing this in plain old C the nicest approach anyhow, even once funnel shift exists?

No, the primary reason for creating the intrinsic is that we can’t guarantee that we’ll recognize the pattern as ‘shld’ or ‘rotate’ in this C/IR form. That’s because the pattern can get split across basic blocks, so the backend can’t easily recognize it. Also, IIUC this patch only deals with the specific case of 64-bit values, so we still don’t have the functionality for other widths?

I don’t object if this helps that one case, but we have made the investment in the IR intrinsics, so using them should be the general direction/goal.

thakis closed this revision.Jul 20 2018, 2:07 PM

r337619, thanks! The hoisting point is a good one; will rewrite using funnelshift once that's possible :-)