This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts
ClosedPublic

Authored by RKSimon on Apr 28 2016, 11:22 AM.

Details

Summary

Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).

If the shift amount is constant we can sometimes convert these instructions to native shifts:

1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the bitwidth (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.

In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 55448.Apr 28 2016, 11:22 AM
RKSimon retitled this revision from to [InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, ab, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
qcolombet added inline comments.May 16 2016, 10:48 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
379

return nullptr?

Also, shouldn’t the type of the argument of the intrinsic + the fact that CShift is constant make sure this could be an assert?

414

Could you rephrase? I have trouble parsing the sentence :P.

majnemer added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
339–354

Er, why do we even have these intrinsics?
They seem sorta pointless, it seems way better to get rid of these and replace their use in clang's headers with a generic shift which sanitizes the shift amount.

I tend to agree. Probably still need to make sure that passing too large of
an immediate to the software exposed intrinsics returns 0 instead of being
undefined behavior since that is the instruction spec.

RKSimon added inline comments.May 16 2016, 2:34 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
339–354

What kind of sanitization can we add that doesn't raise the risk of adding extra code in there for unknown shift amounts? We can't just use a mask - it has to be a full CMP/MAX style test afaict.

majnemer added inline comments.May 16 2016, 2:38 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
339–354

That should be rather easy to match against in the DAG, no?

We do this sort of sanitization today for other intrinsics:
https://github.com/llvm-mirror/clang/blob/master/lib/Headers/avxintrin.h#L1329

RKSimon updated this revision to Diff 57489.May 17 2016, 9:40 AM

Fixed Quentin's catch.

RKSimon added inline comments.May 17 2016, 9:44 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
339–354

This would work for the SSE2/AVX2 shift by immediate intrinsics (_mm_slli_epi32 etc.) but not for the AVX2 variable shifts being discussed here. I'm happy to deal with those in another patch.

majnemer added inline comments.May 17 2016, 9:52 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
362

auto *CShift

377–384

You can simplify this by handling the UndefValue case upfront:

if (isa<UndefValue>(COp)) {
  ShiftAmts.push_back(-1);
  continue;
}

auto *COp = dyn_cast_or_null<ConstantInt>(CShift->getAggregateElement(I));
if (!COp)
  return nullptr;
393

Why not just ShiftAmts.push_back(LogicalShift ? BitWidth : BitWidth - 1)?

402

Can you llvm::all_of?

404–411

Braces.

RKSimon updated this revision to Diff 58063.May 22 2016, 1:05 PM

Updated based on David's feedback

igorb added a subscriber: igorb.Jun 2 2016, 6:26 AM
qcolombet edited edge metadata.Jun 6 2016, 11:07 AM

Hi Simon,

LGTM.
David, anymore comments?

Cheers,
-Quentin

majnemer accepted this revision.Jun 6 2016, 11:14 AM
majnemer added a reviewer: majnemer.

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
377–378

getAggregateElement can return null.
perhaps:

if (CElt && isa<UndefValue(CElt))
This revision is now accepted and ready to land.Jun 6 2016, 11:14 AM
This revision was automatically updated to reflect the committed changes.