This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
ClosedPublic

Authored by craig.topper on Nov 6 2016, 9:39 AM.

Details

Summary

This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Diff Detail

Event Timeline

craig.topper retitled this revision from to [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM..
craig.topper updated this object.
craig.topper added reviewers: RKSimon, delena, zvi.
craig.topper updated this object.
craig.topper added a subscriber: llvm-commits.
RKSimon edited edge metadata.Nov 7 2016, 10:56 AM

Are we testing for the lowering to mask/maskz with the separate select instruction anywhere?

No., I can add it. Or it will be implicitly tested once I autoupgrade the original intrinsics to these intrinsics plus select.

No., I can add it. Or it will be implicitly tested once I autoupgrade the original intrinsics to these intrinsics plus select.

Adding an explicit test now would be better I think - which just made me realise if/when we do start removing upgrades we might be in danger of losing such test coverage if we're not careful!

craig.topper edited edge metadata.

Add tests for masking the new intrinsics.

delena edited edge metadata.Nov 10 2016, 12:21 AM

Do you mean InstCombine on IR-2-IR or DAG combine?

RKSimon accepted this revision.Nov 10 2016, 5:57 AM
RKSimon edited edge metadata.

LGTM

Do you mean InstCombine on IR-2-IR or DAG combine?

InstCombineCalls.cpp has code to convert SSE/AVX2 vector shifts to generic shifts if we can guarantee that the shift values are in range. With this change Craig will be able to add support for the AVX512 equivalents without having to add mask/maskz support.

This revision is now accepted and ready to land.Nov 10 2016, 5:57 AM
delena requested changes to this revision.Nov 10 2016, 6:52 AM
delena edited edge metadata.

Please wait! I disagree with adding a bunch of unmasked intrinsics additionally to the masked. If you want to create IR in InstCombineCalls, you can do this for masked intrinsics as well. I'm afraid that we will need to duplicate hundreds of intrinsics. I want to ask Intel guys opinion before commit.

This revision now requires changes to proceed.Nov 10 2016, 6:52 AM

Elena, this the first patch of a four step plan.

  1. Add new unmasked intrinsics
  2. Add support for new unmasked intrinsics to InstCombineCalls
  3. Switch clang to new unmasked intrinsics
  4. Remove masked intrinsics and auto upgrade them to the unmasked instrinsic plus select.

So in the end there wont' be duplicate intrinsics. But there will be quite a bit of autoupgrade cases.

zvi edited edge metadata.Nov 10 2016, 11:36 AM

Is the reason for the temporary duplication the need to make changes in two repositories, LLVM and Clang? If yes, Elena, how can this be done differently?

We could certainly teach InstCombine to handle the masking for the existing intrinsics. I was just going for consistency and not spreading understanding of masking IR to another place.

One concern i have about masking in general is that for a lot of legacy instructions we have have unmasked builtins and I've been wrapping them with selects in IR in the frontend. So the middle end optimizers see the selects and can maybe optimize them through constant folding and the like.. But for 512-bit intrinsics we don't have the selects in IR and instead we have instrinsics that don't expose the same ability to the middle end optimizers.

Right now I just want to replace this subset. I'll have to review more of InstCombineCalls for other cases.

delena accepted this revision.Nov 11 2016, 12:04 AM
delena edited edge metadata.
This revision is now accepted and ready to land.Nov 11 2016, 12:04 AM
This revision was automatically updated to reflect the committed changes.