
[X86] Lowering shift intrinsics to native IR
Needs ReviewPublic

Authored by tkrupa on May 16 2018, 7:47 AM.



Injecting equivalent IR in place of x86 intrinsic calls via InstCombineCalls.cpp.

Diff Detail

Event Timeline

tkrupa created this revision.May 16 2018, 7:47 AM
RKSimon added inline comments.May 18 2018, 5:33 AM

Where is the SSE codegen?

I fear that LICM is going to separate this pattern if it's used in a loop, and then isel won't be able to see the whole pattern.

Can you provide an example of such behavior? I tried various (rather simple) tests like this one:

__m128i foo(__m128i a, __m128i b, int x, int y) {
  __m128i tmp;
  for (int i = 0; i < y; i++) {
    x *= 20;
    tmp = _mm_sll_epi64(a, b);
  }
  return _mm_slli_epi64(tmp, x);
}

and for all of them the whole pattern gets hoisted out of the loop and folds without a problem.

tkrupa updated this revision to Diff 147745.May 21 2018, 2:37 AM

Added missing test checks.

Never mind, I got it wrong - after compiling something similar, it does split the pattern. Moreover, it splits it in a different manner for different intrinsics and hoists more than one instruction, so it might not be as simple as moving back that one subtraction in the bug reproducer. I don't think there is any other way, though - replicating simplifyX86immShift and simplifyX86varShift directly in IR would be a nightmare.

We already do a lot in InstCombineCalls to convert x86 vector shift calls to generic shifts (mainly for constants, valuetracking might allow us to do a bit more?), and as others have mentioned, splitting IR like this is likely to cause us a lot of problems.

It could be useful to allow the backend to remove out-of-bounds shift masks where the instruction supports implicit zeroing (e.g. to avoid sanitizer warnings).

RKSimon resigned from this revision.Feb 7 2021, 6:23 AM
Herald added a project: Restricted Project. Feb 7 2021, 6:23 AM
Herald added a subscriber: pengfei.