This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix SSE2/AVX2 vector shift by constant
ClosedPublic

Authored by RKSimon on Aug 5 2015, 3:55 AM.

Details

Summary

This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.

In addition, this review adds support for the SSE2/AVX2 ashr shift-by-constant and also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821). I can commit these changes separately if necessary.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 31342.Aug 5 2015, 3:55 AM
RKSimon retitled this revision from to [InstCombine] Fix SSE2/AVX2 vector shift by constant.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, mkuper, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
andreadb accepted this revision.Aug 5 2015, 5:46 AM
andreadb edited edge metadata.

Hi Simon,

As pointed out in the comments below, I suggest to split this patch into two separate patches.
I'd like you to move the code that "combines" packed arithmetic shifts on a separate patch. That patch will also have to remove the target specific DAG combiner rules in the x86 backend (your patch will make those rules redundant).

It is okay in my opinion if this patch also fixes PR23821 (that fix is very small and probably makes sense to just have it in thius patch..) .

If you address the (minor) comments below, then the patch LGTM.

Thanks Simon!

lib/Transforms/InstCombine/InstCombineCalls.cpp
213 ↗(On Diff #31342)

I would probably be more specific and explicitly quote the architecture manual which says: "only the first 64-bits of a 128-bit count operand are checked to compute the count".
But it is up to you, That comment is probably already good enough :-).

787–799 ↗(On Diff #31342)

Can this be committed in a separate patch?
For simplicity I would prefer if you just fix the problem with the shift count on this patch.

You can add rules for combining arithmetic shifts on a next patch. That new patch will also have to get rid of the (horrible) target specific DAG combine rules on psra(i) intrinsics that we currently run on x86 as part of 'PerformINTRINSIC_WO_CHAINCombine'.

test/Transforms/InstCombine/x86-vector-shifts.ll
9–10 ↗(On Diff #31342)

We should check that we don't have any instruction before the return.
We want to make sure that a shift-by-zero is folded away. In this case you can check that no shift is generated before the return statement (and that the tail call to the intrinsic function is no longer in the code).

You should do the same for all the other tests that check the instcombine behavior for shift-by-zero.

This revision is now accepted and ready to land.Aug 5 2015, 5:46 AM
RKSimon updated this revision to Diff 31350.Aug 5 2015, 7:25 AM
RKSimon edited edge metadata.

Updated patch. I'll submit the ASHR fix as a separate review.

LGTM. Thanks!

This revision was automatically updated to reflect the committed changes.