Page MenuHomePhabricator

[X86][SSE] Vectorized i64 uniform constant SRA shifts
ClosedPublic

Authored by RKSimon on May 10 2015, 4:47 AM.

Details

Summary

This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 25426.May 10 2015, 4:47 AM
RKSimon retitled this revision from to [X86][SSE] Vectorized i64 uniform constant SRA shifts.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: andreadb, delena, qcolombet.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
delena added inline comments.May 10 2015, 6:50 AM
lib/Target/X86/X86ISelLowering.cpp
16559–16561

I think, that I fixed here a bug and removed AVX512. Could you, please, check?

test/CodeGen/X86/vector-sext.ll
111–120

I see this code as 2 pmovsxdq instructions and one shuffle between them. For windows, it looks like:

pmovsxdq        (%rcx), %xmm0
pmovsxdq        8(%rcx), %xmm1
retq

For linux your parameter is in xmm, so you need one shuflle with <2, 3, undef, undef>

Thanks Elena, I'll get extra AVX512 tests added for review.

lib/Target/X86/X86ISelLowering.cpp
16559–16561

Yes I'll add a proper AVX512 test.

test/CodeGen/X86/vector-sext.ll
111–120

I have some work in progress patches to improve pmovsx* support - the problem is that SIGN_EXTEND_INREG / SIGN_EXTEND_VECTOR_INREG is a mess and will take a while to sort out. Making this i64 SRA patch was an easy first step so at least we're not transferring between xmm and gprs so much.

delena edited edge metadata.May 10 2015, 7:13 AM

I think that you did not update the code before creating the patch.
See 235993.

  • Elena
RKSimon updated this revision to Diff 25433.May 10 2015, 10:17 AM
RKSimon edited edge metadata.

Dropped AVX512 support for i64 SRA in LowerScalarVariableShift as noted by Elena - added FIXME note.

You should generate another code for SSE 4.1 and AVX, According to my previous comment.

You should generate another code for SSE 4.1 and AVX, According to my previous comment.

Thanks - I have better support for pmovsx* coming in an upcoming later patch - it will be much easier to implement once this patch is already done.

I disagree with this approach. You can prepare a separate patch for "shift-right-64bits" optimization.
But you should not use shift-right in SEXT and generate 10 instructions instead of 3.

I disagree with this approach. You can prepare a separate patch for "shift-right-64bits" optimization.
But you should not use shift-right in SEXT and generate 10 instructions instead of 3.

I haven't intentionally added these shift-right in the sext tests - its just a result of the default sign-ext expansion from SIGN_EXTEND_INREG. I still think this is an improvement over where we are now but I'll push the pmovsx* patch up for review as soon as I can and revisit this afterwards.

RKSimon updated this revision to Diff 29008.Jul 3 2015, 3:38 AM
RKSimon updated this object.

Refreshed this patch now that the previous edge cases (sint_to_fp, and sext) have been dealt with properly.

I've updated the patch to work with Elena's 'SupporteVector' tests and moved the shift lowering code into LowerScalarImmediateShift directly.

delena added inline comments.Jul 4 2015, 11:27 PM
test/CodeGen/X86/vector-shift-ashr-128.ll
988 ↗(On Diff #29008)

Hi Simon,

I think that the result here will be incorrect. Let's take a positive 64-bit number (2^34)-1. After the arithmetic shift-right-7 you should receive
(2^27)-1. But "vpsrad" will take the source as negative 32-bit and you'll see (2^32)-1 in %xmm1 and the correct result will be after "vpsrlq" in %xmm0.

Upper = (2^32)-1
Lower = (2^27)-1
Ex = DAG.getVectorShuffle(ExVT, dl, Upper, Lower, {4, 1, 6, 3}); <= the result is incorrect

RKSimon added inline comments.Jul 5 2015, 7:56 AM
test/CodeGen/X86/vector-shift-ashr-128.ll
988 ↗(On Diff #29008)

Hi Elena, taking your example (as a v1i64 for simplicity):

in: 00000003ffffffff (ffffffff 00000003) as v2i32
upper: ashr32 (in, 7): 00000000ffffffff (ffffffff 00000000) 2nd 32-bit lane used
lower: lshr64(in, 7): 0000000007ffffff (07ffffff 00000000) 1st 32-bit lane used
shuffle32(ashr32, lshr64, 4,1,3,2): 0000000007ffffff (07ffffff 00000000)

Which I believe is correct, no?

delena added inline comments.Jul 5 2015, 11:55 PM
test/CodeGen/X86/vector-shift-ashr-128.ll
988 ↗(On Diff #29008)

yes, you are right. I checked again and I don't see any problem. In my opinion, you can commit this code.

This revision was automatically updated to reflect the committed changes.

Thanks Elena