This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
45–1 | I think, that I fixed here a bug and removed AVX512. Could you, please, check? | |
test/CodeGen/X86/vector-sext.ll | ||
111 ↗ | (On Diff #25426) | I see this code as 2 pmovsxdq instructions and one shuffle between them. For windows, it looks like: pmovsxdq (%rcx), %xmm0 pmovsxdq 8(%rcx), %xmm1 retq For linux your parameter is in xmm, so you need one shuflle with <2, 3, undef, undef> |
Thanks Elena, I'll get extra AVX512 tests added for review.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
45–1 | Yes I'll add a proper AVX512 test. | |
test/CodeGen/X86/vector-sext.ll | ||
111 ↗ | (On Diff #25426) | I have some work in progress patches to improve pmovsx* support - the problem is that SIGN_EXTEND_INREG / SIGN_EXTEND_VECTOR_INREG is a mess and will take a while to sort out. Making this i64 SRA patch was an easy first step so at least we're not transferring between xmm and gprs so much. |
I think that you did not update the code before creating the patch.
See 235993.
- Elena
Dropped AVX512 support for i64 SRA in LowerScalarVariableShift as noted by Elena - added FIXME note.
You should generate another code for SSE 4.1 and AVX, According to my previous comment.
Thanks - I have better support for pmovsx* coming in an upcoming later patch - it will be much easier to implement once this patch is already done.
I disagree with this approach. You can prepare a separate patch for "shift-right-64bits" optimization.
But you should not use shift-right in SEXT and generate 10 instructions instead of 3.
I haven't intentionally added these shift-right in the sext tests - its just a result of the default sign-ext expansion from SIGN_EXTEND_INREG. I still think this is an improvement over where we are now but I'll push the pmovsx* patch up for review as soon as I can and revisit this afterwards.
Refreshed this patch now that the previous edge cases (sint_to_fp, and sext) have been dealt with properly.
I've updated the patch to work with Elena's 'SupporteVector' tests and moved the shift lowering code into LowerScalarImmediateShift directly.
test/CodeGen/X86/vector-shift-ashr-128.ll | ||
---|---|---|
988 | Hi Simon, I think that the result here will be incorrect. Let's take a positive 64-bit number (2^34)-1. After the arithmetic shift-right-7 you should receive Upper = (2^32)-1 |
test/CodeGen/X86/vector-shift-ashr-128.ll | ||
---|---|---|
988 | Hi Elena, taking your example (as a v1i64 for simplicity): in: 00000003ffffffff (ffffffff 00000003) as v2i32 Which I believe is correct, no? |
test/CodeGen/X86/vector-shift-ashr-128.ll | ||
---|---|---|
988 | yes, you are right. I checked again and I don't see any problem. In my opinion, you can commit this code. |
Hi Simon,
I think that the result here will be incorrect. Let's take a positive 64-bit number (2^34)-1. After the arithmetic shift-right-7 you should receive
(2^27)-1. But "vpsrad" will take the source as negative 32-bit and you'll see (2^32)-1 in %xmm1 and the correct result will be after "vpsrlq" in %xmm0.
Upper = (2^32)-1
Lower = (2^27)-1
Ex = DAG.getVectorShuffle(ExVT, dl, Upper, Lower, {4, 1, 6, 3}); <= the result is incorrect