This fixes a bug introduced in r261024, and reported as PR27251.
Sometimes the operands need to be reversed in the newly generated instruction
depending on whether the negate pattern is on the true or false side of the select.
Details
- Reviewers
ab DavidKreitzer - Commits
- rL265690: [X86]: Fix for PR27251.
Diff Detail
Event Timeline
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
27268–27269 | Could you perhaps elaborate on the logic here a bit? IIUC, we used to do: (vselect M, X, (sub 0, X)) -> (sub (xor X, M), M) But we should do: (sub M, (xor X, M)) Which works because -1 - ~X == X | |
27271–27272 | How about: std::swap(SubOp1, SubOp2) which lets us get rid of SubOp2 |
LGTM, Kevin.
test/CodeGen/X86/vector-blend.ll | ||
---|---|---|
1011 | I know this isn't related to your change, but the redundant shifts here are pretty gross. |
-----Original Message-----
From: David Kreitzer [mailto:david.l.kreitzer@intel.com]
Sent: Thursday, April 07, 2016 8:22 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>;
ahmed.bougacha@gmail.com; Kreitzer, David L <david.l.kreitzer@intel.com>
Cc: llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251DavidKreitzer added a comment.
LGTM, Kevin.
Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1; SSE2-NEXT: pxor %xmm1, %xmm0
I know this isn't related to your change, but the redundant shifts here are
pretty gross.
Why are the shifts redundant? One is a shift left by 31, and the other is a arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.
Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1; SSE2-NEXT: pxor %xmm1, %xmm0
I know this isn't related to your change, but the redundant shifts here are
pretty gross.Why are the shifts redundant? One is a shift left by 31, and the other is a arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.
For the pre-SSE41 cases its doing ashr( shl( lshr( v, 31 ), 31 ), 31) which should be combined to ashr( v, 31 ) - this isn't that difficult.
For the SSE41 cases we don't need the shifts at all as (v)blendvps will select elements based on the sign bit alone, but the vselect/blendv relationship is rather nasty (change in input type behaviour after legalization) and I can imagine a number of problems getting it to work cleanly - I've hit some of these before trying to do late constant folding of vselect.
-----Original Message-----
From: Simon Pilgrim [mailto:llvm-dev@redking.me.uk]
Sent: Thursday, April 07, 2016 11:03 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>; Kreitzer, David L
<david.l.kreitzer@intel.com>; ahmed.bougacha@gmail.com
Cc: llvm-dev@redking.me.uk; llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251RKSimon added a subscriber: RKSimon.
RKSimon added a comment.Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1
; SSE2-NEXT: pxor %xmm1, %xmm0
I know this isn't related to your change, but the redundant shifts here are
pretty gross.
Why are the shifts redundant? One is a shift left by 31, and the other is a
arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector
element.
For the pre-SSE41 cases its doing ashr( shl( lshr( v, 31 ), 31 ), 31) which
should be combined to ashr( v, 31 ) - this isn't that difficult.For the SSE41 cases we don't need the shifts at all as (v)blendvps will select
elements based on the sign bit alone, but the vselect/blendv relationship is
rather nasty (change in input type behaviour after legalization) and I can
imagine a number of problems getting it to work cleanly - I've hit some of
these before trying to do late constant folding of vselect.
Thanks for further clarifying that. I had missed (because it wasn't in Dave's email, only in the test itself)
that the sequence started out with a psrld $31, %xmm1.
As Dave noted, that wasn't what this change-set was about, but the test simply happened to
show that redundancy.
Repository:
rL LLVM
Could you perhaps elaborate on the logic here a bit?
IIUC, we used to do:
But we should do:
Which works because