This is an archive of the discontinued LLVM Phabricator instance.

[X86]: Fix for PR27251
ClosedPublic

Authored by kbsmith1 on Apr 6 2016, 5:16 PM.

Details

Summary

This fixes a bug introduced in r261024, and reported as PR27251.
Sometimes the operands need to be reversed in the newly generated instruction
depending on whether the negate pattern is on the true or false side of the select.

https://llvm.org/bugs/show_bug.cgi?id=27251

Diff Detail

Repository
rL LLVM

Event Timeline

kbsmith1 updated this revision to Diff 52874. Apr 6 2016, 5:16 PM
kbsmith1 retitled this revision from to [X86]: Fix for PR27251.
kbsmith1 updated this object.
kbsmith1 added reviewers: ab, DavidKreitzer.
kbsmith1 added a subscriber: llvm-commits.
ab added inline comments. Apr 6 2016, 5:41 PM
lib/Target/X86/X86ISelLowering.cpp
27268–27269 ↗(On Diff #52874)

Could you perhaps elaborate on the logic here a bit?

IIUC, we used to do:

(vselect M, X, (sub 0, X))
->
(sub (xor X, M), M)

But we should do:

(sub M, (xor X, M))

Which works because

-1 - ~X == X
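The identity behind the fix can be checked numerically. Below is a scalar Python sketch (not LLVM code; the 32-bit lane width and the test values are assumptions for illustration) showing that `(sub M, (xor X, M))` reproduces the vselect for both mask values, since an all-ones mask gives `-1 - ~X == X` and a zero mask gives `0 - X == -X`:

```python
# Check: select(M, X, -X) == M - (X ^ M) in 32-bit two's complement,
# where the mask M is all-ones (pick X) or all-zeros (pick -X).

MASK32 = 0xFFFFFFFF

def select(m, x):
    # The original vselect: all-ones mask picks X, zero mask picks (sub 0, X).
    return x if m == MASK32 else (-x) & MASK32

def transformed(m, x):
    # The corrected sequence: (sub M, (xor X, M)).
    return (m - (x ^ m)) & MASK32

for m in (0, MASK32):
    for x in (0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF):
        assert select(m, x) == transformed(m, x), (hex(m), hex(x))
print("identity holds")
```

With the operands in the other order, `(sub (xor X, M), M)`, the all-ones-mask case yields `~X + 1 == -X` instead of `X`, which is exactly the operand-reversal bug this patch fixes.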
27271–27272 ↗(On Diff #52874)

How about:

std::swap(SubOp1, SubOp2)

which lets us get rid of SubOp2

ab accepted this revision. Apr 6 2016, 5:42 PM
ab edited edge metadata.

Good catch! LGTM modulo nits.

This revision is now accepted and ready to land. Apr 6 2016, 5:42 PM
DavidKreitzer edited edge metadata. Apr 7 2016, 8:21 AM

LGTM, Kevin.

test/CodeGen/X86/vector-blend.ll
1011 ↗(On Diff #52874)

I know this isn't related to your change, but the redundant shifts here are pretty gross.

kbsmith1 updated this revision to Diff 52928. Apr 7 2016, 8:35 AM
kbsmith1 edited edge metadata.

Addressed Ahmed's review comments.

kbsmith1 marked 2 inline comments as done. Apr 7 2016, 8:37 AM

Thank you for the quick review Ahmed. I addressed both your comments.


Why are the shifts redundant? One is a shift left by 31, and the other is an arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.
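That propagation effect is easy to confirm in isolation. A scalar Python sketch (not the actual codegen; 32-bit lanes assumed) of what the pslld/psrad pair does to one element:

```python
MASK32 = 0xFFFFFFFF

def propagate_bit0(v):
    # pslld $31: move bit 0 of the element into the sign bit.
    v = (v << 31) & MASK32
    # psrad $31: arithmetic shift replicates the sign bit across all 32 bits.
    return MASK32 if v & 0x80000000 else 0

assert propagate_bit0(0b0) == 0
assert propagate_bit0(0b1) == MASK32
assert propagate_bit0(0xFFFFFFFE) == 0       # bit 0 clear -> all zeros
assert propagate_bit0(0x00000003) == MASK32  # bit 0 set -> all ones
```

So the pair turns a 0/1 boolean lane into an all-zeros/all-ones mask; it is only redundant in context, given what precedes it.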

http://reviews.llvm.org/D18850

This revision was automatically updated to reflect the committed changes.

RKSimon added a subscriber: RKSimon.
RKSimon added a comment.

For the pre-SSE41 cases it's doing ashr( shl( lshr( v, 31 ), 31 ), 31 ), which should be combined to ashr( v, 31 ) - this isn't that difficult.

For the SSE41 cases we don't need the shifts at all, as (v)blendvps will select elements based on the sign bit alone, but the vselect/blendv relationship is rather nasty (a change in input-type behaviour after legalization) and I can imagine a number of problems getting it to work cleanly - I've hit some of these before trying to do late constant folding of vselect.
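Simon's fold, that the three-shift sequence collapses to a single arithmetic shift, can also be checked on scalars. A Python sketch (32-bit lanes and test values are assumptions for illustration):

```python
MASK32 = 0xFFFFFFFF

def sar32(v, n):
    # 32-bit arithmetic shift right (sign-extending).
    if v & 0x80000000:
        v -= 1 << 32
    return (v >> n) & MASK32

def three_shifts(v):
    t = (v & MASK32) >> 31        # lshr(v, 31): isolate the sign bit in bit 0
    t = (t << 31) & MASK32        # shl(t, 31): move it back to the sign position
    return sar32(t, 31)           # ashr(t, 31): replicate it across the element

for v in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF):
    assert three_shifts(v) == sar32(v, 31), hex(v)
print("three shifts fold to one ashr")
```

Both forms produce an all-ones or all-zeros mask from the sign bit of v, which is why the single ashr( v, 31 ) suffices.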


Thanks for further clarifying that. I had missed (because it wasn't in Dave's email, only in the test itself)
that the sequence started out with a psrld $31, %xmm1.

As Dave noted, that wasn't what this change was about; the test simply happened to
show that redundancy.

Repository:

rL LLVM

http://reviews.llvm.org/D18850

Yes, thanks, Simon. You precisely captured my concern.