This is an archive of the discontinued LLVM Phabricator instance.

[X86]: Fix for PR27251
ClosedPublic

Authored by kbsmith1 on Apr 6 2016, 5:16 PM.

Details

Reviewers

ab
DavidKreitzer

Commits

rL265690: [X86]: Fix for PR27251.

Summary

This fixes a bug introduced in r261024, and reported as PR27251.
Sometimes the operands need to be reversed in the newly generated instruction
depending on whether the negate pattern is on the true or false side of the select.

https://llvm.org/bugs/show_bug.cgi?id=27251

Event Timeline

kbsmith1 updated this revision to Diff 52874. Apr 6 2016, 5:16 PM

kbsmith1 retitled this revision from to [X86]: Fix for PR27251.

kbsmith1 updated this object.

kbsmith1 added reviewers: ab, DavidKreitzer.

kbsmith1 added a subscriber: llvm-commits.

ab added inline comments. Apr 6 2016, 5:41 PM

lib/Target/X86/X86ISelLowering.cpp
27268–27269	Could you perhaps elaborate on the logic here a bit? IIUC, we used to do:
    (vselect M, X, (sub 0, X)) -> (sub (xor X, M), M)
But we should do:
    (sub M, (xor X, M))
Which works because
    -1 - ~X == X
27271–27272	How about: std::swap(SubOp1, SubOp2) which lets us get rid of SubOp2

Good catch! LGTM modulo nits.
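
The identity in the inline comment above can be checked with a small scalar model. This is only a sketch: one 32-bit lane stands in for a vector element, unsigned wraparound stands in for two's-complement arithmetic, and the helper names (`old_combine`, `new_combine`, `neg`) are made up for illustration and do not appear in X86ISelLowering.cpp.

```cpp
#include <cstdint>

// Hypothetical one-lane model: the mask M is either all zeros or all ones.
constexpr std::uint32_t kAllOnes = 0xFFFFFFFFu;

// (sub 0, X), using well-defined unsigned wraparound.
constexpr std::uint32_t neg(std::uint32_t x) { return 0u - x; }

// What the combine used to emit: (sub (xor X, M), M).
constexpr std::uint32_t old_combine(std::uint32_t x, std::uint32_t m) {
  return (x ^ m) - m;
}

// What the review says it should emit for (vselect M, X, (sub 0, X)):
// (sub M, (xor X, M)).
constexpr std::uint32_t new_combine(std::uint32_t x, std::uint32_t m) {
  return m - (x ^ m);
}

// -1 - ~X == X
static_assert(kAllOnes - ~std::uint32_t{5} == 5u, "identity from the review");

// Mask set: the select should yield X. The new form does, the old form negates.
static_assert(new_combine(5u, kAllOnes) == 5u, "new form keeps X");
static_assert(old_combine(5u, kAllOnes) == neg(5u), "old form negates X");

// Mask clear: the select should yield -X.
static_assert(new_combine(5u, 0u) == neg(5u), "new form negates X");
static_assert(old_combine(5u, 0u) == 5u, "old form keeps X");
```

In this model the old form gives the opposite sign in both mask states, which is consistent with the summary's description of the bug.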

This revision is now accepted and ready to land. Apr 6 2016, 5:42 PM

LGTM, Kevin.

test/CodeGen/X86/vector-blend.ll
1011	I know this isn't related to your change, but the redundant shifts here are pretty gross.

Addressed Ahmed's review comments.

Thank you for the quick review Ahmed. I addressed both your comments.

-----Original Message-----
From: David Kreitzer [mailto:david.l.kreitzer@intel.com]
Sent: Thursday, April 07, 2016 8:22 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>;
ahmed.bougacha@gmail.com; Kreitzer, David L <david.l.kreitzer@intel.com>
Cc: llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251

DavidKreitzer added a comment.

LGTM, Kevin.

Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1

; SSE2-NEXT: pxor %xmm1, %xmm0

I know this isn't related to your change, but the redundant shifts here are
pretty gross.

Why are the shifts redundant? One is a shift left by 31, and the other is an arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.

http://reviews.llvm.org/D18850

Closed by commit rL265690: [X86]: Fix for PR27251. (authored by kbsmith1). Apr 7 2016, 9:21 AM

This revision was automatically updated to reflect the committed changes.

For the pre-SSE41 cases it's doing ashr( shl( lshr( v, 31 ), 31 ), 31) which should be combined to ashr( v, 31 ) - this isn't that difficult.
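
A scalar sketch of that shift argument, for one 32-bit lane. The helper names are hypothetical, and `ashr31` is written as an explicit sign-splat so the example does not depend on how a given C++ standard treats signed right shifts.

```cpp
#include <cstdint>

constexpr std::uint32_t lshr31(std::uint32_t x) { return x >> 31; }  // psrld $31
constexpr std::uint32_t shl31(std::uint32_t x) { return x << 31; }   // pslld $31
constexpr std::uint32_t ashr31(std::uint32_t x) {                    // psrad $31
  return (x & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

// The sequence that shows up in the test output.
constexpr std::uint32_t three_shifts(std::uint32_t v) {
  return ashr31(shl31(lshr31(v)));
}

// Compare the three-shift sequence against a single ashr on a few samples.
constexpr bool collapses_on_samples() {
  const std::uint32_t samples[] = {0u,          1u,          0x7FFFFFFFu,
                                   0x80000000u, 0x80000001u, 0xFFFFFFFFu};
  for (std::uint32_t v : samples)
    if (three_shifts(v) != ashr31(v)) return false;
  return true;
}

// On their own, shl + ashr splat bit 0 across the lane, so they do real work.
static_assert(ashr31(shl31(1u)) == 0xFFFFFFFFu, "bit 0 set");
static_assert(ashr31(shl31(2u)) == 0u, "bit 0 clear");

// After a leading lshr by 31, the whole sequence is just ashr(v, 31).
static_assert(collapses_on_samples(), "three shifts equal one ashr");
```

The leading `lshr` moves the sign bit into bit 0, the `shl` moves it back to bit 31, and the final `ashr` splats it, which is what a single `ashr` of the original value does.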

For the SSE41 cases we don't need the shifts at all as (v)blendvps will select elements based on the sign bit alone, but the vselect/blendv relationship is rather nasty (change in input type behaviour after legalization) and I can imagine a number of problems getting it to work cleanly - I've hit some of these before trying to do late constant folding of vselect.
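
The sign-bit point can be illustrated with a one-lane model of the blend. This is a sketch of the instruction's per-element selection only; it says nothing about the vselect legalization problems mentioned above, and `blendv_lane` is a hypothetical helper name.

```cpp
#include <cstdint>

// One 32-bit lane of (v)blendvps: the sign bit of the mask lane picks the
// second source; every other mask bit is ignored.
constexpr std::uint32_t blendv_lane(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t mask) {
  return (mask & 0x80000000u) ? b : a;
}

// Sign-splat of one lane, i.e. what psrad $31 computes.
constexpr std::uint32_t ashr31(std::uint32_t x) {
  return (x & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

// Feeding the raw mask or the sign-splatted mask gives the same blend result.
static_assert(blendv_lane(1u, 2u, 0x80000001u) ==
                  blendv_lane(1u, 2u, ashr31(0x80000001u)),
              "sign set");
static_assert(blendv_lane(1u, 2u, 0x7FFFFFFFu) ==
                  blendv_lane(1u, 2u, ashr31(0x7FFFFFFFu)),
              "sign clear");
```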

-----Original Message-----
From: Simon Pilgrim [mailto:llvm-dev@redking.me.uk]
Sent: Thursday, April 07, 2016 11:03 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>; Kreitzer, David L
<david.l.kreitzer@intel.com>; ahmed.bougacha@gmail.com
Cc: llvm-dev@redking.me.uk; llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251

RKSimon added a subscriber: RKSimon.
RKSimon added a comment.

Thanks for further clarifying that. I had missed (because it wasn't in Dave's email, only in the test itself)
that the sequence started out with a psrld $31, %xmm1.

As Dave noted, that wasn't what this change-set was about, but the test simply happened to
show that redundancy.

Repository:
rL LLVM
http://reviews.llvm.org/D18850

Yes, thanks, Simon. You precisely captured my concern.

Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp	21 lines
test/CodeGen/X86/vector-blend.ll	6 lines

Diff 52928

lib/Target/X86/X86ISelLowering.cpp

@@ if (X.getValueType() == MaskVT && Y.getValueType() == MaskVT) {
     SDValue V;
     if (IsNegV(Y.getNode(), X))
       V = X;
     else if (IsNegV(X.getNode(), Y))
       V = Y;
 
     if (V) {
       assert(EltBits == 8 || EltBits == 16 || EltBits == 32);
-      return DAG.getBitcast(
-          VT, DAG.getNode(ISD::SUB, DL, MaskVT,
-                          DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask), Mask));
+      SDValue SubOp1 = DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask);
+      SDValue SubOp2 = Mask;
+
+      // If the negate was on the false side of the select, then
+      // the operands of the SUB need to be swapped. PR 27251.
+      // This is because the pattern being matched above is
+      // (vselect M, (sub (0, X), X) -> (sub (xor X, M), M)
+      // but if the pattern matched was
+      // (vselect M, X, (sub (0, X))), that is really negation of the pattern
+      // above, -(vselect M, (sub 0, X), X), and therefore the replacement
+      // pattern also needs to be a negation of the replacement pattern above.
+      // And -(sub X, Y) is just sub (Y, X), so swapping the operands of the
+      // sub accomplishes the negation of the replacement pattern.
+      if (V == Y)
+        std::swap(SubOp1, SubOp2);
+
+      return DAG.getBitcast(VT,
+                            DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2));
     }
   }
 
   // PBLENDVB is only available on SSE 4.1.
   if (!Subtarget.hasSSE41())
     return SDValue();
 
   MVT BlendVT = (VT == MVT::v4i64) ? MVT::v32i8 : MVT::v16i8;
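
The operand swap described in the code comment can be modelled per lane. What follows is a sketch under stated assumptions, not the DAG combine itself: the mask is taken to be all zeros or all ones, `negate_on_false_side` is a hypothetical flag standing in for the patch's `V == Y` test, and `reference_lane` and `patched_lane` are illustrative names.

```cpp
#include <cstdint>

constexpr std::uint32_t kAllOnes = 0xFFFFFFFFu;

// Reference semantics for one lane.
//   negate_on_false_side == false: (vselect M, (sub 0, X), X)
//   negate_on_false_side == true:  (vselect M, X, (sub 0, X))
constexpr std::uint32_t reference_lane(std::uint32_t x, std::uint32_t m,
                                       bool negate_on_false_side) {
  const bool mask_set = (m == kAllOnes);
  const std::uint32_t true_val = negate_on_false_side ? x : 0u - x;
  const std::uint32_t false_val = negate_on_false_side ? 0u - x : x;
  return mask_set ? true_val : false_val;
}

// Mirrors the shape of the patched code: build both SUB operands, then swap
// them when the negate sits on the false side.
constexpr std::uint32_t patched_lane(std::uint32_t x, std::uint32_t m,
                                     bool negate_on_false_side) {
  std::uint32_t sub_op1 = x ^ m;
  std::uint32_t sub_op2 = m;
  if (negate_on_false_side) {
    // Manual swap, because std::swap is only constexpr from C++20.
    const std::uint32_t tmp = sub_op1;
    sub_op1 = sub_op2;
    sub_op2 = tmp;
  }
  return sub_op1 - sub_op2;
}

// Check both patterns and both mask states on a few sample values.
constexpr bool agrees_on_samples() {
  const std::uint32_t samples[] = {0u, 1u, 5u, 0x7FFFFFFFu, 0x80000000u, kAllOnes};
  const std::uint32_t masks[] = {0u, kAllOnes};
  const bool sides[] = {false, true};
  for (std::uint32_t x : samples)
    for (std::uint32_t m : masks)
      for (bool side : sides)
        if (patched_lane(x, m, side) != reference_lane(x, m, side)) return false;
  return true;
}

static_assert(agrees_on_samples(), "swap matches both select patterns");
```

Because `a - b` and `b - a` are negations of each other under wraparound arithmetic, swapping the operands is enough to cover the second pattern, which is the argument the code comment makes.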

test/CodeGen/X86/vector-blend.ll

@@ entry:
 ret <8 x i32> %cond
 }
 
 define <4 x i32> @blend_neg_logic_v4i32_2(<4 x i32> %v, <4 x i32> %c) {
 ; SSE2-LABEL: blend_neg_logic_v4i32_2:
 ; SSE2: # BB#0: # %entry
 ; SSE2-NEXT: psrld $31, %xmm1
 ; SSE2-NEXT: pslld $31, %xmm1
 ; SSE2-NEXT: psrad $31, %xmm1
 ; SSE2-NEXT: pxor %xmm1, %xmm0
-; SSE2-NEXT: psubd %xmm1, %xmm0
+; SSE2-NEXT: psubd %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; SSSE3-LABEL: blend_neg_logic_v4i32_2:
 ; SSSE3: # BB#0: # %entry
 ; SSSE3-NEXT: psrld $31, %xmm1
 ; SSSE3-NEXT: pslld $31, %xmm1
 ; SSSE3-NEXT: psrad $31, %xmm1
 ; SSSE3-NEXT: pxor %xmm1, %xmm0
-; SSSE3-NEXT: psubd %xmm1, %xmm0
+; SSSE3-NEXT: psubd %xmm0, %xmm1
+; SSSE3-NEXT: movdqa %xmm1, %xmm0
 ; SSSE3-NEXT: retq
 ;
 ; SSE41-LABEL: blend_neg_logic_v4i32_2:
 ; SSE41: # BB#0: # %entry
 ; SSE41-NEXT: movdqa %xmm0, %xmm2
 ; SSE41-NEXT: psrld $31, %xmm1
 ; SSE41-NEXT: pslld $31, %xmm1
 ; SSE41-NEXT: pxor %xmm3, %xmm3
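
To see what the changed CHECK lines compute, the SSE2 sequence can be emulated one lane at a time. This is a rough sketch: it assumes AT&T operand order (source first, destination second), treats `%xmm0` as `v` and `%xmm1` as `c` per the function signature, and uses hypothetical helper names.

```cpp
#include <cstdint>

// psrld $31; pslld $31; psrad $31 on one lane of %xmm1: the net effect is an
// all-ones lane when the sign bit of c is set and an all-zeros lane otherwise.
constexpr std::uint32_t mask_lane(std::uint32_t c) {
  return (((c >> 31) << 31) & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

// Old sequence: pxor %xmm1, %xmm0 ; psubd %xmm1, %xmm0
//   xmm0 = (v ^ m) - m
constexpr std::uint32_t old_sequence(std::uint32_t v, std::uint32_t c) {
  return (v ^ mask_lane(c)) - mask_lane(c);
}

// New sequence: pxor %xmm1, %xmm0 ; psubd %xmm0, %xmm1 ; movdqa %xmm1, %xmm0
//   xmm0 = m - (v ^ m)
constexpr std::uint32_t new_sequence(std::uint32_t v, std::uint32_t c) {
  return mask_lane(c) - (v ^ mask_lane(c));
}

// Sign bit of c set: the new code keeps v, the old code negated it.
static_assert(new_sequence(7u, 0x80000000u) == 7u, "new keeps v");
static_assert(old_sequence(7u, 0x80000000u) == 0u - 7u, "old negated v");

// Sign bit of c clear: the new code negates v, the old code kept it.
static_assert(new_sequence(7u, 0u) == 0u - 7u, "new negates v");
static_assert(old_sequence(7u, 0u) == 7u, "old kept v");
```

The extra `movdqa` in the new output exists only because the subtraction result now lands in `%xmm1` and has to be copied into the return register.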