This is an archive of the discontinued LLVM Phabricator instance.

[X86]: Fix for PR27251
ClosedPublic

Authored by kbsmith1 on Apr 6 2016, 5:16 PM.

Download Raw Diff

Details

Reviewers

ab
DavidKreitzer

Commits

rL265690: [X86]: Fix for PR27251.

Summary

This fixes a bug introduced in r261024, and reported as PR27251.
Sometimes the operands need to be reversed in the newly generated instruction
depending on whether the negate pattern is on the true or false side of the select.

https://llvm.org/bugs/show_bug.cgi?id=27251

Diff Detail

Repository: rL LLVM

Event Timeline

kbsmith1 updated this revision to Diff 52874.Apr 6 2016, 5:16 PM

kbsmith1 retitled this revision from to [X86]: Fix for PR27251.

kbsmith1 updated this object.

kbsmith1 added reviewers: ab, DavidKreitzer.

kbsmith1 added a subscriber: llvm-commits.

ab added inline comments.Apr 6 2016, 5:41 PM

lib/Target/X86/X86ISelLowering.cpp
27268–27269 ↗	(On Diff #52874)	Could you perhaps elaborate on the logic here a bit? IIUC, we used to do: (vselect M, X, (sub 0, X)) -> (sub (xor X, M), M) But we should do: (sub M, (xor X, M)) Which works because -1 - ~X == X
27271–27272 ↗	(On Diff #52874)	How about: std::swap(SubOp1, SubOp2) which lets us get rid of SubOp2

Good catch! LGTM modulo nits.

This revision is now accepted and ready to land.Apr 6 2016, 5:42 PM

LGTM, Kevin.

test/CodeGen/X86/vector-blend.ll
1011 ↗	(On Diff #52874)	I know this isn't related to your change, but the redundant shifts here are pretty gross.

Addressed Ahmed's review comments.

Thank you for the quick review Ahmed. I addressed both your comments.

-----Original Message-----
From: David Kreitzer [mailto:david.l.kreitzer@intel.com]
Sent: Thursday, April 07, 2016 8:22 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>;
ahmed.bougacha@gmail.com; Kreitzer, David L <david.l.kreitzer@intel.com>
Cc: llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251

DavidKreitzer added a comment.

LGTM, Kevin.

Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1

; SSE2-NEXT: pxor %xmm1, %xmm0

I know this isn't related to your change, but the redundant shifts here are
pretty gross.

Why are the shifts redundant? One is a shift left by 31, and the other is a arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.

http://reviews.llvm.org/D18850

Closed by commit rL265690: [X86]: Fix for PR27251. (authored by kbsmith1). · Explain WhyApr 7 2016, 9:21 AM

This revision was automatically updated to reflect the committed changes.

Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1

; SSE2-NEXT: pxor %xmm1, %xmm0

I know this isn't related to your change, but the redundant shifts here are
pretty gross.

Why are the shifts redundant? One is a shift left by 31, and the other is a arithmetic shift right by 31.
This has the effect of propagating bit 0 through all 32 bits of the vector element.

For the pre-SSE41 cases its doing ashr( shl( lshr( v, 31 ), 31 ), 31) which should be combined to ashr( v, 31 ) - this isn't that difficult.

For the SSE41 cases we don't need the shifts at all as (v)blendvps will select elements based on the sign bit alone, but the vselect/blendv relationship is rather nasty (change in input type behaviour after legalization) and I can imagine a number of problems getting it to work cleanly - I've hit some of these before trying to do late constant folding of vselect.

-----Original Message-----
From: Simon Pilgrim [mailto:llvm-dev@redking.me.uk]
Sent: Thursday, April 07, 2016 11:03 AM
To: Smith, Kevin B <kevin.b.smith@intel.com>; Kreitzer, David L
<david.l.kreitzer@intel.com>; ahmed.bougacha@gmail.com
Cc: llvm-dev@redking.me.uk; llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251

RKSimon added a subscriber: RKSimon.
RKSimon added a comment.

Comment at: test/CodeGen/X86/vector-blend.ll:1011

@@ -1010,3 +1010,3 @@

; SSE2-NEXT: pslld $31, %xmm1

; SSE2-NEXT: psrad $31, %xmm1

; SSE2-NEXT: pxor %xmm1, %xmm0

I know this isn't related to your change, but the redundant shifts here are

pretty gross.

Why are the shifts redundant? One is a shift left by 31, and the other is a

arithmetic shift right by 31.

This has the effect of propagating bit 0 through all 32 bits of the vector

element.

For the pre-SSE41 cases its doing ashr( shl( lshr( v, 31 ), 31 ), 31) which
should be combined to ashr( v, 31 ) - this isn't that difficult.

For the SSE41 cases we don't need the shifts at all as (v)blendvps will select
elements based on the sign bit alone, but the vselect/blendv relationship is
rather nasty (change in input type behaviour after legalization) and I can
imagine a number of problems getting it to work cleanly - I've hit some of
these before trying to do late constant folding of vselect.

Thanks for further clarifying that. I had missed (because it wasn't in Dave's email, only in the test itself)
that the sequence started out with a psrld $31, %xmm1.

As Dave noted, that wasn't what this change-set was about, but the test simply happened to
show that redundancy.

Repository:
rL LLVM
http://reviews.llvm.org/D18850

Yes, thanks, Simon. You precisely captured my concern.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Target/

X86/

X86ISelLowering.cpp

21 lines

test/

CodeGen/

X86/

vector-blend.ll

6 lines

Diff 52934

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 27,332 Lines • ▼ Show 20 Lines	if (X.getValueType() == MaskVT && Y.getValueType() == MaskVT) {
SDValue V;		SDValue V;
if (IsNegV(Y.getNode(), X))		if (IsNegV(Y.getNode(), X))
V = X;		V = X;
else if (IsNegV(X.getNode(), Y))		else if (IsNegV(X.getNode(), Y))
V = Y;		V = Y;

if (V) {		if (V) {
assert(EltBits == 8 \|\| EltBits == 16 \|\| EltBits == 32);		assert(EltBits == 8 \|\| EltBits == 16 \|\| EltBits == 32);
return DAG.getBitcast(		SDValue SubOp1 = DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask);
VT, DAG.getNode(ISD::SUB, DL, MaskVT,		SDValue SubOp2 = Mask;
DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask), Mask));
		// If the negate was on the false side of the select, then
		// the operands of the SUB need to be swapped. PR 27251.
		// This is because the pattern being matched above is
		// (vselect M, (sub (0, X), X) -> (sub (xor X, M), M)
		// but if the pattern matched was
		// (vselect M, X, (sub (0, X))), that is really negation of the pattern
		// above, -(vselect M, (sub 0, X), X), and therefore the replacement
		// pattern also needs to be a negation of the replacement pattern above.
		// And -(sub X, Y) is just sub (Y, X), so swapping the operands of the
		// sub accomplishes the negation of the replacement pattern.
		if (V == Y)
		std::swap(SubOp1, SubOp2);

		return DAG.getBitcast(VT,
		DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2));
}		}
}		}

// PBLENDVB is only available on SSE 4.1.		// PBLENDVB is only available on SSE 4.1.
if (!Subtarget.hasSSE41())		if (!Subtarget.hasSSE41())
return SDValue();		return SDValue();

MVT BlendVT = (VT == MVT::v4i64) ? MVT::v32i8 : MVT::v16i8;		MVT BlendVT = (VT == MVT::v4i64) ? MVT::v32i8 : MVT::v16i8;
▲ Show 20 Lines • Show All 3,303 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/vector-blend.ll

	Show First 20 Lines • Show All 1,004 Lines • ▼ Show 20 Lines

	define <4 x i32> @blend_neg_logic_v4i32_2(<4 x i32> %v, <4 x i32> %c) {			define <4 x i32> @blend_neg_logic_v4i32_2(<4 x i32> %v, <4 x i32> %c) {
	; SSE2-LABEL: blend_neg_logic_v4i32_2:			; SSE2-LABEL: blend_neg_logic_v4i32_2:
	; SSE2: # BB#0: # %entry			; SSE2: # BB#0: # %entry
	; SSE2-NEXT: psrld $31, %xmm1			; SSE2-NEXT: psrld $31, %xmm1
	; SSE2-NEXT: pslld $31, %xmm1			; SSE2-NEXT: pslld $31, %xmm1
	; SSE2-NEXT: psrad $31, %xmm1			; SSE2-NEXT: psrad $31, %xmm1
	; SSE2-NEXT: pxor %xmm1, %xmm0			; SSE2-NEXT: pxor %xmm1, %xmm0
	; SSE2-NEXT: psubd %xmm1, %xmm0			; SSE2-NEXT: psubd %xmm0, %xmm1
				; SSE2-NEXT: movdqa %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSSE3-LABEL: blend_neg_logic_v4i32_2:			; SSSE3-LABEL: blend_neg_logic_v4i32_2:
	; SSSE3: # BB#0: # %entry			; SSSE3: # BB#0: # %entry
	; SSSE3-NEXT: psrld $31, %xmm1			; SSSE3-NEXT: psrld $31, %xmm1
	; SSSE3-NEXT: pslld $31, %xmm1			; SSSE3-NEXT: pslld $31, %xmm1
	; SSSE3-NEXT: psrad $31, %xmm1			; SSSE3-NEXT: psrad $31, %xmm1
	; SSSE3-NEXT: pxor %xmm1, %xmm0			; SSSE3-NEXT: pxor %xmm1, %xmm0
	; SSSE3-NEXT: psubd %xmm1, %xmm0			; SSSE3-NEXT: psubd %xmm0, %xmm1
				; SSSE3-NEXT: movdqa %xmm1, %xmm0
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: blend_neg_logic_v4i32_2:			; SSE41-LABEL: blend_neg_logic_v4i32_2:
	; SSE41: # BB#0: # %entry			; SSE41: # BB#0: # %entry
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psrld $31, %xmm1			; SSE41-NEXT: psrld $31, %xmm1
	; SSE41-NEXT: pslld $31, %xmm1			; SSE41-NEXT: pslld $31, %xmm1
	; SSE41-NEXT: pxor %xmm3, %xmm3			; SSE41-NEXT: pxor %xmm3, %xmm3
	Show All 21 Lines