This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.
ClosedPublic

Authored by RKSimon on May 7 2017, 11:41 AM.


Details

Reviewers

ab
craig.topper
delena
spatel
andreadb
filcab

Commits

rGdf39b03f29e3: [X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.
rL302424: [X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.

Summary

Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask; this patch generalises it to use computeNumSignBits instead.
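As a rough illustration of why a sign-bit count is a more general test than matching one specific shift, the sketch below works on concrete scalar values. It is not LLVM code: the real query is a conservative static analysis over DAG nodes, and the names numSignBits and isSignSplatMask are hypothetical helpers invented for this example. The idea it shows is that a lane is all zeros or all ones exactly when its number of sign bits equals its width, however the value was produced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: count how many of the top bits of v are copies of
// its sign bit (the sign bit itself included), so the result is in [1, 32].
unsigned numSignBits(int32_t v) {
  const uint32_t u = static_cast<uint32_t>(v);
  const uint32_t sign = u >> 31;
  unsigned n = 1;
  for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Hypothetical helper: every lane is all zeros or all ones exactly when
// each lane's sign-bit count equals the lane width (32 here).
bool isSignSplatMask(const std::vector<int32_t> &lanes) {
  assert(!lanes.empty() && "expected at least one lane");
  for (int32_t lane : lanes)
    if (numSignBits(lane) != 32)
      return false;
  return true;
}
```

A mask built from a compare result or from an arithmetic shift by width minus one passes this check in the same way, which is the generality the patch is after.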

This is a first step in several things we can do to improve PBLENDV support:

Better matching of X86ISD::ANDNP patterns.
Handle floating point cases.
Better vector and bitcast support in computeNumSignBits.
Recognise that PBLENDV only uses the sign bit of the mask, so we should be able to strip away sign splats (ASHR, PCMPGT isNeg tests etc.).
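The last item can be made concrete with a small scalar model. This is only a sketch under stated assumptions: blendBySignBit and blendByLogic are hypothetical names, and the byte vectors stand in for vector registers. It contrasts a blend that looks only at the top bit of each mask byte with the AND/ANDN/OR form that the combine matches. The two agree when every mask byte is 0x00 or 0xFF, which is why the combine needs a splatted mask, while the blend itself would accept any mask with the right sign bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Scalar model of a byte-wise variable blend: only the top bit of each
// mask byte decides which input the result byte is taken from.
Bytes blendBySignBit(const Bytes &mask, const Bytes &ifSet,
                     const Bytes &ifClear) {
  assert(mask.size() == ifSet.size() && mask.size() == ifClear.size());
  Bytes out(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    out[i] = (mask[i] & 0x80) ? ifSet[i] : ifClear[i];
  return out;
}

// The logic form: (mask & ifSet) | (~mask & ifClear), bit by bit.
Bytes blendByLogic(const Bytes &mask, const Bytes &ifSet,
                   const Bytes &ifClear) {
  assert(mask.size() == ifSet.size() && mask.size() == ifClear.size());
  Bytes out(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    out[i] = static_cast<uint8_t>((mask[i] & ifSet[i]) |
                                  (~mask[i] & ifClear[i]));
  return out;
}
```

With a mask byte such as 0x80 the two functions diverge: the sign-bit blend still selects the whole ifSet byte, whereas the logic form mixes bits from both inputs.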

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. May 7 2017, 11:41 AM

delena added inline comments. May 7 2017, 11:42 PM

lib/Target/X86/X86ISelLowering.cpp
31566 ↗	(On Diff #98104)	Why should VT be v2i64 or v4i64? Isn't the same transformation profitable for other integer types?

RKSimon added inline comments. May 8 2017, 4:05 AM

lib/Target/X86/X86ISelLowering.cpp
31566 ↗	(On Diff #98104)	It's due to us promoting to v2i64/v4i64 for X86ISD::ANDNP - it's not actually necessary but seems to have been used as an early out. I'll remove it - it won't make any difference to the codegen now but will make it easier to perform the more thorough ANDNP matching mentioned in the TODO.

Addressed Elena's comments.

delena added inline comments. May 8 2017, 4:51 AM

test/CodeGen/X86/pr32907.ll
26 ↗	(On Diff #98152)	I assume that ComputeNumSignBits() does not recognize the ASHR sequence here, otherwise it would be able to generate VPBLEND, right?

RKSimon added inline comments. May 8 2017, 5:32 AM

test/CodeGen/X86/pr32907.ll
26 ↗	(On Diff #98152)	Exactly - this is an example of one of the TODOs - the ANDNP is being generated at the same time that the ASHR_v2i64 is being lowered into the PSRAD+PSHUFD. If we could recognise the AND(XOR(-1,M), X) pattern earlier it would combine. Note it wouldn't generate a VPBLEND, it would generate the SUB(XOR(X, M), M) pattern similar to the AVX512 codegen.
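For readers unfamiliar with the SUB(XOR(X, M), M) form, here is a minimal scalar sketch of the identity, assuming M is either all zeros or all ones. The function names are hypothetical, and unsigned arithmetic is used so that wraparound is well defined. With M all ones, X ^ M is ~X and subtracting M (that is, -1) adds 1, giving -X; with M zero both operations leave X unchanged. Swapping the SUB operands, as the patch does when the negate sits on the false side of the select, negates the whole result.

```cpp
#include <cassert>
#include <cstdint>

// Computes (m ? -x : x) for m == 0 or m == all ones, as sub(xor(x, m), m).
uint64_t negateIfMasked(uint64_t x, uint64_t m) {
  assert((m == 0 || m == ~uint64_t{0}) && "mask must be 0 or all ones");
  return (x ^ m) - m;
}

// Operands of the subtraction swapped: computes (m ? x : -x).
uint64_t negateIfNotMasked(uint64_t x, uint64_t m) {
  assert((m == 0 || m == ~uint64_t{0}) && "mask must be 0 or all ones");
  return m - (x ^ m);
}

// Straightforward select-based reference for comparison.
uint64_t negateIfMaskedRef(uint64_t x, uint64_t m) {
  return m ? (uint64_t{0} - x) : x;
}
```

The two-instruction XOR and SUB sequence replaces the negate, AND, ANDN and OR of the original pattern, which matches the shape of the AVX512 output in pr32907.ll.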

delena accepted this revision. May 8 2017, 6:29 AM

This revision is now accepted and ready to land. May 8 2017, 6:29 AM

spatel added inline comments. May 8 2017, 7:11 AM

test/CodeGen/X86/vselect-pcmp.ll
144 ↗	(On Diff #98152)	You can remove those snarky 16-bit comments. 😃
154–156 ↗	(On Diff #98152)	Looks like we have to go to extremes to get the AVX1 case, although this may improve with a patch that I'm working on for PR32790: https://bugs.llvm.org/show_bug.cgi?id=32790

Closed by commit rL302424: [X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. (authored by RKSimon). May 8 2017, 7:30 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	36 lines
llvm/trunk/test/CodeGen/X86/cast-vsel.ll	37 lines
llvm/trunk/test/CodeGen/X86/pr32907.ll	7 lines
llvm/trunk/test/CodeGen/X86/vselect-pcmp.ll	12 lines

Diff 98167

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

	// into:			// into:
	// (vselect m, x, y)			// (vselect m, x, y)
	// As a special case, try to fold:			// As a special case, try to fold:
	// (or (and (m, (sub 0, x)), (pandn m, x)))			// (or (and (m, (sub 0, x)), (pandn m, x)))
	// into:			// into:
	// (sub (xor X, M), M)			// (sub (xor X, M), M)
	static SDValue combineLogicBlendIntoPBLENDV(SDNode *N, SelectionDAG &DAG,			static SDValue combineLogicBlendIntoPBLENDV(SDNode *N, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	assert(N->getOpcode() == ISD::OR);			assert(N->getOpcode() == ISD::OR && "Unexpected Opcode");

	SDValue N0 = N->getOperand(0);			SDValue N0 = N->getOperand(0);
	SDValue N1 = N->getOperand(1);			SDValue N1 = N->getOperand(1);
	EVT VT = N->getValueType(0);			EVT VT = N->getValueType(0);

	if (!((VT == MVT::v2i64) || (VT == MVT::v4i64 && Subtarget.hasInt256())))			if (!((VT.is128BitVector() && Subtarget.hasSSE2()) ||
				(VT.is256BitVector() && Subtarget.hasInt256())))
	return SDValue();			return SDValue();
	assert(Subtarget.hasSSE2() && "Unexpected i64 vector without SSE2!");

	// Canonicalize pandn to RHS			// Canonicalize AND to LHS.
	if (N0.getOpcode() == X86ISD::ANDNP)			if (N1.getOpcode() == ISD::AND)
	std::swap(N0, N1);			std::swap(N0, N1);

				// TODO: Attempt to match against AND(XOR(-1,X),Y) as well, waiting for
				// ANDNP combine allows other combines to happen that prevent matching.
	if (N0.getOpcode() != ISD::AND || N1.getOpcode() != X86ISD::ANDNP)			if (N0.getOpcode() != ISD::AND || N1.getOpcode() != X86ISD::ANDNP)
	return SDValue();			return SDValue();

	SDValue Mask = N1.getOperand(0);			SDValue Mask = N1.getOperand(0);
	SDValue X = N1.getOperand(1);			SDValue X = N1.getOperand(1);
	SDValue Y;			SDValue Y;
	if (N0.getOperand(0) == Mask)			if (N0.getOperand(0) == Mask)
	Y = N0.getOperand(1);			Y = N0.getOperand(1);
	if (N0.getOperand(1) == Mask)			if (N0.getOperand(1) == Mask)
	Y = N0.getOperand(0);			Y = N0.getOperand(0);

	// Check to see if the mask appeared in both the AND and ANDNP.			// Check to see if the mask appeared in both the AND and ANDNP.
	if (!Y.getNode())			if (!Y.getNode())
	return SDValue();			return SDValue();

	// Validate that X, Y, and Mask are bitcasts, and see through them.			// Validate that X, Y, and Mask are bitcasts, and see through them.
	Mask = peekThroughBitcasts(Mask);			Mask = peekThroughBitcasts(Mask);
	X = peekThroughBitcasts(X);			X = peekThroughBitcasts(X);
	Y = peekThroughBitcasts(Y);			Y = peekThroughBitcasts(Y);

	EVT MaskVT = Mask.getValueType();			EVT MaskVT = Mask.getValueType();

	// Validate that the Mask operand is a vector sra node.
	// FIXME: what to do for bytes, since there is a psignb/pblendvb, but
	// there is no psrai.b
	unsigned EltBits = MaskVT.getScalarSizeInBits();			unsigned EltBits = MaskVT.getScalarSizeInBits();
	unsigned SraAmt = ~0;
	if (Mask.getOpcode() == ISD::SRA) {
	if (auto *AmtBV = dyn_cast<BuildVectorSDNode>(Mask.getOperand(1)))
	if (auto *AmtConst = AmtBV->getConstantSplatNode())
	SraAmt = AmtConst->getZExtValue();
	} else if (Mask.getOpcode() == X86ISD::VSRAI)
	SraAmt = Mask.getConstantOperandVal(1);

	if ((SraAmt + 1) != EltBits)			// TODO: Attempt to handle floating point cases as well?
				if (!MaskVT.isInteger() || DAG.ComputeNumSignBits(Mask) != EltBits)
	return SDValue();			return SDValue();

	SDLoc DL(N);			SDLoc DL(N);

	// Try to match:			// Try to match:
	// (or (and (M, (sub 0, X)), (pandn M, X)))			// (or (and (M, (sub 0, X)), (pandn M, X)))
	// which is a special case of vselect:			// which is a special case of vselect:
	// (vselect M, (sub 0, X), X)			// (vselect M, (sub 0, X), X)
	// Per:			// Per:
	// http://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate			// http://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
	// We know that, if fNegate is 0 or 1:			// We know that, if fNegate is 0 or 1:
	// (fNegate ? -v : v) == ((v ^ -fNegate) + fNegate)			// (fNegate ? -v : v) == ((v ^ -fNegate) + fNegate)
	//			//
	// Here, we have a mask, M (all 1s or 0), and, similarly, we know that:			// Here, we have a mask, M (all 1s or 0), and, similarly, we know that:
	// ((M & 1) ? -X : X) == ((X ^ -(M & 1)) + (M & 1))			// ((M & 1) ? -X : X) == ((X ^ -(M & 1)) + (M & 1))
	// ( M ? -X : X) == ((X ^ M ) + (M & 1))			// ( M ? -X : X) == ((X ^ M ) + (M & 1))
	// This lets us transform our vselect to:			// This lets us transform our vselect to:
	// (add (xor X, M), (and M, 1))			// (add (xor X, M), (and M, 1))
	// And further to:			// And further to:
	// (sub (xor X, M), M)			// (sub (xor X, M), M)
	if (X.getValueType() == MaskVT && Y.getValueType() == MaskVT) {			if (X.getValueType() == MaskVT && Y.getValueType() == MaskVT &&
				DAG.getTargetLoweringInfo().isOperationLegal(ISD::SUB, MaskVT)) {
	auto IsNegV = [](SDNode *N, SDValue V) {			auto IsNegV = [](SDNode *N, SDValue V) {
	return N->getOpcode() == ISD::SUB && N->getOperand(1) == V &&			return N->getOpcode() == ISD::SUB && N->getOperand(1) == V &&
	ISD::isBuildVectorAllZeros(N->getOperand(0).getNode());			ISD::isBuildVectorAllZeros(N->getOperand(0).getNode());
	};			};
	SDValue V;			SDValue V;
	if (IsNegV(Y.getNode(), X))			if (IsNegV(Y.getNode(), X))
	V = X;			V = X;
	else if (IsNegV(X.getNode(), Y))			else if (IsNegV(X.getNode(), Y))
	V = Y;			V = Y;

	if (V) {			if (V) {
	if (EltBits != 8 && EltBits != 16 && EltBits != 32)
	return SDValue();

	SDValue SubOp1 = DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask);			SDValue SubOp1 = DAG.getNode(ISD::XOR, DL, MaskVT, V, Mask);
	SDValue SubOp2 = Mask;			SDValue SubOp2 = Mask;

	// If the negate was on the false side of the select, then			// If the negate was on the false side of the select, then
	// the operands of the SUB need to be swapped. PR 27251.			// the operands of the SUB need to be swapped. PR 27251.
	// This is because the pattern being matched above is			// This is because the pattern being matched above is
	// (vselect M, (sub (0, X), X) -> (sub (xor X, M), M)			// (vselect M, (sub (0, X), X) -> (sub (xor X, M), M)
	// but if the pattern matched was			// but if the pattern matched was
	// (vselect M, X, (sub (0, X))), that is really negation of the pattern			// (vselect M, X, (sub (0, X))), that is really negation of the pattern
	// above, -(vselect M, (sub 0, X), X), and therefore the replacement			// above, -(vselect M, (sub 0, X), X), and therefore the replacement
	// pattern also needs to be a negation of the replacement pattern above.			// pattern also needs to be a negation of the replacement pattern above.
	// And -(sub X, Y) is just sub (Y, X), so swapping the operands of the			// And -(sub X, Y) is just sub (Y, X), so swapping the operands of the
	// sub accomplishes the negation of the replacement pattern.			// sub accomplishes the negation of the replacement pattern.
	if (V == Y)			if (V == Y)
	std::swap(SubOp1, SubOp2);			std::swap(SubOp1, SubOp2);

	return DAG.getBitcast(VT,			SDValue Res = DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2);
	DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2));			return DAG.getBitcast(VT, Res);
	}			}
	}			}

	// PBLENDVB is only available on SSE 4.1.			// PBLENDVB is only available on SSE 4.1.
	if (!Subtarget.hasSSE41())			if (!Subtarget.hasSSE41())
	return SDValue();			return SDValue();

	MVT BlendVT = (VT == MVT::v4i64) ? MVT::v32i8 : MVT::v16i8;			MVT BlendVT = (VT == MVT::v4i64) ? MVT::v32i8 : MVT::v16i8;

llvm/trunk/test/CodeGen/X86/cast-vsel.ll

	; SSE2-NEXT: pandn %xmm4, %xmm0			; SSE2-NEXT: pandn %xmm4, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0			; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc:			; SSE41-LABEL: trunc:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pcmpeqw %xmm1, %xmm0			; SSE41-NEXT: pcmpeqw %xmm1, %xmm0
	; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]			; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; SSE41-NEXT: pshufb %xmm1, %xmm5
	; SSE41-NEXT: pshufb %xmm1, %xmm4
	; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]
	; SSE41-NEXT: pshufb %xmm1, %xmm3			; SSE41-NEXT: pshufb %xmm1, %xmm3
	; SSE41-NEXT: pshufb %xmm1, %xmm2			; SSE41-NEXT: pshufb %xmm1, %xmm2
	; SSE41-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]			; SSE41-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
	; SSE41-NEXT: pand %xmm0, %xmm2			; SSE41-NEXT: pshufb %xmm1, %xmm5
	; SSE41-NEXT: pandn %xmm4, %xmm0			; SSE41-NEXT: pshufb %xmm1, %xmm4
	; SSE41-NEXT: por %xmm2, %xmm0			; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]
				; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm4
				; SSE41-NEXT: movdqa %xmm4, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc:			; AVX1-LABEL: trunc:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm1			; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm1
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]			; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; AVX1-NEXT: vpshufb %xmm4, %xmm1, %xmm1			; AVX1-NEXT: vpshufb %xmm4, %xmm1, %xmm1
	; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm3[0],xmm1[0]
	; AVX1-NEXT: vpandn %xmm1, %xmm0, %xmm1
	; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3
	; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2			; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2
	; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
	; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0			; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm2
	; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2
				; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
				; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
				; AVX1-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: trunc:			; AVX2-LABEL: trunc:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]			; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]
	; AVX2-NEXT: vpshufb %ymm1, %ymm3, %ymm3			; AVX2-NEXT: vpshufb %ymm1, %ymm2, %ymm2
	; AVX2-NEXT: vpermq {{.*#+}} ymm3 = ymm3[0,2,2,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm2[0,2,2,3]
	; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm3			; AVX2-NEXT: vpshufb %ymm1, %ymm3, %ymm1
	; AVX2-NEXT: vpshufb %ymm1, %ymm2, %ymm1
	; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,2,2,3]			; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,2,2,3]
	; AVX2-NEXT: vpand %xmm0, %xmm1, %xmm0			; AVX2-NEXT: vpblendvb %xmm0, %xmm2, %xmm1, %xmm0
	; AVX2-NEXT: vpor %xmm3, %xmm0, %xmm0
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = icmp eq <8 x i16> %a, %b			%cmp = icmp eq <8 x i16> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d			%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d
	%tr = trunc <8 x i32> %sel to <8 x i16>			%tr = trunc <8 x i32> %sel to <8 x i16>
	ret <8 x i16> %tr			ret <8 x i16> %tr
	}			}


llvm/trunk/test/CodeGen/X86/pr32907.ll

	; AVX2-NEXT: vpand %xmm2, %xmm1, %xmm1			; AVX2-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: PR32907:			; AVX512-LABEL: PR32907:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpsubq %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpsubq %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsraq $63, %zmm0, %zmm1			; AVX512-NEXT: vpsraq $63, %zmm0, %zmm1
	; AVX512-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsubq %xmm0, %xmm2, %xmm2			; AVX512-NEXT: vpsubq %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpandn %xmm0, %xmm1, %xmm0
	; AVX512-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX512-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%sub13.i = sub <2 x i64> %astype.i, %astype6.i			%sub13.i = sub <2 x i64> %astype.i, %astype6.i
	%x.lobit.i.i = ashr <2 x i64> %sub13.i, <i64 63, i64 63>			%x.lobit.i.i = ashr <2 x i64> %sub13.i, <i64 63, i64 63>
	%sub.i.i = sub <2 x i64> zeroinitializer, %sub13.i			%sub.i.i = sub <2 x i64> zeroinitializer, %sub13.i
	%0 = xor <2 x i64> %x.lobit.i.i, <i64 -1, i64 -1>			%0 = xor <2 x i64> %x.lobit.i.i, <i64 -1, i64 -1>
	%1 = and <2 x i64> %sub13.i, %0			%1 = and <2 x i64> %sub13.i, %0
	%2 = and <2 x i64> %x.lobit.i.i, %sub.i.i			%2 = and <2 x i64> %x.lobit.i.i, %sub.i.i
	%cond.i.i = or <2 x i64> %1, %2			%cond.i.i = or <2 x i64> %1, %2
	ret <2 x i64> %cond.i.i			ret <2 x i64> %cond.i.i
	}			}

llvm/trunk/test/CodeGen/X86/vselect-pcmp.ll


	; Sorry 16-bit, you're not important enough to support?			; Sorry 16-bit, you're not important enough to support?

	define <8 x i16> @signbit_sel_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %mask) {			define <8 x i16> @signbit_sel_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %mask) {
	; AVX-LABEL: signbit_sel_v8i16:			; AVX-LABEL: signbit_sel_v8i16:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX-NEXT: vpcmpgtw %xmm2, %xmm3, %xmm2			; AVX-NEXT: vpcmpgtw %xmm2, %xmm3, %xmm2
	; AVX-NEXT: vpandn %xmm1, %xmm2, %xmm1			; AVX-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
	; AVX-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%tr = icmp slt <8 x i16> %mask, zeroinitializer			%tr = icmp slt <8 x i16> %mask, zeroinitializer
	%z = select <8 x i1> %tr, <8 x i16> %x, <8 x i16> %y			%z = select <8 x i1> %tr, <8 x i16> %x, <8 x i16> %y
	ret <8 x i16> %z			ret <8 x i16> %z
	}			}

	define <4 x i32> @signbit_sel_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) {			define <4 x i32> @signbit_sel_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) {
	; AVX12F-LABEL: signbit_sel_v4i32:			; AVX12F-LABEL: signbit_sel_v4i32:
	[...]
	; AVX1-NEXT: vandps %ymm2, %ymm0, %ymm0			; AVX1-NEXT: vandps %ymm2, %ymm0, %ymm0
	; AVX1-NEXT: vorps %ymm1, %ymm0, %ymm0			; AVX1-NEXT: vorps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: signbit_sel_v16i16:			; AVX2-LABEL: signbit_sel_v16i16:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpxor %ymm3, %ymm3, %ymm3			; AVX2-NEXT: vpxor %ymm3, %ymm3, %ymm3
	; AVX2-NEXT: vpcmpgtw %ymm2, %ymm3, %ymm2			; AVX2-NEXT: vpcmpgtw %ymm2, %ymm3, %ymm2
	; AVX2-NEXT: vpandn %ymm1, %ymm2, %ymm1			; AVX2-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: vpand %ymm2, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: signbit_sel_v16i16:			; AVX512-LABEL: signbit_sel_v16i16:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vpxor %ymm3, %ymm3, %ymm3			; AVX512-NEXT: vpxor %ymm3, %ymm3, %ymm3
	; AVX512-NEXT: vpcmpgtw %ymm2, %ymm3, %ymm2			; AVX512-NEXT: vpcmpgtw %ymm2, %ymm3, %ymm2
	; AVX512-NEXT: vpandn %ymm1, %ymm2, %ymm1			; AVX512-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm0
	; AVX512-NEXT: vpand %ymm2, %ymm0, %ymm0
	; AVX512-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%tr = icmp slt <16 x i16> %mask, zeroinitializer			%tr = icmp slt <16 x i16> %mask, zeroinitializer
	%z = select <16 x i1> %tr, <16 x i16> %x, <16 x i16> %y			%z = select <16 x i1> %tr, <16 x i16> %x, <16 x i16> %y
	ret <16 x i16> %z			ret <16 x i16> %z
	}			}

	define <8 x i32> @signbit_sel_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {			define <8 x i32> @signbit_sel_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {
	; AVX12-LABEL: signbit_sel_v8i32:			; AVX12-LABEL: signbit_sel_v8i32: