This is an archive of the discontinued LLVM Phabricator instance.


Differential D150249

[X86] Improve handling on zero constant for fminimum/fmaximum lowering
Closed, Public

Authored by skatkov on May 9 2023, 11:40 PM.


Details

Reviewers

RKSimon
pengfei
goldstein.w.n
e-kud

Commits

rG6e19eea02bbe: [X86] Improve handling on zero constant for fminimum/fmaximum lowering

Summary

If we know that the zero constant operand is already in the right place, we do not need
to reorder anything.
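The reason operand order matters can be sketched with a scalar model. This is not LLVM code: `x86Min`, `x86Max`, `refMinimum`, `refMaximum` and `sameFP` are hypothetical helpers, and the model assumes the documented x86 MINSD/MAXSD behaviour of returning the second source operand when the inputs compare equal (such as +0.0 and -0.0) or when either input is NaN. The reference functions follow the NaN and signed-zero rules of the table quoted in the `X86ISelLowering.cpp` diff.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical lane model of x86 MINSD/MAXSD: when the operands compare
// equal (e.g. +0.0 vs -0.0) or either is NaN, the comparison is false and
// the SECOND operand is returned.
double x86Min(double A, double B) { return A < B ? A : B; }
double x86Max(double A, double B) { return A > B ? A : B; }

// Reference semantics of llvm.minimum: NaN propagates, -0.0 < +0.0.
double refMinimum(double X, double Y) {
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<double>::quiet_NaN();
  if (X == 0.0 && Y == 0.0)
    return std::signbit(X) ? X : Y; // prefer -0.0
  return X < Y ? X : Y;
}

// Reference semantics of llvm.maximum: NaN propagates, +0.0 > -0.0.
double refMaximum(double X, double Y) {
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<double>::quiet_NaN();
  if (X == 0.0 && Y == 0.0)
    return std::signbit(X) ? Y : X; // prefer +0.0
  return X > Y ? X : Y;
}

// True if A and B are the same value including the sign of zero; any two
// NaNs are treated as equal.
bool sameFP(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::isnan(A) && std::isnan(B);
  return A == B && std::signbit(A) == std::signbit(B);
}
```

Under this model `x86Min(-0.0, 0.0)` returns +0.0 although `llvm.minimum` requires -0.0, while `x86Min(0.0, -0.0)` returns the correct -0.0. A zero constant that already sits in the right operand therefore needs no reordering.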


Event Timeline

skatkov created this revision. May 9 2023, 11:40 PM

Herald added a project: Restricted Project. May 9 2023, 11:40 PM

Herald added a subscriber: hiraditya.

skatkov requested review of this revision. May 9 2023, 11:40 PM

Herald added a project: Restricted Project. May 9 2023, 11:40 PM

skatkov added a parent revision: D149844: [X86] Add lowering of fminimum/fmaximum for vector operands. May 9 2023, 11:40 PM

Harbormaster completed remote builds in B231038: Diff 520915. May 10 2023, 12:16 AM

e-kud added inline comments. May 10 2023, 6:14 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
Line 1141: I think this test shows that a predicate "is not preferred zero" should be more generic than "is opposite zero" (in the context of constant vectors).

goldstein.w.n added inline comments. May 10 2023, 10:57 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
Line 30300: nit: can you use a different variable name than `PreferredZero`? The value in the lambda doesn't always match what's in the outer scope.

skatkov added inline comments. May 10 2023, 9:05 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
Line 30300: Agreed, thanks. Will use just `Zero`.
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
Line 1141: I will modify the test to use the constants 0.f and -0.f to cover both the not-preferred-zero and the not-opposite-zero cases. The intention of the test is to check that even if some elements of an operand are zero, we need all of them to be zeros of the same sign.

skatkov updated this revision to Diff 521209. May 10 2023, 10:37 PM

Dushistov added a subscriber: Dushistov. May 10 2023, 11:58 PM

Harbormaster completed remote builds in B231256: Diff 521209. May 11 2023, 12:53 AM

e-kud added inline comments. May 11 2023, 5:15 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
Line 1141: Sorry, my intention was unclear. I mean that in the case of `min(%x, <0, 5>)` we need to generate a single instruction, `min(<0, 5>, %x)`, but we generate all kinds of checks instead. This happens because we special-case only `<0, 0, ...>` for `min`. Instead, we could check something like "a constant vector containing no preferred zero"; this would cover `<0, 0, ...>` as well as `<0, 5>`.
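e-kud's observation can be checked lane by lane. The sketch below uses the same kind of hypothetical scalar model (MINPD is assumed to return its second operand on ties and NaNs) and probes a few special values of `x` for each lane of a constant `C`. Under these assumptions a single `min(C, x)` is exact for `<0, 5>`, because neither lane is the preferred zero (-0.0) of `fminimum`, and it stops being exact as soon as a lane holds -0.0. `constFirstMinIsExact` and the probe set are illustrative choices, not part of the patch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical lane model of x86 MINPD: returns the second operand on ties
// (+0.0 vs -0.0) and whenever either operand is NaN.
double x86Min(double A, double B) { return A < B ? A : B; }

// Reference llvm.minimum lane semantics: NaN propagates, -0.0 < +0.0.
double refMinimum(double X, double Y) {
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<double>::quiet_NaN();
  if (X == 0.0 && Y == 0.0)
    return std::signbit(X) ? X : Y;
  return X < Y ? X : Y;
}

// Same value including the sign of zero; any two NaNs compare equal.
bool sameFP(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::isnan(A) && std::isnan(B);
  return A == B && std::signbit(A) == std::signbit(B);
}

// True if one MINPD with the constant C as FIRST operand, min(C, x),
// matches llvm.minimum(x, C) in every lane for every probe value of x.
bool constFirstMinIsExact(const std::array<double, 2> &C) {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  const double Inf = std::numeric_limits<double>::infinity();
  const double Probes[] = {NaN, -Inf, -5.0, -0.0, 0.0, 5.0, 7.0, Inf};
  for (double Lane : C)
    for (double X : Probes)
      if (!sameFP(x86Min(Lane, X), refMinimum(X, Lane)))
        return false;
  return true;
}
```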

skatkov added inline comments. May 11 2023, 8:11 PM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
Line 1141: Got it, so you actually want a further improvement. For vectors, each element needs to be either the preferred/opposite zero or non-zero. In that sense, a case with -0.0 and 0.0 also makes sense. So I need one more test case and an improvement in the code.
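The per-element rule described here can be sketched as a classifier over the lanes of a constant vector. It is a model under stated assumptions, not the code that landed: lanes are plain `double`s, `std::nullopt` stands for an undef lane, and `Placement`, `classify` and `Kind` are hypothetical names. A lane holding the preferred zero forces the constant into the second operand (the one x86 MIN/MAX returns on ties), a lane holding the opposite zero forces it into the first operand, and non-zero lanes allow either slot. A NaN lane is only allowed in the second slot.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

enum class Kind { Minimum, Maximum };

// Which operand of a single x86 MIN/MAX, op(First, Second), a constant
// vector C may occupy so that signed zeros come out right.
struct Placement {
  bool AsFirst;  // op(C, x): x is returned on ties
  bool AsSecond; // op(x, C): C is returned on ties
};

// Lanes are doubles; std::nullopt models an undef lane.
Placement classify(const std::vector<std::optional<double>> &C, Kind K) {
  // The preferred zero is -0.0 for minimum and +0.0 for maximum.
  const bool PreferredIsNegative = (K == Kind::Minimum);
  Placement P{true, true};
  for (const std::optional<double> &Lane : C) {
    if (!Lane)
      continue; // undef: any value is acceptable
    if (std::isnan(*Lane)) {
      P.AsFirst = false; // op(NaN, x) would return x and drop the NaN
      continue;
    }
    if (*Lane != 0.0)
      continue; // non-zero: a tie means equal values, either is correct
    if (std::signbit(*Lane) == PreferredIsNegative)
      P.AsFirst = false;  // the preferred zero must win ties
    else
      P.AsSecond = false; // the opposite zero must lose ties
  }
  return P;
}
```

This only decides signed-zero handling: with the constant in the second slot, a NaN in `x` would still be replaced by the constant, so the NaN check on `x` remains necessary. Under this model `<0., -0.>` fits neither slot for `fminimum`, which is the case the updated `test_fminimum_vector_partially_zero` exercises.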

PTAL

Harbormaster completed remote builds in B231536: Diff 521563. May 11 2023, 11:48 PM

Great! Thanks. LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
Line 30300: It is a little confusing to me that we generally do different things for vectors and scalars. Consider the inputs `5.` and `<5., 5.>`: for the first we return false, for the second, true. But I can't think of better naming that explains this difference in behavior, or of a way to split it into different lambdas. In fact, it makes little difference whether we return true or false, because the non-zero cases are covered by the `isKnownNeverZeroFloat` calls.

This revision is now accepted and ready to land. May 12 2023, 8:44 AM

Closed by commit rG6e19eea02bbe: [X86] Improve handling on zero constant for fminimum/fmaximum lowering (authored by skatkov). May 14 2023, 5:29 AM

This revision was automatically updated to reflect the committed changes.

skatkov added a commit: rG6e19eea02bbe: [X86] Improve handling on zero constant for fminimum/fmaximum lowering.

Revision Contents

Path                                          Size
llvm/lib/Target/X86/X86ISelLowering.cpp       15 lines
llvm/test/CodeGen/X86/fminimum-fmaximum.ll    177 lines

Diff 521209

llvm/lib/Target/X86/X86ISelLowering.cpp


(30,261 unchanged lines hidden)
   assert((Op.getOpcode() == ISD::FMAXIMUM || Op.getOpcode() == ISD::FMINIMUM) &&
          "Expected FMAXIMUM or FMINIMUM opcode");
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   EVT VT = Op.getValueType();
   SDValue X = Op.getOperand(0);
   SDValue Y = Op.getOperand(1);
   SDLoc DL(Op);
   uint64_t SizeInBits = VT.getScalarSizeInBits();
   APInt PreferredZero = APInt::getZero(SizeInBits);
+  APInt OppositeZero = PreferredZero;
   EVT IVT = VT.changeTypeToInteger();
   X86ISD::NodeType MinMaxOp;
   if (Op.getOpcode() == ISD::FMAXIMUM) {
     MinMaxOp = X86ISD::FMAX;
+    OppositeZero.setSignBit();
   } else {
     PreferredZero.setSignBit();
     MinMaxOp = X86ISD::FMIN;
   }
   EVT SetCCType =
       TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);

   // The tables below show the expected result of Max in cases of NaN and
   // signed zeros.
   //
   //                 Y                       Y
   //            Num    xNaN             +0     -0
   //          ---------------         ---------------
   //     Num  |  Max |   Y  |     +0  |  +0  |  +0  |
   // X        ---------------  X      ---------------
   //    xNaN  |   X  |  X/Y |     -0  |  +0  |  -0  |
   //          ---------------         ---------------
   //
   // It is achieved by means of FMAX/FMIN with preliminary checks and operand
   // reordering.
   //
   // We check if any of operands is NaN and return NaN. Then we check if any of
   // operands is zero or negative zero (for fmaximum and fminimum respectively)
   // to ensure the correct zero is returned.
-  auto IsPreferredZero = [PreferredZero](SDValue Op) {
+  auto MatchesZero = [](SDValue Op, APInt Zero) {
     Op = peekThroughBitcasts(Op);
     if (auto *CstOp = dyn_cast<ConstantFPSDNode>(Op))
-      return CstOp->getValueAPF().bitcastToAPInt() == PreferredZero;
+      return CstOp->getValueAPF().bitcastToAPInt() == Zero;
     if (auto *CstOp = dyn_cast<ConstantSDNode>(Op))
-      return CstOp->getAPIntValue() == PreferredZero;
+      return CstOp->getAPIntValue() == Zero;
     if (Op->getOpcode() == ISD::BUILD_VECTOR ||
         Op->getOpcode() == ISD::SPLAT_VECTOR) {
       for (const SDValue &OpVal : Op->op_values()) {
         if (OpVal.isUndef())
           continue;
         auto *CstOp = dyn_cast<ConstantFPSDNode>(OpVal);
         if (!CstOp)
           return false;
-        if (CstOp->getValueAPF().bitcastToAPInt() != PreferredZero)
+        if (CstOp->getValueAPF().bitcastToAPInt() != Zero)
           return false;
       }
       return true;
     }
     return false;
   };

   bool IsXNeverNaN = DAG.isKnownNeverNaN(X);
   bool IsYNeverNaN = DAG.isKnownNeverNaN(Y);
   bool IgnoreSignedZero = DAG.getTarget().Options.NoSignedZerosFPMath ||
                           Op->getFlags().hasNoSignedZeros() ||
                           DAG.isKnownNeverZeroFloat(X) ||
                           DAG.isKnownNeverZeroFloat(Y);
   SDValue NewX, NewY;
-  if (IgnoreSignedZero || IsPreferredZero(Y)) {
+  if (IgnoreSignedZero || MatchesZero(Y, PreferredZero) ||
+      MatchesZero(X, OppositeZero)) {
     // Operands are already in right order or order does not matter.
     NewX = X;
     NewY = Y;
-  } else if (IsPreferredZero(X)) {
+  } else if (MatchesZero(X, PreferredZero) || MatchesZero(Y, OppositeZero)) {
     NewX = Y;
     NewY = X;
   } else if (!VT.isVector() && (VT == MVT::f16 || Subtarget.hasDQI()) &&
              (Op->getFlags().hasNoNaNs() || IsXNeverNaN || IsYNeverNaN)) {
     if (IsXNeverNaN)
       std::swap(X, Y);
     // VFPCLASSS consumes a vector type. So provide a minimal one corresponded
     // xmm register.
(28,953 unchanged lines hidden)
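A plain-C++ sketch of the two pieces this hunk changes may help when reading it. It is a simplified model, not LLVM code: a constant operand is a vector of `std::optional<double>` lanes (with `std::nullopt` for undef), a non-constant operand is an empty `Operand`, and `matchesZero`, `chooseOrder` and `Order` are hypothetical names. `matchesZero` compares raw bit patterns, as `bitcastToAPInt() == Zero` does, because an ordinary floating-point `==` cannot tell +0.0 from -0.0. `chooseOrder` mirrors only the first two branches of the if/else chain; every other lowering path is collapsed into `Order::NeedChecks`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical stand-ins: a constant operand is a list of lanes
// (std::nullopt = undef lane); a non-constant operand is an empty Operand.
using Lanes = std::vector<std::optional<double>>;
using Operand = std::optional<Lanes>;

std::uint64_t bitsOf(double D) {
  std::uint64_t B;
  std::memcpy(&B, &D, sizeof(B));
  return B;
}

// Models MatchesZero: every defined lane must have exactly the bit pattern
// Zero. Comparing doubles with == would not work, as +0.0 == -0.0.
bool matchesZero(const Operand &Op, std::uint64_t Zero) {
  if (!Op)
    return false; // not a constant
  for (const std::optional<double> &Lane : *Op) {
    if (!Lane)
      continue; // undef lanes are skipped, as in the patch
    if (bitsOf(*Lane) != Zero)
      return false;
  }
  return true;
}

enum class Order { Keep, Swap, NeedChecks };

// Models the first two branches of the operand-ordering chain.
Order chooseOrder(bool IsMaximum, const Operand &X, const Operand &Y,
                  bool IgnoreSignedZero) {
  const std::uint64_t SignBit = std::uint64_t(1) << 63;
  // fmaximum prefers +0.0 (all bits clear); fminimum prefers -0.0.
  const std::uint64_t PreferredZero = IsMaximum ? 0 : SignBit;
  const std::uint64_t OppositeZero = IsMaximum ? SignBit : 0;
  if (IgnoreSignedZero || matchesZero(Y, PreferredZero) ||
      matchesZero(X, OppositeZero))
    return Order::Keep;
  if (matchesZero(X, PreferredZero) || matchesZero(Y, OppositeZero))
    return Order::Swap;
  return Order::NeedChecks;
}
```

For example, `fminimum(%x, <0., 0.>)` yields `Order::Swap` in this model, which corresponds to the single `minpd` with the zero constant as first operand in the updated `test_fminimum_vector_zero`.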

llvm/test/CodeGen/X86/fminimum-fmaximum.ll

(1,036 unchanged lines hidden)
 ; X86-NEXT: retl
   %r = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> %y)
   ret <4 x float> %r
 }

 define <2 x double> @test_fminimum_vector_zero(<2 x double> %x) {
 ; SSE2-LABEL: test_fminimum_vector_zero:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movaps %xmm0, %xmm1
-; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[3,3]
-; SSE2-NEXT: pxor %xmm2, %xmm2
-; SSE2-NEXT: pcmpgtd %xmm1, %xmm2
-; SSE2-NEXT: movaps %xmm0, %xmm1
-; SSE2-NEXT: andps %xmm2, %xmm1
-; SSE2-NEXT: andnps %xmm0, %xmm2
-; SSE2-NEXT: movaps %xmm2, %xmm3
-; SSE2-NEXT: minpd %xmm1, %xmm3
-; SSE2-NEXT: movaps %xmm2, %xmm0
-; SSE2-NEXT: cmpunordpd %xmm2, %xmm0
-; SSE2-NEXT: andpd %xmm0, %xmm2
-; SSE2-NEXT: andnpd %xmm3, %xmm0
-; SSE2-NEXT: orpd %xmm2, %xmm0
+; SSE2-NEXT: xorpd %xmm1, %xmm1
+; SSE2-NEXT: minpd %xmm0, %xmm1
+; SSE2-NEXT: movapd %xmm1, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: test_fminimum_vector_zero:
 ; AVX: # %bb.0:
-; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm1
-; AVX-NEXT: vpand %xmm0, %xmm1, %xmm2
-; AVX-NEXT: vpandn %xmm0, %xmm1, %xmm0
-; AVX-NEXT: vminpd %xmm2, %xmm0, %xmm1
-; AVX-NEXT: vcmpunordpd %xmm0, %xmm0, %xmm2
-; AVX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX-NEXT: vminpd %xmm0, %xmm1, %xmm0
 ; AVX-NEXT: retq
 ;
 ; X86-LABEL: test_fminimum_vector_zero:
 ; X86: # %bb.0:
-; X86-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; X86-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm1
-; X86-NEXT: vpand %xmm0, %xmm1, %xmm2
-; X86-NEXT: vpandn %xmm0, %xmm1, %xmm0
-; X86-NEXT: vminpd %xmm2, %xmm0, %xmm1
-; X86-NEXT: vcmpunordpd %xmm0, %xmm0, %xmm2
-; X86-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; X86-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; X86-NEXT: vminpd %xmm0, %xmm1, %xmm0
 ; X86-NEXT: retl
   %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> <double 0., double 0.>)
   ret <2 x double> %r
 }

 define <4 x float> @test_fmaximum_vector_signed_zero(<4 x float> %x) {
 ; SSE2-LABEL: test_fmaximum_vector_signed_zero:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; SSE2-NEXT: movdqa %xmm1, %xmm2
-; SSE2-NEXT: pand %xmm0, %xmm2
-; SSE2-NEXT: pxor %xmm3, %xmm3
-; SSE2-NEXT: pcmpgtd %xmm0, %xmm3
-; SSE2-NEXT: movdqa %xmm3, %xmm4
-; SSE2-NEXT: pandn %xmm0, %xmm4
-; SSE2-NEXT: por %xmm2, %xmm4
-; SSE2-NEXT: pand %xmm3, %xmm0
-; SSE2-NEXT: pandn %xmm1, %xmm3
-; SSE2-NEXT: por %xmm3, %xmm0
-; SSE2-NEXT: movdqa %xmm0, %xmm1
-; SSE2-NEXT: maxps %xmm4, %xmm1
-; SSE2-NEXT: movdqa %xmm0, %xmm2
-; SSE2-NEXT: cmpunordps %xmm0, %xmm2
-; SSE2-NEXT: andps %xmm2, %xmm0
-; SSE2-NEXT: andnps %xmm1, %xmm2
-; SSE2-NEXT: orps %xmm2, %xmm0
+; SSE2-NEXT: movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; SSE2-NEXT: maxps %xmm0, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX1-LABEL: test_fmaximum_vector_signed_zero:
 ; AVX1: # %bb.0:
 ; AVX1-NEXT: vmovaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; AVX1-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; AVX1-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; AVX1-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; AVX1-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT: retq
 ;
 ; AVX512-LABEL: test_fmaximum_vector_signed_zero:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; AVX512-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; AVX512-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; AVX512-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; AVX512-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; AVX512-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX512-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; AVX512-NEXT: retq
 ;
 ; X86-LABEL: test_fmaximum_vector_signed_zero:
 ; X86: # %bb.0:
 ; X86-NEXT: vmovaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; X86-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; X86-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; X86-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; X86-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; X86-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; X86-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; X86-NEXT: retl
   %r = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> <float -0., float -0., float -0., float -0.>)
   ret <4 x float> %r
 }
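The new check lines drop the `cmpunordps` fix-up entirely. A small model suggests why, assuming (as in the other sketches) that MAXPS returns its second operand on ties and NaNs; `x86Max`, `refMaximum`, `sameFP` and `maxIsExact` are hypothetical helpers. With the -0.0 constant in the first operand, as in `maxps %xmm0, %xmm1`, a NaN or a +0.0 in `x` sits in the second slot and is returned unchanged. With the constant in the second slot, both a NaN and a +0.0 in `x` would be lost.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical lane model of x86 MAXPS: returns the second operand on ties
// (+0.0 vs -0.0) and whenever either operand is NaN.
float x86Max(float A, float B) { return A > B ? A : B; }

// Reference llvm.maximum lane semantics: NaN propagates, +0.0 > -0.0.
float refMaximum(float X, float Y) {
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<float>::quiet_NaN();
  if (X == 0.0f && Y == 0.0f)
    return std::signbit(X) ? Y : X;
  return X > Y ? X : Y;
}

// Same value including the sign of zero; any two NaNs compare equal.
bool sameFP(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::isnan(A) && std::isnan(B);
  return A == B && std::signbit(A) == std::signbit(B);
}

// Checks one MAXPS lane against llvm.maximum(x, C) for a set of probe
// values of x, with the constant C in the first or the second operand.
bool maxIsExact(float C, bool ConstFirst) {
  const float NaN = std::numeric_limits<float>::quiet_NaN();
  const float Inf = std::numeric_limits<float>::infinity();
  const float Probes[] = {NaN, -Inf, -1.0f, -0.0f, 0.0f, 1.0f, Inf};
  for (float X : Probes) {
    float Got = ConstFirst ? x86Max(C, X) : x86Max(X, C);
    if (!sameFP(Got, refMaximum(X, C)))
      return false;
  }
  return true;
}
```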

 define <2 x double> @test_fminimum_vector_partially_zero(<2 x double> %x) {
 ; SSE2-LABEL: test_fminimum_vector_partially_zero:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm2
-; SSE2-NEXT: pcmpgtd %xmm0, %xmm2
-; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm2[1,1,3,3]
-; SSE2-NEXT: movhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
-; SSE2-NEXT: movdqa %xmm3, %xmm4
-; SSE2-NEXT: pandn %xmm1, %xmm4
-; SSE2-NEXT: movdqa %xmm0, %xmm5
-; SSE2-NEXT: pand %xmm3, %xmm5
-; SSE2-NEXT: por %xmm4, %xmm5
-; SSE2-NEXT: pand %xmm2, %xmm1
+; SSE2-NEXT: movaps %xmm0, %xmm1
+; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[3,3]
+; SSE2-NEXT: xorps %xmm2, %xmm2
+; SSE2-NEXT: pxor %xmm3, %xmm3
+; SSE2-NEXT: pcmpgtd %xmm1, %xmm3
+; SSE2-NEXT: movhps {{.*#+}} xmm2 = xmm2[0,1],mem[0,1]
+; SSE2-NEXT: movdqa %xmm3, %xmm1
+; SSE2-NEXT: pandn %xmm2, %xmm1
+; SSE2-NEXT: movaps %xmm0, %xmm4
+; SSE2-NEXT: andps %xmm3, %xmm4
+; SSE2-NEXT: orps %xmm1, %xmm4
+; SSE2-NEXT: pand %xmm0, %xmm2
 ; SSE2-NEXT: pandn %xmm0, %xmm3
-; SSE2-NEXT: por %xmm1, %xmm3
+; SSE2-NEXT: por %xmm2, %xmm3
 ; SSE2-NEXT: movdqa %xmm3, %xmm1
-; SSE2-NEXT: minpd %xmm5, %xmm1
+; SSE2-NEXT: minpd %xmm4, %xmm1
 ; SSE2-NEXT: movdqa %xmm3, %xmm0
 ; SSE2-NEXT: cmpunordpd %xmm3, %xmm0
 ; SSE2-NEXT: andpd %xmm0, %xmm3
 ; SSE2-NEXT: andnpd %xmm1, %xmm0
 ; SSE2-NEXT: orpd %xmm3, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: test_fminimum_vector_partially_zero:
(12 unchanged lines hidden)
 ; X86-NEXT: vxorpd %xmm1, %xmm1, %xmm1
 ; X86-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
 ; X86-NEXT: vblendvpd %xmm0, %xmm0, %xmm1, %xmm2
 ; X86-NEXT: vblendvpd %xmm0, %xmm1, %xmm0, %xmm0
 ; X86-NEXT: vminpd %xmm2, %xmm0, %xmm1
 ; X86-NEXT: vcmpunordpd %xmm0, %xmm0, %xmm2
 ; X86-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
 ; X86-NEXT: retl
-  %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> <double 0., double 5.>)
+  %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> <double 0., double -0.>)
   ret <2 x double> %r
 }
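For `<0., -0.>` no single operand order works, so the generic sequence in the X86 check lines of `test_fminimum_vector_partially_zero` remains. Under one reading of those lines (an interpretation, not something stated in the review), they blend on the sign bit of `%x` so that a negative `%x` lands in the second `vminpd` slot, which wins ties, and then return the first slot wherever it is NaN. The lane-wise model below, with hypothetical helper names and the same MINPD assumption as before, agrees with `llvm.minimum` on pairs of special values, including a NaN with its sign bit set.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical lane model of x86 MINPD: returns the second operand on ties
// and whenever either operand is NaN.
double x86Min(double A, double B) { return A < B ? A : B; }

// Reference llvm.minimum lane semantics: NaN propagates, -0.0 < +0.0.
double refMinimum(double X, double Y) {
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<double>::quiet_NaN();
  if (X == 0.0 && Y == 0.0)
    return std::signbit(X) ? X : Y;
  return X < Y ? X : Y;
}

// Same value including the sign of zero; any two NaNs compare equal.
bool sameFP(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::isnan(A) && std::isnan(B);
  return A == B && std::signbit(A) == std::signbit(B);
}

// Model of the fallback: blend on the sign bit of X (vblendvpd), take MINPD,
// then return the first slot wherever it is NaN (vcmpunordpd + vblendvpd).
double fallbackMinimum(double X, double Y) {
  const bool XNeg = std::signbit(X);
  const double A = XNeg ? Y : X; // first MINPD operand
  const double B = XNeg ? X : Y; // second MINPD operand; wins ties
  const double M = x86Min(A, B);
  return std::isnan(A) ? A : M;
}

// Compares the model with llvm.minimum over all pairs of probe values.
bool fallbackMatchesReference() {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  const double Inf = std::numeric_limits<double>::infinity();
  const double Probes[] = {NaN, std::copysign(NaN, -1.0), -Inf, -5.0,
                           -0.0, 0.0, 5.0, Inf};
  for (double X : Probes)
    for (double Y : Probes)
      if (!sameFP(fallbackMinimum(X, Y), refMinimum(X, Y)))
        return false;
  return true;
}
```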

 define <4 x float> @test_fmaximum_vector_non_zero(<4 x float> %x) {
 ; SSE2-LABEL: test_fmaximum_vector_non_zero:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: movaps {{.*#+}} xmm1 = [5.0E+0,4.0E+0,3.0E+0,2.0E+0]
 ; SSE2-NEXT: maxps %xmm0, %xmm1
(63 unchanged lines hidden)
 ; X86-NEXT: retl
   %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> <double 0., double 0x7fff000000000000>)
   ret <2 x double> %r
 }

 define <2 x double> @test_fminimum_vector_zero_first(<2 x double> %x) {
 ; SSE2-LABEL: test_fminimum_vector_zero_first:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movaps %xmm0, %xmm1
-; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[3,3]
-; SSE2-NEXT: pxor %xmm2, %xmm2
-; SSE2-NEXT: pcmpgtd %xmm1, %xmm2
-; SSE2-NEXT: movaps %xmm0, %xmm1
-; SSE2-NEXT: andps %xmm2, %xmm1
-; SSE2-NEXT: andnps %xmm0, %xmm2
-; SSE2-NEXT: movaps %xmm2, %xmm3
-; SSE2-NEXT: minpd %xmm1, %xmm3
-; SSE2-NEXT: movaps %xmm2, %xmm0
-; SSE2-NEXT: cmpunordpd %xmm2, %xmm0
-; SSE2-NEXT: andpd %xmm0, %xmm2
-; SSE2-NEXT: andnpd %xmm3, %xmm0
-; SSE2-NEXT: orpd %xmm2, %xmm0
+; SSE2-NEXT: xorpd %xmm1, %xmm1
+; SSE2-NEXT: minpd %xmm0, %xmm1
+; SSE2-NEXT: movapd %xmm1, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: test_fminimum_vector_zero_first:
 ; AVX: # %bb.0:
-; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm1
-; AVX-NEXT: vpand %xmm0, %xmm1, %xmm2
-; AVX-NEXT: vpandn %xmm0, %xmm1, %xmm0
-; AVX-NEXT: vminpd %xmm2, %xmm0, %xmm1
-; AVX-NEXT: vcmpunordpd %xmm0, %xmm0, %xmm2
-; AVX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX-NEXT: vminpd %xmm0, %xmm1, %xmm0
 ; AVX-NEXT: retq
 ;
 ; X86-LABEL: test_fminimum_vector_zero_first:
 ; X86: # %bb.0:
-; X86-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; X86-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm1
-; X86-NEXT: vpand %xmm0, %xmm1, %xmm2
-; X86-NEXT: vpandn %xmm0, %xmm1, %xmm0
-; X86-NEXT: vminpd %xmm2, %xmm0, %xmm1
-; X86-NEXT: vcmpunordpd %xmm0, %xmm0, %xmm2
-; X86-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; X86-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; X86-NEXT: vminpd %xmm0, %xmm1, %xmm0
 ; X86-NEXT: retl
   %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> <double 0., double 0.>, <2 x double> %x)
   ret <2 x double> %r
 }

 define <2 x double> @test_fminimum_vector_signed_zero(<2 x double> %x) {
 ; SSE2-LABEL: test_fminimum_vector_signed_zero:
 ; SSE2: # %bb.0:
(21 unchanged lines hidden)
 ; X86-NEXT: retl
   %r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> <double -0., double -0.>)
   ret <2 x double> %r
 }

 define <4 x float> @test_fmaximum_vector_signed_zero_first(<4 x float> %x) {
 ; SSE2-LABEL: test_fmaximum_vector_signed_zero_first:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; SSE2-NEXT: movdqa %xmm1, %xmm2
-; SSE2-NEXT: pand %xmm0, %xmm2
-; SSE2-NEXT: pxor %xmm3, %xmm3
-; SSE2-NEXT: pcmpgtd %xmm0, %xmm3
-; SSE2-NEXT: movdqa %xmm3, %xmm4
-; SSE2-NEXT: pandn %xmm0, %xmm4
-; SSE2-NEXT: por %xmm2, %xmm4
-; SSE2-NEXT: pand %xmm3, %xmm0
-; SSE2-NEXT: pandn %xmm1, %xmm3
-; SSE2-NEXT: por %xmm3, %xmm0
-; SSE2-NEXT: movdqa %xmm0, %xmm1
-; SSE2-NEXT: maxps %xmm4, %xmm1
-; SSE2-NEXT: movdqa %xmm0, %xmm2
-; SSE2-NEXT: cmpunordps %xmm0, %xmm2
-; SSE2-NEXT: andps %xmm2, %xmm0
-; SSE2-NEXT: andnps %xmm1, %xmm2
-; SSE2-NEXT: orps %xmm2, %xmm0
+; SSE2-NEXT: movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; SSE2-NEXT: maxps %xmm0, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX1-LABEL: test_fmaximum_vector_signed_zero_first:
 ; AVX1: # %bb.0:
 ; AVX1-NEXT: vmovaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; AVX1-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; AVX1-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; AVX1-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; AVX1-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT: retq
 ;
 ; AVX512-LABEL: test_fmaximum_vector_signed_zero_first:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; AVX512-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; AVX512-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; AVX512-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; AVX512-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; AVX512-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX512-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; AVX512-NEXT: retq
 ;
 ; X86-LABEL: test_fmaximum_vector_signed_zero_first:
 ; X86: # %bb.0:
 ; X86-NEXT: vmovaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
-; X86-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
-; X86-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
-; X86-NEXT: vmaxps %xmm2, %xmm0, %xmm1
-; X86-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
-; X86-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; X86-NEXT: vmaxps %xmm0, %xmm1, %xmm0
 ; X86-NEXT: retl
   %r = call <4 x float> @llvm.maximum.v4f32(<4 x float> <float -0., float -0., float -0., float -0.>, <4 x float> %x)
   ret <4 x float> %r
 }

 define <4 x float> @test_fmaximum_vector_zero(<4 x float> %x) {
 ; SSE2-LABEL: test_fmaximum_vector_zero:
 ; SSE2: # %bb.0:
(28 unchanged lines hidden)