This is an archive of the discontinued LLVM Phabricator instance.

Differential D110342

[x86] convert logic-of-FP-compares to FP logic-of-vector-compares
ClosedPublic

Authored by spatel on Sep 23 2021, 9:35 AM.

Details

Reviewers

pengfei
craig.topper
lebedev.ri
RKSimon

Commits

rG09e71c367af3: [x86] convert logic-of-FP-compares to FP logic-of-vector-compares

Summary

This is motivated by the examples and discussion in:
https://llvm.org/PR51245
...and related bugs.

By using vector compares and vector logic, we can convert 2 'set' instructions into 1 'movd' or 'movmsk' and generally improve throughput/reduce instructions.
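The equivalence behind this rewrite can be sketched in portable C++. This is only a model of the semantics, not the patch itself: `Vec4`, `Mask4`, `scalarToVector`, `cmpLT`, `cmpLE`, `andMask`, `scalarForm`, and `vectorForm` are hypothetical names introduced for illustration, and the zero-filled upper lanes are an assumption of the model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical model types: four 32-bit lanes, like a 128-bit XMM register.
using Vec4 = std::array<float, 4>;
using Mask4 = std::array<std::uint32_t, 4>;

// Models SCALAR_TO_VECTOR: the scalar occupies lane 0. The real node leaves
// the upper lanes undefined; zero is an arbitrary choice for this model.
Vec4 scalarToVector(float X) { return {X, 0.0f, 0.0f, 0.0f}; }

// Models a packed ordered less-than compare: all-ones or all-zeros per lane.
Mask4 cmpLT(const Vec4 &A, const Vec4 &B) {
  Mask4 M{};
  for (std::size_t I = 0; I < 4; ++I)
    M[I] = (A[I] < B[I]) ? 0xFFFFFFFFu : 0u;
  return M;
}

// Models a packed ordered less-or-equal compare.
Mask4 cmpLE(const Vec4 &A, const Vec4 &B) {
  Mask4 M{};
  for (std::size_t I = 0; I < 4; ++I)
    M[I] = (A[I] <= B[I]) ? 0xFFFFFFFFu : 0u;
  return M;
}

// Models the packed bitwise AND of two compare masks.
Mask4 andMask(const Mask4 &A, const Mask4 &B) {
  Mask4 M{};
  for (std::size_t I = 0; I < 4; ++I)
    M[I] = A[I] & B[I];
  return M;
}

// Scalar form: two compares, each materialized as a bool, then an integer AND.
bool scalarForm(float W, float X, float Y, float Z) {
  return (W < X) & (Y <= Z);
}

// Vector form: two packed compares, one packed AND, one lane-0 extract.
bool vectorForm(float W, float X, float Y, float Z) {
  Mask4 M = andMask(cmpLT(scalarToVector(W), scalarToVector(X)),
                    cmpLE(scalarToVector(Y), scalarToVector(Z)));
  return (M[0] & 1u) != 0;
}

// Checks that both forms agree on a small grid that includes NaN.
void checkFormsAgree() {
  const float NaN = std::numeric_limits<float>::quiet_NaN();
  const float Vals[] = {-1.0f, 0.0f, 1.0f, NaN};
  for (float W : Vals)
    for (float X : Vals)
      for (float Y : Vals)
        for (float Z : Vals)
          assert(scalarForm(W, X, Y, Z) == vectorForm(W, X, Y, Z));
}
```

Only lane 0 of the mask is ever read, which is why the contents of the upper lanes do not matter for the result.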

Unfortunately, we don't have a complete vector compare ISA before AVX, so I left SSE-only out of this patch. That is, we'd need extra logic ops to simulate the missing predicates for SSE 'cmpp*', so it's not as clearly a win.
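To illustrate the extra logic ops, here is a hedged C++ model. To my understanding, the original SSE `cmpps`/`cmppd` immediate encodes eight predicates (EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD), and AVX extends the set with predicates such as NEQ_OQ and EQ_UQ. The helper names below are hypothetical, and each function stands in for a single packed compare on one lane.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-ins for four predicates available in the original SSE encoding.
bool cmpORD(double A, double B) { return !std::isnan(A) && !std::isnan(B); }
bool cmpUNORD(double A, double B) { return std::isnan(A) || std::isnan(B); }
bool cmpEQ(double A, double B) { return A == B; }  // ordered equal
bool cmpNEQ(double A, double B) { return A != B; } // unordered or not equal

// fcmp one (ordered and not equal): one compare with AVX's NEQ_OQ, but two
// compares plus an AND when only the SSE predicates are available.
bool oneViaSSE(double A, double B) { return cmpORD(A, B) & cmpNEQ(A, B); }

// fcmp ueq (unordered or equal): one compare with AVX's EQ_UQ, but two
// compares plus an OR when only the SSE predicates are available.
bool ueqViaSSE(double A, double B) { return cmpUNORD(A, B) | cmpEQ(A, B); }

// Reference semantics written directly from the predicate definitions.
bool fcmpONE(double A, double B) { return A < B || A > B; }
bool fcmpUEQ(double A, double B) { return !(A < B || A > B); }

// Checks the emulations against the reference on a grid that includes NaN.
void checkEmulation() {
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  const double Vals[] = {-1.0, 0.0, 1.0, NaN};
  for (double A : Vals)
    for (double B : Vals) {
      assert(oneViaSSE(A, B) == fcmpONE(A, B));
      assert(ueqViaSSE(A, B) == fcmpUEQ(A, B));
    }
}
```

With two such predicates in a logic-of-compares pattern, the SSE sequence would grow by up to two extra compares and two extra logic ops, which is the cost the summary refers to.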

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. · Sep 23 2021, 9:35 AM

Herald added subscribers: hiraditya, mcrosier. · Sep 23 2021, 9:35 AM

spatel requested review of this revision. · Sep 23 2021, 9:35 AM

Herald added a project: Restricted Project. · Sep 23 2021, 9:35 AM

Harbormaster completed remote builds in B125376: Diff 374583. · Sep 23 2021, 9:35 AM

RKSimon added inline comments. · Sep 23 2021, 10:10 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
45511	Why not use SCALAR_TO_VECTOR?
llvm/test/CodeGen/X86/fcmp-logic.ll
238	These could still be merged (as long as we only use the lower 32 bits)?

RKSimon added inline comments. · Sep 23 2021, 10:14 AM

llvm/test/CodeGen/X86/fcmp-logic.ll
2–4	We should probably add AVX512 coverage to ensure we're not doing anything that could be done better with kmask registers

spatel mentioned this in rG5188e2c9ce1f: [x86] add AVX512 run for fcmp+logic ops; NFC. · Sep 23 2021, 11:28 AM

spatel marked 3 inline comments as done. · Sep 23 2021, 11:37 AM

spatel added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
45511	No reason - forgot we had that opcode.
llvm/test/CodeGen/X86/fcmp-logic.ll
238	Yes - we'd need to add a cast to make it work, but hopefully that would get folded away at some point. I'll add a TODO for now.

Patch updated:

Use SCALAR_TO_VECTOR opcode to reduce code.
Added avx512f to test to show k-reg codegen.
Added TODO comment about enhancement for mismatched FP types.

Harbormaster completed remote builds in B125417: Diff 374638. · Sep 23 2021, 12:40 PM

LGTM - cheers

I don't think this patch will cover it, but as a follow up please could you add test coverage for logic containing more than 2 fcmps?

llvm/lib/Target/X86/X86ISelLowering.cpp
45510	Do we need to check/assert that VT is MVT::i1?

This revision is now accepted and ready to land. · Sep 23 2021, 1:07 PM

pengfei added inline comments. · Sep 23 2021, 10:53 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
45452–45454	Nit, why use hyphen here but leave space above?
45510	I think it's Ok since the logic operands come from setcc directly. Do we need to handle `(logic (zext (setcc ...`?

pengfei added inline comments. · Sep 23 2021, 10:56 PM

llvm/test/CodeGen/X86/fcmp-logic.ll
242–258	Should we add a common `CHECK` for them?

spatel marked 3 inline comments as done. · Sep 24 2021, 5:17 AM

spatel added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
45452–45454	I just overlooked the first one - will fix it.
45510	I haven't been able to find an example where we have the pattern late only, but the result of x86 scalar setcc could be i8 after some transforms. There's no difference on the existing tests if I add the check for i1, so I'll do that. Also, the pattern where we have sext/zext between the logic op and setcc is already folded away by DAGCombiner (the cast is moved after the logic), so I don't think we need to worry about that case.
llvm/test/CodeGen/X86/fcmp-logic.ll
242–258	It's the "v" in the mnemonics that makes these different. Do we have a scrubber option in the script for that?
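The point above about casts being moved after the logic op relies on sext/zext of an i1 commuting with and/or/xor. A minimal C++ check of that identity follows; `zext` and `sext` are hypothetical helpers that model the i1-to-i8 extensions, and this is not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of extending an i1 to i8.
std::uint8_t zext(bool B) { return B ? 1 : 0; }
std::int8_t sext(bool B) { return B ? -1 : 0; }

// Verifies logic(ext A, ext B) == ext(logic(A, B)) for and/or/xor.
void checkCastCommutesWithLogic() {
  const bool Vals[] = {false, true};
  for (bool A : Vals)
    for (bool B : Vals) {
      assert((zext(A) & zext(B)) == zext(A && B));
      assert((zext(A) | zext(B)) == zext(A || B));
      assert((zext(A) ^ zext(B)) == zext(A != B));
      assert((sext(A) & sext(B)) == sext(A && B));
      assert((sext(A) | sext(B)) == sext(A || B));
      assert((sext(A) ^ sext(B)) == sext(A != B));
    }
}
```

Because the identity holds for all four input combinations, a combine that sees the logic op directly on the two setcc results does not lose the extended form.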

Patch updated:

Made "floating-point" consistent in the function comment.
Added MVT::i1 constraint for safety.

LGTM.

llvm/test/CodeGen/X86/fcmp-logic.ll
242–258	Oh, yes. I don't think there's such an option. I did have the same thought, but I think it's not easy to achieve. It's valuable only when we check SSE and AVX instructions at the same time. Omitting 'v' unconditionally is overkill.

Harbormaster completed remote builds in B125539: Diff 374811. · Sep 24 2021, 5:43 AM

This revision was landed with ongoing or failed builds. · Sep 24 2021, 8:38 AM

Closed by commit rG09e71c367af3: [x86] convert logic-of-FP-compares to FP logic-of-vector-compares (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG09e71c367af3: [x86] convert logic-of-FP-compares to FP logic-of-vector-compares.

spatel mentioned this in rG97948620b1ac: [x86] add test for 3 fcmps and logic; NFC. · Sep 30 2021, 8:03 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	48 lines
llvm/test/CodeGen/X86/fcmp-logic.ll	195 lines
llvm/test/CodeGen/X86/lzcnt-zext-cmp.ll	11 lines

Diff 374859

llvm/lib/Target/X86/X86ISelLowering.cpp

 static unsigned convertIntLogicToFPLogicOpcode(unsigned Opcode) {
 ...
   default: llvm_unreachable("Unexpected input node for FP logic conversion");
   case ISD::AND: FPOpcode = X86ISD::FAND; break;
   case ISD::OR: FPOpcode = X86ISD::FOR; break;
   case ISD::XOR: FPOpcode = X86ISD::FXOR; break;
   }
   return FPOpcode;
 }

-/// If both input operands of a logic op are being cast from floating point
-/// types, try to convert this into a floating point logic node to avoid
-/// unnecessary moves from SSE to integer registers.
+/// If both input operands of a logic op are being cast from floating-point
+/// types or FP compares, try to convert this into a floating-point logic node
+/// to avoid unnecessary moves from SSE to integer registers.
Inline comment, pengfei: Nit, why use hyphen here but leave space above?
Inline comment, spatel: I just overlooked the first one - will fix it.
 static SDValue convertIntLogicToFPLogic(SDNode *N, SelectionDAG &DAG,
                                         TargetLowering::DAGCombinerInfo &DCI,
                                         const X86Subtarget &Subtarget) {
   EVT VT = N->getValueType(0);
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   SDLoc DL(N);

-  if (N0.getOpcode() != ISD::BITCAST || N1.getOpcode() != ISD::BITCAST)
-    return SDValue();
-
-  if (DCI.isBeforeLegalizeOps())
+  if (!((N0.getOpcode() == ISD::BITCAST && N1.getOpcode() == ISD::BITCAST) ||
+        (N0.getOpcode() == ISD::SETCC && N1.getOpcode() == ISD::SETCC)))
     return SDValue();

   SDValue N00 = N0.getOperand(0);
   SDValue N10 = N1.getOperand(0);
   EVT N00Type = N00.getValueType();
   EVT N10Type = N10.getValueType();

   // Ensure that both types are the same and are legal scalar fp types.
   if (N00Type != N10Type || !((Subtarget.hasSSE1() && N00Type == MVT::f32) ||
                               (Subtarget.hasSSE2() && N00Type == MVT::f64) ||
                               (Subtarget.hasFP16() && N00Type == MVT::f16)))
     return SDValue();

-  unsigned FPOpcode = convertIntLogicToFPLogicOpcode(N->getOpcode());
-  SDValue FPLogic = DAG.getNode(FPOpcode, DL, N00Type, N00, N10);
-  return DAG.getBitcast(VT, FPLogic);
+  if (N0.getOpcode() == ISD::BITCAST && !DCI.isBeforeLegalizeOps()) {
+    unsigned FPOpcode = convertIntLogicToFPLogicOpcode(N->getOpcode());
+    SDValue FPLogic = DAG.getNode(FPOpcode, DL, N00Type, N00, N10);
+    return DAG.getBitcast(VT, FPLogic);
+  }
+
+  // The vector ISA for FP predicates is incomplete before AVX, so converting
+  // COMIS* to CMPS* may not be a win before AVX.
+  // TODO: Check types/predicates to see if they are available with SSE/SSE2.
+  if (!Subtarget.hasAVX() || VT != MVT::i1 || N0.getOpcode() != ISD::SETCC ||
+      !N0.hasOneUse() || !N1.hasOneUse())
+    return SDValue();
+
+  // Convert scalar FP compares and logic to vector compares (COMIS* to CMPS*)
+  // and vector logic:
+  // logic (setcc N00, N01), (setcc N10, N11) -->
+  // extelt (logic (setcc (s2v N00), (s2v N01)), setcc (s2v N10), (s2v N11))), 0
+  unsigned NumElts = 128 / N00Type.getSizeInBits();
+  EVT VecVT = EVT::getVectorVT(*DAG.getContext(), N00Type, NumElts);
+  EVT BoolVecVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1, NumElts);
+  SDValue ZeroIndex = DAG.getVectorIdxConstant(0, DL);
+  SDValue N01 = N0.getOperand(1);
+  SDValue N11 = N1.getOperand(1);
+  SDValue Vec00 = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, N00);
+  SDValue Vec01 = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, N01);
+  SDValue Vec10 = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, N10);
+  SDValue Vec11 = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, N11);
+  SDValue Setcc0 = DAG.getSetCC(DL, BoolVecVT, Vec00, Vec01,
+                                cast<CondCodeSDNode>(N0.getOperand(2))->get());
+  SDValue Setcc1 = DAG.getSetCC(DL, BoolVecVT, Vec10, Vec11,
+                                cast<CondCodeSDNode>(N1.getOperand(2))->get());
+  SDValue Logic = DAG.getNode(N->getOpcode(), DL, BoolVecVT, Setcc0, Setcc1);
+  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, VT, Logic, ZeroIndex);
Inline comment, RKSimon: Do we need to check/assert that VT is MVT::i1?
Inline comment, pengfei: I think it's Ok since the logic operands come from setcc directly. Do we need to handle `(logic (zext (setcc ...`?
Inline comment, spatel: I haven't been able to find an example where we have the pattern late only, but the result of x86 scalar setcc could be i8 after some transforms. There's no difference on the existing tests if I add the check for i1, so I'll do that. Also, the pattern where we have sext/zext between the logic op and setcc is already folded away by DAGCombiner (the cast is moved after the logic), so I don't think we need to worry about that case.
 }
Inline comment, RKSimon: Why not use SCALAR_TO_VECTOR?
Inline comment, spatel: No reason - forgot we had that opcode.

 // Attempt to fold BITOP(MOVMSK(X),MOVMSK(Y)) -> MOVMSK(BITOP(X,Y))
 // to reduce XMM->GPR traffic.
 static SDValue combineBitOpWithMOVMSK(SDNode *N, SelectionDAG &DAG) {
   unsigned Opc = N->getOpcode();
   assert((Opc == ISD::OR || Opc == ISD::AND || Opc == ISD::XOR) &&
          "Unexpected bit opcode");

   SDValue N0 = N->getOperand(0);
 ...

llvm/test/CodeGen/X86/fcmp-logic.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-- -mattr=sse2 | FileCheck %s --check-prefixes=SSE2
-; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX
-; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f | FileCheck %s --check-prefixes=AVX
+; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX,AVX1
+; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f | FileCheck %s --check-prefixes=AVX,AVX512
Inline comment, RKSimon: We should probably add AVX512 coverage to ensure we're not doing anything that could be done better with kmask registers

 define i1 @olt_ole_and_f32(float %w, float %x, float %y, float %z) {
 ; SSE2-LABEL: olt_ole_and_f32:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomiss %xmm0, %xmm1
 ; SSE2-NEXT: seta %cl
 ; SSE2-NEXT: ucomiss %xmm2, %xmm3
 ; SSE2-NEXT: setae %al
 ; SSE2-NEXT: andb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: olt_ole_and_f32:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomiss %xmm0, %xmm1
-; AVX-NEXT: seta %cl
-; AVX-NEXT: vucomiss %xmm2, %xmm3
-; AVX-NEXT: setae %al
-; AVX-NEXT: andb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: olt_ole_and_f32:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpleps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: olt_ole_and_f32:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpltps %zmm1, %zmm0, %k1
+; AVX512-NEXT: vcmpleps %zmm3, %zmm2, %k0 {%k1}
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp olt float %w, %x
   %f2 = fcmp ole float %y, %z
   %r = and i1 %f1, %f2
   ret i1 %r
 }

 define i1 @oge_oeq_or_f32(float %w, float %x, float %y, float %z) {
 ; SSE2-LABEL: oge_oeq_or_f32:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomiss %xmm1, %xmm0
 ; SSE2-NEXT: setae %cl
 ; SSE2-NEXT: ucomiss %xmm3, %xmm2
 ; SSE2-NEXT: setnp %dl
 ; SSE2-NEXT: sete %al
 ; SSE2-NEXT: andb %dl, %al
 ; SSE2-NEXT: orb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: oge_oeq_or_f32:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomiss %xmm1, %xmm0
-; AVX-NEXT: setae %cl
-; AVX-NEXT: vucomiss %xmm3, %xmm2
-; AVX-NEXT: setnp %dl
-; AVX-NEXT: sete %al
-; AVX-NEXT: andb %dl, %al
-; AVX-NEXT: orb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: oge_oeq_or_f32:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpeqps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpleps %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: vorps %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: oge_oeq_or_f32:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpeqps %zmm3, %zmm2, %k0
+; AVX512-NEXT: vcmpleps %zmm0, %zmm1, %k1
+; AVX512-NEXT: korw %k0, %k1, %k0
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp oge float %w, %x
   %f2 = fcmp oeq float %y, %z
   %r = or i1 %f1, %f2
   ret i1 %r
 }

 define i1 @ord_one_xor_f32(float %w, float %x, float %y, float %z) {
 ; SSE2-LABEL: ord_one_xor_f32:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomiss %xmm1, %xmm0
 ; SSE2-NEXT: setnp %cl
 ; SSE2-NEXT: ucomiss %xmm3, %xmm2
 ; SSE2-NEXT: setne %al
 ; SSE2-NEXT: xorb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: ord_one_xor_f32:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomiss %xmm1, %xmm0
-; AVX-NEXT: setnp %cl
-; AVX-NEXT: vucomiss %xmm3, %xmm2
-; AVX-NEXT: setne %al
-; AVX-NEXT: xorb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: ord_one_xor_f32:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpneq_oqps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpordps %xmm1, %xmm0, %xmm0
+; AVX1-NEXT: vxorps %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: ord_one_xor_f32:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpneq_oqps %zmm3, %zmm2, %k0
+; AVX512-NEXT: vcmpordps %zmm1, %zmm0, %k1
+; AVX512-NEXT: kxorw %k0, %k1, %k0
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp ord float %w, %x
   %f2 = fcmp one float %y, %z
   %r = xor i1 %f1, %f2
   ret i1 %r
 }

 define i1 @une_ugt_and_f64(double %w, double %x, double %y, double %z) {
 ; SSE2-LABEL: une_ugt_and_f64:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomisd %xmm1, %xmm0
 ; SSE2-NEXT: setp %al
 ; SSE2-NEXT: setne %cl
 ; SSE2-NEXT: orb %al, %cl
 ; SSE2-NEXT: ucomisd %xmm2, %xmm3
 ; SSE2-NEXT: setb %al
 ; SSE2-NEXT: andb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: une_ugt_and_f64:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomisd %xmm1, %xmm0
-; AVX-NEXT: setp %al
-; AVX-NEXT: setne %cl
-; AVX-NEXT: orb %al, %cl
-; AVX-NEXT: vucomisd %xmm2, %xmm3
-; AVX-NEXT: setb %al
-; AVX-NEXT: andb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: une_ugt_and_f64:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpnlepd %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpneqpd %xmm1, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: une_ugt_and_f64:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpneqpd %zmm1, %zmm0, %k1
+; AVX512-NEXT: vcmpnlepd %zmm3, %zmm2, %k0 {%k1}
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp une double %w, %x
   %f2 = fcmp ugt double %y, %z
   %r = and i1 %f1, %f2
   ret i1 %r
 }

 define i1 @ult_uge_or_f64(double %w, double %x, double %y, double %z) {
 ; SSE2-LABEL: ult_uge_or_f64:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomisd %xmm1, %xmm0
 ; SSE2-NEXT: setb %cl
 ; SSE2-NEXT: ucomisd %xmm2, %xmm3
 ; SSE2-NEXT: setbe %al
 ; SSE2-NEXT: orb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: ult_uge_or_f64:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomisd %xmm1, %xmm0
-; AVX-NEXT: setb %cl
-; AVX-NEXT: vucomisd %xmm2, %xmm3
-; AVX-NEXT: setbe %al
-; AVX-NEXT: orb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: ult_uge_or_f64:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpnltpd %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpnlepd %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: vorpd %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: ult_uge_or_f64:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpnltpd %zmm3, %zmm2, %k0
+; AVX512-NEXT: vcmpnlepd %zmm0, %zmm1, %k1
+; AVX512-NEXT: korw %k0, %k1, %k0
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp ult double %w, %x
   %f2 = fcmp uge double %y, %z
   %r = or i1 %f1, %f2
   ret i1 %r
 }

 define i1 @une_uno_xor_f64(double %w, double %x, double %y, double %z) {
 ; SSE2-LABEL: une_uno_xor_f64:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomisd %xmm1, %xmm0
 ; SSE2-NEXT: setp %al
 ; SSE2-NEXT: setne %cl
 ; SSE2-NEXT: orb %al, %cl
 ; SSE2-NEXT: ucomisd %xmm3, %xmm2
 ; SSE2-NEXT: setp %al
 ; SSE2-NEXT: xorb %cl, %al
 ; SSE2-NEXT: retq
 ;
-; AVX-LABEL: une_uno_xor_f64:
-; AVX: # %bb.0:
-; AVX-NEXT: vucomisd %xmm1, %xmm0
-; AVX-NEXT: setp %al
-; AVX-NEXT: setne %cl
-; AVX-NEXT: orb %al, %cl
-; AVX-NEXT: vucomisd %xmm3, %xmm2
-; AVX-NEXT: setp %al
-; AVX-NEXT: xorb %cl, %al
-; AVX-NEXT: retq
+; AVX1-LABEL: une_uno_xor_f64:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vcmpunordpd %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vcmpneqpd %xmm1, %xmm0, %xmm0
+; AVX1-NEXT: vxorpd %xmm2, %xmm0, %xmm0
+; AVX1-NEXT: vmovd %xmm0, %eax
+; AVX1-NEXT: # kill: def $al killed $al killed $eax
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: une_uno_xor_f64:
+; AVX512: # %bb.0:
+; AVX512-NEXT: # kill: def $xmm3 killed $xmm3 def $zmm3
+; AVX512-NEXT: # kill: def $xmm2 killed $xmm2 def $zmm2
+; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
+; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT: vcmpunordpd %zmm3, %zmm2, %k0
+; AVX512-NEXT: vcmpneqpd %zmm1, %zmm0, %k1
+; AVX512-NEXT: kxorw %k0, %k1, %k0
+; AVX512-NEXT: kmovw %k0, %eax
+; AVX512-NEXT: # kill: def $al killed $al killed $eax
+; AVX512-NEXT: vzeroupper
+; AVX512-NEXT: retq
   %f1 = fcmp une double %w, %x
   %f2 = fcmp uno double %y, %z
   %r = xor i1 %f1, %f2
   ret i1 %r
 }

+; This uses ucomis because the types do not match.
Inline comment, RKSimon: These could still be merged (as long as we only use the lower 32 bits)?
Inline comment, spatel: Yes - we'd need to add a cast to make it work, but hopefully that would get folded away at some point. I'll add a TODO for now.
+; TODO: Merge down to narrow type?

 define i1 @olt_olt_and_f32_f64(float %w, float %x, double %y, double %z) {
 ; SSE2-LABEL: olt_olt_and_f32_f64:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomiss %xmm0, %xmm1
 ; SSE2-NEXT: seta %cl
 ; SSE2-NEXT: ucomisd %xmm2, %xmm3
 ; SSE2-NEXT: seta %al
 ; SSE2-NEXT: andb %cl, %al
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: olt_olt_and_f32_f64:
 ; AVX: # %bb.0:
 ; AVX-NEXT: vucomiss %xmm0, %xmm1
 ; AVX-NEXT: seta %cl
 ; AVX-NEXT: vucomisd %xmm2, %xmm3
 ; AVX-NEXT: seta %al
 ; AVX-NEXT: andb %cl, %al
 ; AVX-NEXT: retq
Inline comment, pengfei: Should we add a common `CHECK` for them?
Inline comment, spatel: It's the "v" in the mnemonics that makes these different. Do we have a scrubber option in the script for that?
Inline comment, pengfei: Oh, yes. I don't think there's such an option. I did have the same thought, but I think it's not easy to achieve. It's valuable only when we check SSE and AVX instructions at the same time. Omitting 'v' unconditionally is overkill.
   %f1 = fcmp olt float %w, %x
   %f2 = fcmp olt double %y, %z
   %r = and i1 %f1, %f2
   ret i1 %r
 }

+; This uses ucomis because of extra uses.

 define i1 @une_uno_xor_f64_use1(double %w, double %x, double %y, double %z, i1* %p) {
 ; SSE2-LABEL: une_uno_xor_f64_use1:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomisd %xmm1, %xmm0
 ; SSE2-NEXT: setp %al
 ; SSE2-NEXT: setne %cl
 ; SSE2-NEXT: orb %al, %cl
 ; SSE2-NEXT: movb %cl, (%rdi)
 ... (15 lines not shown)
 ; AVX-NEXT: retq
   %f1 = fcmp une double %w, %x
   store i1 %f1, i1* %p
   %f2 = fcmp uno double %y, %z
   %r = xor i1 %f1, %f2
   ret i1 %r
 }

+; This uses ucomis because of extra uses.

 define i1 @une_uno_xor_f64_use2(double %w, double %x, double %y, double %z, i1* %p) {
 ; SSE2-LABEL: une_uno_xor_f64_use2:
 ; SSE2: # %bb.0:
 ; SSE2-NEXT: ucomisd %xmm1, %xmm0
 ; SSE2-NEXT: setp %al
 ; SSE2-NEXT: setne %cl
 ; SSE2-NEXT: orb %al, %cl
 ; SSE2-NEXT: ucomisd %xmm3, %xmm2
 ... (22 lines not shown)

llvm/test/CodeGen/X86/lzcnt-zext-cmp.ll

 ...
 }

 ; PR31902 Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math.
 define i32 @test_zext_cmp11(double %a, double %b) "no-nans-fp-math"="true" {
 ;
 ; ALL-LABEL: test_zext_cmp11:
 ; ALL: # %bb.0: # %entry
 ; ALL-NEXT: vxorpd %xmm2, %xmm2, %xmm2
-; ALL-NEXT: vucomisd %xmm2, %xmm0
-; ALL-NEXT: sete %al
-; ALL-NEXT: vucomisd %xmm2, %xmm1
-; ALL-NEXT: sete %cl
-; ALL-NEXT: orb %al, %cl
-; ALL-NEXT: movzbl %cl, %eax
+; ALL-NEXT: vcmpeqpd %xmm2, %xmm1, %xmm1
+; ALL-NEXT: vcmpeqpd %xmm2, %xmm0, %xmm0
+; ALL-NEXT: vorpd %xmm1, %xmm0, %xmm0
+; ALL-NEXT: vmovd %xmm0, %eax
+; ALL-NEXT: andl $1, %eax
 ; ALL-NEXT: retq
 entry:
   %cmp = fcmp fast oeq double %a, 0.000000e+00
   %cmp1 = fcmp fast oeq double %b, 0.000000e+00
   %0 = or i1 %cmp, %cmp1
   %conv = zext i1 %0 to i32
   ret i32 %conv
 }