
[PowerPC] Materialize special ConstantFP using instructions instead of load from TOC
AbandonedPublic

Authored by tingwang on Jan 26 2022, 10:24 PM.

Details

Reviewers
nemanjai
jsji
shchenz
qiucf
Group Reviewers
Restricted Project
Summary

A ConstantFP in the range [-16, 15] (excluding 0.0, which is handled by XXLXOR) that converts exactly to an integer can be materialized into a register using VSPLTISW and XVCVSXWDP. This reduces TOC usage, eliminates a memory reference, and may save some TOC pointer calculations.

A ConstantFP is normally expanded into a load from the TOC during ExpandNode. This patch identifies the opportunity before expansion, and performs the manual instruction selection later. The LIT tests show this patch exposes some other issues; please see my comments on the test scripts.

Since Power10 uses prefixed instructions to materialize a ConstantFP, this change should perhaps target Power9.

Diff Detail

Unit Tests: Failed

Time / Test
60,090 ms: x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,170 ms: x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c

Event Timeline

tingwang created this revision.Jan 26 2022, 10:24 PM
tingwang requested review of this revision.Jan 26 2022, 10:24 PM
tingwang added inline comments.Jan 26 2022, 10:26 PM
llvm/test/CodeGen/PowerPC/handle-f16-storage-type.ll
1255

The "beqlr 0" was converted into a "beq 0, .LBB0_2; .LBB0_2: blr" sequence, introducing a dummy jump. This may be one issue.

llvm/test/CodeGen/PowerPC/scalar_cmp.ll
908

Given that the operand is known and isFPImmLegal(<negative>) passes, the fsub is converted into an fadd:
Combining: t15: f64 = fsub t2, ConstantFP:f64<1.000000e+00>
Creating fp constant: t17: f64 = ConstantFP<-1.000000e+00>
Creating new node: t18: f64 = fadd t2, ConstantFP:f64<-1.000000e+00>
... into: t18: f64 = fadd t2, ConstantFP:f64<-1.000000e+00>

932

This may be another issue.
Given the DAG below:

  t16: i1 = setcc t2, ConstantFP:f64<1.000000e+00>, setlt:ch                                               
  t18: i1 = setcc t2, ConstantFP:f64<1.000000e+00>, setuo:ch                                                          
t19: i1 = or t16, t18

Some combine logic optimized the second setcc:
Combining: t18: i1 = setcc t2, ConstantFP:f64<1.000000e+00>, setuo:ch
Creating new node: t23: i1 = setcc t2, t2, setuo:ch

This results in two compare statements which could have been executed as a single fcmpu had there been no combine:

  t16: i1 = setcc t2, ConstantFP:f64<1.000000e+00>, setlt:ch
  t23: i1 = setcc t2, t2, setuo:ch
t19: i1 = or t16, t23
tingwang updated this revision to Diff 406305.Feb 6 2022, 5:31 PM

Used clang-format to tidy the code style

tingwang marked 2 inline comments as not done.Feb 6 2022, 5:33 PM

I am wondering if you have done any performance comparison or even just latency/throughput "back-of-the-envelope" computation for this. It doesn't seem obvious to me that this is better than a CP load. The conversion is a fairly expensive instruction. Another less compelling thing to consider is that in high register pressure situations (for VMX registers), the VSPLTISW may cause a spill.

One thing to note is that this necessarily replaces the cracked lfs instruction which is also expensive, so this transformation may be worthwhile. However, I wonder how this transformation would compare to stopping the conversion from double precision to single precision for constants on Power9 (i.e. return false from ShouldShrinkFPConstant() on Power9). This would of course grow the constant pool in situations where a lot of compact constants are loaded, but presumably this is a somewhat rare situation.


Thank you for looking into this. I'm planning to do a performance comparison, and will also compare against stopping the conversion from double to single precision as you mentioned (LFS vs. LFD). I agree that VSPLTISW possibly causing a vector spill is a concern, and that cost is higher than spilling the fixed-point register used for the CP load, although this depends on register usage.

A simple calculation shows that from an instruction latency perspective (P9InstrResources.td), VSPLTISW + XVCVSXWDP is 4 + 7 = 11, and the CP load ADDI + ADDIS + LFS is 2 + 2 + 7 = 11, which is the same. Although the CP load has a memory access, its ADDIS may be saved across consecutive CP loads. This looks like a draw, so I do need to check the real performance.


Hi Nemanja, I measured no performance change on SPEC base fprate, so I think I should abandon this one. Sorry for the bother. By the way, flipping ShouldShrinkFPConstant() improved single-copy base fprate by about 0.2%, though that is a single-run result.

tingwang abandoned this revision.Feb 18 2022, 12:58 AM

No perf gain according to SPEC testing.