This is an archive of the discontinued LLVM Phabricator instance.


Differential D60037

[PowerPC] Use the two-constant NR algorithm for refining estimates
Closed, Public

Authored by nemanjai on Mar 30 2019, 4:39 PM.


Details

Reviewers

hfinkel
renenkel
jsji
stefanp

Commits

rZORG4cb7642591b2: [PowerPC] Use the two-constant NR algorithm for refining estimates
rZORGef73458c9b4c: [PowerPC] Use the two-constant NR algorithm for refining estimates
rG4cb7642591b2: [PowerPC] Use the two-constant NR algorithm for refining estimates
rGef73458c9b4c: [PowerPC] Use the two-constant NR algorithm for refining estimates
rGb4f028f0f3f6: [PowerPC] Use the two-constant NR algorithm for refining estimates
rL360144: [PowerPC] Use the two-constant NR algorithm for refining estimates

Summary

The single-constant algorithm produces infinities for many denormal inputs, while the precision of the two-constant algorithm is sufficient across the whole denormal range. We will switch to the two-constant algorithm for now to avoid those infinities, and re-evaluate the choice in the future to find the optimal algorithm for PowerPC.

Example:

$ cat a.c 
#include <stdio.h>
#include <math.h>
float __attribute__((noinline)) test(float f) { return sqrtf(f); }
int main(void) {
  return printf("sqrt(0.49e-43): %g\n", test(0.49e-43));
}

$ clang -Ofast a.c
$ ./a.out 
sqrt(0.49e-43): -inf

Desired output (and output with this patch applied):

$ ./a.out
sqrt(0.49e-43): 2.21462e-22
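The -inf comes from the intermediate E * E in the single-constant form: for a denormal A, the estimate E of 1/sqrt(A) is around 4.5e21, so E * E exceeds the float range. The C++ sketch below evaluates both refinement formulas in single precision to show this. The operation order follows the old and new instruction sequences checked in fmf-propagation.ll. `rsqrtEstimate`, `sqrtOneConstNR` and `sqrtTwoConstNR` are hypothetical names, and the estimate is assumed to be a correctly rounded 1/sqrt(A) rather than the hardware's lower-precision result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the hardware estimate (frsqrtes / xsrsqrtesp).
// Assumption: a correctly rounded 1/sqrt(a); the real instruction is less
// precise, which is why refinement steps follow it.
float rsqrtEstimate(float a) {
  return static_cast<float>(1.0 / std::sqrt(static_cast<double>(a)));
}

// Single-constant refinement, then multiply by A (the old sequence):
//   E1 = E * (1.5 - (0.5 * A) * (E * E)),  sqrt(A) ~= E1 * A
float sqrtOneConstNR(float a) {
  assert(a > 0.0f && "the sqrt lowering branches around A == 0");
  float e = rsqrtEstimate(a);
  float halfA = 1.5f * a - a;  // 0.5 * A, derived from the one constant 1.5
  float eSq = e * e;           // overflows to +inf when A is denormal
  float e1 = e * (1.5f - halfA * eSq);
  return e1 * a;
}

// Two-constant refinement with the final multiply by A folded in (the new
// sequence):  sqrt(A) ~= ((A * E) * -0.5) * ((A * E) * E + -3.0)
// A * E is already close to sqrt(A), so no intermediate overflows.
float sqrtTwoConstNR(float a) {
  assert(a > 0.0f && "the sqrt lowering branches around A == 0");
  float e = rsqrtEstimate(a);
  float ae = a * e;
  return (ae * -0.5f) * (ae * e + -3.0f);
}
```

With default (non-flush-to-zero) floating-point settings, `sqrtOneConstNR(0.49e-43f)` returns -inf, matching the output above, while `sqrtTwoConstNR(0.49e-43f)` stays finite and close to `std::sqrt`. For an ordinary input such as 2.0f, both are close to sqrt(2).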

We also ran this through a reasonable approximation of the full gamut of tests: 1,000,000 tests per exponent over the full single-precision range, compared against the precise hardware instruction. Here are the results (courtesy of @renenkel):

  Error     Share of tests
  0 ulps    72 %
  1 ulp     27 %
  2 ulps    0.032 %
  3 ulps    0 %
  >3 ulps   0.35 %

Max error: 2 ulps over the full range, except that the result for +Inf is NaN.
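The table counts how many units in the last place (ulps) each refined result differs from the precise hardware square root. One common way to measure that distance is to reinterpret float bit patterns as ordered integers, sketched below in C++. The helper names and the bucketing into 0, 1, 2, 3 and >3 ulps are assumptions for illustration; this is not the harness that produced the numbers above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float onto an integer line on which adjacent representable floats
// differ by exactly 1 (+0.0 and -0.0 both map to 0).
int64_t orderedBits(float f) {
  int32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  // Negative floats are sign-magnitude; reflect them below zero.
  return bits < 0 ? static_cast<int64_t>(INT32_MIN) - bits : bits;
}

// Distance in ulps between two finite floats.
int64_t ulpDistance(float a, float b) {
  assert(std::isfinite(a) && std::isfinite(b));
  int64_t d = orderedBits(a) - orderedBits(b);
  return d < 0 ? -d : d;
}

// Row of a 0 / 1 / 2 / 3 / >3 ulp histogram (index 4 means ">3").
// Assumption: identical results (including +Inf == +Inf) count as 0 ulps,
// and any other non-finite mismatch, such as NaN, is counted as ">3".
int ulpBucket(float approx, float precise) {
  if (approx == precise)
    return 0;
  if (!std::isfinite(approx) || !std::isfinite(precise))
    return 4;
  int64_t d = ulpDistance(approx, precise);
  return d > 3 ? 4 : static_cast<int>(d);
}
```

Working on the integer image keeps the distance exact across binade boundaries and through the denormal range, which is where the single-constant algorithm misbehaved.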

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. Mar 30 2019, 4:39 PM

Herald added a project: Restricted Project. Mar 30 2019, 4:39 PM

Herald added subscribers: jdoerfert, kbarton.

I have a couple of nits, the main one being the test case for fma-mutate.ll.
I'm not sure whether there is a way to use the one-constant NR for just that one test.

lib/Target/PowerPC/PPC.td
140	nit: I think it is "Newton-Raphson" not "Newton-Rhapson".
lib/Target/PowerPC/PPCISelLowering.cpp
11148–11150	Same, nit: Newton-Raphson
test/CodeGen/PowerPC/fma-mutate.ll
18	With this new patch, this test no longer really tests everything the author wanted to test. As the author of the test mentions, the first transformation is not reasonable. However, with the two-constant Newton-Raphson sequence the opportunity for that first transformation is gone (the `fmr` instruction it relied on is no longer emitted). We can at least detect the second, legal transformation with a CHECK-NOT:
	; CHECK: @foo3
	; CHECK-NOT: fmr
	; CHECK: xsmaddmdp
	; CHECK: xsmaddadp

Fixed the typos, added a check for no register move.

I only have one comment here and I think it can be fixed when the patch is committed.
LGTM.

lib/Target/PowerPC/PPCSubtarget.h
101	I think that this variable needs to be initialized to `false` in `void PPCSubtarget::initializeEnvironment()`.

This revision is now accepted and ready to land. May 2 2019, 11:10 AM

nemanjai marked an inline comment as done. May 7 2019, 5:05 AM

nemanjai added inline comments.

lib/Target/PowerPC/PPC.td
140	Thanks. I'll fix it.
lib/Target/PowerPC/PPCSubtarget.h
101	Ah, good catch. I forgot about that.

Closed by commit rL360144: [PowerPC] Use the two-constant NR algorithm for refining estimates (authored by nemanjai). May 7 2019, 6:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
lib/Target/PowerPC/PPC.td                              5 lines
lib/Target/PowerPC/PPCISelLowering.cpp                 4 lines
lib/Target/PowerPC/PPCSubtarget.h                      2 lines
test/CodeGen/PowerPC/fma-mutate.ll                     3 lines
test/CodeGen/PowerPC/fmf-propagation.ll                69 lines
test/CodeGen/PowerPC/recipest.ll                       16 lines
test/CodeGen/PowerPC/vsx-fma-mutate-trivial-copy.ll    2 lines

Diff 197763

lib/Target/PowerPC/PPC.td

@@ ... @@
 def FeaturePPC6xx : SubtargetFeature<"ppc6xx", "IsPPC6xx", "true",
 "Enable PPC 6xx instructions">;
 def FeatureQPX : SubtargetFeature<"qpx","HasQPX", "true",
 "Enable QPX instructions",
 [FeatureFPU]>;
 def FeatureVSX : SubtargetFeature<"vsx","HasVSX", "true",
 "Enable VSX instructions",
 [FeatureAltivec]>;
+def FeatureTwoConstNR :
+SubtargetFeature<"two-const-nr", "NeedsTwoConstNR", "true",
+"Requires two constant Newton-Raphson computation">;
 def FeatureP8Altivec : SubtargetFeature<"power8-altivec", "HasP8Altivec", "true",
 "Enable POWER8 Altivec instructions",
 [FeatureAltivec]>;
 def FeatureP8Crypto : SubtargetFeature<"crypto", "HasP8Crypto", "true",
 "Enable POWER8 Crypto instructions",
 [FeatureP8Altivec]>;
 def FeatureP8Vector : SubtargetFeature<"power8-vector", "HasP8Vector", "true",
 "Enable POWER8 vector instructions",
@@ ... @@ list<SubtargetFeature> Power7FeatureList =
 [DirectivePwr7, FeatureAltivec, FeatureVSX,
 FeatureMFOCRF, FeatureFCPSGN, FeatureFSqrt, FeatureFRE,
 FeatureFRES, FeatureFRSQRTE, FeatureFRSQRTES,
 FeatureRecipPrec, FeatureSTFIWX, FeatureLFIWAX,
 FeatureFPRND, FeatureFPCVT, FeatureISEL,
 FeaturePOPCNTD, FeatureCMPB, FeatureLDBRX,
 Feature64Bit /*, Feature64BitRegs */,
 FeatureBPERMD, FeatureExtDiv,
-FeatureMFTB, DeprecatedDST];
+FeatureMFTB, DeprecatedDST, FeatureTwoConstNR];
 list<SubtargetFeature> Power8SpecificFeatures =
 [DirectivePwr8, FeatureP8Altivec, FeatureP8Vector, FeatureP8Crypto,
 FeatureHTM, FeatureDirectMove, FeatureICBT, FeaturePartwordAtomic,
 FeatureFusion];
 list<SubtargetFeature> Power8FeatureList =
 !listconcat(Power7FeatureList, Power8SpecificFeatures);
 list<SubtargetFeature> Power9SpecificFeatures =
 [DirectivePwr9, FeatureP9Altivec, FeatureP9Vector, FeatureISA3_0,
@@ ... @@

lib/Target/PowerPC/PPCISelLowering.cpp

@@ ... @@ if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
 (VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
 (VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
 (VT == MVT::v2f64 && Subtarget.hasVSX()) ||
 (VT == MVT::v4f32 && Subtarget.hasQPX()) ||
 (VT == MVT::v4f64 && Subtarget.hasQPX())) {
 if (RefinementSteps == ReciprocalEstimate::Unspecified)
 RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);

-UseOneConstNR = true;
+// The Newton-Raphson computation with a single constant does not provide
+// enough accuracy on some CPUs.
+UseOneConstNR = !Subtarget.needsTwoConstNR();
 return DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);
 }
 return SDValue();
 }

 SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand, SelectionDAG &DAG,
 int Enabled,
 int &RefinementSteps) const {
@@ ... @@
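With `UseOneConstNR` now false on subtargets that set `NeedsTwoConstNR`, the estimate returned here is refined with the two-constant form. The C++ sketch below shows how the number of refinement steps (the role `RefinementSteps` plays above) affects accuracy when starting from a deliberately rough estimate. `roughRsqrt`, `refineTwoConst` and the 1/32 starting error are hypothetical. The recipest.ll checks in this diff show one refinement sequence for f32 and two for f64 on pwr7.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical rough estimate: the exact 1/sqrt(a) scaled by (1 + relErr).
// Assumption for illustration only; real estimate errors are input-dependent.
double roughRsqrt(double a, double relErr) {
  return (1.0 / std::sqrt(a)) * (1.0 + relErr);
}

// Apply `steps` two-constant Newton-Raphson steps to E ~= 1/sqrt(A):
//   E' = (E * -0.5) * ((A * E) * E + -3.0)
// then return E (reciprocal) or A * E (square root). The real lowering folds
// the multiply by A into the last step; this sketch applies it afterwards.
double refineTwoConst(double a, double e, int steps, bool reciprocal) {
  for (int i = 0; i < steps; ++i)
    e = (e * -0.5) * ((a * e) * e + -3.0);
  return reciprocal ? e : a * e;
}
```

Each step maps E = (1 + ε)/sqrt(A) to (1 - 1.5ε² - 0.5ε³)/sqrt(A), so starting from ε = 1/32 the relative error is about 1.5e-3 after one step, 3.3e-6 after two and 1.6e-11 after three.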

lib/Target/PowerPC/PPCSubtarget.h

@@ ... @@ protected:
 bool UseCRBits;
 bool HasHardFloat;
 bool IsPPC64;
 bool HasAltivec;
 bool HasFPU;
 bool HasSPE;
 bool HasQPX;
 bool HasVSX;
+bool NeedsTwoConstNR;
 bool HasP8Vector;
 bool HasP8Altivec;
 bool HasP8Crypto;
 bool HasP9Vector;
 bool HasP9Altivec;
 bool HasFCPSGN;
 bool HasFSQRT;
 bool HasFRE, HasFRES, HasFRSQRTE, HasFRSQRTES;
@@ ... @@ public:
 bool hasLFIWAX() const { return HasLFIWAX; }
 bool hasFPRND() const { return HasFPRND; }
 bool hasFPCVT() const { return HasFPCVT; }
 bool hasAltivec() const { return HasAltivec; }
 bool hasSPE() const { return HasSPE; }
 bool hasFPU() const { return HasFPU; }
 bool hasQPX() const { return HasQPX; }
 bool hasVSX() const { return HasVSX; }
+bool needsTwoConstNR() const { return NeedsTwoConstNR; }
 bool hasP8Vector() const { return HasP8Vector; }
 bool hasP8Altivec() const { return HasP8Altivec; }
 bool hasP8Crypto() const { return HasP8Crypto; }
 bool hasP9Vector() const { return HasP9Vector; }
 bool hasP9Altivec() const { return HasP9Altivec; }
 bool hasMFOCRF() const { return HasMFOCRF; }
 bool hasISEL() const { return HasISEL; }
 bool hasBPERMD() const { return HasBPERMD; }
@@ ... @@
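stefanp's review note asks for the new `NeedsTwoConstNR` member to be set to `false` in `PPCSubtarget::initializeEnvironment()`. A flag that a feature only ever sets to `true` would otherwise be read uninitialized on CPUs without that feature. The C++ sketch below models the pattern with a hypothetical `SubtargetSketch` class, not the real `PPCSubtarget`. A plain list of feature names stands in for the TableGen-generated feature parsing, and `useOneConstNR` mirrors the new lowering decision.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, much-simplified model of a subtarget; not the real
// PPCSubtarget. A list of feature names stands in for the
// TableGen-generated feature parsing.
class SubtargetSketch {
  bool HasVSX;
  bool NeedsTwoConstNR;

  // Give every flag a default first. A flag that features only ever set to
  // true would otherwise be read uninitialized on CPUs lacking the feature.
  void initializeEnvironment() {
    HasVSX = false;
    NeedsTwoConstNR = false;
  }

  void applyFeature(const std::string &Name) {
    if (Name == "vsx")
      HasVSX = true;
    else if (Name == "two-const-nr")
      NeedsTwoConstNR = true;
  }

public:
  explicit SubtargetSketch(const std::vector<std::string> &Features) {
    initializeEnvironment();
    for (const std::string &F : Features)
      applyFeature(F);
  }

  bool hasVSX() const { return HasVSX; }
  bool needsTwoConstNR() const { return NeedsTwoConstNR; }
};

// Mirrors the new lowering decision: UseOneConstNR = !needsTwoConstNR().
bool useOneConstNR(const SubtargetSketch &ST) { return !ST.needsTwoConstNR(); }
```

A CPU whose feature list includes "two-const-nr" (as Power7FeatureList now does, and therefore the Power8 and Power9 lists built from it) gets the two-constant sequence. Any other CPU falls back to the one-constant sequence because of the explicit `false` default.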

test/CodeGen/PowerPC/fma-mutate.ll

 ; Test several VSX FMA mutation opportunities. The first one isn't a
 ; reasonable transformation because the killed product register is the
 ; same as the FMA target register. The second one is legal. The third
 ; one doesn't fit the feeding-copy pattern.

 ; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 declare double @llvm.sqrt.f64(double)

 define double @foo3(double %a) nounwind {
 %r = call double @llvm.sqrt.f64(double %a)
 ret double %r

 ; CHECK: @foo3
-; CHECK: fmr [[REG:[0-9]+]], [[REG2:[0-9]+]]
-; CHECK: xsnmsubadp [[REG]], {{[0-9]+}}, [[REG2]]
+; CHECK-NOT: fmr
 ; CHECK: xsmaddmdp
 ; CHECK: xsmaddadp
 }

test/CodeGen/PowerPC/fmf-propagation.ll

@@ ... @@

 define float @sqrt_afn(float %x) {
 ; FMF-LABEL: sqrt_afn:
 ; FMF: # %bb.0:
 ; FMF-NEXT: xxlxor 0, 0, 0
 ; FMF-NEXT: fcmpu 0, 1, 0
 ; FMF-NEXT: beq 0, .LBB10_2
 ; FMF-NEXT: # %bb.1:
+; FMF-NEXT: xsrsqrtesp 0, 1
 ; FMF-NEXT: addis 3, 2, .LCPI10_0@toc@ha
-; FMF-NEXT: xsrsqrtesp 3, 1
-; FMF-NEXT: lfs 0, .LCPI10_0@toc@l(3)
-; FMF-NEXT: xsmulsp 2, 1, 0
-; FMF-NEXT: xsmulsp 4, 3, 3
-; FMF-NEXT: xssubsp 2, 2, 1
-; FMF-NEXT: xsmulsp 2, 2, 4
-; FMF-NEXT: xssubsp 0, 0, 2
-; FMF-NEXT: xsmulsp 0, 3, 0
-; FMF-NEXT: xsmulsp 0, 0, 1
+; FMF-NEXT: addis 4, 2, .LCPI10_1@toc@ha
+; FMF-NEXT: lfs 2, .LCPI10_0@toc@l(3)
+; FMF-NEXT: lfs 3, .LCPI10_1@toc@l(4)
+; FMF-NEXT: xsmulsp 1, 1, 0
+; FMF-NEXT: xsmulsp 0, 1, 0
+; FMF-NEXT: xsmulsp 1, 1, 2
+; FMF-NEXT: xsaddsp 0, 0, 3
+; FMF-NEXT: xsmulsp 0, 1, 0
 ; FMF-NEXT: .LBB10_2:
 ; FMF-NEXT: fmr 1, 0
 ; FMF-NEXT: blr
 ;
 ; GLOBAL-LABEL: sqrt_afn:
 ; GLOBAL: # %bb.0:
 ; GLOBAL-NEXT: xxlxor 0, 0, 0
 ; GLOBAL-NEXT: fcmpu 0, 1, 0
 ; GLOBAL-NEXT: beq 0, .LBB10_2
 ; GLOBAL-NEXT: # %bb.1:
-; GLOBAL-NEXT: xsrsqrtesp 2, 1
-; GLOBAL-NEXT: fneg 0, 1
+; GLOBAL-NEXT: xsrsqrtesp 0, 1
 ; GLOBAL-NEXT: addis 3, 2, .LCPI10_0@toc@ha
-; GLOBAL-NEXT: fmr 4, 1
-; GLOBAL-NEXT: lfs 3, .LCPI10_0@toc@l(3)
-; GLOBAL-NEXT: xsmaddasp 4, 0, 3
-; GLOBAL-NEXT: xsmulsp 0, 2, 2
-; GLOBAL-NEXT: xsmaddasp 3, 4, 0
-; GLOBAL-NEXT: xsmulsp 0, 2, 3
-; GLOBAL-NEXT: xsmulsp 0, 0, 1
+; GLOBAL-NEXT: addis 4, 2, .LCPI10_1@toc@ha
+; GLOBAL-NEXT: lfs 2, .LCPI10_0@toc@l(3)
+; GLOBAL-NEXT: lfs 3, .LCPI10_1@toc@l(4)
+; GLOBAL-NEXT: xsmulsp 1, 1, 0
+; GLOBAL-NEXT: xsmaddasp 2, 1, 0
+; GLOBAL-NEXT: xsmulsp 0, 1, 3
+; GLOBAL-NEXT: xsmulsp 0, 0, 2
 ; GLOBAL-NEXT: .LBB10_2:
 ; GLOBAL-NEXT: fmr 1, 0
 ; GLOBAL-NEXT: blr
 %rt = call afn float @llvm.sqrt.f32(float %x)
 ret float %rt
 }

 ; The call is now fully 'fast'. This implies that approximation is allowed.

 ; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'
 ; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
 ; FMFDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'

 ; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'
 ; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
 ; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'

 define float @sqrt_fast(float %x) {
 ; FMF-LABEL: sqrt_fast:
 ; FMF: # %bb.0:
 ; FMF-NEXT: xxlxor 0, 0, 0
 ; FMF-NEXT: fcmpu 0, 1, 0
 ; FMF-NEXT: beq 0, .LBB11_2
 ; FMF-NEXT: # %bb.1:
-; FMF-NEXT: xsrsqrtesp 2, 1
-; FMF-NEXT: fneg 0, 1
+; FMF-NEXT: xsrsqrtesp 0, 1
 ; FMF-NEXT: addis 3, 2, .LCPI11_0@toc@ha
-; FMF-NEXT: fmr 4, 1
-; FMF-NEXT: lfs 3, .LCPI11_0@toc@l(3)
-; FMF-NEXT: xsmaddasp 4, 0, 3
-; FMF-NEXT: xsmulsp 0, 2, 2
-; FMF-NEXT: xsmaddasp 3, 4, 0
-; FMF-NEXT: xsmulsp 0, 2, 3
-; FMF-NEXT: xsmulsp 0, 0, 1
+; FMF-NEXT: addis 4, 2, .LCPI11_1@toc@ha
+; FMF-NEXT: lfs 2, .LCPI11_0@toc@l(3)
+; FMF-NEXT: lfs 3, .LCPI11_1@toc@l(4)
+; FMF-NEXT: xsmulsp 1, 1, 0
+; FMF-NEXT: xsmaddasp 2, 1, 0
+; FMF-NEXT: xsmulsp 0, 1, 3
+; FMF-NEXT: xsmulsp 0, 0, 2
 ; FMF-NEXT: .LBB11_2:
 ; FMF-NEXT: fmr 1, 0
 ; FMF-NEXT: blr
 ;
 ; GLOBAL-LABEL: sqrt_fast:
 ; GLOBAL: # %bb.0:
 ; GLOBAL-NEXT: xxlxor 0, 0, 0
 ; GLOBAL-NEXT: fcmpu 0, 1, 0
 ; GLOBAL-NEXT: beq 0, .LBB11_2
 ; GLOBAL-NEXT: # %bb.1:
-; GLOBAL-NEXT: xsrsqrtesp 2, 1
-; GLOBAL-NEXT: fneg 0, 1
+; GLOBAL-NEXT: xsrsqrtesp 0, 1
 ; GLOBAL-NEXT: addis 3, 2, .LCPI11_0@toc@ha
-; GLOBAL-NEXT: fmr 4, 1
-; GLOBAL-NEXT: lfs 3, .LCPI11_0@toc@l(3)
-; GLOBAL-NEXT: xsmaddasp 4, 0, 3
-; GLOBAL-NEXT: xsmulsp 0, 2, 2
-; GLOBAL-NEXT: xsmaddasp 3, 4, 0
-; GLOBAL-NEXT: xsmulsp 0, 2, 3
-; GLOBAL-NEXT: xsmulsp 0, 0, 1
+; GLOBAL-NEXT: addis 4, 2, .LCPI11_1@toc@ha
+; GLOBAL-NEXT: lfs 2, .LCPI11_0@toc@l(3)
+; GLOBAL-NEXT: lfs 3, .LCPI11_1@toc@l(4)
+; GLOBAL-NEXT: xsmulsp 1, 1, 0
+; GLOBAL-NEXT: xsmaddasp 2, 1, 0
+; GLOBAL-NEXT: xsmulsp 0, 1, 3
+; GLOBAL-NEXT: xsmulsp 0, 0, 2
 ; GLOBAL-NEXT: .LBB11_2:
 ; GLOBAL-NEXT: fmr 1, 0
 ; GLOBAL-NEXT: blr
 %rt = call fast float @llvm.sqrt.f32(float %x)
 ret float %rt
 }

 ; fcmp can have fast-math-flags.
@@ ... @@
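Reading the new sqrt_fast check lines (and the GLOBAL sqrt_afn ones) as arithmetic shows that they compute the two-constant form sqrt(A) = ((A * E) * -0.5) * ((A * E) * E + -3.0). The C++ sketch below transliterates that sequence instruction by instruction, with register names kept as variable names. It is an assumption-laden sketch. The test does not print the constant-pool values, so -3.0 and -0.5 are inferred from the formula. An exact 1/sqrt stands in for `xsrsqrtesp`, and `sqrtFastSequence` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical C++ transliteration of the new sqrt_fast check lines.
// Comments give the matching instruction. The constant-pool values are not
// printed by the test; -3.0 and -0.5 are inferred from the two-constant
// formula and are an assumption.
float sqrtFastSequence(float x /* f1 */) {
  if (x == 0.0f)                   // xxlxor 0,0,0; fcmpu 0,1,0; beq
    return 0.0f;                   // f0 is already 0.0 on this path
  float f0 = 1.0f / std::sqrt(x);  // xsrsqrtesp 0, 1 (exact stand-in)
  float f2 = -3.0f;                // lfs 2, .LCPI11_0 (assumed value)
  float f3 = -0.5f;                // lfs 3, .LCPI11_1 (assumed value)
  float f1 = x * f0;               // xsmulsp 1, 1, 0   -> A * E
  f2 = std::fma(f1, f0, f2);       // xsmaddasp 2, 1, 0 -> (A * E) * E - 3.0
  f0 = f1 * f3;                    // xsmulsp 0, 1, 3   -> (A * E) * -0.5
  f0 = f0 * f2;                    // xsmulsp 0, 0, 2
  return f0;                       // fmr 1, 0; blr
}
```

Because the first product is A * E rather than E * E, the sequence no longer needs the `fneg`/`fmr` copies of the old one-constant version, and nothing overflows for denormal inputs.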

test/CodeGen/PowerPC/recipest.ll

 ; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s
 ; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s

 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 declare double @llvm.sqrt.f64(double)
 declare float @llvm.sqrt.f32(float)
 declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

 define double @foo(double %a, double %b) nounwind {
 %x = call double @llvm.sqrt.f64(double %b)
 %r = fdiv double %a, %x
 ret double %r

 ; CHECK: @foo
-; CHECK-DAG: frsqrte
-; CHECK-DAG: fnmsub
+; CHECK: frsqrte
 ; CHECK: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmul
+; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmul
+; CHECK-NEXT: fmul
 ; CHECK: blr

 ; CHECK-SAFE: @foo
 ; CHECK-SAFE: fsqrt
 ; CHECK-SAFE: fdiv
 ; CHECK-SAFE: blr
 }

@@ ... @@
 define double @foof(double %a, float %b) nounwind {
 %x = call float @llvm.sqrt.f32(float %b)
 %y = fpext float %x to double
 %r = fdiv double %a, %y
 ret double %r

 ; CHECK: @foof
 ; CHECK-DAG: frsqrtes
-; CHECK-DAG: fnmsubs
 ; CHECK: fmuls
 ; CHECK-NEXT: fmadds
 ; CHECK-NEXT: fmuls
+; CHECK-NEXT: fmuls
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: blr

 ; CHECK-SAFE: @foof
 ; CHECK-SAFE: fsqrts
 ; CHECK-SAFE: fdiv
 ; CHECK-SAFE: blr
 }

 define float @food(float %a, double %b) nounwind {
 %x = call double @llvm.sqrt.f64(double %b)
 %y = fptrunc double %x to float
 %r = fdiv float %a, %y
 ret float %r

 ; CHECK: @foo
 ; CHECK-DAG: frsqrte
-; CHECK-DAG: fnmsub
 ; CHECK: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmul
+; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
+; CHECK-NEXT: fmul
 ; CHECK-NEXT: frsp
 ; CHECK-NEXT: fmuls
 ; CHECK-NEXT: blr

 ; CHECK-SAFE: @foo
 ; CHECK-SAFE: fsqrt
 ; CHECK-SAFE: fdivs
 ; CHECK-SAFE: blr
 }

 define float @goo(float %a, float %b) nounwind {
 %x = call float @llvm.sqrt.f32(float %b)
 %r = fdiv float %a, %x
 ret float %r

 ; CHECK: @goo
 ; CHECK-DAG: frsqrtes
-; CHECK-DAG: fnmsubs
 ; CHECK: fmuls
 ; CHECK-NEXT: fmadds
 ; CHECK-NEXT: fmuls
 ; CHECK-NEXT: fmuls
+; CHECK-NEXT: fmuls
 ; CHECK-NEXT: blr

 ; CHECK-SAFE: @goo
 ; CHECK-SAFE: fsqrts
 ; CHECK-SAFE: fdivs
 ; CHECK-SAFE: blr
 }

@@ ... @@ define float @rsqrt_fmul(float %a, float %b, float %c) {
 %z = fdiv float %c, %y
 ret float %z

 ; CHECK: @rsqrt_fmul
 ; CHECK-DAG: frsqrtes
 ; CHECK-DAG: fres
 ; CHECK-DAG: fnmsubs
 ; CHECK-DAG: fmuls
-; CHECK-DAG: fnmsubs
 ; CHECK-DAG: fmadds
 ; CHECK-DAG: fmadds
 ; CHECK: fmuls
 ; CHECK-NEXT: fmuls
 ; CHECK-NEXT: fmuls
 ; CHECK-NEXT: blr

 ; CHECK-SAFE: @rsqrt_fmul
@@ ... @@

 define double @foo3(double %a) nounwind {
 %r = call double @llvm.sqrt.f64(double %a)
 ret double %r

 ; CHECK: @foo3
 ; CHECK: fcmpu
 ; CHECK-DAG: frsqrte
-; CHECK-DAG: fnmsub
 ; CHECK: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmul
+; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmadd
 ; CHECK-NEXT: fmul
 ; CHECK-NEXT: fmul
 ; CHECK: blr

 ; CHECK-SAFE: @foo3
 ; CHECK-SAFE: fsqrt
 ; CHECK-SAFE: blr
 }

 define float @goo3(float %a) nounwind {
 %r = call float @llvm.sqrt.f32(float %a)
 ret float %r

 ; CHECK: @goo3
 ; CHECK: fcmpu
 ; CHECK-DAG: frsqrtes
-; CHECK-DAG: fnmsubs
 ; CHECK: fmuls
 ; CHECK-NEXT: fmadds
 ; CHECK-NEXT: fmuls
 ; CHECK-NEXT: fmuls
 ; CHECK: blr

 ; CHECK-SAFE: @goo3
 ; CHECK-SAFE: fsqrts
@@ ... @@

test/CodeGen/PowerPC/vsx-fma-mutate-trivial-copy.ll

 ; RUN: llc -verify-machineinstrs < %s | FileCheck %s
 target datalayout = "E-m:e-i64:64-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 ; Function Attrs: nounwind
 define void @LSH_recall_init(float %d_min, float %W) #0 {
 entry:
 br i1 undef, label %for.body.lr.ph, label %for.end

 ; CHECK-LABEL: @LSH_recall_init
-; CHECK: xsnmsubadp
+; CHECK: xsmaddadp

 for.body.lr.ph: ; preds = %entry
 %conv3 = fpext float %W to double
 br label %for.body

 for.body: ; preds = %for.body, %for.body.lr.ph
 %div = fdiv fast float 0.000000e+00, 0.000000e+00
 %add = fadd fast float %div, %d_min
@@ ... @@