This is an archive of the discontinued LLVM Phabricator instance.

[SLC] Simplify pow(x, 0.25) to sqrt(sqrt(x))
Abandoned (Public)

Authored by evandro on Jul 13 2018, 10:08 AM.

Details

Reviewers
spatel
efriedma
Summary

This transformation helps some benchmarks in SPEC CPU2006, such as 447.dealII, as well as some proprietary benchmarks. Otherwise, it causes no significant regressions on x86-64 or AArch64.
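As a minimal illustration of the identity this patch relies on (my sketch, not the patch itself; it assumes fast-math, i.e. non-negative x and no errno/exception concerns):

    #include <cmath>

    // pow(x, 0.25) and sqrt(sqrt(x)) agree for non-negative x, and two sqrt
    // instructions are typically much cheaper than a pow() library call.
    double quarter_power_libcall(double x) { return std::pow(x, 0.25); }
    double quarter_power_folded(double x)  { return std::sqrt(std::sqrt(x)); }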

Diff Detail

Event Timeline

evandro created this revision.Jul 13 2018, 10:08 AM
evandro set the repository for this revision to rL LLVM.

¡Ping! 🔔🔔

evandro updated this revision to Diff 158162.Jul 30 2018, 7:07 PM
evandro edited reviewers, added: efriedma; removed: eli.friedman.

Why is this the right IR canonicalization?

We have a bug report requesting the opposite transform here:
https://bugs.llvm.org/show_bug.cgi?id=35600
...citing the case of a chain of many nested sqrt() calls. If we're going to canonicalize a sequence of many sqrt() calls to pow() in IR, then we might as well do the same for sqrt(sqrt()) to be consistent. Then, in the backend, we convert pow(x, 0.25) back to sqrt(sqrt()) as a special case, because that's a perf win.
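As an illustration of the direction question (my sketch, assuming fast-math and non-negative x): a chain of nested sqrt() calls is equivalent to a single pow() whose exponent is repeatedly halved, which is why PR35600 asks for the sqrt-chain-to-pow() direction, while this patch proposes the reverse for the 0.25 case.

    #include <cmath>

    // Three nested sqrt() calls...
    double nested_sqrt(double x) { return std::sqrt(std::sqrt(std::sqrt(x))); }
    // ...compute the same value as pow() with exponent 0.5 * 0.5 * 0.5 = 0.125.
    double as_pow(double x)      { return std::pow(x, 0.125); }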

Why is this the right IR canonicalization?

It seems to be beneficial on targets that have a fast instruction for sqrt(). I expanded testing to targets with a slow sqrt() instruction, though, and the results were not as promising.

Why is this the right IR canonicalization?

It seems to be beneficial on targets that have a fast instruction for sqrt(). I expanded testing to targets with a slow sqrt() instruction, though, and the results were not as promising.

We are in the middle end here, though; targets are a back-end concern.
We do not try to produce IR that is optimal for some specific target.
We want to produce IR that is optimal at the middle level.
The back end (DAGCombine, ...) can and should further refine the IR into whatever is optimal for the actual target.

So +1 to the question of whether we should be doing the opposite transform here, and doing this one in the backend instead.

evandro abandoned this revision.Jul 31 2018, 2:30 PM

Agreed.

And the same remarks are probably applicable to D49040 and D50036 too, even though they only extend the existing code. Canonicalizing all the sqrt/cbrt/... to just one pow will make it simpler for everything else, since it's just one function to be aware of instead of three.

spatel added a comment.Aug 1 2018, 5:19 AM

And the same remarks are probably applicable to D49040 and D50036 too, even though they only extend the existing code. Canonicalizing all the sqrt/cbrt/... to just one pow will make it simpler for everything else, since it's just one function to be aware of instead of three.

It's fuzzy. If we had decided on pow() as the canonical form for sqrt() from the start, that would be true. But now we have an intrinsic, analysis, transforms, cost modeling, lowering, etc. for sqrt() directly. So if we wanted to reverse that, we'd have to modify all of that to recognize pow(x, 0.5) as the equivalent. I think it's better as we have it now because sqrt() is the more recognized form both in source code and hardware.

IMO, what made this patch different from the others is that we're replacing 1 call with 2 calls (and potentially other ops, as shown in the tests here). Again though, it's fuzzy. We could argue that because sqrt() has existing analysis and pow() doesn't, the longer sequence in IR is better. That isn't the usual way to canonicalize, but it's not unprecedented. For example, there are existing transforms where we turn 1 sdiv/udiv into logic+cmp+select.
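A hypothetical C++ illustration of that style of expansion (not necessarily the exact fold being referred to): when the divisor is known to exceed half of the dividend's range, the quotient can only be 0 or 1, so a single udiv can be rewritten as a compare plus select.

    #include <cstdint>

    // One udiv instruction.
    uint32_t div_by_large_const(uint32_t x) { return x / 0xC0000000u; }
    // The same result as icmp + select, with no division at all.
    uint32_t div_by_large_const_expanded(uint32_t x) { return x >= 0xC0000000u ? 1u : 0u; }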

I would lean towards converting cbrt() to pow() here in IR. AFAIK, there's no existing IR benefit to the cbrt() form. Plus, there is an intrinsic for pow(), so that makes vectorization easier. If we canonicalize in the other direction (pow(x, 0.33) --> cbrt(x)), then we have a canonicalization difference based on scalar/vector...or we have to add an intrinsic for cbrt?
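A sketch of that direction (my illustration, assuming fast-math and non-negative x; the exact exponent constant is whatever the existing pow() folds already use):

    #include <cmath>

    // Canonical IR form: fold the specialized call into the generic one...
    double canonical(double x) { return std::pow(x, 1.0 / 3.0); }
    // ...and let the backend turn it back into cbrt() where that is faster.
    // Note that cbrt() and pow(x, 1.0/3.0) differ for negative x, hence fast-math.
    double backend_form(double x) { return std::cbrt(x); }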

evandro added a comment (edited).Aug 1 2018, 8:08 AM

A specialized transcendental function is inherently faster than a generic one. In the case of cbrt(), this is measurable in popular benchmarks like SPEC CPU2000 and CPU2006. I can see the appeal of converging particular cases back to one generic case in the middle end, but IMO the run-time environment favors the opposite direction. From this perspective, I think that the case for a cbrt() intrinsic is stronger than the case for folding it into pow().

spatel added a comment.Aug 1 2018, 8:59 AM

A specialized transcendental function is inherently faster than a generic one. In the case of cbrt(), this is measurable in popular benchmarks like SPEC CPU2000 and CPU2006.

No disagreement there. I understand the motivation and would like to see this fixed too.

I can see the appeal of converging particular cases back to one generic case in the middle end, but IMO the run-time environment favors the opposite direction. From this perspective, I think that the case for a cbrt() intrinsic is stronger than the case for folding it into pow().

I think we need a stronger argument to add another LLVM math intrinsic for cbrt(). E.g., what IR transforms/analyses would that intrinsic enable vs. something we could already do for llvm.pow(x, 0.33)? If there's no added value, then we should convert this to cbrt() in the backend.

evandro added a subscriber: fhahn.Aug 17 2018, 8:02 AM