
Details

Reviewers

evandro
efriedma
spatel

Commits

rGcc9dc599ba91: [SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x)
rL341330: [SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x)

Diff Detail

Event Timeline

fhahn created this revision. Aug 29 2018, 8:54 AM

evandro added inline comments. Aug 29 2018, 12:52 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1394	May I suggest `s/VInt/Expo2/`?
1421	Methinks that `s/"sqrt"/TLI->getName(LibFunc_sqrt)/` is more elegant. Additionally, if `pow()` is an intrinsic, you want to emit the intrinsic for `sqrt()` then, rather than always a libcall. Then, please update the corresponding tests too.

Thanks Evandro! Move code to create Sqrt intrinsic/libcall call to helper function and use it. Rename variable as suggested.

fhahn marked 2 inline comments as done. Aug 29 2018, 2:42 PM

spatel added inline comments. Aug 29 2018, 3:01 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1381–1386	There was no real justification for this limit in D13994, but I suppose we're still ok with the transform +1 more instruction?
1395	I didn't catch the doubling of "?.5" followed by an isInteger check when I saw this the first time, so it deserves a code comment.

LGTM, but, please, wait for @spatel's agreement.

Add comment about how ExpoA == integer + 0.5 is detected.
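The doubling check discussed above can be illustrated outside of LLVM with plain `double` values. This is only a sketch under stated assumptions: `isIntegerPlusHalf` is a hypothetical helper that is not part of the patch, and it uses `std::isfinite` as a stand-in for the `APFloat::opOK` status check that the patch performs.

```cpp
#include <cmath>

// Hypothetical helper, not part of the patch: returns true when the
// magnitude of Expo is an integer plus 0.5 (for example 16.5).
bool isIntegerPlusHalf(double Expo) {
  double A = std::fabs(Expo);
  // Plain integers take the ordinary multiplication path instead.
  if (!std::isfinite(A) || A == std::floor(A))
    return false;
  // Doubling an integer + 0.5 yields an integer; any other fractional
  // value stays fractional after doubling.
  double Doubled = A + A;
  return std::isfinite(Doubled) && Doubled == std::floor(Doubled);
}
```

With this helper, 16.5 and -8.5 are accepted, while 3.0 and 0.25 are rejected.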

fhahn added inline comments. Aug 30 2018, 1:59 AM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1381–1386	IIUC the main reason for adding this limit was to avoid generating too long fmul chains. I think adding a call to sqrt() is independent of that (similar to adding a call to fdiv for negative exponents), but I can either update the comment or only generate sqrt() for smaller exponents.

LGTM - see inline for minor improvements.

lib/Transforms/Utils/SimplifyLibCalls.cpp
1381–1386	The sqrt enhancement was mentioned in the original patch, so I won't hold this patch up... But this entire transform is questionable as an IR canonicalization (instcombine). The limit was chosen arbitrarily, and it applies universally even though the optimal limit will vary based on target. We're also doing this transform regardless of whether we are optimizing for size or not. There was a suggestion in the original patch that this should be a backend transform, and I think that was correct. Alternatively, there should be a backend reversal of this transform if the target would prefer to use a libcall instead of expanding. This should be noted in a TODO comment here.
test/Transforms/InstCombine/pow-4.ll
138	It would be nice to get a bit more coverage from these tests by varying the exponent and data types (the transform should work with vectors too?).

This revision is now accepted and ready to land. Aug 30 2018, 5:58 AM

evandro added inline comments. Aug 30 2018, 8:40 AM

test/Transforms/InstCombine/pow-4.ll
138	Good point.

Thanks! Added a comment and additional test cases. Please let me know if the comment makes sense.

Yes, it makes sense.

lib/Transforms/Utils/SimplifyLibCalls.cpp
1383	I wonder if the additional multiplication and the `sqrt()` should be counted towards this limit, as arbitrary as it is.
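To make the cost question concrete, the expansion can be emulated on ordinary doubles while counting multiplications. This is a sketch only: `ExpandResult` and `expandPow` are hypothetical names, and the code uses plain square-and-multiply rather than the memoized addition chain of `getPow()` in the patch, so the multiplication count may differ from what the transform emits for some exponents.

```cpp
#include <cmath>

struct ExpandResult {
  double Value;
  int Multiplications; // includes the multiply by sqrt(x), if any
  bool UsesSqrt;
};

// Hypothetical sketch: evaluates x^Expo where |Expo| is n or n + 0.5 and
// |Expo| < 33. Other exponents are outside the scope of this sketch.
ExpandResult expandPow(double X, double Expo) {
  double A = std::fabs(Expo);
  unsigned N = static_cast<unsigned>(A); // integer part of the exponent
  bool Half = (A - N) == 0.5;
  int Muls = 0;
  double Result = 1.0;
  bool HaveResult = false;
  double Power = X; // holds x^(2^k) while walking the bits of N
  for (unsigned Bits = N; Bits != 0; Bits >>= 1) {
    if (Bits & 1) {
      if (HaveResult) {
        Result *= Power;
        ++Muls;
      } else {
        Result = Power;
        HaveResult = true;
      }
    }
    if (Bits >> 1) { // another squaring is needed for the next bit
      Power *= Power;
      ++Muls;
    }
  }
  if (Half) {
    double Root = std::sqrt(X);
    if (HaveResult) {
      Result *= Root;
      ++Muls;
    } else {
      Result = Root;
    }
  }
  if (Expo < 0)
    Result = 1.0 / Result; // reciprocal for negative exponents
  return {Result, Muls, Half};
}
```

For an exponent of 16.5 this sketch performs four squarings plus one multiply by the square root, so the `sqrt()` path adds one call and one multiplication on top of the integer-only expansion.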

Comment + extra tests look good.

Given the revert of D49273, make the getSqrtCall() diff an NFC commit ahead of this patch + generalize that for any libcall, so we make sure that we're not generating libcalls when we're not allowed to?

In D51435#1219433, @spatel wrote:

Comment + extra tests look good.

Given the revert of D49273, make the getSqrtCall() diff an NFC commit ahead of this patch + generalize that for any libcall, so we make sure that we're not generating libcalls when we're not allowed to?

Will do tomorrow, thanks!

In D51435#1219433, @spatel wrote:

Comment + extra tests look good.

Given the revert of D49273, make the getSqrtCall() diff an NFC commit ahead of this patch + generalize that for any libcall, so we make sure that we're not generating libcalls when we're not allowed to?

I had a look at other uses that could benefit from a general getSqrtCall, but I am not entirely sure what the scope of it should be. Also, for sqrt, we use the intrinsic when possible, and a lib call if it is available otherwise. But e.g. for the pow(2.0 ** n, x) -> exp2(n * x) transform we check if an exp2 lib func is available. If it is not, we also do not emit an intrinsic, even if it would be possible. Should the generic function behave similar to the exp2 behavior?

In D51435#1222148, @fhahn wrote:

In D51435#1219433, @spatel wrote:

Comment + extra tests look good.

Given the revert of D49273, make the getSqrtCall() diff an NFC commit ahead of this patch + generalize that for any libcall, so we make sure that we're not generating libcalls when we're not allowed to?

I had a look at other uses that could benefit from a general getSqrtCall, but I am not entirely sure what the scope of it should be. Also, for sqrt, we use the intrinsic when possible, and a lib call if it is available otherwise. But e.g. for the pow(2.0 ** n, x) -> exp2(n * x) transform we check if an exp2 lib func is available. If it is not, we also do not emit an intrinsic, even if it would be possible. Should the generic function behave similar to the exp2 behavior?

That's an interesting question. IIUC, we don't generate an exp2 intrinsic if the libcall is not available because we expect that most targets would end up lowering to that libcall anyway (they don't have hardware support for exp2). But that's probably not true for sqrt - either a compliant or estimate sqrt instruction probably is available even if the libcall is not. That question/problem is noted in the 'TODO' comment for the sqrt libcall test.

This is probably worth asking about on llvm-dev if not via another patch. Either way, I don't think we should hold this patch up while deciding how to fix that.
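The sqrt selection policy under discussion can be summarised as a small decision function. This is a sketch that models the behaviour of `getSqrtCall()` from the diff with boolean inputs instead of LLVM types; `SqrtLowering` and `chooseSqrtLowering` are hypothetical names that do not exist in LLVM.

```cpp
// Hypothetical model of the choice made by getSqrtCall() in the diff.
enum class SqrtLowering { Intrinsic, LibCall, None };

SqrtLowering chooseSqrtLowering(bool NoErrno, bool HasSqrtLibFunc) {
  // If errno is never set, the intrinsic is used regardless of libcalls.
  if (NoErrno)
    return SqrtLowering::Intrinsic;
  // Otherwise a libcall is emitted, but only when one is available.
  if (HasSqrtLibFunc)
    return SqrtLowering::LibCall;
  // Neither option applies, so the caller receives no sqrt value.
  return SqrtLowering::None;
}
```

The `None` case corresponds to the `return nullptr` path of the helper, which each caller has to handle.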

Ok thanks! So I can commit this patch as is?

In D51435#1222236, @fhahn wrote:

Ok thanks! So I can commit this patch as is?

Yes, no objections from me. I think we have sufficient TODOs sprinkled around that make it clear how we can improve things more if needed.

Closed by commit rL341330: [SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x) (authored by fhahn). Sep 3 2018, 10:38 AM

This revision was automatically updated to reflect the committed changes.

Great, thanks for the review!

Diff 163349

lib/Transforms/Utils/SimplifyLibCalls.cpp

[... 1,258 lines not shown ...]
 Value *LibCallSimplifier::replacePowWithExp(CallInst *Pow, IRBuilder<> &B) {
   // TODO: There is no exp10() intrinsic yet, but some day there shall be one.
   if (match(Base, m_SpecificFP(10.0)) &&
       hasUnaryFloatFn(TLI, Ty, LibFunc_exp10, LibFunc_exp10f, LibFunc_exp10l))
     return emitUnaryFloatFnCall(Expo, TLI->getName(LibFunc_exp10), B, Attrs);

   return nullptr;
 }

+static Value *getSqrtCall(Value *V, AttributeList Attrs, bool NoErrno,
+                          Module *M, IRBuilder<> &B,
+                          const TargetLibraryInfo *TLI) {
+  // If errno is never set, then use the intrinsic for sqrt().
+  if (NoErrno) {
+    Function *SqrtFn =
+        Intrinsic::getDeclaration(M, Intrinsic::sqrt, V->getType());
+    return B.CreateCall(SqrtFn, V, "sqrt");
+  }
+
+  // Otherwise, use the libcall for sqrt().
+  if (hasUnaryFloatFn(TLI, V->getType(), LibFunc_sqrt, LibFunc_sqrtf,
+                      LibFunc_sqrtl))
+    // TODO: We also should check that the target can in fact lower the sqrt()
+    // libcall. We currently have no way to ask this question, so we ask if
+    // the target has a sqrt() libcall, which is not exactly the same.
+    return emitUnaryFloatFnCall(V, TLI->getName(LibFunc_sqrt), B, Attrs);
+
+  return nullptr;
+}
+
 /// Use square root in place of pow(x, +/-0.5).
 Value *LibCallSimplifier::replacePowWithSqrt(CallInst *Pow, IRBuilder<> &B) {
   Value *Sqrt, *Base = Pow->getArgOperand(0), *Expo = Pow->getArgOperand(1);
   AttributeList Attrs = Pow->getCalledFunction()->getAttributes();
   Module *Mod = Pow->getModule();
   Type *Ty = Pow->getType();

   const APFloat *ExpoF;
   if (!match(Expo, m_APFloat(ExpoF)) ||
       (!ExpoF->isExactlyValue(0.5) && !ExpoF->isExactlyValue(-0.5)))
     return nullptr;

-  // If errno is never set, then use the intrinsic for sqrt().
-  if (Pow->doesNotAccessMemory()) {
-    Function *SqrtFn = Intrinsic::getDeclaration(Pow->getModule(),
-                                                 Intrinsic::sqrt, Ty);
-    Sqrt = B.CreateCall(SqrtFn, Base, "sqrt");
-  }
-  // Otherwise, use the libcall for sqrt().
-  else if (hasUnaryFloatFn(TLI, Ty, LibFunc_sqrt, LibFunc_sqrtf, LibFunc_sqrtl))
-    // TODO: We also should check that the target can in fact lower the sqrt()
-    // libcall. We currently have no way to ask this question, so we ask if
-    // the target has a sqrt() libcall, which is not exactly the same.
-    Sqrt = emitUnaryFloatFnCall(Base, TLI->getName(LibFunc_sqrt), B, Attrs);
-  else
+  Sqrt = getSqrtCall(Base, Attrs, Pow->doesNotAccessMemory(), Mod, B, TLI);
+  if (!Sqrt)
     return nullptr;

   // Handle signed zero base by expanding to fabs(sqrt(x)).
   if (!Pow->hasNoSignedZeros()) {
     Function *FAbsFn = Intrinsic::getDeclaration(Mod, Intrinsic::fabs, Ty);
     Sqrt = B.CreateCall(FAbsFn, Sqrt, "abs");
   }

[... 63 lines not shown ...]
   if (match(Expo, m_SpecificFP(2.0)))
     return B.CreateFMul(Base, Base, "square");

   if (Value *Sqrt = replacePowWithSqrt(Pow, B))
     return Sqrt;

   // pow(x, n) -> x * x * x * ...
   const APFloat *ExpoF;
   if (Pow->isFast() && match(Expo, m_APFloat(ExpoF))) {
     // We limit to a max of 7 multiplications, thus the maximum exponent is 32.
+    // If the exponent is an integer+0.5 we generate a call to sqrt and an
+    // additional fmul.
+    // TODO: This whole transformation should be backend specific (e.g. some
+    // backends might prefer libcalls or the limit for the exponent might
+    // be different) and it should also consider optimizing for size.
     APFloat LimF(ExpoF->getSemantics(), 33.0),
             ExpoA(abs(*ExpoF));
-    if (ExpoA.isInteger() && ExpoA.compare(LimF) == APFloat::cmpLessThan) {
+    if (ExpoA.compare(LimF) == APFloat::cmpLessThan) {
+      // This transformation applies to integer or integer+0.5 exponents only.
+      // For integer+0.5, we create a sqrt(Base) call.
+      Value *Sqrt = nullptr;
+      if (!ExpoA.isInteger()) {
+        APFloat Expo2 = ExpoA;
+        // To check if ExpoA is an integer + 0.5, we add it to itself. If there
+        // is no floating point exception and the result is an integer, then
+        // ExpoA == integer + 0.5
+        if (Expo2.add(ExpoA, APFloat::rmNearestTiesToEven) != APFloat::opOK)
+          return nullptr;
+
+        if (!Expo2.isInteger())
+          return nullptr;
+
+        Sqrt =
+            getSqrtCall(Base, Pow->getCalledFunction()->getAttributes(),
+                        Pow->doesNotAccessMemory(), Pow->getModule(), B, TLI);
+      }
+
       // We will memoize intermediate products of the Addition Chain.
       Value *InnerChain[33] = {nullptr};
       InnerChain[1] = Base;
       InnerChain[2] = B.CreateFMul(Base, Base, "square");

       // We cannot readily convert a non-double type (like float) to a double.
       // So we first convert it to something which could be converted to double.
       ExpoA.convert(APFloat::IEEEdouble(), APFloat::rmTowardZero, &Ignored);
       Value *FMul = getPow(InnerChain, ExpoA.convertToDouble(), B);

+      // Expand pow(x, y+0.5) to pow(x, y) * sqrt(x).
+      if (Sqrt)
+        FMul = B.CreateFMul(FMul, Sqrt);
+
       // If the exponent is negative, then get the reciprocal.
       if (ExpoF->isNegative())
         FMul = B.CreateFDiv(ConstantFP::get(Ty, 1.0), FMul, "reciprocal");

       return FMul;
     }
   }

[... 1,427 lines not shown ...]

test/Transforms/InstCombine/pow-4.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine -S < %s | FileCheck %s

 declare double @llvm.pow.f64(double, double)
 declare float @llvm.pow.f32(float, float)
 declare <2 x double> @llvm.pow.v2f64(<2 x double>, <2 x double>)
 declare <2 x float> @llvm.pow.v2f32(<2 x float>, <2 x float>)
+declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)
+declare double @pow(double, double)

 ; pow(x, 3.0)
 define double @test_simplify_3(double %x) {
 ; CHECK-LABEL: @test_simplify_3(
 ; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[X:%.*]], [[X]]
 ; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[X]]
 ; CHECK-NEXT: ret double [[TMP2]]
 ;
[... 96 lines not shown ...]
 ; CHECK-LABEL: @test_simplify_33(
 ; CHECK-NEXT: [[TMP1:%.*]] = call fast double @llvm.pow.f64(double [[X:%.*]], double 3.300000e+01)
 ; CHECK-NEXT: ret double [[TMP1]]
 ;
   %1 = call fast double @llvm.pow.f64(double %x, double 3.300000e+01)
   ret double %1
 }

+; pow(x, 16.5) with double
+define double @test_simplify_16_5(double %x) {
+; CHECK-LABEL: @test_simplify_16_5(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECK-NEXT: ret double [[TMP4]]
+;
+  %1 = call fast double @llvm.pow.f64(double %x, double 1.650000e+01)
+  ret double %1
+}
+
+; pow(x, -16.5) with double
+define double @test_simplify_neg_16_5(double %x) {
+; CHECK-LABEL: @test_simplify_neg_16_5(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
+; CHECK-NEXT: ret double [[RECIPROCAL]]
+;
+  %1 = call fast double @llvm.pow.f64(double %x, double -1.650000e+01)
+  ret double %1
+}
+
+; pow(x, 16.5) with double
+define double @test_simplify_16_5_libcall(double %x) {
+; CHECK-LABEL: @test_simplify_16_5_libcall(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECK-NEXT: ret double [[TMP4]]
+;
+  %1 = call fast double @pow(double %x, double 1.650000e+01)
+  ret double %1
+}
+
+; pow(x, -16.5) with double
+define double @test_simplify_neg_16_5_libcall(double %x) {
+; CHECK-LABEL: @test_simplify_neg_16_5_libcall(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
+; CHECK-NEXT: ret double [[RECIPROCAL]]
+;
+  %1 = call fast double @pow(double %x, double -1.650000e+01)
+  ret double %1
+}
+
+; pow(x, -8.5) with float
+define float @test_simplify_neg_8_5(float %x) {
+; CHECK-LABEL: @test_simplify_neg_8_5(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast float [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast float [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast float [[TMP1]], [[SQRT]]
+; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast float 1.000000e+00, [[TMP2]]
+; CHECK-NEXT: ret float [[RECIPROCAL]]
+;
+  %1 = call fast float @llvm.pow.f32(float %x, float -0.450000e+01)
+  ret float %1
+}
+
+; pow(x, 7.5) with <2 x double>
+define <2 x double> @test_simplify_7_5(<2 x double> %x) {
+; CHECK-LABEL: @test_simplify_7_5(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast <2 x double> @llvm.sqrt.v2f64(<2 x double> [[X:%.*]])
+; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast <2 x double> [[X]], [[X]]
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <2 x double> [[SQUARE]], [[SQUARE]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x double> [[TMP1]], [[X]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[SQUARE]], [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x double> [[TMP3]], [[SQRT]]
+; CHECK-NEXT: ret <2 x double> [[TMP4]]
+;
+  %1 = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double 7.500000e+00, double 7.500000e+00>)
+  ret <2 x double> %1
+}
+
+; pow(x, 3.5) with <4 x float>
+define <4 x float> @test_simplify_3_5(<4 x float> %x) {
+; CHECK-LABEL: @test_simplify_3_5(
+; CHECK-NEXT: [[SQRT:%.*]] = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> [[X:%.*]])
+; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <4 x float> [[X]], [[X]]
+; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[X]]
+; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[SQRT]]
+; CHECK-NEXT: ret <4 x float> [[TMP3]]
+;
+  %1 = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 3.500000e+00, float 3.500000e+00, float 3.500000e+00, float 3.500000e+00>)
+  ret <4 x float> %1
+}

This is an archive of the discontinued LLVM Phabricator instance.

[SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x)
Closed Public
