Details

Reviewers

spatel
efriedma

Summary

This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64.

Diff Detail

Event Timeline

evandro created this revision. Jul 6 2018, 1:38 PM

Herald added subscribers: llvm-commits, hiraditya. Jul 6 2018, 1:38 PM

Tests?

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139	There seem to be two variants: https://godbolt.org/g/Rw1Gxt Can you output the value that doesn't match?

Missing testcases.

I'm not sure what you mean by "a hard time matching the exponent"; I can't see any reason float would be different from double, assuming you're actually using the right constant.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1125	You need nsz: `pow(-0., 1./3)` returns +0, but `cbrt(-0.)` returns -0. I think I'd prefer to require afn for this; not sure it's necessary, but better to be safe. Please add explicit comments explaining why you need nnan and ninf (nnan because pow() returns a nan for negative x, ninf for `pow(-inf, 1./3)`).
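The three edge cases named in this comment can be checked outside LLVM with the C++ standard library. The sketch below is not part of the patch; `PowAndCbrt` and `evalBoth` are hypothetical names, and the results assume a libm that follows C Annex F (IEEE 754) and a build without fast-math options.

```cpp
#include <cmath>

// Hypothetical helper type, not from the patch: holds both results for one input.
struct PowAndCbrt {
  double Pow;  // std::pow(X, 1.0 / 3.0)
  double Cbrt; // std::cbrt(X)
};

// Evaluates pow(X, 1/3) and cbrt(X) side by side so the edge cases can be compared.
PowAndCbrt evalBoth(double X) {
  return {std::pow(X, 1.0 / 3.0), std::cbrt(X)};
}
```

Under those assumptions, `evalBoth(-0.0)` gives +0 from pow and -0 from cbrt (hence nsz), `evalBoth(-8.0)` gives NaN from pow but a negative root from cbrt (hence nnan), and `evalBoth(-INFINITY)` gives +inf from pow and -inf from cbrt (hence ninf).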

Shouldn't this patch use the existing code in replacePowWithSqrt(), so we're not (incompletely) duplicating the logic?

That code also has a TODO comment about choosing the minimal set of FMF to enable the fold. Whatever we decide that predicate will be should be identical for both transforms?

In D49040#1155206, @spatel wrote:

Shouldn't this patch use the existing code in replacePowWithSqrt(), so we're not (incompletely) duplicating the logic?

That code also has a TODO comment about choosing the minimal set of FMF to enable the fold. Whatever we decide that predicate will be should be identical for both transforms?

The logic is almost the same, except that sqrt() has a corresponding intrinsic and cbrt() doesn't. So, at least for now, methinks that it's easier to understand and review this change as a separate function. Then, if needed, both functions could be merged.

Add a test case.

evandro updated this revision to Diff 154624. Jul 9 2018, 9:25 AM

evandro added inline comments. Jul 9 2018, 10:48 AM

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139	Please, see any example using `float` in the test case below.

evandro marked 2 inline comments as done. Jul 9 2018, 12:11 PM

evandro added inline comments.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139	My bad. I crafted the test case using the IEEE754 bits for SP instead of the bits for DP truncated for `float`.
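For readers puzzling over the two hex constants in the test file: LLVM IR prints a `float` constant as the bit pattern of that value widened to double, so the exponent the fold has to match for `float` is one third rounded to single precision and then extended, not the single-precision bit pattern itself. A small standalone check follows, with hypothetical helper names `bitsOf` and `oneThirdFloatAsDouble`, assuming IEEE 754 `float` and `double`.

```cpp
#include <cstdint>
#include <cstring>

// Returns the IEEE 754 binary64 bit pattern of D.
std::uint64_t bitsOf(double D) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}

// One third rounded to float, then widened (exactly) to double.
double oneThirdFloatAsDouble() {
  return static_cast<double>(1.0f / 3.0f);
}
```

`bitsOf(1.0 / 3.0)` is `0x3FD5555555555555` and `bitsOf(oneThirdFloatAsDouble())` is `0x3FD5555560000000`, which are the two exponent constants that appear in pow-cbrt.ll.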

Enable handling of float.

lebedev.ri added inline comments. Jul 9 2018, 12:13 PM

llvm/test/Transforms/InstCombine/pow-cbrt.ll
2	Just use `./utils/update_test_checks.py`

efriedma added inline comments. Jul 9 2018, 12:20 PM

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1125	isFast() is deprecated, because it makes the actual requirements unclear and disables optimizations where it isn't necessary. (In particular, you don't need reassoc here.)
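To make the distinction concrete, here is a toy model of the two predicates. `ToyFMF` is an illustrative stand-in with invented field names, not LLVM's `FastMathFlags` class; the only point is that a call carrying afn, nsz, ninf, and nnan but not reassoc passes the narrower check while failing an all-flags check.

```cpp
// Toy stand-in for per-call fast-math flags (hypothetical, not an LLVM type).
struct ToyFMF {
  bool Reassoc = false;
  bool NoNaNs = false;
  bool NoInfs = false;
  bool NoSignedZeros = false;
  bool AllowReciprocal = false;
  bool AllowContract = false;
  bool ApproxFunc = false;

  // Models an all-or-nothing query: true only when every flag is set.
  bool allSet() const {
    return Reassoc && NoNaNs && NoInfs && NoSignedZeros && AllowReciprocal &&
           AllowContract && ApproxFunc;
  }
};

// Models the narrower predicate requested in review: afn, nsz, ninf, nnan.
bool cbrtFoldAllowed(const ToyFMF &F) {
  return F.ApproxFunc && F.NoSignedZeros && F.NoInfs && F.NoNaNs;
}
```

A `ToyFMF` with only the four required flags set makes `cbrtFoldAllowed` return true while `allSet` returns false, which is the case an all-flags gate would needlessly reject.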

evandro marked 2 inline comments as done. Jul 9 2018, 12:38 PM

evandro updated this revision to Diff 154679. Jul 9 2018, 12:41 PM

Refactor all of the code that simplifies pow(x, 0.5) to sqrt() by folding it into a new function that handles all simplifications to radical operations.

Allow splat vectors for trivial simplifications.

evandro added a child revision: D49273: [InstCombine] Expand the simplification of pow() into exp2(). Jul 12 2018, 4:07 PM

evandro added a child revision: D49306: [SLC] Simplify pow(x, 0.25) to sqrt(sqrt(x)). Jul 13 2018, 10:09 AM

Ping! 🔔

¡Ping! 🔔🔔

If I'm seeing it correctly, there are several independent changes going on here. Can you split this up to make the review easier?

1. Add all new tests with baseline CHECKs as an NFC preliminary step.
2. It's not clear to me what the FMF diffs in pow-sqrt.ll are showing. If we are adjusting FMF constraints on existing folds, that should be an independent patch?
3. The cosmetic diffs in variable names (Base, Expo, Shrunk, etc) look fine, so those can be another preliminary NFC commit before the part that needs further review.
4. The cbrt transform can be the last patch/commit in this series; once we have the cleanup done, that should be a very small diff.

OK. Please, stay tuned.

evandro mentioned this in rL338152: [SLC] Test simplification of pow(x, 0.333...) to cbrt(x) (NFC). Jul 27 2018, 11:57 AM

Committed rL338152 to add the baseline test case pow-cbrt.ll.

evandro added a parent revision: D50036: [SLC] Expand the simplification of pow(x, 0.5) to sqrt(x). Jul 30 2018, 7:01 PM

evandro updated this revision to Diff 158160. Jul 30 2018, 7:03 PM

evandro edited reviewers, added: efriedma; removed: davide, beanz. Jul 30 2018, 7:07 PM

evandro edited subscribers, added: davide, beanz; removed: efriedma.

evandro updated this revision to Diff 158438. Jul 31 2018, 6:13 PM

evandro edited the summary of this revision.

lebedev.ri mentioned this in D49306: [SLC] Simplify pow(x, 0.25) to sqrt(sqrt(x)). Jul 31 2018, 11:16 PM

Ping! 🔔

As we discussed in D49306, I agree that this is probably a good perf transform, but I don't think we've shown any compelling reason to do this in IR vs. DAGCombiner.

There are downsides to doing this in IR currently because we don't have a cbrt intrinsic. That means we have different behavior based on data type (vectors won't get transformed).
It's also not clear why transforming to a form with fdiv in the negative exponent case is better than a single pow instruction. (And that case probably needs some perf justification even as a backend fold.)
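For reference, the negative-exponent fold under discussion trades one pow call for a cbrt call plus a division. The sketch below (hypothetical function names, plain C++ rather than IR) shows only that the two forms agree up to rounding for a positive finite input; it says nothing about which form is faster, which is the open question raised here.

```cpp
#include <cmath>

// The original form: a single pow call with exponent -1/3.
double powNegThird(double X) {
  return std::pow(X, -1.0 / 3.0);
}

// The folded form from the patch: cbrt followed by a reciprocal (the fdiv).
double recipCbrt(double X) {
  return 1.0 / std::cbrt(X);
}
```

Any performance claim for the second form would need measurement on the targets of interest, since it replaces one library call with a library call and a divide.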

OK, then.

evandro removed a child revision: D49273: [InstCombine] Expand the simplification of pow() into exp2(). Aug 9 2018, 2:23 PM

evandro added a subscriber: fhahn. Aug 17 2018, 8:02 AM

spatel mentioned this in D51753: [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x). Sep 6 2018, 2:16 PM

spatel mentioned this in rL342348: [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x). Sep 16 2018, 9:51 AM

Diff 154679

llvm/include/llvm/Transforms/Utils/SimplifyLibCalls.h

[127 lines not shown; context: private:]
   Value *optimizeWcslen(CallInst *CI, IRBuilder<> &B);
   // Wrapper for all String/Memory Library Call Optimizations
   Value *optimizeStringMemoryLibCall(CallInst *CI, IRBuilder<> &B);

   // Math Library Optimizations
   Value *optimizeCAbs(CallInst *CI, IRBuilder<> &B);
   Value *optimizeCos(CallInst *CI, IRBuilder<> &B);
   Value *optimizePow(CallInst *CI, IRBuilder<> &B);
+  Value *replacePowWithCbrt(CallInst *Pow, IRBuilder<> &B);
   Value *replacePowWithSqrt(CallInst *Pow, IRBuilder<> &B);
   Value *optimizeExp2(CallInst *CI, IRBuilder<> &B);
   Value *optimizeFMinFMax(CallInst *CI, IRBuilder<> &B);
   Value *optimizeLog(CallInst *CI, IRBuilder<> &B);
   Value *optimizeSqrt(CallInst *CI, IRBuilder<> &B);
   Value *optimizeSinCosPi(CallInst *CI, IRBuilder<> &B);
   Value *optimizeTan(CallInst *CI, IRBuilder<> &B);
   // Wrapper for all floating point library call optimizations
[49 lines not shown]

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp

[1,113 lines not shown; context: static const unsigned AddChain[33][2] = {]
       {3, 24}, {14, 14}, {4, 25}, {15, 15}, {3, 28}, {16, 16},
   };

   InnerChain[Exp] = B.CreateFMul(getPow(InnerChain, AddChain[Exp][0], B),
                                  getPow(InnerChain, AddChain[Exp][1], B));
   return InnerChain[Exp];
 }

+/// Use cube root in place of pow(x, +/-0.333...).
+Value *LibCallSimplifier::replacePowWithCbrt(CallInst *Pow, IRBuilder<> &B) {
+  // Only in finite and normal math.
+  if (!Pow->hasApproxFunc() ||
+      !Pow->hasNoSignedZeros() || !Pow->hasNoInfs() || !Pow->hasNoNaNs())
+    return nullptr;
+
+  const APFloat *Arg2C;
+  if (!match(Pow->getArgOperand(1), m_APFloat(Arg2C)))
+    return nullptr;
+
+  Type *Ty = Pow->getType();
+  const double OneThird = (Ty->getTypeID() == Type::FloatTyID)
+                              ? (1.0f / 3.0f) : (1.0 / 3.0);
+  if (!Arg2C->isExactlyValue(OneThird) && !Arg2C->isExactlyValue(-OneThird))
+    return nullptr;
+
+  if (!hasUnaryFloatFn(TLI, Ty, LibFunc_cbrt, LibFunc_cbrtf, LibFunc_cbrtl))
+    return nullptr;
+
+  // Fast-math flags from the pow() are propagated to all replacement ops.
+  IRBuilder<>::FastMathFlagGuard Guard(B);
+  B.setFastMathFlags(Pow->getFastMathFlags());
+  Value *Cbrt = emitUnaryFloatFnCall(Pow->getArgOperand(0),
+                                     TLI->getName(LibFunc_cbrt), B,
+                                     Pow->getCalledFunction()->getAttributes());
+
+  // If this is pow(x, -0.333...), get the reciprocal.
+  if (Arg2C->isExactlyValue(-OneThird))
+    Cbrt = B.CreateFDiv(ConstantFP::get(Ty, 1.0), Cbrt);
+
+  return Cbrt;
+}

 /// Use square root in place of pow(x, +/-0.5).
 Value *LibCallSimplifier::replacePowWithSqrt(CallInst *Pow, IRBuilder<> &B) {
   // TODO: There is some subset of 'fast' under which these transforms should
   // be allowed.
   if (!Pow->isFast())
     return nullptr;

   const APFloat *Arg1C;
[82 lines not shown; context: if (OpCCallee && TLI->getLibFunc(OpCCallee->getName(), Func) &&]
       return emitUnaryFloatFnCall(FMul, OpCCallee->getName(), B,
                                   OpCCallee->getAttributes());
     }
   }

   if (Value *Sqrt = replacePowWithSqrt(CI, B))
     return Sqrt;

+  if (Value *Cbrt = replacePowWithCbrt(CI, B))
+    return Cbrt;
+
   ConstantFP *Op2C = dyn_cast<ConstantFP>(Op2);
   if (!Op2C)
     return Ret;

   if (Op2C->getValueAPF().isZero()) // pow(x, 0.0) -> 1.0
     return ConstantFP::get(CI->getType(), 1.0);

   // FIXME: Correct the transforms and pull this into replacePowWithSqrt().
[1,490 lines not shown]

llvm/test/Transforms/InstCombine/pow-cbrt.ll

This file was added.

				; RUN: opt < %s -instcombine -S | FileCheck %s

				define double @pow_intrinsic_third_fast(double %x) {
				; CHECK-LABEL: @pow_intrinsic_third_fast(
				; CHECK-NEXT: [[CBRT:%.*]] = call fast double @cbrt(double %x) #1
				; CHECK-NEXT: ret double [[CBRT]]
				;
				%pow = call fast double @llvm.pow.f64(double %x, double 0x3fd5555555555555)
				ret double %pow
				}

				define float @powf_intrinsic_third_fast(float %x) {
				; CHECK-LABEL: @powf_intrinsic_third_fast(
				; CHECK-NEXT: [[CBRTF:%.*]] = call fast float @cbrtf(float %x) #1
				; CHECK-NEXT: ret float [[CBRTF]]
				;
				%pow = call fast float @llvm.pow.f32(float %x, float 0x3fd5555560000000)
				ret float %pow
				}

				define double @pow_intrinsic_third_approx(double %x) {
				; CHECK-LABEL: @pow_intrinsic_third_approx(
				; CHECK-NEXT: [[POW:%.*]] = call afn double @llvm.pow.f64(double %x, double 0x3FD5555555555555)
				; CHECK-NEXT: ret double [[POW]]
				;
				%pow = call afn double @llvm.pow.f64(double %x, double 0x3fd5555555555555)
				ret double %pow
				}

				define float @powf_intrinsic_third_approx(float %x) {
				; CHECK-LABEL: @powf_intrinsic_third_approx(
				; CHECK-NEXT: [[POW:%.*]] = call afn float @llvm.pow.f32(float %x, float 0x3FD5555560000000)
				; CHECK-NEXT: ret float [[POW]]
				;
				%pow = call afn float @llvm.pow.f32(float %x, float 0x3fd5555560000000)
				ret float %pow
				}

				define double @pow_libcall_third_fast(double %x) {
				; CHECK-LABEL: @pow_libcall_third_fast(
				; CHECK-NEXT: [[CBRT:%.*]] = call fast double @cbrt(double %x)
				; CHECK-NEXT: ret double [[CBRT]]
				;
				%pow = call fast double @pow(double %x, double 0x3fd5555555555555)
				ret double %pow
				}

				define float @powf_libcall_third_fast(float %x) {
				; CHECK-LABEL: @powf_libcall_third_fast(
				; CHECK-NEXT: [[CBRTF:%.*]] = call fast float @cbrtf(float %x)
				; CHECK-NEXT: ret float [[CBRTF]]
				;
				%pow = call fast float @powf(float %x, float 0x3fd5555560000000)
				ret float %pow
				}

				define double @pow_intrinsic_negthird_fast(double %x) {
				; CHECK-LABEL: @pow_intrinsic_negthird_fast(
				; CHECK-NEXT: [[CBRT:%.*]] = call fast double @cbrt(double %x) #1
				; CHECK-NEXT: [[RECP:%.*]] = fdiv fast double 1.000000e+00, [[CBRT]]
				; CHECK-NEXT: ret double [[RECP]]
				;
				%pow = call fast double @llvm.pow.f64(double %x, double 0xbfd5555555555555)
				ret double %pow
				}

				define float @powf_intrinsic_negthird_fast(float %x) {
				; CHECK-LABEL: @powf_intrinsic_negthird_fast(
				; CHECK-NEXT: [[CBRTF:%.*]] = call fast float @cbrtf(float %x) #1
				; CHECK-NEXT: [[RECP:%.*]] = fdiv fast float 1.000000e+00, [[CBRTF]]
				; CHECK-NEXT: ret float [[RECP]]
				;
				%pow = call fast float @llvm.pow.f32(float %x, float 0xbfd5555560000000)
				ret float %pow
				}

				define double @pow_intrinsic_negthird_approx(double %x) {
				; CHECK-LABEL: @pow_intrinsic_negthird_approx(
				; CHECK-NEXT: [[POW:%.*]] = call afn double @llvm.pow.f64(double %x, double 0xBFD5555555555555)
				; CHECK-NEXT: ret double [[POW]]
				;
				%pow = call afn double @llvm.pow.f64(double %x, double 0xbfd5555555555555)
				ret double %pow
				}

				define float @powf_intrinsic_negthird_approx(float %x) {
				; CHECK-LABEL: @powf_intrinsic_negthird_approx(
				; CHECK-NEXT: [[POW:%.*]] = call afn float @llvm.pow.f32(float %x, float 0xBFD5555560000000)
				; CHECK-NEXT: ret float [[POW]]
				;
				%pow = call afn float @llvm.pow.f32(float %x, float 0xbfd5555560000000)
				ret float %pow
				}

				define double @pow_libcall_negthird_fast(double %x) {
				; CHECK-LABEL: @pow_libcall_negthird_fast(
				; CHECK-NEXT: [[CBRT:%.*]] = call fast double @cbrt(double %x)
				; CHECK-NEXT: [[RECP:%.*]] = fdiv fast double 1.000000e+00, [[CBRT]]
				; CHECK-NEXT: ret double [[RECP]]
				;
				%pow = call fast double @pow(double %x, double 0xbfd5555555555555)
				ret double %pow
				}

				define float @powf_libcall_negthird_fast(float %x) {
				; CHECK-LABEL: @powf_libcall_negthird_fast(
				; CHECK-NEXT: [[CBRTF:%.*]] = call fast float @cbrtf(float %x)
				; CHECK-NEXT: [[RECP:%.*]] = fdiv fast float 1.000000e+00, [[CBRTF]]
				; CHECK-NEXT: ret float [[RECP]]
				;
				%pow = call fast float @powf(float %x, float 0xbfd5555560000000)
				ret float %pow
				}

				declare double @llvm.pow.f64(double, double) #0
				declare float @llvm.pow.f32(float, float) #0
				declare double @pow(double, double)
				declare float @powf(float, float)

				attributes #0 = { nounwind readnone speculatable }

This is an archive of the discontinued LLVM Phabricator instance.

[SLC] Simplify pow(x, 0.333...) to cbrt(x)
Abandoned · Public
