In order to avoid calling the pow function, we generate repeated fmul instructions when n is a positive or negative whole number. For each exponent we pre-compute an addition chain in order to minimize the number of fmuls. Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html

We pre-compute addition chains for exponents up to 32 (which results in a max of 7 fmuls). For example:

  4 = 2+2
  5 = 2+3
  6 = 3+3

and so on. Hence, pow(x, 4.0) becomes:

  y = fmul x, x
  x = fmul y, y
  ret x

For negative exponents, we simply compute the reciprocal of the final result.

Note: This transformation is only enabled under fast-math.

Patch by Mandeep Singh Grang <mgrang@codeaurora.org>
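To make the expansion concrete, here is a hedged C++ sketch of what the emitted IR computes for pow(x, 4.0) and pow(x, -4.0); pow4 and powNeg4 are illustrative names, not code from the patch.

```cpp
#include <cassert>

// Sketch of the expansion for pow(x, 4.0): two fmuls via the addition
// chain 4 = 2 + 2. The function names are illustrative only.
static double pow4(double x) {
  double y = x * x; // y = fmul x, x
  return y * y;     // x = fmul y, y
}

// For a negative exponent (pow(x, -4.0)), take the reciprocal of the
// positive-exponent result.
static double powNeg4(double x) {
  return 1.0 / pow4(x);
}
```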
lib/Transforms/Utils/SimplifyLibCalls.cpp:1077
Can you please clarify what you mean by "don't strength reduce manually".
lib/Transforms/Utils/SimplifyLibCalls.cpp:1077
Unless I am mistaken, you can rewrite that expression as Exp / 2.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1077
Yes, I can rewrite that as Exp / 2, but wouldn't that be slower than Exp >> 1, since the former case would involve generating a div instruction?
lib/Transforms/Utils/SimplifyLibCalls.cpp:1077
I would recommend against premature optimization.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1197
If Exp is 72057594037927937, it will be converted to 72057594037927936.0 when passing it to isExactlyValue. Is that OK?
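To illustrate the concern, a standalone C++ check (not code from the patch): 72057594037927937 is 2^56 + 1, and doubles near 2^56 are spaced 16 apart, so the int-to-double conversion rounds the low bit away.

```cpp
#include <cassert>
#include <cstdint>

// Converting 2^56 + 1 to double loses the low bit: a double has a
// 53-bit significand, so values near 2^56 are representable only in
// steps of 16.
static double exponentAsDouble(int64_t Exp) {
  return static_cast<double>(Exp);
}
```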
Somewhat unrelated side note: for fast-math, -0.5 (generally) and +/- 1/3 (for targets with cbrt) are likely also useful to consider.
Trivial nits.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1190
Maximize the 80-column limit here.

test/Transforms/InstCombine/pow-4.ll:10
I don't see the attributes associated with #1 anywhere, so it's probably safe to drop the #1 here.

test/Transforms/InstCombine/pow-4.ll:31
I believe you can safely drop the tail marker (i.e., tail call -> call). There are many instances of this.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1197
I limit the PowerOptThreshold so that we don't run into this issue.
Folks, could you please review this patch and comment on whether we are good or if any more changes are needed?
This might be relevant here: https://en.wikipedia.org/wiki/Addition-chain_exponentiation
Thanks escha.
It seems Addition Chain Exponentiation is NP-complete.
My solution calculates pow(x, n) by computing powers of 2.
See: https://en.wikipedia.org/wiki/Exponentiation_by_squaring#Computation_by_powers_of_2
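A runnable sketch of that scheme, with plain C++ multiplies standing in for the generated fmuls (the function name is illustrative, not the patch's):

```cpp
#include <cassert>

// Exponentiation by squaring: square x repeatedly and multiply into the
// result the squares selected by the set bits of n. Uses O(log n)
// multiplies.
static double powBySquaring(double x, unsigned n) {
  double Result = 1.0;
  while (n) {
    if (n & 1)
      Result *= x;
    x *= x;
    n >>= 1;
  }
  return Result;
}
```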
It is NP-complete, but that's only for the general case; in practice you really only need to list a few common cases (it's not like we need to calculate optimal chains for pow(x, 1723)).
Hi Mandeep,
The general rule of thumb for pinging a patch is about a week. Also, David and others are likely busy getting prepared for the Dev Meeting later this week, so response may be delayed.
Also, I don't think you've properly addressed David's comment about the use of the isExactlyValue() function. I could be mistaken, but I believe David is pointing out that there may be a conversion error with some inputs. The default value of 32 works, but will this work for all user-specified values?
Personally, my suggestion would be to drop the command-line option all together.
Chad
Thanks Chad and apologies to everyone for pinging too early!
I guess doing this transformation for exponents greater than 32 would not really be helpful, since the number of fmuls would increase.
So I would like to limit this only to exponents <= 32 (positive and negative).
I would then change the command-line option to a bool (default value true), so that the user can turn off the transformation if desired.
Thoughts?
Hi Paul,
I think for smaller exponents (<= 32), the break-even point does not really depend on the target. Although I am open to suggestions to consider otherwise :)
Thanks,
Mandeep
I wonder if an argument could be made that this expansion should happen in CodeGenPrep instead of SimplifyLibCalls...
David,
I see that all the optimizations for various math-related functions are done in SimplifyLibCalls, so I guess SimplifyLibCalls should be the place to do it.
Did you have a particular reason why this expansion should be done in CodeGenPrep?
lib/Transforms/Utils/SimplifyLibCalls.cpp:1197
Can we really not find a more elegant way to handle this than by checking for equality to every integer in range? How was 32 chosen? It would make more sense to bound the number of multiplies in the resulting lowering than to bound the magnitude of the exponent. I would also expect this not to be symmetric, because division is expensive.

lib/Transforms/Utils/SimplifyLibCalls.cpp:1199
s/divide by 1.0/compute the reciprocal/
lib/Transforms/Utils/SimplifyLibCalls.cpp:1197
Thanks for the review, Steve. I am working on fixing this.
Instead of a hard-coded limit of 32, what about building a table of multiplication chains with at most 8 multiplications, or something like that? Precomputing them is easy, and the number of intermediate registers should also be low enough. It might be worth checking how many temporaries are generated and tweaking the limit a bit.
I have limited the transformation to a max of 8 fmuls (thus a max exponent of 32).
Using addition chains can reduce the number of fmuls in a few cases, but storing the addition chains for each exponent would incur a space overhead.
The current implementation of exponentiation by binary expansion seems much cleaner IMO.
I have now implemented this transformation by pre-computing multiplications (using Addition-Chains).
Can the reviewers please review this patch?
lib/Transforms/Utils/SimplifyLibCalls.cpp:1194
I think either this does the wrong thing for x86 80-bit long doubles (or does it not apply to powl()?) and a hypothetical future quad type, or it could simply be Op2C->getValueAPF().convertToDouble().

lib/Transforms/Utils/SimplifyLibCalls.cpp:1202
Super minor nit: ceil(V) != floor(V) is an overly complex way to check if V is an integer. trunc(V) != V is simpler and seems more idiomatic to me. Longer-term, we should just add isInteger() to APFloat, but that's outside the scope of this change.
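For reference, the two checks behave the same on ordinary doubles; this standalone sketch (plain C++, not APFloat code) shows the simpler trunc idiom:

```cpp
#include <cassert>
#include <cmath>

// trunc(V) == V is the idiomatic way to test whether a floating-point
// value is a whole number; ceil(V) == floor(V) is equivalent but noisier.
static bool isWholeNumber(double V) {
  return std::trunc(V) == V;
}
```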
I went ahead and added isInteger( ) to APFloat (r253254). You can use that to simplify some of the logic and make it somewhat more general.
Thanks Steve.
I see that the x86_FP80 and FP128 types are not handled by this transformation. Also, we would lose information if we converted them to float or double. So I am in favor of limiting this transformation only to strict float and double types (i.e., powl will not be handled).
Would like to know what you think.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1194
With r253254, I think you should be able to make this general pretty easily. Something like:

  APFloat V = abs(Op2C->getValueAPF());
  if ((V.compare(APFloat(32.0)) == APFloat::cmpGreaterThan) || !V.isInteger())
    return nullptr;
  // V is known to be an integer in [0, 32], so it can safely be converted to double now.
  // int would be even nicer, but APFloat doesn't have convertToInt (yet?).
  Value *FMul = getPow(Op1, convertToDouble(V), B);
I think we cannot simply call convertToDouble(V) on something which is not a double.
For example, powf(x, 8) results in the assertion:

  semantics == (const llvm::fltSemantics*)&IEEEdouble && "Float semantics are not IEEEdouble"' failed.

That's the reason I checked the type of V and converted it to float or double accordingly.
But the problem is that long doubles are neither floats nor doubles (they are of type FP128) and hence cannot be converted readily.
Hence the "something like". We should really just be able to pull an integer value out of APFloat (but again, that's outside the scope of this change). Short-term, I think you can use V.convert(APFloat::IEEEdouble, ...) to get something that *can* be converted to double.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1068–1069
I will fix this.

lib/Transforms/Utils/SimplifyLibCalls.cpp:1088–1101
Thanks David. My previous implementation used binary exponentiation, but that results in more fmuls getting generated. Addition-chain exponentiation generates the optimal (least) number of fmuls. Other reviewers were of the opinion that we should minimize fmuls (by precomputing the multiplication chains). I think it will always be a trade-off between runtime and code size; here we opted for optimal runtime.

test/Transforms/InstCombine/pow-4.ll:4
I will fix this.
lib/Transforms/Utils/SimplifyLibCalls.cpp:1101
I believe the point David is trying to make is that you don't need to compute all 32 chains (at compile time) when exp is less than 32. How about something like this:

  static const unsigned ExpTable[32][2] = {
    { 0, 0 },   // Unused
    { 0, 0 },   // pow1
    { 0, 0 },   // pow1, pow1
    { 1, 2 },
    { 2, 2 },
    { 2, 3 },
    { 3, 3 },
    { 2, 5 },
    { 4, 4 },
    { 1, 8 },
    ...
    { 16, 16 }
  };

  if (exp == 1)
    return pow1;
  if (exp == 2)
    return B.CreateFMul(pow1, pow1);
  AddChain[1] = pow1;
  ...
  AddChain[i] = B.CreateFMul(AddChain[ExpTable[i][0]], AddChain[ExpTable[i][1]]);
  return AddChain[exp];
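As a standalone illustration of this table-driven scheme, here is a hedged C++ sketch using plain doubles instead of IRBuilder fmuls; the bound of 8, the function name, and the exact table entries are mine, chosen to mirror the chains in the summary (4 = 2+2, 5 = 2+3, ...), not the patch's actual table.

```cpp
#include <cassert>

// Each table entry {a, b} says that x^i = x^a * x^b with a + b = i.
// The loop builds every intermediate power up to Exp, mirroring the
// simple-loop variant of the suggestion above.
static double powByTable(double x, unsigned Exp) {
  static const unsigned ExpTable[9][2] = {
      {0, 0}, {0, 0},         // 0 and 1 unused
      {1, 1},                 // 2 = 1 + 1
      {1, 2}, {2, 2}, {2, 3}, // 3, 4, 5
      {3, 3}, {2, 5}, {4, 4}  // 6, 7, 8
  };
  double Chain[9];
  Chain[1] = x;
  for (unsigned I = 2; I <= Exp; ++I)
    Chain[I] = Chain[ExpTable[I][0]] * Chain[ExpTable[I][1]];
  return Chain[Exp];
}
```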
lib/Transforms/Utils/SimplifyLibCalls.cpp:1088–1101
Almost. There are two different ways to store the pre-computed table. The first is to store the summands, e.g. 15 = 12 + 3. The advantage is that it needs exactly one entry per exponent, but the downside is that you need to deduplicate, since 12 and 3 will use the same sub-expressions. A map and recursion are the easiest approach for that. The other approach is to just compile the list explicitly, like (0,0), (1,0), (2,2), (3,3), (4,2). Entry 0 is the input; every other entry is created in the loop. This version requires slightly more space in the binary, but allows using a simple for loop.
As long as we're really going to dig into the addition-chain issue, why are we duplicating Reassociate::buildMinimalMultiplyDAG here?
Thanks for the suggestions everyone.
I have now implemented addition chains using recursion with memoization, so that we do not need to generate extraneous instructions.
Here is an example:
7 --> (2, 5) --> (2, (2, 3)) --> (2, (2, (1, 2)))
We now generate instructions for only the intermediate products needed by our exponent. Storing the results at each step saves us repetitive recursive calls.
I have also addressed David's comments in this patch (added an assertion for Exp!=0 and removed check-prefix from the test case).
Please let me know your thoughts on this.
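A hedged sketch of the memoized recursion described above, with plain C++ multiplies in place of IR generation; the splitting rule here (halving the exponent) is illustrative and not necessarily the patch's exact summand table.

```cpp
#include <cassert>
#include <map>

// Recursively split Exp into two summands and cache each intermediate
// power, so every needed product is computed exactly once.
static double powByChain(double x, unsigned Exp,
                         std::map<unsigned, double> &Memo) {
  if (Exp == 1)
    return x;
  auto It = Memo.find(Exp);
  if (It != Memo.end())
    return It->second;
  unsigned Lhs = Exp / 2, Rhs = Exp - Lhs; // e.g. 7 -> 3 + 4
  double V = powByChain(x, Lhs, Memo) * powByChain(x, Rhs, Memo);
  Memo[Exp] = V;
  return V;
}

// Convenience wrapper that owns the memo table for one expansion.
static double powByChain(double x, unsigned Exp) {
  std::map<unsigned, double> Memo;
  return powByChain(x, Exp, Memo);
}
```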
Hi @majnemer,
I guess I have addressed all your concerns in my latest patch.
Please let me know if you have any other concerns.
Thanks,
Mandeep
Please run clang-format on the patch. At minimum it should reduce the amount of unnecessary vertical space in the getPow function.
D14466 provides a variable (bool unsafeFPMath) to check for fast-math. I made use of this in the check for fast-math.
I also updated the comment for the fast-math check.
Changed if (InnerChain[Exp] != nullptr) ==> if (InnerChain[Exp])
Pointers go on the right side. Why is Exp signed?