This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] pow(C,x) -> exp2(log2(C)*x)
ClosedPublic

Authored by xbolva00 on Jul 2 2019, 2:06 PM.

Details

Summary

Transform
pow(C,x)

To
exp2(log2(C)*x)

if C > 0, C != inf, C != NaN (and C is not power of 2, since we have some fold for such case already).

log(C) is folded by the compiler and exp2 is much faster to compute than pow.

Diff Detail

Repository
rL LLVM

Event Timeline

xbolva00 created this revision.Jul 2 2019, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2019, 2:06 PM

I'm a little wary about fold the case where doesNotAccessMemory is false, but I guess it's likely okay.

I'm guessing it doesn't matter whether you use exp or exp2 here? I guess the performance is probably similar, but it isn't obvious to me.

Should we specifically have test coverage for transforming pow(10, x) to exp?

lib/Transforms/Utils/SimplifyLibCalls.cpp
1351 ↗(On Diff #207624)

Please don't check "isFast()", instead, check the components you actually need for this. You clearly need afn; not sure if you need anything else to handle cases where the exponent is zero/inf/nan.

1354 ↗(On Diff #207624)

I'd rather explicitly fold the "log" here, so we know it actually happens; the constant folding code will not fold it in all cases.

xbolva00 updated this revision to Diff 207637.EditedJul 2 2019, 3:03 PM
xbolva00 retitled this revision from [InstCombine] pow(C,x) -> exp(log(C)*x) to [InstCombine] pow(C,x) -> exp2(log2(C)*x).
xbolva00 edited the summary of this revision. (Show Details)

Switched to exp2 (faster).
Addressed some review comments.
Added pow(10, e) test.

xbolva00 marked 2 inline comments as done.Jul 2 2019, 3:06 PM
xbolva00 added a subscriber: lebedev.ri.
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
1354 ↗(On Diff #207624)

emitUnaryFloatFnCall automatically folds it? @spatel @lebedev.ri

Not sure how to get log2 of APFloat...

xbolva00 marked an inline comment as done.Jul 2 2019, 3:45 PM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
1354 ↗(On Diff #207624)

Ah, right. It was constant folded only in “fast” mode..

xbolva00 marked an inline comment as done.Jul 2 2019, 3:50 PM
xbolva00 added inline comments.
test/Transforms/InstCombine/pow-exp.ll
7 ↗(On Diff #207637)

Should we also change sFast() check in the fold above this new fold to just “afn” check ? @efriedma @spatel

efriedma added inline comments.Jul 2 2019, 4:08 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1354 ↗(On Diff #207624)

lib/Analysis/ConstantFolding.cpp has some code that calls APFloat::convertToDouble() and uses the host C library's implementation of various routines: specifically, fabs, log2, log, log10, exp, exp2, sin, cos, sqrt, acos, asin, atan, ceil, cosh, exp, floor, round, sinh, tan, tanh, pow, fmod, atan2. This is not ideal, but the trigonometric and exponential functions are tricky to implement with correct rounding, and nobody has spent the time to implement them on APFloat.

It's worth noting that we never fold the long double versions.

xbolva00 marked an inline comment as done.Jul 2 2019, 4:33 PM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
1354 ↗(On Diff #207624)

Thanks, I will use this solution.

xbolva00 updated this revision to Diff 207781.Jul 3 2019, 6:51 AM

Fold log2 "manually".

xbolva00 marked an inline comment as done.Jul 3 2019, 6:52 AM
evandro added inline comments.Jul 3 2019, 8:30 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1355 ↗(On Diff #207781)

Shouldn't this call and the one below be to log2()?

xbolva00 updated this revision to Diff 207815.Jul 3 2019, 9:18 AM
xbolva00 marked an inline comment as done.
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
1355 ↗(On Diff #207781)

Ah, yes. Fixed.

Since using exp2(log()) was mathematically incorrect, have you run any benchmark that validates the results, such SPEC CPU2006 or CPU2017? If so, what kind of improvement did it register?

Since using exp2(log()) was mathematically incorrect, have you run any benchmark that validates the results, such SPEC CPU2006 or CPU2017? If so, what kind of improvement did it register?

I tested the first patch and the second revision with change to exp2 locally with simple examples to check for correctness and performance - and yes, I didn’t check previous revision with folding so a mistake was made.

I have no access to the SPEC benchmarks.

xbolva00 updated this revision to Diff 208036.Jul 4 2019, 6:54 AM

Precommited tests, rebased.

I tested the first patch and the second revision with change to exp2 locally with simple examples to check for correctness and performance - and yes, I didn’t check previous revision with folding so a mistake was made.

I have no access to the SPEC benchmarks.

What correctness tests did you run? If necessary, I can run SPEC to validate this patch at least on x86-64.

GCC has exp variant of this transformation so I checked their preconditions - unsafe math. My first patch had isFast() check but @efriedma requested to specify it better. I think afn/ninf/nnan is all we need. I wrote some small tests and compared printed results between Clang and GCC. I don’t think we could formally prove this transformation with Alive.

So, If you can test this patch with SPEC, please do - Thanks!

evandro accepted this revision.Jul 9 2019, 2:27 PM

It was actually harder to find a SPEC benchmark that triggered this case. As a matter of fact, it shows up only in CPU2017's 525.x264_r. But I can confirm that it works successfully.

Thank you for this patch.

lib/Transforms/Utils/SimplifyLibCalls.cpp
1238 ↗(On Diff #208036)

Please, update this comment.

1350 ↗(On Diff #208036)

Not really necessary to specify the conditions in the comment.

1360 ↗(On Diff #208036)

"mul" was used above to describe products.

test/Transforms/InstCombine/pow-exp.ll
208 ↗(On Diff #208036)

Again, not really necessary to spell out the conditions in the comment.

This revision is now accepted and ready to land.Jul 9 2019, 2:27 PM
xbolva00 updated this revision to Diff 208962.Jul 10 2019, 7:40 AM

Addressed review notes.

xbolva00 marked 6 inline comments as done.Jul 10 2019, 7:41 AM

It was actually harder to find a SPEC benchmark that triggered this case. As a matter of fact, it shows up only in CPU2017's 525.x264_r. But I can confirm that it works successfully.

Thank you for this patch.

Thanks !

This revision was automatically updated to reflect the committed changes.