This is an archive of the discontinued LLVM Phabricator instance.

[SLC] Simplify pow(x, 0.333...) to cbrt(x)
AbandonedPublic

Authored by evandro on Jul 6 2018, 1:38 PM.

Details

Reviewers
spatel
efriedma
Summary

This transformation helps some benchmarks in SPEC CPU200 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64.

Diff Detail

Event Timeline

evandro created this revision.Jul 6 2018, 1:38 PM

Tests?

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139

There seems to be two variants: https://godbolt.org/g/Rw1Gxt
Can you output the value that doesn't match?

Missing testcases.

I'm not sure what you mean by "a hard time matching the exponent"; I can't see any reason float would be different from double, assuming you're actually using the right constant.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1125

You need nsz: pow(-0., 1./3) returns +0, but cbrt(-0.) returns -0.

I think I'd prefer to require afn for this; not sure it's necessary, but better to be safe.

Please add explicit comments explaining why you need nnan and ninf (nnan because pow() returns a nan for negative x, ninf for pow(-inf, 1./3)).

spatel added a comment.Jul 8 2018, 8:56 AM

Shouldn't this patch use the existing code in replacePowWithSqrt(), so we're not (incompletely) duplicating the logic?

That code also has a TODO comment about choosing the minimal set of FMF to enable the fold. Whatever we decide that predicate will be should be identical for both transforms?

evandro marked an inline comment as done.Jul 9 2018, 9:05 AM

Shouldn't this patch use the existing code in replacePowWithSqrt(), so we're not (incompletely) duplicating the logic?

That code also has a TODO comment about choosing the minimal set of FMF to enable the fold. Whatever we decide that predicate will be should be identical for both transforms?

The logic is almost the same, except that sqrt() has a corresponding intrinsic and cbrt() doesn't. So. at elast for now, methinks that it's easier to understand and review this change as a separate function. Then, if needed, both functions could be merged.

evandro updated this revision to Diff 154620.Jul 9 2018, 9:07 AM

Add a test case.

evandro updated this revision to Diff 154624.Jul 9 2018, 9:25 AM
evandro added inline comments.Jul 9 2018, 10:48 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139

Please, see any example using float in the test case below.

evandro marked 2 inline comments as done.Jul 9 2018, 12:11 PM
evandro added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1138–1139

My bad. I crafted the test case using the IEEE754 bits for SP instead of the bits for DP truncated for float.

evandro updated this revision to Diff 154674.Jul 9 2018, 12:12 PM
evandro marked an inline comment as done.
evandro edited the summary of this revision. (Show Details)

Enable handling of float.

lebedev.ri added inline comments.Jul 9 2018, 12:13 PM
llvm/test/Transforms/InstCombine/pow-cbrt.ll
1

Just use ./utils/update_test_checks.py

efriedma added inline comments.Jul 9 2018, 12:20 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1125

isFast() is deprecated, because it makes the actual requirements unclear and disables optimizations where it isn't necessary. (In particular, you don't need reassoc here.)

evandro marked 2 inline comments as done.Jul 9 2018, 12:38 PM
evandro updated this revision to Diff 154679.Jul 9 2018, 12:41 PM
evandro updated this revision to Diff 155032.Jul 11 2018, 10:43 AM
evandro edited the summary of this revision. (Show Details)
evandro added a reviewer: beanz.
evandro set the repository for this revision to rL LLVM.
evandro added subscribers: jlebar, arsenm.

Refactor all of the code that simplifies pow(x, 0.5) to sqrt() by folding it into a new function that handles all simplifications to radical operations.

evandro updated this revision to Diff 155297.Jul 12 2018, 4:03 PM

Allow splat vectors for trivial simplifications.

¡Ping! 🔔🔔

If I'm seeing it correctly, there are several independent changes going on here. Can you split this up to make the review easier?

  1. Add all new tests with baseline CHECKs as an NFC preliminary step.
  2. It's not clear to me what the FMF diffs in pow-sqrt.ll are showing. If we are adjusting FMF constraints on existing folds, that should be an independent patch?
  3. The cosmetic diffs in variable names (Base, Expo, Shrunk, etc) look fine, so those can be another preliminary NFC commit before the part that needs further review.
  4. The cbrt transform can be the last patch/commit in this series; once we have the cleanup done, that should be a very small diff.

OK. Please, stay tuned.

Committed rL338152 to add the base line test case pow-cbrt.ll.

evandro updated this revision to Diff 158160.Jul 30 2018, 7:03 PM
evandro edited reviewers, added: efriedma; removed: davide, beanz.Jul 30 2018, 7:07 PM
evandro edited subscribers, added: davide, beanz; removed: efriedma.
evandro updated this revision to Diff 158438.Jul 31 2018, 6:13 PM
evandro edited the summary of this revision. (Show Details)
spatel added a comment.Aug 7 2018, 1:45 PM

As we discussed in D49306, I agree that this is probably a good perf transform, but I don't think we've shown any compelling reason to do this in IR vs. DAGCombiner.

There are downsides to doing this in IR currently because we don't have a cbrt intrinsic. That means we have different behavior based on data type (vectors won't get transformed).
It's also not clear why transforming to a form with fdiv in the negative exponent case is better than a single pow instruction. (And that case probably needs some perf justification even as a backend fold.)

evandro abandoned this revision.Aug 9 2018, 2:20 PM

OK, then.

evandro added a subscriber: fhahn.Aug 17 2018, 8:02 AM