This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, there are no regressions on x86-64 or AArch64.
I'm not sure what you mean by "a hard time matching the exponent"; I can't see any reason float would be different from double, assuming you're actually using the right constant.
You need nsz: pow(-0., 1./3) returns +0, but cbrt(-0.) returns -0.
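For example (this follows the C Annex F special-value rules, which is what the libm implementations we care about follow):

```c
#include <assert.h>
#include <math.h>

int main(void) {
    /* pow(+/-0, y) returns +0 for y > 0 when y is not an odd integer
       (C99 F.10.4.4), while cbrt preserves the sign of zero (F.10.4.2),
       so folding pow(x, 1./3) to cbrt(x) changes the sign of zero results. */
    assert(signbit(pow(-0.0, 1.0 / 3.0)) == 0);  /* +0.0 */
    assert(signbit(cbrt(-0.0)) != 0);            /* -0.0 */
    return 0;
}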
I think I'd prefer to require afn for this; not sure it's necessary, but better to be safe.
Please add explicit comments explaining why you need nnan and ninf (nnan because pow() returns a nan for negative x, ninf for pow(-inf, 1./3)).
Shouldn't this patch use the existing code in replacePowWithSqrt(), so we're not (incompletely) duplicating the logic?
That code also has a TODO comment about choosing the minimal set of FMF to enable the fold. Whatever we decide that predicate should be, it ought to be identical for both transforms, right?
The logic is almost the same, except that sqrt() has a corresponding intrinsic and cbrt() doesn't. So, at least for now, I think it's easier to understand and review this change as a separate function. Then, if needed, both functions could be merged.
isFast() is deprecated, because it makes the actual requirements unclear and disables optimizations where it isn't necessary. (In particular, you don't need reassoc here.)
If I'm seeing it correctly, there are several independent changes going on here. Can you split this up to make the review easier?
- Add all new tests with baseline CHECKs as an NFC preliminary step.
- It's not clear to me what the FMF diffs in pow-sqrt.ll are showing. If we are adjusting FMF constraints on existing folds, that should be an independent patch?
- The cosmetic diffs in variable names (Base, Expo, Shrunk, etc.) look fine, so those can be another preliminary NFC commit before the part that needs further review.
- The cbrt transform can be the last patch/commit in this series; once we have the cleanup done, that should be a very small diff.
As we discussed in D49306, I agree that this is probably a good perf transform, but I don't think we've shown any compelling reason to do this in IR vs. DAGCombiner.
There are downsides to doing this in IR currently because we don't have a cbrt intrinsic. That means we have different behavior based on data type (vectors won't get transformed).
It's also not clear why transforming to a form with fdiv in the negative exponent case is better than a single pow instruction. (And that case probably needs some perf justification even as a backend fold.)