Page MenuHomePhabricator

fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
ClosedPublic

Authored by spatel on Oct 14 2014, 3:31 PM.

Details

Summary

If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

y = sqrt(x * x);

becomes this:

y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 14898.Oct 14 2014, 3:31 PM
spatel retitled this revision from to fold: sqrt(x * x * y) -> fabs(x) * sqrt(y).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: beanz, hfinkel, mcrosier.
spatel added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.Oct 14 2014, 3:41 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1270 ↗(On Diff #14898)

Hrmm... is this right, or do we need to check the function attribute here? I'm not sure that the argument having the unsafe-algebra flag means that we can change its use in a non-strict way.

spatel added inline comments.Oct 14 2014, 5:22 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1270 ↗(On Diff #14898)

I think this goes back to the discussion of whether 'fast' is infectious in http://reviews.llvm.org/D5584. :)
In that case, we optimized away a sqrt call even though it wasn't explicitly marked 'fast'. This is a similar transform. We've just reversed the order of the fmul and sqrt.

I thought I saw some other precedence for just using the instruction-level flags, but I'm not finding it now. There's an optimization for log2() in InstCombineMulDivRem that checks whether the log2 intrinsic hasUnsafeAlgebra...I don't know how that is even specified in IR. I thought the IR fast-math flags only apply to fmul, fdiv, fadd, fsub, and frem? At least, that's what the LangRef says. FWIW, that log2 optimization doesn't appear to ever trigger, and I don't see any test case for it (r169025).

hfinkel added inline comments.Oct 15 2014, 12:22 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1270 ↗(On Diff #14898)

In that case, we optimized away a sqrt call even though it wasn't explicitly marked 'fast'. This is a similar transform. We've just reversed the order of the fmul and sqrt.

Right, but in that case we were optimizing the fmul, and it was the "outer" operation, and it had fast. In this case, the sqrt is the outer operation, and we need to check its equivalent.

Generally speaking, fast-math flags cannot infect users (only operands), except for some result assumptions, because of the inlining use case.

FWIW, that log2 optimization doesn't appear to ever trigger, and I don't see any test case for it (r169025).

Seems like this is a bug (I would not object to enhancing the IR to support fast-math flags on intrinsics, but as it stands, it is a bug).

spatel added inline comments.Oct 15 2014, 10:25 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1270 ↗(On Diff #14898)

Ah - ok, that sounds reasonable. Is the outer/inner distinction specified somewhere? In the LangRef?

And yes, I think if we're committed to instruction-level IR fast-math-flags, then intrinsics deserve those decorations too.

How about this for now: I'll redo this patch and testcases to check for a function-level attribute, add a TODO comment that we should revisit it if and when intrinsics get FMF support, and open a bug to add FMF support to intrinsics.

We had also run into the issue of FMF on intrinsics in http://reviews.llvm.org/D5222. We weren't sure if we could just treat the function-level attribute as a convenience, but I don't think we can do that in the following case: inlining/LTO where strict code gets inlined into fast code. The strict code is the "inner" logic in that case...so even though it wouldn't have IR-level FMF, it should bend to the "outer" function's attributes and IR-level FMF?

hfinkel added inline comments.Oct 15 2014, 10:48 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1270 ↗(On Diff #14898)

Ah - ok, that sounds reasonable. Is the outer/inner distinction specified somewhere? In the LangRef?

Looking at the LangRef, it seems very unclear on this point. We should make this more clear, because it was designed (not by me) to support a specific use case involving inlining during LTO, and we should spell that out.

How about this for now: I'll redo this patch and testcases to check for a function-level attribute, add a TODO comment that we should revisit it if and when intrinsics get FMF support, and open a bug to add FMF support to intrinsics.

Yes, this sounds good.

We had also run into the issue of FMF on intrinsics in http://reviews.llvm.org/D5222. We weren't sure if we could just treat the function-level attribute as a convenience, but I don't think we can do that in the following case: inlining/LTO where strict code gets inlined into fast code. The strict code is the "inner" logic in that case...so even though it wouldn't have IR-level FMF, it should bend to the "outer" function's attributes and IR-level FMF?

Sounds right.

spatel updated this revision to Diff 14960.Oct 15 2014, 1:09 PM

Code and regression tests updated to use a function-level attribute to enable the optimization.

Note that the existing optimization to do (X*Y) * X => (X*X) * Y in InstCombineMulDivRem also needs to detect the function-level attribute, otherwise it won't reassociate the factors to the square root into the canonical form.

hfinkel edited edge metadata.Oct 15 2014, 6:02 PM

Adding for the record...

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
453 ↗(On Diff #14960)

As we discussed elsewhere (PR21291), this is incorrect.

spatel updated this revision to Diff 15027.Oct 16 2014, 10:31 AM
spatel edited edge metadata.

Removed the change to InstCombineMulDivRem and the additional test case; function-level attributes do NOT override IR-level FMF.
But since we have no FMF support for intrinsics currently, added FIXMEs where we have to use the function-level attribute to make this optimization possible. The new testcases also have 'fast' at the IR level on all fmul. I think that should be a requirement for this optimization.

The intrinsic enhancement is filed as:
http://llvm.org/bugs/show_bug.cgi?id=21290

but as discussed here:
http://llvm.org/bugs/show_bug.cgi?id=21291

the IR-level FMF is not actually doing the job it was intended to do because the backend currently overrides/ignores all IR-level FMF.

hfinkel accepted this revision.Oct 16 2014, 10:54 AM
hfinkel edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land.Oct 16 2014, 10:54 AM
spatel closed this revision.Oct 16 2014, 11:58 AM
spatel updated this revision to Diff 15034.

Closed by commit rL219944 (authored by @spatel).

Thanks, Hal! Checked in with r219944.