This is an archive of the discontinued LLVM Phabricator instance.

Optimize square root squared (PR21126)
ClosedPublic

Authored by spatel on Oct 2 2014, 11:17 AM.

Details

Summary

When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.

This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 14338.Oct 2 2014, 11:17 AM
spatel retitled this revision from to Optimize square root squared (PR21126).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: hfinkel, majnemer, dexonsmith.
spatel added a subscriber: Unknown Object (MLST).
hfinkel accepted this revision.Oct 2 2014, 12:29 PM
hfinkel edited edge metadata.

LGTM.

test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

We don't need the "unsafe-fp-math"="true" because you're just checking the 'fast' on the fmul, right? If you don't need it, please remove it.

This revision is now accepted and ready to land.Oct 2 2014, 12:29 PM
spatel added inline comments.Oct 2 2014, 1:05 PM
test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

Thanks, Hal! Yes - the 'fast' alone is enough to do this optimization. I'm not a fan of the instruction-level fast-math flags...because it leads to another question. What should happen in the 2nd test case if it looks like this:

define double @sqrt_squared2(double %f) #0 {

%sqrt = call double @llvm.sqrt.f64(double %f)
%mul1 = fmul fast double %sqrt, %sqrt
%mul2 = fmul double %mul1, %sqrt
ret double %mul2

}

When the %mul1 operand is replaced in the 2nd fmul instruction, does that fmul instruction now become fast too? Is fast infectious?

hfinkel added inline comments.Oct 2 2014, 1:17 PM
test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

I think that you should change your mind ;)

The motivation behind the fast-math flags, in addition to providing client more-precise control over what instructions might be folded, was to create a "fast math" system that could survive inlining. As a result, the answer to your last question is no, it is not infectious, it is designed so that inlined "fast math" functions can have their instructions retain that property without affecting operations from the caller (which might not be "fast math" -- a particular problem when you consider LTO).

spatel added inline comments.Oct 2 2014, 1:44 PM
test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

I'd love to love this. :)

..but, you see what happens in this example:

  1. There's no function-level attribute for unsafe-math.
  1. When we kill the first fmul, we eliminate the only trace of 'fast'-ness in this function (ignore for the moment that the llvm.sqrt intrinsic isn't IEEE-754 compliant).
  1. After the optimization, we have a seemingly safe sqrt and a safe fmul. But the entirety of the code that will be produced for this function is decidedly unsafe. We just deleted the knowledge that the original function was not safe.

So in fact, fast is infectious, but silently so! LTO users should be aware that linking any unsafe code produces an overall unsafe program?

hfinkel added inline comments.Oct 2 2014, 1:56 PM
test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

Sure, but that's normally what the user wants! They want fast semantics for the routines in one module, and precise semantics for routines in another, then they want LTO to inline one into the other.

The fast flags are not meant to preserve a record of non-strict semantics -- that's not the optimizers job ;) -- but rather are intended to allow optimization control.

spatel added inline comments.Oct 2 2014, 2:08 PM
test/Transforms/InstCombine/fmul.ll
130 ↗(On Diff #14338)

Ok, sold. :)

I was just pretending to be an FP wonk. If the real FP wonks complain, please protect this optimization!

I'll fix up the testcase and get this checked in.

spatel closed this revision.Oct 2 2014, 2:21 PM
spatel updated this revision to Diff 14346.

Closed by commit rL218906 (authored by @spatel).