This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] allow fdiv folds with less than fully 'fast' ops
ClosedPublic

Authored by spatel on Feb 16 2018, 11:06 AM.

Download Raw Diff

Details

Reviewers

efriedma
wristow
MatzeB
hfinkel

Commits

rG31a90468e1e1: [InstCombine] allow fdiv folds with less than fully 'fast' ops
rL326098: [InstCombine] allow fdiv folds with less than fully 'fast' ops

Summary

These folds raise 2 questions for me as I'm trying to eliminate the use of 'isFast' here:

Do we need both 'arcp' and 'reassoc' to do this? My understanding is that -freciprocal-math allows this:

(X / Y) / Z --> X * (1.0 / Y) * (1.0 / Z)

...which wouldn't be useful here in InstCombine, but that might be a target-specific transform for the DAG.
If we are allowed to reassociate, then we reduce the above as:
X * (1.0 / Y) * (1.0 / Z) --> X * (1.0 / (Y * Z)) --> (X * 1.0) / (Y * Z) --> X / (Y * Z)

...but gcc currently says we can do this with just -freciprocal-math:
https://godbolt.org/g/vx7JnL

Should we allow this with just 'arcp'?

We're applying the intersected FMF to the new fmul op. That results in strict fmul ops in the regression tests. Is that too conservative? Strict ops inhibit further reassociation and might cause more FMF removal in early-cse (because that also does FMF intersection).

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision.Feb 16 2018, 11:06 AM

Herald added a subscriber: mcrosier. · View Herald TranscriptFeb 16 2018, 11:06 AM

...but gcc currently says we can do this with just -freciprocal-math

Regardless of what gcc is doing in this case, I think both 'reassoc' and 'arcp' are needed here. So I think your code is right.

In D43398#1011150, @wristow wrote:

...but gcc currently says we can do this with just -freciprocal-math

Regardless of what gcc is doing in this case, I think both 'reassoc' and 'arcp' are needed here. So I think your code is right.

Thanks. I wasn't sure what weight should be given to matching gcc-based customer expectations, so I thought it was worth noting here. Either way, this gets us closer to their behavior because we at least don't have to enable all of -ffast-math.

Any thoughts about the FMF on the fmul? It can be a separate patch if that's preferred, but I think this is the only transform around here that is intersecting FMF. We always propagate flags from the last op to intermediate ops in other cases. That makes more sense to me because the new op only exists to produce an intermediate result for our transformed fdiv, so it should have the same relaxations as the fdiv.

spatel mentioned this in rL325533: [InstCombine] allow fdiv with constant dividend folds with less than full….Feb 19 2018, 1:49 PM

Patch updated:
No functional change from the last rev, but I cleaned up the surrounding folds, so this is the last use of isFast() in visitFDiv(). There's also one more test affected by this change.

Any thoughts about the FMF on the fmul? It can be a separate patch if that's preferred, but I think this is the only transform around here that is intersecting FMF. We always propagate flags from the last op to intermediate ops in other cases. That makes more sense to me because the new op only exists to produce an intermediate result for our transformed fdiv, so it should have the same relaxations as the fdiv.

That makes sense to me, too. And from my POV, it's sensible to do this in the same patch.

Patch updated:
Propagate FMF from the last fdiv to the new fmul.

Your reasoning makes sense to me. And I think it's fine to push this given no floating point expert objects.

Thanks for the reviews. I think we have unofficial approval. Can someone click the button to 'Accept' and/or issue the official 'looks good...'?

LGTM

This revision is now accepted and ready to land.Feb 25 2018, 6:22 PM

Closed by commit rL326098: [InstCombine] allow fdiv folds with less than fully 'fast' ops (authored by spatel). · Explain WhyFeb 26 2018, 8:05 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Transforms/

InstCombine/

InstCombineMulDivRem.cpp

16 lines

test/