This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize fneg before fmul/fdiv
ClosedPublic

Authored by spatel on Jul 29 2019, 8:23 AM.

Details

Summary

I'm proposing to reverse the canonicalization of fneg relative to fmul/fdiv. That makes it easier to implement the transforms (and possibly other fneg transforms) in 1 place because we can always start the pattern match from fneg (either the legacy binop or the new unop).

There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914
https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR (although this change may expose analysis holes in the other direction).

  1. The instcombine test changes show the expected neutral IR diffs from reversing the order.
  1. The reassociation tests show that we were missing an optimization opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says that all of these transforms are allowed (regardless of binop/unop fneg version) because:

"For all other operations [besides copy/abs/negate/copysign], this standard does not specify the sign bit of a NaN result."
In all of these transforms, we always have some other binop (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a potential intermediate NaN operand.
(If that interpretation is wrong, then we must already have a bug in the existing transforms?)

  1. The clang tests shouldn't exist as-is, but that's effectively a revert of rL367149 (the test broke with an extension of the pre-existing fneg canonicalization in rL367146).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Jul 29 2019, 8:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2019, 8:23 AM
xbolva00 added inline comments.
llvm/test/Transforms/InstCombine/fadd.ll
163 ↗(On Diff #212169)

Comment needs to be fixed

spatel marked 2 inline comments as done.Jul 29 2019, 9:10 AM
spatel added inline comments.
llvm/test/Transforms/InstCombine/fadd.ll
163 ↗(On Diff #212169)

Ah, right. And some other tests have assumed the transform in the test name. Fix coming up.

spatel updated this revision to Diff 212177.Jul 29 2019, 9:12 AM
spatel marked an inline comment as done.

Patch updated:
Fixed test comments/names to reflect change in transformations.

mcberg2017 added a comment.EditedJul 29 2019, 10:45 AM

General comment: For us internally at Apple this does cause regressions, its small but there. Mostly because fneg becomes a side effect for fp arithmetic ops for us. I would request a target info guard which we can turn off to avoid the drift. If not, this is small enough to live with.

General comment: For us internally at Apple this does cause regressions, its small but there. Mostly because fneg becomes a side effect for fp arithmetic ops for us. I would request a target info guard which we can turn off to avoid the drift. If not, this is small enough to live with.

This is target-independent instcombine, so I don't think we want to add hooks at this level. If you see regressions transforming in this direction, do you also see regressions from the current code (ie, does disabling all fneg transforms make things even better)? That would suggest that we remove these transforms from instcombine altogether and create a dedicated target-dependent IR pass for fneg folds that behaves more like what we have in DAGCombiner.

mcberg2017 accepted this revision.Jul 29 2019, 11:36 AM

Turning off inst combining fneg opts does not make things better for us however. It seems to consistently cause additional regressions. We can live with the new change then.

This revision is now accepted and ready to land.Jul 29 2019, 11:36 AM
  1. The reassociation tests show that we were missing an optimization opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says that all of these transforms are allowed (regardless of binop/unop fneg version) because:

    "For all other operations [besides copy/abs/negate/copysign], this standard does not specify the sign bit of a NaN result."

    In all of these transforms, we always have some other binop (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a potential intermediate NaN operand. (If that interpretation is wrong, then we must already have a bug in the existing transforms?)

Oh, that's an interesting thought...

There's one edge-case I've been playing with that is worth mentioning. This transform could cause problems with denormals and Flush-To-Zero (FTZ). For example:

-0.0 - D           -->  0.0
negate(D)          -->  -D

AFAICT, FTZ breaks IEEE-754 compatibility already, so all bets are off at that point. My certainty-level is low about this though. I still need to work through it.

Saw that this was Accepted while I was writing my last comment...

I don't think my comment should hold up this Diff. I'm just thinking aloud and it's something that could be easily fixed later.

mcberg2017 added a comment.EditedJul 29 2019, 11:46 AM

How about we add a TODO comment regarding Cameron's topic? Would that be sufficient?

How about we add a TODO comment regarding Cameron's topic? Would that be sufficient?

To be fair, this is a larger open question. There are a number of transforms that may need to be addressed in the future from it.

It's probably fine to submit without a comment.

How about we add a TODO comment regarding Cameron's topic? Would that be sufficient?

To be fair, this is a larger open question. There are a number of transforms that may need to be addressed in the future from it.

It's probably fine to submit without a comment.

Thanks for the quick reviews! I agree that there are potential pitfalls in instcombine for targets with FTZ/DAZ. If you grep around in here, you can find some transforms that are partially limited to avoid that, but I can't see how those limitations work in the general case where we don't know some value is a denorm at compile-time. I suspect we need to expand our fast-math-flags and/or use of function attributes to account for targets that are not IEEE-754 compliant even though LLVM IR assumes that model by default.

This revision was automatically updated to reflect the committed changes.