This is an archive of the discontinued LLVM Phabricator instance.

Extending constant folding for float arithmetic to isFast IR flags
AbandonedPublic

Authored by mcberg2017 on May 16 2018, 11:39 AM.

Details

Summary

The patch addresses a single transform of selection DAG optimization, namely a few special cases of constant folding, extending the guard for isFast on the SDNodeFlags passed to getNode for fadd, fsub, fmul, fdiv and frem.

Diff Detail

Event Timeline

mcberg2017 created this revision.May 16 2018, 11:39 AM

This patch is split off of D46562.

What is the test actually testing? I don't think there should be any difference from this in the floor lowering, and I don't really want a separate test file duplicating all of the lowering testing

I can simplify this somewhat, I will make a new test that hits the patterns for folding only for all ops of concern, then we will have a way to test with unsafe and without, using the new guard to ensure that the CHECK result is the same for both.

mcberg2017 added a comment.EditedMay 16 2018, 1:14 PM

In the way of explanation of the floor test, I simply commented out unsafe and used only Flags.isFast() go guard the optimization, the original floor test broke. Matt, would you like me to include the overload for the floor test with the isFast path in the original file or is a specific functional lit test sufficient?

Added fast math flag capture from visitBinary so that getNode can make decisions in cases where SelectionDAGBuilder::visit has not yet been called. Also updated the test for just these cases and dropped the floor test.

missed a file...

Any issues remaining here?

spatel added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2772–2775 ↗(On Diff #147183)

I don't think we want repeat this here (otherwise, we need to set the flags like this for all FPMO).

This reminds me of a comment that @andreadb made (probably >3 years ago; I'm not finding it now) regarding DAGBuilder/DAGCombiner. IIRC, because of the way the DAG operates, we may need to repeat folds in the builder to ensure consistency.

So I think the right fix is to duplicate the simplifications, so the unnecessary node is not created in the 1st place.

test/CodeGen/X86/fp-fold.ll
5–6 ↗(On Diff #147183)

This isn't quite true. In all cases, we do not need full 'fast' to enable these transforms. All of the transforms should show the minimal FMF requirement in instsimplify at this point, so that's what we should emulate here.

Eg, x * 1.0 = x is true without any FMF.

spatel added inline comments.May 17 2018, 6:27 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4440–4462

This code structure really doesn't make sense as-is: we putting independent cases together even though they share no common code.

Cleaned up constant folding to follow flag rules used in other locations to guard optimization and reordered code for context. The tests also reflect the min flag usage and duality with unsafe.

spatel added inline comments.May 17 2018, 2:58 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4442–4444

This doesn't match instsimplify:

// fadd X, 0 ==> X, when we know X is not -0
// fsub X, +0 ==> X (no flags needed)

But make sure (add tests) for -0.0 too, so we get that right:

// fadd X, -0 ==> X (no flags needed)
// fsub X, -0 ==> X, when we know X is not -0
4443–4445

Again, these are not the correct predicates:

// fmul nnan nsz X, 0 ==> 0
test/CodeGen/X86/fp-fold.ll
26–35 ↗(On Diff #147380)

Make sure I understand this - we need to change the code in DAGCombiner to fold this? Why is this case different than the others?

It's probably better if you check in the baseline tests first, so we're only getting the functional diffs from this patch.

These changes reflect what is in Transforms/InstSimplify/fast-math.ll
I have a baseline version of this test ready to roll, will check that in first.

As for the fmul_zero case, the call producing the getNode is from visitBinary, which cannot provide flags as they are created and passed in without any representation from the current Instruction (which does have the flags). I believe we need to add some new coverage in the DAGCombiner so that the case above is covered so we do not get to this scenario as the STRICT case is unoptimized. For now the test marks the short fall.

spatel added inline comments.May 18 2018, 10:25 AM
test/CodeGen/X86/fp-fold.ll
32–36 ↗(On Diff #147434)

This is wrong when x is -0.0:
(-0.0) - (-0.0) --> (-0.0) + 0.0 --> 0.0

mcberg2017 added a comment.EditedMay 18 2018, 11:47 AM

So something to keep in mind, because D46562 was split, part of its code landing here, we are missing flag based visit functionality, where unsafe behavior differs, so for this patch there will be some holes. Its the reason fmul_zero diverges and when you see the next upload, why fsub_zero_2 will also diverge in the tests. However, D46562 will attempt to deal with most of the issues that remain via sub flag usage in the visit functions.

mcberg2017 updated this revision to Diff 147572.EditedMay 18 2018, 12:10 PM

Special cases for fsub, we will need mirrored functionality in visit for fsub via flags to collect on matching optimizations in unsafe. The same is true for other operations.

mcberg2017 added inline comments.May 18 2018, 12:25 PM
test/CodeGen/X86/fp-fold.ll
32–36 ↗(On Diff #147434)

A related question is that unsafe does do that optimization, is it incorrectly doing so?

spatel added inline comments.May 18 2018, 1:58 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4441–4445

Still not quite right.
Add and sub should have similarly structured code: for 1 version of 0.0, we don't need 'nsz' and for the other version of 0.0, we do need 'nsz'.

test/CodeGen/X86/fp-fold.ll
32–36 ↗(On Diff #147434)

That goes back to how we interpret "-enable-unsafe-fp-math". My reading of the text:

cl::desc("Enable optimizations that may decrease FP precision")

says that yes, we have a bug (ignoring signed-zero is not covered by that parameter).

But as we noted in one of the earlier reviews, clang only sets this when you pile together a bunch of independent FP loosening:

if (!MathErrno && AssociativeMath && ReciprocalMath && !SignedZeros &&
    !TrappingMath)
  CmdArgs.push_back("-menable-unsafe-fp-math");

...so the bug is probably not easily visible for anyone besides compiler hackers?

Updated after rL332756 for just the test changes. Are we ready to roll then?

Updated after rL332756 for just the test changes. Are we ready to roll then?

There are 4 cases each for fadd/fsub:

  1. 0.0
  2. -0.0
  3. nsz 0.0
  4. nsz -0.0

I added those tests in rL332780. Can you rebase with that?
But now I'm not sure why we're bothering with these folds in getNode(). We have to repeat them in DAGCombiner anyway, right? Isn't this the equivalent of doing simplifications in IRBuilder? Are we doing this because it's a big perf win (seems unlikely)?

Rebased a test in fp_fold.ll

I am hoping to remove part of these changes once the DAGCombiner work is in place.

I know we'll probably just kill this whole chunk of code in a subsequent patch, but the logic should be complete/correct. I think we want this for fadd/fsub:

case ISD::FADD:
  // fadd X, -0.0 ==> X (no flags needed)
  // fadd X, 0.0 ==> X, when we know X is not -0
  // FIXME: Unsafe math doesn't imply no-signed-zeros.
  if (N2CFP && N2CFP->isZero())
    if (N2CFP->isNegative() || getTarget().Options.UnsafeFPMath ||
        Flags.hasNoSignedZeros())
      return N1;
  break;
case ISD::FSUB:
  // fsub X, 0.0 ==> X (no flags needed)
  // fsub X, -0.0 ==> X, when we know X is not -0
  // FIXME: Unsafe math doesn't imply no-signed-zeros.
  if (N2CFP && N2CFP->isZero())
    if (!N2CFP->isNegative() || getTarget().Options.UnsafeFPMath ||
        Flags.hasNoSignedZeros())
      return N1;
  break;
arsenm added inline comments.May 23 2018, 4:37 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4440

Not particularly a comment on this patch, but I was wondering what's the reasoning for doing this in getNode? I've never liked how getNode attempts to optimize nodes on creation, especially since a lot of node visitors in DAGCombiner attempt the same folds (and kind of have to since a ReplaceNodeResults may trigger the same situation). Have you found an explanation?

spatel added inline comments.May 23 2018, 6:13 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4440

I hinted at the only explanation I could think of: it's being done as a compile-time optimization. I doubt that's a meaningful win, so I wouldn't mind skipping this step entirely - just fix the folds in DAGCombiner and ignore this.

Given the last round of comments, I am going to spend some time on the DAG combiner work in D46562 to see if I can remove the need for these changes. More later...

mcberg2017 abandoned this revision.Jun 14 2018, 12:33 PM