This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] require UnsafeFPMath for re-association of addition
ClosedPublic

Authored by nhaehnle on Jan 13 2017, 5:38 AM.

Details

Summary

The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Event Timeline

nhaehnle updated this revision to Diff 84288.Jan 13 2017, 5:38 AM
nhaehnle retitled this revision from to [DAGCombine] require UnsafeFPMath for re-association of addition.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Jan 13 2017, 8:06 AM

What about individual FMF?

hfinkel edited edge metadata.Jan 13 2017, 8:18 AM

What about individual FMF?

I agree; these transforms rely on the associativity of addition. As Mehdi says, you should check the FMFs on the individual instructions.

jlebar edited edge metadata.Jan 13 2017, 8:28 AM
jlebar added a subscriber: jlebar.
arsenm added inline comments.Jan 13 2017, 8:50 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8136–8141

I think this should be replacing the (AllowFusion || HasFMAD) check and then picking the right opcode inside

arsenm edited edge metadata.Jan 16 2017, 4:06 PM

What about individual FMF?

I agree; these transforms rely on the associativity of addition. As Mehdi says, you should check the FMFs on the individual instructions.

This would be nice, but it seems right now you can only use FMF flags with a fixed set of binary nodes, so no FMA/FMAD

What about individual FMF?

I agree; these transforms rely on the associativity of addition. As Mehdi says, you should check the FMFs on the individual instructions.

This would be nice, but it seems right now you can only use FMF flags with a fixed set of binary nodes, so no FMA/FMAD

That's not great, looks like something that should be lifted, otherwise folding (fast add (fast mul (x, y), z) to FMA is losing the fast-math information and inhibiting potential further combine.

nhaehnle updated this revision to Diff 84647.Jan 17 2017, 3:04 AM
  1. Take individual UnsafeAlgebra flags into account.

    There's a question of whether the flag should be required on both of the involved nodes, or if having it one of them is sufficient. I opted for requiring it on both of them, to prevent unsafe flags from leaking out of where they were originally added.
  1. Simplify the outside condition; as Matt pointed out, the guard of (AllowFusion || HasFMAD) is no longer required, because each of the contained blocks has a check that is at least as strict.
spatel edited edge metadata.Jan 17 2017, 7:41 AM
  1. Take individual UnsafeAlgebra flags into account.

    There's a question of whether the flag should be required on both of the involved nodes, or if having it one of them is sufficient. I opted for requiring it on both of them, to prevent unsafe flags from leaking out of where they were originally added.

That sounds right. We can't reassociate strict ops in general, so the lack of hasUnsafeAlgebra() on any of the operands that will change in these transforms should prevent the transform (unless the FMA is guaranteed to produce a better result like we showed in D26602?).

And yes, it's a moot point for this patch currently because we haven't extended FMF beyond the binary FP ops.

  1. Take individual UnsafeAlgebra flags into account.

    There's a question of whether the flag should be required on both of the involved nodes, or if having it one of them is sufficient. I opted for requiring it on both of them, to prevent unsafe flags from leaking out of where they were originally added.

That sounds right. We can't reassociate strict ops in general, so the lack of hasUnsafeAlgebra() on any of the operands that will change in these transforms should prevent the transform (unless the FMA is guaranteed to produce a better result like we showed in D26602?).

And yes, it's a moot point for this patch currently because we haven't extended FMF beyond the binary FP ops.

Okay. Please add a FIXME wr.t. adding FMF on the FMA nodes, and proceed with fixing the bug.

nhaehnle updated this revision to Diff 84954.Jan 19 2017, 3:41 AM

Add FIXMEs. @hfinkel, do I understand your comment correctly that this patch is
good to go?

Add FIXMEs. @hfinkel, do I understand your comment correctly that this patch is
good to go?

The patch will cause a crash on a modified form of the PPC tests because we can't ask for flags of an FMA (sigh):

define double @test_FMADD_ASSOC1(double %A, double %B, double %C, double %D, double %E) {
  %F = fmul fast double %A, %B
  %G = fmul fast double %C, %D
  %H = fadd fast double %F, %G
  %I = fadd fast double %H, %E
  ret double %I
}
$ ./llc -march=ppc32 -fp-contract=fast  fma_crash.ll -o -
	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 12
	.machine ppc
	.section	__TEXT,__textcoal_nt,coalesced,pure_instructions
	.section	__TEXT,__symbol_stub1,symbol_stubs,pure_instructions,16
	.section	__TEXT,__text,regular,pure_instructions
Stack dump:
0.	Program arguments: ./llc -march=ppc32 -fp-contract=fast fma_crash.ll -o - 
1.	Running pass 'Function Pass Manager' on module 'fma_crash.ll'.
2.	Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@test_FMADD_ASSOC1'

The patch will cause a crash [...]

Of course *sigh*

Out of the various forms of dealing with this, how about this:

 const SDNodeFlags *SDNode::getFlags() const {
   if (auto *FlagsNode = dyn_cast<BinaryWithFlagsSDNode>(this))
     return &FlagsNode->Flags;
-  return nullptr;
+
+  static const SDNodeFlags DefaultFlags;
+  return &DefaultFlags;
 }

That keeps callers of getFlags() simple. In terms of run-time performance it also seems reasonable: getFlags() isn't inlined anyway (it lives in SelectionDAG.cpp).

Obviously it'd be even easier to skip the getFlags() thing entirely now, since it can never evaluate to true. But the right way forward is to extend support for FMF, so...

Obviously it'd be even easier to skip the getFlags() thing entirely now, since it can never evaluate to true. But the right way forward is to extend support for FMF, so...

Right - I think your previous suggestion to return a default flags object will work.

nhaehnle updated this revision to Diff 85133.Jan 20 2017, 6:56 AM

Added the DefaultFlags. It does fix @spatel's example for me. Anything else?

Added the DefaultFlags. It does fix @spatel's example for me. Anything else?

That's all from me (someday, I'll get back to FMF in the DAG...), but I think someone who knows one of the affected backends should provide the approval.

Can you add a few new tests which do the combine just from the FMF flags?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8138

: after FIXME (same for the other places)

8141

I don't think you need the getNode here (and same for the rest

hfinkel added inline comments.Jan 25 2017, 4:32 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8141

Why are you checking N0->getFlags()->hasUnsafeAlgebra() if N0 is a FMA and the comment says they don't support FMFs?

nhaehnle updated this revision to Diff 86084.Jan 27 2017, 11:24 AM

Formatting of FIXMEs

@arsenm, unfortunately I can't add flag-based tests, since as @hfinkel points out, you can't actually get the flag on an FMA or FMAD in the DAG. @hfinkel, the idea was to put the check in already anyway, since by the discussion with @spatel, the plan is to add FMF for all nodes eventually. That seems reasonable to me.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8138

Done.

8141

I double-checked, SDValue doesn't have getFlags().

@arsenm, unfortunately I can't add flag-based tests, since as @hfinkel points out, you can't actually get the flag on an FMA or FMAD in the DAG. @hfinkel, the idea was to put the check in already anyway, since by the discussion with @spatel, the plan is to add FMF for all nodes eventually. That seems reasonable to me.

But you can't because they'll crash:

const SDNodeFlags *SDNode::getFlags() const {

if (auto *FlagsNode = dyn_cast<BinaryWithFlagsSDNode>(this))
  return &FlagsNode->Flags;
return nullptr;

}

so if you actually reach that code, right now, when those nodes don't support flags, then you'll get the nullptr return here on which you're calling a member function. In any case, I'd leave them out for now. Someone needs to look at this again anyway to fix the FIXMEs, and then we'll add the code when it is testable.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8141

I double-checked, SDValue doesn't have getFlags().

The point is that SDValue has:

SDNode *getNode() const { return Node; }

and also:

inline SDNode *operator->() const { return Node; }

so the -> operator is just like getNode().

But you can't because they'll crash [...]

That's what the hunk in SelectionDAG.cpp is about.

And d'oh on the SDValue thing, I'll fix that.

But you can't because they'll crash [...]

That's what the hunk in SelectionDAG.cpp is about.

OIC, alright. I recommend that, if you want this to go into the branch, we do a targeted fix and then expand on that in follow-up.

And d'oh on the SDValue thing, I'll fix that.

nhaehnle updated this revision to Diff 86096.Jan 27 2017, 12:16 PM

Drop unnecessary getNode()s.

But you can't because they'll crash [...]

That's what the hunk in SelectionDAG.cpp is about.

OIC, alright. I recommend that, if you want this to go into the branch, we do a targeted fix and then expand on that in follow-up.

OTOH, maybe it is safer this way. Hrmm.

And d'oh on the SDValue thing, I'll fix that.

Yeah, I personally feel more comfortable having the exact same change on the branch.

Yeah, I personally feel more comfortable having the exact same change on the branch.

I meant that having getFlags() always return something dereferenceable seems safer.

Yeah, I personally feel more comfortable having the exact same change on the branch.

I meant that having getFlags() always return something dereferenceable seems safer.

Aha, I misunderstood. It's still not clear what those two comments mean together :)

Do you strongly prefer a very limited fix, meaning that looking at getFlags() is postponed for whoever gets around to supporting those on FMA(D) when they do? Either way, the FIXME stays (maybe rephrased).

Yeah, I personally feel more comfortable having the exact same change on the branch.

I meant that having getFlags() always return something dereferenceable seems safer.

Aha, I misunderstood. It's still not clear what those two comments mean together :)

Do you strongly prefer a very limited fix, meaning that looking at getFlags() is postponed for whoever gets around to supporting those on FMA(D) when they do? Either way, the FIXME stays (maybe rephrased).

I'm not sure how strongly I feel about it, but I think it is better to save that until we make it work. That way we can make sure we get test cases for it at the time. That seems better than adding dead code.

nhaehnle updated this revision to Diff 86255.Jan 30 2017, 1:27 AM

Backing out the effectively dead FMF code, but leave the corresponding
FIXMEs in place.

hfinkel accepted this revision.Jan 30 2017, 6:28 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jan 30 2017, 6:28 AM
This revision was automatically updated to reflect the committed changes.