Page MenuHomePhabricator

propagate fast-math-flags on DAG nodes
ClosedPublic

Authored by spatel on Aug 17 2015, 6:11 PM.

Details

Summary

After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. :)

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings.

Note: I didn't try to locate all the spots where the existing nsw / nuw / exact flags should be propagated, but it doesn't look like those are handled correctly either. Maybe it doesn't matter as much? Nobody has reported a problem from those, and they've been in the DAG for over a year ( http://reviews.llvm.org/rL210467 ).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 32361.Aug 17 2015, 6:11 PM
spatel retitled this revision from to propagate fast-math-flags on DAG nodes.
spatel updated this object.
spatel added reviewers: hfinkel, escha, nlewycky.
spatel added a subscriber: llvm-commits.
hfinkel edited edge metadata.Aug 27 2015, 10:25 AM

Thanks for continuing to work on this! I think this sets out a fairly-concrete plan to move forward (we need to do this, and then also add flags on FNEG, the comparisons and intrinsics). We need the flags on intrinsics at the IR level anyway.

Note: I didn't try to locate all the spots where the existing nsw / nuw / exact flags should be propagated, but it doesn't look like those are handled correctly either. Maybe it doesn't matter as much? Nobody has reported a problem from those, and they've been in the DAG for over a year ( http://reviews.llvm.org/rL210467 ).

This is not a safe assumption; performance can vary a lot because of the rate of development and other factors, and the number of small performance regressions that are investigated in detail is very small.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3304 ↗(On Diff #32361)

Why did you add { } here?

4290 ↗(On Diff #32361)

This is repeated a lot. Can we add a utility function to SDNode that does this?

spatel added inline comments.Aug 30 2015, 1:18 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3304 ↗(On Diff #32361)

No good reason - leftover bits from an earlier draft where I had added a temp variable. Will fix.

4290 ↗(On Diff #32361)

Sure - that would have saved a lot of cut and paste. :)
I think this would be a virtual function in SDNode that returns a nullptr and derived classes would return a pointer to their SDNodeFlags if they have them. However, I see that SDNode has no virtual functions, so adding one means I should also add a virtual destructor? But is the lack of virtual functions in this class to avoid a vtable for size/perf? If that's a conscious design decision, is it better to fake a virtual function with something like:

const SDNodeFlags *SDNode::getFlags() const {
  if (auto *FlagsNode = dyn_cast<BinaryWithFlagsSDNode>(this))
    return &FlagsNode->Flags;
  return nullptr;
}
hfinkel added inline comments.Aug 30 2015, 2:18 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4290 ↗(On Diff #32361)

I think that it is a conscious design decision; I'd not induce the creation of a vtable just for this, and your suggestion is what I had in mind (it probably can't be inlined either way).

spatel updated this revision to Diff 33586.Aug 31 2015, 10:10 AM
spatel edited edge metadata.

Patch updated based on Hal's feedback:

  1. Remove unnecessary braces in LegalizeDAG.cpp.
  2. Add helper function for extracting flags from SDNode (with comment about non-virtual definition).
  3. Replace all dyn_cast sequences for getting flags to use helper function.
  4. Rebase to r246447.
spatel updated this revision to Diff 33592.Aug 31 2015, 10:21 AM

Patch updated: previous upload lost the new regression test file.

spatel added a comment.Sep 8 2015, 7:23 AM

Ping.

Note: the getFlags() implementation is probably only a temporary concern; I think we'll be pulling the flags into the SDNode class itself if we generalize flags to be used by intrinsics and other instructions.

Also for reference - another potential use/extension of FMF in the backend was raised here:
https://llvm.org/bugs/show_bug.cgi?id=24512#c12

nlewycky edited edge metadata.Sep 15 2015, 4:27 PM
nlewycky added a subscriber: nlewycky.

I just noticed that I'm listed as a reviewer here. I'm not familiar enough
with this are of llvm to review this patch. Do you need me to patch it in
and try it out?

I just noticed that I'm listed as a reviewer here. I'm not familiar enough
with this are of llvm to review this patch. Do you need me to patch it in
and try it out?

Thank you for the offer, Nick. I'm not expecting any correctness bugs to manifest with this patch. I think there are just potential perf differences resulting from failure to CSE nodes. If your test setup can detect asm or perf diffs, then certainly, I would welcome more testing.

Escha, if you could try this with your out-of-tree target to verify that it doesn't break anything, that would be great. If not, then I think we'll just push this into trunk and see if anyone can detect any differences.

The patch wasn't too far off last time Hal looked it over, so I hope this is almost ready to go. Hal, any other feedback?

escha edited edge metadata.Sep 15 2015, 5:44 PM

It's good on our test suite. There's regressions initially, but literally all of them go away when I apply my patch for passing through DAG flags in our target-specific DAG combines. I think it's actually NFC across ~10,000 shaders when that second patch is applied.

escha accepted this revision.Sep 15 2015, 5:44 PM
escha edited edge metadata.
This revision is now accepted and ready to land.Sep 15 2015, 5:44 PM
spatel added a comment.EditedSep 16 2015, 8:29 AM

It's good on our test suite. There's regressions initially, but literally all of them go away when I apply my patch for passing through DAG flags in our target-specific DAG combines. I think it's actually NFC across ~10,000 shaders when that second patch is applied.

Thanks, Escha. I'll rebase, commit, and send a note to the dev list so other out-of-tree users know about this change.

This revision was automatically updated to reflect the committed changes.