This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] remove constant folding limitations based on FP exceptions
ClosedPublic

Authored by spatel on Apr 30 2019, 11:47 AM.

Details

Summary

We don't have any FP exception limits in the IR constant folder (apart from strict ops), so I don't think it makes sense to have them here in the DAG either. Nothing else in the backend tries to preserve those (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions.

Diff Detail

Event Timeline

spatel created this revision.Apr 30 2019, 11:47 AM
efriedma accepted this revision.Apr 30 2019, 12:19 PM

LGTM; this matches the direction we're going, where we use different DAG nodes for functions which care about FP exceptions. (Please wait a couple days in case I'm missing something obvious.)

This revision is now accepted and ready to land.Apr 30 2019, 12:19 PM

Rather than get rid of this, can we update it to handle strict FP nodes correctly? Or at least put a skeleton in place to prepare to handle it correctly?

Right now I think we're throwing away the rounding mode and exception semantics arguments when we convert constrained intrinsics to strict FP nodes, but if we still had that information here we could use it to make a smart decision. For instance, suppose we had an intrinsic like this:

%t = call double @llvm.experimental.constrained.fadd.f64(double 4.3, double 6.5,
                                                         metadata !"round.upward",
                                                         metadata !"fpexcept.ignore")

If that got to the selection DAG we would be able to fold it.

Yes, we could possibly constant-fold STRICT_FADD etc. using similar logic, but I'm not sure we would actually want to reuse the existing codepath in foldConstantFPMath. In any case, it'll be in the revision history if someone does end up wanting to revive the code for that.

andrew.w.kaylor added a comment.EditedApr 30 2019, 12:55 PM

Why wouldn't we want to reuse this code path?

We made the decision that we are initially willing to live with poor optimization in order to get correctness for the constrained FP implementation, but the plan all along has been that after we had correct results we wanted to go back to re-enable optimizations. I'm just trying to avoid making that path harder.

What I'm asking for here is something like this:

SDValue SelectionDAG::foldConstantFPMath(unsigned Opcode, const SDLoc &DL,
                                         EVT VT, SDValue N1, SDValue N2) {
  auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1.getNode());
  auto *N2CFP = dyn_cast<ConstantFPSDNode>(N2.getNode());
  bool HasFPExceptions = false;
  APFloat::roundingMode = APFloat::round_to_nearest;
  // FIXME: Add support for strict FP rounding and exception semantics.

That would produce the same effect as this patch. To support it we'd just need extra arguments to provide the rounding mode and exception semantics. I think those are probably even available not far up the stack from where this is called.

Why wouldn't we want to reuse this code path?

We made the decision that we are initially willing to live with poor optimization in order to get correctness for the constrained FP implementation, but the plan all along has been that after we had correct results we wanted to go back to re-enable optimizations. I'm just trying to avoid making that path harder.

What I'm asking for here is something like this:

SDValue SelectionDAG::foldConstantFPMath(unsigned Opcode, const SDLoc &DL,
                                         EVT VT, SDValue N1, SDValue N2) {
  auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1.getNode());
  auto *N2CFP = dyn_cast<ConstantFPSDNode>(N2.getNode());
  bool HasFPExceptions = false;
  APFloat::roundingMode = APFloat::round_to_nearest;
  // FIXME: Add support for strict FP rounding and exception semantics.

That would produce the same effect as this patch. To support it we'd just need extra arguments to provide the rounding mode and exception semantics. I think those are probably even available not far up the stack from where this is called.

You also need to pass in the input chain and probably emit a MERGE_VALUES to connect it to the expected output chain. Probably need to call this from a different getNode function with the right number of input arguments and add STRICT_FP nodes to the switch in that function to call this.

We generally try to avoid keeping dead code in-tree, but I guess I'm not strongly opposed to it here.

We generally try to avoid keeping dead code in-tree, but I guess I'm not strongly opposed to it here.

I honestly don't have strong feelings about this particular change either. I'm more concerned about the way we think about strict FP handling. We've still got a lot of work to do to finish strict FP handling, and I would hate to see any code that currently has logic to handle strict exception semantics lose it. On the other hand, looking closer at the current code, I see that the logic there today isn't actually correct for strict exception semantics. So I guess if we decide to go with my suggestion then every place that's currently checking "Status != APFloat::opInvalidOp" should really be checking "Status == APFloat::opOK".

BTW, in addition to what Craig pointed out above with regard to the chain, he visited me and mentioned additional complications related to the fact that we aren't keeping the rounding mode and exception semantics in the DAG. We really need a way to fix that. I'm inclined to believe constant folding will still end up going through this function, but it'll be more work to get here than I realized.

We generally try to avoid keeping dead code in-tree, but I guess I'm not strongly opposed to it here.

I honestly don't have strong feelings about this particular change either. I'm more concerned about the way we think about strict FP handling. We've still got a lot of work to do to finish strict FP handling, and I would hate to see any code that currently has logic to handle strict exception semantics lose it. On the other hand, looking closer at the current code, I see that the logic there today isn't actually correct for strict exception semantics. So I guess if we decide to go with my suggestion then every place that's currently checking "Status != APFloat::opInvalidOp" should really be checking "Status == APFloat::opOK".

That would definitely be an improvement, but I'm not sure if that's enough. For example, if we intend to track denormals here, then we'd have to check the returned constant value rather than the status value?

It seemed to me that the strict and regular cases would end up sharing just the 1 line of code that computes the constant, so it might make more sense to group strict nodes together rather than interleave those cases with the regular ops.

BTW, in addition to what Craig pointed out above with regard to the chain, he visited me and mentioned additional complications related to the fact that we aren't keeping the rounding mode and exception semantics in the DAG. We really need a way to fix that. I'm inclined to believe constant folding will still end up going through this function, but it'll be more work to get here than I realized.

Yes, I think there's a lot of plumbing to do...

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4829–4830

These break statements (FADD to FREM) are unreachable.

spatel updated this revision to Diff 197611.May 1 2019, 12:29 PM
spatel marked an inline comment as done.

Patch updated:

  1. Added a TODO comment about handling strict opcodes.
  2. Removed unreachable 'break' statements in the switch.
andrew.w.kaylor accepted this revision.May 1 2019, 1:23 PM

I'm OK with this. Thanks for the TODO comment!

mcberg2017 accepted this revision.May 1 2019, 1:31 PM

LGTM, thanks Sanjay

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4233

Happened to stumble across more of these exception checks. Just a heads up...

spatel marked an inline comment as done.May 2 2019, 6:39 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4233

Thanks. Also, I just noticed that we do not constant fold the FMA intrinsic in IR. Haven't looked at the unary ops in IR yet, but I'll try to make this consistent.

This revision was automatically updated to reflect the committed changes.