Page MenuHomePhabricator

[FPEnv] Default NoFPExcept SDNodeFlag to false

Authored by uweigand on Dec 23 2019, 6:41 AM.



The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization.

This patch changes the default to false. This requires moving the setting of the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again.

In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon is called on an original node that can not raise FP exceptions, it will preserve this property on all results nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception.

To check whether or not an SD node should be considered as raising an FP exception, the following logic applies:

  • For machine nodes, check the mayRaiseFPException property of the underlying MI instruction
  • For regular nodes, check isStrictFPOpcode
  • For target nodes, check a newly introduced isTargetStrictFPOpcode

The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.)

I've updated SystemZ and X86 (the only targets currently defining target strict FP nodes) accordingly.

This patch causes a number of changes to X86 test cases, because instructions are now (correctly) marked fpexcept at the MI level, where this was missed by old code.

Note that the MI-level fpexcept flag is not inverted by this patch. For correctness reasons, this should still be done as well; I'm working on a follow-on patch.

Diff Detail

Event Timeline

uweigand created this revision.Dec 23 2019, 6:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 23 2019, 6:41 AM

Note that this patch has D71840 as prerequisite.

craig.topper added inline comments.Dec 23 2019, 10:17 AM

Does this compare have fpexcept on it too?

uweigand updated this revision to Diff 235165.Dec 23 2019, 10:57 AM
uweigand marked an inline comment as done.
uweigand added inline comments.

Yes, it does ... updated the patch to check for it.

Another thing I forgot to mention: I've had to add a bit of code to the X86 target to set NoFPExcept flags. This is because you're using the same opcodes for both strict and non-strict compares, and adding the NoFPExcept in the non-strict avoids some codegen regressions we'd otherwise see.

I think a cleaner solution may be to use different opcodes for the two cases.


Shouldn't this also check for < FIRST_TARGET_MEMORY_OPCODE?

uweigand marked an inline comment as done.Dec 23 2019, 1:36 PM
uweigand added inline comments.

As I said above:

(Note that there a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.)

Not sure what the best way to handle this is.


Yeah, sorry, I comment before reading the whole description (or apparently the comment here). I'm not sure how much practical impact this will have, but it just feels wrong.

I've add STRICT versions of X86's target specific nodes in D71850

I've had a closer look at the vec-strict-fptoint-256.ll changes I mention here:

In one case this led to codegen changes (different register allocation) since some copy propagation is no longer run on instructions marked as fpexcept.

It turns out there's actually another problem. It is true that the first codegen change happens in MachineCSE, where code before this patch would coalease a few copies via MachineCSE::PerformTrivialCopyPropagation, which code after this patch no longer does. (This is because MachineCSE only propagates copies into instructions that are considered CSE candidates, and MI instructions that may raise FP exceptions no longer are considered such candidates.)

However, that change in itself would actually be harmless. Even with those copies still there, the register allocator would still coalesce them itself, resulting in the same generated machine code. However, there is another problem in between, in a X86 target-specific pass: X86FixupSetCCPass. Before my patch, this pass would eliminate a zero-extend after a SETCC, after my patch, it no longer does -- and this is what causes the visible machine code differences.

The reason for this difference is here:

// We expect the instruction *immediately* before the setcc to imp-def
// EFLAGS (because of scheduling glue). To make this less brittle w.r.t
// scheduling, look backwards until we hit the beginning of the
// basic-block, or a small bound (to avoid quadratic behavior).
MachineInstr *
X86FixupSetCCPass::findFlagsImpDef(MachineBasicBlock *MBB,
                                   MachineBasicBlock::reverse_iterator MI) {
  // FIXME: Should this be instr_rend(), and MI be reverse_instr_iterator?
  auto MBBStart = MBB->rend();
  for (int i = 0; (i < SearchBound) && (MI != MBBStart); ++i, ++MI)
    for (auto &Op : MI->implicit_operands())
      if (Op.isReg() && (Op.getReg() == X86::EFLAGS) && Op.isDef())
        return &*MI;

  return nullptr;

where SearchBound is a constant (16). The problem is now that simply due to the presence of an extra COPY, the distance between the eflags defining instruction and the SETCC now exceeds this arbitrary limit.

Increasing the SearchBound setting gets me back the original machine code for this test case even with my patch.

I'm wondering why this nested loop is even necessary: the main loop of the pass is a forward scan of all MIs, so if you'd simply scan instructions for whether they define eflags as you iterate over them, you'd get the immediately preceding eflags-def for free ...

uweigand updated this revision to Diff 235291.Dec 25 2019, 11:44 AM
uweigand edited the summary of this revision. (Show Details)

Updated to current mainline.

Removed changes to X86ISelDAGToDAG.cpp and X86ISelLowering.cpp -- no longer necessary after a21becc.

Removed changes to test/CodeGen/X86/vec-strict-fptoint-256.ll -- no longer necessary after 4af5b23.

craig.topper added inline comments.Dec 26 2019, 3:57 PM

Why is only NodeToMatch important? Couldn't we be pattern matching multiple strict nodes?

uweigand updated this revision to Diff 235626.Dec 30 2019, 11:29 AM

Check all matched nodes whether they may raise FP exceptions.

uweigand marked 2 inline comments as done.Dec 30 2019, 11:30 AM
uweigand added inline comments.

Good point. I've changed the code to check whether any of the matched nodes may raise an exception.

This revision is now accepted and ready to land.Dec 31 2019, 10:24 AM
This revision was automatically updated to reflect the committed changes.
uweigand marked an inline comment as done.