This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Let NSW/NUW flags be cleared by default in call to getNode().
ClosedPublic

Authored by jonpa on Aug 31 2020, 4:34 AM.

Details

Summary

Recently a new optimization was added to the DAGCombiner which exposed a problem with SDNode memoization (see bug report at https://bugs.llvm.org/show_bug.cgi?id=47092). The problem was that an existing ISD::ADD node node with NSW/NUW flags set was reused when the DAGCombiner produced an identical node except without any guarantee for no overflow. This led to false NSW/NUW flags on the resulting SystemZ MachineInstr, and since SystemZElimCompare.cpp trusts these flags when eliminating a compare with 0, wrong code resulted.

This patch changes the default behavior in these cases to clear any NSW/NUW flags in case they are not set by the caller of getNode(). This seems safer since otherwise wrong-code might result any time an optimization forgets to explicitly clear these flags.

This doesn't cause any test failures, and changes just one locr instruction on SPEC (cse.s) from locrlh to locrne (in that case an SRK instruction had an NSW flag which it seems it perhaps should not have...)

I am not sure about the FP flags so I left them out of this change, but maybe one or more of them also should be conservatively cleared by default?

Diff Detail

Event Timeline

jonpa created this revision.Aug 31 2020, 4:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2020, 4:34 AM
jonpa requested review of this revision.Aug 31 2020, 4:34 AM

Interesting. I agree that the flags should always be reset here. In fact, I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.

However, doing that for the floating-point flags would break the current logic in SelectionDAGBuilder::visit, which does:

if (SDNode *Node = getNodeForIRValue(&I)) {
  SDNodeFlags IncomingFlags;
  IncomingFlags.copyFMF(*FPMO);
  if (!Node->getFlags().isDefined())
    Node->setFlags(IncomingFlags);
  else
    Node->intersectFlagsWith(IncomingFlags);
}

So it relies on nodes initially being created without defined flags, and then setting the FMF flags after the fact.

But it seems to me this logic is actually wrong: the Node returned from getNodeForIRValue may itself have multiple uses, some of which allow FMF and others do not. So just blindly setting the flags here probably breaks in some cases.

Similarly, pretty much every other use of setFlags in SelectionDAG runs into the same issue. Really, the only safe usage of flags is to specific them in the initial DAG.getNode call to begin with. It seems this will require a significant refactoring.

However, as to the NUW/NSW flags specifically, your patch looks correct to me, but I'd appreciate comments from folks more familiar with the overall SelectionDAG logic as well.

qiucf added a subscriber: qiucf.Aug 31 2020, 9:35 PM
spatel added a comment.Sep 1 2020, 8:27 AM

I only scanned over the comments here and in the bug report so far, but let me list some reviews that contributed to the current state for more background:
D32527 - introduced the defined state bit
D46854 - changed code in SelectionDAGBuilder::visit()
D51145 - dealt with a flags intersection problem

Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same be be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).

Am I missing something here? More generally, how can any after-the-fact setFlags ever be correct, given that the node might have already been merged?

spatel added a comment.Sep 2 2020, 8:19 AM

Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same be be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).

Am I missing something here? More generally, how can any after-the-fact setFlags ever be correct, given that the node might have already been merged?

That does seem wrong given the Flags.isDefined() check sitting within intersectWith(). I'm not seeing any regression test failures if we remove that entirely. So let's do that instead of only pushing it below the nsw/nuw intersects?

llvm/test/CodeGen/SystemZ/int-cmp-60.ll
7–10

I think it would be less fragile to check the complete/correct asm for this test instead of using 'CHECK-NOT' on debug output.
If I'm seeing that correctly, we should get a "locrnhe" in the result if we fixed the bug.

Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same be be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).

Am I missing something here? More generally, how can any after-the-fact setFlags ever be correct, given that the node might have already been merged?

That does seem wrong given the Flags.isDefined() check sitting within intersectWith(). I'm not seeing any regression test failures if we remove that entirely. So let's do that instead of only pushing it below the nsw/nuw intersects?

I agree removing the isDefined() check within intersectWith() is a good idea, and if we can do it without regression, we should do so. (And that will certainly fix the original problem Jonas was seeing.)

I still believe that even with that change most (all?) uses of setFlags are dangerous as they could introduce a flag into a shared node that may not be appropriate for all existing uses. But maybe that can then be a separate discussion elsewhere.

jonpa updated this revision to Diff 289733.Sep 3 2020, 8:57 AM

Patch updated per review.

I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.

I changed te uses in the AMDGPU backend so that it is now obvious that the only reason for having isDefined() is to allow SelectionDAGBuilder to set the flags at a later point than when creating the nodes.

To get rid of it I think this is needed:

  • Refactor SelectionDAGBuilder to call setValue(DAG.getNode()) with the correct SDNodeFlags directly.
  • SelectionDAGBuilder::Visit(const Instruction) call to setFlags() can be removed.
  • the isDefined() and AnyDefined member of SDNodeFlags can then be removed.

I still believe that even with that change most (all?) uses of setFlags are dangerous as they could introduce a flag into a shared node that may not be appropriate for all existing uses. But maybe that can then be a separate discussion elsewhere.

I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?

I think it would be possible to make things clearer:

  • When a new equivalent node takes over flags from a node it is replacing (for instance during widening), it should pass the flags to getNode() directly or if there are many calls to getNode() maybe call a function like transferFlags(OrigNode, NewNode) at the end of the function.
  • When a new node is created and the values of the flags are given by the context, the flags should be passed directly to getNode().
  • When a single flag needs to be set on an existing node, why not do it directly instead of calling getFlags(), set...(), setFlags() (like SelectionDAGISel currently does for setNoFPExcept).
  • ...?

I added a TODO comment referencing this.

I agree removing the isDefined() check within intersectWith() is a good idea, and if we can do it without regression, we should do so. (And that will certainly fix the original problem Jonas was seeing.

Yes, all the tests are passing :-) With fast-fp, there are 2 files changing on SPEC with a total of just 8 less fused ops (without fast fp, there is no change). This is due to SelectionDAGBuilder now clearing the flags of the fmul and fadd nodes so DAGCombiner will not produce the fma. I would hope this is temporarily acceptable, or?

@aemerson: It seems you introduced the isDefined() with d28f0cd4 - are you OK with this change, and what are your comments now?

Yes, all the tests are passing :-) With fast-fp, there are 2 files changing on SPEC with a total of just 8 less fused ops (without fast fp, there is no change). This is due to SelectionDAGBuilder now clearing the flags of the fmul and fadd nodes so DAGCombiner will not produce the fma. I would hope this is temporarily acceptable, or?

We have to favor correctness over performance, so that's the right change. But if you already have some idea/example of how we are losing the flags on those SPEC tests, it would be great to add reduced versions as regression tests.

Patch updated per review.

I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.

I changed te uses in the AMDGPU backend so that it is now obvious that the only reason for having isDefined() is to allow SelectionDAGBuilder to set the flags at a later point than when creating the nodes.

To get rid of it I think this is needed:

  • Refactor SelectionDAGBuilder to call setValue(DAG.getNode()) with the correct SDNodeFlags directly.
  • SelectionDAGBuilder::Visit(const Instruction) call to setFlags() can be removed.
  • the isDefined() and AnyDefined member of SDNodeFlags can then be removed.

I still believe that even with that change most (all?) uses of setFlags are dangerous as they could introduce a flag into a shared node that may not be appropriate for all existing uses. But maybe that can then be a separate discussion elsewhere.

I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?

I think it would be possible to make things clearer:

  • When a new equivalent node takes over flags from a node it is replacing (for instance during widening), it should pass the flags to getNode() directly or if there are many calls to getNode() maybe call a function like transferFlags(OrigNode, NewNode) at the end of the function.
  • When a new node is created and the values of the flags are given by the context, the flags should be passed directly to getNode().
  • When a single flag needs to be set on an existing node, why not do it directly instead of calling getFlags(), set...(), setFlags() (like SelectionDAGISel currently does for setNoFPExcept).
  • ...?

I added a TODO comment referencing this.

I agree removing the isDefined() check within intersectWith() is a good idea, and if we can do it without regression, we should do so. (And that will certainly fix the original problem Jonas was seeing.

Yes, all the tests are passing :-) With fast-fp, there are 2 files changing on SPEC with a total of just 8 less fused ops (without fast fp, there is no change). This is due to SelectionDAGBuilder now clearing the flags of the fmul and fadd nodes so DAGCombiner will not produce the fma. I would hope this is temporarily acceptable, or?

@aemerson: It seems you introduced the isDefined() with d28f0cd4 - are you OK with this change, and what are your comments now?

I don't have any strong feelings about this. I agree with Sanjay that correctness is more important but it would be good to have some tests that show what we're now failing to optimize.

jonpa updated this revision to Diff 289884.Sep 4 2020, 1:10 AM
jonpa marked an inline comment as done.

We have to favor correctness over performance, so that's the right change. But if you already have some idea/example of how we are losing the flags on those SPEC tests, it would be great to add reduced versions as regression tests.

I added a reduced XFAILing test, and the good news is that I found the single fix for these cases which was just to remember to copy the FMF flags in SelectionDAGBuilder::visitBinary(). With that, SPEC does not change on SystemZ (except for the one locr). I will post this fix separately, since two other tests (AArch64, X86) now needs updating.

I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?

The problem occurs when setFlags is done after memoization was already performed. For example, this code in LegalizeVectorTypes.cpp:

SDNode *ScalarNode = DAG.getNode(
    N->getOpcode(), DL, ScalarVTs, ScalarLHS, ScalarRHS).getNode();
ScalarNode->setFlags(N->getFlags());

This creates a new scalar node with no flags. Now, it might happen that this same node already exists in the DAG, in a place where no flags are valid. Memoization returns this existing node; intersection doesn't change anything since there already were no flags set. But after that memoized node is returned, the explicit setFlags call now simply overwrites the flags. If some flag bits are set here, these will now apply to the original node, where they may be incorrect.

Fortunately, it looks like most such cases in SelectionDAG can be fixed by simply providing the desired flags with get getNode call itself and removing the separate setFlags call.

jonpa added a comment.Sep 4 2020, 3:32 AM

The problem occurs when setFlags is done after memoization was already performed.

Ah, now I see what you mean :-)

spatel accepted this revision.Sep 4 2020, 7:50 AM

LGTM - we can work on the improvements in the follow-up patches.

This revision is now accepted and ready to land.Sep 4 2020, 7:50 AM