This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Let NSW/NUW flags be cleared by default in call to getNode().
ClosedPublic

Authored by jonpa on Aug 31 2020, 4:34 AM.

Details

Reviewers

spatel
aemerson
mcberg2017
uweigand
craig.topper
eli.friedman

Commits

rG714ceefad9b9: [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.

Summary

Recently a new optimization was added to the DAGCombiner which exposed a problem with SDNode memoization (see bug report at https://bugs.llvm.org/show_bug.cgi?id=47092). The problem was that an existing ISD::ADD node with NSW/NUW flags set was reused when the DAGCombiner produced an identical node, except without any guarantee of no overflow. This led to incorrect NSW/NUW flags on the resulting SystemZ MachineInstr, and since SystemZElimCompare.cpp trusts these flags when eliminating a compare with 0, wrong code resulted.

This patch changes the default behavior in these cases to clear any NSW/NUW flags if they are not set by the caller of getNode(). This seems safer, since otherwise wrong code might result any time an optimization forgets to explicitly clear these flags.

This doesn't cause any test failures, and it changes just one locr instruction on SPEC (cse.s), from locrlh to locrne (in that case an SRK instruction had an NSW flag that it perhaps should not have had...).

I am not sure about the FP flags so I left them out of this change, but maybe one or more of them also should be conservatively cleared by default?
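To illustrate the memoization problem described above, here is a minimal C++ sketch of a toy DAG. ToyDAG, ToyNode and ToyFlags are hypothetical names, not the SelectionDAG API. Nodes are looked up by opcode and operands only, and when an existing node is reused its flags are intersected with the requested ones, so a request without NSW/NUW clears them on the shared node:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

// Hypothetical toy model, not LLVM's API: only the integer wrap flags.
struct ToyFlags {
  bool NoUnsignedWrap = false;
  bool NoSignedWrap = false;

  // Keep a guarantee only if both the existing node and the new request
  // provide it.
  void intersectWith(const ToyFlags &Other) {
    NoUnsignedWrap = NoUnsignedWrap && Other.NoUnsignedWrap;
    NoSignedWrap = NoSignedWrap && Other.NoSignedWrap;
  }
};

struct ToyNode {
  int Opcode;
  int LHS, RHS;  // operand ids standing in for SDValues
  ToyFlags Flags;
};

class ToyDAG {
  // Flags are not part of the key, so two requests that differ only in
  // their flags share one node.
  std::map<std::tuple<int, int, int>, ToyNode *> CSEMap;
  std::vector<std::unique_ptr<ToyNode>> Nodes;

public:
  ToyNode *getNode(int Opcode, int LHS, int RHS, ToyFlags Flags = {}) {
    auto Key = std::make_tuple(Opcode, LHS, RHS);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end()) {
      // Reuse: the shared node may only keep what every requester promises.
      It->second->Flags.intersectWith(Flags);
      return It->second;
    }
    Nodes.push_back(std::make_unique<ToyNode>(ToyNode{Opcode, LHS, RHS, Flags}));
    CSEMap[Key] = Nodes.back().get();
    return Nodes.back().get();
  }
};
```

With this default, a forgotten flag costs at most an optimization, never correctness, which is the trade-off argued for in the summary.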

Diff Detail

Event Timeline

jonpa created this revision. Aug 31 2020, 4:34 AM

Herald added a project: Restricted Project. Aug 31 2020, 4:34 AM

Herald added subscribers: steven.zhang, JDevlieghere.

jonpa requested review of this revision. Aug 31 2020, 4:34 AM

Interesting. I agree that the flags should always be reset here. In fact, I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.

However, doing that for the floating-point flags would break the current logic in SelectionDAGBuilder::visit, which does:

if (SDNode *Node = getNodeForIRValue(&I)) {
  SDNodeFlags IncomingFlags;
  IncomingFlags.copyFMF(*FPMO);
  if (!Node->getFlags().isDefined())
    Node->setFlags(IncomingFlags);
  else
    Node->intersectFlagsWith(IncomingFlags);
}

So it relies on nodes initially being created without defined flags, and then setting the FMF flags after the fact.

But it seems to me this logic is actually wrong: the Node returned from getNodeForIRValue may itself have multiple uses, some of which allow FMF and others do not. So just blindly setting the flags here probably breaks in some cases.
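The following sketch models this scenario with hypothetical toy types (not LLVM's SDNodeFlags): a node created elsewhere with default, undefined flags is later mapped to a "contract" IR instruction, and the isDefined() branch of the visit() logic overwrites its flags instead of intersecting them:

```cpp
#include <cassert>

// Hypothetical toy model of the pre-patch SDNodeFlags: one fast-math bit
// plus the "defined" bit that any setter turns on.
struct ToyFlags {
  bool AnyDefined = false;
  bool AllowContract = false;

  bool isDefined() const { return AnyDefined; }
  void setAllowContract(bool B) {
    AnyDefined = true;
    AllowContract = B;
  }
  // Pre-patch intersectWith(): undefined incoming flags are ignored.
  void intersectWith(const ToyFlags &Other) {
    if (!Other.isDefined())
      return;
    AllowContract = AllowContract && Other.AllowContract;
  }
};

struct ToyNode {
  ToyFlags Flags;
};

// Same shape as the SelectionDAGBuilder::visit() snippet quoted above.
void propagateIRFlags(ToyNode &Node, const ToyFlags &Incoming) {
  if (!Node.Flags.isDefined())
    Node.Flags = Incoming;  // overwrite: ignores any other user of Node
  else
    Node.Flags.intersectWith(Incoming);
}
```

Once the node's flags are defined, the intersecting branch behaves as intended; the hazard is confined to the overwrite path taken for undefined flags.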

Similarly, pretty much every other use of setFlags in SelectionDAG runs into the same issue. Really, the only safe usage of flags is to specify them in the initial DAG.getNode call to begin with. It seems this will require a significant refactoring.

However, as to the NUW/NSW flags specifically, your patch looks correct to me, but I'd appreciate comments from folks more familiar with the overall SelectionDAG logic as well.

qiucf added a subscriber: qiucf. Aug 31 2020, 9:35 PM

jonpa added reviewers: craig.topper, eli.friedman. Sep 1 2020, 7:22 AM

Herald added a subscriber: ecnelises. Sep 1 2020, 7:22 AM

I only scanned over the comments here and in the bug report so far, but let me list some reviews that contributed to the current state for more background:
D32527 - introduced the defined state bit
D46854 - changed code in SelectionDAGBuilder::visit()
D51145 - dealt with a flags intersection problem

Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same time be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).

Am I missing something here? More generally, how can any after-the-fact setFlags ever be correct, given that the node might have already been merged?

In D86871#2249833, @uweigand wrote:

Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same time be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).

Am I missing something here? More generally, how can any after-the-fact setFlags ever be correct, given that the node might have already been merged?

That does seem wrong given the Flags.isDefined() check sitting within intersectWith(). I'm not seeing any regression test failures if we remove that entirely. So let's do that instead of only pushing it below the nsw/nuw intersects?
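A minimal sketch of the difference being discussed, using a hypothetical one-bit flag set rather than the real SDNodeFlags: the pre-patch intersectWith() returned early for undefined incoming flags, so a default-constructed request could not clear NSW on a memoized node, while dropping the check makes the intersection unconditional.

```cpp
#include <cassert>

// Hypothetical toy flags: the "defined" bit plus a single nsw bit.
struct ToyFlags {
  bool AnyDefined = false;
  bool NoSignedWrap = false;

  bool isDefined() const { return AnyDefined; }
  void setNoSignedWrap(bool B) {
    AnyDefined = true;
    NoSignedWrap = B;
  }
};

// Before: an undefined request (e.g. a getNode() call that passed no
// flags) left the memoized node's flags untouched.
ToyFlags intersectBefore(ToyFlags Existing, const ToyFlags &Incoming) {
  if (!Incoming.isDefined())
    return Existing;
  Existing.NoSignedWrap = Existing.NoSignedWrap && Incoming.NoSignedWrap;
  return Existing;
}

// After: always intersect, so default (all-clear) flags clear the bit.
ToyFlags intersectAfter(ToyFlags Existing, const ToyFlags &Incoming) {
  Existing.NoSignedWrap = Existing.NoSignedWrap && Incoming.NoSignedWrap;
  return Existing;
}
```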

llvm/test/CodeGen/SystemZ/int-cmp-60.ll, lines 8-11:
I think it would be less fragile to check the complete/correct asm for this test instead of using 'CHECK-NOT' on debug output. If I'm seeing that correctly, we should get a "locrnhe" in the result if we fixed the bug.

qiucf mentioned this in D87037: [DAGCombiner] Propagate FMF flags in FMA folding. Sep 2 2020, 8:51 AM

In D86871#2252248, @spatel wrote:

That does seem wrong given the Flags.isDefined() check sitting within intersectWith(). I'm not seeing any regression test failures if we remove that entirely. So let's do that instead of only pushing it below the nsw/nuw intersects?

I agree removing the isDefined() check within intersectWith() is a good idea, and if we can do it without regression, we should do so. (And that will certainly fix the original problem Jonas was seeing.)

I still believe that even with that change most (all?) uses of setFlags are dangerous as they could introduce a flag into a shared node that may not be appropriate for all existing uses. But maybe that can then be a separate discussion elsewhere.

Patch updated per review.

I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.

I changed the uses in the AMDGPU backend so that it is now obvious that the only reason for having isDefined() is to allow SelectionDAGBuilder to set the flags at a later point than when creating the nodes.

To get rid of it I think this is needed:

Refactor SelectionDAGBuilder to call setValue(DAG.getNode()) with the correct SDNodeFlags directly.
The call to setFlags() in SelectionDAGBuilder::visit(const Instruction &) can then be removed.
The isDefined() method and the AnyDefined member of SDNodeFlags can then be removed.

I still believe that even with that change most (all?) uses of setFlags are dangerous as they could introduce a flag into a shared node that may not be appropriate for all existing uses. But maybe that can then be a separate discussion elsewhere.

I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?

I think it would be possible to make things clearer:

When a new equivalent node takes over flags from a node it is replacing (for instance during widening), it should pass the flags to getNode() directly, or, if there are many calls to getNode(), maybe call a function like transferFlags(OrigNode, NewNode) at the end of the function.
When a new node is created and the values of the flags are given by the context, the flags should be passed directly to getNode().
When a single flag needs to be set on an existing node, why not do it directly instead of calling getFlags(), set...(), setFlags() (like SelectionDAGISel currently does for setNoFPExcept).
...?

I added a TODO comment referencing this.

I agree removing the isDefined() check within intersectWith() is a good idea, and if we can do it without regression, we should do so. (And that will certainly fix the original problem Jonas was seeing.)

Yes, all the tests are passing :-) With fast-fp, there are 2 files changing on SPEC with a total of just 8 fewer fused ops (without fast fp, there is no change). This is due to SelectionDAGBuilder now clearing the flags of the fmul and fadd nodes so DAGCombiner will not produce the fma. I would hope this is temporarily acceptable, or?

@aemerson: It seems you introduced the isDefined() with d28f0cd4 - are you OK with this change, and what are your comments now?

Herald added subscribers: kerbowa, hiraditya, nhaehnle and 2 others. Sep 3 2020, 8:57 AM

In D86871#2254760, @jonpa wrote:

Yes, all the tests are passing :-) With fast-fp, there are 2 files changing on SPEC with a total of just 8 fewer fused ops (without fast fp, there is no change). This is due to SelectionDAGBuilder now clearing the flags of the fmul and fadd nodes so DAGCombiner will not produce the fma. I would hope this is temporarily acceptable, or?

We have to favor correctness over performance, so that's the right change. But if you already have some idea/example of how we are losing the flags on those SPEC tests, it would be great to add reduced versions as regression tests.

In D86871#2254760, @jonpa wrote:

@aemerson: It seems you introduced the isDefined() with d28f0cd4 - are you OK with this change, and what are your comments now?

I don't have any strong feelings about this. I agree with Sanjay that correctness is more important but it would be good to have some tests that show what we're now failing to optimize.

We have to favor correctness over performance, so that's the right change. But if you already have some idea/example of how we are losing the flags on those SPEC tests, it would be great to add reduced versions as regression tests.

I added a reduced XFAILing test, and the good news is that I found the single fix for these cases, which was just to remember to copy the FMF flags in SelectionDAGBuilder::visitBinary(). With that, SPEC does not change on SystemZ (except for the one locr). I will post this fix separately, since two other tests (AArch64, X86) now need updating.

jonpa mentioned this in D87130: [SelectionDAGBuilder] Remember to copy the FMF flags in visitBinary(). Sep 4 2020, 1:44 AM

I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?

The problem occurs when setFlags is done after memoization was already performed. For example, this code in LegalizeVectorTypes.cpp:

SDNode *ScalarNode = DAG.getNode(
    N->getOpcode(), DL, ScalarVTs, ScalarLHS, ScalarRHS).getNode();
ScalarNode->setFlags(N->getFlags());

This creates a new scalar node with no flags. Now, it might happen that this same node already exists in the DAG, in a place where no flags are valid. Memoization returns this existing node; intersection doesn't change anything since there already were no flags set. But after that memoized node is returned, the explicit setFlags call now simply overwrites the flags. If some flag bits are set here, these will now apply to the original node, where they may be incorrect.

Fortunately, it looks like most such cases in SelectionDAG can be fixed by simply providing the desired flags with the getNode call itself and removing the separate setFlags call.
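The hazard and the suggested fix can be sketched with a hypothetical toy DAG (not the real API) whose getNode() memoizes by opcode and operand and intersects flags on reuse. Overwriting the flags after getNode() leaks NoNaNs onto a node that another user created without it, while passing the flags into getNode() keeps the shared node conservative:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical toy model (not LLVM's API): one FP flag, nodes memoized by
// (opcode, operand), flags intersected on reuse as in the patched getNode().
struct ToyFlags {
  bool NoNaNs = false;
};

struct ToyNode {
  int Opcode;
  int Operand;
  ToyFlags Flags;
  void setFlags(ToyFlags F) { Flags = F; }  // unconditional overwrite
};

class ToyDAG {
  std::map<std::pair<int, int>, std::unique_ptr<ToyNode>> CSEMap;

public:
  ToyNode *getNode(int Opcode, int Operand, ToyFlags Flags = {}) {
    std::unique_ptr<ToyNode> &Slot = CSEMap[std::make_pair(Opcode, Operand)];
    if (Slot) {
      Slot->Flags.NoNaNs = Slot->Flags.NoNaNs && Flags.NoNaNs;  // intersect
      return Slot.get();
    }
    Slot = std::make_unique<ToyNode>(ToyNode{Opcode, Operand, Flags});
    return Slot.get();
  }
};

// Pattern from the LegalizeVectorTypes.cpp snippet: create, then overwrite.
ToyNode *createThenSetFlags(ToyDAG &DAG, int Opc, int Op, ToyFlags F) {
  ToyNode *N = DAG.getNode(Opc, Op);
  N->setFlags(F);  // also rewrites the flags of any pre-existing shared node
  return N;
}

// Suggested pattern: pass the flags to getNode() so that reuse intersects.
ToyNode *createWithFlags(ToyDAG &DAG, int Opc, int Op, ToyFlags F) {
  return DAG.getNode(Opc, Op, F);
}
```

In the second pattern a node that did not exist before still receives the requested flags; only a reused node is limited to the intersection.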

The problem occurs when setFlags is done after memoization was already performed.

Ah, now I see what you mean :-)

LGTM - we can work on the improvements in the follow-up patches.

This revision is now accepted and ready to land. Sep 4 2020, 7:50 AM

Closed by commit rG714ceefad9b9: [SelectionDAG] Always intersect SDNode flags during getNode() node memoization. (authored by jonpa). Sep 5 2020, 1:32 AM

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG714ceefad9b9: [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.

Revision Contents

Path (size)

llvm/include/llvm/CodeGen/SelectionDAGNodes.h (11 lines)
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (2 lines)
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (4 lines)
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h (4 lines)
llvm/test/CodeGen/SystemZ/int-cmp-60.ll (29 lines)

Diff 289733

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

... (351 lines not shown) ...	template<> struct simplify_type<SDUse> {
}		}
};		};

/// These are IR-level optimization flags that may be propagated to SDNodes.		/// These are IR-level optimization flags that may be propagated to SDNodes.
/// TODO: This data structure should be shared by the IR optimizer and the		/// TODO: This data structure should be shared by the IR optimizer and the
/// the backend.		/// the backend.
struct SDNodeFlags {		struct SDNodeFlags {
private:		private:
// This bit is used to determine if the flags are in a defined state.		// This bit is used to determine if the flags are in a defined state. It is
// Flag bits can only be masked out during intersection if the masking flags		// only used by SelectionDAGBuilder.
// are defined.
bool AnyDefined : 1;		bool AnyDefined : 1;

bool NoUnsignedWrap : 1;		bool NoUnsignedWrap : 1;
bool NoSignedWrap : 1;		bool NoSignedWrap : 1;
bool Exact : 1;		bool Exact : 1;
bool NoNaNs : 1;		bool NoNaNs : 1;
bool NoInfs : 1;		bool NoInfs : 1;
bool NoSignedZeros : 1;		bool NoSignedZeros : 1;
... (88 lines not shown) ...	public:
bool hasNoInfs() const { return NoInfs; }		bool hasNoInfs() const { return NoInfs; }
bool hasNoSignedZeros() const { return NoSignedZeros; }		bool hasNoSignedZeros() const { return NoSignedZeros; }
bool hasAllowReciprocal() const { return AllowReciprocal; }		bool hasAllowReciprocal() const { return AllowReciprocal; }
bool hasAllowContract() const { return AllowContract; }		bool hasAllowContract() const { return AllowContract; }
bool hasApproximateFuncs() const { return ApproximateFuncs; }		bool hasApproximateFuncs() const { return ApproximateFuncs; }
bool hasAllowReassociation() const { return AllowReassociation; }		bool hasAllowReassociation() const { return AllowReassociation; }
bool hasNoFPExcept() const { return NoFPExcept; }		bool hasNoFPExcept() const { return NoFPExcept; }

/// Clear any flags in this flag set that aren't also set in Flags.		/// Clear any flags in this flag set that aren't also set in Flags. All
/// If the given Flags are undefined then don't do anything.		/// flags will be cleared if Flags are undefined.
void intersectWith(const SDNodeFlags Flags) {		void intersectWith(const SDNodeFlags Flags) {
if (!Flags.isDefined())
return;
NoUnsignedWrap &= Flags.NoUnsignedWrap;		NoUnsignedWrap &= Flags.NoUnsignedWrap;
NoSignedWrap &= Flags.NoSignedWrap;		NoSignedWrap &= Flags.NoSignedWrap;
Exact &= Flags.Exact;		Exact &= Flags.Exact;
NoNaNs &= Flags.NoNaNs;		NoNaNs &= Flags.NoNaNs;
NoInfs &= Flags.NoInfs;		NoInfs &= Flags.NoInfs;
NoSignedZeros &= Flags.NoSignedZeros;		NoSignedZeros &= Flags.NoSignedZeros;
AllowReciprocal &= Flags.AllowReciprocal;		AllowReciprocal &= Flags.AllowReciprocal;
AllowContract &= Flags.AllowContract;		AllowContract &= Flags.AllowContract;
... (2,233 lines not shown) ...

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

... (1,122 lines not shown) ...	void SelectionDAGBuilder::visit(const Instruction &I) {
if (auto *FPMO = dyn_cast<FPMathOperator>(&I)) {		if (auto *FPMO = dyn_cast<FPMathOperator>(&I)) {
// ConstrainedFPIntrinsics handle their own FMF.		// ConstrainedFPIntrinsics handle their own FMF.
if (!isa<ConstrainedFPIntrinsic>(&I)) {		if (!isa<ConstrainedFPIntrinsic>(&I)) {
// Propagate the fast-math-flags of this IR instruction to the DAG node that		// Propagate the fast-math-flags of this IR instruction to the DAG node that
// maps to this instruction.		// maps to this instruction.
// TODO: We could handle all flags (nsw, etc) here.		// TODO: We could handle all flags (nsw, etc) here.
// TODO: If an IR instruction maps to >1 node, only the final node will have		// TODO: If an IR instruction maps to >1 node, only the final node will have
// flags set.		// flags set.
		// TODO: The handling of flags should be improved, see
		// https://reviews.llvm.org/D86871
if (SDNode *Node = getNodeForIRValue(&I)) {		if (SDNode *Node = getNodeForIRValue(&I)) {
SDNodeFlags IncomingFlags;		SDNodeFlags IncomingFlags;
IncomingFlags.copyFMF(*FPMO);		IncomingFlags.copyFMF(*FPMO);
if (!Node->getFlags().isDefined())		if (!Node->getFlags().isDefined())
Node->setFlags(IncomingFlags);		Node->setFlags(IncomingFlags);
else		else
Node->intersectFlagsWith(IncomingFlags);		Node->intersectFlagsWith(IncomingFlags);
}		}
... (9,547 lines not shown) ...

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

... (515 lines not shown) ...	void AMDGPUDAGToDAGISel::PreprocessISelDAG() {
}		}
}		}

bool AMDGPUDAGToDAGISel::isNoNanSrc(SDValue N) const {		bool AMDGPUDAGToDAGISel::isNoNanSrc(SDValue N) const {
if (TM.Options.NoNaNsFPMath)		if (TM.Options.NoNaNsFPMath)
return true;		return true;

// TODO: Move into isKnownNeverNaN		// TODO: Move into isKnownNeverNaN
if (N->getFlags().isDefined())		if (N->getFlags().hasNoNaNs())
return N->getFlags().hasNoNaNs();		return true;

return CurDAG->isKnownNeverNaN(N);		return CurDAG->isKnownNeverNaN(N);
}		}

bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N,		bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N,
bool Negated) const {		bool Negated) const {
if (N->isUndef())		if (N->isUndef())
return true;		return true;
... (2,460 lines not shown) ...
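Note that the isNoNanSrc() rewrite above is not purely mechanical. Previously, defined flags without nnan made the function return false without consulting isKnownNeverNaN(); now it falls through to that analysis. A boolean model of the two versions (the Inputs struct and the function names are hypothetical stand-ins for the real queries) makes the difference explicit:

```cpp
#include <cassert>

// Hypothetical stand-ins for the values isNoNanSrc() queries.
struct Inputs {
  bool GlobalNoNaNs;   // TM.Options.NoNaNsFPMath
  bool FlagsDefined;   // N->getFlags().isDefined() (pre-patch only)
  bool NodeHasNoNaNs;  // N->getFlags().hasNoNaNs()
  bool KnownNeverNaN;  // CurDAG->isKnownNeverNaN(N)
};

// Old version: defined flags decide alone and skip the DAG analysis.
bool isNoNanSrcOld(const Inputs &I) {
  if (I.GlobalNoNaNs)
    return true;
  if (I.FlagsDefined)
    return I.NodeHasNoNaNs;
  return I.KnownNeverNaN;
}

// New version: an nnan flag short-circuits; otherwise ask the analysis.
bool isNoNanSrcNew(const Inputs &I) {
  if (I.GlobalNoNaNs)
    return true;
  if (I.NodeHasNoNaNs)
    return true;
  return I.KnownNeverNaN;
}
```

By contrast, the mayIgnoreSignedZero() change in AMDGPUISelLowering.h appears to be behavior-preserving, since both versions fall back to returning false.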

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h

	... (144 lines not shown) ...
	public:			public:
	AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);			AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);

	bool mayIgnoreSignedZero(SDValue Op) const {			bool mayIgnoreSignedZero(SDValue Op) const {
	if (getTargetMachine().Options.NoSignedZerosFPMath)			if (getTargetMachine().Options.NoSignedZerosFPMath)
	return true;			return true;

	const auto Flags = Op.getNode()->getFlags();			const auto Flags = Op.getNode()->getFlags();
	if (Flags.isDefined())			if (Flags.hasNoSignedZeros())
	return Flags.hasNoSignedZeros();			return true;

	return false;			return false;
	}			}

	static inline SDValue stripBitcast(SDValue Val) {			static inline SDValue stripBitcast(SDValue Val) {
	return Val.getOpcode() == ISD::BITCAST ? Val.getOperand(0) : Val;			return Val.getOpcode() == ISD::BITCAST ? Val.getOperand(0) : Val;
	}			}

	... (386 lines not shown) ...

llvm/test/CodeGen/SystemZ/int-cmp-60.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 | FileCheck %s
				;
				; Test that DAGCombiner properly clears the NUW/NSW flags on the memoized add
				; node.

				define void @fun(i64* %Src, i32* %Dst) {
				; CHECK-LABEL: fun:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: iilf %r0, 1303940520
				; CHECK-NEXT: n %r0, 4(%r2)
				; CHECK-NEXT: lr %r1, %r0
				; CHECK-NEXT: afi %r1, 1628135358
				; CHECK-NEXT: locrnhe %r1, %r0
				; CHECK-NEXT: st %r1, 0(%r3)
				; CHECK-NEXT: br %r14
				entry:
				%0 = load i64, i64* %Src, align 8
				%1 = trunc i64 %0 to i32
				%conv = and i32 %1, 1303940520
				%xor11.i = or i32 %conv, -2147483648
				%xor2.i = add i32 %xor11.i, -519348290
				%cmp.i = icmp slt i32 %xor2.i, 0
				%sub3.i = add nuw nsw i32 %conv, 1628135358
				%cond.i = select i1 %cmp.i, i32 %conv, i32 %sub3.i
				store i32 %cond.i, i32* %Dst
				ret void
				}
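The test's constants can be checked by hand. The C++ sketch below (constant and function names are illustrative, not from the test) recomputes them: because the and-mask has bit 31 clear, setting the sign bit and then adding -519348290 is the same as adding 1628135358, which is exactly the constant of %sub3.i.

```cpp
#include <cstdint>
#include <limits>

// Constants copied from int-cmp-60.ll; the names are illustrative.
constexpr uint32_t AndMask = 1303940520u;   // %conv = and i32 %1, 1303940520
constexpr uint32_t SignBit = 0x80000000u;   // or i32 %conv, -2147483648
constexpr int32_t Addend = -519348290;      // add i32 %xor11.i, -519348290
constexpr uint32_t SubConst = 1628135358u;  // add nuw nsw i32 %conv, 1628135358

// AndMask has bit 31 clear, so %conv never has its sign bit set and
// "or %conv, SignBit" equals "add %conv, SignBit" modulo 2^32. The two adds
// then combine into one add of this constant (unsigned arithmetic wraps).
constexpr uint32_t foldedConstant() {
  return SignBit + static_cast<uint32_t>(Addend);
}

// Exact (non-wrapping) sum of %conv and SubConst, computed in 64 bits.
constexpr int64_t exactSum(uint32_t Conv) {
  return static_cast<int64_t>(Conv) + static_cast<int64_t>(SubConst);
}

// Would "add nsw" be violated, i.e. does the sum leave the i32 range?
// (Both operands are non-negative, so only the upper bound matters.)
constexpr bool violatesNsw(uint32_t Conv) {
  return exactSum(Conv) > std::numeric_limits<int32_t>::max();
}

// Would "add nuw" be violated, i.e. does the sum leave the u32 range?
constexpr bool violatesNuw(uint32_t Conv) {
  return exactSum(Conv) > std::numeric_limits<uint32_t>::max();
}
```

So once %xor2.i is rewritten this way (presumably what the new DAGCombiner fold did), it is the same node as %sub3.i apart from the flags. The nuw promise happens to hold for every possible %conv, but nsw does not: for large %conv the sum wraps to a negative i32, and the compare against 0 depends on exactly that wrapped result. In the IR the nsw on %sub3.i is still legitimate, since the select only picks that value when %xor2.i is non-negative, i.e. when no signed overflow occurred.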
