This is an archive of the discontinued LLVM Phabricator instance.

propagate IR-level fast-math-flags to DAG nodes
Closed, Public

Authored by spatel on Apr 8 2015, 10:50 AM.

Details

Reviewers

rengolin
andreadb
echristo
hfinkel
marcello.maggioni

Commits

rG801caff64d2f: propagate IR-level fast-math-flags to DAG nodes (NFC)
rL236546: propagate IR-level fast-math-flags to DAG nodes (NFC)

Summary

This patch is an attempt to introduce the minimum plumbing necessary to use IR-level fast-math-flags (FMF) in the backend along with one usage and test case (reciprocal estimate).

History:

IR-level FMF was added around 2012: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054999.html
Integer IR optimization flags (nsw, nuw, exact) were extended to the backend in June 2014: http://reviews.llvm.org/rL210467

Motivation:

We have customers who would like to see clang have more flexible behavior with respect to FP reciprocal approximations. This means that -freciprocal-math is honored as a separate optimization setting from -ffast-math (see existing gcc behavior for a starting point).

This is currently broken in three places:
a. the front-end: the -freciprocal-math flag is ignored
b. the IR optimizer: UnsafeAlgebra implies AllowReciprocal
c. the backend: IR FMF flags are dropped entirely

Longer-term and more fundamentally: one of the goals for IR-level FMF was to allow mixing of strict and relaxed FP math code. This is impossible without backend support. The problem is exacerbated by LTO.

My initial draft of this patch left the nsw/nuw/exact changes mostly as-is, but it quickly became clear that maintaining the APIs for those and adding FMF on the side made everything worse. So I grouped all of the optimization flags into one class and went from there. The flags and node classes are copied directly from the IR FMF class and FPMathOperator. We should obviously unify the backend with the IR optimizer on those flag bits, but I think that can be a follow-on patch? The goal of this initial patch is to not break any existing functionality or change the IR code while adding optional processing of FMF to the backend.

I'm promising to continue work on this to convert much more of the existing backend code to use FMF wherever possible.

Diff Detail

Event Timeline

spatel updated this revision to Diff 23427. Apr 8 2015, 10:50 AM

spatel retitled this revision to propagate IR-level fast-math-flags to DAG nodes.

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: hfinkel, andreadb, rengolin.

spatel added a subscriber: Unknown Object (MLST).

alexr added a subscriber: alexr. Apr 8 2015, 11:18 AM

resistor added a subscriber: resistor. Apr 8 2015, 9:57 PM

Hi Sanjay,

I only had a couple of minor comments/questions. In general I think this is a nice initial patch.
If I remember correctly, the initial support for nsw/nuw/exact flags in SelectionDAG was added by Marcello Maggioni. If so, I suggest adding him in cc to the code review (I don't know if he is still working on this).

include/llvm/CodeGen/SelectionDAG.h
1230–1231	My understanding is that flags can only be present on binary operations. This is also the reason why originally nsw/nuw/exact were only added to BinarySDNode. Is there a reason why BinarySDNode should extend from SDNode? At the moment, 'Ops' is always expected to have two SDValues.
include/llvm/CodeGen/SelectionDAGNodes.h
971–977	This is ok. However, what about having something like this? setNoUnsignedWrap() { Flags |= NoUnsignedWrap; } clearNoUnsignedWrap() { Flags ^= NoUnsignedWrap; } Also it is a shame that most of this code is repeated in 'class SDNodeWithFlags'. I wonder if there is a better design that allows to delegate the 'bit manipulation part' as much as possible to 'SDNodeFlags'...
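
The set/clear split suggested in this comment can be sketched as follows. This is a minimal illustration, not code from the patch: `FlagBitsSketch` is a hypothetical stand-in for the proposed `SDNodeFlags`, with three of its bit values reused. Note one deliberate difference: the clear operation below uses `&= ~mask` rather than the `^=` written in the comment, because XOR toggles the bit and would set it if it were already clear.

```cpp
#include <cassert>

// Hypothetical stand-in for the flags container discussed in this review.
class FlagBitsSketch {
  unsigned Flags = 0;

public:
  enum : unsigned {
    NoUnsignedWrap = 1u << 5,
    NoSignedWrap = 1u << 6,
    Exact = 1u << 7
  };

  // Setting a flag ORs its bit in; other bits are left untouched.
  void setNoUnsignedWrap() { Flags |= NoUnsignedWrap; }

  // Clearing a flag ANDs with the inverted mask, so the result is the same
  // whether or not the bit was previously set.
  void clearNoUnsignedWrap() { Flags &= ~NoUnsignedWrap; }

  void setExact() { Flags |= Exact; }

  bool hasNoUnsignedWrap() const { return (Flags & NoUnsignedWrap) != 0; }
  bool hasExact() const { return (Flags & Exact) != 0; }

  // Raw encoding, e.g. for hashing the flags into a node ID.
  unsigned getRawFlags() const { return Flags; }
};
```

Keeping all bit manipulation inside one small class like this is one way to avoid repeating it in the node class, which is the duplication the comment objects to.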

Adding Marcello to reviewers.

spatel added inline comments. Apr 9 2015, 7:55 AM

include/llvm/CodeGen/SelectionDAG.h
1230–1231	It is correct that we only have flags on binary ops today, but I was considering the case of adding FMF to intrinsics or libcalls ( https://llvm.org/bugs/show_bug.cgi?id=21290 ). I was also thinking about the FMA case and other nodes that don't exist in IR. I think we'll need to propagate the flags to more than binops eventually, so... I agree that this patch is hacky in that it only updates the binop API, but it was the minimal change.

spatel added inline comments. Apr 9 2015, 8:04 AM

include/llvm/CodeGen/SelectionDAGNodes.h
971–977	I agree completely, but I didn't see an immediate solution, so I just went with the lazy approach: copy and paste! I'm open to any suggestions for improvement on the interface, and eventually (soon) I would unify the IR and SDNode versions of the flags, so we get the benefits in both places.

re: the -freciprocal-math flag

"see existing gcc behavior for a starting point"

I should've taken my own advice more seriously. :)

It turns out that -freciprocal-math is only used for a reassociation optimization on divisions. Specifically, this one:
https://llvm.org/bugs/show_bug.cgi?id=16218
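
To illustrate the kind of division reassociation that a reciprocal-math setting is meant to license, here is a minimal source-level sketch. It assumes the optimization in question is the repeated-divisor case (several divisions by the same value replaced by one reciprocal and several multiplies); the function names are hypothetical and are not part of LLVM or gcc.

```cpp
#include <cassert>
#include <cmath>

// Strict form: two separate divisions by the same divisor.
inline double sumOfQuotientsStrict(double A, double B, double C) {
  return A / C + B / C;
}

// Relaxed form: compute the reciprocal once and multiply twice.
// This can differ from the strict form in the last bits of the result,
// which is why it needs an explicit opt-in.
inline double sumOfQuotientsRelaxed(double A, double B, double C) {
  double Recip = 1.0 / C;
  return A * Recip + B * Recip;
}

// Helper that reports whether the two forms agree to within Tolerance.
inline bool formsAgreeWithin(double A, double B, double C, double Tolerance) {
  return std::fabs(sumOfQuotientsStrict(A, B, C) -
                   sumOfQuotientsRelaxed(A, B, C)) <= Tolerance;
}
```

The relaxed form trades one division for a multiply per use, which pays off when the divisor is shared; it says nothing about hardware estimate instructions, which is the distinction drawn in the rest of this comment.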

With gcc, all of the machine-level hackery that we've put into generating hardware estimate instructions is controlled by a flag that doesn't even exist in clang yet: -mrecip
https://llvm.org/bugs/show_bug.cgi?id=20912

So the end result is that this patch's recip change is wrong. Update to at least that part of the patch coming soon.

I'm assuming that we should not diverge from gcc's user-visible behavior without good reason, so we'll still make -freciprocal-math cause 'arcp' generation in IR ( http://reviews.llvm.org/rL234493 ), but 'arcp' has nothing to do with HW estimate codegen. For that, we need -mrecip with its multitude of options:
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/i386-and-x86-64-Options.html#index-mrecip_003dopt-1627

Delayed by another case of 'how does anything work...ever?':
https://llvm.org/bugs/show_bug.cgi?id=23172

Patch updated:

Corrected DAGCombiner 'arcp' optimization to only trigger for the reassociation division case, not the estimate cases.
Updated PowerPC test case that uses that optimization (x86 still doesn't have it); worked around PR23172 by putting the 'arcp' versions of the tests before the unsafe-fp-math versions.

milseman added a subscriber: milseman. Apr 9 2015, 12:01 PM

Hello, I'm the "Marcello Maggioni" of the original backend flags by the way :P

I'm very interested in what is going on here and I think it is about time we have "Node-level" Fast Math flags in the backend! This is a very nice start!

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3614–3615	Does it even make sense to have BinarySDNode around at all considering the main function generating these now generates standard SDNodes?

spatel added inline comments. Apr 10 2015, 1:58 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3614–3615	Hi Marcello - Thanks for looking at this patch. I got rid of "BinaryWithFlagsSDNode", but "BinarySDNode" had some uses elsewhere along with "UnarySDNode" and "TernarySDNode". The comment for BinarySDNode says: /// This class is used for two-operand SDNodes. This is solely /// to allow co-allocation of node operands with the node itself. So it's just there for malloc efficiency? If everybody agrees that it has no value, then I can delete it, but I think that should be a follow-on patch.

Overall, as far as I'm concerned, the approach here is pretty sound and is definitely an improvement over what we have at the moment.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3614–3615	Yeah, sure, I was just bringing up the question, that's all :)

One high level comment and a few inline nits:

Can you split out the flags part of the patch first? I'm very certain that's not objectionable and will make the actual content of your patch much smaller.

Rest inline :)

Thanks!

-eric

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7822	Nit: No braces around single statement ifs.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
406	Early exit?
416	Early exit?

Let me ask a dumb question since I don't know any better. :)

Is the size of an SDNode so critical that we can't do the proper software engineering thing: add the flags as a member rather than riding on the bonus bits available in SubclassData? If we do that, all of the distasteful get/set and bit shifting replication disappears.

If we make a bag o' bits for the flags as in this updated patch, they'd just be a single byte. As it stands, I'm showing that sizeof(SDNode) and sizeof(SDNodeWithFlags) are both 80 bytes (building on MacOSX 10.10). Free bits!

That said, there's something very wrong with this patch now...I'm seeing *intermittent* failures on seemingly unrelated codegen regression tests. Can anyone spot the bug?
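
The "flags as a member" idea raised in this comment can be sketched like this. Everything here is hypothetical (`NodeFlagsSketch` and `FlaggedNodeSketch` are illustration-only names, not LLVM classes), and the one-byte size is an assumption about common ABIs rather than a language guarantee, so the sketch does not assert an exact size.

```cpp
#include <cassert>

// Hypothetical "bag of bits": eight one-bit fields. On common ABIs this
// typically occupies a single byte, but that is implementation-defined.
struct NodeFlagsSketch {
  bool NoUnsignedWrap : 1;
  bool NoSignedWrap : 1;
  bool Exact : 1;
  bool UnsafeAlgebra : 1;
  bool NoNaNs : 1;
  bool NoInfs : 1;
  bool NoSignedZeros : 1;
  bool AllowReciprocal : 1;

  // All flags start cleared.
  NodeFlagsSketch()
      : NoUnsignedWrap(false), NoSignedWrap(false), Exact(false),
        UnsafeAlgebra(false), NoNaNs(false), NoInfs(false),
        NoSignedZeros(false), AllowReciprocal(false) {}
};

// Hypothetical node that stores the flags directly as a member, so no
// per-flag get/set shifting code has to be repeated in the node class.
struct FlaggedNodeSketch {
  unsigned Opcode;
  NodeFlagsSketch Flags;

  explicit FlaggedNodeSketch(unsigned Opc) : Opcode(Opc) {}
};
```

Whether such a member grows the real node depends on how much padding the node already has, which is what the sizeof measurements in this thread were checking.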

In D8900#157359, @spatel wrote:

Let me ask a dumb question since I don't know any better. :)

Is the size of an SDNode so critical that we can't do the proper software engineering thing: add the flags as a member rather than riding on the bonus bits available in SubclassData? If we do that, all of the distasteful get/set and bit shifting replication disappears.

If we make a bag o' bits for the flags as in this updated patch, they'd just be a single byte. As it stands, I'm showing that sizeof(SDNode) and sizeof(SDNodeWithFlags) are both 80 bytes (building on MacOSX 10.10). Free bits!

The size is important, but if you don't increase the size on common architectures, then you should be good to go.

That said, there's something very wrong with this patch now...I'm seeing *intermittent* failures on seemingly unrelated codegen regression tests. Can anyone spot the bug?

Have you tried running under valgrind and/or asan/msan? Does this happen in a debug build, or just an optimized build?

In D8900#157396, @hfinkel wrote:

Have you tried running under valgrind and/or asan/msan? Does this happen in a debug build, or just an optimized build?

Not yet; that's the next step. I wanted to see if there were structural objections to this patch before chasing down the errors.

I'm seeing the failures with both debug and optimized builds. The tests that fail regularly, but not always, are:

LLVM :: CodeGen/AArch64/aarch64-address-type-promotion.ll
LLVM :: CodeGen/AArch64/arm64-vector-ldst.ll
LLVM :: CodeGen/X86/cmovcmov.ll

ab added a subscriber: ab. Apr 17 2015, 6:42 PM

In D8900#157594, @spatel wrote:

In D8900#157396, @hfinkel wrote:

Have you tried running under valgrind and/or asan/msan? Does this happen in a debug build, or just an optimized build?

Not yet; that's the next step. I wanted to see if there were structural objections to this patch before chasing down the errors.

Valgrind noted that the flags were being read uninitialized. I thought between bitfields, bools and shifts, I must've stepped into a pile of UB, but the actual bug was when creating the node. I was checking:

if (Flags && mayHaveOptimizationFlags(Opcode))

We can't use the nullptr Flags to decide the type of node to allocate; if it may have opt flags, then it must be an SDNodeWithFlags. We just have to fill in default flags in that case.
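
The allocation rule described here can be sketched in a small, self-contained model. All names below are hypothetical stand-ins (this is not the SelectionDAG code): the point is only that the node type is chosen from the opcode alone, because the type query is opcode-based, and that a null flags pointer means "use default flags" rather than "allocate a plain node".

```cpp
#include <cassert>
#include <memory>

enum OpcodeSketch { OpAdd, OpFDiv, OpLoad };

struct FlagsSketch {
  unsigned Bits = 0;
};

// Opcode-based predicate, analogous in spirit to mayHaveOptimizationFlags.
static bool mayHaveFlagsSketch(OpcodeSketch Opc) {
  return Opc == OpAdd || Opc == OpFDiv;
}

struct NodeSketch {
  OpcodeSketch Opcode;
  explicit NodeSketch(OpcodeSketch Opc) : Opcode(Opc) {}
  virtual ~NodeSketch() = default;
};

struct NodeWithFlagsSketch : NodeSketch {
  FlagsSketch Flags;
  NodeWithFlagsSketch(OpcodeSketch Opc, const FlagsSketch &F)
      : NodeSketch(Opc), Flags(F) {}

  // The type query looks only at the opcode, so every node with such an
  // opcode must really be a NodeWithFlagsSketch.
  static bool classof(const NodeSketch *N) {
    return mayHaveFlagsSketch(N->Opcode);
  }
};

std::unique_ptr<NodeSketch> createNodeSketch(OpcodeSketch Opc,
                                             const FlagsSketch *Flags = nullptr) {
  // The buggy condition was: if (Flags && mayHaveFlagsSketch(Opc)).
  // That allocated a plain node when Flags was null, while classof still
  // claimed the node carried flags.
  if (mayHaveFlagsSketch(Opc)) {
    FlagsSketch Defaults; // All bits clear.
    return std::make_unique<NodeWithFlagsSketch>(Opc, Flags ? *Flags : Defaults);
  }
  return std::make_unique<NodeSketch>(Opc);
}
```

With this shape, a caller that passes no flags for a flag-capable opcode still gets a node whose flags read back as zero instead of uninitialized memory.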

I had no luck with sanitizers (not enabled with the OS X system clang? And can't build LLVM trunk with that turned on?). But it pushed me to build on Linux x86-64 where I confirmed that the sizeof(SDNode) == sizeof(SDNodeWithFlags) == 80. So again, it looks like we can use a proper software class for the flags without any size overhead.

Patch updated:

Fixed bug when allocating SDNodeWithFlags nodes.
Fixed inline problems noted by Eric Christopher.

I didn't split off the flags class from the rest of the patch as suggested by Eric on the previous review because the class is now just a trivial bit container. But if that's still the preferred way to go, please let me know.

Hi Sanjay,

It's still my preference if possible because it'll take the API nfc change out of the mix on the patch. Is it possible for you to do that?

Thanks!

-eric

In D8900#162327, @echristo wrote:

It's still my preference if possible because it'll take the API nfc change out of the mix on the patch. Is it possible for you to do that?

Hi Eric -

Sure - it's a little more work, but it's the right thing to do. I think we should actually break this into 3 steps:

1. Introduce the new Flags struct and have the existing BinaryWithFlagsSDNode use them only for nsw/nuw/exact (NFC).
2. Convert BinaryWithFlagsSDNode to the broader SDNodeWithFlags class and extend the potential users to the FP nodes (NFC).
3. Update DAGCombiner to use the 'arcp' flag and add the new test cases.

First patch coming up...

spatel mentioned this in D9325: move IR-level optimization flags into their own struct. Apr 28 2015, 9:05 AM

Absolutely agreed. Thanks!

-eric

spatel mentioned this in rL235997: move IR-level optimization flags into their own struct. Apr 28 2015, 9:42 AM

Patch updated:

We hacked off the head of this patch into D9325 ( http://reviews.llvm.org/rL235997 ) - move the flags into their own space.
I've also removed the tail end of this patch; that's just the part that recognizes 'arcp' and optimizes based on it in DAGCombiner.

So this patch is now NFC-intended, but there are 2 structural changes:

The main diff is that we're preparing to expand the reach of the optimization flags to affect more than just binary SDNodes. I think this is already necessary for intrinsics in IR ( PR21290 - https://llvm.org/bugs/show_bug.cgi?id=21290 ). And now that we have the flags in the backend, we can propagate them to non-binop nodes that don't even exist in IR such as FMA, FNEG, etc.
The other change is that we're actually copying the FP fast-math-flags from the IR instructions to SDNodes. They're just not being used for anything yet...that's the next patch.

LGTM.

Thanks for splitting this out! :)

-eric

This revision is now accepted and ready to land. May 5 2015, 11:06 AM

Closed by commit rL236546: propagate IR-level fast-math-flags to DAG nodes (NFC) (authored by spatel). May 5 2015, 2:44 PM

This revision was automatically updated to reflect the committed changes.

Thanks, Eric! Checked in at r236546.

For the record, r236546 was reverted by:
http://reviews.llvm.org/rL236600

The bug is independent of FMF or the other flags. It can be reproduced by allocating a plain SDNode for any instruction with 2 operands rather than a BinarySDNode. The bug manifests (visible under valgrind) after recycling the operand memory and/or morphing a node.

I didn't take any perf measurements, but I assume that we want to continue using the specialized operand classes (UnarySDNode, BinarySDNode, TernarySDNode) for their allocation optimizations. So I simplified this patch to keep the existing BinaryWithFlagsSDNode class that derives from BinarySDNode:
http://reviews.llvm.org/rL237046

spatel mentioned this in D9708: Use the 'arcp' fast-math-flag when combining repeated FP divisors. May 12 2015, 10:15 AM

spatel mentioned this in D9780: expose ILP for associative operations in the DAG. Jun 4 2015, 1:25 PM

spatel mentioned this in D10403: propagate IR-level fast-math-flags to DAG nodes, disabled by default. Jun 11 2015, 3:02 PM

spatel mentioned this in rL251450: Use the 'arcp' fast-math-flag when combining repeated FP divisors. Oct 27 2015, 1:29 PM

spatel mentioned this in D32527: Generalize flag carrying SDNodes beyond binary ops. NFC. Apr 26 2017, 6:55 AM

Revision Contents

Path	Size
include/llvm/CodeGen/SelectionDAG.h	10 lines
include/llvm/CodeGen/SelectionDAGNodes.h	220 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	55 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	88 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	23 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp	5 lines
lib/Target/X86/X86ISelLowering.cpp	5 lines
test/CodeGen/X86/recip-fastmath.ll	35 lines

Diff 23427

include/llvm/CodeGen/SelectionDAG.h

[...]	SDValue getGLOBAL_OFFSET_TABLE(EVT VT) {
return getNode(ISD::GLOBAL_OFFSET_TABLE, SDLoc(), VT);		return getNode(ISD::GLOBAL_OFFSET_TABLE, SDLoc(), VT);
}		}

/// Gets or creates the specified node.		/// Gets or creates the specified node.
///		///
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT);		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N);		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,
bool nuw = false, bool nsw = false, bool exact = false);		const SDNodeFlags *Flags = nullptr);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,
SDValue N3);		SDValue N3);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,
SDValue N3, SDValue N4);		SDValue N3, SDValue N4);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1, SDValue N2,
SDValue N3, SDValue N4, SDValue N5);		SDValue N3, SDValue N4, SDValue N5);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, ArrayRef<SDUse> Ops);		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT, ArrayRef<SDUse> Ops);
SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT,		SDValue getNode(unsigned Opcode, SDLoc DL, EVT VT,
[...]	SDValue getTargetExtractSubreg(int SRIdx, SDLoc DL, EVT VT,
SDValue Operand);		SDValue Operand);

/// A convenience function for creating TargetInstrInfo::INSERT_SUBREG nodes.		/// A convenience function for creating TargetInstrInfo::INSERT_SUBREG nodes.
SDValue getTargetInsertSubreg(int SRIdx, SDLoc DL, EVT VT,		SDValue getTargetInsertSubreg(int SRIdx, SDLoc DL, EVT VT,
SDValue Operand, SDValue Subreg);		SDValue Operand, SDValue Subreg);

/// Get the specified node if it's already available, or else return NULL.		/// Get the specified node if it's already available, or else return NULL.
SDNode *getNodeIfExists(unsigned Opcode, SDVTList VTs, ArrayRef<SDValue> Ops,		SDNode *getNodeIfExists(unsigned Opcode, SDVTList VTs, ArrayRef<SDValue> Ops,
bool nuw = false, bool nsw = false,		const SDNodeFlags *Flags = nullptr);
bool exact = false);

/// Creates a SDDbgValue node.		/// Creates a SDDbgValue node.
SDDbgValue *getDbgValue(MDNode *Var, MDNode *Expr, SDNode *N, unsigned R,		SDDbgValue *getDbgValue(MDNode *Var, MDNode *Expr, SDNode *N, unsigned R,
bool IsIndirect, uint64_t Off, DebugLoc DL,		bool IsIndirect, uint64_t Off, DebugLoc DL,
unsigned O);		unsigned O);

/// Constant		/// Constant
SDDbgValue *getConstantDbgValue(MDNode *Var, MDNode *Expr, const Value *C,		SDDbgValue *getConstantDbgValue(MDNode *Var, MDNode *Expr, const Value *C,
[...]	SDNode *FindModifiedNodeSlot(SDNode *N, ArrayRef<SDValue> Ops,
void *&InsertPos);		void *&InsertPos);
SDNode *UpdadeSDLocOnMergedSDNode(SDNode *N, SDLoc loc);		SDNode *UpdadeSDLocOnMergedSDNode(SDNode *N, SDLoc loc);

void DeleteNodeNotInCSEMaps(SDNode *N);		void DeleteNodeNotInCSEMaps(SDNode *N);
void DeallocateNode(SDNode *N);		void DeallocateNode(SDNode *N);

void allnodes_clear();		void allnodes_clear();

BinarySDNode *GetBinarySDNode(unsigned Opcode, SDLoc DL, SDVTList VTs,		SDNode *GetSDNodeWithFlags(unsigned Opcode, SDLoc DL, SDVTList VTs,
SDValue N1, SDValue N2, bool nuw, bool nsw,		ArrayRef<SDValue> Ops, const SDNodeFlags *Flags);
bool exact);

/// List of non-single value types.		/// List of non-single value types.
FoldingSet<SDVTListNode> VTListMap;		FoldingSet<SDVTListNode> VTListMap;

/// Maps to auto-CSE operations.		/// Maps to auto-CSE operations.
std::vector<CondCodeSDNode*> CondCodeNodes;		std::vector<CondCodeSDNode*> CondCodeNodes;

std::vector<SDNode*> ValueTypeNodes;		std::vector<SDNode*> ValueTypeNodes;

include/llvm/CodeGen/SelectionDAGNodes.h

	}			}

	inline void SDUse::setNode(SDNode *N) {			inline void SDUse::setNode(SDNode *N) {
	if (Val.getNode()) removeFromList();			if (Val.getNode()) removeFromList();
	Val.setNode(N);			Val.setNode(N);
	if (N) N->addUse(*this);			if (N) N->addUse(*this);
	}			}

	/// This class is used for single-operand SDNodes. This is solely			/// Returns true if the opcode is an operation with optional optimization flags.
	/// to allow co-allocation of node operands with the node itself.			static bool mayHaveOptimizationFlags(unsigned Opcode) {
	class UnarySDNode : public SDNode {
	SDUse Op;
	public:
	UnarySDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
	SDValue X)
	: SDNode(Opc, Order, dl, VTs) {
	InitOperands(&Op, X);
	}
	};

	/// This class is used for two-operand SDNodes. This is solely
	/// to allow co-allocation of node operands with the node itself.
	class BinarySDNode : public SDNode {
	SDUse Ops[2];
	public:
	BinarySDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
	SDValue X, SDValue Y)
	: SDNode(Opc, Order, dl, VTs) {
	InitOperands(Ops, X, Y);
	}
	};

	/// Returns true if the opcode is a binary operation with flags.
	static bool isBinOpWithFlags(unsigned Opcode) {
	switch (Opcode) {			switch (Opcode) {
	case ISD::SDIV:			case ISD::SDIV:
	case ISD::UDIV:			case ISD::UDIV:
	case ISD::SRA:			case ISD::SRA:
	case ISD::SRL:			case ISD::SRL:
	case ISD::MUL:			case ISD::MUL:
	case ISD::ADD:			case ISD::ADD:
	case ISD::SUB:			case ISD::SUB:
	case ISD::SHL:			case ISD::SHL:
				case ISD::FADD:
				case ISD::FDIV:
				case ISD::FMUL:
				case ISD::FREM:
				case ISD::FSUB:
	return true;			return true;
	default:			default:
	return false;			return false;
	}			}
	}			}

	/// This class is an extension of BinarySDNode			class SDNodeFlags {
	/// used from those opcodes that have associated extra flags.			private:
	class BinaryWithFlagsSDNode : public BinarySDNode {			friend class SDNodeWithFlags;
	enum { NUW = (1 << 0), NSW = (1 << 1), EXACT = (1 << 2) };			unsigned Flags;
				SDNodeFlags(unsigned F) : Flags(F) { }

				public:
				enum {
				UnsafeAlgebra = (1 << 0),
				NoNaNs = (1 << 1),
				NoInfs = (1 << 2),
				NoSignedZeros = (1 << 3),
				AllowReciprocal = (1 << 4),
				NoUnsignedWrap = (1 << 5),
				NoSignedWrap = (1 << 6),
				Exact = (1 << 7)
				};

				SDNodeFlags() : Flags(0) { }

				void setHasNoUnsignedWrap(bool b) {
				Flags = (Flags & ~NoUnsignedWrap) | (b ? NoUnsignedWrap : 0);
				}

				void setHasNoSignedWrap(bool b) {
				Flags = (Flags & ~NoSignedWrap) | (b ? NoSignedWrap : 0);
				}

				void setHasExact(bool b) {
				Flags = (Flags & ~Exact) | (b ? Exact : 0);
				}

				void setHasUnsafeAlgebra(bool b) {
				Flags = (Flags & ~UnsafeAlgebra) | (b ? UnsafeAlgebra : 0);
				}

				void setHasNoNaNs(bool b) {
				Flags = (Flags & ~NoNaNs) | (b ? NoNaNs : 0);
				}

				void setHasNoInfs(bool b) {
				Flags = (Flags & ~NoInfs) | (b ? NoInfs : 0);
				}

				void setHasNoSignedZeros(bool b) {
				Flags = (Flags & ~NoSignedZeros) | (b ? NoSignedZeros : 0);
				}

				void setHasAllowReciprocal(bool b) {
				Flags = (Flags & ~AllowReciprocal) | (b ? AllowReciprocal : 0);
				}

				/// Return a raw encoding of the flags.
				/// This function should only be used to add data to the NodeID value.
				unsigned getRawFlags() const {
				return Flags;
				}
				};

				/// This class is an extension of SDNode used from instructions that may have
				/// associated extra flags.
				class SDNodeWithFlags : public SDNode {
	public:			public:
	BinaryWithFlagsSDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,			SDNodeWithFlags(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
	SDValue X, SDValue Y)			ArrayRef<SDValue> Ops, const SDNodeFlags &NodeFlags)
	: BinarySDNode(Opc, Order, dl, VTs, X, Y) {}			: SDNode(Opc, Order, dl, VTs, Ops) {
	/// Return the SubclassData value, which contains an encoding of the flags.			SubclassData = NodeFlags.Flags;
	/// This function should be used to add subclass data to the NodeID value.			}
	unsigned getRawSubclassData() const { return SubclassData; }
				// These are mutator methods for the optimization flags.

	void setHasNoUnsignedWrap(bool b) {			void setHasNoUnsignedWrap(bool b) {
	SubclassData = (SubclassData & ~NUW) | (b ? NUW : 0);			SubclassData = (SubclassData & ~SDNodeFlags::NoUnsignedWrap) |
				(b ? SDNodeFlags::NoUnsignedWrap : 0);
	}			}

	void setHasNoSignedWrap(bool b) {			void setHasNoSignedWrap(bool b) {
	SubclassData = (SubclassData & ~NSW) | (b ? NSW : 0);			SubclassData = (SubclassData & ~SDNodeFlags::NoSignedWrap) |
				(b ? SDNodeFlags::NoSignedWrap : 0);
	}			}
	void setIsExact(bool b) {
	SubclassData = (SubclassData & ~EXACT) | (b ? EXACT : 0);			void setHasExact(bool b) {
				SubclassData = (SubclassData & ~SDNodeFlags::Exact) |
				(b ? SDNodeFlags::Exact : 0);
	}			}
	bool hasNoUnsignedWrap() const { return SubclassData & NUW; }
	bool hasNoSignedWrap() const { return SubclassData & NSW; }			void setHasUnsafeAlgebra(bool b) {
	bool isExact() const { return SubclassData & EXACT; }			SubclassData = (SubclassData & ~SDNodeFlags::UnsafeAlgebra) |
				(b ? SDNodeFlags::UnsafeAlgebra : 0);
				}

				void setHasNoNaNs(bool b) {
				SubclassData = (SubclassData & ~SDNodeFlags::NoNaNs) |
				(b ? SDNodeFlags::NoNaNs : 0);
				}

				void setHasNoInfs(bool b) {
				SubclassData = (SubclassData & ~SDNodeFlags::NoInfs) |
				(b ? SDNodeFlags::NoInfs : 0);
				}

				void setHasNoSignedZeros(bool b) {
				SubclassData = (SubclassData & ~SDNodeFlags::NoSignedZeros) |
				(b ? SDNodeFlags::NoSignedZeros : 0);
				}

				void setHasAllowReciprocal(bool b) {
				SubclassData = (SubclassData & ~SDNodeFlags::AllowReciprocal) |
				(b ? SDNodeFlags::AllowReciprocal : 0);
				}

				// These are accessor methods for the optimization flags.

				SDNodeFlags getFlags() const {
				return SDNodeFlags(SubclassData);
				}

				bool hasNoUnsignedWrap() const {
				return SubclassData & SDNodeFlags::NoUnsignedWrap;
				}

				bool hasNoSignedWrap() const {
				return SubclassData & SDNodeFlags::NoSignedWrap;
				}

				bool hasExact() const {
				return SubclassData & SDNodeFlags::Exact;
				}

				bool hasUnsafeAlgebra() const {
				return SubclassData & SDNodeFlags::UnsafeAlgebra;
				}

				bool hasNoNaNs() const {
				return SubclassData & SDNodeFlags::NoNaNs;
				}

				bool hasNoInfs() const {
				return SubclassData & SDNodeFlags::NoInfs;
				}

				bool hasNoSignedZeros() const {
				return SubclassData & SDNodeFlags::NoSignedZeros;
				}

				bool hasAllowReciprocal() const {
				return SubclassData & SDNodeFlags::AllowReciprocal;
				}

				// This is used to implement dyn_cast, isa, and other type queries.
	static bool classof(const SDNode *N) {			static bool classof(const SDNode *N) {
	return isBinOpWithFlags(N->getOpcode());			return mayHaveOptimizationFlags(N->getOpcode());
				}
				};


				/// This class is used for single-operand SDNodes. This is solely
				/// to allow co-allocation of node operands with the node itself.
				class UnarySDNode : public SDNode {
				SDUse Op;
				public:
				UnarySDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
				SDValue X)
				: SDNode(Opc, Order, dl, VTs) {
				InitOperands(&Op, X);
				}
				};

				/// This class is used for two-operand SDNodes. This is solely
				/// to allow co-allocation of node operands with the node itself.
				class BinarySDNode : public SDNode {
				SDUse Ops[2];
				public:
				BinarySDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
				SDValue X, SDValue Y)
				: SDNode(Opc, Order, dl, VTs) {
				InitOperands(Ops, X, Y);
	}			}
	};			};

	/// This class is used for three-operand SDNodes. This is solely			/// This class is used for three-operand SDNodes. This is solely
	/// to allow co-allocation of node operands with the node itself.			/// to allow co-allocation of node operands with the node itself.
	class TernarySDNode : public SDNode {			class TernarySDNode : public SDNode {
	SDUse Ops[3];			SDUse Ops[3];
	public:			public:

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[...]	if (!RV.getNode() && SelectionDAG::isCommutativeBinOp(N->getOpcode()) &&
N->getNumValues() == 1) {		N->getNumValues() == 1) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);

// Constant operands are canonicalized to RHS.		// Constant operands are canonicalized to RHS.
if (isa<ConstantSDNode>(N0) || !isa<ConstantSDNode>(N1)) {		if (isa<ConstantSDNode>(N0) || !isa<ConstantSDNode>(N1)) {
SDValue Ops[] = {N1, N0};		SDValue Ops[] = {N1, N0};
SDNode *CSENode;		SDNode *CSENode;
if (const BinaryWithFlagsSDNode *BinNode =		if (const SDNodeWithFlags *FlagsNode =
dyn_cast<BinaryWithFlagsSDNode>(N)) {		dyn_cast<SDNodeWithFlags>(N)) {
		SDNodeFlags Flags = FlagsNode->getFlags();
CSENode = DAG.getNodeIfExists(		CSENode = DAG.getNodeIfExists(
N->getOpcode(), N->getVTList(), Ops, BinNode->hasNoUnsignedWrap(),		N->getOpcode(), N->getVTList(), Ops, &Flags);
BinNode->hasNoSignedWrap(), BinNode->isExact());
} else {		} else {
CSENode = DAG.getNodeIfExists(N->getOpcode(), N->getVTList(), Ops);		CSENode = DAG.getNodeIfExists(N->getOpcode(), N->getVTList(), Ops);
}		}
if (CSENode)		if (CSENode)
return SDValue(CSENode, 0);		return SDValue(CSENode, 0);
}		}
}		}

[...]	if (VT.isVector())
if (SDValue FoldedVOp = SimplifyVBinOp(N))		if (SDValue FoldedVOp = SimplifyVBinOp(N))
return FoldedVOp;		return FoldedVOp;

// fold (fdiv c1, c2) -> c1/c2		// fold (fdiv c1, c2) -> c1/c2
if (N0CFP && N1CFP)		if (N0CFP && N1CFP)
return DAG.getNode(ISD::FDIV, SDLoc(N), VT, N0, N1);		return DAG.getNode(ISD::FDIV, SDLoc(N), VT, N0, N1);

if (Options.UnsafeFPMath) {		if (Options.UnsafeFPMath) {
// fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
if (N1CFP) {
// Compute the reciprocal 1.0 / c2.
APFloat N1APF = N1CFP->getValueAPF();
APFloat Recip(N1APF.getSemantics(), 1); // 1.0
APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
// Only do the transform if the reciprocal is a legal fp immediate that
// isn't too nasty (eg NaN, denormal, ...).
if ((st == APFloat::opOK || st == APFloat::opInexact) && // Not too nasty
(!LegalOperations ||
// FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
// backend)... we should handle this gracefully after Legalize.
// TLI.isOperationLegalOrCustom(llvm::ISD::ConstantFP, VT) ||
TLI.isOperationLegal(llvm::ISD::ConstantFP, VT) ||
TLI.isFPImmLegal(Recip, VT)))
return DAG.getNode(ISD::FMUL, SDLoc(N), VT, N0,
DAG.getConstantFP(Recip, VT));
}

// If this FDIV is part of a reciprocal square root, it may be folded		// If this FDIV is part of a reciprocal square root, it may be folded
// into a target-specific square root estimate instruction.		// into a target-specific square root estimate instruction.
if (N1.getOpcode() == ISD::FSQRT) {		if (N1.getOpcode() == ISD::FSQRT) {
if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0))) {		if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0))) {
return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
}		}
} else if (N1.getOpcode() == ISD::FP_EXTEND &&		} else if (N1.getOpcode() == ISD::FP_EXTEND &&
N1.getOperand(0).getOpcode() == ISD::FSQRT) {		N1.getOperand(0).getOpcode() == ISD::FSQRT) {
Show All 26 Lines	if (N1.getOpcode() == ISD::FSQRT) {
// x / (y * sqrt(z)) -> x * (rsqrt(z) / y)		// x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
if (SDValue RV = BuildRsqrtEstimate(SqrtOp.getOperand(0))) {		if (SDValue RV = BuildRsqrtEstimate(SqrtOp.getOperand(0))) {
RV = DAG.getNode(ISD::FDIV, SDLoc(N1), VT, RV, OtherOp);		RV = DAG.getNode(ISD::FDIV, SDLoc(N1), VT, RV, OtherOp);
AddToWorklist(RV.getNode());		AddToWorklist(RV.getNode());
return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
}		}
}		}
}		}
		} // fast-math

		// For plain reciprocal optimization of a division node, check the division
		// node's optimization flags as well as the function-wide fast-math setting.
		bool AllowRecip = false;
		if (auto *NodeWithFlags = dyn_cast<SDNodeWithFlags>(N)) {
		AllowRecip = NodeWithFlags->hasAllowReciprocal();
		}

		if (Options.UnsafeFPMath || AllowRecip) {
		// fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
		if (N1CFP) {
		// Compute the reciprocal 1.0 / c2.
		APFloat N1APF = N1CFP->getValueAPF();
		APFloat Recip(N1APF.getSemantics(), 1); // 1.0
		APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
		// Only do the transform if the reciprocal is a legal fp immediate that
		// isn't too nasty (eg NaN, denormal, ...).
		if ((st == APFloat::opOK || st == APFloat::opInexact) && // Not too nasty
		(!LegalOperations ||
		// FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
		// backend)... we should handle this gracefully after Legalize.
		// TLI.isOperationLegalOrCustom(llvm::ISD::ConstantFP, VT) ||
		TLI.isOperationLegal(llvm::ISD::ConstantFP, VT) ||
		TLI.isFPImmLegal(Recip, VT)))
		return DAG.getNode(ISD::FMUL, SDLoc(N), VT, N0,
		DAG.getConstantFP(Recip, VT));
		}

// Fold into a reciprocal estimate and multiply instead of a real divide.		// Fold into a reciprocal estimate and multiply instead of a real divide.
if (SDValue RV = BuildReciprocalEstimate(N1)) {		if (SDValue RV = BuildReciprocalEstimate(N1)) {
AddToWorklist(RV.getNode());		AddToWorklist(RV.getNode());
return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
}		}
}		}

// (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y)		// (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y)
if (char LHSNeg = isNegatibleForFree(N0, LegalOperations, TLI, &Options)) {		if (char LHSNeg = isNegatibleForFree(N0, LegalOperations, TLI, &Options)) {
if (char RHSNeg = isNegatibleForFree(N1, LegalOperations, TLI, &Options)) {		if (char RHSNeg = isNegatibleForFree(N1, LegalOperations, TLI, &Options)) {
// Both can be negated for free, check to see if at least one is cheaper		// Both can be negated for free, check to see if at least one is cheaper
// negated.		// negated.
if (LHSNeg == 2 || RHSNeg == 2)		if (LHSNeg == 2 || RHSNeg == 2)
return DAG.getNode(ISD::FDIV, SDLoc(N), VT,		return DAG.getNode(ISD::FDIV, SDLoc(N), VT,
GetNegatedExpression(N0, DAG, LegalOperations),		GetNegatedExpression(N0, DAG, LegalOperations),
GetNegatedExpression(N1, DAG, LegalOperations));		GetNegatedExpression(N1, DAG, LegalOperations));
}		}
}		}

// Combine multiple FDIVs with the same divisor into multiple FMULs by the		// Combine multiple FDIVs with the same divisor into multiple FMULs by the
// reciprocal.		// reciprocal.
		echristo: Nit: No braces around single statement ifs.
// E.g., (a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)		// E.g., (a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)
// Notice that this is not always beneficial. One reason is different target		// Notice that this is not always beneficial. One reason is different target
// may have different costs for FDIV and FMUL, so sometimes the cost of two		// may have different costs for FDIV and FMUL, so sometimes the cost of two
// FDIVs may be lower than the cost of one FDIV and two FMULs. Another reason		// FDIVs may be lower than the cost of one FDIV and two FMULs. Another reason
// is the critical path is increased from "one FDIV" to "one FDIV + one FMUL".		// is the critical path is increased from "one FDIV" to "one FDIV + one FMUL".
if (Options.UnsafeFPMath) {		if (Options.UnsafeFPMath) {
// Skip if current node is a reciprocal.		// Skip if current node is a reciprocal.
if (N0CFP && N0CFP->isExactlyValue(1.0))		if (N0CFP && N0CFP->isExactlyValue(1.0))
▲ Show 20 Lines • Show All 5,544 Lines • Show Last 20 Lines
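
The gating that the fdiv hunk above introduces can be illustrated outside of LLVM. The following is a minimal sketch, not the patch's code: `NodeFlags` and `tryGetReciprocal` are hypothetical names, plain `float` stands in for `APFloat`, and the `std::isnormal` test only roughly approximates the "not too nasty" status check (it also rejects a zero reciprocal, which the `APFloat` status check might accept).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the per-node optimization flags.
struct NodeFlags {
  bool AllowReciprocal = false;
};

// Decide whether "x / C" may be rewritten as "x * (1 / C)".
// The rewrite is allowed if either the function-wide setting or the
// per-node flag permits it, and the reciprocal is a well-behaved value.
// On success, Recip receives the constant to multiply by.
bool tryGetReciprocal(float C, bool UnsafeFPMath, const NodeFlags &Flags,
                      float &Recip) {
  if (!UnsafeFPMath && !Flags.AllowReciprocal)
    return false;
  if (C == 0.0f)
    return false; // avoid dividing by zero
  float R = 1.0f / C;
  // Rough approximation of the status check: reject NaN, infinity,
  // zero, and denormal results.
  if (!std::isnormal(R))
    return false;
  Recip = R;
  return true;
}
```

With both settings off, the division is left alone; setting only the per-node flag is enough to enable the rewrite, which is the behavior the new `AllowRecip` check is meant to provide.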

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Show First 20 Lines • Show All 394 Lines • ▼ Show 20 Lines
static void AddNodeIDOperands(FoldingSetNodeID &ID,		static void AddNodeIDOperands(FoldingSetNodeID &ID,
ArrayRef<SDUse> Ops) {		ArrayRef<SDUse> Ops) {
for (auto& Op : Ops) {		for (auto& Op : Ops) {
ID.AddPointer(Op.getNode());		ID.AddPointer(Op.getNode());
ID.AddInteger(Op.getResNo());		ID.AddInteger(Op.getResNo());
}		}
}		}

static void AddBinaryNodeIDCustom(FoldingSetNodeID &ID, bool nuw, bool nsw,		// Add logical or fast math flag values to FoldingSetNodeID value.
bool exact) {		static void AddNodeIDFlags(FoldingSetNodeID &ID, unsigned Opcode,
ID.AddBoolean(nuw);		const SDNodeFlags *Flags) {
ID.AddBoolean(nsw);		if (Flags && mayHaveOptimizationFlags(Opcode)) {
		echristo: Early exit?
ID.AddBoolean(exact);		unsigned RawFlags = Flags->getRawFlags();
		// If no flags are set, do not alter the ID. This saves time and allows
		// a gradual increase in API usage of the optional optimization flags.
		if (RawFlags != 0)
		ID.AddInteger(RawFlags);
		}
}		}

/// AddBinaryNodeIDCustom - Add BinarySDNodes special infos		static void AddNodeIDFlags(FoldingSetNodeID &ID, const SDNode *N) {
static void AddBinaryNodeIDCustom(FoldingSetNodeID &ID, unsigned Opcode,		if (auto *Node = dyn_cast<SDNodeWithFlags>(N)) {
		echristo: Early exit?
bool nuw, bool nsw, bool exact) {		SDNodeFlags Flags = Node->getFlags();
if (isBinOpWithFlags(Opcode))		AddNodeIDFlags(ID, Node->getOpcode(), &Flags);
AddBinaryNodeIDCustom(ID, nuw, nsw, exact);		}
}		}

static void AddNodeIDNode(FoldingSetNodeID &ID, unsigned short OpC,		static void AddNodeIDNode(FoldingSetNodeID &ID, unsigned short OpC,
SDVTList VTList, ArrayRef<SDValue> OpList) {		SDVTList VTList, ArrayRef<SDValue> OpList) {
AddNodeIDOpcode(ID, OpC);		AddNodeIDOpcode(ID, OpC);
AddNodeIDValueTypes(ID, VTList);		AddNodeIDValueTypes(ID, VTList);
AddNodeIDOperands(ID, OpList);		AddNodeIDOperands(ID, OpList);
}		}
▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	static void AddNodeIDCustom(FoldingSetNodeID &ID, const SDNode *N) {
}		}
case ISD::STORE: {		case ISD::STORE: {
const StoreSDNode *ST = cast<StoreSDNode>(N);		const StoreSDNode *ST = cast<StoreSDNode>(N);
ID.AddInteger(ST->getMemoryVT().getRawBits());		ID.AddInteger(ST->getMemoryVT().getRawBits());
ID.AddInteger(ST->getRawSubclassData());		ID.AddInteger(ST->getRawSubclassData());
ID.AddInteger(ST->getPointerInfo().getAddrSpace());		ID.AddInteger(ST->getPointerInfo().getAddrSpace());
break;		break;
}		}
case ISD::SDIV:
case ISD::UDIV:
case ISD::SRA:
case ISD::SRL:
case ISD::MUL:
case ISD::ADD:
case ISD::SUB:
case ISD::SHL: {
const BinaryWithFlagsSDNode *BinNode = cast<BinaryWithFlagsSDNode>(N);
AddBinaryNodeIDCustom(ID, N->getOpcode(), BinNode->hasNoUnsignedWrap(),
BinNode->hasNoSignedWrap(), BinNode->isExact());
break;
}
case ISD::ATOMIC_CMP_SWAP:		case ISD::ATOMIC_CMP_SWAP:
case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:		case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:
case ISD::ATOMIC_SWAP:		case ISD::ATOMIC_SWAP:
case ISD::ATOMIC_LOAD_ADD:		case ISD::ATOMIC_LOAD_ADD:
case ISD::ATOMIC_LOAD_SUB:		case ISD::ATOMIC_LOAD_SUB:
case ISD::ATOMIC_LOAD_AND:		case ISD::ATOMIC_LOAD_AND:
case ISD::ATOMIC_LOAD_OR:		case ISD::ATOMIC_LOAD_OR:
case ISD::ATOMIC_LOAD_XOR:		case ISD::ATOMIC_LOAD_XOR:
Show All 27 Lines	case ISD::BlockAddress: {
const BlockAddressSDNode *BA = cast<BlockAddressSDNode>(N);		const BlockAddressSDNode *BA = cast<BlockAddressSDNode>(N);
ID.AddPointer(BA->getBlockAddress());		ID.AddPointer(BA->getBlockAddress());
ID.AddInteger(BA->getOffset());		ID.AddInteger(BA->getOffset());
ID.AddInteger(BA->getTargetFlags());		ID.AddInteger(BA->getTargetFlags());
break;		break;
}		}
} // end switch (N->getOpcode())		} // end switch (N->getOpcode())

		AddNodeIDFlags(ID, N);

// Target specific memory nodes could also have address spaces to check.		// Target specific memory nodes could also have address spaces to check.
if (N->isTargetMemoryOpcode())		if (N->isTargetMemoryOpcode())
ID.AddInteger(cast<MemSDNode>(N)->getPointerInfo().getAddrSpace());		ID.AddInteger(cast<MemSDNode>(N)->getPointerInfo().getAddrSpace());
}		}

/// AddNodeIDNode - Generic routine for adding a nodes info to the NodeID		/// AddNodeIDNode - Generic routine for adding a nodes info to the NodeID
/// data.		/// data.
static void AddNodeIDNode(FoldingSetNodeID &ID, const SDNode *N) {		static void AddNodeIDNode(FoldingSetNodeID &ID, const SDNode *N) {
▲ Show 20 Lines • Show All 378 Lines • ▼ Show 20 Lines

void SelectionDAG::allnodes_clear() {		void SelectionDAG::allnodes_clear() {
assert(&*AllNodes.begin() == &EntryNode);		assert(&*AllNodes.begin() == &EntryNode);
AllNodes.remove(AllNodes.begin());		AllNodes.remove(AllNodes.begin());
while (!AllNodes.empty())		while (!AllNodes.empty())
DeallocateNode(AllNodes.begin());		DeallocateNode(AllNodes.begin());
}		}

BinarySDNode *SelectionDAG::GetBinarySDNode(unsigned Opcode, SDLoc DL,		SDNode *SelectionDAG::GetSDNodeWithFlags(unsigned Opcode, SDLoc DL,
SDVTList VTs, SDValue N1,		SDVTList VTs, ArrayRef<SDValue> Ops,
SDValue N2, bool nuw, bool nsw,		const SDNodeFlags *Flags) {
bool exact) {		if (Flags && mayHaveOptimizationFlags(Opcode)) {
if (isBinOpWithFlags(Opcode)) {		SDNodeWithFlags *NodeWithFlags = new (NodeAllocator) SDNodeWithFlags(
BinaryWithFlagsSDNode *FN = new (NodeAllocator) BinaryWithFlagsSDNode(		Opcode, DL.getIROrder(), DL.getDebugLoc(), VTs, Ops, *Flags);
Opcode, DL.getIROrder(), DL.getDebugLoc(), VTs, N1, N2);		return NodeWithFlags;
FN->setHasNoUnsignedWrap(nuw);
FN->setHasNoSignedWrap(nsw);
FN->setIsExact(exact);

return FN;
}		}

BinarySDNode *N = new (NodeAllocator)		SDNode *N = new (NodeAllocator) SDNode(Opcode, DL.getIROrder(),
BinarySDNode(Opcode, DL.getIROrder(), DL.getDebugLoc(), VTs, N1, N2);		DL.getDebugLoc(), VTs, Ops);
return N;		return N;
}		}

void SelectionDAG::clear() {		void SelectionDAG::clear() {
allnodes_clear();		allnodes_clear();
OperandAllocator.Reset();		OperandAllocator.Reset();
CSEMap.clear();		CSEMap.clear();

ExtendedValueTypeNodes.clear();		ExtendedValueTypeNodes.clear();
ExternalSymbols.clear();		ExternalSymbols.clear();
TargetExternalSymbols.clear();		TargetExternalSymbols.clear();
▲ Show 20 Lines • Show All 2,174 Lines • ▼ Show 20 Lines	SDValue SelectionDAG::FoldConstantArithmetic(unsigned Opcode, EVT VT,
// We may have a vector type but a scalar result. Create a splat.		// We may have a vector type but a scalar result. Create a splat.
Outputs.resize(VT.getVectorNumElements(), Outputs.back());		Outputs.resize(VT.getVectorNumElements(), Outputs.back());

// Build a big vector out of the scalar elements we generated.		// Build a big vector out of the scalar elements we generated.
return getNode(ISD::BUILD_VECTOR, SDLoc(), VT, Outputs);		return getNode(ISD::BUILD_VECTOR, SDLoc(), VT, Outputs);
}		}

SDValue SelectionDAG::getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1,		SDValue SelectionDAG::getNode(unsigned Opcode, SDLoc DL, EVT VT, SDValue N1,
SDValue N2, bool nuw, bool nsw, bool exact) {		SDValue N2, const SDNodeFlags *Flags) {
ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());		ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2.getNode());		ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2.getNode());
switch (Opcode) {		switch (Opcode) {
default: break;		default: break;
case ISD::TokenFactor:		case ISD::TokenFactor:
assert(VT == MVT::Other && N1.getValueType() == MVT::Other &&		assert(VT == MVT::Other && N1.getValueType() == MVT::Other &&
N2.getValueType() == MVT::Other && "Invalid token factor!");		N2.getValueType() == MVT::Other && "Invalid token factor!");
// Fold trivial token factors.		// Fold trivial token factors.
▲ Show 20 Lines • Show All 429 Lines • ▼ Show 20 Lines	case ISD::OR:
// the LHS.		// the LHS.
return N1;		return N1;
case ISD::SRA:		case ISD::SRA:
return N1;		return N1;
}		}
}		}

// Memoize this node if possible.		// Memoize this node if possible.
BinarySDNode *N;		SDNode *N;
SDVTList VTs = getVTList(VT);		SDVTList VTs = getVTList(VT);
const bool BinOpHasFlags = isBinOpWithFlags(Opcode);		SDValue Ops[] = { N1, N2 };
if (VT != MVT::Glue) {		if (VT != MVT::Glue) {
SDValue Ops[] = {N1, N2};		SDValue Ops[] = {N1, N2};
FoldingSetNodeID ID;		FoldingSetNodeID ID;
AddNodeIDNode(ID, Opcode, VTs, Ops);		AddNodeIDNode(ID, Opcode, VTs, Ops);
if (BinOpHasFlags)		AddNodeIDFlags(ID, Opcode, Flags);
AddBinaryNodeIDCustom(ID, Opcode, nuw, nsw, exact);
void *IP = nullptr;		void *IP = nullptr;
if (SDNode *E = CSEMap.FindNodeOrInsertPos(ID, IP))		if (SDNode *E = CSEMap.FindNodeOrInsertPos(ID, IP))
return SDValue(E, 0);		return SDValue(E, 0);

N = GetBinarySDNode(Opcode, DL, VTs, N1, N2, nuw, nsw, exact);		N = GetSDNodeWithFlags(Opcode, DL, VTs, Ops, Flags);

		kariddi: Does it even make sense to have BinarySDNode around at all, considering the main function generating these now generates standard SDNodes?
		spatel (author): Hi Marcello - Thanks for looking at this patch. I got rid of "BinaryWithFlagsSDNode", but "BinarySDNode" had some uses elsewhere along with "UnarySDNode" and "TernarySDNode". The comment for BinarySDNode says: "This class is used for two-operand SDNodes. This is solely to allow co-allocation of node operands with the node itself." So it's just there for malloc efficiency? If everybody agrees that it has no value, then I can delete it, but I think that should be a follow-on patch.
		kariddi: Yeah, sure, I was just bringing up the question, that's all :)
CSEMap.InsertNode(N, IP);		CSEMap.InsertNode(N, IP);
} else {		} else {
N = GetBinarySDNode(Opcode, DL, VTs, N1, N2, nuw, nsw, exact);		N = GetSDNodeWithFlags(Opcode, DL, VTs, Ops, Flags);
}		}

InsertNode(N);		InsertNode(N);
return SDValue(N, 0);		return SDValue(N, 0);
}		}

SDValue SelectionDAG::getNode(unsigned Opcode, SDLoc DL, EVT VT,		SDValue SelectionDAG::getNode(unsigned Opcode, SDLoc DL, EVT VT,
SDValue N1, SDValue N2, SDValue N3) {		SDValue N1, SDValue N2, SDValue N3) {
▲ Show 20 Lines • Show All 2,233 Lines • ▼ Show 20 Lines	SelectionDAG::getTargetInsertSubreg(int SRIdx, SDLoc DL, EVT VT,
SDNode *Result = getMachineNode(TargetOpcode::INSERT_SUBREG, DL,		SDNode *Result = getMachineNode(TargetOpcode::INSERT_SUBREG, DL,
VT, Operand, Subreg, SRIdxVal);		VT, Operand, Subreg, SRIdxVal);
return SDValue(Result, 0);		return SDValue(Result, 0);
}		}

/// getNodeIfExists - Get the specified node if it's already available, or		/// getNodeIfExists - Get the specified node if it's already available, or
/// else return NULL.		/// else return NULL.
SDNode *SelectionDAG::getNodeIfExists(unsigned Opcode, SDVTList VTList,		SDNode *SelectionDAG::getNodeIfExists(unsigned Opcode, SDVTList VTList,
ArrayRef<SDValue> Ops, bool nuw, bool nsw,		ArrayRef<SDValue> Ops,
bool exact) {		const SDNodeFlags *Flags) {
if (VTList.VTs[VTList.NumVTs - 1] != MVT::Glue) {		if (VTList.VTs[VTList.NumVTs - 1] != MVT::Glue) {
FoldingSetNodeID ID;		FoldingSetNodeID ID;
AddNodeIDNode(ID, Opcode, VTList, Ops);		AddNodeIDNode(ID, Opcode, VTList, Ops);
if (isBinOpWithFlags(Opcode))		AddNodeIDFlags(ID, Opcode, Flags);
AddBinaryNodeIDCustom(ID, nuw, nsw, exact);
void *IP = nullptr;		void *IP = nullptr;
if (SDNode *E = CSEMap.FindNodeOrInsertPos(ID, IP))		if (SDNode *E = CSEMap.FindNodeOrInsertPos(ID, IP))
return E;		return E;
}		}
return nullptr;		return nullptr;
}		}

/// getDbgValue - Creates a SDDbgValue node.		/// getDbgValue - Creates a SDDbgValue node.
▲ Show 20 Lines • Show All 1,064 Lines • Show Last 20 Lines
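
The effect of `AddNodeIDFlags` on CSE lookups can be sketched in isolation. This is an illustrative model only: `OptFlags`, `NodeID`, and `makeNodeID` are hypothetical names, a `std::vector<unsigned>` stands in for `FoldingSetNodeID`, and the bit layout is an assumption. It shows the two properties the hunks above rely on: an all-zero flag set leaves the ID untouched, and any set flag makes the ID differ. It also uses the early-exit shape suggested in the inline comments.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flag container; the bit positions are an assumption.
struct OptFlags {
  bool NoUnsignedWrap = false;
  bool NoSignedWrap = false;
  bool Exact = false;
  bool AllowReciprocal = false;

  // Pack all flags into one integer, similar in spirit to getRawFlags().
  unsigned getRawFlags() const {
    return (unsigned(NoUnsignedWrap) << 0) | (unsigned(NoSignedWrap) << 1) |
           (unsigned(Exact) << 2) | (unsigned(AllowReciprocal) << 3);
  }
};

// Stand-in for FoldingSetNodeID: just the sequence of values hashed.
using NodeID = std::vector<unsigned>;

// Build the lookup key for a two-operand node. Flags are optional.
NodeID makeNodeID(unsigned Opcode, unsigned LHS, unsigned RHS,
                  const OptFlags *Flags) {
  NodeID ID = {Opcode, LHS, RHS};
  if (!Flags)
    return ID; // early exit: caller passed no flags
  unsigned Raw = Flags->getRawFlags();
  if (Raw == 0)
    return ID; // early exit: no flags set, leave the ID unchanged
  ID.push_back(Raw);
  return ID;
}
```

Under this model a caller that passes no flags still finds a node that was created with an empty flag set, while two otherwise identical nodes with different flags get distinct keys and are not merged.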

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 2,826 Lines • ▼ Show 20 Lines

void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {		void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

bool nuw = false;		bool nuw = false;
bool nsw = false;		bool nsw = false;
bool exact = false;		bool exact = false;
		FastMathFlags FMF;

if (const OverflowingBinaryOperator *OFBinOp =		if (const OverflowingBinaryOperator *OFBinOp =
dyn_cast<const OverflowingBinaryOperator>(&I)) {		dyn_cast<const OverflowingBinaryOperator>(&I)) {
nuw = OFBinOp->hasNoUnsignedWrap();		nuw = OFBinOp->hasNoUnsignedWrap();
nsw = OFBinOp->hasNoSignedWrap();		nsw = OFBinOp->hasNoSignedWrap();
}		}
if (const PossiblyExactOperator *ExactOp =		if (const PossiblyExactOperator *ExactOp =
dyn_cast<const PossiblyExactOperator>(&I))		dyn_cast<const PossiblyExactOperator>(&I))
exact = ExactOp->isExact();		exact = ExactOp->isExact();

		if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(&I))
		FMF = FPOp->getFastMathFlags();

		SDNodeFlags Flags;
		Flags.setHasAllowReciprocal(FMF.allowReciprocal());
		Flags.setHasExact(exact);
		Flags.setHasNoInfs(FMF.noInfs());
		Flags.setHasNoNaNs(FMF.noNaNs());
		Flags.setHasNoSignedWrap(nsw);
		Flags.setHasNoSignedZeros(FMF.noSignedZeros());
		Flags.setHasNoUnsignedWrap(nuw);
		Flags.setHasUnsafeAlgebra(FMF.unsafeAlgebra());
SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),		SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),
Op1, Op2, nuw, nsw, exact);		Op1, Op2, &Flags);
setValue(&I, BinNodeValue);		setValue(&I, BinNodeValue);
}		}

void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) {		void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

EVT ShiftTy =		EVT ShiftTy =
Show All 31 Lines	if (const OverflowingBinaryOperator *OFBinOp =
dyn_cast<const OverflowingBinaryOperator>(&I)) {		dyn_cast<const OverflowingBinaryOperator>(&I)) {
nuw = OFBinOp->hasNoUnsignedWrap();		nuw = OFBinOp->hasNoUnsignedWrap();
nsw = OFBinOp->hasNoSignedWrap();		nsw = OFBinOp->hasNoSignedWrap();
}		}
if (const PossiblyExactOperator *ExactOp =		if (const PossiblyExactOperator *ExactOp =
dyn_cast<const PossiblyExactOperator>(&I))		dyn_cast<const PossiblyExactOperator>(&I))
exact = ExactOp->isExact();		exact = ExactOp->isExact();
}		}
		SDNodeFlags Flags;
		Flags.setHasExact(exact);
		Flags.setHasNoSignedWrap(nsw);
		Flags.setHasNoUnsignedWrap(nuw);
SDValue Res = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2,		SDValue Res = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2,
nuw, nsw, exact);		&Flags);
setValue(&I, Res);		setValue(&I, Res);
}		}

void SelectionDAGBuilder::visitSDiv(const User &I) {		void SelectionDAGBuilder::visitSDiv(const User &I) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

// Turn exact SDivs into multiplications.		// Turn exact SDivs into multiplications.
▲ Show 20 Lines • Show All 4,895 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/TargetLowering.cpp

Show First 20 Lines • Show All 2,644 Lines • ▼ Show 20 Lines	SDValue TargetLowering::BuildExactSDIV(SDValue Op1, SDValue Op2, SDLoc dl,
APInt d = C->getAPIntValue();		APInt d = C->getAPIntValue();
assert(d != 0 && "Division by zero!");		assert(d != 0 && "Division by zero!");

// Shift the value upfront if it is even, so the LSB is one.		// Shift the value upfront if it is even, so the LSB is one.
unsigned ShAmt = d.countTrailingZeros();		unsigned ShAmt = d.countTrailingZeros();
if (ShAmt) {		if (ShAmt) {
// TODO: For UDIV use SRL instead of SRA.		// TODO: For UDIV use SRL instead of SRA.
SDValue Amt = DAG.getConstant(ShAmt, getShiftAmountTy(Op1.getValueType()));		SDValue Amt = DAG.getConstant(ShAmt, getShiftAmountTy(Op1.getValueType()));
Op1 = DAG.getNode(ISD::SRA, dl, Op1.getValueType(), Op1, Amt, false, false,		SDNodeFlags Flags;
true);		Flags.setHasExact(true);
		Op1 = DAG.getNode(ISD::SRA, dl, Op1.getValueType(), Op1, Amt, &Flags);
d = d.ashr(ShAmt);		d = d.ashr(ShAmt);
}		}

// Calculate the multiplicative inverse, using Newton's method.		// Calculate the multiplicative inverse, using Newton's method.
APInt t, xn = d;		APInt t, xn = d;
while ((t = d*xn) != 1)		while ((t = d*xn) != 1)
xn *= APInt(d.getBitWidth(), 2) - t;		xn *= APInt(d.getBitWidth(), 2) - t;

▲ Show 20 Lines • Show All 308 Lines • Show Last 20 Lines

lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 12,463 Lines • ▼ Show 20 Lines	case X86::COND_O: case X86::COND_NO: {
// Check if we really need to set the		// Check if we really need to set the
// Overflow flag. If NoSignedWrap is present		// Overflow flag. If NoSignedWrap is present
// that is not actually needed.		// that is not actually needed.
switch (Op->getOpcode()) {		switch (Op->getOpcode()) {
case ISD::ADD:		case ISD::ADD:
case ISD::SUB:		case ISD::SUB:
case ISD::MUL:		case ISD::MUL:
case ISD::SHL: {		case ISD::SHL: {
const BinaryWithFlagsSDNode *BinNode =		const SDNodeWithFlags *Node = cast<SDNodeWithFlags>(Op.getNode());
cast<BinaryWithFlagsSDNode>(Op.getNode());		if (Node->hasNoSignedWrap())
if (BinNode->hasNoSignedWrap())
break;		break;
}		}
default:		default:
NeedOF = true;		NeedOF = true;
break;		break;
}		}
break;		break;
}		}
▲ Show 20 Lines • Show All 12,204 Lines • Show Last 20 Lines

test/CodeGen/X86/recip-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s --check-prefix=REFINE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s --check-prefix=REFINE

				; There is no function-wide fast-math setting on this function.
				; We should still recognize the IR optimization flag and convert to a
				; reciprocal estimate if the CPU attributes allow it.
				define float @reciprocal_estimate_only(float %x, float %y) {
				%div = fdiv arcp float %x, %y
				ret float %div

				; CHECK-LABEL: reciprocal_estimate_only:
				; CHECK: divss
				; CHECK-NEXT: retq

				; RECIP-LABEL: reciprocal_estimate_only:
				; RECIP: vrcpss
				; RECIP: vmulss
				; RECIP: vsubss
				; RECIP: vmulss
				; RECIP: vaddss
				; RECIP: vmulss
				; RECIP-NEXT: retq

				; REFINE-LABEL: reciprocal_estimate_only:
				; REFINE: vrcpss
				; REFINE: vmulss
				; REFINE: vsubss
				; REFINE: vmulss
				; REFINE: vaddss
				; REFINE: vmulss
				; REFINE: vsubss
				; REFINE: vmulss
				; REFINE: vaddss
				; REFINE: vmulss
				; REFINE-NEXT: retq
				}


	; If the target's divss/divps instructions are substantially			; If the target's divss/divps instructions are substantially
	; slower than rcpss/rcpps with a Newton-Raphson refinement,			; slower than rcpss/rcpps with a Newton-Raphson refinement,
	; we should generate the estimate sequence.			; we should generate the estimate sequence.

	; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )			; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )
	; for details about the accuracy, speed, and implementation			; for details about the accuracy, speed, and implementation
	; differences of x86 reciprocal estimates.			; differences of x86 reciprocal estimates.

	▲ Show 20 Lines • Show All 97 Lines • Show Last 20 Lines
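
The instruction sequences checked above (an estimate, then one multiply/subtract/multiply/add group per refinement step, then a final multiply) are consistent with a Newton-Raphson refinement of a reciprocal estimate. A minimal sketch of that arithmetic follows. It is not the backend's implementation: `crudeReciprocal` is a hypothetical stand-in for a hardware estimate such as `rcpss` (here it keeps about 12 significant bits), and the function names are invented for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical low-precision estimate of 1/D: the true reciprocal with
// its significand rounded to roughly 12 bits.
float crudeReciprocal(float D) {
  assert(D != 0.0f && "divisor must be nonzero");
  int Exp;
  float M = std::frexp(1.0f / D, &Exp);  // 1/D == M * 2^Exp
  M = std::round(M * 4096.0f) / 4096.0f; // drop the low bits
  return std::ldexp(M, Exp);
}

// Newton-Raphson refinement: Est' = Est + Est * (1 - D * Est).
// Each step costs a multiply, a subtract, a multiply, and an add.
float refineReciprocal(float D, float Est, unsigned Steps) {
  for (unsigned I = 0; I < Steps; ++I) {
    float Err = 1.0f - D * Est; // multiply, subtract
    Est = Est + Est * Err;      // multiply, add
  }
  return Est;
}

// X / D computed as X times the refined estimate (the final multiply).
float divideByEstimate(float X, float D, unsigned Steps) {
  return X * refineReciprocal(D, crudeReciprocal(D), Steps);
}
```

Each step roughly squares the relative error of the estimate, so a 12-bit estimate gets close to single-precision accuracy after one step; more steps trade extra instructions for accuracy.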

Diff 23427