This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Differential D149855

[DAGCombiner] Avoid template for generalized pattern match.
Needs Review · Public

Authored by fakepaper56 on May 4 2023, 8:08 AM.


Details

Reviewers

craig.topper
reames
frasercrmck
rogfer01
luke
simoll
RKSimon

Summary

D141891 introduced an approach that lets the same combine functions serve both
non-VP and VP nodes. That patch used templates to instantiate these functions
with different MatchContext classes. There is a concern that templating many
functions in DAGCombiner.cpp may increase the binary size too much.
This patch replaces the templates by selecting the corresponding MatchContext
class at runtime.
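To make the trade-off concrete, here is a minimal, self-contained C++17 sketch of the runtime-selection shape this patch adopts. ToyNode, ToyMatchContext, ToyVPMatchContext, the opcode value used for FMUL, and isMulOperand are hypothetical stand-ins, not LLVM APIs, and VP matching is reduced to a base-opcode comparison (the mask and EVL checks of the real VPMatchContext are left out). One deliberate difference from Diff 519503: the base class declares a virtual destructor, because deleting a derived object through a std::unique_ptr to a base class that lacks one is undefined behavior in C++.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an SDNode: an opcode, a VP bit, and the
// non-VP opcode that a VP node corresponds to.
struct ToyNode {
  unsigned Opcode;
  bool IsVP;
  unsigned BaseOpcode;
};

// Base context: plain opcode comparison (hypothetical, not the LLVM class).
class ToyMatchContext {
public:
  // Needed because a ToyVPMatchContext is deleted through
  // std::unique_ptr<ToyMatchContext>.
  virtual ~ToyMatchContext() = default;

  virtual bool match(const ToyNode &N, unsigned Opc) const {
    return N.Opcode == Opc;
  }

  // Picks the context at runtime based on the root node.
  static std::unique_ptr<ToyMatchContext> get(const ToyNode &Root);
};

// VP context: a VP node matches if its base opcode is Opc.
// (Simplified: the real VPMatchContext also checks mask and EVL.)
class ToyVPMatchContext : public ToyMatchContext {
public:
  bool match(const ToyNode &N, unsigned Opc) const override {
    if (!N.IsVP)
      return N.Opcode == Opc;
    return N.BaseOpcode == Opc;
  }
};

std::unique_ptr<ToyMatchContext> ToyMatchContext::get(const ToyNode &Root) {
  if (Root.IsVP)
    return std::make_unique<ToyVPMatchContext>();
  return std::make_unique<ToyMatchContext>();
}

// One non-template body serves both VP and non-VP roots.
bool isMulOperand(const ToyNode &Root, const ToyNode &Operand) {
  constexpr unsigned FMUL = 1; // hypothetical opcode value
  std::unique_ptr<ToyMatchContext> Matcher = ToyMatchContext::get(Root);
  return Matcher->match(Operand, FMUL);
}
```

A single non-template body now serves both kinds of root; in exchange, every match goes through an indirect call and every combine attempt allocates its context on the heap.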

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fakepaper56 created this revision. · May 4 2023, 8:08 AM

Herald added a project: Restricted Project. · May 4 2023, 8:08 AM

Herald added subscribers: ecnelises, steven.zhang, hiraditya.

fakepaper56 requested review of this revision. · May 4 2023, 8:08 AM

Herald added a project: Restricted Project. · May 4 2023, 8:08 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B229990: Diff 519503. · May 4 2023, 8:59 AM

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

In D149855#4326180, @simoll wrote:

DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I am sorry; the motivation here is weak. I did not actually run into any binary-size problems, and I had overlooked the performance cost of virtual dispatch. Taking that cost into account, I think the trade-off is too big a question for me, and it may be too early to consider it.

fakepaper56 edited the summary of this revision. · May 8 2023, 5:01 AM

In D149855#4326180, @simoll wrote:

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I raised it in an internal SiFive discussion. I agree a virtual dispatch and a heap allocation are not an improvement. We haven't applied this generic matcher to much code yet, and if we started applying it aggressively we might double the size of DAGCombiner.cpp.o. So I thought it was worth thinking about alternative abstractions that didn't duplicate entire functions. I just don't have any good ideas yet.
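One way to drop the heap allocation while keeping runtime selection is to construct the context on the stack. The sketch below is an illustration under stated assumptions, not code from this review: the toy types are hypothetical, VP matching only compares a base opcode, and the VP context is placed in a std::optional that is emplaced only for VP roots. Virtual dispatch remains.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an SDNode.
struct ToyNode {
  unsigned Opcode;
  bool IsVP;
  unsigned BaseOpcode;
};

// Hypothetical contexts; still polymorphic.
class ToyMatchContext {
public:
  virtual ~ToyMatchContext() = default;
  virtual bool match(const ToyNode &N, unsigned Opc) const {
    return N.Opcode == Opc;
  }
};

class ToyVPMatchContext : public ToyMatchContext {
public:
  bool match(const ToyNode &N, unsigned Opc) const override {
    return N.IsVP ? N.BaseOpcode == Opc : N.Opcode == Opc;
  }
};

bool isMulOperand(const ToyNode &Root, const ToyNode &Operand) {
  constexpr unsigned FMUL = 1; // hypothetical opcode value
  // Both candidates live on the stack: no heap allocation.
  ToyMatchContext Plain;
  std::optional<ToyVPMatchContext> VP;
  const ToyMatchContext *Matcher = &Plain;
  if (Root.IsVP)
    Matcher = &VP.emplace(); // emplace returns a reference to the new object
  return Matcher->match(Operand, FMUL); // still a virtual call
}
```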

In D149855#4327203, @craig.topper wrote:

I raised it in an internal SiFive discussion. I agree a virtual dispatch and a heap allocation are not an improvement. We haven't applied this generic matcher to much code yet, and if we started applying it aggressively we might double the size of DAGCombiner.cpp.o. So I thought it was worth thinking about alternative abstractions that didn't duplicate entire functions. I just don't have any good ideas yet.

You don't need to rewrite the templated code to have the option for virtual dispatch in the future:
If the need arises, we could implement one SuperMatchContext that defers to EmptyContext and VPMatchContext internally (via subclasses or otherwise) and only instantiate the templates for SuperMatchContext once.
(If you go down this route, there are a bunch of cool things you can do, e.g. combining different matchers or running matcher code with multiple contexts in parallel.)
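The single-context idea can be sketched as follows, with hypothetical names throughout. ToySuperMatchContext records at construction time whether the root is a VP node and forwards each query to one of two non-virtual contexts with a plain branch (the "otherwise" option, rather than subclasses). The combine remains a template, but it is explicitly instantiated only for ToySuperMatchContext, so only one copy of its body is needed and no vtable is involved.

```cpp
#include <cassert>

// Hypothetical stand-in for an SDNode.
struct ToyNode {
  unsigned Opcode;
  bool IsVP;
  unsigned BaseOpcode;
};

// Non-virtual stand-ins for EmptyMatchContext and VPMatchContext.
struct ToyEmptyMatchContext {
  bool match(const ToyNode &N, unsigned Opc) const { return N.Opcode == Opc; }
};

struct ToyVPMatchContext {
  bool match(const ToyNode &N, unsigned Opc) const {
    return N.IsVP ? N.BaseOpcode == Opc : N.Opcode == Opc;
  }
};

// Hypothetical "SuperMatchContext": one concrete type that chooses its
// behavior once, at construction, and forwards with a branch.
class ToySuperMatchContext {
  ToyEmptyMatchContext Empty;
  ToyVPMatchContext VP;
  bool UseVP;

public:
  explicit ToySuperMatchContext(const ToyNode &Root) : UseVP(Root.IsVP) {}

  bool match(const ToyNode &N, unsigned Opc) const {
    return UseVP ? VP.match(N, Opc) : Empty.match(N, Opc);
  }
};

// The combine stays templated on the context, as in D141891...
template <class MatchContextClass>
bool isMulOperand(const ToyNode &Root, const ToyNode &Operand) {
  constexpr unsigned FMUL = 1; // hypothetical opcode value
  MatchContextClass Matcher(Root);
  return Matcher.match(Operand, FMUL);
}

// ...but only this one instantiation is emitted here.
template bool isMulOperand<ToySuperMatchContext>(const ToyNode &,
                                                 const ToyNode &);
```

Every match still pays for a branch on UseVP; whether that is cheaper than a virtual call in DAGCombiner would need to be measured.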

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	320 lines

Diff 519503

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[532 lines not shown]	private:
SDValue visitVPGATHER(SDNode *N);		SDValue visitVPGATHER(SDNode *N);
SDValue visitVPSCATTER(SDNode *N);		SDValue visitVPSCATTER(SDNode *N);
SDValue visitFP_TO_FP16(SDNode *N);		SDValue visitFP_TO_FP16(SDNode *N);
SDValue visitFP16_TO_FP(SDNode *N);		SDValue visitFP16_TO_FP(SDNode *N);
SDValue visitFP_TO_BF16(SDNode *N);		SDValue visitFP_TO_BF16(SDNode *N);
SDValue visitVECREDUCE(SDNode *N);		SDValue visitVECREDUCE(SDNode *N);
SDValue visitVPOp(SDNode *N);		SDValue visitVPOp(SDNode *N);

template <class MatchContextClass>
SDValue visitFADDForFMACombine(SDNode *N);		SDValue visitFADDForFMACombine(SDNode *N);
template <class MatchContextClass>
SDValue visitFSUBForFMACombine(SDNode *N);		SDValue visitFSUBForFMACombine(SDNode *N);
SDValue visitFMULForFMADistributiveCombine(SDNode *N);		SDValue visitFMULForFMADistributiveCombine(SDNode *N);

SDValue XformToShuffleWithZero(SDNode *N);		SDValue XformToShuffleWithZero(SDNode *N);
bool reassociationCanBreakAddressingModePattern(unsigned Opc,		bool reassociationCanBreakAddressingModePattern(unsigned Opc,
const SDLoc &DL,		const SDLoc &DL,
SDNode *N,		SDNode *N,
SDValue N0,		SDValue N0,
[307 lines not shown]	public:
explicit WorklistInserter(DAGCombiner &dc)		explicit WorklistInserter(DAGCombiner &dc)
: SelectionDAG::DAGUpdateListener(dc.getDAG()), DC(dc) {}		: SelectionDAG::DAGUpdateListener(dc.getDAG()), DC(dc) {}

// FIXME: Ideally we could add N to the worklist, but this causes exponential		// FIXME: Ideally we could add N to the worklist, but this causes exponential
// compile time costs in large DAGs, e.g. Halide.		// compile time costs in large DAGs, e.g. Halide.
void NodeInserted(SDNode *N) override { DC.ConsiderForPruning(N); }		void NodeInserted(SDNode *N) override { DC.ConsiderForPruning(N); }
};		};

class EmptyMatchContext {		class MatchContext {
		protected:
SelectionDAG &DAG;		SelectionDAG &DAG;
const TargetLowering &TLI;		const TargetLowering &TLI;

public:		public:
EmptyMatchContext(SelectionDAG &DAG, const TargetLowering &TLI, SDNode *Root)		MatchContext(SelectionDAG &DAG, const TargetLowering &TLI)
: DAG(DAG), TLI(TLI) {}		: DAG(DAG), TLI(TLI) {}

bool match(SDValue OpN, unsigned Opcode) const {		static std::unique_ptr<MatchContext>
		get(SelectionDAG &DAG, const TargetLowering &TLI, SDNode *Root);

		virtual bool match(SDValue OpN, unsigned Opcode) const {
return Opcode == OpN->getOpcode();		return Opcode == OpN->getOpcode();
}		}

// Same as SelectionDAG::getNode().		// Same as SelectionDAG::getNode().
template <typename... ArgT> SDValue getNode(ArgT &&...Args) {		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
return DAG.getNode(std::forward<ArgT>(Args)...);		SDValue Operand) {
		return DAG.getNode(Opcode, DL, VT, Operand);
}		}

bool isOperationLegalOrCustom(unsigned Op, EVT VT,		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2) {
		return DAG.getNode(Opcode, DL, VT, N1, N2);
		}

		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, N3);
		}

		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
		SDValue Operand, SDNodeFlags Flags) {
		return DAG.getNode(Opcode, DL, VT, Operand, Flags);
		}

		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDNodeFlags Flags) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, Flags);
		}

		virtual SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3, SDNodeFlags Flags) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, N3, Flags);
		}

		virtual bool isOperationLegalOrCustom(unsigned Op, EVT VT,
bool LegalOnly = false) const {		bool LegalOnly = false) const {
return TLI.isOperationLegalOrCustom(Op, VT, LegalOnly);		return TLI.isOperationLegalOrCustom(Op, VT, LegalOnly);
}		}
};		};

class VPMatchContext {		class VPMatchContext : public MatchContext {
SelectionDAG &DAG;
const TargetLowering &TLI;
SDValue RootMaskOp;		SDValue RootMaskOp;
SDValue RootVectorLenOp;		SDValue RootVectorLenOp;

public:		public:
VPMatchContext(SelectionDAG &DAG, const TargetLowering &TLI, SDNode *Root)		VPMatchContext(SelectionDAG &DAG, const TargetLowering &TLI, SDNode *Root)
: DAG(DAG), TLI(TLI), RootMaskOp(), RootVectorLenOp() {		: MatchContext(DAG, TLI) {
assert(Root->isVPOpcode());		assert(Root->isVPOpcode());
if (auto RootMaskPos = ISD::getVPMaskIdx(Root->getOpcode()))		if (auto RootMaskPos = ISD::getVPMaskIdx(Root->getOpcode()))
RootMaskOp = Root->getOperand(*RootMaskPos);		RootMaskOp = Root->getOperand(*RootMaskPos);

if (auto RootVLenPos =		if (auto RootVLenPos =
ISD::getVPExplicitVectorLengthIdx(Root->getOpcode()))		ISD::getVPExplicitVectorLengthIdx(Root->getOpcode()))
RootVectorLenOp = Root->getOperand(*RootVLenPos);		RootVectorLenOp = Root->getOperand(*RootVLenPos);
}		}

/// whether \p OpVal is a node that is functionally compatible with the		/// whether \p OpVal is a node that is functionally compatible with the
/// NodeType \p Opc		/// NodeType \p Opc
bool match(SDValue OpVal, unsigned Opc) const {		bool match(SDValue OpVal, unsigned Opc) const override {
if (!OpVal->isVPOpcode())		if (!OpVal->isVPOpcode())
return OpVal->getOpcode() == Opc;		return OpVal->getOpcode() == Opc;

auto BaseOpc = ISD::getBaseOpcodeForVP(OpVal->getOpcode(),		auto BaseOpc = ISD::getBaseOpcodeForVP(
!OpVal->getFlags().hasNoFPExcept());		OpVal->getOpcode(), !OpVal->getFlags().hasNoFPExcept());
if (BaseOpc != Opc)		if (BaseOpc != Opc)
return false;		return false;

// Make sure the mask of OpVal is true mask or is same as Root's.		// Make sure the mask of OpVal is true mask or is same as Root's.
unsigned VPOpcode = OpVal->getOpcode();		unsigned VPOpcode = OpVal->getOpcode();
if (auto MaskPos = ISD::getVPMaskIdx(VPOpcode)) {		if (auto MaskPos = ISD::getVPMaskIdx(VPOpcode)) {
SDValue MaskOp = OpVal.getOperand(*MaskPos);		SDValue MaskOp = OpVal.getOperand(*MaskPos);
if (RootMaskOp != MaskOp &&		if (RootMaskOp != MaskOp &&
!ISD::isConstantSplatVectorAllOnes(MaskOp.getNode()))		!ISD::isConstantSplatVectorAllOnes(MaskOp.getNode()))
return false;		return false;
}		}

// Make sure the EVL of OpVal is same as Root's.		// Make sure the EVL of OpVal is same as Root's.
if (auto VLenPos = ISD::getVPExplicitVectorLengthIdx(VPOpcode))		if (auto VLenPos = ISD::getVPExplicitVectorLengthIdx(VPOpcode))
if (RootVectorLenOp != OpVal.getOperand(*VLenPos))		if (RootVectorLenOp != OpVal.getOperand(*VLenPos))
return false;		return false;
return true;		return true;
}		}

// Specialize based on number of operands.		// Specialize based on number of operands.
// TODO emit VP intrinsics where MaskOp/VectorLenOp != null		// TODO emit VP intrinsics where MaskOp/VectorLenOp != null
// SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT) { return		// SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT) { return
// DAG.getNode(Opcode, DL, VT); }		// DAG.getNode(Opcode, DL, VT); }
SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue Operand) {		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
		SDValue Operand) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 1 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 1 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 2);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 2);
return DAG.getNode(VPOpcode, DL, VT,		return DAG.getNode(VPOpcode, DL, VT,
{Operand, RootMaskOp, RootVectorLenOp});		{Operand, RootMaskOp, RootVectorLenOp});
}		}

SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
SDValue N2) {		SDValue N2) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 2 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 2 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 3);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 3);
return DAG.getNode(VPOpcode, DL, VT,		return DAG.getNode(VPOpcode, DL, VT,
{N1, N2, RootMaskOp, RootVectorLenOp});		{N1, N2, RootMaskOp, RootVectorLenOp});
}		}

SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
SDValue N2, SDValue N3) {		SDValue N2, SDValue N3) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 3 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 3 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 4);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 4);
return DAG.getNode(VPOpcode, DL, VT,		return DAG.getNode(VPOpcode, DL, VT,
{N1, N2, N3, RootMaskOp, RootVectorLenOp});		{N1, N2, N3, RootMaskOp, RootVectorLenOp});
}		}

SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue Operand,		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue Operand,
SDNodeFlags Flags) {		SDNodeFlags Flags) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 1 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 1 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 2);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 2);
return DAG.getNode(VPOpcode, DL, VT, {Operand, RootMaskOp, RootVectorLenOp},		return DAG.getNode(VPOpcode, DL, VT,
Flags);		{Operand, RootMaskOp, RootVectorLenOp}, Flags);
}		}

SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
SDValue N2, SDNodeFlags Flags) {		SDValue N2, SDNodeFlags Flags) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 2 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 2 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 3);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 3);
return DAG.getNode(VPOpcode, DL, VT, {N1, N2, RootMaskOp, RootVectorLenOp},		return DAG.getNode(VPOpcode, DL, VT,
Flags);		{N1, N2, RootMaskOp, RootVectorLenOp}, Flags);
}		}

SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
SDValue N2, SDValue N3, SDNodeFlags Flags) {		SDValue N2, SDValue N3, SDNodeFlags Flags) override {
unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);		unsigned VPOpcode = ISD::getVPForBaseOpcode(Opcode);
assert(ISD::getVPMaskIdx(VPOpcode) == 3 &&		assert(ISD::getVPMaskIdx(VPOpcode) == 3 &&
ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 4);		ISD::getVPExplicitVectorLengthIdx(VPOpcode) == 4);
return DAG.getNode(VPOpcode, DL, VT,		return DAG.getNode(VPOpcode, DL, VT,
{N1, N2, N3, RootMaskOp, RootVectorLenOp}, Flags);		{N1, N2, N3, RootMaskOp, RootVectorLenOp}, Flags);
}		}

bool isOperationLegalOrCustom(unsigned Op, EVT VT,		bool isOperationLegalOrCustom(unsigned Op, EVT VT,
bool LegalOnly = false) const {		bool LegalOnly = false) const override {
unsigned VPOp = ISD::getVPForBaseOpcode(Op);		unsigned VPOp = ISD::getVPForBaseOpcode(Op);
return TLI.isOperationLegalOrCustom(VPOp, VT, LegalOnly);		return TLI.isOperationLegalOrCustom(VPOp, VT, LegalOnly);
}		}
};		};

		std::unique_ptr<MatchContext>
		MatchContext::get(SelectionDAG &DAG, const TargetLowering &TLI, SDNode *Root) {
		if (Root->isVPOpcode())
		return std::make_unique<VPMatchContext>(DAG, TLI, Root);
		return std::make_unique<MatchContext>(DAG, TLI);
		}

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetLowering::DAGCombinerInfo implementation		// TargetLowering::DAGCombinerInfo implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void TargetLowering::DAGCombinerInfo::AddToWorklist(SDNode *N) {		void TargetLowering::DAGCombinerInfo::AddToWorklist(SDNode *N) {
((DAGCombiner*)DC)->AddToWorklist(N);		((DAGCombiner*)DC)->AddToWorklist(N);
[14,129 lines not shown]
}		}

// Returns true if `N` can assume no infinities involved in its computation.		// Returns true if `N` can assume no infinities involved in its computation.
static bool hasNoInfs(const TargetOptions &Options, SDValue N) {		static bool hasNoInfs(const TargetOptions &Options, SDValue N) {
return Options.NoInfsFPMath || N->getFlags().hasNoInfs();		return Options.NoInfsFPMath || N->getFlags().hasNoInfs();
}		}

/// Try to perform FMA combining on a given FADD node.		/// Try to perform FMA combining on a given FADD node.
template <class MatchContextClass>
SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {		SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc SL(N);		SDLoc SL(N);
MatchContextClass matcher(DAG, TLI, N);		std::unique_ptr<MatchContext> matcher = MatchContext::get(DAG, TLI, N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;

bool UseVP = std::is_same_v<MatchContextClass, VPMatchContext>;

// Floating-point multiply-add with intermediate rounding.		// Floating-point multiply-add with intermediate rounding.
// FIXME: Make isFMADLegal have specific behavior when using VPMatchContext.		// FIXME: Make isFMADLegal have specific behavior when N is a vp node.
// FIXME: Add VP_FMAD opcode.		// FIXME: Add VP_FMAD opcode.
bool HasFMAD = !UseVP && (LegalOperations && TLI.isFMADLegal(DAG, N));		bool HasFMAD =
		!N->isVPOpcode() && (LegalOperations && TLI.isFMADLegal(DAG, N));

// Floating-point multiply-add without intermediate rounding.		// Floating-point multiply-add without intermediate rounding.
bool HasFMA =		bool HasFMA =
TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&		TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&
(!LegalOperations || matcher.isOperationLegalOrCustom(ISD::FMA, VT));		(!LegalOperations || matcher->isOperationLegalOrCustom(ISD::FMA, VT));

// No valid opcode, do not combine.		// No valid opcode, do not combine.
if (!HasFMAD && !HasFMA)		if (!HasFMAD && !HasFMA)
return SDValue();		return SDValue();

bool CanReassociate =		bool CanReassociate =
Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();		Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||		bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
Options.UnsafeFPMath || HasFMAD);		Options.UnsafeFPMath || HasFMAD);
// If the addition is not contractable, do not combine.		// If the addition is not contractable, do not combine.
if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())		if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
return SDValue();		return SDValue();

if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))		if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))
return SDValue();		return SDValue();

// Always prefer FMAD to FMA for precision.		// Always prefer FMAD to FMA for precision.
unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;		unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
bool Aggressive = TLI.enableAggressiveFMAFusion(VT);		bool Aggressive = TLI.enableAggressiveFMAFusion(VT);

auto isFusedOp = [&](SDValue N) {		auto isFusedOp = [&](SDValue N) {
return matcher.match(N, ISD::FMA) || matcher.match(N, ISD::FMAD);		return matcher->match(N, ISD::FMA) || matcher->match(N, ISD::FMAD);
};		};

// Is the node an FMUL and contractable either due to global flags or		// Is the node an FMUL and contractable either due to global flags or
// SDNodeFlags.		// SDNodeFlags.
auto isContractableFMUL = [AllowFusionGlobally, &matcher](SDValue N) {		auto isContractableFMUL = [AllowFusionGlobally, &matcher](SDValue N) {
if (!matcher.match(N, ISD::FMUL))		if (!matcher->match(N, ISD::FMUL))
return false;		return false;
return AllowFusionGlobally || N->getFlags().hasAllowContract();		return AllowFusionGlobally || N->getFlags().hasAllowContract();
};		};
// If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),		// If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),
// prefer to fold the multiply with fewer uses.		// prefer to fold the multiply with fewer uses.
if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {		if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {
if (N0->use_size() > N1->use_size())		if (N0->use_size() > N1->use_size())
std::swap(N0, N1);		std::swap(N0, N1);
}		}

// fold (fadd (fmul x, y), z) -> (fma x, y, z)		// fold (fadd (fmul x, y), z) -> (fma x, y, z)
if (isContractableFMUL(N0) && (Aggressive || N0->hasOneUse())) {		if (isContractableFMUL(N0) && (Aggressive || N0->hasOneUse())) {
return matcher.getNode(PreferredFusedOpcode, SL, VT, N0.getOperand(0),		return matcher->getNode(PreferredFusedOpcode, SL, VT, N0.getOperand(0),
N0.getOperand(1), N1);		N0.getOperand(1), N1);
}		}

// fold (fadd x, (fmul y, z)) -> (fma y, z, x)		// fold (fadd x, (fmul y, z)) -> (fma y, z, x)
// Note: Commutes FADD operands.		// Note: Commutes FADD operands.
if (isContractableFMUL(N1) && (Aggressive || N1->hasOneUse())) {		if (isContractableFMUL(N1) && (Aggressive || N1->hasOneUse())) {
return matcher.getNode(PreferredFusedOpcode, SL, VT, N1.getOperand(0),		return matcher->getNode(PreferredFusedOpcode, SL, VT, N1.getOperand(0),
N1.getOperand(1), N0);		N1.getOperand(1), N0);
}		}

// fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)		// fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
// fadd E, (fma A, B, (fmul C, D)) --> fma A, B, (fma C, D, E)		// fadd E, (fma A, B, (fmul C, D)) --> fma A, B, (fma C, D, E)
// This also works with nested fma instructions:		// This also works with nested fma instructions:
// fadd (fma A, B, (fma (C, D, (fmul (E, F))))), G -->		// fadd (fma A, B, (fma (C, D, (fmul (E, F))))), G -->
// fma A, B, (fma C, D, fma (E, F, G))		// fma A, B, (fma C, D, fma (E, F, G))
// fadd (G, (fma A, B, (fma (C, D, (fmul (E, F)))))) -->		// fadd (G, (fma A, B, (fma (C, D, (fmul (E, F)))))) -->
[24 lines not shown]	while (E && isFusedOp(TmpFMA) && TmpFMA.hasOneUse()) {

TmpFMA = TmpFMA->getOperand(2);		TmpFMA = TmpFMA->getOperand(2);
}		}
}		}

// Look through FP_EXTEND nodes to do more combining.		// Look through FP_EXTEND nodes to do more combining.

// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)		// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
if (matcher.match(N0, ISD::FP_EXTEND)) {		if (matcher->match(N0, ISD::FP_EXTEND)) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isContractableFMUL(N00) &&		if (isContractableFMUL(N00) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)), N1);		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)), N1);
}		}
}		}

// fold (fadd x, (fpext (fmul y, z))) -> (fma (fpext y), (fpext z), x)		// fold (fadd x, (fpext (fmul y, z))) -> (fma (fpext y), (fpext z), x)
// Note: Commutes FADD operands.		// Note: Commutes FADD operands.
if (matcher.match(N1, ISD::FP_EXTEND)) {		if (matcher->match(N1, ISD::FP_EXTEND)) {
SDValue N10 = N1.getOperand(0);		SDValue N10 = N1.getOperand(0);
if (isContractableFMUL(N10) &&		if (isContractableFMUL(N10) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N10.getValueType())) {		N10.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(1)), N0);		matcher->getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(1)), N0);
}		}
}		}

// More folding opportunities when target permits.		// More folding opportunities when target permits.
if (Aggressive) {		if (Aggressive) {
// fold (fadd (fma x, y, (fpext (fmul u, v))), z)		// fold (fadd (fma x, y, (fpext (fmul u, v))), z)
// -> (fma x, y, (fma (fpext u), (fpext v), z))		// -> (fma x, y, (fma (fpext u), (fpext v), z))
auto FoldFAddFMAFPExtFMul = [&](SDValue X, SDValue Y, SDValue U, SDValue V,		auto FoldFAddFMAFPExtFMul = [&](SDValue X, SDValue Y, SDValue U, SDValue V,
SDValue Z) {		SDValue Z) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT, X, Y,		PreferredFusedOpcode, SL, VT, X, Y,
matcher.getNode(PreferredFusedOpcode, SL, VT,		matcher->getNode(PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, U),		matcher->getNode(ISD::FP_EXTEND, SL, VT, U),
matcher.getNode(ISD::FP_EXTEND, SL, VT, V), Z));		matcher->getNode(ISD::FP_EXTEND, SL, VT, V), Z));
};		};
if (isFusedOp(N0)) {		if (isFusedOp(N0)) {
SDValue N02 = N0.getOperand(2);		SDValue N02 = N0.getOperand(2);
if (matcher.match(N02, ISD::FP_EXTEND)) {		if (matcher->match(N02, ISD::FP_EXTEND)) {
SDValue N020 = N02.getOperand(0);		SDValue N020 = N02.getOperand(0);
if (isContractableFMUL(N020) &&		if (isContractableFMUL(N020) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N020.getValueType())) {		N020.getValueType())) {
return FoldFAddFMAFPExtFMul(N0.getOperand(0), N0.getOperand(1),		return FoldFAddFMAFPExtFMul(N0.getOperand(0), N0.getOperand(1),
N020.getOperand(0), N020.getOperand(1),		N020.getOperand(0), N020.getOperand(1),
N1);		N1);
}		}
}		}
}		}

// fold (fadd (fpext (fma x, y, (fmul u, v))), z)		// fold (fadd (fpext (fma x, y, (fmul u, v))), z)
// -> (fma (fpext x), (fpext y), (fma (fpext u), (fpext v), z))		// -> (fma (fpext x), (fpext y), (fma (fpext u), (fpext v), z))
// FIXME: This turns two single-precision and one double-precision		// FIXME: This turns two single-precision and one double-precision
// operation into two double-precision operations, which might not be		// operation into two double-precision operations, which might not be
// interesting for all targets, especially GPUs.		// interesting for all targets, especially GPUs.
auto FoldFAddFPExtFMAFMul = [&](SDValue X, SDValue Y, SDValue U, SDValue V,		auto FoldFAddFPExtFMAFMul = [&](SDValue X, SDValue Y, SDValue U, SDValue V,
SDValue Z) {		SDValue Z) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, X),		matcher->getNode(ISD::FP_EXTEND, SL, VT, X),
matcher.getNode(ISD::FP_EXTEND, SL, VT, Y),		matcher->getNode(ISD::FP_EXTEND, SL, VT, Y),
matcher.getNode(PreferredFusedOpcode, SL, VT,		matcher->getNode(PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, U),		matcher->getNode(ISD::FP_EXTEND, SL, VT, U),
matcher.getNode(ISD::FP_EXTEND, SL, VT, V), Z));		matcher->getNode(ISD::FP_EXTEND, SL, VT, V), Z));
};		};
if (N0.getOpcode() == ISD::FP_EXTEND) {		if (N0.getOpcode() == ISD::FP_EXTEND) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isFusedOp(N00)) {		if (isFusedOp(N00)) {
SDValue N002 = N00.getOperand(2);		SDValue N002 = N00.getOperand(2);
if (isContractableFMUL(N002) &&		if (isContractableFMUL(N002) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
[39 lines not shown]	if (N1.getOpcode() == ISD::FP_EXTEND) {
}		}
}		}
}		}

return SDValue();		return SDValue();
}		}

/// Try to perform FMA combining on a given FSUB node.		/// Try to perform FMA combining on a given FSUB node.
template <class MatchContextClass>
SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {		SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc SL(N);		SDLoc SL(N);
MatchContextClass matcher(DAG, TLI, N);		std::unique_ptr<MatchContext> matcher = MatchContext::get(DAG, TLI, N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;

bool UseVP = std::is_same_v<MatchContextClass, VPMatchContext>;

// Floating-point multiply-add with intermediate rounding.		// Floating-point multiply-add with intermediate rounding.
// FIXME: Make isFMADLegal have specific behavior when using VPMatchContext.		// FIXME: Make isFMADLegal have specific behavior when N is a vp node.
// FIXME: Add VP_FMAD opcode.		// FIXME: Add VP_FMAD opcode.
bool HasFMAD = !UseVP && (LegalOperations && TLI.isFMADLegal(DAG, N));		bool HasFMAD =
		!N->isVPOpcode() && (LegalOperations && TLI.isFMADLegal(DAG, N));

// Floating-point multiply-add without intermediate rounding.		// Floating-point multiply-add without intermediate rounding.
bool HasFMA =		bool HasFMA =
TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&		TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&
(!LegalOperations || matcher.isOperationLegalOrCustom(ISD::FMA, VT));		(!LegalOperations || matcher->isOperationLegalOrCustom(ISD::FMA, VT));

// No valid opcode, do not combine.		// No valid opcode, do not combine.
if (!HasFMAD && !HasFMA)		if (!HasFMAD && !HasFMA)
return SDValue();		return SDValue();

const SDNodeFlags Flags = N->getFlags();		const SDNodeFlags Flags = N->getFlags();
bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||		bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
Options.UnsafeFPMath || HasFMAD);		Options.UnsafeFPMath || HasFMAD);

// If the subtraction is not contractable, do not combine.		// If the subtraction is not contractable, do not combine.
if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())		if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
return SDValue();		return SDValue();

if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))		if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))
return SDValue();		return SDValue();

// Always prefer FMAD to FMA for precision.		// Always prefer FMAD to FMA for precision.
unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;		unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
bool Aggressive = TLI.enableAggressiveFMAFusion(VT);		bool Aggressive = TLI.enableAggressiveFMAFusion(VT);
bool NoSignedZero = Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros();		bool NoSignedZero = Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros();

// Is the node an FMUL and contractable either due to global flags or		// Is the node an FMUL and contractable either due to global flags or
// SDNodeFlags.		// SDNodeFlags.
auto isContractableFMUL = [AllowFusionGlobally, &matcher](SDValue N) {		auto isContractableFMUL = [AllowFusionGlobally, &matcher](SDValue N) {
if (!matcher.match(N, ISD::FMUL))		if (!matcher->match(N, ISD::FMUL))
return false;		return false;
return AllowFusionGlobally || N->getFlags().hasAllowContract();		return AllowFusionGlobally || N->getFlags().hasAllowContract();
};		};

// fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))		// fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
auto tryToFoldXYSubZ = [&](SDValue XY, SDValue Z) {		auto tryToFoldXYSubZ = [&](SDValue XY, SDValue Z) {
if (isContractableFMUL(XY) && (Aggressive || XY->hasOneUse())) {		if (isContractableFMUL(XY) && (Aggressive || XY->hasOneUse())) {
return matcher.getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0),		return matcher->getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0),
XY.getOperand(1),		XY.getOperand(1),
matcher.getNode(ISD::FNEG, SL, VT, Z));		matcher->getNode(ISD::FNEG, SL, VT, Z));
}		}
return SDValue();		return SDValue();
};		};

// fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)		// fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)
// Note: Commutes FSUB operands.		// Note: Commutes FSUB operands.
auto tryToFoldXSubYZ = [&](SDValue X, SDValue YZ) {		auto tryToFoldXSubYZ = [&](SDValue X, SDValue YZ) {
if (isContractableFMUL(YZ) && (Aggressive || YZ->hasOneUse())) {		if (isContractableFMUL(YZ) && (Aggressive || YZ->hasOneUse())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT, YZ.getOperand(0)),		matcher->getNode(ISD::FNEG, SL, VT, YZ.getOperand(0)),
YZ.getOperand(1), X);		YZ.getOperand(1), X);
}		}
return SDValue();		return SDValue();
};		};

// If we have two choices trying to fold (fsub (fmul u, v), (fmul x, y)),		// If we have two choices trying to fold (fsub (fmul u, v), (fmul x, y)),
// prefer to fold the multiply with fewer uses.		// prefer to fold the multiply with fewer uses.
if (isContractableFMUL(N0) && isContractableFMUL(N1) &&		if (isContractableFMUL(N0) && isContractableFMUL(N1) &&
… 9 unchanged lines hidden …	if (isContractableFMUL(N0) && isContractableFMUL(N1) &&
if (SDValue V = tryToFoldXYSubZ(N0, N1))		if (SDValue V = tryToFoldXYSubZ(N0, N1))
return V;		return V;
// fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)		// fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)
if (SDValue V = tryToFoldXSubYZ(N0, N1))		if (SDValue V = tryToFoldXSubYZ(N0, N1))
return V;		return V;
}		}

// fold (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))		// fold (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))
if (matcher.match(N0, ISD::FNEG) && isContractableFMUL(N0.getOperand(0)) &&		if (matcher->match(N0, ISD::FNEG) && isContractableFMUL(N0.getOperand(0)) &&
(Aggressive || (N0->hasOneUse() && N0.getOperand(0).hasOneUse()))) {		(Aggressive || (N0->hasOneUse() && N0.getOperand(0).hasOneUse()))) {
SDValue N00 = N0.getOperand(0).getOperand(0);		SDValue N00 = N0.getOperand(0).getOperand(0);
SDValue N01 = N0.getOperand(0).getOperand(1);		SDValue N01 = N0.getOperand(0).getOperand(1);
return matcher.getNode(PreferredFusedOpcode, SL, VT,		return matcher->getNode(PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT, N00), N01,		matcher->getNode(ISD::FNEG, SL, VT, N00), N01,
matcher.getNode(ISD::FNEG, SL, VT, N1));		matcher->getNode(ISD::FNEG, SL, VT, N1));
}		}

// Look through FP_EXTEND nodes to do more combining.		// Look through FP_EXTEND nodes to do more combining.

// fold (fsub (fpext (fmul x, y)), z)		// fold (fsub (fpext (fmul x, y)), z)
// -> (fma (fpext x), (fpext y), (fneg z))		// -> (fma (fpext x), (fpext y), (fneg z))
if (matcher.match(N0, ISD::FP_EXTEND)) {		if (matcher->match(N0, ISD::FP_EXTEND)) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isContractableFMUL(N00) &&		if (isContractableFMUL(N00) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),
matcher.getNode(ISD::FNEG, SL, VT, N1));		matcher->getNode(ISD::FNEG, SL, VT, N1));
}		}
}		}

// fold (fsub x, (fpext (fmul y, z)))		// fold (fsub x, (fpext (fmul y, z)))
// -> (fma (fneg (fpext y)), (fpext z), x)		// -> (fma (fneg (fpext y)), (fpext z), x)
// Note: Commutes FSUB operands.		// Note: Commutes FSUB operands.
if (matcher.match(N1, ISD::FP_EXTEND)) {		if (matcher->match(N1, ISD::FP_EXTEND)) {
SDValue N10 = N1.getOperand(0);		SDValue N10 = N1.getOperand(0);
if (isContractableFMUL(N10) &&		if (isContractableFMUL(N10) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N10.getValueType())) {		N10.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(		matcher->getNode(
ISD::FNEG, SL, VT,		ISD::FNEG, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(0))),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(0))),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(1)), N0);		matcher->getNode(ISD::FP_EXTEND, SL, VT, N10.getOperand(1)), N0);
}		}
}		}

// fold (fsub (fpext (fneg (fmul, x, y))), z)		// fold (fsub (fpext (fneg (fmul, x, y))), z)
// -> (fneg (fma (fpext x), (fpext y), z))		// -> (fneg (fma (fpext x), (fpext y), z))
// Note: This could be removed with appropriate canonicalization of the		// Note: This could be removed with appropriate canonicalization of the
// input expression into (fneg (fadd (fpext (fmul, x, y)), z). However, the		// input expression into (fneg (fadd (fpext (fmul, x, y)), z). However, the
// orthogonal flags -fp-contract=fast and -enable-unsafe-fp-math prevent		// orthogonal flags -fp-contract=fast and -enable-unsafe-fp-math prevent
// from implementing the canonicalization in visitFSUB.		// from implementing the canonicalization in visitFSUB.
if (matcher.match(N0, ISD::FP_EXTEND)) {		if (matcher->match(N0, ISD::FP_EXTEND)) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (matcher.match(N00, ISD::FNEG)) {		if (matcher->match(N00, ISD::FNEG)) {
SDValue N000 = N00.getOperand(0);		SDValue N000 = N00.getOperand(0);
if (isContractableFMUL(N000) &&		if (isContractableFMUL(N000) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
return matcher.getNode(		return matcher->getNode(
ISD::FNEG, SL, VT,		ISD::FNEG, SL, VT,
matcher.getNode(		matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(1)),
N1));		N1));
}		}
}		}
}		}

// fold (fsub (fneg (fpext (fmul, x, y))), z)		// fold (fsub (fneg (fpext (fmul, x, y))), z)
// -> (fneg (fma (fpext x)), (fpext y), z)		// -> (fneg (fma (fpext x)), (fpext y), z)
// Note: This could be removed with appropriate canonicalization of the		// Note: This could be removed with appropriate canonicalization of the
// input expression into (fneg (fadd (fpext (fmul, x, y)), z). However, the		// input expression into (fneg (fadd (fpext (fmul, x, y)), z). However, the
// orthogonal flags -fp-contract=fast and -enable-unsafe-fp-math prevent		// orthogonal flags -fp-contract=fast and -enable-unsafe-fp-math prevent
// from implementing the canonicalization in visitFSUB.		// from implementing the canonicalization in visitFSUB.
if (matcher.match(N0, ISD::FNEG)) {		if (matcher->match(N0, ISD::FNEG)) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (matcher.match(N00, ISD::FP_EXTEND)) {		if (matcher->match(N00, ISD::FP_EXTEND)) {
SDValue N000 = N00.getOperand(0);		SDValue N000 = N00.getOperand(0);
if (isContractableFMUL(N000) &&		if (isContractableFMUL(N000) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N000.getValueType())) {		N000.getValueType())) {
return matcher.getNode(		return matcher->getNode(
ISD::FNEG, SL, VT,		ISD::FNEG, SL, VT,
matcher.getNode(		matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N000.getOperand(1)),
N1));		N1));
}		}
}		}
}		}

auto isReassociable = [Options](SDNode *N) {		auto isReassociable = [Options](SDNode *N) {
return Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();		return Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
};		};

auto isContractableAndReassociableFMUL = [&isContractableFMUL,		auto isContractableAndReassociableFMUL = [&isContractableFMUL,
&isReassociable](SDValue N) {		&isReassociable](SDValue N) {
return isContractableFMUL(N) && isReassociable(N.getNode());		return isContractableFMUL(N) && isReassociable(N.getNode());
};		};

auto isFusedOp = [&](SDValue N) {		auto isFusedOp = [&](SDValue N) {
return matcher.match(N, ISD::FMA) || matcher.match(N, ISD::FMAD);		return matcher->match(N, ISD::FMA) || matcher->match(N, ISD::FMAD);
};		};

// More folding opportunities when target permits.		// More folding opportunities when target permits.
if (Aggressive && isReassociable(N)) {		if (Aggressive && isReassociable(N)) {
bool CanFuse = Options.UnsafeFPMath || N->getFlags().hasAllowContract();		bool CanFuse = Options.UnsafeFPMath || N->getFlags().hasAllowContract();
// fold (fsub (fma x, y, (fmul u, v)), z)		// fold (fsub (fma x, y, (fmul u, v)), z)
// -> (fma x, y (fma u, v, (fneg z)))		// -> (fma x, y (fma u, v, (fneg z)))
if (CanFuse && isFusedOp(N0) &&		if (CanFuse && isFusedOp(N0) &&
isContractableAndReassociableFMUL(N0.getOperand(2)) &&		isContractableAndReassociableFMUL(N0.getOperand(2)) &&
N0->hasOneUse() && N0.getOperand(2)->hasOneUse()) {		N0->hasOneUse() && N0.getOperand(2)->hasOneUse()) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT, N0.getOperand(0), N0.getOperand(1),		PreferredFusedOpcode, SL, VT, N0.getOperand(0), N0.getOperand(1),
matcher.getNode(PreferredFusedOpcode, SL, VT,		matcher->getNode(PreferredFusedOpcode, SL, VT,
N0.getOperand(2).getOperand(0),		N0.getOperand(2).getOperand(0),
N0.getOperand(2).getOperand(1),		N0.getOperand(2).getOperand(1),
matcher.getNode(ISD::FNEG, SL, VT, N1)));		matcher->getNode(ISD::FNEG, SL, VT, N1)));
}		}

// fold (fsub x, (fma y, z, (fmul u, v)))		// fold (fsub x, (fma y, z, (fmul u, v)))
// -> (fma (fneg y), z, (fma (fneg u), v, x))		// -> (fma (fneg y), z, (fma (fneg u), v, x))
if (CanFuse && isFusedOp(N1) &&		if (CanFuse && isFusedOp(N1) &&
isContractableAndReassociableFMUL(N1.getOperand(2)) &&		isContractableAndReassociableFMUL(N1.getOperand(2)) &&
N1->hasOneUse() && NoSignedZero) {		N1->hasOneUse() && NoSignedZero) {
SDValue N20 = N1.getOperand(2).getOperand(0);		SDValue N20 = N1.getOperand(2).getOperand(0);
SDValue N21 = N1.getOperand(2).getOperand(1);		SDValue N21 = N1.getOperand(2).getOperand(1);
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT, N1.getOperand(0)),		matcher->getNode(ISD::FNEG, SL, VT, N1.getOperand(0)),
N1.getOperand(1),		N1.getOperand(1),
matcher.getNode(PreferredFusedOpcode, SL, VT,		matcher->getNode(PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT, N20), N21, N0));		matcher->getNode(ISD::FNEG, SL, VT, N20), N21, N0));
}		}

// fold (fsub (fma x, y, (fpext (fmul u, v))), z)		// fold (fsub (fma x, y, (fpext (fmul u, v))), z)
// -> (fma x, y (fma (fpext u), (fpext v), (fneg z)))		// -> (fma x, y (fma (fpext u), (fpext v), (fneg z)))
if (isFusedOp(N0) && N0->hasOneUse()) {		if (isFusedOp(N0) && N0->hasOneUse()) {
SDValue N02 = N0.getOperand(2);		SDValue N02 = N0.getOperand(2);
if (matcher.match(N02, ISD::FP_EXTEND)) {		if (matcher->match(N02, ISD::FP_EXTEND)) {
SDValue N020 = N02.getOperand(0);		SDValue N020 = N02.getOperand(0);
if (isContractableAndReassociableFMUL(N020) &&		if (isContractableAndReassociableFMUL(N020) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N020.getValueType())) {		N020.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT, N0.getOperand(0), N0.getOperand(1),		PreferredFusedOpcode, SL, VT, N0.getOperand(0), N0.getOperand(1),
matcher.getNode(		matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N020.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N020.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N020.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N020.getOperand(1)),
matcher.getNode(ISD::FNEG, SL, VT, N1)));		matcher->getNode(ISD::FNEG, SL, VT, N1)));
}		}
}		}
}		}

// fold (fsub (fpext (fma x, y, (fmul u, v))), z)		// fold (fsub (fpext (fma x, y, (fmul u, v))), z)
// -> (fma (fpext x), (fpext y),		// -> (fma (fpext x), (fpext y),
// (fma (fpext u), (fpext v), (fneg z)))		// (fma (fpext u), (fpext v), (fneg z)))
// FIXME: This turns two single-precision and one double-precision		// FIXME: This turns two single-precision and one double-precision
// operation into two double-precision operations, which might not be		// operation into two double-precision operations, which might not be
// interesting for all targets, especially GPUs.		// interesting for all targets, especially GPUs.
if (matcher.match(N0, ISD::FP_EXTEND)) {		if (matcher->match(N0, ISD::FP_EXTEND)) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isFusedOp(N00)) {		if (isFusedOp(N00)) {
SDValue N002 = N00.getOperand(2);		SDValue N002 = N00.getOperand(2);
if (isContractableAndReassociableFMUL(N002) &&		if (isContractableAndReassociableFMUL(N002) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),
matcher.getNode(		matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N002.getOperand(0)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N002.getOperand(0)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N002.getOperand(1)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N002.getOperand(1)),
matcher.getNode(ISD::FNEG, SL, VT, N1)));		matcher->getNode(ISD::FNEG, SL, VT, N1)));
}		}
}		}
}		}

// fold (fsub x, (fma y, z, (fpext (fmul u, v))))		// fold (fsub x, (fma y, z, (fpext (fmul u, v))))
// -> (fma (fneg y), z, (fma (fneg (fpext u)), (fpext v), x))		// -> (fma (fneg y), z, (fma (fneg (fpext u)), (fpext v), x))
if (isFusedOp(N1) && matcher.match(N1.getOperand(2), ISD::FP_EXTEND) &&		if (isFusedOp(N1) && matcher->match(N1.getOperand(2), ISD::FP_EXTEND) &&
N1->hasOneUse()) {		N1->hasOneUse()) {
SDValue N120 = N1.getOperand(2).getOperand(0);		SDValue N120 = N1.getOperand(2).getOperand(0);
if (isContractableAndReassociableFMUL(N120) &&		if (isContractableAndReassociableFMUL(N120) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N120.getValueType())) {		N120.getValueType())) {
SDValue N1200 = N120.getOperand(0);		SDValue N1200 = N120.getOperand(0);
SDValue N1201 = N120.getOperand(1);		SDValue N1201 = N120.getOperand(1);
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT, N1.getOperand(0)),		matcher->getNode(ISD::FNEG, SL, VT, N1.getOperand(0)),
N1.getOperand(1),		N1.getOperand(1),
matcher.getNode(		matcher->getNode(PreferredFusedOpcode, SL, VT,
PreferredFusedOpcode, SL, VT,		matcher->getNode(ISD::FNEG, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT,		matcher->getNode(ISD::FP_EXTEND,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N1200)),		SL, VT, N1200)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N1201), N0));		matcher->getNode(ISD::FP_EXTEND, SL, VT, N1201),
		N0));
}		}
}		}

// fold (fsub x, (fpext (fma y, z, (fmul u, v))))		// fold (fsub x, (fpext (fma y, z, (fmul u, v))))
// -> (fma (fneg (fpext y)), (fpext z),		// -> (fma (fneg (fpext y)), (fpext z),
// (fma (fneg (fpext u)), (fpext v), x))		// (fma (fneg (fpext u)), (fpext v), x))
// FIXME: This turns two single-precision and one double-precision		// FIXME: This turns two single-precision and one double-precision
// operation into two double-precision operations, which might not be		// operation into two double-precision operations, which might not be
// interesting for all targets, especially GPUs.		// interesting for all targets, especially GPUs.
if (matcher.match(N1, ISD::FP_EXTEND) && isFusedOp(N1.getOperand(0))) {		if (matcher->match(N1, ISD::FP_EXTEND) && isFusedOp(N1.getOperand(0))) {
SDValue CvtSrc = N1.getOperand(0);		SDValue CvtSrc = N1.getOperand(0);
SDValue N100 = CvtSrc.getOperand(0);		SDValue N100 = CvtSrc.getOperand(0);
SDValue N101 = CvtSrc.getOperand(1);		SDValue N101 = CvtSrc.getOperand(1);
SDValue N102 = CvtSrc.getOperand(2);		SDValue N102 = CvtSrc.getOperand(2);
if (isContractableAndReassociableFMUL(N102) &&		if (isContractableAndReassociableFMUL(N102) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
CvtSrc.getValueType())) {		CvtSrc.getValueType())) {
SDValue N1020 = N102.getOperand(0);		SDValue N1020 = N102.getOperand(0);
SDValue N1021 = N102.getOperand(1);		SDValue N1021 = N102.getOperand(1);
return matcher.getNode(		return matcher->getNode(
PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N100)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N101),
matcher.getNode(
PreferredFusedOpcode, SL, VT,		PreferredFusedOpcode, SL, VT,
matcher.getNode(ISD::FNEG, SL, VT,		matcher->getNode(ISD::FNEG, SL, VT,
matcher.getNode(ISD::FP_EXTEND, SL, VT, N1020)),		matcher->getNode(ISD::FP_EXTEND, SL, VT, N100)),
matcher.getNode(ISD::FP_EXTEND, SL, VT, N1021), N0));		matcher->getNode(ISD::FP_EXTEND, SL, VT, N101),
		matcher->getNode(PreferredFusedOpcode, SL, VT,
		matcher->getNode(ISD::FNEG, SL, VT,
		matcher->getNode(ISD::FP_EXTEND,
		SL, VT, N1020)),
		matcher->getNode(ISD::FP_EXTEND, SL, VT, N1021),
		N0));
}		}
}		}
}		}

return SDValue();		return SDValue();
}		}
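`visitFSUBForFMACombine` checks for contraction permission before it folds. The reason is that a fused multiply-add rounds once, whereas FMUL followed by FSUB rounds the product first. The standalone sketch below is plain C++ using `std::fma` from `<cmath>`, not LLVM code. The names `SubResults` and `fsubOfFmul` are hypothetical helpers for illustration.

The sketch applies the first fold, (fsub (fmul x, y), z) -> (fma x, y, (fneg z)), to inputs chosen so that the two forms disagree. With x = y = 1 + 2^-30 and z = 1 + 2^-29, the exact value of x*y - z is 2^-60. The separately rounded product loses that value.

```cpp
#include <cassert>  // assert, for the checks run against this sketch
#include <cmath>    // std::fma, std::ldexp

// Hypothetical helper type for illustration only (not LLVM code).
struct SubResults {
  double Unfused;  // (fsub (fmul x, y), z): product rounded, then difference rounded
  double Fused;    // (fma x, y, (fneg z)): exact x*y - z, rounded once
};

SubResults fsubOfFmul(double X, double Y, double Z) {
  // Storing the product in a volatile forces it to be rounded to double,
  // so the compiler cannot contract X * Y - Z into an FMA on its own.
  volatile double Product = X * Y;
  double Unfused = Product - Z;
  // The combined form: one multiply-add with a negated addend.
  double Fused = std::fma(X, Y, -Z);
  return {Unfused, Fused};
}
```

With the inputs above, `Unfused` is exactly `0.0` and `Fused` is exactly 2^-60. `ISD::FMAD` keeps the intermediate rounding, so it would reproduce the unfused value. That appears to be why `HasFMAD` alone is enough to set `AllowFusionGlobally` in the code above.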

/// Try to perform FMA combining on a given FMUL node based on the distributive		/// Try to perform FMA combining on a given FMUL node based on the distributive
… 90 unchanged lines hidden …	SDValue DAGCombiner::visitFMULForFMADistributiveCombine(SDNode *N) {

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitVP_FADD(SDNode *N) {		SDValue DAGCombiner::visitVP_FADD(SDNode *N) {
SelectionDAG::FlagInserter FlagsInserter(DAG, N);		SelectionDAG::FlagInserter FlagsInserter(DAG, N);

// FADD -> FMA combines:		// FADD -> FMA combines:
if (SDValue Fused = visitFADDForFMACombine<VPMatchContext>(N)) {		if (SDValue Fused = visitFADDForFMACombine(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}
return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFADD(SDNode *N) {		SDValue DAGCombiner::visitFADD(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
… 175 unchanged lines hidden …	if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||

// Fold fadd(vecreduce(x), vecreduce(y)) -> vecreduce(fadd(x, y))		// Fold fadd(vecreduce(x), vecreduce(y)) -> vecreduce(fadd(x, y))
if (SDValue SD = reassociateReduction(ISD::VECREDUCE_FADD, ISD::FADD, DL,		if (SDValue SD = reassociateReduction(ISD::VECREDUCE_FADD, ISD::FADD, DL,
VT, N0, N1, Flags))		VT, N0, N1, Flags))
return SD;		return SD;
} // enable-unsafe-fp-math		} // enable-unsafe-fp-math

// FADD -> FMA combines:		// FADD -> FMA combines:
if (SDValue Fused = visitFADDForFMACombine<EmptyMatchContext>(N)) {		if (SDValue Fused = visitFADDForFMACombine(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}
return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitSTRICT_FADD(SDNode *N) {		SDValue DAGCombiner::visitSTRICT_FADD(SDNode *N) {
SDValue Chain = N->getOperand(0);		SDValue Chain = N->getOperand(0);
… 93 unchanged lines hidden …	SDValue DAGCombiner::visitFSUB(SDNode *N) {
}		}

// fold (fsub A, (fneg B)) -> (fadd A, B)		// fold (fsub A, (fneg B)) -> (fadd A, B)
if (SDValue NegN1 =		if (SDValue NegN1 =
TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize))		TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize))
return DAG.getNode(ISD::FADD, DL, VT, N0, NegN1);		return DAG.getNode(ISD::FADD, DL, VT, N0, NegN1);

// FSUB -> FMA combines:		// FSUB -> FMA combines:
if (SDValue Fused = visitFSUBForFMACombine<EmptyMatchContext>(N)) {		if (SDValue Fused = visitFSUBForFMACombine(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFMUL(SDNode *N) {		SDValue DAGCombiner::visitFMUL(SDNode *N) {
… 9,381 unchanged lines hidden …	SDValue DAGCombiner::visitVECREDUCE(SDNode *N) {

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitVP_FSUB(SDNode *N) {		SDValue DAGCombiner::visitVP_FSUB(SDNode *N) {
SelectionDAG::FlagInserter FlagsInserter(DAG, N);		SelectionDAG::FlagInserter FlagsInserter(DAG, N);

// FSUB -> FMA combines:		// FSUB -> FMA combines:
if (SDValue Fused = visitFSUBForFMACombine<VPMatchContext>(N)) {		if (SDValue Fused = visitFSUBForFMACombine(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}
return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitVPOp(SDNode *N) {		SDValue DAGCombiner::visitVPOp(SDNode *N) {

… 1,692 unchanged lines hidden …
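The hunks above show the core of the change. The patch drops `template <class MatchContextClass>` and the explicit `<EmptyMatchContext>` and `<VPMatchContext>` arguments at the call sites. The single remaining body instead obtains its context at run time through `MatchContext::get(DAG, TLI, N)`, which returns a `std::unique_ptr<MatchContext>` that is then used through `matcher->`.

The standalone sketch below contrasts the two dispatch styles. Every type and function in it (`Node`, `PlainCtx`, `VPCtx`, `MatchCtx`, `baseOpcode`, `isFMULStatic`, `isFMULDynamic`) is a hypothetical, heavily simplified stand-in. The matching rule used for VP nodes is invented for illustration. The real match contexts also deal with masks, vector lengths and node construction, none of which is modeled here.

```cpp
#include <cassert>  // assert, for the checks run against this sketch
#include <memory>   // std::unique_ptr, std::make_unique

// Hypothetical stand-ins for SDNode and ISD opcodes; not LLVM's types.
enum Opcode { FMUL, FSUB, VP_FMUL, VP_FSUB };

struct Node {
  Opcode Opc;
  bool isVPOpcode() const { return Opc == VP_FMUL || Opc == VP_FSUB; }
};

// Hypothetical mapping from a VP opcode to the plain opcode it generalizes.
inline Opcode baseOpcode(Opcode Opc) {
  switch (Opc) {
  case VP_FMUL: return FMUL;
  case VP_FSUB: return FSUB;
  default: return Opc;
  }
}

// Before: static dispatch. Each context is a distinct type, so every
// templated combine is compiled once per context type.
struct PlainCtx {
  bool match(const Node &N, Opcode Opc) const { return N.Opc == Opc; }
};
struct VPCtx {
  // Illustrative rule only: accept VP nodes whose base opcode is Opc.
  bool match(const Node &N, Opcode Opc) const {
    return N.isVPOpcode() && baseOpcode(N.Opc) == Opc;
  }
};

template <class CtxT> bool isFMULStatic(const Node &N) {
  CtxT Ctx;
  return Ctx.match(N, FMUL);  // resolved at compile time, can be inlined
}

// After: runtime dispatch. One function body; the context is chosen from
// the root node and every match() goes through a virtual call.
struct MatchCtx {
  virtual ~MatchCtx() = default;
  virtual bool match(const Node &N, Opcode Opc) const = 0;
  static std::unique_ptr<MatchCtx> get(const Node &Root);
};
struct PlainDyn final : MatchCtx {
  bool match(const Node &N, Opcode Opc) const override { return N.Opc == Opc; }
};
struct VPDyn final : MatchCtx {
  bool match(const Node &N, Opcode Opc) const override {
    return N.isVPOpcode() && baseOpcode(N.Opc) == Opc;
  }
};

std::unique_ptr<MatchCtx> MatchCtx::get(const Node &Root) {
  if (Root.isVPOpcode())
    return std::make_unique<VPDyn>();  // heap allocation on every call
  return std::make_unique<PlainDyn>();
}

// Root is the node being combined (e.g. FSUB or VP_FSUB); Operand is the
// value tested for being a multiply.
bool isFMULDynamic(const Node &Root, const Node &Operand) {
  std::unique_ptr<MatchCtx> Ctx = MatchCtx::get(Root);
  return Ctx->match(Operand, FMUL);
}
```

Both versions give the same answers for the same inputs; they differ in cost.

- **Template version:** it is compiled once per context type, which is the binary-size concern raised in the summary.
- **Runtime version:** it keeps one body. In this sketch it pays one heap allocation per `get` plus an indirect call for every `match`.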