This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Improve FMA support for interpolation patterns
Closed, Public

Authored by RKSimon on Sep 20 2015, 11:00 AM.


Details

Reviewers

spatel
delena
arsenm
hfinkel

Commits

rG4003ed2da300: [DAGCombiner] Improve FMA support for interpolation patterns
rL248210: [DAGCombiner] Improve FMA support for interpolation patterns

Summary

This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.
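Each of these rewrites distributes the multiply over the add or subtract: (x + 1.0)*y = x*y + y, (x - 1.0)*y = x*y - y and (1.0 - x)*y = -x*y + y. The minimal C++ sketch below compares each unfused form with a single std::fma call; the function names are illustrative and are not LLVM code. With small integer inputs every step is exact, so both sides agree. For general inputs the fused form rounds only once and can differ in the last bit, which is why the combine in the diff is gated on fusion being allowed.

```cpp
#include <cassert>
#include <cmath>

// Illustrative reference (unfused) and fused forms of the summary's patterns.

// (fmul (fadd x, 1.0), y)  ->  (fma x, y, y)
double mulAddOne(double x, double y) { return (x + 1.0) * y; }
double mulAddOneFused(double x, double y) { return std::fma(x, y, y); }

// (fmul (fsub x, 1.0), y)  ->  (fma x, y, (fneg y))
double mulSubOne(double x, double y) { return (x - 1.0) * y; }
double mulSubOneFused(double x, double y) { return std::fma(x, y, -y); }

// (fmul (fsub 1.0, x), y)  ->  (fma (fneg x), y, y)
double mulOneSub(double x, double y) { return (1.0 - x) * y; }
double mulOneSubFused(double x, double y) { return std::fma(-x, y, y); }
```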

This is particularly useful for linear interpolation patterns such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))).
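With the new FMUL fold, y*(1.0 - t) becomes fma(-t, y, y), and the existing FADD combine can then absorb x*t as the outer FMA. The C++ sketch below shows that resulting shape (names are illustrative, not LLVM code); it mirrors the two-instruction sequence the AMDGPU test test_f32_interp checks for, a v_mad_f32 with a negated t followed by a v_mac_f32.

```cpp
#include <cassert>
#include <cmath>

// Unfused interpolation, as written in the summary: x*t + y*(1 - t).
double lerpPlain(double x, double y, double t) {
  return x * t + y * (1.0 - t);
}

// Fused shape: the FMUL fold turns y*(1 - t) into fma(-t, y, y), and the
// FADD fold then turns x*t + <that> into fma(x, t, <that>).
double lerpFused(double x, double y, double t) {
  double yPart = std::fma(-t, y, y); // y - t*y, i.e. y*(1 - t)
  return std::fma(x, t, yPart);      // x*t + y*(1 - t)
}
```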

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 35206. Sep 20 2015, 11:00 AM

RKSimon retitled this revision to [DAGCombiner] Improve FMA support for interpolation patterns.

RKSimon updated this object.

RKSimon added reviewers: hfinkel, arsenm, spatel, delena.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

This mostly LGTM.

There aren't any tests stressing the FMAD path. AMDGPU seems to be only target using it still, and the one test change is in the expansion of an intrinsic which should be removed. If you can add some of those that would be good, otherwise I can try to do it after you commit

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 96: A better name would be AllowFusion or something like that.
Lines 103–105: I think the AllowFusion/UnsafeFPMath check should be first.
Line 112: Usually the int is omitted (unsigned rather than unsigned int).
Line 113: It seems wrong to use this in the FMAD case, although AMDGPU happens to not care because enableAggressiveFMAFusion always reports true and it seems to be what is used already.
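For reference, the preamble these comments discuss decides whether a helper may emit ISD::FMAD, ISD::FMA or nothing. The sketch below models that decision as a pure C++ function, with the AllowFusion test placed first as requested. FusionQuery, FusedOp and pickFusedOpcode are hypothetical stand-ins for the TargetOptions and TargetLowering queries that appear in the diff; they are not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical inputs; each field notes the LLVM query it stands in for.
struct FusionQuery {
  bool FPOpFusionFast;   // Options.AllowFPOpFusion == FPOpFusion::Fast
  bool UnsafeFPMath;     // Options.UnsafeFPMath
  bool LegalOperations;  // DAGCombiner::LegalOperations
  bool FMADLegal;        // TLI.isOperationLegal(ISD::FMAD, VT)
  bool FMALegalOrCustom; // TLI.isOperationLegalOrCustom(ISD::FMA, VT)
  bool FMAFaster;        // TLI.isFMAFasterThanFMulAndFAdd(VT)
};

enum class FusedOp { None, FMAD, FMA };

FusedOp pickFusedOpcode(const FusionQuery &Q) {
  bool AllowFusion = Q.FPOpFusionFast || Q.UnsafeFPMath;
  // Multiply-add with intermediate rounding: only needs to be legal.
  bool HasFMAD = Q.LegalOperations && Q.FMADLegal;
  // Multiply-add without intermediate rounding: needs fusion permission,
  // a profitable FMA, and (after legalization) a legal or custom FMA.
  bool HasFMA = AllowFusion && Q.FMAFaster &&
                (!Q.LegalOperations || Q.FMALegalOrCustom);
  if (!HasFMAD && !HasFMA)
    return FusedOp::None; // No valid opcode, do not combine.
  // The diff always prefers FMAD when both are available.
  return HasFMAD ? FusedOp::FMAD : FusedOp::FMA;
}
```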

> There aren't any tests stressing the FMAD path. AMDGPU seems to be only target using it still, and the one test change is in the expansion of an intrinsic which should be removed. If you can add some of those that would be good, otherwise I can try to do it after you commit

Thanks Matt, I can add some FMAD tests for v_mad_f32 - is that the only instruction I should be testing for?

Most of your comments about the preamble are just as relevant for the other FMA pattern combines (visitFADDForFMACombine, visitFSUBForFMACombine); given that I copied+pasted most of it from them should they be updated as well?

In D13003#249540, @RKSimon wrote:

> > There aren't any tests stressing the FMAD path. AMDGPU seems to be only target using it still, and the one test change is in the expansion of an intrinsic which should be removed. If you can add some of those that would be good, otherwise I can try to do it after you commit

> Thanks Matt, I can add some FMAD tests for v_mad_f32 - is that the only instruction I should be testing for?

Yes. The fneg should be folded in as a source modifier that looks something like v_mad_f32 v0, v1, v2, -v3. Sometimes v_mac_f32 is used, although in these cases that shouldn't happen

> Most of your comments about the preamble are just as relevant for the other FMA pattern combines (visitFADDForFMACombine, visitFSUBForFMACombine); given that I copied+pasted most of it from them should they be updated as well?

Yes, probably
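The FMAD tests discussed here look for v_mad_f32, and the diff's own comments describe ISD::FMAD as a multiply-add with intermediate rounding and ISD::FMA as one without. The host-side C++ sketch below shows that the two can give different float results for the same inputs. madRounded and madFused are illustrative names; the volatile temporary only stops the host compiler from contracting the unfused version into an FMA, and nothing else about v_mad_f32's hardware behaviour is modelled.

```cpp
#include <cassert>
#include <cmath>

// Multiply-add with intermediate rounding: the product is rounded to float
// before the add. volatile prevents the compiler from fusing a*b + c.
float madRounded(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused multiply-add: the exact a*b + c is rounded once at the end.
float madFused(float a, float b, float c) { return std::fma(a, b, c); }
```

With a = 1 + 2^-12, the exact product a*a is 1 + 2^-11 + 2^-24. Rounding it to float first drops the 2^-24 term (a tie that rounds to even), so the unfused result is 2^-11, while the fused result keeps 2^-11 + 2^-24.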

Updated all FMA combine helpers based on Matt's feedback.

Added AMDGPU FMA/FMAD tests

LGTM, although I think you should split the renames in the other parts into a separate patch

This revision is now accepted and ready to land. Sep 21 2015, 9:45 AM

RKSimon mentioned this in rL248206: [DAGCombiner] Tidy up FMA combine helpers. NFCI. Sep 21 2015, 1:16 PM

Closed by commit rL248210: [DAGCombiner] Improve FMA support for interpolation patterns (authored by RKSimon). Sep 21 2015, 1:34 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                        Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 248171)  135 lines
test/CodeGen/AMDGPU/fma-combine.ll (revision 248171)        200 lines
test/CodeGen/AMDGPU/llvm.amdgpu.lrp.ll (revision 248171)    2 lines
test/CodeGen/X86/fma_patterns.ll (revision 248171)          305 lines

Diff 35267

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ 87 lines not shown @@ class DAGCombiner {
   const TargetLowering &TLI;
   CombineLevel Level;
   CodeGenOpt::Level OptLevel;
   bool LegalOperations;
   bool LegalTypes;
   bool ForCodeSize;

   /// \brief Worklist of all of the nodes that need to be simplified.
   ///
   /// This must behave as a stack -- new nodes to process are pushed onto the
   /// back and when processing we pop off of the back.
   ///
   /// The worklist will not contain duplicates but may contain null entries
   /// due to nodes being deleted from the underlying DAG.
   SmallVector<SDNode *, 64> Worklist;

   /// \brief Mapping from an SDNode to its position on the worklist.
   ///
   /// This is used to find and remove nodes from the worklist (by nulling
   /// them) when they are deleted from the underlying DAG. It relies on
   /// stable indices of nodes within the worklist.
   DenseMap<SDNode *, unsigned> WorklistMap;

   /// \brief Set of nodes which have been combined (at least once).
   ///
   /// This is used to allow us to reliably add any operands of a DAG node
   /// which have not yet been combined to the worklist.
   SmallPtrSet<SDNode *, 64> CombinedNodes;

   // AA - Used for DAG load/store alias analysis.
   AliasAnalysis &AA;

   /// When an instruction is simplified, add all users of the instruction to
   /// the work lists because they might get more simplified now.
@@ 194 lines not shown @@ private:
   SDValue visitMSTORE(SDNode *N);
   SDValue visitMGATHER(SDNode *N);
   SDValue visitMSCATTER(SDNode *N);
   SDValue visitFP_TO_FP16(SDNode *N);
   SDValue visitFP16_TO_FP(SDNode *N);

   SDValue visitFADDForFMACombine(SDNode *N);
   SDValue visitFSUBForFMACombine(SDNode *N);
+  SDValue visitFMULForFMACombine(SDNode *N);

   SDValue XformToShuffleWithZero(SDNode *N);
   SDValue ReassociateOps(unsigned Opc, SDLoc DL, SDValue LHS, SDValue RHS);

   SDValue visitShiftByConstant(SDNode *N, ConstantSDNode *Amt);

   bool SimplifySelectOps(SDNode *SELECT, SDValue LHS, SDValue RHS);
   SDValue SimplifyBinOpWithSameOpcodeHands(SDNode *N);
@@ 282 lines not shown @@ static SDValue GetNegatedExpression(SDValue Op, SelectionDAG &DAG,
   if (Op.getOpcode() == ISD::FNEG) return Op.getOperand(0);

   // Don't allow anything with multiple uses.
   assert(Op.hasOneUse() && "Unknown reuse!");

   assert(Depth <= 6 && "GetNegatedExpression doesn't match isNegatibleForFree");

   const SDNodeFlags *Flags = Op.getNode()->getFlags();

   switch (Op.getOpcode()) {
   default: llvm_unreachable("Unknown code");
   case ISD::ConstantFP: {
     APFloat V = cast<ConstantFPSDNode>(Op)->getValueAPF();
     V.changeSign();
     return DAG.getConstantFP(V, SDLoc(Op), Op.getValueType());
   }
   case ISD::FADD:
@@ 6,845 lines not shown @@
 /// Try to perform FMA combining on a given FADD node.
 SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N->getValueType(0);
   SDLoc SL(N);

   const TargetOptions &Options = DAG.getTarget().Options;
-  bool UnsafeFPMath = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
-                       Options.UnsafeFPMath);
+  bool AllowFusion =
+      (Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);

   // Floating-point multiply-add with intermediate rounding.
-  bool HasFMAD = (LegalOperations &&
-                  TLI.isOperationLegal(ISD::FMAD, VT));
+  bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));

   // Floating-point multiply-add without intermediate rounding.
-  bool HasFMA = ((!LegalOperations ||
-                  TLI.isOperationLegalOrCustom(ISD::FMA, VT)) &&
-                 TLI.isFMAFasterThanFMulAndFAdd(VT) &&
-                 UnsafeFPMath);
+  bool HasFMA =
+      AllowFusion && TLI.isFMAFasterThanFMulAndFAdd(VT) &&
+      (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));

   // No valid opcode, do not combine.
   if (!HasFMAD && !HasFMA)
     return SDValue();

   // Always prefer FMAD to FMA for precision.
-  unsigned int PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
+  unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
   bool Aggressive = TLI.enableAggressiveFMAFusion(VT);
   bool LookThroughFPExt = TLI.isFPExtFree(VT);

   // If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),
   // prefer to fold the multiply with fewer uses.
   if (Aggressive && N0.getOpcode() == ISD::FMUL &&
       N1.getOpcode() == ISD::FMUL) {
     if (N0.getNode()->use_size() > N1.getNode()->use_size())
@@ 11 lines not shown @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
   // Note: Commutes FADD operands.
   if (N1.getOpcode() == ISD::FMUL &&
       (Aggressive || N1->hasOneUse())) {
     return DAG.getNode(PreferredFusedOpcode, SL, VT,
                        N1.getOperand(0), N1.getOperand(1), N0);
   }

   // Look through FP_EXTEND nodes to do more combining.
-  if (UnsafeFPMath && LookThroughFPExt) {
+  if (AllowFusion && LookThroughFPExt) {
     // fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
     if (N0.getOpcode() == ISD::FP_EXTEND) {
       SDValue N00 = N0.getOperand(0);
       if (N00.getOpcode() == ISD::FMUL)
         return DAG.getNode(PreferredFusedOpcode, SL, VT,
                            DAG.getNode(ISD::FP_EXTEND, SL, VT,
                                        N00.getOperand(0)),
                            DAG.getNode(ISD::FP_EXTEND, SL, VT,
@@ 9 lines not shown @@ if (N1.getOpcode() == ISD::FP_EXTEND) {
                            DAG.getNode(ISD::FP_EXTEND, SL, VT,
                                        N10.getOperand(0)),
                            DAG.getNode(ISD::FP_EXTEND, SL, VT,
                                        N10.getOperand(1)), N0);
     }
   }

   // More folding opportunities when target permits.
-  if ((UnsafeFPMath || HasFMAD) && Aggressive) {
+  if ((AllowFusion || HasFMAD) && Aggressive) {
     // fold (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y (fma u, v, z))
     if (N0.getOpcode() == PreferredFusedOpcode &&
         N0.getOperand(2).getOpcode() == ISD::FMUL) {
       return DAG.getNode(PreferredFusedOpcode, SL, VT,
                          N0.getOperand(0), N0.getOperand(1),
                          DAG.getNode(PreferredFusedOpcode, SL, VT,
                                      N0.getOperand(2).getOperand(0),
                                      N0.getOperand(2).getOperand(1),
                                      N1));
     }

     // fold (fadd x, (fma y, z, (fmul u, v)) -> (fma y, z (fma u, v, x))
     if (N1->getOpcode() == PreferredFusedOpcode &&
         N1.getOperand(2).getOpcode() == ISD::FMUL) {
       return DAG.getNode(PreferredFusedOpcode, SL, VT,
                          N1.getOperand(0), N1.getOperand(1),
                          DAG.getNode(PreferredFusedOpcode, SL, VT,
                                      N1.getOperand(2).getOperand(0),
                                      N1.getOperand(2).getOperand(1),
                                      N0));
     }

-    if (UnsafeFPMath && LookThroughFPExt) {
+    if (AllowFusion && LookThroughFPExt) {
       // fold (fadd (fma x, y, (fpext (fmul u, v))), z)
       //   -> (fma x, y, (fma (fpext u), (fpext v), z))
       auto FoldFAddFMAFPExtFMul = [&] (
         SDValue X, SDValue Y, SDValue U, SDValue V, SDValue Z) {
         return DAG.getNode(PreferredFusedOpcode, SL, VT, X, Y,
                            DAG.getNode(PreferredFusedOpcode, SL, VT,
                                        DAG.getNode(ISD::FP_EXTEND, SL, VT, U),
                                        DAG.getNode(ISD::FP_EXTEND, SL, VT, V),
@@ 73 lines not shown @@
 /// Try to perform FMA combining on a given FSUB node.
 SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N->getValueType(0);
   SDLoc SL(N);

   const TargetOptions &Options = DAG.getTarget().Options;
-  bool UnsafeFPMath = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
-                       Options.UnsafeFPMath);
+  bool AllowFusion =
+      (Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);

   // Floating-point multiply-add with intermediate rounding.
-  bool HasFMAD = (LegalOperations &&
-                  TLI.isOperationLegal(ISD::FMAD, VT));
+  bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));

   // Floating-point multiply-add without intermediate rounding.
-  bool HasFMA = ((!LegalOperations ||
-                  TLI.isOperationLegalOrCustom(ISD::FMA, VT)) &&
-                 TLI.isFMAFasterThanFMulAndFAdd(VT) &&
-                 UnsafeFPMath);
+  bool HasFMA =
+      AllowFusion && TLI.isFMAFasterThanFMulAndFAdd(VT) &&
+      (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));

   // No valid opcode, do not combine.
   if (!HasFMAD && !HasFMA)
     return SDValue();

   // Always prefer FMAD to FMA for precision.
-  unsigned int PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
+  unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
   bool Aggressive = TLI.enableAggressiveFMAFusion(VT);
   bool LookThroughFPExt = TLI.isFPExtFree(VT);

   // fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
   if (N0.getOpcode() == ISD::FMUL &&
       (Aggressive || N0->hasOneUse())) {
     return DAG.getNode(PreferredFusedOpcode, SL, VT,
                        N0.getOperand(0), N0.getOperand(1),
@@ 16 lines not shown @@ if (N0.getOpcode() == ISD::FNEG &&
     SDValue N00 = N0.getOperand(0).getOperand(0);
     SDValue N01 = N0.getOperand(0).getOperand(1);
     return DAG.getNode(PreferredFusedOpcode, SL, VT,
                        DAG.getNode(ISD::FNEG, SL, VT, N00), N01,
                        DAG.getNode(ISD::FNEG, SL, VT, N1));
   }

   // Look through FP_EXTEND nodes to do more combining.
-  if (UnsafeFPMath && LookThroughFPExt) {
+  if (AllowFusion && LookThroughFPExt) {
     // fold (fsub (fpext (fmul x, y)), z)
     //   -> (fma (fpext x), (fpext y), (fneg z))
     if (N0.getOpcode() == ISD::FP_EXTEND) {
       SDValue N00 = N0.getOperand(0);
       if (N00.getOpcode() == ISD::FMUL)
         return DAG.getNode(PreferredFusedOpcode, SL, VT,
                            DAG.getNode(ISD::FP_EXTEND, SL, VT,
                                        N00.getOperand(0)),
@@ 59 lines not shown @@ if (N0.getOpcode() == ISD::FNEG) {
                                N1));
         }
       }
     }

   }

   // More folding opportunities when target permits.
-  if ((UnsafeFPMath || HasFMAD) && Aggressive) {
+  if ((AllowFusion || HasFMAD) && Aggressive) {
     // fold (fsub (fma x, y, (fmul u, v)), z)
     //   -> (fma x, y (fma u, v, (fneg z)))
     if (N0.getOpcode() == PreferredFusedOpcode &&
         N0.getOperand(2).getOpcode() == ISD::FMUL) {
       return DAG.getNode(PreferredFusedOpcode, SL, VT,
                          N0.getOperand(0), N0.getOperand(1),
                          DAG.getNode(PreferredFusedOpcode, SL, VT,
                                      N0.getOperand(2).getOperand(0),
@@ 13 lines not shown @@ if (N1.getOpcode() == PreferredFusedOpcode &&
                                      N1.getOperand(0)),
                          N1.getOperand(1),
                          DAG.getNode(PreferredFusedOpcode, SL, VT,
                                      DAG.getNode(ISD::FNEG, SL, VT, N20),

                                      N21, N0));
     }

-    if (UnsafeFPMath && LookThroughFPExt) {
+    if (AllowFusion && LookThroughFPExt) {
       // fold (fsub (fma x, y, (fpext (fmul u, v))), z)
       //   -> (fma x, y (fma (fpext u), (fpext v), (fneg z)))
       if (N0.getOpcode() == PreferredFusedOpcode) {
         SDValue N02 = N0.getOperand(2);
         if (N02.getOpcode() == ISD::FP_EXTEND) {
           SDValue N020 = N02.getOperand(0);
           if (N020.getOpcode() == ISD::FMUL)
             return DAG.getNode(PreferredFusedOpcode, SL, VT,
@@ 84 lines not shown @@ if (AllowFusion && LookThroughFPExt) {
         }
       }
     }
   }

   return SDValue();
 }

+/// Try to perform FMA combining on a given FMUL node.
+SDValue DAGCombiner::visitFMULForFMACombine(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  EVT VT = N->getValueType(0);
+  SDLoc SL(N);
+
+  assert(N->getOpcode() == ISD::FMUL && "Expected FMUL Operation");
+
+  const TargetOptions &Options = DAG.getTarget().Options;
+  bool AllowFusion =
+      (Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);
+
+  // Floating-point multiply-add with intermediate rounding.
+  bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));
+
+  // Floating-point multiply-add without intermediate rounding.
+  bool HasFMA =
+      AllowFusion && TLI.isFMAFasterThanFMulAndFAdd(VT) &&
+      (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));
+
+  // No valid opcode, do not combine.
+  if (!HasFMAD && !HasFMA)
+    return SDValue();
+
+  // Always prefer FMAD to FMA for precision.
+  unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
+  bool Aggressive = TLI.enableAggressiveFMAFusion(VT);
+
+  // fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
+  // fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
+  auto FuseFADD = [&](SDValue X, SDValue Y) {
+    if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
+      auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
+      if (XC1 && XC1->isExactlyValue(+1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
+      if (XC1 && XC1->isExactlyValue(-1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
+                           DAG.getNode(ISD::FNEG, SL, VT, Y));
+    }
+    return SDValue();
+  };
+
+  if (SDValue FMA = FuseFADD(N0, N1))
+    return FMA;
+  if (SDValue FMA = FuseFADD(N1, N0))
+    return FMA;
+
+  // fold (fmul (fsub +1.0, x), y) -> (fma (fneg x), y, y)
+  // fold (fmul (fsub -1.0, x), y) -> (fma (fneg x), y, (fneg y))
+  // fold (fmul (fsub x, +1.0), y) -> (fma x, y, (fneg y))
+  // fold (fmul (fsub x, -1.0), y) -> (fma x, y, y)
+  auto FuseFSUB = [&](SDValue X, SDValue Y) {
+    if (X.getOpcode() == ISD::FSUB && (Aggressive || X->hasOneUse())) {
+      auto XC0 = isConstOrConstSplatFP(X.getOperand(0));
+      if (XC0 && XC0->isExactlyValue(+1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT,
+                           DAG.getNode(ISD::FNEG, SL, VT, X.getOperand(1)), Y,
+                           Y);
+      if (XC0 && XC0->isExactlyValue(-1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT,
+                           DAG.getNode(ISD::FNEG, SL, VT, X.getOperand(1)), Y,
+                           DAG.getNode(ISD::FNEG, SL, VT, Y));
+
+      auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
+      if (XC1 && XC1->isExactlyValue(+1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
+                           DAG.getNode(ISD::FNEG, SL, VT, Y));
+      if (XC1 && XC1->isExactlyValue(-1.0))
+        return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
+    }
+    return SDValue();
+  };
+
+  if (SDValue FMA = FuseFSUB(N0, N1))
+    return FMA;
+  if (SDValue FMA = FuseFSUB(N1, N0))
+    return FMA;
+
+  return SDValue();
+}
+
 SDValue DAGCombiner::visitFADD(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
   ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);
   EVT VT = N->getValueType(0);
   SDLoc DL(N);
   const TargetOptions &Options = DAG.getTarget().Options;
@@ 291 lines not shown @@ if (char RHSNeg = isNegatibleForFree(N1, LegalOperations, TLI, &Options)) {
       if (LHSNeg == 2 || RHSNeg == 2)
         return DAG.getNode(ISD::FMUL, DL, VT,
                            GetNegatedExpression(N0, DAG, LegalOperations),
                            GetNegatedExpression(N1, DAG, LegalOperations),
                            Flags);
     }
   }

+  // FMUL -> FMA combines:
+  if (SDValue Fused = visitFMULForFMACombine(N)) {
+    AddToWorklist(Fused.getNode());
+    return Fused;
+  }
+
   return SDValue();
 }

 SDValue DAGCombiner::visitFMA(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   SDValue N2 = N->getOperand(2);
   ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
@@ 6,229 lines not shown @@
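To make the matching order in visitFMULForFMACombine easier to follow, here is a toy C++ model of it. The Op/Expr types and the mk, var, cst, fuse, combineFMul and eval helpers are hypothetical and are not LLVM's SDNode API. The model handles scalar constants only (no isConstOrConstSplatFP splat handling) and omits the Aggressive || hasOneUse() use-count check; like the diff, it tries the FADD forms before the FSUB forms and tries both operand orders of the FMUL.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <string>

// Hypothetical toy IR standing in for SelectionDAG nodes.
enum class Op { Var, Const, FAdd, FSub, FMul, FNeg, FMA };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Op op;
  double value;     // only meaningful for Op::Const
  std::string name; // only meaningful for Op::Var
  ExprPtr a, b, c;  // operands in order; unused ones stay null
};

ExprPtr mk(Op op, ExprPtr a = nullptr, ExprPtr b = nullptr,
           ExprPtr c = nullptr) {
  return std::make_shared<Expr>(Expr{op, 0.0, "", a, b, c});
}
ExprPtr var(const std::string &n) {
  return std::make_shared<Expr>(Expr{Op::Var, 0.0, n, nullptr, nullptr, nullptr});
}
ExprPtr cst(double v) {
  return std::make_shared<Expr>(Expr{Op::Const, v, "", nullptr, nullptr, nullptr});
}

// Scalar stand-in for isConstOrConstSplatFP(E)->isExactlyValue(V).
bool isConstValue(const ExprPtr &e, double v) {
  return e->op == Op::Const && e->value == v;
}

// Models FuseFADD followed by FuseFSUB for one operand order:
// X is the candidate add/sub operand of the FMUL, Y is the other operand.
ExprPtr fuse(const ExprPtr &X, const ExprPtr &Y) {
  if (X->op == Op::FAdd) {
    if (isConstValue(X->b, +1.0)) return mk(Op::FMA, X->a, Y, Y);
    if (isConstValue(X->b, -1.0)) return mk(Op::FMA, X->a, Y, mk(Op::FNeg, Y));
  }
  if (X->op == Op::FSub) {
    if (isConstValue(X->a, +1.0)) return mk(Op::FMA, mk(Op::FNeg, X->b), Y, Y);
    if (isConstValue(X->a, -1.0))
      return mk(Op::FMA, mk(Op::FNeg, X->b), Y, mk(Op::FNeg, Y));
    if (isConstValue(X->b, +1.0)) return mk(Op::FMA, X->a, Y, mk(Op::FNeg, Y));
    if (isConstValue(X->b, -1.0)) return mk(Op::FMA, X->a, Y, Y);
  }
  return nullptr;
}

// Try (N0, N1) and then (N1, N0); null means "no combine".
ExprPtr combineFMul(const ExprPtr &N) {
  if (N->op != Op::FMul) return nullptr;
  if (ExprPtr R = fuse(N->a, N->b)) return R;
  return fuse(N->b, N->a);
}

// Evaluate a tree, binding variable "x" to x and any other variable to y.
double eval(const ExprPtr &e, double x, double y) {
  switch (e->op) {
  case Op::Var: return e->name == "x" ? x : y;
  case Op::Const: return e->value;
  case Op::FAdd: return eval(e->a, x, y) + eval(e->b, x, y);
  case Op::FSub: return eval(e->a, x, y) - eval(e->b, x, y);
  case Op::FMul: return eval(e->a, x, y) * eval(e->b, x, y);
  case Op::FNeg: return -eval(e->a, x, y);
  case Op::FMA:
    return std::fma(eval(e->a, x, y), eval(e->b, x, y), eval(e->c, x, y));
  }
  return 0.0;
}
```

For example, y * (1.0 - x) is only matched on the second, swapped attempt and becomes fma(fneg x, y, y), the same shape the test_f32_mul_y_sub_one_x test expects.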

test/CodeGen/AMDGPU/fma-combine.ll

@@ 358 lines not shown @@ define void @aggressive_combine_to_fma_fsub_1_f64(double addrspace(1)* noalias %out, double addrspace(1)* noalias %in) #1 {
   %tmp0 = fmul double %u, %v
   %tmp1 = call double @llvm.fma.f64(double %y, double %z, double %tmp0) #0
   %tmp2 = fsub double %x, %tmp1

   store double %tmp2, double addrspace(1)* %gep.out
   ret void
 }
+
+;
+; Patterns (+ fneg variants): mul(add(1.0,x),y), mul(sub(1.0,x),y), mul(sub(x,1.0),y)
+;
+
+; FUNC-LABEL: {{^}}test_f32_mul_add_x_one_y:
+; SI: v_mac_f32_e32 [[VY:v[0-9]]], [[VY:v[0-9]]], [[VX:v[0-9]]]
+define void @test_f32_mul_add_x_one_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %a = fadd float %x, 1.0
+  %m = fmul float %a, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_add_x_one:
+; SI: v_mac_f32_e32 [[VY:v[0-9]]], [[VY:v[0-9]]], [[VX:v[0-9]]]
+define void @test_f32_mul_y_add_x_one(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %a = fadd float %x, 1.0
+  %m = fmul float %y, %a
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_add_x_negone_y:
+; SI: v_mad_f32 [[VX:v[0-9]]], [[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_add_x_negone_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %a = fadd float %x, -1.0
+  %m = fmul float %a, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_add_x_negone:
+; SI: v_mad_f32 [[VX:v[0-9]]], [[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_y_add_x_negone(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %a = fadd float %x, -1.0
+  %m = fmul float %y, %a
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_sub_one_x_y:
+; SI: v_mad_f32 [[VX:v[0-9]]], -[[VX]], [[VY:v[0-9]]], [[VY]]
+define void @test_f32_mul_sub_one_x_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float 1.0, %x
+  %m = fmul float %s, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_sub_one_x:
+; SI: v_mad_f32 [[VX:v[0-9]]], -[[VX]], [[VY:v[0-9]]], [[VY]]
+define void @test_f32_mul_y_sub_one_x(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float 1.0, %x
+  %m = fmul float %y, %s
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_sub_negone_x_y:
+; SI: v_mad_f32 [[VX:v[0-9]]], -[[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_sub_negone_x_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float -1.0, %x
+  %m = fmul float %s, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_sub_negone_x:
+; SI: v_mad_f32 [[VX:v[0-9]]], -[[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_y_sub_negone_x(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float -1.0, %x
+  %m = fmul float %y, %s
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_sub_x_one_y:
+; SI: v_mad_f32 [[VX:v[0-9]]], [[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_sub_x_one_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float %x, 1.0
+  %m = fmul float %s, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_sub_x_one:
+; SI: v_mad_f32 [[VX:v[0-9]]], [[VX]], [[VY:v[0-9]]], -[[VY]]
+define void @test_f32_mul_y_sub_x_one(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float %x, 1.0
+  %m = fmul float %y, %s
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_sub_x_negone_y:
+; SI: v_mac_f32_e32 [[VY:v[0-9]]], [[VY]], [[VX:v[0-9]]]
+define void @test_f32_mul_sub_x_negone_y(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float %x, -1.0
+  %m = fmul float %s, %y
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
+; FUNC-LABEL: {{^}}test_f32_mul_y_sub_x_negone:
+; SI: v_mac_f32_e32 [[VY:v[0-9]]], [[VY]], [[VX:v[0-9]]]
+define void @test_f32_mul_y_sub_x_negone(float addrspace(1)* %out,
+    float addrspace(1)* %in1,
+    float addrspace(1)* %in2) {
+  %x = load float, float addrspace(1)* %in1
+  %y = load float, float addrspace(1)* %in2
+  %s = fsub float %x, -1.0
+  %m = fmul float %y, %s
+  store float %m, float addrspace(1)* %out
+  ret void
+}
+
		;
		; Interpolation Patterns: add(mul(x,t),mul(sub(1.0,t),y))
		;

		; FUNC-LABEL: {{^}}test_f32_interp:
		; SI: v_mad_f32 [[VR:v[0-9]]], -[[VT:v[0-9]]], [[VY:v[0-9]]], [[VY]]
		; SI: v_mac_f32_e32 [[VR]], [[VT]], [[VX:v[0-9]]]
		define void @test_f32_interp(float addrspace(1)* %out,
		float addrspace(1)* %in1,
		float addrspace(1)* %in2,
		float addrspace(1)* %in3) {
		%x = load float, float addrspace(1)* %in1
		%y = load float, float addrspace(1)* %in2
		%t = load float, float addrspace(1)* %in3
		%t1 = fsub float 1.0, %t
		%tx = fmul float %x, %t
		%ty = fmul float %y, %t1
		%r = fadd float %tx, %ty
		store float %r, float addrspace(1)* %out
		ret void
		}

		; FUNC-LABEL: {{^}}test_f64_interp:
		; SI: v_fma_f64 [[VR:v\[[0-9]+:[0-9]+\]]], -[[VT:v\[[0-9]+:[0-9]+\]]], [[VY:v\[[0-9]+:[0-9]+\]]], [[VY]]
		; SI: v_fma_f64 [[VR:v\[[0-9]+:[0-9]+\]]], [[VX:v\[[0-9]+:[0-9]+\]]], [[VT]], [[VR]]
		define void @test_f64_interp(double addrspace(1)* %out,
		double addrspace(1)* %in1,
		double addrspace(1)* %in2,
		double addrspace(1)* %in3) {
		%x = load double, double addrspace(1)* %in1
		%y = load double, double addrspace(1)* %in2
		%t = load double, double addrspace(1)* %in3
		%t1 = fsub double 1.0, %t
		%tx = fmul double %x, %t
		%ty = fmul double %y, %t1
		%r = fadd double %tx, %ty
		store double %r, double addrspace(1)* %out
		ret void
		}

		attributes #0 = { nounwind readnone }
		attributes #1 = { nounwind }
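The SI checks above expect each fmul of a ±1.0 fsub to collapse into a single v_mad_f32 or v_mac_f32. The C++ below is a minimal sketch of that algebra. It assumes v_mad_f32/v_mac_f32 behave like ISD::FMAD (the product is rounded before the add) and that v_fma_f64 behaves like `std::fma`. The helper names are hypothetical. With inputs where every intermediate is exact, the original and combined forms agree. With general inputs they can differ in the last bit, which is why the combine is gated on the AllowFusion/UnsafeFPMath check discussed in the review.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of ISD::FMAD / v_mad_f32: the product is rounded to
// float before the add. The volatile store keeps the host compiler from
// contracting the multiply and add into a fused FMA.
float mad(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Source pattern (comment) and the single instruction the SI checks expect.
float mul_sub_one_x_y(float x, float y) {     // (1 - x) * y
  return mad(-x, y, y);                       // v_mad_f32 -x, y, y
}
float mul_sub_negone_x_y(float x, float y) {  // (-1 - x) * y
  return mad(-x, y, -y);                      // v_mad_f32 -x, y, -y
}
float mul_sub_x_one_y(float x, float y) {     // (x - 1) * y
  return mad(x, y, -y);                       // v_mad_f32 x, y, -y
}
float mul_sub_x_negone_y(float x, float y) {  // (x + 1) * y
  return mad(y, x, y);                        // v_mac_f32: y += y * x
}

// f32 interpolation x * t + y * (1 - t):
// v_mad_f32 r = -t * y + y, then v_mac_f32 r += t * x.
float interp_f32(float x, float y, float t) {
  return mad(t, x, mad(-t, y, y));
}

// f64 interpolation uses v_fma_f64, which is fused, so std::fma models it.
double interp_f64(double x, double y, double t) {
  return std::fma(x, t, std::fma(-t, y, y));
}
```

For example, `mul_sub_one_x_y(0.5f, 3.0f)` returns 1.5f, the same value as `(1.0f - 0.5f) * 3.0f`.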

test/CodeGen/AMDGPU/llvm.amdgpu.lrp.ll

	; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

	declare float @llvm.AMDGPU.lrp(float, float, float) nounwind readnone

	; FUNC-LABEL: {{^}}test_lrp:
	-; SI: v_sub_f32
	+; SI: v_mad_f32
	; SI: v_mac_f32_e32
	define void @test_lrp(float addrspace(1)* %out, float %src0, float %src1, float %src2) nounwind {
	%mad = call float @llvm.AMDGPU.lrp(float %src0, float %src1, float %src2) nounwind readnone
	store float %mad, float addrspace(1)* %out, align 4
	ret void
	}
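The only pre-existing AMDGPU check that changes is in the llvm.AMDGPU.lrp expansion. The test now expects v_mad_f32 where it previously expected v_sub_f32. The C++ below is a small sketch of why. It rests on two assumptions. First, that lrp(a, b, c) is defined as a * b + (1 - a) * c, which this review does not state. Second, that v_mad_f32/v_mac_f32 are a separately rounded multiply-add. Under those assumptions, the inner fmul(fsub(1.0, a), c) is exactly the new mul(sub(1.0,x),y) pattern. The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of a separately rounded multiply-add (v_mad_f32 /
// v_mac_f32); volatile stops the host compiler from fusing it into an FMA.
float mad(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Assumed definition of llvm.AMDGPU.lrp(a, b, c): a * b + (1 - a) * c.
float lrp_reference(float a, float b, float c) {
  return a * b + (1.0f - a) * c;
}

// What the updated checks expect: v_mad_f32 forms (1 - a) * c as -a * c + c,
// then v_mac_f32 accumulates a * b into that result.
float lrp_mad_mac(float a, float b, float c) {
  return mad(a, b, mad(-a, c, c));
}
```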

test/CodeGen/X86/fma_patterns.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma -fp-contract=fast | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma4,+fma -fp-contract=fast | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma4 -fp-contract=fast | FileCheck %s --check-prefix=CHECK_FMA4

				;
				; Patterns (+ fneg variants): add(mul(x,y),z), sub(mul(x,y),z)
				;

	define <4 x float> @test_x86_fmadd_ps(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) {
	; CHECK-LABEL: test_x86_fmadd_ps:
	; CHECK: # BB#0:
	; CHECK-NEXT: vfmadd213ps %xmm2, %xmm1, %xmm0
	; CHECK-NEXT: retq
	;
	; CHECK_FMA4-LABEL: test_x86_fmadd_ps:
	; CHECK_FMA4: # BB#0:
	(246 unchanged lines not shown)
	; CHECK_FMA4-NEXT: vfmsubps %xmm1, (%rdi), %xmm0, %xmm0
	; CHECK_FMA4-NEXT: retq
	%x = load <4 x float>, <4 x float>* %a0
	%y = fmul <4 x float> %x, %a1
	%res = fsub <4 x float> %y, %a2
	ret <4 x float> %res
	}

				;
				; Patterns (+ fneg variants): mul(add(1.0,x),y), mul(sub(1.0,x),y), mul(sub(x,1.0),y)
				;

				define <4 x float> @test_v4f32_mul_add_x_one_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_add_x_one_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_add_x_one_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%a = fadd <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
				%m = fmul <4 x float> %a, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_add_x_one(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_add_x_one:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_add_x_one:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%a = fadd <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
				%m = fmul <4 x float> %y, %a
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_add_x_negone_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_add_x_negone_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_add_x_negone_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%a = fadd <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
				%m = fmul <4 x float> %a, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_add_x_negone(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_add_x_negone:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_add_x_negone:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%a = fadd <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
				%m = fmul <4 x float> %y, %a
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_sub_one_x_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_sub_one_x_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_sub_one_x_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
				%m = fmul <4 x float> %s, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_sub_one_x(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_sub_one_x:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_sub_one_x:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
				%m = fmul <4 x float> %y, %s
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_sub_negone_x_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_sub_negone_x_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_sub_negone_x_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> <float -1.0, float -1.0, float -1.0, float -1.0>, %x
				%m = fmul <4 x float> %s, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_sub_negone_x(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_sub_negone_x:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_sub_negone_x:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> <float -1.0, float -1.0, float -1.0, float -1.0>, %x
				%m = fmul <4 x float> %y, %s
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_sub_x_one_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_sub_x_one_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_sub_x_one_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
				%m = fmul <4 x float> %s, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_sub_x_one(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_sub_x_one:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmsub213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_sub_x_one:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmsubps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
				%m = fmul <4 x float> %y, %s
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_sub_x_negone_y(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_sub_x_negone_y:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_sub_x_negone_y:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
				%m = fmul <4 x float> %s, %y
				ret <4 x float> %m
				}

				define <4 x float> @test_v4f32_mul_y_sub_x_negone(<4 x float> %x, <4 x float> %y) {
				; CHECK-LABEL: test_v4f32_mul_y_sub_x_negone:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfmadd213ps %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_mul_y_sub_x_negone:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfmaddps %xmm1, %xmm1, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
				%m = fmul <4 x float> %y, %s
				ret <4 x float> %m
				}

				;
				; Interpolation Patterns: add(mul(x,t),mul(sub(1.0,t),y))
				;

				define float @test_f32_interp(float %x, float %y, float %t) {
				; CHECK-LABEL: test_f32_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213ss %xmm1, %xmm2, %xmm1
				; CHECK-NEXT: vfmadd213ss %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_f32_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddss %xmm1, %xmm1, %xmm2, %xmm1
				; CHECK_FMA4-NEXT: vfmaddss %xmm1, %xmm2, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub float 1.0, %t
				%tx = fmul float %x, %t
				%ty = fmul float %y, %t1
				%r = fadd float %tx, %ty
				ret float %r
				}

				define <4 x float> @test_v4f32_interp(<4 x float> %x, <4 x float> %y, <4 x float> %t) {
				; CHECK-LABEL: test_v4f32_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213ps %xmm1, %xmm2, %xmm1
				; CHECK-NEXT: vfmadd213ps %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f32_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddps %xmm1, %xmm1, %xmm2, %xmm1
				; CHECK_FMA4-NEXT: vfmaddps %xmm1, %xmm2, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %t
				%tx = fmul <4 x float> %x, %t
				%ty = fmul <4 x float> %y, %t1
				%r = fadd <4 x float> %tx, %ty
				ret <4 x float> %r
				}

				define <8 x float> @test_v8f32_interp(<8 x float> %x, <8 x float> %y, <8 x float> %t) {
				; CHECK-LABEL: test_v8f32_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213ps %ymm1, %ymm2, %ymm1
				; CHECK-NEXT: vfmadd213ps %ymm1, %ymm2, %ymm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v8f32_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddps %ymm1, %ymm1, %ymm2, %ymm1
				; CHECK_FMA4-NEXT: vfmaddps %ymm1, %ymm2, %ymm0, %ymm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %t
				%tx = fmul <8 x float> %x, %t
				%ty = fmul <8 x float> %y, %t1
				%r = fadd <8 x float> %tx, %ty
				ret <8 x float> %r
				}

				define double @test_f64_interp(double %x, double %y, double %t) {
				; CHECK-LABEL: test_f64_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213sd %xmm1, %xmm2, %xmm1
				; CHECK-NEXT: vfmadd213sd %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_f64_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddsd %xmm1, %xmm1, %xmm2, %xmm1
				; CHECK_FMA4-NEXT: vfmaddsd %xmm1, %xmm2, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub double 1.0, %t
				%tx = fmul double %x, %t
				%ty = fmul double %y, %t1
				%r = fadd double %tx, %ty
				ret double %r
				}

				define <2 x double> @test_v2f64_interp(<2 x double> %x, <2 x double> %y, <2 x double> %t) {
				; CHECK-LABEL: test_v2f64_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213pd %xmm1, %xmm2, %xmm1
				; CHECK-NEXT: vfmadd213pd %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v2f64_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddpd %xmm1, %xmm1, %xmm2, %xmm1
				; CHECK_FMA4-NEXT: vfmaddpd %xmm1, %xmm2, %xmm0, %xmm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub <2 x double> <double 1.0, double 1.0>, %t
				%tx = fmul <2 x double> %x, %t
				%ty = fmul <2 x double> %y, %t1
				%r = fadd <2 x double> %tx, %ty
				ret <2 x double> %r
				}

				define <4 x double> @test_v4f64_interp(<4 x double> %x, <4 x double> %y, <4 x double> %t) {
				; CHECK-LABEL: test_v4f64_interp:
				; CHECK: # BB#0:
				; CHECK-NEXT: vfnmadd213pd %ymm1, %ymm2, %ymm1
				; CHECK-NEXT: vfmadd213pd %ymm1, %ymm2, %ymm0
				; CHECK-NEXT: retq
				;
				; CHECK_FMA4-LABEL: test_v4f64_interp:
				; CHECK_FMA4: # BB#0:
				; CHECK_FMA4-NEXT: vfnmaddpd %ymm1, %ymm1, %ymm2, %ymm1
				; CHECK_FMA4-NEXT: vfmaddpd %ymm1, %ymm2, %ymm0, %ymm0
				; CHECK_FMA4-NEXT: retq
				%t1 = fsub <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>, %t
				%tx = fmul <4 x double> %x, %t
				%ty = fmul <4 x double> %y, %t1
				%r = fadd <4 x double> %tx, %ty
				ret <4 x double> %r
				}
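The X86 checks map each pattern onto one of the four FMA3 forms. The C++ below sketches those forms with `std::fma` so that the operand roles in the CHECK lines can be read off directly. The helper names are hypothetical, and the FMA4 variants (vfmaddps etc.) are assumed to compute the same values with a four-operand encoding. The last helper and the final assertions show why these folds need fusion to be allowed: the original pattern rounds x + 1.0 before the multiply, while the fused form rounds only once.

```cpp
#include <cassert>
#include <cmath>

// Models of the FMA3 213 forms in the CHECK lines. In AT&T syntax,
// "vfmadd213ps %src3, %src2, %dst" computes dst = src2 * dst + src3.
double vfmadd(double a, double b, double c)  { return std::fma(a, b, c); }    //  a*b + c
double vfmsub(double a, double b, double c)  { return std::fma(a, b, -c); }   //  a*b - c
double vfnmadd(double a, double b, double c) { return std::fma(-a, b, c); }   // -(a*b) + c
double vfnmsub(double a, double b, double c) { return std::fma(-a, b, -c); }  // -(a*b) - c

// Each mul(add/sub(x, +-1.0), y) pattern becomes one fused op with c = y.
double mul_add_x_one(double x, double y)    { return vfmadd(x, y, y); }   // (x + 1) * y
double mul_add_x_negone(double x, double y) { return vfmsub(x, y, y); }   // (x - 1) * y
double mul_sub_one_x(double x, double y)    { return vfnmadd(x, y, y); }  // (1 - x) * y
double mul_sub_negone_x(double x, double y) { return vfnmsub(x, y, y); }  // (-1 - x) * y

// Interpolation x * t + y * (1 - t): vfnmadd forms y * (1 - t), vfmadd adds x * t.
double interp(double x, double y, double t) { return vfmadd(x, t, vfnmadd(t, y, y)); }

// Unfused reference: x + 1.0 is rounded before the multiply.
double mul_add_x_one_unfused(double x, double y) { return (x + 1.0) * y; }
```

With x = 2^-53 and y = 1 + 2^-52, the unfused form returns 1 + 2^-52, because x + 1.0 is a tie that rounds to even (1.0). The fused form returns 1 + 2^-51. Differences like this are presumably why the folds only fire when fusion is permitted, which the `-fp-contract=fast` RUN lines request. The sketch assumes IEEE-754 doubles evaluated without excess precision (FLT_EVAL_METHOD == 0).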