This is an archive of the discontinued LLVM Phabricator instance.

Differential D80175

[PowerPC][MachineCombiner] reassociate fma to expose more ILP
Closed · Public

Authored by shchenz on May 18 2020, 8:42 PM.

Details

Reviewers

hfinkel
jsji

Group Reviewers

Restricted Project

Commits

rGbd7096b977e1: [PowerPC] fma chain break to expose more ILP

Summary

This patch tries to reassociate two patterns related to FMA to expose more ILP on PowerPC.

// Pattern 1:
//   A =  FADD X,  Y          (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMA  X,  M21,  M22
//   B =  FMA  Y,  M31,  M32
//   C =  FADD A,  B

// Pattern 2:
//   A =  FMA  X,  M11,  M12  (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMUL M11,  M12
//   B =  FMA  X,  M21,  M22
//   D =  FMA  A,  M31,  M32
//   C =  FADD B,  D

Both rewrites break the dependency between A and B, allowing the FMAs to execute in parallel (or back-to-back in a pipeline) instead of waiting on each other. The operands follow the PowerPC ISA order, so `FMA A, M1, M2` computes A + M1 * M2.
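As a quick sanity check of the two rewrites, the sketch below evaluates each pattern before and after reassociation in plain C++. It is illustrative only: `fmaPPC` and the `pattern*` functions are hypothetical helpers, and `fmaPPC` just maps the PowerPC operand order (`FMA A, M1, M2` = A + M1 * M2) onto `std::fma`. With small integer inputs every intermediate value is exact, so both forms agree; with general inputs the rounding can differ, which is why the patch only matches instructions carrying the `reassoc` and `nsz` fast-math flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper using the operand order from the summary:
// FMA Add, M1, M2 == Add + M1 * M2. std::fma(a, b, c) computes a * b + c
// with a single rounding.
static double fmaPPC(double add, double m1, double m2) {
  return std::fma(m1, m2, add);
}

// Pattern 1 before: X reaches the root through FADD -> FMA -> FMA.
double pattern1Before(double x, double y, double m21, double m22, double m31,
                      double m32) {
  double a = x + y;               // Leaf
  double b = fmaPPC(a, m21, m22); // Prev
  return fmaPPC(b, m31, m32);     // Root
}

// Pattern 1 after: the two FMAs are independent; X goes through FMA -> FADD.
double pattern1After(double x, double y, double m21, double m22, double m31,
                     double m32) {
  double a = fmaPPC(x, m21, m22);
  double b = fmaPPC(y, m31, m32);
  return a + b;
}

// Pattern 2 before: X reaches the root through FMA -> FMA -> FMA.
double pattern2Before(double x, double m11, double m12, double m21, double m22,
                      double m31, double m32) {
  double a = fmaPPC(x, m11, m12); // Leaf
  double b = fmaPPC(a, m21, m22); // Prev
  return fmaPPC(b, m31, m32);     // Root
}

// Pattern 2 after: the FMUL -> FMA pair no longer depends on X;
// X goes through FMA -> FADD.
double pattern2After(double x, double m11, double m12, double m21, double m22,
                     double m31, double m32) {
  double a = m11 * m12;
  double b = fmaPPC(x, m21, m22);
  double d = fmaPPC(a, m31, m32);
  return b + d;
}
```

For example, `pattern1Before(1, 2, 3, 4, 5, 6)` and `pattern1After(1, 2, 3, 4, 5, 6)` both return 45, and both pattern-2 functions return 69 for `(1, 2, 3, 4, 5, 6, 7)`. In the rewritten forms X reaches the root through two operations instead of three, which is where the gain comes from when X is the late-arriving value.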

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shchenz created this revision. May 18 2020, 8:42 PM

Herald added a project: Restricted Project. May 18 2020, 8:42 PM

Herald added subscribers: llvm-commits, steven.zhang, wuzish and 3 others.

shchenz edited the summary of this revision. May 18 2020, 9:02 PM

Harbormaster failed remote builds in B57152: Diff 264778! May 18 2020, 9:40 PM

format fixing & rebase & some typo fixing

Harbormaster completed remote builds in B57787: Diff 266001. May 25 2020, 5:51 AM

shchenz edited the summary of this revision. May 26 2020, 1:41 AM

I did not look at the patch itself except to notice that it is a lot of code...so I have to ask - did you look at implementing at least the 1st pattern in DAGCombiner? That seems like a general improvement for any superscalar micro-arch with no register pressure disadvantage.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
313–319	I was confused here because I was expecting the C++ style notation for FMA (X*Y+Z): https://en.cppreference.com/w/cpp/numeric/math/fma

In D80175#2058335, @spatel wrote:

I did not look at the patch itself except to notice that it is a lot of code...so I have to ask - did you look at implementing at least the 1st pattern in DAGCombiner? That seems like a general improvement for any superscalar micro-arch with no register pressure disadvantage.

Thanks for looking into this @spatel

Yes, I tried to implement pattern 1 in DAGCombiner, but I got some LIT failures related to register allocation on AArch64 and Thumb2. This kind of optimization also increases register pressure, so I think it is better not to add it to DAGCombiner.
I added these two patterns to MachineCombiner because:
1: This pass targets ILP-related optimizations.
2: Adding a register pressure estimation model here should be easier than in DAGCombiner. If we want to model it in the future, we could do an estimation similar to the one in MachineLICM.
3: These two patterns have to be implemented together. After pattern 2 breaks (fma+fma+fma) into (fmul+fma+fma+fadd), the final fadd can be combined with the following two fmas as pattern 1, yielding more parallel fmas.

I agree that this could be applied to other platforms that support destructive hardware FMA instructions, but I am not familiar with other platforms' instruction sets, so for now I have only implemented it for PowerPC.
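To make the matching side concrete, here is a heavily simplified model of how the two patterns are recognized by walking from the root FMA up through the addend operands. It uses a made-up mini IR (`Op`, `Instr`, `Pattern` and `matchFMAChain` are hypothetical, not LLVM APIs) and mirrors only the main checks visible in `getFMAPatterns` in the diff: each instruction must carry reassoc and nsz, Root and Prev must be FMAs whose addend is defined in the same block, the intermediate results must each have a single use, and the Leaf's opcode selects the pattern.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mini IR, for illustration only (not LLVM's MachineInstr).
enum class Op { FAdd, FMA, Other };

struct Instr {
  Op Opcode;
  int AddDef;   // FMA only: index in Block of the addend's def, -1 if outside
  int NumUses;  // non-debug uses of this instruction's result
  bool Reassoc; // models the 'reassoc' fast-math flag
  bool Nsz;     // models the 'nsz' fast-math flag
};

enum class Pattern { None, XY_AMM_BMM, XMM_AMM_BMM };

// Walks Root -> Prev -> Leaf in the spirit of getFMAPatterns(); other real
// checks (virtual registers only, matching FADD opcode width) are omitted.
Pattern matchFMAChain(const std::vector<Instr> &Block, int RootIdx) {
  auto HasFlags = [](const Instr &I) { return I.Reassoc && I.Nsz; };
  // An FMA qualifies if it has the flags and its addend is defined in Block.
  auto IsChainFMA = [&](const Instr &I) {
    return I.Opcode == Op::FMA && HasFlags(I) && I.AddDef >= 0;
  };

  const Instr &Root = Block[RootIdx];
  if (!IsChainFMA(Root))
    return Pattern::None;
  const Instr &Prev = Block[Root.AddDef];
  // Prev's result (B) is rewritten away, so Root must be its only user.
  if (Prev.NumUses != 1 || !IsChainFMA(Prev))
    return Pattern::None;
  const Instr &Leaf = Block[Prev.AddDef];
  // Likewise, Prev must be the only user of the Leaf's result (A).
  if (Leaf.NumUses != 1)
    return Pattern::None;
  if (IsChainFMA(Leaf))
    return Pattern::XMM_AMM_BMM; // Pattern 2: FMA + FMA + FMA
  if (Leaf.Opcode == Op::FAdd && HasFlags(Leaf))
    return Pattern::XY_AMM_BMM; // Pattern 1: FADD + FMA + FMA
  return Pattern::None;
}
```

Because the Leaf check runs for both shapes, an fadd produced by an earlier pattern-2 rewrite can itself become the Leaf of a later pattern-1 match, which is the interplay described in point 3.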

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
313–319	This comment uses the target-specific operand order: on PowerPC, most FMA-like instructions, such as xsmaddadp/xsmaddasp/xvmaddadp/xvmaddasp, are defined in the ISA in the form above.

In D80175#2058915, @shchenz wrote:

Thanks for looking into this @spatel

Yes, I tried to implement pattern 1 in DAGCombiner, but I got some LIT failures related to register allocation on AArch64 and Thumb2. This kind of optimization also increases register pressure, so I think it is better not to add it to DAGCombiner.
I added these two patterns to MachineCombiner because:
1: This pass targets ILP-related optimizations.
2: Adding a register pressure estimation model here should be easier than in DAGCombiner. If we want to model it in the future, we could do an estimation similar to the one in MachineLICM.
3: These two patterns have to be implemented together. After pattern 2 breaks (fma+fma+fma) into (fmul+fma+fma+fadd), the final fadd can be combined with the following two fmas as pattern 1, yielding more parallel fmas.

Thank you for the explanation. I agree that this is a better place to try the transform. DAGCombiner has no real register pressure analysis. One thing to be aware of: the compile-time cost of MachineCombiner was potentially high when analyzing large blocks. My experience with this pass is a few years old, so this might have changed, but there may be some bugzilla reports on that.

If you have a version of the transform as a DAGCombiner patch and can post it somewhere, I would still be interested in trying it out locally.

@spatel, thanks for the reminder about compile time. I will measure compile time later.

This is the prototype I implemented in DAGCombiner. It goes before the final return statement of the visitFMA() function in DAGCombiner.cpp.

Sorry for the bad formatting; it is not well tested.

+
+  // expose more ILP:
+  // (fma E, F, (fma C, D, (add A, B))) -> add ((fma C, D, A), (fma E, F, B))
+  TargetSchedModel SchedModel;
+  SchedModel.init(&DAG.getSubtarget());
+  if (UnsafeFPMath && SchedModel.getIssueWidth() > 1 && N2.getOpcode() == ISD::FMA && N2.hasOneUse() && (Options.UnsafeFPMath || isContractable(N2.getNode()))) {
+    SDValue OpADD = N2.getOperand(2);
+    if (OpADD.getOpcode() == ISD::FADD && OpADD.hasOneUse()) {
+      SDValue ADDLHS = DAG.getNode(ISD::FMA, SDLoc(N2), VT, N2.getOperand(0), N2.getOperand(1), OpADD.getOperand(0), Flags);
+      SDValue ADDRHS = DAG.getNode(ISD::FMA, SDLoc(N2), VT, N0, N1, OpADD.getOperand(1), Flags);
+      return DAG.getNode(ISD::FADD, DL, VT, ADDLHS, ADDRHS, OpADD.getNode()->getFlags());
+    }
+  }
+

set resource length extension to 1 on PowerPC
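The limit of 1 is about rounding. Per the comment this patch adds to PPCInstrInfo.h, the resource length is derived from scaled resource usage through a division in getCycles(), so the rounded cycle count after the rewrite can come out one higher than before even when the scaled usage barely changes. The sketch below assumes a round-up division by a latency factor and an acceptance check of the form `new <= old + limit`; `getCyclesSketch`, `preservesResourceLenSketch` and the factor value 4 are hypothetical, chosen only to show the off-by-one.

```cpp
#include <cassert>

// Assumption: scaled resource usage is converted to cycles by rounding up
// a division by a per-model factor.
unsigned getCyclesSketch(unsigned Scaled, unsigned Factor) {
  return (Scaled + Factor - 1) / Factor; // ceiling division
}

// Assumption: the rewrite is accepted if the new resource length exceeds the
// old one by at most ExtendLimit cycles.
bool preservesResourceLenSketch(unsigned ScaledBefore, unsigned ScaledAfter,
                                unsigned Factor, unsigned ExtendLimit) {
  return getCyclesSketch(ScaledAfter, Factor) <=
         getCyclesSketch(ScaledBefore, Factor) + ExtendLimit;
}
```

With a factor of 4, scaled usage 8 rounds to 2 cycles but 9 rounds to 3, so a limit of 0 rejects that rewrite while a limit of 1 accepts it; a genuinely larger increase (8 to 13, i.e. 2 to 4 cycles) is still rejected.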

Harbormaster failed remote builds in B58593: Diff 267556! Jun 1 2020, 2:37 AM

No compile-time degradation was found in the benchmarks I ran.

LGTM with some nits.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
354	Can we define or use an enum for the indices instead of hard-coding 1, 2, 3, 4, so that it is easier to read and maintain? e.g.: #define InfoArrayIdxFMAInst 0 #define InfoArrayIdxFAddInst 1 #define InfoArrayIdxFMULInst 2 #define InfoArrayIdxAddOpIdx 3 #define InfoArrayIdxMULOpIdx 4
395	Do we need to reset `AddOpIdx` before calling `IsReassociable` again? Otherwise the value is no longer `-1`, and we won't be able to catch issues in the following `assert`.
llvm/lib/Target/PowerPC/PPCInstrInfo.h
326	`Pattern`? -> `P`? Or replace `P` with `Pattern` in the following line.
340	Can we make this comment clearer? e.g. why do we need to set it to 1, and for what edge case?
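One way to follow the suggestion about named indices is an unscoped enum: its enumerators convert implicitly to array indices, so the table and the lookup read the same as with the #define macros while staying scoped to C++ rather than the preprocessor. The opcode values below are placeholders (in LLVM they come from the generated PPC:: opcode enum), and only two rows of the table are reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder opcode values; in LLVM these come from the generated PPC:: enum.
enum : uint16_t { XSMADDADP = 100, XSADDDP, XSMULDP, FMADD = 200, FADD, FMUL };

// Named column indices instead of hard-coded 0..4 or #define macros.
enum FMAInfoCol : unsigned {
  InfoArrayIdxFMAInst,  // the FMA opcode itself
  InfoArrayIdxFAddInst, // matching FADD opcode
  InfoArrayIdxFMULInst, // matching FMUL opcode
  InfoArrayIdxAddOpIdx, // operand index of the addend
  InfoArrayIdxMULOpIdx, // operand index of the first multiplicand
  InfoArrayNumCols
};

static const uint16_t FMAOpIdxInfo[][InfoArrayNumCols] = {
    {XSMADDADP, XSADDDP, XSMULDP, 1, 2},
    {FMADD, FADD, FMUL, 3, 1},
};

// Same contract as getFMAOpIdxInfo() in the diff: row index, or -1.
int16_t getFMAOpIdxInfo(unsigned Opcode) {
  const std::size_t NumRows = sizeof(FMAOpIdxInfo) / sizeof(FMAOpIdxInfo[0]);
  for (std::size_t I = 0; I < NumRows; ++I)
    if (FMAOpIdxInfo[I][InfoArrayIdxFMAInst] == Opcode)
      return static_cast<int16_t>(I);
  return -1;
}
```

A struct with named fields would remove the column indices altogether; the enum has the advantage of leaving the existing table layout unchanged.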

This revision is now accepted and ready to land. Jun 6 2020, 1:37 PM

address review comments:
1: use the macro to represent array index instead of hard code
2: comments update

Harbormaster failed remote builds in B59414: Diff 269089! Jun 7 2020, 8:48 PM

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

In D80175#2081483, @AaronLiu wrote:

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

Have you run into any issues with this turned on? This is specific to PowerPC, and I have verified that on PowerPC the patch causes no degradation in the benchmarks I ran. If you just want to see the impact of this patch, I suggest commenting out the two newly added lines in PPCInstrInfo::getMachineCombinerPatterns, which turns off this optimization.

In D80175#2081521, @shchenz wrote:

Have you run into any issues with this turned on? This is specific to PowerPC, and I have verified that on PowerPC the patch causes no degradation in the benchmarks I ran. If you just want to see the impact of this patch, I suggest commenting out the two newly added lines in PPCInstrInfo::getMachineCombinerPatterns, which turns off this optimization.

From a user's perspective, anyone investigating a benchmark or debugging a problem frequently needs to disable or enable many optimizations. It is impractical to find and comment out lines in some function to disable an optimization, rebuild the compiler, then uncomment them and rebuild again, over and over.

I think this is a simple and reasonable suggestion. That said, I do not have a very strong opinion on this. :-) What do you think? @jsji @nemanjai

In D80175#2081483, @AaronLiu wrote:

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

This is not adding a new optimization but is increasing the scope of an existing optimization. So I don't think it is appropriate to add an option to control only this aspect of this. If there is a use case for allowing the user to have fine grained control over which combiner patterns to use, then we can add an enum option. But we need to have justification for why that is needed.

OTOH, I am not opposed to adding an option to turn off the machine combiner (i.e. change the value returned by PPCInstrInfo::useMachineCombiner()) - but that is orthogonal to this patch and can be done separately.

In D80175#2085326, @nemanjai wrote:

This is not adding a new optimization but is increasing the scope of an existing optimization. So I don't think it is appropriate to add an option to control only this aspect of this. If there is a use case for allowing the user to have fine grained control over which combiner patterns to use, then we can add an enum option. But we need to have justification for why that is needed.

OTOH, I am not opposed to adding an option to turn off the machine combiner (i.e. change the value returned by PPCInstrInfo::useMachineCombiner()) - but that is orthogonal to this patch and can be done separately.

Thanks for the explanation!

@AaronLiu Thanks for posting your concerns.
@nemanjai Thanks for your explanation and suggestion. I am happy to add an option to control the MachineCombiner pass on PowerPC later.

Closed by commit rGbd7096b977e1: [PowerPC] fma chain break to expose more ILP (authored by shchenz). Jun 14 2020, 9:19 PM

This revision was automatically updated to reflect the committed changes.

In D80175#2092047, @shchenz wrote:

@AaronLiu Thanks for posting your concerns.
@nemanjai Thanks for your explanation and suggestion. I am happy to add an option to control the MachineCombiner pass on PowerPC later.

There is already a hidden option to turn the machine combiner pass on or off on PowerPC:

static cl::opt<bool>
EnableMachineCombinerPass("ppc-machine-combiner",
                          cl::desc("Enable the machine combiner pass"),
                          cl::init(true), cl::Hidden);

Revision Contents

llvm/include/llvm/CodeGen/MachineCombinerPattern.h: 4 lines
llvm/lib/CodeGen/MachineCombiner.cpp: 2 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.h: 29 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp: 327 lines
llvm/test/CodeGen/PowerPC/machine-combiner.ll: 67 lines

Diff 270657

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

@@ 19 lines not shown @@
 enum class MachineCombinerPattern {
   // These are commutative variants for reassociating a computation chain. See
   // the comments before getMachineCombinerPatterns() in TargetInstrInfo.cpp.
   REASSOC_AX_BY,
   REASSOC_AX_YB,
   REASSOC_XA_BY,
   REASSOC_XA_YB,

+  // These are patterns matched by the PowerPC to reassociate FMA chains.
+  REASSOC_XY_AMM_BMM,
+  REASSOC_XMM_AMM_BMM,
+
   // These are multiply-add patterns matched by the AArch64 machine combiner.
   MULADDW_OP1,
   MULADDW_OP2,
   MULSUBW_OP1,
   MULSUBW_OP2,
   MULADDWI_OP1,
   MULSUBWI_OP1,
   MULADDX_OP1,
@@ 117 lines not shown @@

llvm/lib/CodeGen/MachineCombiner.cpp

@@ 263 lines not shown @@
 static CombinerObjective getCombinerObjective(MachineCombinerPattern P) {
   // TODO: If C++ ever gets a real enum class, make this part of the
   // MachineCombinerPattern class.
   switch (P) {
   case MachineCombinerPattern::REASSOC_AX_BY:
   case MachineCombinerPattern::REASSOC_AX_YB:
   case MachineCombinerPattern::REASSOC_XA_BY:
   case MachineCombinerPattern::REASSOC_XA_YB:
+  case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
+  case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
     return CombinerObjective::MustReduceDepth;
   default:
     return CombinerObjective::Default;
   }
 }

 /// Estimate the latency of the new and original instruction sequence by summing
 /// up the latencies of the inserted and deleted instructions. This assumes
@@ 398 lines not shown @@

llvm/lib/Target/PowerPC/PPCInstrInfo.h

@@ 223 lines not shown @@ bool isImmElgibleForForwarding(const MachineOperand &ImmMO,
                                  int64_t &Imm) const;
   bool isRegElgibleForForwarding(const MachineOperand &RegMO,
                                  const MachineInstr &DefMI,
                                  const MachineInstr &MI, bool KillDefMI,
                                  bool &IsFwdFeederRegKilled) const;
   unsigned getSpillTarget() const;
   const unsigned *getStoreOpcodesForSpillArray() const;
   const unsigned *getLoadOpcodesForSpillArray() const;
+  int16_t getFMAOpIdxInfo(unsigned Opcode) const;
+  void reassociateFMA(MachineInstr &Root, MachineCombinerPattern Pattern,
+                      SmallVectorImpl<MachineInstr *> &InsInstrs,
+                      SmallVectorImpl<MachineInstr *> &DelInstrs,
+                      DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const;
   virtual void anchor();

 protected:
   /// Commutes the operands in the given instruction.
   /// The commutable operands are specified by their indices OpIdx1 and OpIdx2.
   ///
   /// Do not call this method for a non-commutable instruction or for
   /// non-commutable pair of operand indices OpIdx1 and OpIdx2.
@@ 63 lines not shown @@ bool hasLowDefLatency(const TargetSchedModel &SchedModel,
     // body.
     return false;
   }

   bool useMachineCombiner() const override {
     return true;
   }

+  /// When getMachineCombinerPatterns() finds patterns, this function generates
+  /// the instructions that could replace the original code sequence
+  void genAlternativeCodeSequence(
+      MachineInstr &Root, MachineCombinerPattern Pattern,
+      SmallVectorImpl<MachineInstr *> &InsInstrs,
+      SmallVectorImpl<MachineInstr *> &DelInstrs,
+      DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const override;
+
+  /// Return true when there is potentially a faster code sequence for a fma
+  /// chain ending in \p Root. All potential patterns are output in the \p
+  /// P array.
+  bool getFMAPatterns(MachineInstr &Root,
+                      SmallVectorImpl<MachineCombinerPattern> &P) const;
+
   /// Return true when there is potentially a faster code sequence
   /// for an instruction chain ending in <Root>. All potential patterns are
   /// output in the <Pattern> array.
   bool getMachineCombinerPatterns(
       MachineInstr &Root,
       SmallVectorImpl<MachineCombinerPattern> &P) const override;

   bool isAssociativeAndCommutative(const MachineInstr &Inst) const override;

+  /// On PowerPC, we try to reassociate FMA chain which will increase
+  /// instruction size. Set extension resource length limit to 1 for edge case.
+  /// Resource Length is calculated by scaled resource usage in getCycles().
+  /// Because of the division in getCycles(), it returns different cycles due to
+  /// legacy scaled resource usage. So new resource length may be same with
+  /// legacy or 1 bigger than legacy.
+  /// We need to execlude the 1 bigger case even the resource length is not
+  /// perserved for more FMA chain reassociations on PowerPC.
+  int getExtendResourceLenLimit() const override { return 1; }
+
   void setSpecialOperandAttr(MachineInstr &OldMI1, MachineInstr &OldMI2,
                              MachineInstr &NewMI1,
                              MachineInstr &NewMI2) const override;

   void setSpecialOperandAttr(MachineInstr &MI, uint16_t Flags) const override;

   bool isCoalescableExtInstr(const MachineInstr &MI,
                              Register &SrcReg, Register &DstReg,
@@ 280 lines not shown @@

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp

@@ 274 lines not shown @@ bool PPCInstrInfo::isAssociativeAndCommutative(const MachineInstr &Inst) const {
   case PPC::MULHW:
   case PPC::MULLW:
     return true;
   default:
     return false;
   }
 }

+#define InfoArrayIdxFMAInst 0
+#define InfoArrayIdxFAddInst 1
+#define InfoArrayIdxFMULInst 2
+#define InfoArrayIdxAddOpIdx 3
+#define InfoArrayIdxMULOpIdx 4
+// Array keeps info for FMA instructions:
+// Index 0(InfoArrayIdxFMAInst): FMA instruction;
+// Index 1(InfoArrayIdxFAddInst): ADD instruction assoaicted with FMA;
+// Index 2(InfoArrayIdxFMULInst): MUL instruction assoaicted with FMA;
+// Index 3(InfoArrayIdxAddOpIdx): ADD operand index in the FMA operand list;
+// Index 4(InfoArrayIdxMULOpIdx): first MUL operand index in the FMA operand
+// list;
+// second MUL operand index is plus 1.
+static const uint16_t FMAOpIdxInfo[][5] = {
+    // FIXME: add more FMA instructions like XSNMADDADP and so on.
+    {PPC::XSMADDADP, PPC::XSADDDP, PPC::XSMULDP, 1, 2},
+    {PPC::XSMADDASP, PPC::XSADDSP, PPC::XSMULSP, 1, 2},
+    {PPC::XVMADDADP, PPC::XVADDDP, PPC::XVMULDP, 1, 2},
+    {PPC::XVMADDASP, PPC::XVADDSP, PPC::XVMULSP, 1, 2},
+    {PPC::FMADD, PPC::FADD, PPC::FMUL, 3, 1},
+    {PPC::FMADDS, PPC::FADDS, PPC::FMULS, 3, 1},
+    {PPC::QVFMADDSs, PPC::QVFADDSs, PPC::QVFMULSs, 3, 1},
+    {PPC::QVFMADD, PPC::QVFADD, PPC::QVFMUL, 3, 1}};
+
+// Check if an opcode is a FMA instruction. If it is, return the index in array
+// FMAOpIdxInfo. Otherwise, return -1.
+int16_t PPCInstrInfo::getFMAOpIdxInfo(unsigned Opcode) const {
+  for (unsigned I = 0; I < array_lengthof(FMAOpIdxInfo); I++)
+    if (FMAOpIdxInfo[I][InfoArrayIdxFMAInst] == Opcode)
+      return I;
+  return -1;
+}
+
+// Try to reassociate FMA chains like below:
+//
+// Pattern 1:
+//   A =  FADD X,  Y          (Leaf)
+//   B =  FMA  A,  M21,  M22  (Prev)
+//   C =  FMA  B,  M31,  M32  (Root)
+// -->
+//   A =  FMA  X,  M21,  M22
+//   B =  FMA  Y,  M31,  M32
+//   C =  FADD A,  B
+//
+// Pattern 2:
+//   A =  FMA  X,  M11,  M12  (Leaf)
+//   B =  FMA  A,  M21,  M22  (Prev)
+//   C =  FMA  B,  M31,  M32  (Root)
+// -->
+//   A =  FMUL M11,  M12
+//   B =  FMA  X,  M21,  M22
+//   D =  FMA  A,  M31,  M32
+//   C =  FADD B,  D
+//
+// breaking the dependency between A and B, allowing FMA to be executed in
+// parallel (or back-to-back in a pipeline) instead of depending on each other.
+bool PPCInstrInfo::getFMAPatterns(
+    MachineInstr &Root,
+    SmallVectorImpl<MachineCombinerPattern> &Patterns) const {
+  MachineBasicBlock *MBB = Root.getParent();
+  const MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
+
+  auto IsAllOpsVirtualReg = [](const MachineInstr &Instr) {
+    for (const auto &MO : Instr.explicit_operands())
+      if (!(MO.isReg() && Register::isVirtualRegister(MO.getReg())))
+        return false;
+    return true;
+  };
+
+  auto IsReassociable = [&](const MachineInstr &Instr, int16_t &AddOpIdx,
+                            bool IsLeaf, bool IsAdd) {
+    int16_t Idx = -1;
+    if (!IsAdd) {
+      Idx = getFMAOpIdxInfo(Instr.getOpcode());
+      if (Idx < 0)
+        return false;
+    } else if (Instr.getOpcode() !=
+               FMAOpIdxInfo[getFMAOpIdxInfo(Root.getOpcode())]
+                           [InfoArrayIdxFAddInst])
+      return false;
+
+    // Instruction can be reassociated.
+    // fast match flags may prohibit reassociation.
+    if (!(Instr.getFlag(MachineInstr::MIFlag::FmReassoc) &&
+          Instr.getFlag(MachineInstr::MIFlag::FmNsz)))
+      return false;
+
+    // Instruction operands are virtual registers for reassociating.
+    if (!IsAllOpsVirtualReg(Instr))
+      return false;
+
+    if (IsAdd && IsLeaf)
+      return true;
+
+    AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];
+
+    const MachineOperand &OpAdd = Instr.getOperand(AddOpIdx);
+    MachineInstr *MIAdd = MRI.getUniqueVRegDef(OpAdd.getReg());
+    // If 'add' operand's def is not in current block, don't do ILP related opt.
+    if (!MIAdd || MIAdd->getParent() != MBB)
+      return false;
+
+    // If this is not Leaf FMA Instr, its 'add' operand should only have one use
+    // as this fma will be changed later.
+    return IsLeaf ? true : MRI.hasOneNonDBGUse(OpAdd.getReg());
+  };
+
+  int16_t AddOpIdx = -1;
+  // Root must be a valid FMA like instruction.
+  if (!IsReassociable(Root, AddOpIdx, false, false))
+    return false;
+
+  assert((AddOpIdx >= 0) && "add operand index not right!");
+
+  Register RegB = Root.getOperand(AddOpIdx).getReg();
+  MachineInstr *Prev = MRI.getUniqueVRegDef(RegB);
+
+  // Prev must be a valid FMA like instruction.
+  AddOpIdx = -1;
+  if (!IsReassociable(*Prev, AddOpIdx, false, false))
+    return false;
+
+  assert((AddOpIdx >= 0) && "add operand index not right!");
+
+  Register RegA = Prev->getOperand(AddOpIdx).getReg();
+  MachineInstr *Leaf = MRI.getUniqueVRegDef(RegA);
+  AddOpIdx = -1;
+  if (IsReassociable(*Leaf, AddOpIdx, true, false)) {
+    Patterns.push_back(MachineCombinerPattern::REASSOC_XMM_AMM_BMM);
+    return true;
+  }
+  if (IsReassociable(*Leaf, AddOpIdx, true, true)) {
+    Patterns.push_back(MachineCombinerPattern::REASSOC_XY_AMM_BMM);
+    return true;
+  }
+  return false;
+}
+
 bool PPCInstrInfo::getMachineCombinerPatterns(
     MachineInstr &Root,
     SmallVectorImpl<MachineCombinerPattern> &Patterns) const {
   // Using the machine combiner in this way is potentially expensive, so
   // restrict to when aggressive optimizations are desired.
   if (Subtarget.getTargetMachine().getOptLevel() != CodeGenOpt::Aggressive)
     return false;

+  if (getFMAPatterns(Root, Patterns))
+    return true;
+
   return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns);
 }

+void PPCInstrInfo::genAlternativeCodeSequence(
+    MachineInstr &Root, MachineCombinerPattern Pattern,
+    SmallVectorImpl<MachineInstr *> &InsInstrs,
+    SmallVectorImpl<MachineInstr *> &DelInstrs,
+    DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
+  switch (Pattern) {
+  case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
+  case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
+    reassociateFMA(Root, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg);
+    break;
+  default:
+    // Reassociate default patterns.
+    TargetInstrInfo::genAlternativeCodeSequence(Root, Pattern, InsInstrs,
+                                                DelInstrs, InstrIdxForVirtReg);
+    break;
+  }
+}
+
+// Currently, only handle two patterns REASSOC_XY_AMM_BMM and
+// REASSOC_XMM_AMM_BMM. See comments for getFMAPatterns.
+void PPCInstrInfo::reassociateFMA(
+    MachineInstr &Root, MachineCombinerPattern Pattern,
+    SmallVectorImpl<MachineInstr *> &InsInstrs,
+    SmallVectorImpl<MachineInstr *> &DelInstrs,
+    DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
+  MachineFunction *MF = Root.getMF();
+  MachineRegisterInfo &MRI = MF->getRegInfo();
+  MachineOperand &OpC = Root.getOperand(0);
+  Register RegC = OpC.getReg();
+  const TargetRegisterClass *RC = MRI.getRegClass(RegC);
+  MRI.constrainRegClass(RegC, RC);
+
+  unsigned FmaOp = Root.getOpcode();
+  int16_t Idx = getFMAOpIdxInfo(FmaOp);
+  assert(Idx >= 0 && "Root must be a FMA instruction");
+
+  uint16_t AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];
+  uint16_t FirstMulOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxMULOpIdx];
+  MachineInstr *Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
+  MachineInstr *Leaf =
+      MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
+  uint16_t IntersectedFlags =
+      Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
+
+  auto GetOperandInfo = [&](const MachineOperand &Operand, Register &Reg,
+                            bool &KillFlag) {
+    Reg = Operand.getReg();
+    MRI.constrainRegClass(Reg, RC);
+    KillFlag = Operand.isKill();
+  };
+
+  auto GetFMAInstrInfo = [&](const MachineInstr &Instr, Register &MulOp1,
+                             Register &MulOp2, bool &MulOp1KillFlag,
+                             bool &MulOp2KillFlag) {
+    GetOperandInfo(Instr.getOperand(FirstMulOpIdx), MulOp1, MulOp1KillFlag);
+    GetOperandInfo(Instr.getOperand(FirstMulOpIdx + 1), MulOp2, MulOp2KillFlag);
+  };
+
+  Register RegM11, RegM12, RegX, RegY, RegM21, RegM22, RegM31, RegM32;
+  bool KillX = false, KillY = false, KillM11 = false, KillM12 = false,
+       KillM21 = false, KillM22 = false, KillM31 = false, KillM32 = false;
+
+  GetFMAInstrInfo(Root, RegM31, RegM32, KillM31, KillM32);
+  GetFMAInstrInfo(*Prev, RegM21, RegM22, KillM21, KillM22);
+
+  if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {
+    GetFMAInstrInfo(*Leaf, RegM11, RegM12, KillM11, KillM12);
+    GetOperandInfo(Leaf->getOperand(AddOpIdx), RegX, KillX);
+  } else if (Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) {
+    GetOperandInfo(Leaf->getOperand(1), RegX, KillX);
+    GetOperandInfo(Leaf->getOperand(2), RegY, KillY);
+  }
+
+  // Create new virtual registers for the new results instead of
+  // recycling legacy ones because the MachineCombiner's computation of the
+  // critical path requires a new register definition rather than an existing
+  // one.
+  Register NewVRA = MRI.createVirtualRegister(RC);
+  InstrIdxForVirtReg.insert(std::make_pair(NewVRA, 0));
+
+  Register NewVRB = MRI.createVirtualRegister(RC);
+  InstrIdxForVirtReg.insert(std::make_pair(NewVRB, 1));
+
+  Register NewVRD = 0;
+  if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {
+    NewVRD = MRI.createVirtualRegister(RC);
+    InstrIdxForVirtReg.insert(std::make_pair(NewVRD, 2));
+  }
+
+  auto AdjustOperandOrder = [&](MachineInstr *MI, Register RegAdd, bool KillAdd,
+                                Register RegMul1, bool KillRegMul1,
+                                Register RegMul2, bool KillRegMul2) {
+    MI->getOperand(AddOpIdx).setReg(RegAdd);
+    MI->getOperand(AddOpIdx).setIsKill(KillAdd);
+    MI->getOperand(FirstMulOpIdx).setReg(RegMul1);
+    MI->getOperand(FirstMulOpIdx).setIsKill(KillRegMul1);
+    MI->getOperand(FirstMulOpIdx + 1).setReg(RegMul2);
+    MI->getOperand(FirstMulOpIdx + 1).setIsKill(KillRegMul2);
+  };
+
+  if (Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) {
+    // Create new instructions for insertion.
+    MachineInstrBuilder MINewB =
+        BuildMI(*MF, Prev->getDebugLoc(), get(FmaOp), NewVRB)
+            .addReg(RegX, getKillRegState(KillX))
+            .addReg(RegM21, getKillRegState(KillM21))
+            .addReg(RegM22, getKillRegState(KillM22));
+    MachineInstrBuilder MINewA =
+        BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), NewVRA)
+            .addReg(RegY, getKillRegState(KillY))
+            .addReg(RegM31, getKillRegState(KillM31))
+            .addReg(RegM32, getKillRegState(KillM32));
+    // if AddOpIdx is not 1, adjust the order.
+    if (AddOpIdx != 1) {
+      AdjustOperandOrder(MINewB, RegX, KillX, RegM21, KillM21, RegM22, KillM22);
+      AdjustOperandOrder(MINewA, RegY, KillY, RegM31, KillM31, RegM32, KillM32);
+    }
+
+    MachineInstrBuilder MINewC =
+        BuildMI(*MF, Root.getDebugLoc(),
+                get(FMAOpIdxInfo[Idx][InfoArrayIdxFAddInst]), RegC)
+            .addReg(NewVRB, getKillRegState(true))
+            .addReg(NewVRA, getKillRegState(true));
+
+    // update flags for new created instructions.
+    setSpecialOperandAttr(*MINewA, IntersectedFlags);
+    setSpecialOperandAttr(*MINewB, IntersectedFlags);
+    setSpecialOperandAttr(*MINewC, IntersectedFlags);
+
+    // Record new instructions for insertion.
+    InsInstrs.push_back(MINewA);
+    InsInstrs.push_back(MINewB);
+    InsInstrs.push_back(MINewC);
+  } else if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {
+    assert(NewVRD && "new FMA register not created!");
+    // Create new instructions for insertion.
+    MachineInstrBuilder MINewA =
+        BuildMI(*MF, Leaf->getDebugLoc(),
+                get(FMAOpIdxInfo[Idx][InfoArrayIdxFMULInst]), NewVRA)
+            .addReg(RegM11, getKillRegState(KillM11))
+            .addReg(RegM12, getKillRegState(KillM12));
+    MachineInstrBuilder MINewB =
+        BuildMI(*MF, Prev->getDebugLoc(), get(FmaOp), NewVRB)
+            .addReg(RegX, getKillRegState(KillX))
+            .addReg(RegM21, getKillRegState(KillM21))
+            .addReg(RegM22, getKillRegState(KillM22));
+    MachineInstrBuilder MINewD =
+        BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), NewVRD)
+            .addReg(NewVRA, getKillRegState(true))
+            .addReg(RegM31, getKillRegState(KillM31))
+            .addReg(RegM32, getKillRegState(KillM32));
+    // If AddOpIdx is not 1, adjust the order.
+    if (AddOpIdx != 1) {
+      AdjustOperandOrder(MINewB, RegX, KillX, RegM21, KillM21, RegM22, KillM22);
+      AdjustOperandOrder(MINewD, NewVRA, true, RegM31, KillM31, RegM32,
+                         KillM32);
+    }
+
+    MachineInstrBuilder MINewC =
+        BuildMI(*MF, Root.getDebugLoc(),
+                get(FMAOpIdxInfo[Idx][InfoArrayIdxFAddInst]), RegC)
+            .addReg(NewVRB, getKillRegState(true))
+            .addReg(NewVRD, getKillRegState(true));

		// update flags for new created instructions.
		setSpecialOperandAttr(*MINewA, IntersectedFlags);
		setSpecialOperandAttr(*MINewB, IntersectedFlags);
		setSpecialOperandAttr(*MINewD, IntersectedFlags);
		setSpecialOperandAttr(*MINewC, IntersectedFlags);

		// Record new instructions for insertion.
		InsInstrs.push_back(MINewA);
		InsInstrs.push_back(MINewB);
		InsInstrs.push_back(MINewD);
		InsInstrs.push_back(MINewC);
		}

		assert(!InsInstrs.empty() &&
		"Insertion instructions set should not be empty!");

		// Record old instructions for deletion.
		DelInstrs.push_back(Leaf);
		DelInstrs.push_back(Prev);
		DelInstrs.push_back(&Root);
		}

// Detect 32 -> 64-bit extensions where we may reuse the low sub-register.
bool PPCInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
                                         Register &SrcReg, Register &DstReg,
                                         unsigned &SubIdx) const {
  switch (MI.getOpcode()) {
  default: return false;
  case PPC::EXTSW:
  case PPC::EXTSW_32:

llvm/test/CodeGen/PowerPC/machine-combiner.ll

 ; FIXPOINT-NEXT: blr
   %t1 = mul i64 %t0, %x2
   %t2 = mul i64 %t1, %x3
   ret i64 %t2
 }

 define double @reassociate_mamaa_double(double %0, double %1, double %2, double %3, double %4, double %5) {
 ; CHECK-LABEL: reassociate_mamaa_double:
 ; CHECK: # %bb.0:
-; CHECK-QPX: fadd 0, 2, 1
-; CHECK-QPX: fmadd 0, 4, 3, 0
-; CHECK-QPX: fmadd 1, 6, 5, 0
+; CHECK-QPX-DAG: fmadd [[REG0:[0-9]+]], 4, 3, 2
+; CHECK-QPX-DAG: fmadd [[REG1:[0-9]+]], 6, 5, 1
+; CHECK-QPX: fadd 1, [[REG0]], [[REG1]]
+; CHECK-PWR-DAG: xsmaddadp 1, 6, 5
+; CHECK-PWR-DAG: xsmaddadp 2, 4, 3
 ; CHECK-PWR: xsadddp 1, 2, 1
-; CHECK-PWR: xsmaddadp 1, 4, 3
-; CHECK-PWR: xsmaddadp 1, 6, 5
 ; CHECK-NEXT: blr
   %7 = fmul reassoc nsz double %3, %2
   %8 = fmul reassoc nsz double %5, %4
   %9 = fadd reassoc nsz double %1, %0
   %10 = fadd reassoc nsz double %9, %7
   %11 = fadd reassoc nsz double %10, %8
   ret double %11
 }

 ; FIXME: should use xsmaddasp instead of fmadds for pwr7 arch.
 define float @reassociate_mamaa_float(float %0, float %1, float %2, float %3, float %4, float %5) {
 ; CHECK-LABEL: reassociate_mamaa_float:
 ; CHECK: # %bb.0:
-; CHECK: fadds 0, 2, 1
-; CHECK: fmadds 0, 4, 3, 0
-; CHECK: fmadds 1, 6, 5, 0
+; CHECK-DAG: fmadds [[REG0:[0-9]+]], 4, 3, 2
+; CHECK-DAG: fmadds [[REG1:[0-9]+]], 6, 5, 1
+; CHECK: fadds 1, [[REG0]], [[REG1]]
 ; CHECK-NEXT: blr
   %7 = fmul reassoc nsz float %3, %2
   %8 = fmul reassoc nsz float %5, %4
   %9 = fadd reassoc nsz float %1, %0
   %10 = fadd reassoc nsz float %9, %7
   %11 = fadd reassoc nsz float %10, %8
   ret float %11
 }

 define <4 x float> @reassociate_mamaa_vec(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3, <4 x float> %4, <4 x float> %5) {
 ; CHECK-LABEL: reassociate_mamaa_vec:
 ; CHECK: # %bb.0:
-; CHECK-QPX: qvfadds 0, 2, 1
-; CHECK-QPX: qvfmadds 0, 4, 3, 0
-; CHECK-QPX: qvfmadds 1, 6, 5, 0
-; CHECK-PWR: xvaddsp 34, 35, 34
-; CHECK-PWR: xvmaddasp 34, 37, 36
-; CHECK-PWR: xvmaddasp 34, 39, 38
+; CHECK-QPX-DAG: qvfmadds [[REG0:[0-9]+]], 4, 3, 2
+; CHECK-QPX-DAG: qvfmadds [[REG1:[0-9]+]], 6, 5, 1
+; CHECK-QPX: qvfadds 1, [[REG0]], [[REG1]]
+; CHECK-PWR-DAG: xvmaddasp [[REG0:[0-9]+]], 39, 38
+; CHECK-PWR-DAG: xvmaddasp [[REG1:[0-9]+]], 37, 36
+; CHECK-PWR: xvaddsp 34, [[REG1]], [[REG0]]
 ; CHECK-NEXT: blr
   %7 = fmul reassoc nsz <4 x float> %3, %2
   %8 = fmul reassoc nsz <4 x float> %5, %4
   %9 = fadd reassoc nsz <4 x float> %1, %0
   %10 = fadd reassoc nsz <4 x float> %9, %7
   %11 = fadd reassoc nsz <4 x float> %10, %8
   ret <4 x float> %11
 }

 define double @reassociate_mamama_double(double %0, double %1, double %2, double %3, double %4, double %5, double %6, double %7, double %8) {
 ; CHECK-LABEL: reassociate_mamama_double:
 ; CHECK: # %bb.0:
-; CHECK-QPX: fmadd 0, 2, 1, 7
-; CHECK-QPX-DAG: fmadd 0, 4, 3, 0
-; CHECK-QPX-DAG: fmadd 0, 6, 5, 0
-; CHECK-QPX: fmadd 1, 9, 8, 0
+; CHECK-QPX: fmadd [[REG0:[0-9]+]], 2, 1, 7
+; CHECK-QPX-DAG: fmul [[REG1:[0-9]+]], 4, 3
+; CHECK-QPX-DAG: fmadd [[REG2:[0-9]+]], 6, 5, [[REG0]]
+; CHECK-QPX-DAG: fmadd [[REG3:[0-9]+]], 9, 8, [[REG1]]
+; CHECK-QPX: fadd 1, [[REG2]], [[REG3]]
 ; CHECK-PWR: xsmaddadp 7, 2, 1
-; CHECK-PWR-DAG: xsmaddadp 7, 4, 3
+; CHECK-PWR-DAG: xsmuldp [[REG0:[0-9]+]], 4, 3
 ; CHECK-PWR-DAG: xsmaddadp 7, 6, 5
-; CHECK-PWR-DAG: xsmaddadp 7, 9, 8
-; CHECK-PWR: fmr 1, 7
+; CHECK-PWR-DAG: xsmaddadp [[REG0]], 9, 8
+; CHECK-PWR: xsadddp 1, 7, [[REG0]]
 ; CHECK-NEXT: blr
   %10 = fmul reassoc nsz double %1, %0
   %11 = fmul reassoc nsz double %3, %2
   %12 = fmul reassoc nsz double %5, %4
   %13 = fmul reassoc nsz double %8, %7
   %14 = fadd reassoc nsz double %11, %10
   %15 = fadd reassoc nsz double %14, %6
   %16 = fadd reassoc nsz double %15, %12
   %17 = fadd reassoc nsz double %16, %13
   ret double %17
 }

 ; FIXME: should use xsmaddasp instead of fmadds for pwr7 arch.
 define dso_local float @reassociate_mamama_8(float %0, float %1, float %2, float %3, float %4, float %5, float %6, float %7, float %8,
     float %9, float %10, float %11, float %12, float %13, float %14, float %15, float %16) {
 ; CHECK-LABEL: reassociate_mamama_8:
 ; CHECK: # %bb.0:
-; CHECK: fmadds [[REG0:[0-9]+]], 3, 2, 1
-; CHECK-DAG: fmadds [[REG0]], 5, 4, [[REG0]]
-; CHECK-DAG: fmadds [[REG0]], 7, 6, [[REG0]]
-; CHECK-DAG: fmadds [[REG0]], 9, 8, [[REG0]]
-; CHECK-DAG: fmadds [[REG0]], 13, 12, [[REG0]]
-; CHECK-DAG: fmadds [[REG0]], 11, 10, [[REG0]]
+; CHECK-DAG: fmadds [[REG0:[0-9]+]], 3, 2, 1
+; CHECK-DAG: fmuls [[REG1:[0-9]+]], 5, 4
+; CHECK-DAG: fmadds [[REG2:[0-9]+]], 7, 6, [[REG0]]
+; CHECK-DAG: fmadds [[REG3:[0-9]+]], 9, 8, [[REG1]]
+;
+; CHECK-DAG: fmadds [[REG4:[0-9]+]], 13, 12, [[REG3]]
+; CHECK-DAG: fmadds [[REG5:[0-9]+]], 11, 10, [[REG2]]
 ;
-; CHECK: fmadds [[REG0]],
-; CHECK: fmadds 1,
+; CHECK-DAG: fmadds [[REG6:[0-9]+]], 3, 2, [[REG4]]
+; CHECK-DAG: fmadds [[REG7:[0-9]+]], 5, 4, [[REG5]]
+; CHECK: fadds 1, [[REG7]], [[REG6]]
 ; CHECK-NEXT: blr
   %18 = fmul reassoc nsz float %2, %1
   %19 = fadd reassoc nsz float %18, %0
   %20 = fmul reassoc nsz float %4, %3
   %21 = fadd reassoc nsz float %19, %20
   %22 = fmul reassoc nsz float %6, %5
   %23 = fadd reassoc nsz float %21, %22
   %24 = fmul reassoc nsz float %8, %7
   %25 = fadd reassoc nsz float %23, %24

Revision Contents

Diff 270657
