This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner] Support local strategy for traces
ClosedPublic

Authored by asi-sc on Dec 22 2022, 2:57 AM.

Details

Summary

For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.

This patch adds a virtual method to TargetInstrInfo to allow each target to decide
which strategy to use.

Depends on D140541

Diff Detail

Event Timeline

asi-sc created this revision.Dec 22 2022, 2:57 AM
asi-sc requested review of this revision.Dec 22 2022, 2:57 AM
asi-sc updated this revision to Diff 484790.Dec 22 2022, 3:33 AM

Sync with changes in parent patches

asi-sc updated this revision to Diff 484799.Dec 22 2022, 4:49 AM

Update RISCV/machine-combiner.ll

fhahn added a subscriber: dmgreen.Jan 10 2023, 12:19 AM

@dmgreen do you know if this would be beneficial for in-order AArch64 cores as well? Or is there a chance that the RISCV models are missing some information?

For in-order cores MachineCombiner makes better decisions when the critical path

@asi-sc did you do any measurements to collect empirical data?

For in-order cores MachineCombiner makes better decisions when the critical path

@asi-sc did you do any measurements to collect empirical data?

I only have performance numbers for microbenchmarks, as execution-time fluctuations on SPEC are higher than the measured change. However, here are some statistics just in case:

Program                         machine-combiner.NumInstCombined
                                results    min-instr    diff
483.xalancbmk/483.xalancbmk      243.00      328.00    35.0%
464.h264ref/464.h264ref          878.00      909.00     3.5%
400.perlbench/400.perlbench      272.00      279.00     2.6%
403.gcc/403.gcc                  946.00      946.00     0.0%
429.mcf/429.mcf                    2.00        2.00     0.0%
473.astar/473.astar                5.00        5.00     0.0%
445.gobmk/445.gobmk             1025.00     1020.00    -0.5%
462.libquantum/462.libquantum     27.00       26.00    -3.7%
458.sjeng/458.sjeng               80.00       76.00    -5.0%
401.bzip2/401.bzip2              135.00      127.00    -5.9%
471.omnetpp/471.omnetpp           12.00       11.00    -8.3%
456.hmmer/456.hmmer              429.00      360.00   -16.1%

I cannot share details, but my testing shows that FLOPS module 7 is 1.5% faster on an in-order RISCV core when the local strategy is used. The test attached to this patch is a minimization of a performance problem in a real application that shows a ~3% performance change between strategies (~1.5% for sifive-u74). MultiSource/Benchmarks/Ptrdist/bc/bc from llvm-test-suite also speeds up by 3%. Other tests from llvm-test-suite show no measurable performance difference.

Apart from execution time, the local strategy reduces compilation time, as traces become smaller. I ran time-report on a random sample and saw no regressions and often improvements of up to 20% of the pass time (the total impact is hardly noticeable).

Gentle ping

I don't see anything wrong with this patch, but I haven't looked at MachineCombiner in a long time, so I'm likely not the best reviewer.

I do want to note a potential alternative - have you looked at using enableAggressiveFMAFusion() in DAGCombiner?
If you flip that switch for RISCV, the sub3 calculation in the last block would become:

	fmadd.d	ft1, ft0, fa1, fa0
	fnmsub.d	fa0, ft0, fa1, ft1

If the fma instruction timing is the same as plain fmul on your target, then is that even better than the 1 fmadd produced on the test here?

I don't see anything wrong with this patch, but I haven't looked at MachineCombiner in a long time, so I'm likely not the best reviewer.

I do want to note a potential alternative - have you looked at using enableAggressiveFMAFusion() in DAGCombiner?
If you flip that switch for RISCV, the sub3 calculation in the last block would become:

	fmadd.d	ft1, ft0, fa1, fa0
	fnmsub.d	fa0, ft0, fa1, ft1

If the fma instruction timing is the same as plain fmul on your target, then is that even better than the 1 fmadd produced on the test here?

@spatel , yeah, there are almost no recent contributions to machine combiner, so I had to find people who contributed to it years ago.

Thanks for the suggestion. I agree it can fix the provided test, but it won't solve the problem in general: when the machine combiner estimates profitability, it bases the decision on the trace gathered according to some strategy. The current implementation supports only one strategy, which chooses multiple blocks from a function. So, when we ask the machine combiner to check whether the suggested pattern is good or bad, an instruction that is really far away (say, 4 BBs and 400 instructions above) may dramatically affect its decision (when it is on the critical path). This works fine for CPUs with a big OOO buffer, but not for a tiny CPU that executes instructions in order. My experiments show that for such CPUs we should calculate the critical path separately for each BB, as instructions in other BBs usually have almost no effect. Does that sound reasonable?

spatel added a comment.EditedJan 30 2023, 6:23 AM

Thanks for the suggestion. I agree it can fix the provided test, but it won't solve the problem in general: when the machine combiner estimates profitability, it bases the decision on the trace gathered according to some strategy. The current implementation supports only one strategy, which chooses multiple blocks from a function. So, when we ask the machine combiner to check whether the suggested pattern is good or bad, an instruction that is really far away (say, 4 BBs and 400 instructions above) may dramatically affect its decision (when it is on the critical path). This works fine for CPUs with a big OOO buffer, but not for a tiny CPU that executes instructions in order. My experiments show that for such CPUs we should calculate the critical path separately for each BB, as instructions in other BBs usually have almost no effect. Does that sound reasonable?

Yes, that seems like a valid approach. However, I have no experience with any recent in-order cores, so it would be interesting to see if the experimental results that you got with sifive-u74 can be replicated for other in-order cores.

A quick scan of in-tree targets with MicroOpBufferSize = 0 says that we could try benchmarking on SiFive7 and a variety of Arm CPUs (M4, M55, Cortex-M7, Cortex-A55, etc).

IIUC, only RISCV is overriding the default with this patch, so no other arch will be affected, and so I don't think we need to hold this patch up waiting for more experimental data.
But does that mean the test difference that you are showing will also occur for SiFive7, Syntacore SCR1, and/or Rocket with this patch? If so, can you add a RUN line to the test file like that?

asi-sc updated this revision to Diff 493914.Feb 1 2023, 5:24 AM

Improve test coverage

asi-sc added a comment.Feb 1 2023, 5:40 AM

But does that mean the test difference that you are showing will also occur for SiFive7, Syntacore SCR1, and/or Rocket with this patch? If so, can you add a RUN line to the test file like that?

I added one more test for Syntacore-SCR1 and SiFive-u74. It shows the desired reassociation when the local strategy is used; instructions reassociated this way demonstrate better ILP.
One thing to note is that the RISC-V reassociation patterns for the machine combiner are not really interesting for Syntacore SCR1, as it is a single-issue CPU and doesn't support the FP extensions. However, when the local strategy is used, the resulting asm is no worse for Syntacore-SCR1 and slightly better for SiFive-u74 (which is dual-issue).

spatel added a comment.Feb 1 2023, 8:05 AM

But does that mean the test difference that you are showing will also occur for SiFive7, Syntacore SCR1, and/or Rocket with this patch? If so, can you add a RUN line to the test file like that?

I added one more test for Syntacore-SCR1 and SiFive-u74. It shows the desired reassociation when the local strategy is used; instructions reassociated this way demonstrate better ILP.
One thing to note is that the RISC-V reassociation patterns for the machine combiner are not really interesting for Syntacore SCR1, as it is a single-issue CPU and doesn't support the FP extensions. However, when the local strategy is used, the resulting asm is no worse for Syntacore-SCR1 and slightly better for SiFive-u74 (which is dual-issue).

Thanks - I don't know anything about RISC-V chips, so I'm deferring to others on that.

The asm diffs in existing test files show that we are affecting the default behavior of RISC-V compiles, right? If those are considered neutral or improvements, then I think this is good to go.

Gerolf added a comment.Feb 1 2023, 5:44 PM

Given that this is RISC-V and under a flag, this LGTM. I would like to see stats on the FMAs plus the changes to the cycle counts on the critical path, and see how the data correlate with your measured run-time performance numbers, and ditto for the current heuristic. This might also help explain the wide variety of results in your SPEC data; your numbers look all over the place. Also, you can probably push your idea further by allowing a parameterized schedule window (e.g. 10 or 15 instructions) rather than a basic block. This would let you catch cases across blocks and should work better for large blocks. Finally, I would not be surprised (just learning from your insights here and guessing) if various in-order processors showed their best performance with different window sizes. All this is just food for thought for additional/future work, though. Cheers!

asi-sc updated this revision to Diff 495085.Feb 6 2023, 4:41 AM

Use MinInstrCount strategy when scheduling model is not specified.

asi-sc added a comment.Feb 6 2023, 4:57 AM

The asm diffs in existing test files show that we are affecting the default behavior of RISC-V compiles, right? If those are considered neutral or improvements, then I think this is good to go.

Thanks for the question. Although the changes were neutral, I decided we shouldn't change the default behavior, at least not in this change, so I updated the patch slightly.

Given that this is RISC-V and under a flag, this LGTM. I would like to see stats on the FMAs plus the changes to the cycle counts on the critical path, and see how the data correlate with your measured run-time performance numbers, and ditto for the current heuristic. This might also help explain the wide variety of results in your SPEC data; your numbers look all over the place. Also, you can probably push your idea further by allowing a parameterized schedule window (e.g. 10 or 15 instructions) rather than a basic block. This would let you catch cases across blocks and should work better for large blocks. Finally, I would not be surprised (just learning from your insights here and guessing) if various in-order processors showed their best performance with different window sizes. All this is just food for thought for additional/future work, though. Cheers!

Using a schedule window is an interesting idea, thank you. It will definitely work better at basic-block boundaries; however, I can imagine basic blocks for which a reasonably small schedule window won't work well because of instruction placement before scheduling. That might be a rare corner case, though. I'll experiment with this idea in my free time.

spatel accepted this revision.Feb 6 2023, 5:15 AM

LGTM for the limited scope of the patch.
I don't know anything about those particular targets, so be prepared to deal with regressions/requests from developers on those platforms. :)

This revision is now accepted and ready to land.Feb 6 2023, 5:15 AM
This revision was landed with ongoing or failed builds.Feb 17 2023, 2:18 AM
This revision was automatically updated to reflect the committed changes.