This is an archive of the discontinued LLVM Phabricator instance.

Differential D133572

[MachinePipeliner] Fix the interpretation of the scheduling model
ClosedPublic

Authored by ytmukai on Sep 9 2022, 5:43 AM.

Details

Reviewers

dpenry
jsji
dmgreen
bcahoon

Commits

rG116838b1516a: [MachinePipeliner] Fix the interpretation of the scheduling model

Summary

The method of counting resource consumption is modified to be based on
the "Cycles" value when DFA is not used.

The calculation of ResMII is modified to total the "Cycles" values for
each resource and divide the total by the number of units of that
resource. Previously, ResMII was excessive because resources were
assumed to be consumed for the number of cycles given by "Latency".

The method of resource reservation is modified similarly. When the
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 unit in each of that many cycles, starting from the
scheduled cycle. To realize this, ResourceManager maintains a resource
table for all slots. Previously, resource consumption was always 1 unit
for 1 cycle regardless of the value of "Cycles" or "Latency".

In addition, the number of instructions per cycle is modified to be
constrained by "IssueWidth".

For the case of using DFA, the scheduling results are unchanged.
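
The reservation scheme described above can be sketched as a small modulo reservation table. This is only an illustration under stated assumptions: ToyMRT, ResourceUse, tryReserve and inUse are hypothetical names, not the ResourceManager API, and the reserve-then-check-then-roll-back order is one plausible way to combine the reserve, unreserve and overbooking checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of a scheduling class: the resource
// with this ID is held for `Cycles` consecutive cycles.
struct ResourceUse {
  int ResourceID;
  int Cycles;
};

// Toy modulo reservation table (illustrative only, not the LLVM class).
class ToyMRT {
  int II;
  std::vector<int> NumUnits;           // units available per resource
  std::vector<std::vector<int>> Table; // Table[slot][resource] = units in use

  // Map any cycle, including a negative one, to a slot in [0, II).
  int slotOf(int Cycle) const {
    int R = Cycle % II;
    return R < 0 ? R + II : R;
  }

  // Add (Delta = +1) or remove (Delta = -1) one unit per occupied cycle.
  void adjust(const std::vector<ResourceUse> &Uses, int Cycle, int Delta) {
    for (const ResourceUse &U : Uses)
      for (int C = Cycle; C < Cycle + U.Cycles; ++C)
        Table[slotOf(C)][U.ResourceID] += Delta;
  }

public:
  ToyMRT(int II, const std::vector<int> &Units)
      : II(II), NumUnits(Units), Table(II, std::vector<int>(Units.size(), 0)) {
    assert(II > 0);
  }

  // True if any slot uses more units of a resource than exist.
  bool isOverbooked() const {
    for (const std::vector<int> &Slot : Table)
      for (std::size_t R = 0; R < NumUnits.size(); ++R)
        if (Slot[R] > NumUnits[R])
          return true;
    return false;
  }

  // Tentatively reserve, check the table, and roll back on overflow.
  bool tryReserve(const std::vector<ResourceUse> &Uses, int Cycle) {
    adjust(Uses, Cycle, +1);
    if (!isOverbooked())
      return true;
    adjust(Uses, Cycle, -1);
    return false;
  }

  int inUse(int Slot, int ResourceID) const { return Table[Slot][ResourceID]; }
};
```

With II = 4 and a single-unit resource, an instruction holding that resource for 3 cycles from cycle 3 occupies slots 3, 0 and 1, so a later one-cycle use at cycle 5 (slot 1) is rejected while one at cycle -2 (slot 2) still fits.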

Example:

Command: $ llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 --ppc-enable-pipeliner --debug-only=pipeliner --pipeliner-dbg-res --pipeliner-max-stages=10 llvm/test/CodeGen/PowerPC/sms-phi-2.ll
Previous result:

MII = 15 MAX_II = 25 (rec=2, res=15)
Schedule Found? 1 (II=15)

Modified result:

#Insts: 7, IssueWidth: 8, Cycles: 1
ID            Name     Units  Consumed    Cycles
 1             ALU         4         5         2
 2            ALUE         2         0         0
 3            ALUO         2         0         0
 4              BR         1         0         0
 5              CY         1         0         0
 6             DFU         1         0         0
 7        DISP_NBR         6        15         3
 8         DISP_SS         4         8         2
 9         DISPb01         2         0         0
10         DISPx02         2         4         2
11         DISPx13         2         4         2
12         DISPxab         2         3         2
13             DIV         2         8         4
14              DP         4         1         1
15             DPE         2         0         0
16             DPO         2         0         0
17         IP_AGEN         4         1         1
18         IP_EXEC         4         9         3
19        IP_EXECE         2         1         1
20        IP_EXECO         2         1         1
21              LS         4         1         1
22              PM         2         0         0
MII = 4 MAX_II = 14 (rec=2, res=4)

MRT:
Slot  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 #Insts
   0  0  0  0  0  0  0  3  2  0  2  2  1  2  1  0  0  0  2  1  1  0  0      2
   1  3  0  0  0  0  0  5  4  0  2  2  2  2  0  0  0  1  4  0  0  1  0      3
   2  0  0  0  0  0  0  3  2  0  0  0  0  2  0  0  0  0  1  0  0  0  0      0
   3  2  0  0  0  0  0  4  0  0  0  0  0  2  0  0  0  0  2  0  0  0  0      2

Schedule Found? 1 (II=4)

The modification provides a better ResMII and can actually schedule it at that value (although resource management during scheduling is changed to be more restrictive.)
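
As a rough check of the numbers in the table above, the "Cycles" column matches the ceiling of Consumed divided by Units for each resource, and the reported ResMII matches the largest of those values together with the issue-width bound from the first line. The sketch below reproduces that arithmetic; ResourceRow, ceilDiv and toyResMII are hypothetical names for illustration, not functions from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One row of the resource table: how many units exist and how many
// unit-cycles the loop body consumes in total.
struct ResourceRow {
  int Units;
  int Consumed;
};

// Ceiling division for a non-negative dividend and a positive divisor.
int ceilDiv(int A, int B) {
  assert(A >= 0 && B > 0);
  return (A + B - 1) / B;
}

// ResMII as the largest per-resource bound, also bounded below by the
// number of cycles needed to issue all instructions at the given width.
int toyResMII(const std::vector<ResourceRow> &Rows, int NumInsts,
              int IssueWidth) {
  int Result = ceilDiv(NumInsts, IssueWidth);
  for (const ResourceRow &R : Rows)
    Result = std::max(Result, ceilDiv(R.Consumed, R.Units));
  return Result;
}
```

For example, DISP_NBR gives ceil(15 / 6) = 3 and DIV gives ceil(8 / 2) = 4, so DIV is the limiting resource and the bound is 4, in line with "res=4" in the debug output.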

The modifications will produce a more aggressive schedule, but the final result may not be significantly different due to the limitation of the maximum number of stages (3).
I believe that only ARM Cortex-M7 enables pipeliner by default. What are your thoughts on the impact of this modification?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ytmukai created this revision. Sep 9 2022, 5:43 AM

Herald added a project: Restricted Project. Sep 9 2022, 5:43 AM

Herald added subscribers: steven.zhang, hiraditya, kristof.beyls, nemanjai.

ytmukai requested review of this revision. Sep 9 2022, 5:43 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B185818: Diff 459015. Sep 9 2022, 6:40 AM

Remarkably, I was just about to upload a patch to do this very thing for in-order non-DFA scheduling models, but this one has more functionality! (It handles multi-cycle occupancy better, and the pipeliner-force-ii option will be quite useful for testing.)

Overall, I expect this to be helpful for Cortex-M7. I can't speak to any side effects on models using DFA (Hexagon), or out-of-order non-DFA (PowerPC). (NOTE: DFA is used by default; PowerPC and ARM targets turn it off.)

Do update the failing Thumb2 test outputs so I can comment on how/whether they need to be modified. The use of pragmas in them to force the MII was precisely because of the problem this patch solves.

Also "unbuffered groups" -- resources which have subunits and whose BufferSize is zero -- are intended to consume resources differently -- or so the MachineScheduler code (SchedBoundary::getNextResourceCycle) implies. At least in some cases, they do not consume resources for themselves, but choose between their subunits. IIRC, this is used to model instructions having options in what resources they consume. I don't think that getting this behavior in place is necessary for this patch, but it could be done in a follow-on patch.

llvm/lib/CodeGen/MachinePipeliner.cpp
3055	Will need to adjust for use of NumMicroOps as described in later comments
3080	Should increment by SCDesc->NumMicroOps (see comment below while calculating ResMII)
3205	Use SM.resolveSchedClass here so that variant scheduling classes can be resolved.
3210	Increment by SCDesc->NumMicroOps, as NumMicroOps represents the amount of issue width taken by the instruction (at least that's how MachineScheduler interprets it).

Update Thumb2 tests.
Refer to the number of micro operations when restricting the number of issues.

@dpenry Thank you for the reviews!

Thumb2 tests passed by disabling the restriction of issue width. (I had forgotten that the ARM swp tests are under Thumb2.)

The number of issues per cycle is now counted in micro-ops. Micro-ops are assumed to be scheduled one per cycle, starting with the cycle in which the instruction is scheduled (they are reserved in reserveResources()). This may differ from actual operation, but it prevents an instruction from being unschedulable when it has a large number of micro-ops relative to the issue width. In MachineScheduler, when the number of micro-ops exceeds the limit, it seems that the next cycle's slot is used. In any case, the model does not have that much resolution, so I preferred a simple implementation.
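
To illustrate the micro-op accounting described in this comment, here is a minimal sketch that spreads an instruction's micro-ops over consecutive cycles, one per cycle, and checks each slot against the issue width. ToyIssueTable and tryIssue are hypothetical names, and the rollback on failure is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <vector>

// Toy per-slot micro-op counter (illustrative only, not the LLVM class).
struct ToyIssueTable {
  int IssueWidth;
  std::vector<int> MopsPerSlot; // one counter per slot; size equals II

  ToyIssueTable(int II, int Width) : IssueWidth(Width), MopsPerSlot(II, 0) {
    assert(II > 0 && Width > 0);
  }

  // Map any cycle, including a negative one, to a slot in [0, II).
  int slotOf(int Cycle) const {
    int II = static_cast<int>(MopsPerSlot.size());
    int R = Cycle % II;
    return R < 0 ? R + II : R;
  }

  // Count one micro-op in each of NumMicroOps consecutive cycles.
  void adjust(int Cycle, int NumMicroOps, int Delta) {
    for (int I = 0; I < NumMicroOps; ++I)
      MopsPerSlot[slotOf(Cycle + I)] += Delta;
  }

  // Try to issue at Cycle; undo the counts if any slot exceeds the width.
  bool tryIssue(int Cycle, int NumMicroOps) {
    adjust(Cycle, NumMicroOps, +1);
    for (int N : MopsPerSlot)
      if (N > IssueWidth) {
        adjust(Cycle, NumMicroOps, -1);
        return false;
      }
    return true;
  }
};
```

With an issue width of 1 and II = 2, a two-micro-op instruction at cycle 0 fits because its micro-ops land in slots 0 and 1; counting both in the same cycle would have made it unschedulable at any II.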

Regarding the problem of getting scheduling classes, I understand that the resolveSchedClass interface should be used, but the problem spans the entire code and I would like to solve it in a separate patch. Is that OK?

Thanks for the info about unbuffered groups. I have not yet understood the details and will address it in a future patch if necessary.

Harbormaster completed remote builds in B186347: Diff 459680. Sep 13 2022, 6:12 AM

LGTM

In D133572#3786257, @ytmukai wrote:

<snip>

Thumb2 tests passed by disabling the restriction of issue width. (I had forgotten that the ARM swp tests are under Thumb2.)

That's fine for now; I may tweak them in a new patch to ensure they're still adequately testing what they were intended to test.

Regarding the problem of getting scheduling classes, I understand that the resolveSchedClass interface should be used, but the problem spans the entire code and I would like to solve it in a separate patch. Is that OK?

Sure. Let's make it soon, though. IIRC, when I was doing my changes, not resolving the class led to occasionally odd results for Cortex-M7.

Thanks for the info about unbuffered groups. I have not yet understood the details and will address it in a future patch if necessary.

Yeah, this all gets pretty hairy. Look at https://reviews.llvm.org/D98976 for a thread with me trying to figure out what it all means once there are resource groups in play.

This revision is now accepted and ready to land. Sep 13 2022, 8:20 AM

Refactoring
Deal with variants when getting scheduling classes

With the refactoring, the modifications to resolve variant scheduling classes could be made naturally. calculateResMIIDFA() is not modified yet, but the only user of the function, Hexagon, has an empty resolveSchedClass(), so it should not be a problem for the time being.

Harbormaster completed remote builds in B186540: Diff 459989. Sep 13 2022, 11:44 PM

LGTM. Thanks!

This revision was landed with ongoing or failed builds. Sep 15 2022, 6:26 PM

Closed by commit rG116838b1516a: [MachinePipeliner] Fix the interpretation of the scheduling model (authored by ytmukai).

This revision was automatically updated to reflect the committed changes.

ytmukai added a commit: rG116838b1516a: [MachinePipeliner] Fix the interpretation of the scheduling model.

Revision Contents

Path                                                Size
llvm/include/llvm/CodeGen/MachinePipeliner.h        81 lines
llvm/lib/CodeGen/MachinePipeliner.cpp               391 lines
llvm/test/CodeGen/PowerPC/sms-iterator.ll           2 lines
llvm/test/CodeGen/PowerPC/sms-phi-1.ll              2 lines
llvm/test/CodeGen/PowerPC/sms-phi-2.ll              2 lines
llvm/test/CodeGen/Thumb2/mve-pipelineloops.ll       2 lines
llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir      2 lines
llvm/test/CodeGen/Thumb2/swp-fixedii-le.mir         2 lines
llvm/test/CodeGen/Thumb2/swp-fixedii.mir            2 lines
llvm/test/CodeGen/Thumb2/swp-regpressure.mir        2 lines

Diff 460598

llvm/include/llvm/CodeGen/MachinePipeliner.h

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines

namespace llvm {		namespace llvm {

class AAResults;		class AAResults;
class NodeSet;		class NodeSet;
class SMSchedule;		class SMSchedule;

extern cl::opt<bool> SwpEnableCopyToPhi;		extern cl::opt<bool> SwpEnableCopyToPhi;
		extern cl::opt<int> SwpForceIssueWidth;

/// The main class in the implementation of the target independent		/// The main class in the implementation of the target independent
/// software pipeliner pass.		/// software pipeliner pass.
class MachinePipeliner : public MachineFunctionPass {		class MachinePipeliner : public MachineFunctionPass {
public:		public:
MachineFunction *MF = nullptr;		MachineFunction *MF = nullptr;
MachineOptimizationRemarkEmitter *ORE = nullptr;		MachineOptimizationRemarkEmitter *ORE = nullptr;
const MachineLoopInfo *MLI = nullptr;		const MachineLoopInfo *MLI = nullptr;
▲ Show 20 Lines • Show All 370 Lines • ▼ Show 20 Lines
// 16 was selected based on the number of ProcResource kinds for all		// 16 was selected based on the number of ProcResource kinds for all
// existing Subtargets, so that SmallVector don't need to resize too often.		// existing Subtargets, so that SmallVector don't need to resize too often.
static const int DefaultProcResSize = 16;		static const int DefaultProcResSize = 16;

class ResourceManager {		class ResourceManager {
private:		private:
const MCSubtargetInfo *STI;		const MCSubtargetInfo *STI;
const MCSchedModel &SM;		const MCSchedModel &SM;
		const TargetSubtargetInfo *ST;
		const TargetInstrInfo *TII;
		SwingSchedulerDAG *DAG;
const bool UseDFA;		const bool UseDFA;
std::unique_ptr<DFAPacketizer> DFAResources;		/// DFA resources for each slot
		llvm::SmallVector<std::unique_ptr<DFAPacketizer>> DFAResources;
		/// Modulo Reservation Table. When a resource with ID R is consumed in cycle
		/// C, it is counted in MRT[C mod II][R]. (Used when UseDFA == F)
		llvm::SmallVector<llvm::SmallVector<uint64_t, DefaultProcResSize>> MRT;
		/// The number of scheduled micro operations for each slot. Micro operations
		/// are assumed to be scheduled one per cycle, starting with the cycle in
		/// which the instruction is scheduled.
		llvm::SmallVector<int> NumScheduledMops;
/// Each processor resource is associated with a so-called processor resource		/// Each processor resource is associated with a so-called processor resource
/// mask. This vector allows to correlate processor resource IDs with		/// mask. This vector allows to correlate processor resource IDs with
/// processor resource masks. There is exactly one element per each processor		/// processor resource masks. There is exactly one element per each processor
/// resource declared by the scheduling model.		/// resource declared by the scheduling model.
llvm::SmallVector<uint64_t, DefaultProcResSize> ProcResourceMasks;		llvm::SmallVector<uint64_t, DefaultProcResSize> ProcResourceMasks;
		int InitiationInterval;
		/// The number of micro operations that can be scheduled at a cycle.
		int IssueWidth;

		int calculateResMIIDFA() const;
		/// Check if MRT is overbooked
		bool isOverbooked() const;
		/// Reserve resources on MRT
		void reserveResources(const MCSchedClassDesc *SCDesc, int Cycle);
		/// Unreserve resources on MRT
		void unreserveResources(const MCSchedClassDesc *SCDesc, int Cycle);

		/// Return M satisfying Dividend = Divisor * X + M, 0 < M < Divisor.
		/// The slot on MRT to reserve a resource for the cycle C is positiveModulo(C,
		/// II).
		int positiveModulo(int Dividend, int Divisor) const {
		assert(Divisor > 0);
		int R = Dividend % Divisor;
		if (R < 0)
		R += Divisor;
		return R;
		}

llvm::SmallVector<uint64_t, DefaultProcResSize> ProcResourceCount;		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
		LLVM_DUMP_METHOD void dumpMRT() const;
		#endif

public:		public:
ResourceManager(const TargetSubtargetInfo *ST)		ResourceManager(const TargetSubtargetInfo *ST, SwingSchedulerDAG *DAG)
: STI(ST), SM(ST->getSchedModel()), UseDFA(ST->useDFAforSMS()),		: STI(ST), SM(ST->getSchedModel()), ST(ST), TII(ST->getInstrInfo()),
		DAG(DAG), UseDFA(ST->useDFAforSMS()),
ProcResourceMasks(SM.getNumProcResourceKinds(), 0),		ProcResourceMasks(SM.getNumProcResourceKinds(), 0),
ProcResourceCount(SM.getNumProcResourceKinds(), 0) {		IssueWidth(SM.IssueWidth) {
if (UseDFA)
DFAResources.reset(ST->getInstrInfo()->CreateTargetScheduleState(*ST));
initProcResourceVectors(SM, ProcResourceMasks);		initProcResourceVectors(SM, ProcResourceMasks);
		if (IssueWidth <= 0)
		// If IssueWidth is not specified, set a sufficiently large value
		IssueWidth = 100;
		if (SwpForceIssueWidth > 0)
		IssueWidth = SwpForceIssueWidth;
}		}

void initProcResourceVectors(const MCSchedModel &SM,		void initProcResourceVectors(const MCSchedModel &SM,
SmallVectorImpl<uint64_t> &Masks);		SmallVectorImpl<uint64_t> &Masks);
/// Check if the resources occupied by a MCInstrDesc are available in
/// the current state.
bool canReserveResources(const MCInstrDesc *MID) const;

/// Reserve the resources occupied by a MCInstrDesc and change the current
/// state to reflect that change.
void reserveResources(const MCInstrDesc *MID);

/// Check if the resources occupied by a machine instruction are available		/// Check if the resources occupied by a machine instruction are available
/// in the current state.		/// in the current state.
bool canReserveResources(const MachineInstr &MI) const;		bool canReserveResources(SUnit &SU, int Cycle);

/// Reserve the resources occupied by a machine instruction and change the		/// Reserve the resources occupied by a machine instruction and change the
/// current state to reflect that change.		/// current state to reflect that change.
void reserveResources(const MachineInstr &MI);		void reserveResources(SUnit &SU, int Cycle);

/// Reset the state		int calculateResMII() const;
void clearResources();
		/// Initialize resources with the initiation interval II.
		void init(int II);
};		};

/// This class represents the scheduled code. The main data structure is a		/// This class represents the scheduled code. The main data structure is a
/// map from scheduled cycle to instructions. During scheduling, the		/// map from scheduled cycle to instructions. During scheduling, the
/// data structure explicitly represents all stages/iterations. When		/// data structure explicitly represents all stages/iterations. When
/// the algorithm finshes, the schedule is collapsed into a single stage,		/// the algorithm finshes, the schedule is collapsed into a single stage,
/// which represents instructions from different loop iterations.		/// which represents instructions from different loop iterations.
///		///
Show All 21 Lines	private:
const TargetSubtargetInfo &ST;		const TargetSubtargetInfo &ST;

/// Virtual register information.		/// Virtual register information.
MachineRegisterInfo &MRI;		MachineRegisterInfo &MRI;

ResourceManager ProcItinResources;		ResourceManager ProcItinResources;

public:		public:
SMSchedule(MachineFunction *mf)		SMSchedule(MachineFunction *mf, SwingSchedulerDAG *DAG)
: ST(mf->getSubtarget()), MRI(mf->getRegInfo()), ProcItinResources(&ST) {}		: ST(mf->getSubtarget()), MRI(mf->getRegInfo()),
		ProcItinResources(&ST, DAG) {}

void reset() {		void reset() {
ScheduledInstrs.clear();		ScheduledInstrs.clear();
InstrToCycle.clear();		InstrToCycle.clear();
FirstCycle = 0;		FirstCycle = 0;
LastCycle = 0;		LastCycle = 0;
InitiationInterval = 0;		InitiationInterval = 0;
}		}

/// Set the initiation interval for this schedule.		/// Set the initiation interval for this schedule.
void setInitiationInterval(int ii) { InitiationInterval = ii; }		void setInitiationInterval(int ii) {
		InitiationInterval = ii;
		ProcItinResources.init(ii);
		}

/// Return the initiation interval for this schedule.		/// Return the initiation interval for this schedule.
int getInitiationInterval() const { return InitiationInterval; }		int getInitiationInterval() const { return InitiationInterval; }

/// Return the first cycle in the completed schedule. This		/// Return the first cycle in the completed schedule. This
/// can be a negative value.		/// can be a negative value.
int getFirstCycle() const { return FirstCycle; }		int getFirstCycle() const { return FirstCycle; }

▲ Show 20 Lines • Show All 73 Lines • Show Last 20 Lines

llvm/lib/CodeGen/MachinePipeliner.cpp

Show All 37 Lines
#include "llvm/ADT/SetOperations.h"		#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
		#include "llvm/Analysis/CycleAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/DFAPacketizer.h"		#include "llvm/CodeGen/DFAPacketizer.h"
#include "llvm/CodeGen/LiveIntervals.h"		#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
Show All 25 Lines
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <climits>		#include <climits>
#include <cstdint>		#include <cstdint>
#include <deque>		#include <deque>
#include <functional>		#include <functional>
		#include <iomanip>
#include <iterator>		#include <iterator>
#include <map>		#include <map>
#include <memory>		#include <memory>
		#include <sstream>
#include <tuple>		#include <tuple>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "pipeliner"		#define DEBUG_TYPE "pipeliner"

Show All 18 Lines	static cl::opt<bool> EnableSWPOptSize("enable-pipeliner-opt-size",
cl::desc("Enable SWP at Os."), cl::Hidden,		cl::desc("Enable SWP at Os."), cl::Hidden,
cl::init(false));		cl::init(false));

/// A command line argument to limit minimum initial interval for pipelining.		/// A command line argument to limit minimum initial interval for pipelining.
static cl::opt<int> SwpMaxMii("pipeliner-max-mii",		static cl::opt<int> SwpMaxMii("pipeliner-max-mii",
cl::desc("Size limit for the MII."),		cl::desc("Size limit for the MII."),
cl::Hidden, cl::init(27));		cl::Hidden, cl::init(27));

		/// A command line argument to force pipeliner to use specified initial
		/// interval.
		static cl::opt<int> SwpForceII("pipeliner-force-ii",
		cl::desc("Force pipeliner to use specified II."),
		cl::Hidden, cl::init(-1));

/// A command line argument to limit the number of stages in the pipeline.		/// A command line argument to limit the number of stages in the pipeline.
static cl::opt<int>		static cl::opt<int>
SwpMaxStages("pipeliner-max-stages",		SwpMaxStages("pipeliner-max-stages",
cl::desc("Maximum stages allowed in the generated scheduled."),		cl::desc("Maximum stages allowed in the generated scheduled."),
cl::Hidden, cl::init(3));		cl::Hidden, cl::init(3));

/// A command line option to disable the pruning of chain dependences due to		/// A command line option to disable the pruning of chain dependences due to
/// an unrelated Phi.		/// an unrelated Phi.
Show All 35 Lines

namespace llvm {		namespace llvm {

// A command line option to enable the CopyToPhi DAG mutation.		// A command line option to enable the CopyToPhi DAG mutation.
cl::opt<bool> SwpEnableCopyToPhi("pipeliner-enable-copytophi", cl::ReallyHidden,		cl::opt<bool> SwpEnableCopyToPhi("pipeliner-enable-copytophi", cl::ReallyHidden,
cl::init(true),		cl::init(true),
cl::desc("Enable CopyToPhi DAG Mutation"));		cl::desc("Enable CopyToPhi DAG Mutation"));

		/// A command line argument to force pipeliner to use specified issue
		/// width.
		cl::opt<int> SwpForceIssueWidth(
		"pipeliner-force-issue-width",
		cl::desc("Force pipeliner to use specified issue width."), cl::Hidden,
		cl::init(-1));

} // end namespace llvm		} // end namespace llvm

unsigned SwingSchedulerDAG::Circuits::MaxPaths = 5;		unsigned SwingSchedulerDAG::Circuits::MaxPaths = 5;
char MachinePipeliner::ID = 0;		char MachinePipeliner::ID = 0;
#ifndef NDEBUG		#ifndef NDEBUG
int MachinePipeliner::NumTries = 0;		int MachinePipeliner::NumTries = 0;
#endif		#endif
char &llvm::MachinePipelinerID = MachinePipeliner::ID;		char &llvm::MachinePipelinerID = MachinePipeliner::ID;
▲ Show 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	void MachinePipeliner::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
AU.addRequired<LiveIntervals>();		AU.addRequired<LiveIntervals>();
AU.addRequired<MachineOptimizationRemarkEmitterPass>();		AU.addRequired<MachineOptimizationRemarkEmitterPass>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

void SwingSchedulerDAG::setMII(unsigned ResMII, unsigned RecMII) {		void SwingSchedulerDAG::setMII(unsigned ResMII, unsigned RecMII) {
if (II_setByPragma > 0)		if (SwpForceII > 0)
		MII = SwpForceII;
		else if (II_setByPragma > 0)
MII = II_setByPragma;		MII = II_setByPragma;
else		else
MII = std::max(ResMII, RecMII);		MII = std::max(ResMII, RecMII);
}		}

void SwingSchedulerDAG::setMAX_II() {		void SwingSchedulerDAG::setMAX_II() {
if (II_setByPragma > 0)		if (SwpForceII > 0)
		MAX_II = SwpForceII;
		else if (II_setByPragma > 0)
MAX_II = II_setByPragma;		MAX_II = II_setByPragma;
else		else
MAX_II = MII + 10;		MAX_II = MII + 10;
}		}

/// We override the schedule function in ScheduleDAGInstrs to implement the		/// We override the schedule function in ScheduleDAGInstrs to implement the
/// scheduling part of the Swing Modulo Scheduling algorithm.		/// scheduling part of the Swing Modulo Scheduling algorithm.
void SwingSchedulerDAG::schedule() {		void SwingSchedulerDAG::schedule() {
▲ Show 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	LLVM_DEBUG({
}		}
});		});

computeNodeOrder(NodeSets);		computeNodeOrder(NodeSets);

// check for node order issues		// check for node order issues
checkValidNodeOrder(Circuits);		checkValidNodeOrder(Circuits);

SMSchedule Schedule(Pass.MF);		SMSchedule Schedule(Pass.MF, this);
Scheduled = schedulePipeline(Schedule);		Scheduled = schedulePipeline(Schedule);

if (!Scheduled){		if (!Scheduled){
LLVM_DEBUG(dbgs() << "No schedule found, return\n");		LLVM_DEBUG(dbgs() << "No schedule found, return\n");
NumFailNoSchedule++;		NumFailNoSchedule++;
Pass.ORE->emit([&]() {		Pass.ORE->emit([&]() {
return MachineOptimizationRemarkAnalysis(		return MachineOptimizationRemarkAnalysis(
DEBUG_TYPE, "schedule", Loop.getStartLoc(), Loop.getHeader())		DEBUG_TYPE, "schedule", Loop.getStartLoc(), Loop.getHeader())
▲ Show 20 Lines • Show All 516 Lines • ▼ Show 20 Lines

/// Calculate the resource constrained minimum initiation interval for the		/// Calculate the resource constrained minimum initiation interval for the
/// specified loop. We use the DFA to model the resources needed for		/// specified loop. We use the DFA to model the resources needed for
/// each instruction, and we ignore dependences. A different DFA is created		/// each instruction, and we ignore dependences. A different DFA is created
/// for each cycle that is required. When adding a new instruction, we attempt		/// for each cycle that is required. When adding a new instruction, we attempt
/// to add it to each existing DFA, until a legal space is found. If the		/// to add it to each existing DFA, until a legal space is found. If the
/// instruction cannot be reserved in an existing DFA, we create a new one.		/// instruction cannot be reserved in an existing DFA, we create a new one.
unsigned SwingSchedulerDAG::calculateResMII() {		unsigned SwingSchedulerDAG::calculateResMII() {

LLVM_DEBUG(dbgs() << "calculateResMII:\n");		LLVM_DEBUG(dbgs() << "calculateResMII:\n");
SmallVector<ResourceManager*, 8> Resources;		ResourceManager RM(&MF.getSubtarget(), this);
MachineBasicBlock *MBB = Loop.getHeader();		return RM.calculateResMII();
Resources.push_back(new ResourceManager(&MF.getSubtarget()));

// Sort the instructions by the number of available choices for scheduling,
// least to most. Use the number of critical resources as the tie breaker.
FuncUnitSorter FUS = FuncUnitSorter(MF.getSubtarget());
for (MachineInstr &MI :
llvm::make_range(MBB->getFirstNonPHI(), MBB->getFirstTerminator()))
FUS.calcCriticalResources(MI);
PriorityQueue<MachineInstr *, std::vector<MachineInstr *>, FuncUnitSorter>
FuncUnitOrder(FUS);

for (MachineInstr &MI :
llvm::make_range(MBB->getFirstNonPHI(), MBB->getFirstTerminator()))
FuncUnitOrder.push(&MI);

while (!FuncUnitOrder.empty()) {
MachineInstr *MI = FuncUnitOrder.top();
FuncUnitOrder.pop();
if (TII->isZeroCost(MI->getOpcode()))
continue;
// Attempt to reserve the instruction in an existing DFA. At least one
// DFA is needed for each cycle.
unsigned NumCycles = getSUnit(MI)->Latency;
unsigned ReservedCycles = 0;
SmallVectorImpl<ResourceManager *>::iterator RI = Resources.begin();
SmallVectorImpl<ResourceManager *>::iterator RE = Resources.end();
LLVM_DEBUG({
dbgs() << "Trying to reserve resource for " << NumCycles
<< " cycles for \n";
MI->dump();
});
for (unsigned C = 0; C < NumCycles; ++C)
while (RI != RE) {
if ((*RI)->canReserveResources(*MI)) {
(*RI)->reserveResources(*MI);
++ReservedCycles;
break;
}
RI++;
}
LLVM_DEBUG(dbgs() << "ReservedCycles:" << ReservedCycles
<< ", NumCycles:" << NumCycles << "\n");
// Add new DFAs, if needed, to reserve resources.
for (unsigned C = ReservedCycles; C < NumCycles; ++C) {
LLVM_DEBUG(if (SwpDebugResource) dbgs()
<< "NewResource created to reserve resources"
<< "\n");
ResourceManager *NewResource = new ResourceManager(&MF.getSubtarget());
assert(NewResource->canReserveResources(*MI) && "Reserve error.");
NewResource->reserveResources(*MI);
Resources.push_back(NewResource);
}
}
int Resmii = Resources.size();
LLVM_DEBUG(dbgs() << "Return Res MII:" << Resmii << "\n");
// Delete the memory for each of the DFAs that were created earlier.
for (ResourceManager *RI : Resources) {
ResourceManager *D = RI;
delete D;
}
Resources.clear();
return Resmii;
}		}

/// Calculate the recurrence-constrainted minimum initiation interval.		/// Calculate the recurrence-constrainted minimum initiation interval.
/// Iterate over each circuit. Compute the delay(c) and distance(c)		/// Iterate over each circuit. Compute the delay(c) and distance(c)
/// for each circuit. The II needs to satisfy the inequality		/// for each circuit. The II needs to satisfy the inequality
/// delay(c) - II*distance(c) <= 0. For each circuit, choose the smallest		/// delay(c) - II*distance(c) <= 0. For each circuit, choose the smallest
/// II that satisfies the inequality, and the RecMII is the maximum		/// II that satisfies the inequality, and the RecMII is the maximum
/// of those values.		/// of those values.
▲ Show 20 Lines • Show All 1,200 Lines • ▼ Show 20 Lines	bool SMSchedule::insert(SUnit *SU, int StartCycle, int EndCycle, int II) {
if (StartCycle > EndCycle)		if (StartCycle > EndCycle)
forward = false;		forward = false;

// The terminating condition depends on the direction.		// The terminating condition depends on the direction.
int termCycle = forward ? EndCycle + 1 : EndCycle - 1;		int termCycle = forward ? EndCycle + 1 : EndCycle - 1;
for (int curCycle = StartCycle; curCycle != termCycle;		for (int curCycle = StartCycle; curCycle != termCycle;
forward ? ++curCycle : --curCycle) {		forward ? ++curCycle : --curCycle) {

// Add the already scheduled instructions at the specified cycle to the
// DFA.
ProcItinResources.clearResources();
for (int checkCycle = FirstCycle + ((curCycle - FirstCycle) % II);
checkCycle <= LastCycle; checkCycle += II) {
std::deque<SUnit *> &cycleInstrs = ScheduledInstrs[checkCycle];

for (SUnit *CI : cycleInstrs) {
if (ST.getInstrInfo()->isZeroCost(CI->getInstr()->getOpcode()))
continue;
assert(ProcItinResources.canReserveResources(*CI->getInstr()) &&
"These instructions have already been scheduled.");
ProcItinResources.reserveResources(*CI->getInstr());
}
}
if (ST.getInstrInfo()->isZeroCost(SU->getInstr()->getOpcode()) ||		if (ST.getInstrInfo()->isZeroCost(SU->getInstr()->getOpcode()) ||
ProcItinResources.canReserveResources(*SU->getInstr())) {		ProcItinResources.canReserveResources(*SU, curCycle)) {
LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "\tinsert at cycle " << curCycle << " ";		dbgs() << "\tinsert at cycle " << curCycle << " ";
SU->getInstr()->dump();		SU->getInstr()->dump();
});		});

		if (!ST.getInstrInfo()->isZeroCost(SU->getInstr()->getOpcode()))
		ProcItinResources.reserveResources(*SU, curCycle);
ScheduledInstrs[curCycle].push_back(SU);		ScheduledInstrs[curCycle].push_back(SU);
InstrToCycle.insert(std::make_pair(SU, curCycle));		InstrToCycle.insert(std::make_pair(SU, curCycle));
if (curCycle > LastCycle)		if (curCycle > LastCycle)
LastCycle = curCycle;		LastCycle = curCycle;
if (curCycle < FirstCycle)		if (curCycle < FirstCycle)
FirstCycle = curCycle;		FirstCycle = curCycle;
return true;		return true;
}		}
▲ Show 20 Lines • Show All 612 Lines • ▼ Show 20 Lines	for (int cycle = getFirstCycle(); cycle <= getFinalCycle(); ++cycle) {
}		}
}		}
}		}

/// Utility function used for debugging to print the schedule.		/// Utility function used for debugging to print the schedule.
LLVM_DUMP_METHOD void SMSchedule::dump() const { print(dbgs()); }		LLVM_DUMP_METHOD void SMSchedule::dump() const { print(dbgs()); }
LLVM_DUMP_METHOD void NodeSet::dump() const { print(dbgs()); }		LLVM_DUMP_METHOD void NodeSet::dump() const { print(dbgs()); }

		void ResourceManager::dumpMRT() const {
		LLVM_DEBUG({
		if (UseDFA)
		return;
		std::stringstream SS;
		SS << "MRT:\n";
		SS << std::setw(4) << "Slot";
		for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I)
		SS << std::setw(3) << I;
		SS << std::setw(7) << "#Mops"
		<< "\n";
		for (int Slot = 0; Slot < InitiationInterval; ++Slot) {
		SS << std::setw(4) << Slot;
		for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I)
		SS << std::setw(3) << MRT[Slot][I];
		SS << std::setw(7) << NumScheduledMops[Slot] << "\n";
		}
		dbgs() << SS.str();
		});
		}
#endif		#endif

void ResourceManager::initProcResourceVectors(		void ResourceManager::initProcResourceVectors(
const MCSchedModel &SM, SmallVectorImpl<uint64_t> &Masks) {		const MCSchedModel &SM, SmallVectorImpl<uint64_t> &Masks) {
unsigned ProcResourceID = 0;		unsigned ProcResourceID = 0;

// We currently limit the resource kinds to 64 and below so that we can use		// We currently limit the resource kinds to 64 and below so that we can use
// uint64_t for Masks		// uint64_t for Masks
[... 28 lines not shown ...]
		if (SwpShowResMask) {
ProcResource->Name, I, Masks[I],		ProcResource->Name, I, Masks[I],
ProcResource->NumUnits);		ProcResource->NumUnits);
}		}
dbgs() << " -----------------\n";		dbgs() << " -----------------\n";
}		}
});		});
}		}

bool ResourceManager::canReserveResources(const MCInstrDesc *MID) const {		bool ResourceManager::canReserveResources(SUnit &SU, int Cycle) {

LLVM_DEBUG({		LLVM_DEBUG({
if (SwpDebugResource)		if (SwpDebugResource)
dbgs() << "canReserveResources:\n";		dbgs() << "canReserveResources:\n";
});		});
if (UseDFA)		if (UseDFA)
return DFAResources->canReserveResources(MID);		return DFAResources[positiveModulo(Cycle, InitiationInterval)]
		->canReserveResources(&SU.getInstr()->getDesc());

unsigned InsnClass = MID->getSchedClass();		const MCSchedClassDesc *SCDesc = DAG->getSchedClass(&SU);
const MCSchedClassDesc *SCDesc = SM.getSchedClassDesc(InsnClass);
if (!SCDesc->isValid()) {		if (!SCDesc->isValid()) {
LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "No valid Schedule Class Desc for schedClass!\n";		dbgs() << "No valid Schedule Class Desc for schedClass!\n";
dbgs() << "isPseudo:" << MID->isPseudo() << "\n";		dbgs() << "isPseudo:" << SU.getInstr()->isPseudo() << "\n";
});		});
return true;		return true;
}		}

const MCWriteProcResEntry *I = STI->getWriteProcResBegin(SCDesc);		reserveResources(SCDesc, Cycle);
const MCWriteProcResEntry *E = STI->getWriteProcResEnd(SCDesc);		bool Result = !isOverbooked();
		[Inline comment by dpenry: Will need to adjust for use of NumMicroOps as described in later comments.]
for (; I != E; ++I) {		unreserveResources(SCDesc, Cycle);
if (!I->Cycles)
continue;		LLVM_DEBUG(if (SwpDebugResource) dbgs() << "return " << Result << "\n\n";);
const MCProcResourceDesc *ProcResource =		return Result;
SM.getProcResource(I->ProcResourceIdx);
unsigned NumUnits = ProcResource->NumUnits;
LLVM_DEBUG({
if (SwpDebugResource)
dbgs() << format(" %16s(%2d): Count: %2d, NumUnits:%2d, Cycles:%2d\n",
ProcResource->Name, I->ProcResourceIdx,
ProcResourceCount[I->ProcResourceIdx], NumUnits,
I->Cycles);
});
if (ProcResourceCount[I->ProcResourceIdx] >= NumUnits)
return false;
}
LLVM_DEBUG(if (SwpDebugResource) dbgs() << "return true\n\n";);
return true;
}		}

void ResourceManager::reserveResources(const MCInstrDesc *MID) {		void ResourceManager::reserveResources(SUnit &SU, int Cycle) {
LLVM_DEBUG({		LLVM_DEBUG({
if (SwpDebugResource)		if (SwpDebugResource)
dbgs() << "reserveResources:\n";		dbgs() << "reserveResources:\n";
});		});
if (UseDFA)		if (UseDFA)
return DFAResources->reserveResources(MID);		return DFAResources[positiveModulo(Cycle, InitiationInterval)]
		->reserveResources(&SU.getInstr()->getDesc());

unsigned InsnClass = MID->getSchedClass();		const MCSchedClassDesc *SCDesc = DAG->getSchedClass(&SU);
const MCSchedClassDesc *SCDesc = SM.getSchedClassDesc(InsnClass);
if (!SCDesc->isValid()) {		if (!SCDesc->isValid()) {
LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "No valid Schedule Class Desc for schedClass!\n";		dbgs() << "No valid Schedule Class Desc for schedClass!\n";
dbgs() << "isPseudo:" << MID->isPseudo() << "\n";		dbgs() << "isPseudo:" << SU.getInstr()->isPseudo() << "\n";
});		});
return;		return;
}		}

		reserveResources(SCDesc, Cycle);
		[Inline comment by dpenry: Should increment by SCDesc->NumMicroOps (see comment below while calculating ResMII).]

		LLVM_DEBUG({
		if (SwpDebugResource) {
		dumpMRT();
		dbgs() << "reserveResources: done!\n\n";
		}
		});
		}

		void ResourceManager::reserveResources(const MCSchedClassDesc *SCDesc,
		int Cycle) {
		assert(!UseDFA);
		for (const MCWriteProcResEntry &PRE : make_range(
		STI->getWriteProcResBegin(SCDesc), STI->getWriteProcResEnd(SCDesc)))
		for (int C = Cycle; C < Cycle + PRE.Cycles; ++C)
		++MRT[positiveModulo(C, InitiationInterval)][PRE.ProcResourceIdx];

		for (int C = Cycle; C < Cycle + SCDesc->NumMicroOps; ++C)
		++NumScheduledMops[positiveModulo(C, InitiationInterval)];
		}

		void ResourceManager::unreserveResources(const MCSchedClassDesc *SCDesc,
		int Cycle) {
		assert(!UseDFA);
		for (const MCWriteProcResEntry &PRE : make_range(
		STI->getWriteProcResBegin(SCDesc), STI->getWriteProcResEnd(SCDesc)))
		for (int C = Cycle; C < Cycle + PRE.Cycles; ++C)
		--MRT[positiveModulo(C, InitiationInterval)][PRE.ProcResourceIdx];

		for (int C = Cycle; C < Cycle + SCDesc->NumMicroOps; ++C)
		--NumScheduledMops[positiveModulo(C, InitiationInterval)];
		}

		bool ResourceManager::isOverbooked() const {
		assert(!UseDFA);
		for (int Slot = 0; Slot < InitiationInterval; ++Slot) {
		for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I) {
		const MCProcResourceDesc *Desc = SM.getProcResource(I);
		if (MRT[Slot][I] > Desc->NumUnits)
		return true;
		}
		if (NumScheduledMops[Slot] > IssueWidth)
		return true;
		}
		return false;
		}
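
The reserve / unreserve / isOverbooked trio above can be tried outside LLVM with a small standalone sketch. `ToyMRT` and `ToyResUse` are hypothetical names made up for this illustration and are not part of the patch. The sketch assumes, as the summary states, that a resource with `Cycles = N` occupies one unit in each of N consecutive slots starting at the scheduled cycle, wrapping modulo the initiation interval. Issue-width bookkeeping is omitted, and resource indices are zero-based here, whereas the patch skips index 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one MCWriteProcResEntry.
struct ToyResUse {
  unsigned ResIdx; // which resource kind is used
  int Cycles;      // for how many consecutive cycles
};

// Modulo that always lands in [0, Divisor), even for negative cycles.
static int positiveModulo(int Dividend, int Divisor) {
  assert(Divisor > 0);
  int R = Dividend % Divisor;
  return R < 0 ? R + Divisor : R;
}

// Toy modulo reservation table: one counter per (slot, resource kind).
class ToyMRT {
  int II;
  std::vector<unsigned> NumUnits;
  std::vector<std::vector<std::uint64_t>> Table;

public:
  ToyMRT(int Interval, std::vector<unsigned> Units)
      : II(Interval), NumUnits(std::move(Units)),
        Table(Interval, std::vector<std::uint64_t>(NumUnits.size(), 0)) {}

  void reserve(const std::vector<ToyResUse> &Uses, int Cycle) {
    for (const ToyResUse &U : Uses)
      for (int C = Cycle; C < Cycle + U.Cycles; ++C)
        ++Table[positiveModulo(C, II)][U.ResIdx];
  }

  void unreserve(const std::vector<ToyResUse> &Uses, int Cycle) {
    for (const ToyResUse &U : Uses)
      for (int C = Cycle; C < Cycle + U.Cycles; ++C)
        --Table[positiveModulo(C, II)][U.ResIdx];
  }

  // True if any slot uses more units of a resource than exist.
  bool isOverbooked() const {
    for (const auto &Slot : Table)
      for (std::size_t I = 0; I < NumUnits.size(); ++I)
        if (Slot[I] > NumUnits[I])
          return true;
    return false;
  }

  // Tentatively reserve, test for overbooking, then roll back.
  bool canReserve(const std::vector<ToyResUse> &Uses, int Cycle) {
    reserve(Uses, Cycle);
    bool Ok = !isOverbooked();
    unreserve(Uses, Cycle);
    return Ok;
  }
};
```

With II = 2 and a single-unit resource, reserving a two-cycle use at cycle 0 fills both slots, so a further one-cycle use of the same resource cannot be placed at any cycle, including negative ones such as -1, which maps to slot 1.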

		int ResourceManager::calculateResMIIDFA() const {
		assert(UseDFA);

		// Sort the instructions by the number of available choices for scheduling,
		// least to most. Use the number of critical resources as the tie breaker.
		FuncUnitSorter FUS = FuncUnitSorter(*ST);
		for (SUnit &SU : DAG->SUnits)
		FUS.calcCriticalResources(*SU.getInstr());
		PriorityQueue<MachineInstr *, std::vector<MachineInstr *>, FuncUnitSorter>
		FuncUnitOrder(FUS);

		for (SUnit &SU : DAG->SUnits)
		FuncUnitOrder.push(SU.getInstr());

		SmallVector<std::unique_ptr<DFAPacketizer>, 8> Resources;
		Resources.push_back(
		std::unique_ptr<DFAPacketizer>(TII->CreateTargetScheduleState(*ST)));

		while (!FuncUnitOrder.empty()) {
		MachineInstr *MI = FuncUnitOrder.top();
		FuncUnitOrder.pop();
		if (TII->isZeroCost(MI->getOpcode()))
		continue;

		// Attempt to reserve the instruction in an existing DFA. At least one
		// DFA is needed for each cycle.
		unsigned NumCycles = DAG->getSUnit(MI)->Latency;
		unsigned ReservedCycles = 0;
		auto *RI = Resources.begin();
		auto *RE = Resources.end();
		LLVM_DEBUG({
		dbgs() << "Trying to reserve resource for " << NumCycles
		<< " cycles for \n";
		MI->dump();
		});
		for (unsigned C = 0; C < NumCycles; ++C)
		while (RI != RE) {
		if ((*RI)->canReserveResources(*MI)) {
		(*RI)->reserveResources(*MI);
		++ReservedCycles;
		break;
		}
		RI++;
		}
		LLVM_DEBUG(dbgs() << "ReservedCycles:" << ReservedCycles
		<< ", NumCycles:" << NumCycles << "\n");
		// Add new DFAs, if needed, to reserve resources.
		for (unsigned C = ReservedCycles; C < NumCycles; ++C) {
		LLVM_DEBUG(if (SwpDebugResource) dbgs()
		<< "NewResource created to reserve resources"
		<< "\n");
		auto *NewResource = TII->CreateTargetScheduleState(*ST);
		assert(NewResource->canReserveResources(*MI) && "Reserve error.");
		NewResource->reserveResources(*MI);
		Resources.push_back(std::unique_ptr<DFAPacketizer>(NewResource));
		}
		}

		int Resmii = Resources.size();
		LLVM_DEBUG(dbgs() << "Return Res MII:" << Resmii << "\n");
		return Resmii;
		}

		int ResourceManager::calculateResMII() const {
		if (UseDFA)
		return calculateResMIIDFA();

		// Count each resource consumption and divide it by the number of units.
		// ResMII is the max value among them.

		int NumMops = 0;
		SmallVector<uint64_t> ResourceCount(SM.getNumProcResourceKinds());
		for (SUnit &SU : DAG->SUnits) {
		if (TII->isZeroCost(SU.getInstr()->getOpcode()))
		continue;

		const MCSchedClassDesc *SCDesc = DAG->getSchedClass(&SU);
		if (!SCDesc->isValid())
		[Inline comment by dpenry: Use SM.resolveSchedClass here so that variant scheduling classes can be resolved.]
		continue;

		LLVM_DEBUG({
		if (SwpDebugResource) {
		DAG->dumpNode(SU);
		[Inline comment by dpenry: Increment by SCDesc->NumMicroOps, as NumMicroOps represents the amount of issue width taken by the instruction (at least that's how MachineScheduler interprets it).]
		dbgs() << " #Mops: " << SCDesc->NumMicroOps << "\n"
		<< " WriteProcRes: ";
		}
		});
		NumMops += SCDesc->NumMicroOps;
for (const MCWriteProcResEntry &PRE :		for (const MCWriteProcResEntry &PRE :
make_range(STI->getWriteProcResBegin(SCDesc),		make_range(STI->getWriteProcResBegin(SCDesc),
STI->getWriteProcResEnd(SCDesc))) {		STI->getWriteProcResEnd(SCDesc))) {
if (!PRE.Cycles)
continue;
++ProcResourceCount[PRE.ProcResourceIdx];
LLVM_DEBUG({		LLVM_DEBUG({
if (SwpDebugResource) {		if (SwpDebugResource) {
const MCProcResourceDesc *ProcResource =		const MCProcResourceDesc *Desc =
SM.getProcResource(PRE.ProcResourceIdx);		SM.getProcResource(PRE.ProcResourceIdx);
dbgs() << format(" %16s(%2d): Count: %2d, NumUnits:%2d, Cycles:%2d\n",		dbgs() << Desc->Name << ": " << PRE.Cycles << ", ";
ProcResource->Name, PRE.ProcResourceIdx,
ProcResourceCount[PRE.ProcResourceIdx],
ProcResource->NumUnits, PRE.Cycles);
}		}
});		});
		ResourceCount[PRE.ProcResourceIdx] += PRE.Cycles;
		}
		LLVM_DEBUG(if (SwpDebugResource) dbgs() << "\n");
}		}

		int Result = (NumMops + IssueWidth - 1) / IssueWidth;
LLVM_DEBUG({		LLVM_DEBUG({
if (SwpDebugResource)		if (SwpDebugResource)
dbgs() << "reserveResources: done!\n\n";		dbgs() << "#Mops: " << NumMops << ", "
		<< "IssueWidth: " << IssueWidth << ", "
		<< "Cycles: " << Result << "\n";
});		});
}

bool ResourceManager::canReserveResources(const MachineInstr &MI) const {		LLVM_DEBUG({
return canReserveResources(&MI.getDesc());		if (SwpDebugResource) {
}		std::stringstream SS;
		SS << std::setw(2) << "ID" << std::setw(16) << "Name" << std::setw(10)
void ResourceManager::reserveResources(const MachineInstr &MI) {		<< "Units" << std::setw(10) << "Consumed" << std::setw(10) << "Cycles"
return reserveResources(&MI.getDesc());		<< "\n";
		dbgs() << SS.str();
}		}
		});
void ResourceManager::clearResources() {		for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I) {
if (UseDFA)		const MCProcResourceDesc *Desc = SM.getProcResource(I);
return DFAResources->clearResources();		int Cycles = (ResourceCount[I] + Desc->NumUnits - 1) / Desc->NumUnits;
std::fill(ProcResourceCount.begin(), ProcResourceCount.end(), 0);		LLVM_DEBUG({
		if (SwpDebugResource) {
		std::stringstream SS;
		SS << std::setw(2) << I << std::setw(16) << Desc->Name << std::setw(10)
		<< Desc->NumUnits << std::setw(10) << ResourceCount[I]
		<< std::setw(10) << Cycles << "\n";
		dbgs() << SS.str();
		}
		});
		if (Cycles > Result)
		Result = Cycles;
		}
		return Result;
		}
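
The non-DFA ResMII computation above boils down to two ceiling divisions and a maximum. The following is a minimal standalone sketch of that arithmetic, not the LLVM code itself: `ToyWrite`, `ToyInstr`, `ceilDiv`, and `toyResMII` are hypothetical names introduced only for illustration, and resource indices are zero-based here, whereas the patch iterates from index 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one MCWriteProcResEntry.
struct ToyWrite {
  unsigned ResIdx;
  unsigned Cycles;
};

// Hypothetical stand-in for an instruction's scheduling class.
struct ToyInstr {
  unsigned NumMicroOps;
  std::vector<ToyWrite> Writes;
};

// Ceiling division for unsigned values; Den must be nonzero.
static std::uint64_t ceilDiv(std::uint64_t Num, std::uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// ResMII = max(ceil(total micro-ops / issue width),
//              max over resources of ceil(total cycles / units)).
static std::uint64_t toyResMII(const std::vector<ToyInstr> &Loop,
                               const std::vector<unsigned> &NumUnits,
                               unsigned IssueWidth) {
  std::uint64_t NumMops = 0;
  std::vector<std::uint64_t> ResourceCount(NumUnits.size(), 0);
  for (const ToyInstr &I : Loop) {
    NumMops += I.NumMicroOps;
    for (const ToyWrite &W : I.Writes)
      ResourceCount[W.ResIdx] += W.Cycles; // total "Cycles", not "Latency"
  }

  std::uint64_t Result = ceilDiv(NumMops, IssueWidth);
  for (std::size_t R = 0; R < NumUnits.size(); ++R)
    Result = std::max(Result, ceilDiv(ResourceCount[R], NumUnits[R]));
  return Result;
}
```

For example, three single-micro-op instructions that together consume 5 cycles of a 2-unit resource give ceil(5/2) = 3 from the resource bound and ceil(3/8) = 1 from an issue width of 8, so the sketch returns 3.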

		void ResourceManager::init(int II) {
		InitiationInterval = II;
		DFAResources.clear();
		DFAResources.resize(II);
		for (auto &I : DFAResources)
		I.reset(ST->getInstrInfo()->CreateTargetScheduleState(*ST));
		MRT.clear();
		MRT.resize(II, SmallVector<uint64_t>(SM.getNumProcResourceKinds()));
		NumScheduledMops.clear();
		NumScheduledMops.resize(II);
}		}

llvm/test/CodeGen/PowerPC/sms-iterator.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\			; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\
	; RUN: -mcpu=pwr9 --ppc-enable-pipeliner -debug-only=pipeliner 2>&1 \			; RUN: -mcpu=pwr9 --ppc-enable-pipeliner -debug-only=pipeliner 2>&1 \
	; RUN: >/dev/null | FileCheck %s			; RUN: >/dev/null | FileCheck %s

	%0 = type { i32, [16 x double] }			%0 = type { i32, [16 x double] }

	; CHECK: MII = 8 MAX_II = 18			; CHECK: MII = 3 MAX_II = 13 (rec=3, res=2)

	define dso_local fastcc double @_ZN3povL9polysolveEiPdS0_() unnamed_addr #0 {			define dso_local fastcc double @_ZN3povL9polysolveEiPdS0_() unnamed_addr #0 {
	br label %1			br label %1

	1: ; preds = %1, %0			1: ; preds = %1, %0
	br i1 undef, label %2, label %1			br i1 undef, label %2, label %1

	2: ; preds = %1			2: ; preds = %1
	[... 20 lines not shown ...]

llvm/test/CodeGen/PowerPC/sms-phi-1.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\			; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\
	; RUN: -mcpu=pwr9 --ppc-enable-pipeliner 2>&1 | FileCheck %s			; RUN: -mcpu=pwr9 --ppc-enable-pipeliner --pipeliner-force-ii=4 2>&1 | FileCheck %s

	define void @main() nounwind #0 {			define void @main() nounwind #0 {
	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mflr 0			; CHECK-NEXT: mflr 0
	; CHECK-NEXT: std 30, -16(1) # 8-byte Folded Spill			; CHECK-NEXT: std 30, -16(1) # 8-byte Folded Spill
	; CHECK-NEXT: std 0, 16(1)			; CHECK-NEXT: std 0, 16(1)
	; CHECK-NEXT: stdu 1, -48(1)			; CHECK-NEXT: stdu 1, -48(1)
	[... 53 lines not shown ...]

llvm/test/CodeGen/PowerPC/sms-phi-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\			; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -verify-machineinstrs\
	; RUN: -mcpu=pwr9 --ppc-enable-pipeliner 2>&1 | FileCheck %s			; RUN: -mcpu=pwr9 --ppc-enable-pipeliner --pipeliner-force-ii=15 2>&1 | FileCheck %s

	define void @phi2(i32, i32, i8*) local_unnamed_addr {			define void @phi2(i32, i32, i8*) local_unnamed_addr {
	; CHECK-LABEL: phi2:			; CHECK-LABEL: phi2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: divw 8, 3, 4			; CHECK-NEXT: divw 8, 3, 4
	; CHECK-NEXT: li 5, 55			; CHECK-NEXT: li 5, 55
	; CHECK-NEXT: li 6, 48			; CHECK-NEXT: li 6, 48
	; CHECK-NEXT: mtctr 3			; CHECK-NEXT: mtctr 3
	[... 58 lines not shown ...]

llvm/test/CodeGen/Thumb2/mve-pipelineloops.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -O3 -mattr=+mve.fp,+use-mipipeliner -mcpu=cortex-m55 %s -o - -verify-machineinstrs | FileCheck %s			; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -O3 -mattr=+mve.fp,+use-mipipeliner -mcpu=cortex-m55 %s -o - -verify-machineinstrs --pipeliner-force-issue-width=10 | FileCheck %s

	define void @arm_cmplx_dot_prod_q15(ptr noundef %pSrcA, ptr noundef %pSrcB, i32 noundef %numSamples, ptr nocapture noundef writeonly %realResult, ptr nocapture noundef writeonly %imagResult) {			define void @arm_cmplx_dot_prod_q15(ptr noundef %pSrcA, ptr noundef %pSrcB, i32 noundef %numSamples, ptr nocapture noundef writeonly %realResult, ptr nocapture noundef writeonly %imagResult) {
	; CHECK-LABEL: arm_cmplx_dot_prod_q15:			; CHECK-LABEL: arm_cmplx_dot_prod_q15:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: .save {r4, r5, r6, r7, lr}			; CHECK-NEXT: .save {r4, r5, r6, r7, lr}
	; CHECK-NEXT: push {r4, r5, r6, r7, lr}			; CHECK-NEXT: push {r4, r5, r6, r7, lr}
	; CHECK-NEXT: .pad #4			; CHECK-NEXT: .pad #4
	; CHECK-NEXT: sub sp, #4			; CHECK-NEXT: sub sp, #4
	[... 264 lines not shown ...]

llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=thumbv7m-none-eabi -mcpu=cortex-m7 -run-pass=pipeliner -o - %s | FileCheck %s --check-prefix=CHECK			# RUN: llc -mtriple=thumbv7m-none-eabi -mcpu=cortex-m7 -run-pass=pipeliner --pipeliner-force-issue-width=10 -o - %s | FileCheck %s --check-prefix=CHECK

	--- |			--- |
	define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {			define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {
	entry:			entry:
	%cmp8 = icmp sgt i32 %sz, 0			%cmp8 = icmp sgt i32 %sz, 0
	br i1 %cmp8, label %for.body.preheader, label %for.end			br i1 %cmp8, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	[... 193 lines not shown ...]

llvm/test/CodeGen/Thumb2/swp-fixedii-le.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=thumbv8.1m.main-none-eabi -mcpu=cortex-m85 -mattr=+use-mipipeliner -run-pass=pipeliner -o - %s | FileCheck %s --check-prefix=CHECK			# RUN: llc -mtriple=thumbv8.1m.main-none-eabi -mcpu=cortex-m85 -mattr=+use-mipipeliner -run-pass=pipeliner --pipeliner-force-issue-width=10 -o - %s | FileCheck %s --check-prefix=CHECK

	--- |			--- |
	define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {			define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {
	entry:			entry:
	%cmp8 = icmp sgt i32 %sz, 0			%cmp8 = icmp sgt i32 %sz, 0
	br i1 %cmp8, label %for.body.preheader, label %for.end			br i1 %cmp8, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	[... 175 lines not shown ...]

llvm/test/CodeGen/Thumb2/swp-fixedii.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=thumbv8.1m.main-none-eabi -mcpu=cortex-m85 -mattr=+use-mipipeliner -run-pass=pipeliner -o - %s | FileCheck %s --check-prefix=CHECK			# RUN: llc -mtriple=thumbv8.1m.main-none-eabi -mcpu=cortex-m85 -mattr=+use-mipipeliner -run-pass=pipeliner --pipeliner-force-issue-width=10 -o - %s | FileCheck %s --check-prefix=CHECK

	--- |			--- |
	define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {			define hidden float @dot(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {
	entry:			entry:
	%cmp8 = icmp sgt i32 %sz, 0			%cmp8 = icmp sgt i32 %sz, 0
	br i1 %cmp8, label %for.body.preheader, label %for.end			br i1 %cmp8, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	[... 193 lines not shown ...]

llvm/test/CodeGen/Thumb2/swp-regpressure.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=thumbv7m-none-eabi -mcpu=cortex-m7 -run-pass=pipeliner -o - %s | FileCheck %s --check-prefix=CHECK			# RUN: llc -mtriple=thumbv7m-none-eabi -mcpu=cortex-m7 -run-pass=pipeliner --pipeliner-force-issue-width=10 -o - %s | FileCheck %s --check-prefix=CHECK

	# This test checks that too much register pressure will cause the modulo			# This test checks that too much register pressure will cause the modulo
	# schedule to be rejected and that a test with the same resource usage			# schedule to be rejected and that a test with the same resource usage
	# but without register pressure is not rejected.			# but without register pressure is not rejected.

	--- |			--- |
	define hidden float @high_pressure(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {			define hidden float @high_pressure(float* nocapture noundef readonly %a, float* nocapture noundef readonly %b, i32 noundef %sz) local_unnamed_addr #0 {
	entry:			entry:
	[... 518 lines not shown ...]


Diff 460598
