This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add GCNMaxILPSchedStrategy
ClosedPublic

Authored by kerbowa on Jul 31 2022, 11:26 PM.

Details

Reviewers

rampitec
vpykhtin
vangthao95
jrbyrnes

Commits

rGd7100b398b76: [AMDGPU] Add GCNMaxILPSchedStrategy

Summary

Creates a new scheduling strategy that attempts to maximize ILP for a single
wave.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kerbowa created this revision. Jul 31 2022, 11:26 PM

Herald added a project: Restricted Project. Jul 31 2022, 11:26 PM

Herald added subscribers: kosarev, jsilvanus, foad and 9 others.

kerbowa requested review of this revision. Jul 31 2022, 11:26 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B178502: Diff 448922. Aug 1 2022, 12:15 AM

foad added inline comments. Aug 1 2022, 7:03 AM

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll
2–3	Why has this changed?

kerbowa added inline comments. Aug 1 2022, 8:18 AM

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll
2–3	I renamed the iterative scheduler cl flags to have this "iterative" prefix. Mostly for clarity and to avoid confusion with this scheduling strategy that is being added in this patch.

Is scheduling for maximum ILP the same thing as scheduling for minimum latency?

Does this patch have anything in common with lib/Target/AMDGPU/GCNILPSched.cpp? (Is that even maintained?)

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll
2–3	Oh I see. I somehow missed that change in AMDGPUTargetMachine.cpp.

In D130869#3691259, @foad wrote:

Is scheduling for maximum ILP the same thing as scheduling for minimum latency?

Yes, it's the same.
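To make the "maximum ILP means minimum latency" point concrete, here is a minimal sketch of an ordered tie-breaking cascade in the spirit of GCNMaxILPSchedStrategy::tryCandidate from the diff. It is not the LLVM implementation: the Candidate struct, the Reason enum, and the single PathHeight field are hypothetical simplifications, and only a few of the heuristics are modelled. What it does take from the patch is the ordering: spill-inducing excess pressure is checked first, latency is tried unconditionally, and critical register pressure is only consulted afterwards.

```cpp
#include <cassert>

// Hypothetical, simplified candidate. The real SchedCandidate carries far
// more state (resource deltas, pressure deltas, cluster info, ...).
struct Candidate {
  int ExcessPressure;   // Registers above the spill limit; lower is better.
  unsigned StallCycles; // Stall if issued now; lower is better.
  unsigned PathHeight;  // Longest latency path to region end; higher is more urgent.
  int CriticalPressure; // Increase in critical pressure; lower is better.
  unsigned NodeNum;     // Original program order; lower comes first (top-down).
};

// Hypothetical reason codes, loosely named after the ones used in the diff.
enum class Reason { NoCand, RegExcess, Stall, Latency, RegCritical, NodeOrder };

// Returns the reason Try beats Best, or Reason::NoCand if Best is kept.
// The first heuristic on which the two candidates differ decides the outcome.
Reason tryCandidate(const Candidate &Best, const Candidate &Try) {
  // Avoid spilling: excess pressure outranks everything else.
  if (Try.ExcessPressure != Best.ExcessPressure)
    return Try.ExcessPressure < Best.ExcessPressure ? Reason::RegExcess
                                                    : Reason::NoCand;
  // Prefer the candidate that stalls for fewer cycles.
  if (Try.StallCycles != Best.StallCycles)
    return Try.StallCycles < Best.StallCycles ? Reason::Stall : Reason::NoCand;
  // Unconditionally try to reduce latency, before non-excess pressure checks.
  if (Try.PathHeight != Best.PathHeight)
    return Try.PathHeight > Best.PathHeight ? Reason::Latency : Reason::NoCand;
  // Only now consider critical register pressure.
  if (Try.CriticalPressure != Best.CriticalPressure)
    return Try.CriticalPressure < Best.CriticalPressure ? Reason::RegCritical
                                                        : Reason::NoCand;
  // Fall through to original instruction order.
  return Try.NodeNum < Best.NodeNum ? Reason::NodeOrder : Reason::NoCand;
}
```

In this sketch a candidate with a longer remaining latency path wins even if it raises critical pressure, as long as it does not push the region over the spill limit.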

Does this patch have anything in common with lib/Target/AMDGPU/GCNILPSched.cpp? (Is that even maintained?)

That is part of the iterative scheduler. I don't know if it is maintained. I did see it was crashing on some lit tests if I changed waves-per-eu.

In D130869#3691312, @kerbowa wrote:

In D130869#3691259, @foad wrote:

Does this patch have anything in common with lib/Target/AMDGPU/GCNILPSched.cpp? (Is that even maintained?)

That is part of the iterative scheduler. I don't know if it is maintained. I did see it was crashing on some lit tests if I changed waves-per-eu.

It was probably made obsolete by the recent changes to the default scheduler. @vpykhtin do you think there is still something useful in there?

rampitec added inline comments. Aug 1 2022, 11:12 AM

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
32	AFAIR PreRARematerialize must be the last stage, because it was leaving some variables in an inconsistent state. Before the last refactoring there was even a static_assert about that. I see that you are building the stage pipeline within SchedStages, but it probably makes sense to reorder the enum and redefine operator++ to walk SchedStages instead of static-casting integers now.

kerbowa added inline comments. Aug 1 2022, 11:55 AM

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
32	The idea is that different SchedStrategies may have different stages or permutations of stages. I already defined operator++ in a previous patch. I'm not sure what casting of integers you are referring to. In this patch, each SchedStrategy has the order of its stages defined in the SchedStages vector. I should make the max occupancy strategy assert that PreRARemat is the last stage in that vector and make all the checks relative to that vector.

rampitec added inline comments. Aug 1 2022, 12:17 PM

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
32	I mean the cast at line 42. It might be clearer to take the next stage from the SchedStages vector rather than just incrementing GCNSchedStageID numerically. Ten years from now nobody will remember why PreRARematerialize must be the last one, and the code can easily avoid using that operator altogether, as it already does with the range-based loop over SchedStages in runSchedStages. And if you ever wanted to add it to the ILP pipeline too, that would just break the operator++ logic.
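The design being discussed in this thread (each strategy owns an ordered list of stages, and "next stage" means the next vector element rather than the next enum value) can be sketched in isolation as follows. This is a hedged illustration, not the code that landed: StageID, StagePipeline, rematIsLast and runAll are hypothetical names, and std::vector stands in for the SmallVector used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for GCNSchedStageID; numeric values are irrelevant here.
enum class StageID {
  OccInitialSchedule,
  UnclusteredHighRPReschedule,
  ClusteredLowOccupancyReschedule,
  PreRARematerialize,
  ILPInitialSchedule
};

// True if PreRARematerialize is absent or is the final entry of the pipeline.
bool rematIsLast(const std::vector<StageID> &Stages) {
  for (std::size_t I = 0; I + 1 < Stages.size(); ++I)
    if (Stages[I] == StageID::PreRARematerialize)
      return false;
  return true;
}

// Hypothetical pipeline cursor: the order comes from the vector, not the enum.
class StagePipeline {
  std::vector<StageID> Stages;
  std::size_t Pos = 0;
  bool Started = false;

public:
  explicit StagePipeline(std::vector<StageID> S) : Stages(std::move(S)) {
    assert(rematIsLast(Stages) && "PreRARematerialize must be the last stage");
  }

  // Moves to the first stage, then to each following one.
  // Returns false once the pipeline is exhausted.
  bool advance() {
    if (!Started)
      Started = true;
    else if (Pos < Stages.size())
      ++Pos;
    return Pos < Stages.size();
  }

  StageID current() const {
    assert(Started && Pos < Stages.size() && "no current stage");
    return Stages[Pos];
  }

  bool hasNext() const { return Started && Pos + 1 < Stages.size(); }
};

// Visits every stage in pipeline order and records what was run.
std::vector<StageID> runAll(StagePipeline &P) {
  std::vector<StageID> Visited;
  while (P.advance())
    Visited.push_back(P.current());
  return Visited;
}
```

With this shape, a pipeline containing only ILPInitialSchedule and a four-stage max-occupancy pipeline are walked by the same loop, and adding a stage to either strategy does not depend on the numeric layout of the enum.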

+1 for the general idea of having a max ilp strategy that we can opt into.

Address comments.

LGTM, thanks!

This revision is now accepted and ready to land. Aug 2 2022, 12:48 PM

This revision was landed with ongoing or failed builds. Aug 2 2022, 1:21 PM

Closed by commit rGd7100b398b76: [AMDGPU] Add GCNMaxILPSchedStrategy (authored by kerbowa).

This revision was automatically updated to reflect the committed changes.

kerbowa added a commit: rGd7100b398b76: [AMDGPU] Add GCNMaxILPSchedStrategy.

Harbormaster completed remote builds in B178831: Diff 449393. Aug 2 2022, 1:59 PM

Revision Contents

Path                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp            39 lines
llvm/lib/Target/AMDGPU/GCNSchedStrategy.h                 91 lines
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp               197 lines
llvm/test/CodeGen/AMDGPU/schedule-ilp.ll                  3 lines
llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll    4 lines
llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit2.ll   8 lines
llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit3.ll   2 lines

Diff 449393

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[421 lines hidden]	createGCNMaxOccupancyMachineScheduler(MachineSchedContext *C) {
DAG->addMutation(createIGroupLPDAGMutation());		DAG->addMutation(createIGroupLPDAGMutation());
DAG->addMutation(createSchedBarrierDAGMutation());		DAG->addMutation(createSchedBarrierDAGMutation());
DAG->addMutation(createAMDGPUMacroFusionDAGMutation());		DAG->addMutation(createAMDGPUMacroFusionDAGMutation());
DAG->addMutation(createAMDGPUExportClusteringDAGMutation());		DAG->addMutation(createAMDGPUExportClusteringDAGMutation());
return DAG;		return DAG;
}		}

static ScheduleDAGInstrs *		static ScheduleDAGInstrs *
		createGCNMaxILPMachineScheduler(MachineSchedContext *C) {
		ScheduleDAGMILive *DAG =
		new GCNScheduleDAGMILive(C, std::make_unique<GCNMaxILPSchedStrategy>(C));
		DAG->addMutation(createIGroupLPDAGMutation());
		DAG->addMutation(createSchedBarrierDAGMutation());
		return DAG;
		}

		static ScheduleDAGInstrs *
createIterativeGCNMaxOccupancyMachineScheduler(MachineSchedContext *C) {		createIterativeGCNMaxOccupancyMachineScheduler(MachineSchedContext *C) {
const GCNSubtarget &ST = C->MF->getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = C->MF->getSubtarget<GCNSubtarget>();
auto DAG = new GCNIterativeScheduler(C,		auto DAG = new GCNIterativeScheduler(C,
GCNIterativeScheduler::SCHEDULE_LEGACYMAXOCCUPANCY);		GCNIterativeScheduler::SCHEDULE_LEGACYMAXOCCUPANCY);
DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));		DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
if (ST.shouldClusterStores())		if (ST.shouldClusterStores())
DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));		DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
return DAG;		return DAG;
[21 lines hidden]	SISchedRegistry("si", "Run SI's custom scheduler",
createSIMachineScheduler);		createSIMachineScheduler);

static MachineSchedRegistry		static MachineSchedRegistry
GCNMaxOccupancySchedRegistry("gcn-max-occupancy",		GCNMaxOccupancySchedRegistry("gcn-max-occupancy",
"Run GCN scheduler to maximize occupancy",		"Run GCN scheduler to maximize occupancy",
createGCNMaxOccupancyMachineScheduler);		createGCNMaxOccupancyMachineScheduler);

static MachineSchedRegistry		static MachineSchedRegistry
IterativeGCNMaxOccupancySchedRegistry("gcn-max-occupancy-experimental",		GCNMaxILPSchedRegistry("gcn-max-ilp", "Run GCN scheduler to maximize ilp",
		createGCNMaxILPMachineScheduler);

		static MachineSchedRegistry IterativeGCNMaxOccupancySchedRegistry(
		"gcn-iterative-max-occupancy-experimental",
"Run GCN scheduler to maximize occupancy (experimental)",		"Run GCN scheduler to maximize occupancy (experimental)",
createIterativeGCNMaxOccupancyMachineScheduler);		createIterativeGCNMaxOccupancyMachineScheduler);

static MachineSchedRegistry		static MachineSchedRegistry GCNMinRegSchedRegistry(
GCNMinRegSchedRegistry("gcn-minreg",		"gcn-iterative-minreg",
"Run GCN iterative scheduler for minimal register usage (experimental)",		"Run GCN iterative scheduler for minimal register usage (experimental)",
createMinRegScheduler);		createMinRegScheduler);

static MachineSchedRegistry		static MachineSchedRegistry GCNILPSchedRegistry(
GCNILPSchedRegistry("gcn-ilp",		"gcn-iterative-ilp",
"Run GCN iterative scheduler for ILP scheduling (experimental)",		"Run GCN iterative scheduler for ILP scheduling (experimental)",
createIterativeILPMachineScheduler);		createIterativeILPMachineScheduler);

static StringRef computeDataLayout(const Triple &TT) {		static StringRef computeDataLayout(const Triple &TT) {
if (TT.getArch() == Triple::r600) {		if (TT.getArch() == Triple::r600) {
// 32-bit pointers.		// 32-bit pointers.
return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"		return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"
"-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";		"-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";
}		}

[1,143 lines hidden]

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h

[16 lines hidden]
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"

namespace llvm {		namespace llvm {

class SIMachineFunctionInfo;		class SIMachineFunctionInfo;
class SIRegisterInfo;		class SIRegisterInfo;
class GCNSubtarget;		class GCNSubtarget;
		class GCNSchedStage;

		enum class GCNSchedStageID : unsigned {
		OccInitialSchedule = 0,
		UnclusteredHighRPReschedule = 1,
		ClusteredLowOccupancyReschedule = 2,
		PreRARematerialize = 3,
		ILPInitialSchedule = 4
		};

		#ifndef NDEBUG
		raw_ostream &operator<<(raw_ostream &OS, const GCNSchedStageID &StageID);
		#endif

/// This is a minimal scheduler strategy. The main difference between this		/// This is a minimal scheduler strategy. The main difference between this
/// and the GenericScheduler is that GCNSchedStrategy uses different		/// and the GenericScheduler is that GCNSchedStrategy uses different
/// heuristics to determine excess/critical pressure sets. Its goal is to		/// heuristics to determine excess/critical pressure sets.
/// maximize kernel occupancy (i.e. maximum number of waves per simd).		class GCNSchedStrategy : public GenericScheduler {
class GCNMaxOccupancySchedStrategy final : public GenericScheduler {		protected:
SUnit *pickNodeBidirectional(bool &IsTopNode);		SUnit *pickNodeBidirectional(bool &IsTopNode);

void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,		void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,
const RegPressureTracker &RPTracker,		const RegPressureTracker &RPTracker,
SchedCandidate &Cand);		SchedCandidate &Cand);

void initCandidate(SchedCandidate &Cand, SUnit *SU,		void initCandidate(SchedCandidate &Cand, SUnit *SU,
bool AtTop, const RegPressureTracker &RPTracker,		bool AtTop, const RegPressureTracker &RPTracker,
const SIRegisterInfo *SRI,		const SIRegisterInfo *SRI,
unsigned SGPRPressure, unsigned VGPRPressure);		unsigned SGPRPressure, unsigned VGPRPressure);

std::vector<unsigned> Pressure;		std::vector<unsigned> Pressure;

std::vector<unsigned> MaxPressure;		std::vector<unsigned> MaxPressure;

unsigned SGPRExcessLimit;		unsigned SGPRExcessLimit;

unsigned VGPRExcessLimit;		unsigned VGPRExcessLimit;

unsigned TargetOccupancy;		unsigned TargetOccupancy;

MachineFunction *MF;		MachineFunction *MF;

		// Scheduling stages for this strategy.
		SmallVector<GCNSchedStageID, 4> SchedStages;

		// Pointer to the current SchedStageID.
		SmallVectorImpl<GCNSchedStageID>::iterator CurrentStage = nullptr;

public:		public:
// schedule() have seen register pressure over the critical limits and had to		// schedule() have seen register pressure over the critical limits and had to
// track register pressure for actual scheduling heuristics.		// track register pressure for actual scheduling heuristics.
bool HasHighPressure;		bool HasHighPressure;

// An error margin is necessary because of poor performance of the generic RP		// An error margin is necessary because of poor performance of the generic RP
// tracker and can be adjusted up for tuning heuristics to try and more		// tracker and can be adjusted up for tuning heuristics to try and more
// aggressively reduce register pressure.		// aggressively reduce register pressure.
const unsigned DefaultErrorMargin = 3;		const unsigned DefaultErrorMargin = 3;

const unsigned HighRPErrorMargin = 10;		const unsigned HighRPErrorMargin = 10;

unsigned ErrorMargin = DefaultErrorMargin;		unsigned ErrorMargin = DefaultErrorMargin;

unsigned SGPRCriticalLimit;		unsigned SGPRCriticalLimit;

unsigned VGPRCriticalLimit;		unsigned VGPRCriticalLimit;

GCNMaxOccupancySchedStrategy(const MachineSchedContext *C);		GCNSchedStrategy(const MachineSchedContext *C);

SUnit *pickNode(bool &IsTopNode) override;		SUnit *pickNode(bool &IsTopNode) override;

void initialize(ScheduleDAGMI *DAG) override;		void initialize(ScheduleDAGMI *DAG) override;

unsigned getTargetOccupancy() { return TargetOccupancy; }		unsigned getTargetOccupancy() { return TargetOccupancy; }

void setTargetOccupancy(unsigned Occ) { TargetOccupancy = Occ; }		void setTargetOccupancy(unsigned Occ) { TargetOccupancy = Occ; }

		GCNSchedStageID getCurrentStage();

		// Advances stage. Returns true if there are remaining stages.
		bool advanceStage();

		bool hasNextStage() const;

		GCNSchedStageID getNextStage() const;
};		};

enum class GCNSchedStageID : unsigned {		/// The goal of this scheduling strategy is to maximize kernel occupancy (i.e.
InitialSchedule = 0,		/// maximum number of waves per simd).
UnclusteredHighRPReschedule = 1,		class GCNMaxOccupancySchedStrategy final : public GCNSchedStrategy {
ClusteredLowOccupancyReschedule = 2,		public:
PreRARematerialize = 3,		GCNMaxOccupancySchedStrategy(const MachineSchedContext *C);
LastStage = PreRARematerialize
};		};

#ifndef NDEBUG		/// The goal of this scheduling strategy is to maximize ILP for a single wave
raw_ostream &operator<<(raw_ostream &OS, const GCNSchedStageID &StageID);		/// (i.e. latency hiding).
#endif		class GCNMaxILPSchedStrategy final : public GCNSchedStrategy {
		protected:
		bool tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
		SchedBoundary *Zone) const override;

inline GCNSchedStageID &operator++(GCNSchedStageID &Stage, int) {		public:
assert(Stage != GCNSchedStageID::PreRARematerialize);		GCNMaxILPSchedStrategy(const MachineSchedContext *C);
Stage = static_cast<GCNSchedStageID>(static_cast<unsigned>(Stage) + 1);		};
return Stage;
}

inline GCNSchedStageID nextStage(const GCNSchedStageID Stage) {
return static_cast<GCNSchedStageID>(static_cast<unsigned>(Stage) + 1);
}

inline bool operator>(GCNSchedStageID &LHS, GCNSchedStageID &RHS) {
return static_cast<unsigned>(LHS) > static_cast<unsigned>(RHS);
}

class GCNScheduleDAGMILive final : public ScheduleDAGMILive {		class GCNScheduleDAGMILive final : public ScheduleDAGMILive {
friend class GCNSchedStage;		friend class GCNSchedStage;
friend class InitialScheduleStage;		friend class OccInitialScheduleStage;
friend class UnclusteredHighRPStage;		friend class UnclusteredHighRPStage;
friend class ClusteredLowOccStage;		friend class ClusteredLowOccStage;
friend class PreRARematStage;		friend class PreRARematStage;
		friend class ILPInitialScheduleStage;

const GCNSubtarget &ST;		const GCNSubtarget &ST;

SIMachineFunctionInfo &MFI;		SIMachineFunctionInfo &MFI;

// Occupancy target at the beginning of function scheduling cycle.		// Occupancy target at the beginning of function scheduling cycle.
unsigned StartingOccupancy;		unsigned StartingOccupancy;

[41 lines hidden]	class GCNScheduleDAGMILive final : public ScheduleDAGMILive {
void updateRegionBoundaries(		void updateRegionBoundaries(
SmallVectorImpl<std::pair<MachineBasicBlock::iterator,		SmallVectorImpl<std::pair<MachineBasicBlock::iterator,
MachineBasicBlock::iterator>> &RegionBoundaries,		MachineBasicBlock::iterator>> &RegionBoundaries,
MachineBasicBlock::iterator MI, MachineInstr *NewMI,		MachineBasicBlock::iterator MI, MachineInstr *NewMI,
bool Removing = false);		bool Removing = false);

void runSchedStages();		void runSchedStages();

		std::unique_ptr<GCNSchedStage> createSchedStage(GCNSchedStageID SchedStageID);

public:		public:
GCNScheduleDAGMILive(MachineSchedContext *C,		GCNScheduleDAGMILive(MachineSchedContext *C,
std::unique_ptr<MachineSchedStrategy> S);		std::unique_ptr<MachineSchedStrategy> S);

void schedule() override;		void schedule() override;

void finalizeSchedule() override;		void finalizeSchedule() override;
};		};

// GCNSchedStrategy applies multiple scheduling stages to a function.		// GCNSchedStrategy applies multiple scheduling stages to a function.
class GCNSchedStage {		class GCNSchedStage {
protected:		protected:
GCNScheduleDAGMILive &DAG;		GCNScheduleDAGMILive &DAG;

GCNMaxOccupancySchedStrategy &S;		GCNSchedStrategy &S;

MachineFunction &MF;		MachineFunction &MF;

SIMachineFunctionInfo &MFI;		SIMachineFunctionInfo &MFI;

const GCNSubtarget &ST;		const GCNSubtarget &ST;

const GCNSchedStageID StageID;		const GCNSchedStageID StageID;
[45 lines hidden]	public:
// Attempt to revert scheduling for this region.		// Attempt to revert scheduling for this region.
void revertScheduling();		void revertScheduling();

void advanceRegion() { RegionIdx++; }		void advanceRegion() { RegionIdx++; }

virtual ~GCNSchedStage() = default;		virtual ~GCNSchedStage() = default;
};		};

class InitialScheduleStage : public GCNSchedStage {		class OccInitialScheduleStage : public GCNSchedStage {
public:		public:
bool shouldRevertScheduling(unsigned WavesAfter) override;		bool shouldRevertScheduling(unsigned WavesAfter) override;

InitialScheduleStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)		OccInitialScheduleStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)
: GCNSchedStage(StageID, DAG) {}		: GCNSchedStage(StageID, DAG) {}
};		};

class UnclusteredHighRPStage : public GCNSchedStage {		class UnclusteredHighRPStage : public GCNSchedStage {
private:		private:
std::vector<std::unique_ptr<ScheduleDAGMutation>> SavedMutations;		std::vector<std::unique_ptr<ScheduleDAGMutation>> SavedMutations;

// Save the initial occupancy before starting this stage.		// Save the initial occupancy before starting this stage.
[58 lines hidden]	public:
bool initGCNRegion() override;		bool initGCNRegion() override;

bool shouldRevertScheduling(unsigned WavesAfter) override;		bool shouldRevertScheduling(unsigned WavesAfter) override;

PreRARematStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)		PreRARematStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)
: GCNSchedStage(StageID, DAG) {}		: GCNSchedStage(StageID, DAG) {}
};		};

		class ILPInitialScheduleStage : public GCNSchedStage {
		public:
		bool shouldRevertScheduling(unsigned WavesAfter) override;

		ILPInitialScheduleStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)
		: GCNSchedStage(StageID, DAG) {}
		};

} // End namespace llvm		} // End namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H		#endif // LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp

[32 lines hidden]

cl::opt<bool>		cl::opt<bool>
DisableUnclusterHighRP("amdgpu-disable-unclustred-high-rp-reschedule",		DisableUnclusterHighRP("amdgpu-disable-unclustred-high-rp-reschedule",
cl::Hidden,		cl::Hidden,
cl::desc("Disable unclustred high register pressure "		cl::desc("Disable unclustred high register pressure "
"reduction scheduling stage."),		"reduction scheduling stage."),
cl::init(false));		cl::init(false));

GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(		GCNSchedStrategy::GCNSchedStrategy(const MachineSchedContext *C)
const MachineSchedContext *C)
: GenericScheduler(C), TargetOccupancy(0), MF(nullptr),		: GenericScheduler(C), TargetOccupancy(0), MF(nullptr),
HasHighPressure(false) {}		HasHighPressure(false) {}

void GCNMaxOccupancySchedStrategy::initialize(ScheduleDAGMI *DAG) {		void GCNSchedStrategy::initialize(ScheduleDAGMI *DAG) {
GenericScheduler::initialize(DAG);		GenericScheduler::initialize(DAG);

MF = &DAG->MF;		MF = &DAG->MF;

const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();

SGPRExcessLimit =		SGPRExcessLimit =
Context->RegClassInfo->getNumAllocatableRegs(&AMDGPU::SGPR_32RegClass);		Context->RegClassInfo->getNumAllocatableRegs(&AMDGPU::SGPR_32RegClass);
[14 lines hidden]	void GCNSchedStrategy::initialize(ScheduleDAGMI *DAG) {
SGPRCriticalLimit =		SGPRCriticalLimit =
std::min(SGPRCriticalLimit - ErrorMargin, SGPRCriticalLimit);		std::min(SGPRCriticalLimit - ErrorMargin, SGPRCriticalLimit);
VGPRCriticalLimit =		VGPRCriticalLimit =
std::min(VGPRCriticalLimit - ErrorMargin, VGPRCriticalLimit);		std::min(VGPRCriticalLimit - ErrorMargin, VGPRCriticalLimit);
SGPRExcessLimit = std::min(SGPRExcessLimit - ErrorMargin, SGPRExcessLimit);		SGPRExcessLimit = std::min(SGPRExcessLimit - ErrorMargin, SGPRExcessLimit);
VGPRExcessLimit = std::min(VGPRExcessLimit - ErrorMargin, VGPRExcessLimit);		VGPRExcessLimit = std::min(VGPRExcessLimit - ErrorMargin, VGPRExcessLimit);
}		}

void GCNMaxOccupancySchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,		void GCNSchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
bool AtTop, const RegPressureTracker &RPTracker,		bool AtTop,
		const RegPressureTracker &RPTracker,
const SIRegisterInfo *SRI,		const SIRegisterInfo *SRI,
unsigned SGPRPressure,		unsigned SGPRPressure,
unsigned VGPRPressure) {		unsigned VGPRPressure) {
Cand.SU = SU;		Cand.SU = SU;
Cand.AtTop = AtTop;		Cand.AtTop = AtTop;

if (!DAG->isTrackingPressure())		if (!DAG->isTrackingPressure())
return;		return;
[69 lines hidden]	if (SGPRDelta > VGPRDelta) {
PressureChange(AMDGPU::RegisterPressureSets::VGPR_32);		PressureChange(AMDGPU::RegisterPressureSets::VGPR_32);
Cand.RPDelta.CriticalMax.setUnitInc(VGPRDelta);		Cand.RPDelta.CriticalMax.setUnitInc(VGPRDelta);
}		}
}		}
}		}

// This function is mostly cut and pasted from		// This function is mostly cut and pasted from
// GenericScheduler::pickNodeFromQueue()		// GenericScheduler::pickNodeFromQueue()
void GCNMaxOccupancySchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,		void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
const CandPolicy &ZonePolicy,		const CandPolicy &ZonePolicy,
const RegPressureTracker &RPTracker,		const RegPressureTracker &RPTracker,
SchedCandidate &Cand) {		SchedCandidate &Cand) {
const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);		const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);
ArrayRef<unsigned> Pressure = RPTracker.getRegSetPressureAtPos();		ArrayRef<unsigned> Pressure = RPTracker.getRegSetPressureAtPos();
unsigned SGPRPressure = 0;		unsigned SGPRPressure = 0;
unsigned VGPRPressure = 0;		unsigned VGPRPressure = 0;
if (DAG->isTrackingPressure()) {		if (DAG->isTrackingPressure()) {
SGPRPressure = Pressure[AMDGPU::RegisterPressureSets::SReg_32];		SGPRPressure = Pressure[AMDGPU::RegisterPressureSets::SReg_32];
VGPRPressure = Pressure[AMDGPU::RegisterPressureSets::VGPR_32];		VGPRPressure = Pressure[AMDGPU::RegisterPressureSets::VGPR_32];
}		}
ReadyQueue &Q = Zone.Available;		ReadyQueue &Q = Zone.Available;
for (SUnit *SU : Q) {		for (SUnit *SU : Q) {

SchedCandidate TryCand(ZonePolicy);		SchedCandidate TryCand(ZonePolicy);
initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI,		initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI,
SGPRPressure, VGPRPressure);		SGPRPressure, VGPRPressure);
// Pass SchedBoundary only when comparing nodes from the same boundary.		// Pass SchedBoundary only when comparing nodes from the same boundary.
SchedBoundary *ZoneArg = Cand.AtTop == TryCand.AtTop ? &Zone : nullptr;		SchedBoundary *ZoneArg = Cand.AtTop == TryCand.AtTop ? &Zone : nullptr;
GenericScheduler::tryCandidate(Cand, TryCand, ZoneArg);		tryCandidate(Cand, TryCand, ZoneArg);
if (TryCand.Reason != NoCand) {		if (TryCand.Reason != NoCand) {
// Initialize resource delta if needed in case future heuristics query it.		// Initialize resource delta if needed in case future heuristics query it.
if (TryCand.ResDelta == SchedResourceDelta())		if (TryCand.ResDelta == SchedResourceDelta())
TryCand.initResourceDelta(Zone.DAG, SchedModel);		TryCand.initResourceDelta(Zone.DAG, SchedModel);
Cand.setBest(TryCand);		Cand.setBest(TryCand);
LLVM_DEBUG(traceCandidate(Cand));		LLVM_DEBUG(traceCandidate(Cand));
}		}
}		}
}		}

// This function is mostly cut and pasted from		// This function is mostly cut and pasted from
// GenericScheduler::pickNodeBidirectional()		// GenericScheduler::pickNodeBidirectional()
SUnit *GCNMaxOccupancySchedStrategy::pickNodeBidirectional(bool &IsTopNode) {		SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
// Schedule as far as possible in the direction of no choice. This is most		// Schedule as far as possible in the direction of no choice. This is most
// efficient, but also provides the best heuristics for CriticalPSets.		// efficient, but also provides the best heuristics for CriticalPSets.
if (SUnit *SU = Bot.pickOnlyChoice()) {		if (SUnit *SU = Bot.pickOnlyChoice()) {
IsTopNode = false;		IsTopNode = false;
return SU;		return SU;
}		}
if (SUnit *SU = Top.pickOnlyChoice()) {		if (SUnit *SU = Top.pickOnlyChoice()) {
IsTopNode = true;		IsTopNode = true;
[48 lines hidden]
#endif		#endif
}		}

// Pick best from BotCand and TopCand.		// Pick best from BotCand and TopCand.
LLVM_DEBUG(dbgs() << "Top Cand: "; traceCandidate(TopCand);		LLVM_DEBUG(dbgs() << "Top Cand: "; traceCandidate(TopCand);
dbgs() << "Bot Cand: "; traceCandidate(BotCand););		dbgs() << "Bot Cand: "; traceCandidate(BotCand););
SchedCandidate Cand = BotCand;		SchedCandidate Cand = BotCand;
TopCand.Reason = NoCand;		TopCand.Reason = NoCand;
GenericScheduler::tryCandidate(Cand, TopCand, nullptr);		tryCandidate(Cand, TopCand, nullptr);
if (TopCand.Reason != NoCand) {		if (TopCand.Reason != NoCand) {
Cand.setBest(TopCand);		Cand.setBest(TopCand);
}		}
LLVM_DEBUG(dbgs() << "Picking: "; traceCandidate(Cand););		LLVM_DEBUG(dbgs() << "Picking: "; traceCandidate(Cand););

IsTopNode = Cand.AtTop;		IsTopNode = Cand.AtTop;
return Cand.SU;		return Cand.SU;
}		}

// This function is mostly cut and pasted from		// This function is mostly cut and pasted from
// GenericScheduler::pickNode()		// GenericScheduler::pickNode()
SUnit *GCNMaxOccupancySchedStrategy::pickNode(bool &IsTopNode) {		SUnit *GCNSchedStrategy::pickNode(bool &IsTopNode) {
if (DAG->top() == DAG->bottom()) {		if (DAG->top() == DAG->bottom()) {
assert(Top.Available.empty() && Top.Pending.empty() &&		assert(Top.Available.empty() && Top.Pending.empty() &&
Bot.Available.empty() && Bot.Pending.empty() && "ReadyQ garbage");		Bot.Available.empty() && Bot.Pending.empty() && "ReadyQ garbage");
return nullptr;		return nullptr;
}		}
SUnit *SU;		SUnit *SU;
do {		do {
if (RegionPolicy.OnlyTopDown) {		if (RegionPolicy.OnlyTopDown) {
[26 lines hidden]	SUnit *GCNSchedStrategy::pickNode(bool &IsTopNode) {
if (SU->isBottomReady())		if (SU->isBottomReady())
Bot.removeReady(SU);		Bot.removeReady(SU);

LLVM_DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "		LLVM_DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
<< *SU->getInstr());		<< *SU->getInstr());
return SU;		return SU;
}		}

		GCNSchedStageID GCNSchedStrategy::getCurrentStage() {
		assert(CurrentStage && CurrentStage != SchedStages.end());
		return *CurrentStage;
		}

		bool GCNSchedStrategy::advanceStage() {
		assert(CurrentStage != SchedStages.end());
		if (!CurrentStage)
		CurrentStage = SchedStages.begin();
		else
		CurrentStage++;

		return CurrentStage != SchedStages.end();
		}

		bool GCNSchedStrategy::hasNextStage() const {
		assert(CurrentStage);
		return std::next(CurrentStage) != SchedStages.end();
		}

		GCNSchedStageID GCNSchedStrategy::getNextStage() const {
		assert(CurrentStage && std::next(CurrentStage) != SchedStages.end());
		return *std::next(CurrentStage);
		}

		GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(
		const MachineSchedContext *C)
		: GCNSchedStrategy(C) {
		SchedStages.push_back(GCNSchedStageID::OccInitialSchedule);
		SchedStages.push_back(GCNSchedStageID::UnclusteredHighRPReschedule);
		SchedStages.push_back(GCNSchedStageID::ClusteredLowOccupancyReschedule);
		SchedStages.push_back(GCNSchedStageID::PreRARematerialize);
		}

		GCNMaxILPSchedStrategy::GCNMaxILPSchedStrategy(const MachineSchedContext *C)
		: GCNSchedStrategy(C) {
		SchedStages.push_back(GCNSchedStageID::ILPInitialSchedule);
		}

		bool GCNMaxILPSchedStrategy::tryCandidate(SchedCandidate &Cand,
		SchedCandidate &TryCand,
		SchedBoundary *Zone) const {
		// Initialize the candidate if needed.
		if (!Cand.isValid()) {
		TryCand.Reason = NodeOrder;
		return true;
		}

		// Avoid spilling by exceeding the register limit.
		if (DAG->isTrackingPressure() &&
		tryPressure(TryCand.RPDelta.Excess, Cand.RPDelta.Excess, TryCand, Cand,
		RegExcess, TRI, DAG->MF))
		return TryCand.Reason != NoCand;

		// Bias PhysReg Defs and copies to their uses and defined respectively.
		if (tryGreater(biasPhysReg(TryCand.SU, TryCand.AtTop),
		biasPhysReg(Cand.SU, Cand.AtTop), TryCand, Cand, PhysReg))
		return TryCand.Reason != NoCand;

		bool SameBoundary = Zone != nullptr;
		if (SameBoundary) {
		// Prioritize instructions that read unbuffered resources by stall cycles.
		if (tryLess(Zone->getLatencyStallCycles(TryCand.SU),
		Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
		return TryCand.Reason != NoCand;

		// Avoid critical resource consumption and balance the schedule.
		TryCand.initResourceDelta(DAG, SchedModel);
		if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
		TryCand, Cand, ResourceReduce))
		return TryCand.Reason != NoCand;
		if (tryGreater(TryCand.ResDelta.DemandedResources,
		Cand.ResDelta.DemandedResources, TryCand, Cand,
		ResourceDemand))
		return TryCand.Reason != NoCand;

		// Unconditionally try to reduce latency.
		if (tryLatency(TryCand, Cand, *Zone))
		return TryCand.Reason != NoCand;

		// Weak edges are for clustering and other constraints.
		if (tryLess(getWeakLeft(TryCand.SU, TryCand.AtTop),
		getWeakLeft(Cand.SU, Cand.AtTop), TryCand, Cand, Weak))
		return TryCand.Reason != NoCand;
		}

		// Keep clustered nodes together to encourage downstream peephole
		// optimizations which may reduce resource requirements.
		//
		// This is a best effort to set things up for a post-RA pass. Optimizations
		// like generating loads of multiple registers should ideally be done within
		// the scheduler pass by combining the loads during DAG postprocessing.
		const SUnit *CandNextClusterSU =
		Cand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		const SUnit *TryCandNextClusterSU =
		TryCand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		if (tryGreater(TryCand.SU == TryCandNextClusterSU,
		Cand.SU == CandNextClusterSU, TryCand, Cand, Cluster))
		return TryCand.Reason != NoCand;

		// Avoid increasing the max critical pressure in the scheduled region.
		if (DAG->isTrackingPressure() &&
		tryPressure(TryCand.RPDelta.CriticalMax, Cand.RPDelta.CriticalMax,
		TryCand, Cand, RegCritical, TRI, DAG->MF))
		return TryCand.Reason != NoCand;

		// Avoid increasing the max pressure of the entire region.
		if (DAG->isTrackingPressure() &&
		tryPressure(TryCand.RPDelta.CurrentMax, Cand.RPDelta.CurrentMax, TryCand,
		Cand, RegMax, TRI, DAG->MF))
		return TryCand.Reason != NoCand;

		if (SameBoundary) {
		// Fall through to original instruction order.
		if ((Zone->isTop() && TryCand.SU->NodeNum < Cand.SU->NodeNum) ||
		(!Zone->isTop() && TryCand.SU->NodeNum > Cand.SU->NodeNum)) {
		TryCand.Reason = NodeOrder;
		return true;
		}
		}
		return false;
		}
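Note on the ordering in tryCandidate above: each heuristic is consulted only if every earlier one tied, so the position of a check in the function is its priority. The following is a minimal standalone sketch of that pattern, not the LLVM implementation. The Candidate struct, the reduced Reason enum, and the two-argument-plus-reason helper signatures are hypothetical simplifications made up for illustration.

```cpp
#include <cassert>

// Hypothetical, reduced stand-ins for the LLVM candidate types.
enum Reason { NoCand, Stall, Latency, NodeOrder };

struct Candidate {
  unsigned StallCycles; // lower is better
  unsigned Height;      // higher is treated as better in this sketch
  unsigned NodeNum;     // original program order
  Reason Why;           // set only when this candidate wins a comparison
};

// Returns true if the comparison was decisive (the values differ).
// TryCand.Why is set only when TryCand is the winner.
static bool tryLessSketch(unsigned TryVal, unsigned CandVal,
                          Candidate &TryCand, Reason R) {
  if (TryVal < CandVal) {
    TryCand.Why = R;
    return true;
  }
  return TryVal > CandVal;
}

static bool tryGreaterSketch(unsigned TryVal, unsigned CandVal,
                             Candidate &TryCand, Reason R) {
  if (TryVal > CandVal) {
    TryCand.Why = R;
    return true;
  }
  return TryVal < CandVal;
}

// Returns true if TryCand should replace Cand. Checks run in priority
// order; the first decisive one ends the comparison.
bool tryCandidateSketch(const Candidate &Cand, Candidate &TryCand) {
  TryCand.Why = NoCand;
  if (tryLessSketch(TryCand.StallCycles, Cand.StallCycles, TryCand, Stall))
    return TryCand.Why != NoCand;
  if (tryGreaterSketch(TryCand.Height, Cand.Height, TryCand, Latency))
    return TryCand.Why != NoCand;
  // Everything tied: fall back to original instruction order.
  if (TryCand.NodeNum < Cand.NodeNum) {
    TryCand.Why = NodeOrder;
    return true;
  }
  return false;
}
```

In this sketch a candidate with fewer stall cycles wins even if its height is worse, which mirrors how the patch places the stall and latency checks ahead of the clustering and pressure checks.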

GCNScheduleDAGMILive::GCNScheduleDAGMILive(		GCNScheduleDAGMILive::GCNScheduleDAGMILive(
MachineSchedContext *C, std::unique_ptr<MachineSchedStrategy> S)		MachineSchedContext *C, std::unique_ptr<MachineSchedStrategy> S)
: ScheduleDAGMILive(C, std::move(S)), ST(MF.getSubtarget<GCNSubtarget>()),		: ScheduleDAGMILive(C, std::move(S)), ST(MF.getSubtarget<GCNSubtarget>()),
MFI(*MF.getInfo<SIMachineFunctionInfo>()),		MFI(*MF.getInfo<SIMachineFunctionInfo>()),
StartingOccupancy(MFI.getOccupancy()), MinOccupancy(StartingOccupancy) {		StartingOccupancy(MFI.getOccupancy()), MinOccupancy(StartingOccupancy) {

LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");		LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");
}		}

		std::unique_ptr<GCNSchedStage>
		GCNScheduleDAGMILive::createSchedStage(GCNSchedStageID SchedStageID) {
		switch (SchedStageID) {
		case GCNSchedStageID::OccInitialSchedule:
		return std::make_unique<OccInitialScheduleStage>(SchedStageID, *this);
		case GCNSchedStageID::UnclusteredHighRPReschedule:
		return std::make_unique<UnclusteredHighRPStage>(SchedStageID, *this);
		case GCNSchedStageID::ClusteredLowOccupancyReschedule:
		return std::make_unique<ClusteredLowOccStage>(SchedStageID, *this);
		case GCNSchedStageID::PreRARematerialize:
		return std::make_unique<PreRARematStage>(SchedStageID, *this);
		case GCNSchedStageID::ILPInitialSchedule:
		return std::make_unique<ILPInitialScheduleStage>(SchedStageID, *this);
		}
		}

void GCNScheduleDAGMILive::schedule() {		void GCNScheduleDAGMILive::schedule() {
// Collect all scheduling regions. The actual scheduling is performed in		// Collect all scheduling regions. The actual scheduling is performed in
// GCNScheduleDAGMILive::finalizeSchedule.		// GCNScheduleDAGMILive::finalizeSchedule.
Regions.push_back(std::make_pair(RegionBegin, RegionEnd));		Regions.push_back(std::make_pair(RegionBegin, RegionEnd));
}		}

GCNRegPressure		GCNRegPressure
GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {		GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
[100 lines not shown]	void GCNScheduleDAGMILive::finalizeSchedule() {
RegionsWithExcessRP.reset();		RegionsWithExcessRP.reset();
RegionsWithMinOcc.reset();		RegionsWithMinOcc.reset();

runSchedStages();		runSchedStages();
}		}

void GCNScheduleDAGMILive::runSchedStages() {		void GCNScheduleDAGMILive::runSchedStages() {
LLVM_DEBUG(dbgs() << "All regions recorded, starting actual scheduling.\n");		LLVM_DEBUG(dbgs() << "All regions recorded, starting actual scheduling.\n");
InitialScheduleStage S0(GCNSchedStageID::InitialSchedule, *this);
UnclusteredHighRPStage S1(GCNSchedStageID::UnclusteredHighRPReschedule,
*this);
ClusteredLowOccStage S2(GCNSchedStageID::ClusteredLowOccupancyReschedule,
*this);
PreRARematStage S3(GCNSchedStageID::PreRARematerialize, *this);
GCNSchedStage *SchedStages[] = {&S0, &S1, &S2, &S3};

if (!Regions.empty())		if (!Regions.empty())
BBLiveInMap = getBBLiveInMap();		BBLiveInMap = getBBLiveInMap();

for (auto *Stage : SchedStages) {		GCNSchedStrategy &S = static_cast<GCNSchedStrategy &>(*SchedImpl);
		while (S.advanceStage()) {
		auto Stage = createSchedStage(S.getCurrentStage());
if (!Stage->initGCNSchedStage())		if (!Stage->initGCNSchedStage())
continue;		continue;

for (auto Region : Regions) {		for (auto Region : Regions) {
RegionBegin = Region.first;		RegionBegin = Region.first;
RegionEnd = Region.second;		RegionEnd = Region.second;
// Setup for scheduling the region and check whether it should be skipped.		// Setup for scheduling the region and check whether it should be skipped.
if (!Stage->initGCNRegion()) {		if (!Stage->initGCNRegion()) {
Stage->advanceRegion();		Stage->advanceRegion();
exitRegion();		exitRegion();
continue;		continue;
}		}

ScheduleDAGMILive::schedule();		ScheduleDAGMILive::schedule();
Stage->finalizeGCNRegion();		Stage->finalizeGCNRegion();
}		}

Stage->finalizeGCNSchedStage();		Stage->finalizeGCNSchedStage();
}		}
}		}
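Note on the new runSchedStages loop above: the strategy now owns the ordered list of stage IDs, and the DAG turns each ID into a stage object on demand. A minimal standalone sketch of that split follows. All class names, the string-returning name() method, and the body of advanceStage() are assumptions for illustration; the patch's own advanceStage() is not visible in this part of the diff.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class StageID { OccInitial, UnclusteredHighRP, ILPInitial };

// Hypothetical strategy: owns the ordered stage list and a cursor into it.
class StrategySketch {
  std::vector<StageID> Stages;
  std::vector<StageID>::const_iterator Current;
  bool Started = false;

public:
  explicit StrategySketch(std::vector<StageID> S) : Stages(std::move(S)) {}

  // Moves to the first stage on the first call, then to the next one.
  // Returns false once there is no further stage to run.
  bool advanceStage() {
    if (Stages.empty())
      return false;
    if (!Started) {
      Current = Stages.begin();
      Started = true;
      return true;
    }
    if (std::next(Current) == Stages.end())
      return false;
    ++Current;
    return true;
  }

  StageID getCurrentStage() const { return *Current; }

  bool hasNextStage() const {
    return Started && std::next(Current) != Stages.end();
  }
};

// Hypothetical stage hierarchy; each stage only reports its name here.
struct StageSketch {
  virtual ~StageSketch() = default;
  virtual std::string name() const = 0;
};
struct OccInitialSketch : StageSketch {
  std::string name() const override { return "occ-initial"; }
};
struct UnclusteredHighRPSketch : StageSketch {
  std::string name() const override { return "unclustered-high-rp"; }
};
struct ILPInitialSketch : StageSketch {
  std::string name() const override { return "ilp-initial"; }
};

// Factory from stage ID to stage object, in the spirit of createSchedStage.
std::unique_ptr<StageSketch> createStage(StageID ID) {
  switch (ID) {
  case StageID::OccInitial:
    return std::make_unique<OccInitialSketch>();
  case StageID::UnclusteredHighRP:
    return std::make_unique<UnclusteredHighRPSketch>();
  case StageID::ILPInitial:
    return std::make_unique<ILPInitialSketch>();
  }
  return nullptr; // not reached for valid IDs
}

// Driver loop: run whatever stages the strategy lists, in order.
std::vector<std::string> runStages(StrategySketch &S) {
  std::vector<std::string> Ran;
  while (S.advanceStage())
    Ran.push_back(createStage(S.getCurrentStage())->name());
  return Ran;
}
```

With this shape, a strategy that lists a single stage (as GCNMaxILPSchedStrategy does) needs no change to the driver loop.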

#ifndef NDEBUG		#ifndef NDEBUG
raw_ostream &llvm::operator<<(raw_ostream &OS, const GCNSchedStageID &StageID) {		raw_ostream &llvm::operator<<(raw_ostream &OS, const GCNSchedStageID &StageID) {
switch (StageID) {		switch (StageID) {
case GCNSchedStageID::InitialSchedule:		case GCNSchedStageID::OccInitialSchedule:
OS << "Initial Schedule";		OS << "Max Occupancy Initial Schedule";
break;		break;
case GCNSchedStageID::UnclusteredHighRPReschedule:		case GCNSchedStageID::UnclusteredHighRPReschedule:
OS << "Unclustered High Register Pressure Reschedule";		OS << "Unclustered High Register Pressure Reschedule";
break;		break;
case GCNSchedStageID::ClusteredLowOccupancyReschedule:		case GCNSchedStageID::ClusteredLowOccupancyReschedule:
OS << "Clustered Low Occupancy Reschedule";		OS << "Clustered Low Occupancy Reschedule";
break;		break;
case GCNSchedStageID::PreRARematerialize:		case GCNSchedStageID::PreRARematerialize:
OS << "Pre-RA Rematerialize";		OS << "Pre-RA Rematerialize";
break;		break;
		case GCNSchedStageID::ILPInitialSchedule:
		OS << "Max ILP Initial Schedule";
		break;
}		}

return OS;		return OS;
}		}
#endif		#endif

GCNSchedStage::GCNSchedStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)		GCNSchedStage::GCNSchedStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)
: DAG(DAG), S(static_cast<GCNMaxOccupancySchedStrategy &>(*DAG.SchedImpl)),		: DAG(DAG), S(static_cast<GCNSchedStrategy &>(*DAG.SchedImpl)), MF(DAG.MF),
MF(DAG.MF), MFI(DAG.MFI), ST(DAG.ST), StageID(StageID) {}		MFI(DAG.MFI), ST(DAG.ST), StageID(StageID) {}

bool GCNSchedStage::initGCNSchedStage() {		bool GCNSchedStage::initGCNSchedStage() {
if (!DAG.LIS)		if (!DAG.LIS)
return false;		return false;

LLVM_DEBUG(dbgs() << "Starting scheduling stage: " << StageID << "\n");		LLVM_DEBUG(dbgs() << "Starting scheduling stage: " << StageID << "\n");
return true;		return true;
}		}
[53 lines not shown]	bool PreRARematStage::initGCNSchedStage() {
if (ST.computeOccupancy(MF.getFunction(), MFI.getLDSSize()) ==		if (ST.computeOccupancy(MF.getFunction(), MFI.getLDSSize()) ==
DAG.MinOccupancy)		DAG.MinOccupancy)
return false;		return false;

// FIXME: This pass will invalidate cached MBBLiveIns for regions		// FIXME: This pass will invalidate cached MBBLiveIns for regions
// in between the defs and region we sank the def to. Cached pressure		// in between the defs and region we sank the def to. Cached pressure
// for regions where a def is sunk from will also be invalidated. Will		// for regions where a def is sunk from will also be invalidated. Will
// need to be fixed if there is another pass after this pass.		// need to be fixed if there is another pass after this pass.
		assert(!S.hasNextStage());

collectRematerializableInstructions();		collectRematerializableInstructions();
if (RematerializableInsts.empty() || !sinkTriviallyRematInsts(ST, TII))		if (RematerializableInsts.empty() || !sinkTriviallyRematInsts(ST, TII))
return false;		return false;

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Retrying function scheduling with improved occupancy of "		dbgs() << "Retrying function scheduling with improved occupancy of "
<< DAG.MinOccupancy << " from rematerializing\n");		<< DAG.MinOccupancy << " from rematerializing\n");
[94 lines not shown]
void GCNSchedStage::setupNewBlock() {		void GCNSchedStage::setupNewBlock() {
if (CurrentMBB)		if (CurrentMBB)
DAG.finishBlock();		DAG.finishBlock();

CurrentMBB = DAG.RegionBegin->getParent();		CurrentMBB = DAG.RegionBegin->getParent();
DAG.startBlock(CurrentMBB);		DAG.startBlock(CurrentMBB);
// Get real RP for the region if it hasn't been calculated before. After the		// Get real RP for the region if it hasn't been calculated before. After the
// initial schedule stage real RP will be collected after scheduling.		// initial schedule stage real RP will be collected after scheduling.
if (StageID == GCNSchedStageID::InitialSchedule)		if (StageID == GCNSchedStageID::OccInitialSchedule)
DAG.computeBlockPressure(RegionIdx, CurrentMBB);		DAG.computeBlockPressure(RegionIdx, CurrentMBB);
}		}

void GCNSchedStage::finalizeGCNRegion() {		void GCNSchedStage::finalizeGCNRegion() {
DAG.Regions[RegionIdx] = std::make_pair(DAG.RegionBegin, DAG.RegionEnd);		DAG.Regions[RegionIdx] = std::make_pair(DAG.RegionBegin, DAG.RegionEnd);
DAG.RescheduleRegions[RegionIdx] = false;		DAG.RescheduleRegions[RegionIdx] = false;
if (S.HasHighPressure)		if (S.HasHighPressure)
DAG.RegionsWithHighRP[RegionIdx] = true;		DAG.RegionsWithHighRP[RegionIdx] = true;
[76 lines not shown]

bool GCNSchedStage::shouldRevertScheduling(unsigned WavesAfter) {		bool GCNSchedStage::shouldRevertScheduling(unsigned WavesAfter) {
if (WavesAfter < DAG.MinOccupancy)		if (WavesAfter < DAG.MinOccupancy)
return true;		return true;

return false;		return false;
}		}

bool InitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {		bool OccInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
if (GCNSchedStage::shouldRevertScheduling(WavesAfter))		if (GCNSchedStage::shouldRevertScheduling(WavesAfter))
return true;		return true;

if (mayCauseSpilling(WavesAfter))		if (mayCauseSpilling(WavesAfter))
return true;		return true;

return false;		return false;
}		}
[26 lines not shown]	if (GCNSchedStage::shouldRevertScheduling(WavesAfter))
return true;		return true;

if (mayCauseSpilling(WavesAfter))		if (mayCauseSpilling(WavesAfter))
return true;		return true;

return false;		return false;
}		}

		bool ILPInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
		if (mayCauseSpilling(WavesAfter))
		return true;

		return false;
		}

bool GCNSchedStage::mayCauseSpilling(unsigned WavesAfter) {		bool GCNSchedStage::mayCauseSpilling(unsigned WavesAfter) {
if (WavesAfter <= MFI.getMinWavesPerEU() &&		if (WavesAfter <= MFI.getMinWavesPerEU() &&
!PressureAfter.less(ST, PressureBefore) &&		!PressureAfter.less(ST, PressureBefore) &&
DAG.RegionsWithExcessRP[RegionIdx]) {		DAG.RegionsWithExcessRP[RegionIdx]) {
LLVM_DEBUG(dbgs() << "New pressure will result in more spilling.\n");		LLVM_DEBUG(dbgs() << "New pressure will result in more spilling.\n");
return true;		return true;
}		}

return false;		return false;
}		}
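Note on the revert checks above: the occupancy stages revert a region when occupancy drops below the current minimum or when spilling is likely, while the new ILP stage reverts only on the spilling check. A minimal standalone sketch of that difference follows. The RegionStateSketch struct and its field names are hypothetical; in particular PressureImproved stands in for the PressureAfter.less(ST, PressureBefore) comparison.

```cpp
#include <cassert>

// Hypothetical flattened view of the values the revert checks consult.
struct RegionStateSketch {
  unsigned WavesAfter;    // occupancy after scheduling the region
  unsigned MinWavesPerEU; // minimum waves requested for the function
  unsigned MinOccupancy;  // lowest occupancy accepted so far
  bool PressureImproved;  // new pressure is lower than before
  bool HadExcessRP;       // region already exceeded the register limit
};

// Spilling is considered likely only when all three conditions hold.
bool mayCauseSpillingSketch(const RegionStateSketch &R) {
  return R.WavesAfter <= R.MinWavesPerEU && !R.PressureImproved &&
         R.HadExcessRP;
}

// Occupancy-oriented stage: an occupancy drop alone forces a revert.
bool shouldRevertOccSketch(const RegionStateSketch &R) {
  if (R.WavesAfter < R.MinOccupancy)
    return true;
  return mayCauseSpillingSketch(R);
}

// ILP-oriented stage: an occupancy drop is tolerated; only likely
// spilling forces a revert.
bool shouldRevertILPSketch(const RegionStateSketch &R) {
  return mayCauseSpillingSketch(R);
}
```

For a region whose occupancy fell from 8 to 4 with no excess pressure, the occupancy variant reverts and the ILP variant keeps the new schedule.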

void GCNSchedStage::revertScheduling() {		void GCNSchedStage::revertScheduling() {
DAG.RegionsWithMinOcc[RegionIdx] =		DAG.RegionsWithMinOcc[RegionIdx] =
PressureBefore.getOccupancy(ST) == DAG.MinOccupancy;		PressureBefore.getOccupancy(ST) == DAG.MinOccupancy;
LLVM_DEBUG(dbgs() << "Attempting to revert scheduling.\n");		LLVM_DEBUG(dbgs() << "Attempting to revert scheduling.\n");
DAG.RescheduleRegions[RegionIdx] =		DAG.RescheduleRegions[RegionIdx] =
(nextStage(StageID)) != GCNSchedStageID::UnclusteredHighRPReschedule;		S.hasNextStage() &&
		S.getNextStage() != GCNSchedStageID::UnclusteredHighRPReschedule;
DAG.RegionEnd = DAG.RegionBegin;		DAG.RegionEnd = DAG.RegionBegin;
int SkippedDebugInstr = 0;		int SkippedDebugInstr = 0;
for (MachineInstr *MI : Unsched) {		for (MachineInstr *MI : Unsched) {
if (MI->isDebugInstr()) {		if (MI->isDebugInstr()) {
++SkippedDebugInstr;		++SkippedDebugInstr;
continue;		continue;
}		}

[332 lines not shown]

llvm/test/CodeGen/AMDGPU/schedule-ilp.ll

	; RUN: llc -march=amdgcn -mcpu=tonga -misched=gcn-ilp -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=tonga -misched=gcn-iterative-ilp -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -march=amdgcn -mcpu=tonga -misched=gcn-max-ilp -verify-machineinstrs < %s | FileCheck %s

	; CHECK: NumVgprs: {{[0-9][0-9][0-9]$}}			; CHECK: NumVgprs: {{[0-9][0-9][0-9]$}}

	define amdgpu_kernel void @load_fma_store(float addrspace(3)* nocapture readonly %arg, float addrspace(1)* nocapture %arg1) #0 {			define amdgpu_kernel void @load_fma_store(float addrspace(3)* nocapture readonly %arg, float addrspace(1)* nocapture %arg1) #0 {
	bb:			bb:
	%tmp = getelementptr inbounds float, float addrspace(3)* %arg, i32 1			%tmp = getelementptr inbounds float, float addrspace(3)* %arg, i32 1
	%tmp2 = load float, float addrspace(3)* %tmp, align 4			%tmp2 = load float, float addrspace(3)* %tmp, align 4
	%tmp3 = getelementptr inbounds float, float addrspace(3)* %arg, i32 2			%tmp3 = getelementptr inbounds float, float addrspace(3)* %arg, i32 2
	[580 lines not shown]

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll

	; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -misched=gcn-minreg -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -misched=gcn-iterative-minreg -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -misched=gcn-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -enable-amdgpu-aa=0 -march=amdgcn -mcpu=tonga -misched=gcn-iterative-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck %s
				foad: Why has this changed?
				kerbowa (author): I renamed the iterative scheduler cl flags to have this "iterative" prefix. Mostly for clarity and to avoid confusion with this scheduling strategy that is being added in this patch.
				foad: Oh I see. I somehow missed that change in AMDGPUTargetMachine.cpp.

	; We expect a two digit VGPR usage here, not a three digit.			; We expect a two digit VGPR usage here, not a three digit.
	; CHECK: NumVgprs: {{[0-9][0-9]$}}			; CHECK: NumVgprs: {{[0-9][0-9]$}}

	define amdgpu_kernel void @load_fma_store(float addrspace(3)* nocapture readonly %arg, float addrspace(1)* nocapture %arg1) {			define amdgpu_kernel void @load_fma_store(float addrspace(3)* nocapture readonly %arg, float addrspace(1)* nocapture %arg1) {
	bb:			bb:
	%tmp = getelementptr inbounds float, float addrspace(3)* %arg, i32 1			%tmp = getelementptr inbounds float, float addrspace(3)* %arg, i32 1
	%tmp2 = load float, float addrspace(3)* %tmp, align 4			%tmp2 = load float, float addrspace(3)* %tmp, align 4
	[580 lines not shown]

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit2.ll

	; RUN: llc -march=amdgcn -mcpu=tahiti -enable-amdgpu-aa=0 -misched=gcn-minreg -verify-machineinstrs < %s | FileCheck --check-prefix=SI-MINREG %s			; RUN: llc -march=amdgcn -mcpu=tahiti -enable-amdgpu-aa=0 -misched=gcn-iterative-minreg -verify-machineinstrs < %s | FileCheck --check-prefix=SI-MINREG %s
	; RUN: llc -march=amdgcn -mcpu=tahiti -enable-amdgpu-aa=0 -misched=gcn-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck --check-prefix=SI-MAXOCC %s			; RUN: llc -march=amdgcn -mcpu=tahiti -enable-amdgpu-aa=0 -misched=gcn-iterative-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck --check-prefix=SI-MAXOCC %s
	; RUN: llc -march=amdgcn -mcpu=fiji -enable-amdgpu-aa=0 -misched=gcn-minreg -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=fiji -enable-amdgpu-aa=0 -misched=gcn-iterative-minreg -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s
	; RUN: llc -march=amdgcn -mcpu=fiji -enable-amdgpu-aa=0 -misched=gcn-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=fiji -enable-amdgpu-aa=0 -misched=gcn-iterative-max-occupancy-experimental -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s

	; SI-MINREG: NumSgprs: {{[1-9]$}}			; SI-MINREG: NumSgprs: {{[1-9]$}}
	; SI-MINREG: NumVgprs: {{[1-9]$}}			; SI-MINREG: NumVgprs: {{[1-9]$}}

	; SI-MAXOCC: NumSgprs: {{[1-4]?[0-9]$}}			; SI-MAXOCC: NumSgprs: {{[1-4]?[0-9]$}}
	; SI-MAXOCC: NumVgprs: {{[1-4]?[0-9]$}}			; SI-MAXOCC: NumVgprs: {{[1-4]?[0-9]$}}

	; stores may alias loads			; stores may alias loads
	[279 lines not shown]

llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit3.ll

	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck --check-prefix=MISCHED %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck --check-prefix=MISCHED %s
	; RUN: llc -march=amdgcn -mcpu=tonga -misched=gcn-ilp -verify-machineinstrs < %s | FileCheck --check-prefix=GCN-ILP %s			; RUN: llc -march=amdgcn -mcpu=tonga -misched=gcn-iterative-ilp -verify-machineinstrs < %s | FileCheck --check-prefix=GCN-ILP %s

	; Test the scheduler when only one wave is requested. The result should be high register usage and max ILP.			; Test the scheduler when only one wave is requested. The result should be high register usage and max ILP.

	; We expect a three digit VGPR usage here since only one wave requested.			; We expect a three digit VGPR usage here since only one wave requested.
	;			;
	; GCN-ILP: NumVgprs: {{[0-9][0-9][0-9]$}}			; GCN-ILP: NumVgprs: {{[0-9][0-9][0-9]$}}

	; FIXME: The machine scheduler is doing a poor job at maximizing ILP here.			; FIXME: The machine scheduler is doing a poor job at maximizing ILP here.
	[590 lines not shown]