This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add second pass of the scheduler
ClosedPublic

Authored by rampitec on Feb 27 2017, 8:47 PM.

Details

Reviewers

vpykhtin
alex-t

Commits

rG357d3db0a42e: [AMDGPU] Add second pass of the scheduler
rL296506: [AMDGPU] Add second pass of the scheduler

Summary

If during scheduling we find that we cannot keep the optimistic
occupancy, increase the critical register pressure limit and try scheduling
the whole function again. In this case blocks with smaller pressure
will have a chance for better scheduling.
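
The control flow described in the summary can be sketched in isolation as below. This is only an illustrative model, not the code from this revision: `Region`, `scheduleAll`, and `scheduleFunction` are hypothetical names, and each region is reduced to a single number, the best occupancy it can reach.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling region: the best occupancy
// (waves per SIMD) this region can reach given its register pressure.
struct Region {
  unsigned AchievableOccupancy;
};

struct PassResult {
  unsigned MinOccupancy; // lowest occupancy recorded across all regions
  unsigned NumPasses;    // 1 if the first pass sufficed, 2 if we retried
};

// "Schedule" every region against Target and return the lowest occupancy
// that any region ended up with.
unsigned scheduleAll(const std::vector<Region> &Regions, unsigned Target) {
  unsigned Min = Target;
  for (const Region &R : Regions)
    Min = std::min(Min, std::min(R.AchievableOccupancy, Target));
  return Min;
}

// First pass uses the optimistic target. If some region could not keep it,
// run a second pass with the target lowered to the recorded minimum, which
// relaxes the critical limits for every other region.
PassResult scheduleFunction(const std::vector<Region> &Regions,
                            unsigned StartingOccupancy) {
  unsigned Min = scheduleAll(Regions, StartingOccupancy);
  if (Min >= StartingOccupancy)
    return {Min, 1};
  scheduleAll(Regions, Min);
  return {Min, 2};
}
```

In this model a function whose regions all reach the starting occupancy is scheduled once, while a single high-pressure region triggers one retry for the whole function.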

Diff Detail

Repository: rL LLVM

Event Timeline

rampitec created this revision. Feb 27 2017, 8:47 PM

Herald added subscribers: nhaehnle, arsenm. Feb 27 2017, 8:47 PM

rampitec retitled this revision from Add second pass of the scheduler to [AMDGPU] Add second pass of the scheduler. Feb 27 2017, 8:48 PM

rampitec added parent revisions: D30439: [AMDGPU] New method to estimate register pressure, D30428: [AMDGPU] Fix read-undef flags when schedule is reverted.

tstellar added a subscriber: tstellar. Feb 28 2017, 8:19 AM

tstellar added inline comments.

lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗	(On Diff #89972)	Can you store TargetOccupancy in the SIMachineFunctionInfo object? I think that would be a little cleaner.

tstellar added inline comments. Feb 28 2017, 8:26 AM

lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗	(On Diff #89972)	To clarify: TargetOccupancy = MFI->getTargetOccupancy();

rampitec added inline comments. Feb 28 2017, 8:29 AM

lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗	(On Diff #89972)	It is the override for scheduler and it will not be always initialized to non-zero. I'm afraid if I expose this field in MFI it would be misleading.

vpykhtin added a comment.

I'm just a bit confused: what gives the scheduler more register freedom on the rescheduling run?

In D30442#688729, @vpykhtin wrote:

I'm just a bit confused: what gives the scheduler more register freedom on the rescheduling run?

With lower occupancy we have more registers available. If for any block we cannot maintain minimal occupancy we can use the same number of registers in other blocks. That is achieved by bumping critical limits.
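
The trade-off between occupancy and available registers can be illustrated with a small calculation. The sketch below is a simplified model and rests on assumptions not stated in this review: a budget of 256 VGPRs shared by all waves on a SIMD and an allocation granule of 4. The helper `maxVGPRsForOccupancy` is hypothetical; the revision itself queries the subtarget through `ST.getMaxNumVGPRs(TargetOccupancy)`.

```cpp
#include <cassert>

// Assumed, simplified hardware model (not taken from this revision):
// 256 VGPRs are shared by all waves resident on a SIMD, and registers
// are allocated in granules of 4.
constexpr unsigned TotalVGPRs = 256;
constexpr unsigned VGPRGranule = 4;

// Largest per-wave VGPR budget that still allows Waves waves to be
// resident at once. Waves must be non-zero.
constexpr unsigned maxVGPRsForOccupancy(unsigned Waves) {
  return (TotalVGPRs / Waves) / VGPRGranule * VGPRGranule;
}

// Lowering the occupancy target raises the per-wave register budget.
static_assert(maxVGPRsForOccupancy(10) < maxVGPRsForOccupancy(6),
              "fewer waves leave more registers per wave");
```

Under these assumptions, dropping the target from 10 waves to 6 raises the per-wave budget from 24 to 40 VGPRs, which is the extra freedom the second pass hands to the low-pressure blocks.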

Ok, it's just not that easy to follow this in GCNMaxOccupancySchedStrategy::initialize.

This revision is now accepted and ready to land. Feb 28 2017, 10:13 AM

kzhuravl added a subscriber: kzhuravl. Feb 28 2017, 10:18 AM

kzhuravl added inline comments.

lib/Target/AMDGPU/GCNSchedStrategy.cpp
62 ↗	(On Diff #89972)	I think this should also respect the "amdgpu-waves-per-eu" attribute (https://clang.llvm.org/docs/AttributeReference.html#amdgpu-waves-per-eu)?

arsenm added a comment.

How expensive is it to do this? The scheduler is already frequently the most expensive pass after RA, sometimes surpassing it

In D30442#688792, @arsenm wrote:

How expensive is it to do this? The scheduler is already frequently the most expensive pass after RA, sometimes surpassing it

The algorithm itself is not very expensive. Live-ins are scanned for all defined registers once per region. This is probably the most expensive part if there are a lot of registers. The main scan after that only touches those registers which are live in the region. This is obviously more expensive than not doing it, but not terribly expensive given that LIS is already available.

It would be possible to preserve LiveRegs between regions, but the main scheduler loop can skip some regions.

lib/Target/AMDGPU/GCNSchedStrategy.cpp
62 ↗	(On Diff #89972)	It does when it calls getRegPressureSetLimit(). However if we are not limited or cannot keep within guessed optimistic limits we override TargetOccupancy and reschedule.
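
The selection between the default limits and the override can be modelled on its own as follows. This is a hedged sketch rather than the code under review: `Limits` and `pickCriticalLimits` are hypothetical names, and the caller supplies both candidate limits instead of querying the subtarget or the register info. As in the thread above, a `TargetOccupancy` of zero means no override has been set.

```cpp
#include <cassert>

// Hypothetical pair of critical register pressure limits.
struct Limits {
  unsigned SGPR;
  unsigned VGPR;
};

// TargetOccupancy == 0 means "no override": use the default limits, which
// already reflect function attributes. A non-zero value selects the limits
// computed for that occupancy. The error margin is subtracted in both cases.
Limits pickCriticalLimits(unsigned TargetOccupancy, Limits ForTargetOccupancy,
                          Limits Default, unsigned ErrorMargin) {
  Limits L = TargetOccupancy ? ForTargetOccupancy : Default;
  L.SGPR -= ErrorMargin;
  L.VGPR -= ErrorMargin;
  return L;
}
```

The numbers passed in are up to the caller; the sketch assumes both limits are larger than the error margin so the unsigned subtraction does not wrap.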

Closed by commit rL296506: [AMDGPU] Add second pass of the scheduler (authored by rampitec). Feb 28 2017, 11:32 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.h	31 lines
llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.cpp	102 lines

Diff 90064

llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.h

	[12 lines not shown]

	#ifndef LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H			#ifndef LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H
	#define LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H			#define LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H

	#include "llvm/CodeGen/MachineScheduler.h"			#include "llvm/CodeGen/MachineScheduler.h"

	namespace llvm {			namespace llvm {

				class SIMachineFunctionInfo;
	class SIRegisterInfo;			class SIRegisterInfo;
				class SISubtarget;

	/// This is a minimal scheduler strategy. The main difference between this			/// This is a minimal scheduler strategy. The main difference between this
	/// and the GenericScheduler is that GCNSchedStrategy uses different			/// and the GenericScheduler is that GCNSchedStrategy uses different
	/// heuristics to determine excess/critical pressure sets. Its goal is to			/// heuristics to determine excess/critical pressure sets. Its goal is to
	/// maximize kernel occupancy (i.e. maximum number of waves per simd).			/// maximize kernel occupancy (i.e. maximum number of waves per simd).
	class GCNMaxOccupancySchedStrategy : public GenericScheduler {			class GCNMaxOccupancySchedStrategy : public GenericScheduler {
	friend class GCNScheduleDAGMILive;			friend class GCNScheduleDAGMILive;

	SUnit *pickNodeBidirectional(bool &IsTopNode);			SUnit *pickNodeBidirectional(bool &IsTopNode);

	void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,			void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,
	const RegPressureTracker &RPTracker,			const RegPressureTracker &RPTracker,
	SchedCandidate &Cand);			SchedCandidate &Cand);

	void initCandidate(SchedCandidate &Cand, SUnit *SU,			void initCandidate(SchedCandidate &Cand, SUnit *SU,
	bool AtTop, const RegPressureTracker &RPTracker,			bool AtTop, const RegPressureTracker &RPTracker,
	const SIRegisterInfo *SRI,			const SIRegisterInfo *SRI,
	unsigned SGPRPressure, unsigned VGPRPressure);			unsigned SGPRPressure, unsigned VGPRPressure);

	unsigned SGPRExcessLimit;			unsigned SGPRExcessLimit;
	unsigned VGPRExcessLimit;			unsigned VGPRExcessLimit;
	unsigned SGPRCriticalLimit;			unsigned SGPRCriticalLimit;
	unsigned VGPRCriticalLimit;			unsigned VGPRCriticalLimit;

				unsigned TargetOccupancy;

				MachineFunction *MF;

	public:			public:
	GCNMaxOccupancySchedStrategy(const MachineSchedContext *C);			GCNMaxOccupancySchedStrategy(const MachineSchedContext *C);

	SUnit *pickNode(bool &IsTopNode) override;			SUnit *pickNode(bool &IsTopNode) override;

	void initialize(ScheduleDAGMI *DAG) override;			void initialize(ScheduleDAGMI *DAG) override;
	};			};

	class GCNScheduleDAGMILive : public ScheduleDAGMILive {			class GCNScheduleDAGMILive : public ScheduleDAGMILive {

				const SISubtarget &ST;

				const SIMachineFunctionInfo &MFI;

				// Occupancy target at the beginning of function scheduling cycle.
				unsigned StartingOccupancy;

				// Minimal real occupancy recorded for the function.
				unsigned MinOccupancy;

				// Scheduling stage number.
				unsigned Stage;

				// Vector of regions recorded for later rescheduling
				SmallVector<std::pair<const MachineBasicBlock::iterator,
				const MachineBasicBlock::iterator>, 32> Regions;

	// Region live-ins.			// Region live-ins.
	DenseMap<unsigned, LaneBitmask> LiveIns;			DenseMap<unsigned, LaneBitmask> LiveIns;

	// Number of live-ins to the current region, first SGPR then VGPR.			// Number of live-ins to the current region, first SGPR then VGPR.
	std::pair<unsigned, unsigned> LiveInPressure;			std::pair<unsigned, unsigned> LiveInPressure;

	// Collect current region live-ins.			// Collect current region live-ins.
	void discoverLiveIns();			void discoverLiveIns();

	// Return current region pressure. First value is SGPR number, second is VGPR.			// Return current region pressure. First value is SGPR number, second is VGPR.
	std::pair<unsigned, unsigned> getRealRegPressure() const;			std::pair<unsigned, unsigned> getRealRegPressure() const;

	public:			public:
	GCNScheduleDAGMILive(MachineSchedContext *C,			GCNScheduleDAGMILive(MachineSchedContext *C,
	std::unique_ptr<MachineSchedStrategy> S) :			std::unique_ptr<MachineSchedStrategy> S);
	ScheduleDAGMILive(C, std::move(S)) {}
				void enterRegion(MachineBasicBlock *bb,
				MachineBasicBlock::iterator begin,
				MachineBasicBlock::iterator end,
				unsigned regioninstrs) override;

	void schedule() override;			void schedule() override;

	void finalizeSchedule() override;			void finalizeSchedule() override;
	};			};

	} // End namespace llvm			} // End namespace llvm

	#endif // GCNSCHEDSTRATEGY_H			#endif // GCNSCHEDSTRATEGY_H

llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.cpp

[20 lines not shown]
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"

#define DEBUG_TYPE "misched"		#define DEBUG_TYPE "misched"

using namespace llvm;		using namespace llvm;

GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(		GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(
const MachineSchedContext *C) :		const MachineSchedContext *C) :
GenericScheduler(C) { }		GenericScheduler(C), TargetOccupancy(0), MF(nullptr) { }

static unsigned getMaxWaves(unsigned SGPRs, unsigned VGPRs,		static unsigned getMaxWaves(unsigned SGPRs, unsigned VGPRs,
const MachineFunction &MF) {		const MachineFunction &MF) {

const SISubtarget &ST = MF.getSubtarget<SISubtarget>();		const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
unsigned MinRegOccupancy = std::min(ST.getOccupancyWithNumSGPRs(SGPRs),		unsigned MinRegOccupancy = std::min(ST.getOccupancyWithNumSGPRs(SGPRs),
ST.getOccupancyWithNumVGPRs(VGPRs));		ST.getOccupancyWithNumVGPRs(VGPRs));
return std::min(MinRegOccupancy,		return std::min(MinRegOccupancy,
ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(),		ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(),
*MF.getFunction()));		*MF.getFunction()));
}		}

void GCNMaxOccupancySchedStrategy::initialize(ScheduleDAGMI *DAG) {		void GCNMaxOccupancySchedStrategy::initialize(ScheduleDAGMI *DAG) {
GenericScheduler::initialize(DAG);		GenericScheduler::initialize(DAG);

const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);		const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);

		if (MF != &DAG->MF)
		TargetOccupancy = 0;
		MF = &DAG->MF;

		const SISubtarget &ST = MF->getSubtarget<SISubtarget>();

// FIXME: This is also necessary, because some passes that run after		// FIXME: This is also necessary, because some passes that run after
// scheduling and before regalloc increase register pressure.		// scheduling and before regalloc increase register pressure.
const int ErrorMargin = 3;		const int ErrorMargin = 3;

SGPRExcessLimit = Context->RegClassInfo		SGPRExcessLimit = Context->RegClassInfo
->getNumAllocatableRegs(&AMDGPU::SGPR_32RegClass) - ErrorMargin;		->getNumAllocatableRegs(&AMDGPU::SGPR_32RegClass) - ErrorMargin;
VGPRExcessLimit = Context->RegClassInfo		VGPRExcessLimit = Context->RegClassInfo
->getNumAllocatableRegs(&AMDGPU::VGPR_32RegClass) - ErrorMargin;		->getNumAllocatableRegs(&AMDGPU::VGPR_32RegClass) - ErrorMargin;
		if (TargetOccupancy) {
		SGPRCriticalLimit = ST.getMaxNumSGPRs(TargetOccupancy, true);
		VGPRCriticalLimit = ST.getMaxNumVGPRs(TargetOccupancy);
		} else {
SGPRCriticalLimit = SRI->getRegPressureSetLimit(DAG->MF,		SGPRCriticalLimit = SRI->getRegPressureSetLimit(DAG->MF,
SRI->getSGPRPressureSet()) - ErrorMargin;		SRI->getSGPRPressureSet());
VGPRCriticalLimit = SRI->getRegPressureSetLimit(DAG->MF,		VGPRCriticalLimit = SRI->getRegPressureSetLimit(DAG->MF,
SRI->getVGPRPressureSet()) - ErrorMargin;		SRI->getVGPRPressureSet());
		}

		SGPRCriticalLimit -= ErrorMargin;
		VGPRCriticalLimit -= ErrorMargin;
}		}

void GCNMaxOccupancySchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,		void GCNMaxOccupancySchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
bool AtTop, const RegPressureTracker &RPTracker,		bool AtTop, const RegPressureTracker &RPTracker,
const SIRegisterInfo *SRI,		const SIRegisterInfo *SRI,
unsigned SGPRPressure,		unsigned SGPRPressure,
unsigned VGPRPressure) {		unsigned VGPRPressure) {

[236 lines not shown]	if (SU->isTopReady())
Top.removeReady(SU);		Top.removeReady(SU);
if (SU->isBottomReady())		if (SU->isBottomReady())
Bot.removeReady(SU);		Bot.removeReady(SU);

DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") " << *SU->getInstr());		DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") " << *SU->getInstr());
return SU;		return SU;
}		}

		GCNScheduleDAGMILive::GCNScheduleDAGMILive(MachineSchedContext *C,
		std::unique_ptr<MachineSchedStrategy> S) :
		ScheduleDAGMILive(C, std::move(S)),
		ST(MF.getSubtarget<SISubtarget>()),
		MFI(*MF.getInfo<SIMachineFunctionInfo>()),
		StartingOccupancy(ST.getOccupancyWithLocalMemSize(MFI.getLDSSize(),
		*MF.getFunction())),
		MinOccupancy(StartingOccupancy), Stage(0) {

		DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");
		}

		void GCNScheduleDAGMILive::enterRegion(MachineBasicBlock *bb,
		MachineBasicBlock::iterator begin,
		MachineBasicBlock::iterator end,
		unsigned regioninstrs) {
		ScheduleDAGMILive::enterRegion(bb, begin, end, regioninstrs);

		if (Stage == 0)
		Regions.push_back(std::make_pair(begin, end));
		}

void GCNScheduleDAGMILive::schedule() {		void GCNScheduleDAGMILive::schedule() {
std::vector<MachineInstr*> Unsched;		std::vector<MachineInstr*> Unsched;
Unsched.reserve(NumRegionInstrs);		Unsched.reserve(NumRegionInstrs);
for (auto &I : *this)		for (auto &I : *this)
Unsched.push_back(&I);		Unsched.push_back(&I);

std::pair<unsigned, unsigned> PressureBefore;		std::pair<unsigned, unsigned> PressureBefore;
if (LIS) {		if (LIS) {
[19 lines not shown]	void GCNScheduleDAGMILive::schedule() {
}		}
unsigned WavesAfter = getMaxWaves(PressureAfter.first,		unsigned WavesAfter = getMaxWaves(PressureAfter.first,
PressureAfter.second, MF);		PressureAfter.second, MF);
unsigned WavesBefore = getMaxWaves(PressureBefore.first,		unsigned WavesBefore = getMaxWaves(PressureBefore.first,
PressureBefore.second, MF);		PressureBefore.second, MF);
DEBUG(dbgs() << "Occupancy before scheduling: " << WavesBefore <<		DEBUG(dbgs() << "Occupancy before scheduling: " << WavesBefore <<
", after " << WavesAfter << ".\n");		", after " << WavesAfter << ".\n");

		// We could not keep current target occupancy because of the just scheduled
		// region. Record new occupancy for next scheduling cycle.
		unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);
		if (NewOccupancy < MinOccupancy) {
		MinOccupancy = NewOccupancy;
		DEBUG(dbgs() << "Occupancy lowered for the function to "
		<< MinOccupancy << ".\n");
		}

if (WavesAfter >= WavesBefore)		if (WavesAfter >= WavesBefore)
return;		return;

DEBUG(dbgs() << "Attempting to revert scheduling.\n");		DEBUG(dbgs() << "Attempting to revert scheduling.\n");
RegionEnd = RegionBegin;		RegionEnd = RegionBegin;
for (MachineInstr *MI : Unsched) {		for (MachineInstr *MI : Unsched) {
if (MI->getIterator() != RegionEnd) {		if (MI->getIterator() != RegionEnd) {
BB->remove(MI);		BB->remove(MI);
[125 lines not shown]	GCNScheduleDAGMILive::getRealRegPressure() const {

DEBUG(dbgs() << "Real region's register pressure:\nSGPR = " << MaxSGPRs		DEBUG(dbgs() << "Real region's register pressure:\nSGPR = " << MaxSGPRs
<< "\nVGPR = " << MaxVGPRs << '\n');		<< "\nVGPR = " << MaxVGPRs << '\n');

return std::make_pair(MaxSGPRs, MaxVGPRs);		return std::make_pair(MaxSGPRs, MaxVGPRs);
}		}

void GCNScheduleDAGMILive::finalizeSchedule() {		void GCNScheduleDAGMILive::finalizeSchedule() {
		// Retry function scheduling if we found resulting occupancy and it is
		// lower than used for first pass scheduling. This will give more freedom
		// to schedule low register pressure blocks.
		// Code is partially copied from MachineSchedulerBase::scheduleRegions().

		if (!LIS || StartingOccupancy <= MinOccupancy)
		return;

		DEBUG(dbgs() << "Retrying function scheduling with lowest recorded occupancy "
		<< MinOccupancy << ".\n");

		Stage++;
		GCNMaxOccupancySchedStrategy &S = (GCNMaxOccupancySchedStrategy&)*SchedImpl;
		S.TargetOccupancy = MinOccupancy;

		MachineBasicBlock *MBB = nullptr;
		for (auto Region : Regions) {
		RegionBegin = Region.first;
		RegionEnd = Region.second;

		if (RegionBegin->getParent() != MBB) {
		if (MBB) finishBlock();
		MBB = RegionBegin->getParent();
		startBlock(MBB);
		}

		unsigned NumRegionInstrs = std::distance(begin(), end());
		enterRegion(MBB, begin(), end(), NumRegionInstrs);

		// Skip empty scheduling regions (0 or 1 schedulable instructions).
		if (begin() == end() || begin() == std::prev(end())) {
		exitRegion();
		continue;
		}
		DEBUG(dbgs() << "******** MI Scheduling ********\n");
		DEBUG(dbgs() << MF.getName()
		<< ":BB#" << MBB->getNumber() << " " << MBB->getName()
		<< "\n From: " << *begin() << " To: ";
		if (RegionEnd != MBB->end()) dbgs() << *RegionEnd;
		else dbgs() << "End";
		dbgs() << " RegionInstrs: " << NumRegionInstrs << '\n');

		schedule();

		exitRegion();
		}
		finishBlock();
LiveIns.shrink_and_clear();		LiveIns.shrink_and_clear();
}		}