This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Turn register pressure estimation into forward tracker
Closed, Public

Authored by rampitec on May 11 2017, 10:22 AM.


Details

Reviewers

Commits

rG464cecf81e05: [AMDGPU] Turn register pressure estimation into forward tracker
rL303179: [AMDGPU] Turn register pressure estimation into forward tracker

Summary

This factors the register pressure estimation mechanism out of
GCNSchedStrategy into a forward tracker, to unify the interface
with other strategies and expose it to other interested phases.

Diff Detail

Repository: rL LLVM

Event Timeline

rampitec created this revision. May 11 2017, 10:22 AM

Herald added subscribers: t-tye, tpr, dstuttard and 5 others. May 11 2017, 10:22 AM

Added LastTrackedMI assignment.

Split advance() into advanceBefore(MI) and advanceTo(MI).
The tracker does two things: it tracks pressure and it returns the live register set. The proper live-in set for an instruction is the state after all registers that die before it have been removed, but before the defs of that instruction have been added.
If we just call advance(), which calls advanceBefore() and then advanceTo(), we always get the correct live register set at the MI argument, but that is not the same as the live-in set of this instruction. Calling only advanceBefore() leaves the tracker in the live-in state.
In addition, I added a helper to clear the maximum register pressure, which also needs to be reset between these two steps if we are tracking the block and want to split it into regions.
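To make the two-step split concrete, here is a toy model. It is a hypothetical sketch, not the LLVM code: registers are plain integers with a single lane, pressure is simply the number of live registers, and liveness is derived from each register's last use instead of LiveIntervals. `Instr` and `ToyDownwardTracker` are invented for illustration; only the method names mirror the tracker in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy instruction: the virtual registers it defines and uses.
struct Instr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

// Toy downward tracker: single-lane registers, pressure = live register count,
// liveness from each register's last use (the real tracker queries LIS).
class ToyDownwardTracker {
  const std::vector<Instr> &Block;
  std::map<unsigned, std::size_t> LastUse; // reg -> index of its last use
  std::set<unsigned> Live;
  std::size_t Next = 0; // index of the next instruction to track
  std::size_t MaxPressure = 0;

public:
  explicit ToyDownwardTracker(const std::vector<Instr> &B) : Block(B) {
    for (std::size_t I = 0; I < B.size(); ++I)
      for (unsigned R : B[I].Uses)
        LastUse[R] = I; // later uses overwrite earlier ones
  }

  // Step 1: drop registers that are not live into Block[Next]. Afterwards the
  // live set is exactly the live-in set of Block[Next].
  bool advanceBeforeNext() {
    if (Next == Block.size())
      return false;
    for (auto It = Live.begin(); It != Live.end();) {
      auto LU = LastUse.find(*It);
      bool Dead = LU == LastUse.end() || LU->second < Next;
      It = Dead ? Live.erase(It) : std::next(It);
    }
    MaxPressure = std::max(MaxPressure, Live.size());
    return true;
  }

  // Step 2: add the defs of Block[Next] and move past it.
  void advanceToNext() {
    assert(Next < Block.size() && "advanced past the end of the block");
    for (unsigned R : Block[Next].Defs)
      Live.insert(R);
    ++Next;
    MaxPressure = std::max(MaxPressure, Live.size());
  }

  const std::set<unsigned> &getLiveRegs() const { return Live; }
  std::size_t getMaxPressure() const { return MaxPressure; }
  // Lets a caller start collecting a new region's maximum between the steps.
  void clearMaxPressure() { MaxPressure = 0; }
};
```

Reading `getLiveRegs()` between the two calls yields the live-ins of the next instruction; reading it after `advanceToNext()` yields the set at that instruction, which still contains the registers that instruction kills.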

rampitec added a child revision: D33117: [AMDGPU] Cache live-ins and register pressure in scheduler. May 12 2017, 12:00 AM

vpykhtin added inline comments. May 12 2017, 7:18 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
305	(On Diff #98723)	I think it should be the caller's responsibility to account for valid instructions; it would be unexpected for reset to move somewhere else.
lib/Target/AMDGPU/GCNRegPressure.h
150	(On Diff #98723)	We have some inconsistency here: the upward tracker's reset moves to the point after the MI and recede moves to the point before the MI, but the downward reset and advance move to the point at the MI.

rampitec added inline comments. May 12 2017, 8:02 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
305	(On Diff #98723)	The caller will always need to do it, otherwise reset will just crash. advance itself also skips debug instructions, so the two are consistent. If a caller resets the tracker to a debug instruction and then just calls advance while incrementing the iterator, it will not notice any difference. There is nothing good in letting the caller crash easily; there is no benefit to it besides the crash.
lib/Target/AMDGPU/GCNRegPressure.h
150	(On Diff #98723)	I do not think this is an inconsistency if we think of advance/recede as a two-step move. We really want the tracker to do two things: count pressure and get live-ins/live-outs. So reset does just the first step, allowing us to collect the required live set. If not for that, advance could just be called once to move to the argument instruction.

rampitec added inline comments. May 12 2017, 8:36 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
305	(On Diff #98723)	BTW, reset does not record any state besides the live set, and the live set is the same whether or not you skip debug instructions.

arsenm added inline comments. May 12 2017, 10:43 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313–314	(On Diff #98723)	Should this be using something like isTransient (as it is now, it looks inappropriate) to keep IMPLICIT_DEF from affecting the pressure estimate?
317	(On Diff #98723)	Extra space.
359	(On Diff #98723)	std::max? Not sure how this builds for you.

vpykhtin added inline comments. May 12 2017, 10:44 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
359	(On Diff #98723)	This is the overloaded max for GCNRegPressure.

arsenm added inline comments. May 12 2017, 10:48 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
359	(On Diff #98723)	It should probably be named something else.

rampitec added inline comments. May 12 2017, 10:49 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313–314	(On Diff #98723)	I believe these defs should count towards pressure. RA will eventually allocate something for them, and in our case it is better to overestimate pressure than to underestimate it. Anyway, this is not a functional change: it factors an existing algorithm out into a common interface, and I do not want to change its behavior with this change.

Removed extra space.

vpykhtin added inline comments. May 12 2017, 10:52 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
359	(On Diff #98723)	I wanted to choose a better name for it, since it doesn't compare two pressures and return the larger one, but instead constructs a new pressure from the maximum values of both. What would be a better name here?
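The naming concern can be illustrated with a small hypothetical sketch: a component-wise maximum builds a new value from per-component maxima, so the result may equal neither argument. `ToyPressure` has only two components (an assumption; the real GCNRegPressure tracks more register kinds), and `buildMax` is used here only as an illustrative name.

```cpp
#include <algorithm>

// Hypothetical two-component pressure; GCNRegPressure itself has more fields.
struct ToyPressure {
  unsigned SGPRs;
  unsigned VGPRs;
};

// Component-wise maximum: each field is the larger of the two inputs' fields,
// so the result is a new pressure rather than "the larger of two pressures".
ToyPressure buildMax(const ToyPressure &P1, const ToyPressure &P2) {
  return {std::max(P1.SGPRs, P2.SGPRs), std::max(P1.VGPRs, P2.VGPRs)};
}
```

For example, combining {10 SGPRs, 40 VGPRs} with {20 SGPRs, 30 VGPRs} gives {20, 40}, which is neither input.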

arsenm added inline comments. May 12 2017, 10:54 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313–314	(On Diff #98723)	implicit_def will use whatever register is available, I think usually (always?) VGPR0, so it shouldn't count. Copy/phi is more questionable.

Uploaded correct patch.

rampitec added inline comments. May 12 2017, 10:58 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313–314	(On Diff #98723)	It may use available registers, but it will still use them. If no registers were available before, the implicit def will increase pressure. It probably deserves some experiments, but that should be a separate change from the interface.

rampitec mentioned this in D33117: [AMDGPU] Cache live-ins and register pressure in scheduler. May 12 2017, 1:48 PM

If I'm not mistaken, the downward tracker cannot work on an arbitrary instruction order, so let's change its interface so that it tracks the current instruction and moves it, like the standard LLVM tracker does, that is reset(MI), advance().

In D33105#754976, @vpykhtin wrote:

If I'm not mistaken, the downward tracker cannot work on an arbitrary instruction order, so let's change its interface so that it tracks the current instruction and moves it, like the standard LLVM tracker does, that is reset(MI), advance().

In general it cannot work on an arbitrary order, but it can be used as a probe to schedule the next arbitrary instruction and then reset to the previous state.
Besides, in D33117 I'm using advanceBefore to cross a basic block boundary, and I do not really want to slow down the advance method by checking for the iterator end condition, which is needed relatively seldom.

rampitec added inline comments. May 15 2017, 9:21 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
359	(On Diff #98723)	buildMax() maybe?

In D33105#755142, @rampitec wrote:

In general it cannot work on an arbitrary order, but it can be used as a probe to schedule the next arbitrary instruction and then reset to the previous state.
Besides, in D33117 I'm using advanceBefore to cross a basic block boundary, and I do not really want to slow down the advance method by checking for the iterator end condition, which is needed relatively seldom.

Why are you sure it would work on an arbitrary instruction taken out of the original order that LIS was constructed for?

In D33105#755158, @vpykhtin wrote:

Why are you sure it would work on an arbitrary instruction taken out of the original order that LIS was constructed for?

If I advance just a single out-of-order instruction, it will not give you the correct RP at it, but it will give a correct RP diff resulting from scheduling that single instruction. Honestly, you cannot expect more from a probe without providing the whole order of intermediate instructions.

This looks a bit like using car keys to open a bottle. Should we have a function that returns the RP diff of a single instruction without changing any state?

In D33105#755182, @vpykhtin wrote:

This looks a bit like using car keys to open a bottle. Should we have a function that returns the RP diff of a single instruction without changing any state?

We probably should, but it probably does not belong to this change.

In D33105#755186, @rampitec wrote:

We probably should, but it probably does not belong to this change.

I would not introduce an interface in the hope of such usage. This is really counterintuitive. Let's add the RP diff function instead when we need it.
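The stateless alternative discussed here could look roughly like the following hypothetical sketch: it computes the net pressure change of scheduling one instruction from a given live set, without mutating that set. Single-lane registers and an explicit list of killed uses are simplifying assumptions; the real tracker derives liveness from LiveIntervals and lane masks, and `getRPDiff` is not an existing LLVM function.

```cpp
#include <set>
#include <vector>

// Hypothetical toy instruction: registers it defines, and used registers whose
// last use is this instruction ("killed" uses).
struct Instr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> KilledUses;
};

// Net pressure change across MI, given the registers live just before it.
// Pure function: the caller's Live set is not modified.
int getRPDiff(const std::set<unsigned> &Live, const Instr &MI) {
  std::set<unsigned> After(Live); // local copy, kept simple for clarity
  for (unsigned R : MI.KilledUses)
    After.erase(R); // last use: the register is dead after MI
  for (unsigned R : MI.Defs)
    After.insert(R); // def: the register is live after MI
  return static_cast<int>(After.size()) - static_cast<int>(Live.size());
}
```

Because nothing is mutated, a scheduler could probe several candidates in a row without any reset step.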

In D33105#755187, @vpykhtin wrote:

I would not introduce an interface in the hope of such usage. This is really counterintuitive. Let's add the RP diff function instead when we need it.

OK, consider D33117, GCNSchedStrategy.cpp near line 491. To implement this with the proposed interface, I would need to check for the end of the basic block in advanceBefore(), get the successor, check that it is the only successor... I'm really trying to make it faster now ;)

In D33105#755202, @rampitec wrote:

OK, consider D33117, GCNSchedStrategy.cpp near line 491. To implement this with the proposed interface, I would need to check for the end of the basic block in advanceBefore(), get the successor, check that it is the only successor... I'm really trying to make it faster now ;)

... and then, if there is no single successor, I would need to return a failure, turning advance into a bool function and checking it everywhere... Plus pass it a flag saying whether it needs to cross the BB boundary. Err, that would be a bad interface.

I really don't understand it all. Why not just have reset(MachineInstr *, LiveRegSet) and do a reset on the next BB with the live set from the previous BB? It looks like your code does that. Another question: your reset skips debug instructions, but the caller doesn't know about it and can supply an unskipped iterator to the next advance.

In D33105#755231, @vpykhtin wrote:

I really don't understand it all. Why not just have reset(MachineInstr *, LiveRegSet) and do a reset on the next BB with the live set from the previous BB? It looks like your code does that. Another question: your reset skips debug instructions, but the caller doesn't know about it and can supply an unskipped iterator to the next advance.

A reset with a live set would mean copying the map. That copy is unavoidable in other places, because there it really is a backup copy, but here we do not need anything like that.
The unskipped iterator is not an issue: advance() will return immediately on a debug value, and the caller will increment the iterator and call advance() again. No problem.

You can do the reset with a move; this would not require a copy.
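The copy-versus-move point can be shown with a minimal hypothetical tracker. In this patch the LLVM tracker takes `const LiveRegSet *` and copies from it; the sketch below instead adds an rvalue-reference overload, so a caller that no longer needs its live set can hand it over without copying its entries. `ToyTracker` and the `std::map` live set are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Toy live set: virtual register -> live lane mask.
using LiveRegSet = std::map<unsigned, std::uint64_t>;

class ToyTracker {
  LiveRegSet LiveRegs;

public:
  // Copying overload: the caller keeps its set, e.g. as a backup copy.
  void reset(const LiveRegSet &Regs) { LiveRegs = Regs; }

  // Moving overload: takes over the caller's storage without copying entries.
  void reset(LiveRegSet &&Regs) { LiveRegs = std::move(Regs); }

  // Hands the live set back to the caller (same idea as moveLiveRegs() in the
  // patch); the tracker's own set is left valid but unspecified.
  LiveRegSet moveLiveRegs() { return std::move(LiveRegs); }

  std::size_t size() const { return LiveRegs.size(); }
};
```

Crossing into the next block could then be written as `T.reset(Prev.moveLiveRegs())`, which selects the moving overload because the argument is a temporary.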

If you make advance move its own iterator, then it can skip debug instructions itself.

Making advance check the end pointer wouldn't increase the number of end checks, since you already check it on every iteration of the advance loop. You can return a bool from advance and break the loop on it. In any case, such checks are cheap.
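The suggestion above (the tracker owns its position, skips debug instructions itself, and reports the end of the block through a bool result) can be sketched with a hypothetical toy tracker. Instructions are reduced to a debug flag and the actual pressure update is elided; the committed reset/advanceBeforeNext follow a similar pattern with skipDebugInstructionsForward, but this is not that code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: only whether it is a debug value matters here.
struct Instr {
  bool IsDebug;
};

// Toy tracker that owns its position, skips debug instructions itself and
// reports the end of the block through its bool results.
class ToyIterTracker {
  const std::vector<Instr> &Block;
  std::size_t Next = 0;
  unsigned NumTracked = 0; // non-debug instructions advanced over

  void skipDebug() {
    while (Next != Block.size() && Block[Next].IsDebug)
      ++Next;
  }

public:
  explicit ToyIterTracker(const std::vector<Instr> &B) : Block(B) {}

  // Returns false if nothing but debug instructions remain from Begin on.
  bool reset(std::size_t Begin) {
    assert(Begin <= Block.size() && "Begin out of range");
    Next = Begin;
    NumTracked = 0;
    skipDebug();
    return Next != Block.size();
  }

  // Moves over the next non-debug instruction; returns false at block end.
  bool advance() {
    skipDebug();
    if (Next == Block.size())
      return false;
    ++NumTracked; // a real tracker would update live regs and pressure here
    ++Next;
    return true;
  }

  unsigned getNumTracked() const { return NumTracked; }
};
```

The caller's loop then reduces to `while (T.advance()) { ... }`, with no separate iterator or end check on the caller's side.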

Speed up the tracker:

Replace MRI.reg_nodbg_empty() with LIS.hasInterval().
Delete empty registers from the live set while tracking.
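Deleting registers whose lane mask has become empty keeps the live set from accumulating dead entries that every later advance step would scan again. A hypothetical sketch using `std::map` in place of DenseMap; the `DeadLanes` input is an assumption standing in for the LiveIntervals subrange queries the real tracker performs, and pressure accounting is omitted.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

using LaneMask = std::uint64_t;
using LiveRegSet = std::map<unsigned, LaneMask>; // reg -> live lanes

// Clears the lanes listed in DeadLanes (hypothetical input: reg -> lanes that
// are no longer live) and erases registers left with no live lanes, so later
// scans of LiveRegs do not visit them.
void pruneDeadLanes(LiveRegSet &LiveRegs,
                    const std::map<unsigned, LaneMask> &DeadLanes) {
  for (auto It = LiveRegs.begin(); It != LiveRegs.end();) {
    auto D = DeadLanes.find(It->first);
    if (D != DeadLanes.end())
      It->second &= ~D->second; // drop the dead lanes of this register
    // std::map::erase returns the following iterator, keeping the loop valid.
    It = It->second == 0 ? LiveRegs.erase(It) : std::next(It);
  }
}
```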

Changed the forward RP tracker from stateless to stateful, as requested by the reviewer.
Improved speed by replacing MRI.reg_nodbg_empty() with LIS.hasInterval().

rampitec marked 3 inline comments as done. May 15 2017, 7:43 PM

I'm a bit confused by all of these advance variants, but let's submit this and improve it later.

vpykhtin accepted this revision. May 16 2017, 8:00 AM

This revision is now accepted and ready to land. May 16 2017, 8:00 AM

Closed by commit rL303179: [AMDGPU] Turn register pressure estimation into forward tracker (authored by rampitec). May 16 2017, 8:57 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/Target/AMDGPU/

Size: 49 lines, 121 lines, 10 lines, 151 lines

Diff 99152

llvm/trunk/lib/Target/AMDGPU/GCNRegPressure.h

...	inline GCNRegPressure max(const GCNRegPressure &P1, const GCNRegPressure &P2) {
return Res;		return Res;
}		}

class GCNRPTracker {		class GCNRPTracker {
public:		public:
typedef DenseMap<unsigned, LaneBitmask> LiveRegSet;		typedef DenseMap<unsigned, LaneBitmask> LiveRegSet;

protected:		protected:
		const LiveIntervals &LIS;
LiveRegSet LiveRegs;		LiveRegSet LiveRegs;
GCNRegPressure CurPressure, MaxPressure;		GCNRegPressure CurPressure, MaxPressure;
const MachineInstr *LastTrackedMI = nullptr;		const MachineInstr *LastTrackedMI = nullptr;
mutable const MachineRegisterInfo *MRI = nullptr;		mutable const MachineRegisterInfo *MRI = nullptr;
GCNRPTracker() {}		GCNRPTracker(const LiveIntervals &LIS_) : LIS(LIS_) {}
		LaneBitmask getDefRegMask(const MachineOperand &MO) const;
		LaneBitmask getUsedRegMask(const MachineOperand &MO) const;
public:		public:
// live regs for the current state		// live regs for the current state
const decltype(LiveRegs) &getLiveRegs() const { return LiveRegs; }		const decltype(LiveRegs) &getLiveRegs() const { return LiveRegs; }
const MachineInstr *getLastTrackedMI() const { return LastTrackedMI; }		const MachineInstr *getLastTrackedMI() const { return LastTrackedMI; }

		void clearMaxPressure() { MaxPressure.clear(); }

// returns MaxPressure, resetting it		// returns MaxPressure, resetting it
decltype(MaxPressure) moveMaxPressure() {		decltype(MaxPressure) moveMaxPressure() {
auto Res = MaxPressure;		auto Res = MaxPressure;
MaxPressure.clear();		MaxPressure.clear();
return Res;		return Res;
}		}
decltype(LiveRegs) moveLiveRegs() {		decltype(LiveRegs) moveLiveRegs() {
return std::move(LiveRegs);		return std::move(LiveRegs);
}		}
};		};

class GCNUpwardRPTracker : public GCNRPTracker {		class GCNUpwardRPTracker : public GCNRPTracker {
const LiveIntervals &LIS;
LaneBitmask getDefRegMask(const MachineOperand &MO) const;
LaneBitmask getUsedRegMask(const MachineOperand &MO) const;
public:		public:
GCNUpwardRPTracker(const LiveIntervals &LIS_) : LIS(LIS_) {}		GCNUpwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}
// reset tracker to the point just below MI		// reset tracker to the point just below MI
// filling live regs upon this point using LIS		// filling live regs upon this point using LIS
void reset(const MachineInstr &MI);		void reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr);

// move to the state just above the MI		// move to the state just above the MI
void recede(const MachineInstr &MI);		void recede(const MachineInstr &MI);

// checks whether the tracker's state after receding MI corresponds		// checks whether the tracker's state after receding MI corresponds
// to reported by LIS		// to reported by LIS
bool isValid() const;		bool isValid() const;
};		};

		class GCNDownwardRPTracker : public GCNRPTracker {
		// Last position of reset or advanceBeforeNext
		MachineBasicBlock::const_iterator NextMI;

		MachineBasicBlock::const_iterator MBBEnd;

		public:
		GCNDownwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}

		const MachineBasicBlock::const_iterator getNext() const { return NextMI; }

		// Reset tracker to the point before the MI
		// filling live regs upon this point using LIS.
		// Returns false if block is empty except debug values.
		bool reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr);

		// Move to the state right before the next MI. Returns false if reached
		// end of the block.
		bool advanceBeforeNext();

		// Move to the state at the MI, advanceBeforeNext has to be called first.
		void advanceToNext();

		// Move to the state at the next MI. Returns false if reached end of block.
		bool advance();

		// Advance instructions until before End.
		bool advance(MachineBasicBlock::const_iterator End);

		// Reset to Begin and advance to End.
		bool advance(MachineBasicBlock::const_iterator Begin,
		MachineBasicBlock::const_iterator End,
		const LiveRegSet *LiveRegsCopy = nullptr);
		};

LaneBitmask getLiveLaneMask(unsigned Reg,		LaneBitmask getLiveLaneMask(unsigned Reg,
SlotIndex SI,		SlotIndex SI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
const MachineRegisterInfo &MRI);		const MachineRegisterInfo &MRI);

GCNRPTracker::LiveRegSet getLiveRegs(SlotIndex SI,		GCNRPTracker::LiveRegSet getLiveRegs(SlotIndex SI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
const MachineRegisterInfo &MRI);		const MachineRegisterInfo &MRI);
...

llvm/trunk/lib/Target/AMDGPU/GCNRegPressure.cpp

...
void llvm::printLivesAt(SlotIndex SI,		void llvm::printLivesAt(SlotIndex SI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
const MachineRegisterInfo &MRI) {		const MachineRegisterInfo &MRI) {
dbgs() << "Live regs at " << SI << ": "		dbgs() << "Live regs at " << SI << ": "
<< *LIS.getInstructionFromIndex(SI);		<< *LIS.getInstructionFromIndex(SI);
unsigned Num = 0;		unsigned Num = 0;
for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {		for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {
const unsigned Reg = TargetRegisterInfo::index2VirtReg(I);		const unsigned Reg = TargetRegisterInfo::index2VirtReg(I);
if (MRI.reg_nodbg_empty(Reg))		if (!LIS.hasInterval(Reg))
continue;		continue;
const auto &LI = LIS.getInterval(Reg);		const auto &LI = LIS.getInterval(Reg);
if (LI.hasSubRanges()) {		if (LI.hasSubRanges()) {
bool firstTime = true;		bool firstTime = true;
for (const auto &S : LI.subranges()) {		for (const auto &S : LI.subranges()) {
if (!S.liveAt(SI)) continue;		if (!S.liveAt(SI)) continue;
if (firstTime) {		if (firstTime) {
dbgs() << " " << PrintReg(Reg, MRI.getTargetRegisterInfo())		dbgs() << " " << PrintReg(Reg, MRI.getTargetRegisterInfo())
...

///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// GCNRPTracker		// GCNRPTracker

LaneBitmask llvm::getLiveLaneMask(unsigned Reg,		LaneBitmask llvm::getLiveLaneMask(unsigned Reg,
SlotIndex SI,		SlotIndex SI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
const MachineRegisterInfo &MRI) {		const MachineRegisterInfo &MRI) {
assert(!MRI.reg_nodbg_empty(Reg));
LaneBitmask LiveMask;		LaneBitmask LiveMask;
const auto &LI = LIS.getInterval(Reg);		const auto &LI = LIS.getInterval(Reg);
if (LI.hasSubRanges()) {		if (LI.hasSubRanges()) {
for (const auto &S : LI.subranges())		for (const auto &S : LI.subranges())
if (S.liveAt(SI)) {		if (S.liveAt(SI)) {
LiveMask |= S.LaneMask;		LiveMask |= S.LaneMask;
assert(LiveMask < MRI.getMaxLaneMaskForVReg(Reg) ||		assert(LiveMask < MRI.getMaxLaneMaskForVReg(Reg) ||
LiveMask == MRI.getMaxLaneMaskForVReg(Reg));		LiveMask == MRI.getMaxLaneMaskForVReg(Reg));
}		}
} else if (LI.liveAt(SI)) {		} else if (LI.liveAt(SI)) {
LiveMask = MRI.getMaxLaneMaskForVReg(Reg);		LiveMask = MRI.getMaxLaneMaskForVReg(Reg);
}		}
return LiveMask;		return LiveMask;
}		}

GCNRPTracker::LiveRegSet llvm::getLiveRegs(SlotIndex SI,		GCNRPTracker::LiveRegSet llvm::getLiveRegs(SlotIndex SI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
const MachineRegisterInfo &MRI) {		const MachineRegisterInfo &MRI) {
GCNRPTracker::LiveRegSet LiveRegs;		GCNRPTracker::LiveRegSet LiveRegs;
for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {		for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {
auto Reg = TargetRegisterInfo::index2VirtReg(I);		auto Reg = TargetRegisterInfo::index2VirtReg(I);
if (MRI.reg_nodbg_empty(Reg))		if (!LIS.hasInterval(Reg))
continue;		continue;
auto LiveMask = getLiveLaneMask(Reg, SI, LIS, MRI);		auto LiveMask = getLiveLaneMask(Reg, SI, LIS, MRI);
if (LiveMask.any())		if (LiveMask.any())
LiveRegs[Reg] = LiveMask;		LiveRegs[Reg] = LiveMask;
}		}
return LiveRegs;		return LiveRegs;
}		}

void GCNUpwardRPTracker::reset(const MachineInstr &MI) {		LaneBitmask GCNRPTracker::getDefRegMask(const MachineOperand &MO) const {
MRI = &MI.getParent()->getParent()->getRegInfo();
LiveRegs = getLiveRegsAfter(MI, LIS);
MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
}

LaneBitmask GCNUpwardRPTracker::getDefRegMask(const MachineOperand &MO) const {
assert(MO.isDef() && MO.isReg() &&		assert(MO.isDef() && MO.isReg() &&
TargetRegisterInfo::isVirtualRegister(MO.getReg()));		TargetRegisterInfo::isVirtualRegister(MO.getReg()));

// We don't rely on read-undef flag because in case of tentative schedule		// We don't rely on read-undef flag because in case of tentative schedule
// tracking it isn't set correctly yet. This works correctly however since		// tracking it isn't set correctly yet. This works correctly however since
// use mask has been tracked before using LIS.		// use mask has been tracked before using LIS.
return MO.getSubReg() == 0 ?		return MO.getSubReg() == 0 ?
MRI->getMaxLaneMaskForVReg(MO.getReg()) :		MRI->getMaxLaneMaskForVReg(MO.getReg()) :
MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(MO.getSubReg());		MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(MO.getSubReg());
}		}

LaneBitmask GCNUpwardRPTracker::getUsedRegMask(const MachineOperand &MO) const {		LaneBitmask GCNRPTracker::getUsedRegMask(const MachineOperand &MO) const {
assert(MO.isUse() && MO.isReg() &&		assert(MO.isUse() && MO.isReg() &&
TargetRegisterInfo::isVirtualRegister(MO.getReg()));		TargetRegisterInfo::isVirtualRegister(MO.getReg()));

if (auto SubReg = MO.getSubReg())		if (auto SubReg = MO.getSubReg())
return MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(SubReg);		return MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(SubReg);

auto MaxMask = MRI->getMaxLaneMaskForVReg(MO.getReg());		auto MaxMask = MRI->getMaxLaneMaskForVReg(MO.getReg());
if (MaxMask.getAsInteger() == 1) // cannot have subregs		if (MaxMask.getAsInteger() == 1) // cannot have subregs
return MaxMask;		return MaxMask;

// For a tentative schedule LIS isn't updated yet but livemask should remain		// For a tentative schedule LIS isn't updated yet but livemask should remain
// the same on any schedule. Subreg defs can be reordered but they all must		// the same on any schedule. Subreg defs can be reordered but they all must
// dominate uses anyway.		// dominate uses anyway.
auto SI = LIS.getInstructionIndex(*MO.getParent()).getBaseIndex();		auto SI = LIS.getInstructionIndex(*MO.getParent()).getBaseIndex();
return getLiveLaneMask(MO.getReg(), SI, LIS, *MRI);		return getLiveLaneMask(MO.getReg(), SI, LIS, *MRI);
}		}

		void GCNUpwardRPTracker::reset(const MachineInstr &MI,
		const LiveRegSet *LiveRegsCopy) {
		MRI = &MI.getParent()->getParent()->getRegInfo();
		if (LiveRegsCopy) {
		if (&LiveRegs != LiveRegsCopy)
		LiveRegs = *LiveRegsCopy;
		} else {
		LiveRegs = getLiveRegsAfter(MI, LIS);
		}
		MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
		}

void GCNUpwardRPTracker::recede(const MachineInstr &MI) {		void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
assert(MRI && "call reset first");		assert(MRI && "call reset first");

LastTrackedMI = &MI;		LastTrackedMI = &MI;

if (MI.isDebugValue())		if (MI.isDebugValue())
return;		return;

...	for (const auto &MO : MI.uses()) {
auto PrevMask = LiveMask;		auto PrevMask = LiveMask;
LiveMask |= getUsedRegMask(MO);		LiveMask |= getUsedRegMask(MO);
CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);		CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
}		}

MaxPressure = max(MaxPressure, CurPressure);		MaxPressure = max(MaxPressure, CurPressure);
}		}

		bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
		const LiveRegSet *LiveRegsCopy) {
		MRI = &MI.getParent()->getParent()->getRegInfo();
		LastTrackedMI = nullptr;
		MBBEnd = MI.getParent()->end();
		NextMI = &MI;
		NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
		if (NextMI == MBBEnd)
		return false;
		if (LiveRegsCopy) {
		if (&LiveRegs != LiveRegsCopy)
		LiveRegs = *LiveRegsCopy;
		} else {
		LiveRegs = getLiveRegsBefore(*NextMI, LIS);
		}
		MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
		return true;
		}

		bool GCNDownwardRPTracker::advanceBeforeNext() {
		assert(MRI && "call reset first");

		NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
		if (NextMI == MBBEnd)
		return false;

		SlotIndex SI = LIS.getInstructionIndex(*NextMI).getBaseIndex();
		assert(SI.isValid());

		// Remove dead registers or mask bits.
		for (auto &It : LiveRegs) {
		const LiveInterval &LI = LIS.getInterval(It.first);
		if (LI.hasSubRanges()) {
		for (const auto &S : LI.subranges()) {
		if (!S.liveAt(SI)) {
		auto PrevMask = It.second;
		It.second &= ~S.LaneMask;
		CurPressure.inc(It.first, PrevMask, It.second, *MRI);
		}
		}
		} else if (!LI.liveAt(SI)) {
		auto PrevMask = It.second;
		It.second = LaneBitmask::getNone();
		CurPressure.inc(It.first, PrevMask, It.second, *MRI);
		}
		if (It.second.none())
		LiveRegs.erase(It.first);
		}

		MaxPressure = max(MaxPressure, CurPressure);

		return true;
		}

		void GCNDownwardRPTracker::advanceToNext() {
		LastTrackedMI = &*NextMI++;

		// Add new registers or mask bits.
		for (const auto &MO : LastTrackedMI->defs()) {
		if (!MO.isReg())
		continue;
		unsigned Reg = MO.getReg();
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;
		auto &LiveMask = LiveRegs[Reg];
		auto PrevMask = LiveMask;
		LiveMask |= getDefRegMask(MO);
		CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
		}

		MaxPressure = max(MaxPressure, CurPressure);
		}

		bool GCNDownwardRPTracker::advance() {
		// If we have just called reset live set is actual.
		if ((NextMI == MBBEnd) || (LastTrackedMI && !advanceBeforeNext()))
		return false;
		advanceToNext();
		return true;
		}

		bool GCNDownwardRPTracker::advance(MachineBasicBlock::const_iterator End) {
		while (NextMI != End)
		if (!advance()) return false;
		return true;
		}

		bool GCNDownwardRPTracker::advance(MachineBasicBlock::const_iterator Begin,
		MachineBasicBlock::const_iterator End,
		const LiveRegSet *LiveRegsCopy) {
		reset(*Begin, LiveRegsCopy);
		return advance(End);
		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
static void reportMismatch(const GCNRPTracker::LiveRegSet &LISLR,		static void reportMismatch(const GCNRPTracker::LiveRegSet &LISLR,
const GCNRPTracker::LiveRegSet &TrackedLR,		const GCNRPTracker::LiveRegSet &TrackedLR,
const TargetRegisterInfo *TRI) {		const TargetRegisterInfo *TRI) {
for (auto const &P : TrackedLR) {		for (auto const &P : TrackedLR) {
auto I = LISLR.find(P.first);		auto I = LISLR.find(P.first);
if (I == LISLR.end()) {		if (I == LISLR.end()) {
...

llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.h

//===-- GCNSchedStrategy.h - GCN Scheduler Strategy -- C++ --------------===//		//===-- GCNSchedStrategy.h - GCN Scheduler Strategy -- C++ --------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H		#ifndef LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H
#define LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H		#define LLVM_LIB_TARGET_AMDGPU_GCNSCHEDSTRATEGY_H

		#include "GCNRegPressure.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"

namespace llvm {		namespace llvm {

class SIMachineFunctionInfo;		class SIMachineFunctionInfo;
class SIRegisterInfo;		class SIRegisterInfo;
class SISubtarget;		class SISubtarget;

...	class GCNScheduleDAGMILive : public ScheduleDAGMILive {
// Scheduling stage number.		// Scheduling stage number.
unsigned Stage;		unsigned Stage;

// Vecor of regions recorder for later rescheduling		// Vecor of regions recorder for later rescheduling
SmallVector<std::pair<MachineBasicBlock::iterator,		SmallVector<std::pair<MachineBasicBlock::iterator,
MachineBasicBlock::iterator>, 32> Regions;		MachineBasicBlock::iterator>, 32> Regions;

// Region live-ins.		// Region live-ins.
DenseMap<unsigned, LaneBitmask> LiveIns;		GCNRPTracker::LiveRegSet LiveIns;

// Number of live-ins to the current region, first SGPR then VGPR.
std::pair<unsigned, unsigned> LiveInPressure;

// Collect current region live-ins.		// Collect current region live-ins.
void discoverLiveIns();		void discoverLiveIns();

// Return current region pressure. First value is SGPR number, second is VGPR.		// Return current region pressure.
std::pair<unsigned, unsigned> getRealRegPressure() const;		GCNRegPressure getRealRegPressure() const;

public:		public:
GCNScheduleDAGMILive(MachineSchedContext *C,		GCNScheduleDAGMILive(MachineSchedContext *C,
std::unique_ptr<MachineSchedStrategy> S);		std::unique_ptr<MachineSchedStrategy> S);

void schedule() override;		void schedule() override;

void finalizeSchedule() override;		void finalizeSchedule() override;
};		};

} // End namespace llvm		} // End namespace llvm

#endif // GCNSCHEDSTRATEGY_H		#endif // GCNSCHEDSTRATEGY_H

llvm/trunk/lib/Target/AMDGPU/GCNSchedStrategy.cpp

...
}		}

void GCNScheduleDAGMILive::schedule() {		void GCNScheduleDAGMILive::schedule() {
std::vector<MachineInstr*> Unsched;		std::vector<MachineInstr*> Unsched;
Unsched.reserve(NumRegionInstrs);		Unsched.reserve(NumRegionInstrs);
for (auto &I : *this)		for (auto &I : *this)
Unsched.push_back(&I);		Unsched.push_back(&I);

std::pair<unsigned, unsigned> PressureBefore;		GCNRegPressure PressureBefore;
if (LIS) {		if (LIS) {
DEBUG(dbgs() << "Pressure before scheduling:\n");
discoverLiveIns();		discoverLiveIns();
PressureBefore = getRealRegPressure();		PressureBefore = getRealRegPressure();

		DEBUG(dbgs() << "Pressure before scheduling:\nSGPR = "
		<< PressureBefore.getSGPRNum()
		<< "\nVGPR = " << PressureBefore.getVGPRNum() << '\n');

}		}

ScheduleDAGMILive::schedule();		ScheduleDAGMILive::schedule();
if (Stage == 0)		if (Stage == 0)
Regions.push_back(std::make_pair(RegionBegin, RegionEnd));		Regions.push_back(std::make_pair(RegionBegin, RegionEnd));

if (!LIS)		if (!LIS)
return;		return;

// Check the results of scheduling.		// Check the results of scheduling.
GCNMaxOccupancySchedStrategy &S = (GCNMaxOccupancySchedStrategy&)*SchedImpl;		GCNMaxOccupancySchedStrategy &S = (GCNMaxOccupancySchedStrategy&)*SchedImpl;
DEBUG(dbgs() << "Pressure after scheduling:\n");
auto PressureAfter = getRealRegPressure();		auto PressureAfter = getRealRegPressure();

		DEBUG(dbgs() << "Pressure after scheduling:\nSGPR = "
		<< PressureAfter.getSGPRNum()
		<< "\nVGPR = " << PressureAfter.getVGPRNum() << '\n');

LiveIns.clear();		LiveIns.clear();

if (PressureAfter.first <= S.SGPRCriticalLimit &&		if (PressureAfter.getSGPRNum() <= S.SGPRCriticalLimit &&
PressureAfter.second <= S.VGPRCriticalLimit) {		PressureAfter.getVGPRNum() <= S.VGPRCriticalLimit) {
DEBUG(dbgs() << "Pressure in desired limits, done.\n");		DEBUG(dbgs() << "Pressure in desired limits, done.\n");
return;		return;
}		}
unsigned WavesAfter = getMaxWaves(PressureAfter.first,		unsigned WavesAfter = getMaxWaves(PressureAfter.getSGPRNum(),
PressureAfter.second, MF);		PressureAfter.getVGPRNum(), MF);
unsigned WavesBefore = getMaxWaves(PressureBefore.first,		unsigned WavesBefore = getMaxWaves(PressureBefore.getSGPRNum(),
PressureBefore.second, MF);		PressureBefore.getVGPRNum(), MF);
DEBUG(dbgs() << "Occupancy before scheduling: " << WavesBefore <<		DEBUG(dbgs() << "Occupancy before scheduling: " << WavesBefore <<
", after " << WavesAfter << ".\n");		", after " << WavesAfter << ".\n");

// We could not keep current target occupancy because of the just scheduled		// We could not keep current target occupancy because of the just scheduled
// region. Record new occupancy for next scheduling cycle.		// region. Record new occupancy for next scheduling cycle.
unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);		unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);
if (NewOccupancy < MinOccupancy) {		if (NewOccupancy < MinOccupancy) {
MinOccupancy = NewOccupancy;		MinOccupancy = NewOccupancy;
(32 unchanged lines of GCNScheduleDAGMILive::schedule() not shown)
}		}
RegionBegin = Unsched.front()->getIterator();		RegionBegin = Unsched.front()->getIterator();
if (Stage == 0)		if (Stage == 0)
Regions.back() = std::make_pair(RegionBegin, RegionEnd);		Regions.back() = std::make_pair(RegionBegin, RegionEnd);

placeDebugValues();		placeDebugValues();
}		}
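The tail of `schedule()` above keeps the better of the before/after occupancies and only ever lowers the function-wide `MinOccupancy`. The sketch below restates that decision in isolation. `maxWaves()` and `RegBudget` are hypothetical stand-ins for the subtarget-dependent `getMaxWaves()`: waves are limited by whichever register file runs out first, allocation granularity is ignored, and the numbers in the test are placeholders rather than real hardware limits.

```cpp
#include <algorithm>

// Hypothetical per-SIMD register budget; not taken from any real subtarget.
struct RegBudget {
  unsigned SGPRsPerSIMD;
  unsigned VGPRsPerSIMD;
  unsigned MaxWaves;
};

// Simplified occupancy: the register file that runs out first limits the
// number of waves. Ignores allocation granularity and reserved registers.
inline unsigned maxWaves(unsigned SGPRs, unsigned VGPRs, const RegBudget &B) {
  unsigned Waves = B.MaxWaves;
  if (SGPRs != 0)
    Waves = std::min(Waves, B.SGPRsPerSIMD / SGPRs);
  if (VGPRs != 0)
    Waves = std::min(Waves, B.VGPRsPerSIMD / VGPRs);
  return Waves;
}

// Same shape as the check in schedule(): keep the better of the two
// occupancies and only ever lower the recorded minimum.
inline void recordOccupancy(unsigned WavesBefore, unsigned WavesAfter,
                            unsigned &MinOccupancy) {
  unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);
  if (NewOccupancy < MinOccupancy)
    MinOccupancy = NewOccupancy;
}
```

Taking the maximum means a region whose rescheduling made things worse does not drag the target down further than the original order would have.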

static inline void setMask(const MachineRegisterInfo &MRI,
const SIRegisterInfo *SRI, unsigned Reg,
LaneBitmask &PrevMask, LaneBitmask NewMask,
unsigned &SGPRs, unsigned &VGPRs) {
int NewRegs = countPopulation(NewMask.getAsInteger()) -
countPopulation(PrevMask.getAsInteger());
if (SRI->isSGPRReg(MRI, Reg))
SGPRs += NewRegs;
if (SRI->isVGPR(MRI, Reg))
VGPRs += NewRegs;
assert ((int)SGPRs >= 0 && (int)VGPRs >= 0);
PrevMask = NewMask;
}
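The removed `setMask()` helper adjusts the running SGPR/VGPR counters by the difference in set lane-mask bits between the previous and the new mask, then stores the new mask. The sketch below reproduces that bookkeeping outside LLVM. It assumes one mask bit per 32-bit register (as the `countPopulation` arithmetic implies), and `RegKind`/`updateMask` are hypothetical names replacing `SIRegisterInfo` and `LaneBitmask`.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical register-class tag standing in for SIRegisterInfo queries.
enum class RegKind { SGPR, VGPR, Other };

// Number of live 32-bit lanes in a mask (assumes one bit per lane).
inline int laneCount(std::uint64_t Mask) {
  return static_cast<int>(std::bitset<64>(Mask).count());
}

// Mirrors setMask(): apply the change in live lanes to the matching counter,
// then record the new mask for the register.
inline void updateMask(RegKind Kind, std::uint64_t &PrevMask,
                       std::uint64_t NewMask, unsigned &SGPRs,
                       unsigned &VGPRs) {
  int Delta = laneCount(NewMask) - laneCount(PrevMask);
  if (Kind == RegKind::SGPR)
    SGPRs = static_cast<unsigned>(static_cast<int>(SGPRs) + Delta);
  else if (Kind == RegKind::VGPR)
    VGPRs = static_cast<unsigned>(static_cast<int>(VGPRs) + Delta);
  assert(static_cast<int>(SGPRs) >= 0 && static_cast<int>(VGPRs) >= 0);
  PrevMask = NewMask;
}
```

Because only the delta is applied, shrinking a mask (a subregister dying) and growing it (a partial def) go through the same path.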

void GCNScheduleDAGMILive::discoverLiveIns() {		void GCNScheduleDAGMILive::discoverLiveIns() {
unsigned SGPRs = 0;		GCNDownwardRPTracker RPTracker(*LIS);
unsigned VGPRs = 0;		RPTracker.reset(*begin());

auto I = begin();		LiveIns = RPTracker.moveLiveRegs();
I = skipDebugInstructionsForward(I, I->getParent()->end());
const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);
SlotIndex SI = LIS->getInstructionIndex(*I).getBaseIndex();
assert (SI.isValid());

DEBUG(dbgs() << "Region live-ins:");		DEBUG(GCNRegPressure LiveInPressure = RPTracker.moveMaxPressure();
		const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);
		dbgs() << "Region live-ins:";
for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {		for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {
unsigned Reg = TargetRegisterInfo::index2VirtReg(I);		unsigned Reg = TargetRegisterInfo::index2VirtReg(I);
if (MRI.reg_nodbg_empty(Reg))		auto It = LiveIns.find(Reg);
continue;		if (It != LiveIns.end())
const LiveInterval &LI = LIS->getInterval(Reg);		dbgs() << ' ' << PrintVRegOrUnit(Reg, SRI) << ':'
LaneBitmask LaneMask = LaneBitmask::getNone();		<< PrintLaneMask(It->second);
if (LI.hasSubRanges()) {		}
for (const auto &S : LI.subranges())		dbgs() << "\nLive-in pressure:\nSGPR = "
if (S.liveAt(SI))		<< LiveInPressure.getSGPRNum()
LaneMask |= S.LaneMask;		<< "\nVGPR = " << LiveInPressure.getVGPRNum() << '\n');
} else if (LI.liveAt(SI)) {
LaneMask = MRI.getMaxLaneMaskForVReg(Reg);
}

if (LaneMask.any()) {
setMask(MRI, SRI, Reg, LiveIns[Reg], LaneMask, SGPRs, VGPRs);

DEBUG(dbgs() << ' ' << PrintVRegOrUnit(Reg, SRI) << ':'
<< PrintLaneMask(LiveIns[Reg]));
}
}

LiveInPressure = std::make_pair(SGPRs, VGPRs);

DEBUG(dbgs() << "\nLive-in pressure:\nSGPR = " << SGPRs
<< "\nVGPR = " << VGPRs << '\n');
}		}
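The removed `discoverLiveIns()` loop computes, for every virtual register, which lanes are live at the region's first non-debug instruction: it ORs the masks of the subranges live at that slot, or takes the full mask when the interval has no subranges. The new code delegates this to `GCNDownwardRPTracker::reset()`. The sketch below models the same query with a hypothetical simplified liveness structure (half-open `[Start, End)` segments over integer slots); none of these types are LLVM's.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical liveness model: half-open [Start, End) slot ranges.
struct Segment { unsigned Start, End; };

struct SubRange {
  std::uint64_t LaneMask;
  std::vector<Segment> Segments;
};

struct Interval {
  std::uint64_t FullMask;          // all lanes of the register
  std::vector<Segment> Segments;   // main live range
  std::vector<SubRange> SubRanges; // empty when lanes are not tracked
};

inline bool liveAt(const std::vector<Segment> &Segs, unsigned Slot) {
  for (const Segment &S : Segs)
    if (S.Start <= Slot && Slot < S.End)
      return true;
  return false;
}

// Live lanes per register at Slot, following the removed loop's logic.
inline std::map<unsigned, std::uint64_t>
liveInsAt(const std::map<unsigned, Interval> &Intervals, unsigned Slot) {
  std::map<unsigned, std::uint64_t> LiveIns;
  for (const auto &Entry : Intervals) {
    const Interval &LI = Entry.second;
    std::uint64_t Mask = 0;
    if (!LI.SubRanges.empty()) {
      for (const SubRange &SR : LI.SubRanges)
        if (liveAt(SR.Segments, Slot))
          Mask |= SR.LaneMask;
    } else if (liveAt(LI.Segments, Slot)) {
      Mask = LI.FullMask;
    }
    if (Mask != 0)
      LiveIns[Entry.first] = Mask;
  }
  return LiveIns;
}
```

Registers with no live lane are left out of the result, just as the removed code only touched `LiveIns[Reg]` when `LaneMask.any()` held.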

std::pair<unsigned, unsigned>		GCNRegPressure GCNScheduleDAGMILive::getRealRegPressure() const {
GCNScheduleDAGMILive::getRealRegPressure() const {		GCNDownwardRPTracker RPTracker(*LIS);
unsigned SGPRs, MaxSGPRs, VGPRs, MaxVGPRs;		RPTracker.advance(begin(), end(), &LiveIns);
SGPRs = MaxSGPRs = LiveInPressure.first;		return RPTracker.moveMaxPressure();
VGPRs = MaxVGPRs = LiveInPressure.second;

const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo*>(TRI);
DenseMap<unsigned, LaneBitmask> LiveRegs(LiveIns);

for (const MachineInstr &MI : *this) {
if (MI.isDebugValue())
continue;
SlotIndex SI = LIS->getInstructionIndex(MI).getBaseIndex();
assert (SI.isValid());

// Remove dead registers or mask bits.
for (auto &It : LiveRegs) {
if (It.second.none())
continue;
const LiveInterval &LI = LIS->getInterval(It.first);
if (LI.hasSubRanges()) {
for (const auto &S : LI.subranges())
if (!S.liveAt(SI))
setMask(MRI, SRI, It.first, It.second, It.second & ~S.LaneMask,
SGPRs, VGPRs);
} else if (!LI.liveAt(SI)) {
setMask(MRI, SRI, It.first, It.second, LaneBitmask::getNone(),
SGPRs, VGPRs);
}
}

// Add new registers or mask bits.
for (const auto &MO : MI.defs()) {
if (!MO.isReg())
continue;
unsigned Reg = MO.getReg();
if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;
unsigned SubRegIdx = MO.getSubReg();
LaneBitmask LaneMask = SubRegIdx != 0
? TRI->getSubRegIndexLaneMask(SubRegIdx)
: MRI.getMaxLaneMaskForVReg(Reg);
LaneBitmask &LM = LiveRegs[Reg];
setMask(MRI, SRI, Reg, LM, LM | LaneMask, SGPRs, VGPRs);
}
MaxSGPRs = std::max(MaxSGPRs, SGPRs);
MaxVGPRs = std::max(MaxVGPRs, VGPRs);
}

DEBUG(dbgs() << "Real region's register pressure:\nSGPR = " << MaxSGPRs
<< "\nVGPR = " << MaxVGPRs << '\n');

return std::make_pair(MaxSGPRs, MaxVGPRs);
}		}
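The removed `getRealRegPressure()` body is a forward walk. For each instruction it first drops lanes that are no longer live, then adds the lanes the instruction defines, and then updates the running maximum. The new version hands the whole walk to `GCNDownwardRPTracker::advance()` and reads the result with `moveMaxPressure()`. The toy tracker below is a hypothetical model of that two-phase step. Its class and method names are illustrative and are not the LLVM API, and dead lanes are supplied explicitly instead of being derived from live intervals.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <map>
#include <vector>

enum class Kind { SGPR, VGPR };

struct LaneRef {
  unsigned Reg;
  Kind K;
  std::uint64_t Mask;
};

// Hypothetical instruction model: lanes that died before it, lanes it defines.
struct ToyInstr {
  std::vector<LaneRef> DeadBefore;
  std::vector<LaneRef> Defs;
};

struct Pressure {
  unsigned SGPRs = 0;
  unsigned VGPRs = 0;
};

class ToyDownwardTracker {
  std::map<unsigned, std::uint64_t> Live;
  Pressure Cur, Max;

  static int lanes(std::uint64_t M) {
    return static_cast<int>(std::bitset<64>(M).count());
  }

  // Replace a register's live mask and adjust the matching counter.
  void setLanes(const LaneRef &R, std::uint64_t NewMask) {
    std::uint64_t &M = Live[R.Reg];
    int Delta = lanes(NewMask) - lanes(M);
    unsigned &Counter = (R.K == Kind::SGPR) ? Cur.SGPRs : Cur.VGPRs;
    Counter = static_cast<unsigned>(static_cast<int>(Counter) + Delta);
    M = NewMask;
  }

public:
  // Start from the region's live-ins; their pressure is the initial maximum.
  void reset(const std::vector<LaneRef> &LiveIns) {
    Live.clear();
    Cur = Pressure();
    for (const LaneRef &R : LiveIns)
      setLanes(R, Live[R.Reg] | R.Mask);
    Max = Cur;
  }
  // Phase 1: drop dead lanes; the tracker now holds MI's live-in state.
  void advanceBefore(const ToyInstr &MI) {
    for (const LaneRef &R : MI.DeadBefore)
      setLanes(R, Live[R.Reg] & ~R.Mask);
  }
  // Phase 2: add MI's defs and record the running maximum.
  void advanceTo(const ToyInstr &MI) {
    for (const LaneRef &R : MI.Defs)
      setLanes(R, Live[R.Reg] | R.Mask);
    Max.SGPRs = std::max(Max.SGPRs, Cur.SGPRs);
    Max.VGPRs = std::max(Max.VGPRs, Cur.VGPRs);
  }
  void advance(const std::vector<ToyInstr> &Region) {
    for (const ToyInstr &MI : Region) {
      advanceBefore(MI);
      advanceTo(MI);
    }
  }
  Pressure current() const { return Cur; }
  Pressure maxPressure() const { return Max; }
};
```

Splitting the step in two lets a caller stop after `advanceBefore()` and observe the live set *at* an instruction before its own defs are added. A single combined step cannot expose that state.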

void GCNScheduleDAGMILive::finalizeSchedule() {		void GCNScheduleDAGMILive::finalizeSchedule() {
// Retry function scheduling if we found resulting occupancy and it is		// Retry function scheduling if we found resulting occupancy and it is
// lower than used for first pass scheduling. This will give more freedom		// lower than used for first pass scheduling. This will give more freedom
// to schedule low register pressure blocks.		// to schedule low register pressure blocks.
// Code is partially copied from MachineSchedulerBase::scheduleRegions().		// Code is partially copied from MachineSchedulerBase::scheduleRegions().

(44 unchanged lines not shown)