This is an archive of the discontinued LLVM Phabricator instance.

Differential D33289

[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
Closed Public

Authored by vpykhtin on May 17 2017, 10:25 AM.

Details

Reviewers

rampitec
arsenm

Commits

rG74cb9c88314a: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
rL303548: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker

Summary

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.
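The ordering problem described in the summary can be sketched with a toy model. This is only an illustration under stated assumptions: pressure is a plain count of live registers, and ToyInstr, ToyPressure, stepAbove, recedeDefsFirst and recedeAtMI are hypothetical names that do not exist in LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction: the virtual registers it defines and reads.
struct ToyInstr {
  std::vector<int> Defs;
  std::vector<int> Uses;
};

// Pressure is modeled as a plain count of live registers (an assumption).
struct ToyPressure {
  unsigned Cur; // pressure at the point just before the instruction
  unsigned Max; // maximum seen so far
};

// Moves Live from "after MI" to "before MI": defs die, uses become live.
static unsigned stepAbove(std::set<int> &Live, const ToyInstr &MI) {
  for (int R : MI.Defs)
    Live.erase(R);
  for (int R : MI.Uses)
    Live.insert(R);
  return static_cast<unsigned>(Live.size());
}

// Old order: kill defs, add uses, and only then sample the maximum.
ToyPressure recedeDefsFirst(std::set<int> &Live, const ToyInstr &MI,
                            unsigned Max) {
  unsigned Cur = stepAbove(Live, MI);
  return {Cur, std::max(Max, Cur)};
}

// Patched order: sample defs + uses at the instruction, then step above it.
ToyPressure recedeAtMI(std::set<int> &Live, const ToyInstr &MI, unsigned Max) {
  std::set<int> AtMI = Live; // defs are still live at this point
  AtMI.insert(MI.Uses.begin(), MI.Uses.end());
  Max = std::max(Max, static_cast<unsigned>(AtMI.size()));
  unsigned Cur = stepAbove(Live, MI);
  assert(Max >= Cur && "the at-instruction sample bounds the new pressure");
  return {Cur, Max};
}
```

With registers 1 and 10 live after an instruction that defines 1 and reads 2 and 3, both variants end with a current pressure of 3, but only recedeAtMI records a maximum of 4.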

After several attempts to fix it with fewer lines of code, I decided it would be easier to introduce a MachineInstrRegs class which collects the register defs/uses of a machine instruction together with their lane masks. This class does a job similar to LLVM's standard RegisterOperands class but is much smaller.

Trying to figure out a test for this.

Diff Detail

Build Status

Buildable 6581
Build 6581: arc lint + arc unit

Event Timeline

vpykhtin created this revision. May 17 2017, 10:25 AM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. May 17 2017, 10:25 AM

Where is GCNUpwardRPTracker::reset() now? What base revision is this diff taken against? GCNUpwardRPTracker::reset() is right before recede() in the current revision, but I do not see it in the left pane either. The same goes for some other functions, such as getDefRegMask.

lib/Target/AMDGPU/GCNRegPressure.cpp
314	You have GCNRPTracker::getDefRegMask() for this.
318	Don't you also want to erase it from LiveRegs if LaneMask.none()?
319	And there is getUsedRegMask() for this too.
330	Why not do it right above, where you assign getAll()? It seems to be less work.

rampitec added inline comments. May 17 2017, 2:51 PM

lib/Target/AMDGPU/GCNRegPressure.cpp
318	You are now updating MaxPressure correctly, but your CurPressure is incorrect. At this point defs still contribute to the current pressure; they will be out only with the next recede. That may lead the scheduler to wrong decisions about the current instruction. You can also run into the paradoxical situation that no single instruction has a pressure equal to the max. I still believe that where we want to account for both defs and uses contributing to the RP of an instruction, we naturally have a two-step advance/recede process: step before/past the instruction, and actually step to it.

vpykhtin added inline comments. May 18 2017, 3:15 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
314	This is another class, though getDefRegMask can be made nonmember
318	Why erase? recede actually moves from the point after the instruction to the point before the instruction (in top-down order), accounting for the interim max pressure. recede never stops at the instruction.
319	I'm avoiding getLiveLaneMask call here before all registers collected
330	This avoids getLiveLaneMask call more than once for a register

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

In D33289#758351, @vpykhtin wrote:

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

I see it now, thanks. Can you move it higher please?

lib/Target/AMDGPU/GCNRegPressure.cpp
314	Probably static inline is just right for it. Even for getUsedRegMask if you pass LIS to it.
318	If you erase it, you will not iterate over it in getRegPressure(). I understand your point that recede does not stop in steps, but I'm still concerned that you will not get a correct CurPressure, or will not even get a CurPressure equal to the max pressure anywhere in the region. How about that?
330	Do you see a common situation where the same register is used in the same instruction more than once? This sounds quite exotic to me, provided we are speaking about uses only, not defs.

In D33289#758647, @rampitec wrote:

I see it now, thanks. Can you move it higher please?

Yes I'll fix that.

lib/Target/AMDGPU/GCNRegPressure.cpp
318	OK, now that live regs can be reused there may be a point in clearing a register immediately. Previously I used stripEmpty on the set, but only for debug printing purposes. CurPressure isn't calculated at the at-the-instruction level; it's calculated for the point after recede. I put an assert at the end of recede that CurPressure is calculated correctly. CurPressure can never become MaxPressure, but I don't see a problem here. There is no at-the-instruction position in the tracker: it is always in between.
330	I agree it's going to be a very rare occasion, but the main purpose of this class is to deduplicate register defs/uses so they aren't counted twice when calculating pressure. So if I have deduplicated the registers already, I can save some time on calculating the mask.
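The deduplication discussed here can be sketched as follows. This is a simplified stand-in, not the patch's code: ToyOperand, ToyRegMask and collectUses are hypothetical names, and lane masks are modeled as plain 64-bit integers rather than LaneBitmask.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine operand: register, lanes touched, kind.
struct ToyOperand {
  unsigned Reg;
  std::uint64_t LaneMask;
  bool IsUse;
};

// Hypothetical stand-in for a (register, lane mask) pair.
struct ToyRegMask {
  unsigned Reg;
  std::uint64_t LaneMask;
};

// Collects each used register once, OR-ing together the lane masks of all
// operands that read it, so a register used twice is counted only once.
std::vector<ToyRegMask> collectUses(const std::vector<ToyOperand> &Ops) {
  std::vector<ToyRegMask> Res;
  for (const ToyOperand &MO : Ops) {
    if (!MO.IsUse)
      continue;
    assert(MO.LaneMask != 0 && "a use is assumed to read at least one lane");
    auto I = std::find_if(Res.begin(), Res.end(), [&MO](const ToyRegMask &RM) {
      return RM.Reg == MO.Reg;
    });
    if (I != Res.end())
      I->LaneMask |= MO.LaneMask; // same register seen again: merge lanes
    else
      Res.push_back({MO.Reg, MO.LaneMask});
  }
  return Res;
}
```

The linear search is a deliberate choice in this sketch: an instruction has only a handful of operands, so a small vector is likely cheaper than a map.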

rampitec added inline comments. May 18 2017, 10:30 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
314	Actually getUsedRegMask is unused now, so you can just delete it.
330	I mean, if that is extremely rare you may lose more in the inexpensive but way more often called second loop.

vpykhtin added inline comments. May 18 2017, 10:37 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
330	Ok, I'll make getDefRegMask and getUsedRegMask static and reuse it here removing bottom loop, thanks.

rampitec added inline comments. May 19 2017, 2:21 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
288	It looks like defs are not really needed in this class. Uses are needed because you walk them twice, but defs can just be processed directly. I.e. the code can be simplified and the overhead somewhat reduced.
318	I see it as a sort of quantum tracker. It hides the intermediate step where pressure actually peaks from an observer. As long as we agree on that understanding, I have no objection to submitting such a tracker, where the actual pressure "tunnels" through the recede method, as we currently have no interested observers. We might want to split the method in the future if we have them.

rampitec mentioned this in D33087: [AMDGCN] Fix overly optimistic GCNUpwardRPTracker. May 19 2017, 2:25 AM

fixed as per comments

LGTM. Thanks!

This revision is now accepted and ready to land. May 19 2017, 9:21 AM

rampitec added inline comments. May 19 2017, 9:43 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313	Are you sure it should always be present? What if we have a dead def, i.e. an instruction defines a register which is never used? I guess it will not be reported by LIS. If so, this should be if (I != LiveRegs.end()) continue;

rampitec added inline comments. May 19 2017, 9:44 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
313	Sorry, if (I == LiveRegs.end()) continue;
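The guard suggested in this comment can be sketched on a toy live set. This is a sketch under stated assumptions rather than the committed code: std::map and 64-bit integers stand in for LiveRegSet and LaneBitmask, and ToyLiveRegSet, ToyDef and killDefs are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical live set: register -> mask of its currently live lanes.
using ToyLiveRegSet = std::map<unsigned, std::uint64_t>;

// Hypothetical def operand: register and the lanes it writes.
struct ToyDef {
  unsigned Reg;
  std::uint64_t LaneMask;
};

// Removes the lanes written by each def from the live set. A def whose
// register is not in the set (a dead def) is skipped instead of asserting.
void killDefs(ToyLiveRegSet &LiveRegs, const std::vector<ToyDef> &Defs) {
  for (const ToyDef &D : Defs) {
    assert(D.LaneMask != 0 && "a def is assumed to write at least one lane");
    auto I = LiveRegs.find(D.Reg);
    if (I == LiveRegs.end())
      continue; // dead def: nothing was live, so nothing to remove
    I->second &= ~D.LaneMask;
    if (I->second == 0)
      LiveRegs.erase(I); // no live lanes left: drop the entry entirely
  }
}
```

Using find() rather than operator[] also avoids inserting an empty entry for the dead register, which is what makes the later erase-on-empty invariant hold.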

Closed by commit rL303548: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker (authored by vpykhtin). May 22 2017, 6:09 AM

This revision was automatically updated to reflect the committed changes.

vpykhtin marked an inline comment as done. May 22 2017, 6:11 AM

vpykhtin added inline comments.

lib/Target/AMDGPU/GCNRegPressure.cpp
313	Done, thanks!

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Herald added projects: Restricted Project, Restricted Project. Oct 16 2023, 6:46 AM

Herald added subscribers: llvm-commits, kerbowa, jvesely.

In D33289#4654033, @foad wrote:

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

In D33289#4654038, @piotr wrote:

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

You're right, we should not increment pressure for early-clobbers twice in AtMIPressure

Sorry, disregard this comment. We should only account for early-clobbers this way.
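One possible reading of "only account for early-clobbers this way" is sketched below. It is an interpretation, not code from any patch: pressure is again a plain register count, and ToyEcDef, ToyEcInstr and recedeEarlyClobberOnly are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical def operand with an early-clobber flag.
struct ToyEcDef {
  int Reg;
  bool EarlyClobber;
};

// Hypothetical toy instruction.
struct ToyEcInstr {
  std::vector<ToyEcDef> Defs;
  std::vector<int> Uses;
};

// Steps Live from "after MI" to "before MI" and returns the updated maximum.
// Only early-clobber defs are assumed to be live together with the uses;
// ordinary defs may share registers with uses killed by the instruction.
unsigned recedeEarlyClobberOnly(std::set<int> &Live, const ToyEcInstr &MI,
                                unsigned Max) {
  std::set<int> Before = Live;
  for (const ToyEcDef &D : MI.Defs)
    Before.erase(D.Reg);
  Before.insert(MI.Uses.begin(), MI.Uses.end());

  std::set<int> AtMI = Before;
  for (const ToyEcDef &D : MI.Defs)
    if (D.EarlyClobber && Live.count(D.Reg)) // skip dead defs
      AtMI.insert(D.Reg);

  assert(AtMI.size() >= Before.size());
  Max = std::max(Max, static_cast<unsigned>(AtMI.size()));
  Live = Before;
  return Max;
}
```

In this model an instruction without early-clobber defs contributes only its before and after pressure, which matches the objection that uses and ordinary defs are not live at the same time.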

Is anyone working on this at the moment?

In D33289#4654317, @vpykhtin wrote:

Is anyone working on this at the moment?

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

In D33289#4654324, @foad wrote:

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

In D33289#4654325, @vpykhtin wrote:

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

I wonder if we can test it more directly by adding an analysis pass like this: https://github.com/GPUOpen-Drivers/llvm-project/commit/042be23e3d98963fb02833511a86f4e26378a04d
and then using something like opt -passes='print<amdgpu-reg-press>'.

We can try to print reg pressure at every instruction

Something like this? (don't forget to expand *.mir file diff, it's not shown by default)

https://github.com/llvm/llvm-project/compare/main...vpykhtin:llvm-project:rp_printer

Revision Contents

Path	Size
lib/Target/AMDGPU/GCNRegPressure.h	2 lines
lib/Target/AMDGPU/GCNRegPressure.cpp	145 lines

Diff 99544

lib/Target/AMDGPU/GCNRegPressure.h

[... 92 lines not shown ...]
 protected:
   const LiveIntervals &LIS;
   LiveRegSet LiveRegs;
   GCNRegPressure CurPressure, MaxPressure;
   const MachineInstr *LastTrackedMI = nullptr;
   mutable const MachineRegisterInfo *MRI = nullptr;
   GCNRPTracker(const LiveIntervals &LIS_) : LIS(LIS_) {}
-  LaneBitmask getDefRegMask(const MachineOperand &MO) const;
-  LaneBitmask getUsedRegMask(const MachineOperand &MO) const;
 public:
   // live regs for the current state
   const decltype(LiveRegs) &getLiveRegs() const { return LiveRegs; }
   const MachineInstr *getLastTrackedMI() const { return LastTrackedMI; }

   void clearMaxPressure() { MaxPressure.clear(); }

   // returns MaxPressure, resetting it
[... 99 lines not shown ...]

lib/Target/AMDGPU/GCNRegPressure.cpp

 //===------------------------- GCNRegPressure.cpp - -----------------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 /// \file
 //
 //===----------------------------------------------------------------------===//

 #include "GCNRegPressure.h"
+#include "llvm/CodeGen/RegisterPressure.h"

 using namespace llvm;

 #define DEBUG_TYPE "misched"

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 LLVM_DUMP_METHOD
 void llvm::printLivesAt(SlotIndex SI,
[... 35 lines not shown ...]  static bool isEqual(const GCNRPTracker::LiveRegSet &S1,
   for (const auto &P : S1) {
     auto I = S2.find(P.first);
     if (I == S2.end() || I->second != P.second)
       return false;
   }
   return true;
 }

-static GCNRPTracker::LiveRegSet
-stripEmpty(const GCNRPTracker::LiveRegSet &LR) {
-  GCNRPTracker::LiveRegSet Res;
-  for (const auto &P : LR) {
-    if (P.second.any())
-      Res.insert(P);
-  }
-  return Res;
-}
 #endif

 ///////////////////////////////////////////////////////////////////////////////
 // GCNRegPressure

 unsigned GCNRegPressure::getRegKind(unsigned Reg,
                                     const MachineRegisterInfo &MRI) {
   assert(TargetRegisterInfo::isVirtualRegister(Reg));
[... 97 lines not shown ...]  void GCNRegPressure::print(raw_ostream &OS, const SISubtarget *ST) const {
   if (ST) OS << "(O" << ST->getOccupancyWithNumSGPRs(getSGPRNum()) << ')';
   OS << ", LVGPR WT: " << getVGPRTuplesWeight()
      << ", LSGPR WT: " << getSGPRTuplesWeight();
   if (ST) OS << " -> Occ: " << getOccupancy(*ST);
   OS << '\n';
 }
 #endif


+static LaneBitmask getDefRegMask(const MachineOperand &MO,
+                                 const MachineRegisterInfo &MRI) {
+  assert(MO.isDef() && MO.isReg() &&
+         TargetRegisterInfo::isVirtualRegister(MO.getReg()));
+
+  // We don't rely on read-undef flag because in case of tentative schedule
+  // tracking it isn't set correctly yet. This works correctly however since
+  // use mask has been tracked before using LIS.
+  return MO.getSubReg() == 0 ?
+    MRI.getMaxLaneMaskForVReg(MO.getReg()) :
+    MRI.getTargetRegisterInfo()->getSubRegIndexLaneMask(MO.getSubReg());
+}
+
+static LaneBitmask getUsedRegMask(const MachineOperand &MO,
+                                  const MachineRegisterInfo &MRI,
+                                  const LiveIntervals &LIS) {
+  assert(MO.isUse() && MO.isReg() &&
+         TargetRegisterInfo::isVirtualRegister(MO.getReg()));
+
+  if (auto SubReg = MO.getSubReg())
+    return MRI.getTargetRegisterInfo()->getSubRegIndexLaneMask(SubReg);
+
+  auto MaxMask = MRI.getMaxLaneMaskForVReg(MO.getReg());
+  if (MaxMask.getAsInteger() == 1) // cannot have subregs
+    return MaxMask;
+
+  // For a tentative schedule LIS isn't updated yet but livemask should remain
+  // the same on any schedule. Subreg defs can be reordered but they all must
+  // dominate uses anyway.
+  auto SI = LIS.getInstructionIndex(*MO.getParent()).getBaseIndex();
+  return getLiveLaneMask(MO.getReg(), SI, LIS, MRI);
+}
+
+SmallVector<RegisterMaskPair, 8> collectVirtualRegUses(const MachineInstr &MI,
+                                                       const LiveIntervals &LIS,
+                                                       const MachineRegisterInfo &MRI) {
+  SmallVector<RegisterMaskPair, 8> Res;
+  for (const auto &MO : MI.operands()) {
+    if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
+      continue;
+    if (!MO.isUse() || !MO.readsReg())
+      continue;
+
+    auto const UsedMask = getUsedRegMask(MO, MRI, LIS);
+
+    auto Reg = MO.getReg();
+    auto I = std::find_if(Res.begin(), Res.end(), [Reg](const RegisterMaskPair &RM) {
+      return RM.RegUnit == Reg;
+    });
+    if (I != Res.end())
+      I->LaneMask |= UsedMask;
+    else
+      Res.push_back(RegisterMaskPair(Reg, UsedMask));
+  }
+  return Res;
+}

 ///////////////////////////////////////////////////////////////////////////////
 // GCNRPTracker

 LaneBitmask llvm::getLiveLaneMask(unsigned Reg,
                                   SlotIndex SI,
                                   const LiveIntervals &LIS,
                                   const MachineRegisterInfo &MRI) {
   LaneBitmask LiveMask;
[... 21 lines not shown ...]  if (!LIS.hasInterval(Reg))
       continue;
     auto LiveMask = getLiveLaneMask(Reg, SI, LIS, MRI);
     if (LiveMask.any())
       LiveRegs[Reg] = LiveMask;
   }
   return LiveRegs;
 }
-
-LaneBitmask GCNRPTracker::getDefRegMask(const MachineOperand &MO) const {
-  assert(MO.isDef() && MO.isReg() &&
-         TargetRegisterInfo::isVirtualRegister(MO.getReg()));
-
-  // We don't rely on read-undef flag because in case of tentative schedule
-  // tracking it isn't set correctly yet. This works correctly however since
-  // use mask has been tracked before using LIS.
-  return MO.getSubReg() == 0 ?
-    MRI->getMaxLaneMaskForVReg(MO.getReg()) :
-    MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(MO.getSubReg());
-}
-
-LaneBitmask GCNRPTracker::getUsedRegMask(const MachineOperand &MO) const {
-  assert(MO.isUse() && MO.isReg() &&
-         TargetRegisterInfo::isVirtualRegister(MO.getReg()));
-
-  if (auto SubReg = MO.getSubReg())
-    return MRI->getTargetRegisterInfo()->getSubRegIndexLaneMask(SubReg);
-
-  auto MaxMask = MRI->getMaxLaneMaskForVReg(MO.getReg());
-  if (MaxMask.getAsInteger() == 1) // cannot have subregs
-    return MaxMask;
-
-  // For a tentative schedule LIS isn't updated yet but livemask should remain
-  // the same on any schedule. Subreg defs can be reordered but they all must
-  // dominate uses anyway.
-  auto SI = LIS.getInstructionIndex(*MO.getParent()).getBaseIndex();
-  return getLiveLaneMask(MO.getReg(), SI, LIS, *MRI);
-}

 void GCNUpwardRPTracker::reset(const MachineInstr &MI,
                                const LiveRegSet *LiveRegsCopy) {
   MRI = &MI.getParent()->getParent()->getRegInfo();
   if (LiveRegsCopy) {
     if (&LiveRegs != LiveRegsCopy)
       LiveRegs = *LiveRegsCopy;
   } else {
     LiveRegs = getLiveRegsAfter(MI, LIS);
   }
   MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
 }

 void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
   assert(MRI && "call reset first");

   LastTrackedMI = &MI;

   if (MI.isDebugValue())
     return;

-  // process all defs first to ensure early clobbers are handled correctly
-  // iterating over operands() to catch implicit defs
-  for (const auto &MO : MI.operands()) {
-    if (!MO.isReg() || !MO.isDef() ||
-        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
+  auto const RegUses = collectVirtualRegUses(MI, LIS, *MRI);
+
+  // calc pressure at the MI (defs + uses)
+  auto AtMIPressure = CurPressure;
+  for (const auto &U : RegUses) {
+    auto LiveMask = LiveRegs[U.RegUnit];
+    AtMIPressure.inc(U.RegUnit, LiveMask, LiveMask | U.LaneMask, *MRI);
+  }
+  // update max pressure
+  MaxPressure = max(AtMIPressure, MaxPressure);
+
+  for (const auto &MO : MI.defs()) {
+    if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()) ||
+        MO.isDead())
       continue;

     auto Reg = MO.getReg();
-    auto &LiveMask = LiveRegs[Reg];
+    auto I = LiveRegs.find(Reg);
+    assert(I != LiveRegs.end());
+    auto &LiveMask = I->second;
     auto PrevMask = LiveMask;
-    LiveMask &= ~getDefRegMask(MO);
+    LiveMask &= ~ getDefRegMask(MO, *MRI);
     CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
+    if (LiveMask.none())
+      LiveRegs.erase(I);
   }
-
-  // then all uses
-  for (const auto &MO : MI.uses()) {
-    if (!MO.isReg() || !MO.readsReg() ||
-        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
-      continue;
-
-    auto Reg = MO.getReg();
-    auto &LiveMask = LiveRegs[Reg];
+  for (const auto &U : RegUses) {
+    auto &LiveMask = LiveRegs[U.RegUnit];
     auto PrevMask = LiveMask;
-    LiveMask |= getUsedRegMask(MO);
-    CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
+    LiveMask |= U.LaneMask;
+    CurPressure.inc(U.RegUnit, PrevMask, LiveMask, *MRI);
   }
-
-  MaxPressure = max(MaxPressure, CurPressure);
+  assert(CurPressure == getRegPressure(*MRI, LiveRegs));
 }

 bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
                                  const LiveRegSet *LiveRegsCopy) {
   MRI = &MI.getParent()->getParent()->getRegInfo();
   LastTrackedMI = nullptr;
   MBBEnd = MI.getParent()->end();
   NextMI = &MI;
   NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
   if (NextMI == MBBEnd)
     return false;
[... 49 lines not shown ...]  void GCNDownwardRPTracker::advanceToNext() {
   for (const auto &MO : LastTrackedMI->defs()) {
     if (!MO.isReg())
       continue;
     unsigned Reg = MO.getReg();
     if (!TargetRegisterInfo::isVirtualRegister(Reg))
       continue;
     auto &LiveMask = LiveRegs[Reg];
     auto PrevMask = LiveMask;
-    LiveMask |= getDefRegMask(MO);
+    LiveMask |= getDefRegMask(MO, *MRI);
     CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
   }

   MaxPressure = max(MaxPressure, CurPressure);
 }

 bool GCNDownwardRPTracker::advance() {
   // If we have just called reset live set is actual.
[... 45 lines not shown ...]  if (I == TrackedLR.end()) {
              << " isn't found in tracked set\n";
       }
     }
   }
 }

 bool GCNUpwardRPTracker::isValid() const {
   const auto &SI = LIS.getInstructionIndex(*LastTrackedMI).getBaseIndex();
   const auto LISLR = llvm::getLiveRegs(SI, LIS, *MRI);
-  const auto TrackedLR = stripEmpty(LiveRegs);
+  const auto &TrackedLR = LiveRegs;

   if (!isEqual(LISLR, TrackedLR)) {
     dbgs() << "\nGCNUpwardRPTracker error: Tracked and"
               " LIS reported livesets mismatch:\n";
     printLivesAt(SI, LIS, *MRI);
     reportMismatch(LISLR, TrackedLR, MRI->getTargetRegisterInfo());
     return false;
   }
[... 25 lines not shown ...]