This is an archive of the discontinued LLVM Phabricator instance.

Differential D33289

[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
Closed · Public

Authored by vpykhtin on May 17 2017, 10:25 AM.

Details

Reviewers

rampitec
arsenm

Commits

rG74cb9c88314a: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
rL303548: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker

Summary

This change fixes an incorrect maximum register pressure calculation in GCNUpwardRPTracker: the tracker reduced the pressure for defs before incrementing the pressure for uses, losing the possible maximum pressure of defs + uses at the machine instruction.
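
The ordering problem described above can be illustrated with a minimal sketch. It uses a toy model, not the LLVM classes: pressure is assumed to be the number of live virtual registers, with no lane masks or register classes. The names ToyInstr, ToyPressure, recedeDefsFirst and recedeWithAtMI are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: the virtual registers it defines and the ones it reads.
struct ToyInstr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

struct ToyPressure {
  std::size_t Cur = 0; // pressure at the tracker position (between instructions)
  std::size_t Max = 0; // highest pressure sampled so far
};

// Old order: drop defs, add uses, and only then sample the maximum.
ToyPressure recedeDefsFirst(std::set<unsigned> Live, const ToyInstr &MI) {
  ToyPressure P;
  P.Max = Live.size();
  for (unsigned D : MI.Defs)
    Live.erase(D);
  for (unsigned U : MI.Uses)
    Live.insert(U);
  P.Cur = Live.size();
  P.Max = std::max(P.Max, P.Cur);
  return P;
}

// Order from the summary: sample "live after + uses" before dropping defs.
ToyPressure recedeWithAtMI(std::set<unsigned> Live, const ToyInstr &MI) {
  ToyPressure P;
  P.Max = Live.size();
  std::set<unsigned> AtMI = Live; // defs are still live at the instruction
  AtMI.insert(MI.Uses.begin(), MI.Uses.end());
  P.Max = std::max(P.Max, AtMI.size());
  for (unsigned D : MI.Defs)
    Live.erase(D);
  for (unsigned U : MI.Uses)
    Live.insert(U);
  P.Cur = Live.size();
  return P;
}
```

For a toy instruction that defines %1 and %2 (both live afterwards) and reads %3 and %4, the defs-first order reports a maximum of 2, while the at-the-instruction sample reports 4.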

After several attempts to fix it with fewer lines of code, I decided it would be easier to introduce a MachineInstrRegs class, which collects the register defs/uses of a machine instruction together with their lane masks. This class does a job similar to LLVM's standard RegisterOperands class but is much smaller.
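
The deduplication idea can be sketched roughly as follows. This is a simplified stand-in, not the class from the diff: lane masks are plain 32-bit integers instead of LaneBitmask, and RegMask, ToyOperand, findOrInsert and collectUses are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One entry per register, with the union of all lanes touched so far.
struct RegMask {
  unsigned Reg;
  std::uint32_t LaneMask;
};

// A register operand as it appears on the instruction (possibly repeated).
struct ToyOperand {
  unsigned Reg;
  std::uint32_t LaneMask;
};

// Returns the entry for Reg, appending an empty one if it is not present yet.
RegMask &findOrInsert(std::vector<RegMask> &Regs, unsigned Reg) {
  auto I = std::find_if(Regs.begin(), Regs.end(),
                        [Reg](const RegMask &RM) { return RM.Reg == Reg; });
  if (I != Regs.end())
    return *I;
  Regs.push_back(RegMask{Reg, 0});
  return Regs.back();
}

// Collapses repeated operands so each register is counted once, with its
// lane masks OR-ed together.
std::vector<RegMask> collectUses(const std::vector<ToyOperand> &Ops) {
  std::vector<RegMask> Uses;
  for (const ToyOperand &Op : Ops)
    findOrInsert(Uses, Op.Reg).LaneMask |= Op.LaneMask;
  return Uses;
}
```

A linear search is used here on the assumption that an instruction has only a handful of register operands, which matches the small inline vector sizes chosen in the diff.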

Trying to figure out a test for this.

Diff Detail

Build Status

Buildable 6516
Build 6516: arc lint + arc unit

Event Timeline

vpykhtin created this revision. May 17 2017, 10:25 AM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. May 17 2017, 10:25 AM

Where is GCNUpwardRPTracker::reset() now? What is the base revision this diff is taken against? GCNUpwardRPTracker::reset() is right before recede() in the current revision, but I do not see it in the left pane either. The same goes for some other functions, like getDefRegMask.

lib/Target/AMDGPU/GCNRegPressure.cpp
295	You have GCNRPTracker::getDefRegMask() for this.
300	And there is getUsedRegMask() for this too.
311	Why not do it right above, where you assign getAll()? It seems to be less work.
346	Don't you also want to erase it from LiveRegs if LaneMask.none()?

rampitec added inline comments. May 17 2017, 2:51 PM

lib/Target/AMDGPU/GCNRegPressure.cpp
346	You are now updating MaxPressure correctly, but your CurPressure is incorrect. At this point defs still contribute to the current pressure; they will be out only with the next recede. That may lead the scheduler to wrong decisions about the current instruction. You can also run into a paradoxical situation in which no single instruction has a pressure equal to the max. I still believe that when we want to account for both defs and uses contributing to the RP of an instruction, we naturally have a two-step advance/recede process: step before/past the instruction, and actually step to it.

vpykhtin added inline comments. May 18 2017, 3:15 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
295	This is another class, though getDefRegMask can be made a non-member.
300	I'm avoiding the getLiveLaneMask call here until all registers are collected.
311	This avoids calling getLiveLaneMask more than once for a register.
346	Why erase? recede actually moves from the point after the instruction to the point before it (in top-down order), accounting for the max pressure in between. recede never stops at the instruction.

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

In D33289#758351, @vpykhtin wrote:

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

I see it now, thanks. Can you move it higher please?

lib/Target/AMDGPU/GCNRegPressure.cpp
295	Probably static inline is just right for it. Even for getUsedRegMask if you pass LIS to it.
311	Do you see a common situation where the same register is used in the same instruction more than once? This sounds quite exotic to me, provided we are speaking about uses only, not defs.
346	If you erase it, you will not iterate over it in getRegPressure(). I understand your point that recede does not stop in steps, but I'm still concerned that you will not get a correct CurPressure, or will not even get a CurPressure equal to the max pressure anywhere in the region. How about that?

In D33289#758647, @rampitec wrote:

I see it now, thanks. Can you move it higher please?

Yes I'll fix that.

lib/Target/AMDGPU/GCNRegPressure.cpp
311	I agree it's going to be a very rare occasion, but the main purpose of this class is to deduplicate register defs/uses so they aren't counted twice when calculating pressure. So if I have already deduplicated the registers, I can save some time on calculating the mask.
346	OK, now that live regs can be reused it may make sense to clear a register immediately. Previously I used stripEmpty on the set, but only for debug printing purposes. CurPressure isn't calculated at the instruction level; it's calculated for the point after recede. I put an assert at the end of recede that CurPressure is calculated correctly. CurPressure can never become MaxPressure, but I don't see a problem here. There is no at-the-instruction position in the tracker: it is always in between.

rampitec added inline comments. May 18 2017, 10:30 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
295	Actually getUsedRegMask is unused now, so you can just delete it.
311	I mean, if that is extremely rare you may lose more in the inexpensive but way more often called second loop.

vpykhtin added inline comments. May 18 2017, 10:37 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
311	OK, I'll make getDefRegMask and getUsedRegMask static and reuse them here, removing the bottom loop. Thanks.

rampitec added inline comments. May 19 2017, 2:21 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
269	It looks like defs are not really needed in this class. Uses are needed because you walk them twice, but defs can be processed directly, i.e. the code can be simplified and the overhead somewhat reduced.
346	I see it as a sort of quantum tracker. It hides the intermediate step, where pressure actually peaks, from an observer. As long as we agree on that understanding, I have no objection to submitting such a tracker, where the actual pressure "tunnels" through the recede method, as we currently have no interested observers. We might want to split the method in the future if we have them.
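
The two-step split mentioned in this comment was not part of the revision. Purely as an illustration of the idea, a hypothetical toy tracker could expose the at-the-instruction position like this (registers are counted one unit each, and ToyInstr, ToyUpwardTracker, recedeTo and recedePast are invented names):

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy instruction: the virtual registers it defines and the ones it reads.
struct ToyInstr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

class ToyUpwardTracker {
  std::set<unsigned> Live; // registers live at the current position
  std::size_t Max;         // highest pressure observed so far

public:
  explicit ToyUpwardTracker(std::set<unsigned> LiveOut)
      : Live(std::move(LiveOut)), Max(Live.size()) {}

  // Step 1: stop at the instruction. Its defs are still live and its uses
  // become live, so this is where the pressure peaks.
  void recedeTo(const ToyInstr &MI) {
    Live.insert(MI.Uses.begin(), MI.Uses.end());
    Max = std::max(Max, Live.size());
  }

  // Step 2: move to the point before the instruction. Defs stop being live
  // unless the instruction also reads the same register.
  void recedePast(const ToyInstr &MI) {
    for (unsigned D : MI.Defs)
      if (std::find(MI.Uses.begin(), MI.Uses.end(), D) == MI.Uses.end())
        Live.erase(D);
  }

  std::size_t curPressure() const { return Live.size(); }
  std::size_t maxPressure() const { return Max; }
};
```

With this split, an observer that queries curPressure() between the two steps sees the peak, whereas a single recede that performs both steps only exposes it through maxPressure().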

rampitec mentioned this in D33087: [AMDGCN] Fix overly optimistic GCNUpwardRPTracker. May 19 2017, 2:25 AM

fixed as per comments

LGTM. Thanks!

This revision is now accepted and ready to land. May 19 2017, 9:21 AM

rampitec added inline comments. May 19 2017, 9:43 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
343	Are you sure it is always present? What if we have a dead def, i.e. an instruction defines a register which is never used? I guess it will not be reported by LIS. If so, this should be if (I != LiveRegs.end()) continue;

rampitec added inline comments. May 19 2017, 9:44 AM

lib/Target/AMDGPU/GCNRegPressure.cpp
343	Sorry, if (I == LiveRegs.end()) continue;
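
The lookup pattern suggested here, combined with the earlier suggestion to erase empty entries, might look roughly like the sketch below. It is not the committed code: std::map stands in for the tracker's live register set, lane masks are plain integers, and LiveRegMap and clearDef are hypothetical names.

```cpp
#include <cstdint>
#include <map>

// Stand-in for the live register set: register -> live lanes.
using LiveRegMap = std::map<unsigned, std::uint32_t>;

// Removes the defined lanes of Reg from the live set. A register that is not
// in the set (for example a dead def) is skipped, so no empty entry is
// created for it, unlike with operator[].
void clearDef(LiveRegMap &LiveRegs, unsigned Reg, std::uint32_t DefMask) {
  auto I = LiveRegs.find(Reg);
  if (I == LiveRegs.end())
    return; // not live after the instruction: nothing to update
  I->second &= ~DefMask;
  if (I->second == 0)
    LiveRegs.erase(I); // no live lanes left, so drop the entry
}
```

Using find() rather than operator[] keeps the set free of zero-mask entries, so a later pass over the set only visits registers that are actually live.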

Closed by commit rL303548: [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker (authored by vpykhtin). May 22 2017, 6:09 AM

This revision was automatically updated to reflect the committed changes.

vpykhtin marked an inline comment as done. May 22 2017, 6:11 AM

vpykhtin added inline comments.

lib/Target/AMDGPU/GCNRegPressure.cpp
343	Done, thanks!

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Herald added projects: Restricted Project, Restricted Project. Oct 16 2023, 6:46 AM

Herald added subscribers: llvm-commits, kerbowa, jvesely.

In D33289#4654033, @foad wrote:

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

In D33289#4654038, @piotr wrote:

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

You're right, we should not increment pressure for early-clobbers twice in AtMIPressure

Sorry, disregard this comment. We should only account for early-clobbers this way.
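
One possible reading of this suggestion, sketched on a toy model: a def is counted together with the uses only when it is marked early-clobber, while ordinary defs are assumed to be able to reuse the registers of killed uses. This is an interpretation, not a change that appears in this revision; ToyDef, ToyMI and peakAcross are hypothetical names, and each register counts as one unit of pressure.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

struct ToyDef {
  unsigned Reg;
  bool EarlyClobber; // written before the instruction reads its uses
};

struct ToyMI {
  std::vector<ToyDef> Defs;
  std::vector<unsigned> Uses;
};

// Peak pressure seen while stepping upward across MI.
std::size_t peakAcross(const std::set<unsigned> &LiveAfter, const ToyMI &MI) {
  // Registers live just before MI: live-after minus defs, plus uses.
  std::set<unsigned> LiveBefore = LiveAfter;
  for (const ToyDef &D : MI.Defs)
    LiveBefore.erase(D.Reg);
  LiveBefore.insert(MI.Uses.begin(), MI.Uses.end());

  // Early-clobber defs overlap with the uses, so they are added on top.
  std::set<unsigned> AtMI = LiveBefore;
  for (const ToyDef &D : MI.Defs)
    if (D.EarlyClobber && LiveAfter.count(D.Reg))
      AtMI.insert(D.Reg);

  return std::max(LiveAfter.size(), AtMI.size());
}
```

For an instruction that defines %1 and reads %2 and %3, this gives a peak of 2 for an ordinary def and 3 when the def is early-clobber.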

Is anyone working on this at the moment?

In D33289#4654317, @vpykhtin wrote:

Is anyone working on this at the moment?

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

In D33289#4654324, @foad wrote:

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

In D33289#4654325, @vpykhtin wrote:

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

I wonder if we can test it more directly by adding an analysis pass like this: https://github.com/GPUOpen-Drivers/llvm-project/commit/042be23e3d98963fb02833511a86f4e26378a04d
and then using something like opt -passes='print<amdgpu-reg-press>'.

We can try to print reg pressure at every instruction

Something like this? (don't forget to expand *.mir file diff, it's not shown by default)

https://github.com/llvm/llvm-project/compare/main...vpykhtin:llvm-project:rp_printer

Revision Contents

Path: lib/Target/AMDGPU/GCNRegPressure.cpp
Size: 98 lines

Diff 99324

lib/Target/AMDGPU/GCNRegPressure.cpp

 //===------------------------- GCNRegPressure.cpp - -----------------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 /// \file
 //
 //===----------------------------------------------------------------------===//

 #include "GCNRegPressure.h"
+#include "llvm/CodeGen/RegisterPressure.h"

 using namespace llvm;

 #define DEBUG_TYPE "misched"

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 LLVM_DUMP_METHOD
 void llvm::printLivesAt(SlotIndex SI,
 (236 unchanged lines not shown)
   if (LiveRegsCopy) {
     if (&LiveRegs != LiveRegsCopy)
       LiveRegs = *LiveRegsCopy;
   } else {
     LiveRegs = getLiveRegsAfter(MI, LIS);
   }
   MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
 }

+struct MachineInstrRegs {
+  SmallVector<RegisterMaskPair, 4> Defs;
+  SmallVector<RegisterMaskPair, 8> Uses;
+
+private:
+  static RegisterMaskPair& insert(SmallVectorImpl<RegisterMaskPair> &A,
+                                  unsigned Reg) {
+    auto I = std::find_if(A.begin(), A.end(), [Reg](const RegisterMaskPair &RM) {
+      return RM.RegUnit == Reg;
+    });
+    if (I != A.end())
+      return *I;
+    A.push_back(RegisterMaskPair(Reg, LaneBitmask::getNone()));
+    return A.back();
+  }
+
+public:
+  static MachineInstrRegs collectVirtualRegs(const MachineInstr &MI,
+                                             const LiveIntervals &LIS,
+                                             const MachineRegisterInfo &MRI) {
+    MachineInstrRegs Res;
+    for (const auto &MO : MI.operands()) {
+      if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
+        continue;
+
+      if (MO.isDef() && !MO.isDead()) {
+        auto &LaneMask = insert(Res.Defs, MO.getReg()).LaneMask;
+        LaneMask |= MO.getSubReg() == 0 ?
+          MRI.getMaxLaneMaskForVReg(MO.getReg()) :
+          MRI.getTargetRegisterInfo()->getSubRegIndexLaneMask(MO.getSubReg());
+      } else
+      if (MO.isUse() && MO.readsReg()) {
+        auto &LaneMask = insert(Res.Uses, MO.getReg()).LaneMask;
+        auto const MaxMask = MRI.getMaxLaneMaskForVReg(MO.getReg());
+        if (MaxMask.getAsInteger() == 1) // cannot have subregs
+          LaneMask = MaxMask;
+        else if (auto SubReg = MO.getSubReg())
+          LaneMask |= MRI.getTargetRegisterInfo()->getSubRegIndexLaneMask(SubReg);
+        else
+          LaneMask = LaneBitmask::getAll(); // check actual usage mask later (once)
+      }
+    }
+    // adjust correct usage mask using LIS
+    for (auto &U : Res.Uses) {
+      if (!U.LaneMask.all()) continue;
+      // For a tentative schedule LIS isn't updated yet but livemask should remain
+      // the same on any schedule. Subreg defs can be reordered but they all must
+      // dominate uses anyway.
+      auto SI = LIS.getInstructionIndex(MI).getBaseIndex();
+      U.LaneMask = getLiveLaneMask(U.RegUnit, SI, LIS, MRI);
+    }
+    return Res;
+  }
+};
+
 void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
   assert(MRI && "call reset first");

   LastTrackedMI = &MI;

   if (MI.isDebugValue())
     return;

-  // process all defs first to ensure early clobbers are handled correctly
-  // iterating over operands() to catch implicit defs
-  for (const auto &MO : MI.operands()) {
-    if (!MO.isReg() || !MO.isDef() ||
-        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
-      continue;
-
-    auto Reg = MO.getReg();
-    auto &LiveMask = LiveRegs[Reg];
-    auto PrevMask = LiveMask;
-    LiveMask &= ~getDefRegMask(MO);
-    CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
-  }
-
-  // then all uses
-  for (const auto &MO : MI.uses()) {
-    if (!MO.isReg() || !MO.readsReg() ||
-        !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
-      continue;
-
-    auto Reg = MO.getReg();
-    auto &LiveMask = LiveRegs[Reg];
-    auto PrevMask = LiveMask;
-    LiveMask |= getUsedRegMask(MO);
-    CurPressure.inc(Reg, PrevMask, LiveMask, *MRI);
-  }
-
-  MaxPressure = max(MaxPressure, CurPressure);
+  auto const Regs = MachineInstrRegs::collectVirtualRegs(MI, LIS, *MRI);
+
+  // calc pressure at the MI (defs + uses)
+  auto AtMIPressure = CurPressure;
+  for (const auto &U : Regs.Uses) {
+    auto LiveMask = LiveRegs[U.RegUnit];
+    AtMIPressure.inc(U.RegUnit, LiveMask, LiveMask | U.LaneMask, *MRI);
+  }
+  // update max pressure
+  MaxPressure = max(AtMIPressure, MaxPressure);
+
+  for (const auto &D : Regs.Defs) {
+    auto &LiveMask = LiveRegs[D.RegUnit];
+    auto PrevMask = LiveMask;
+    LiveMask &= ~D.LaneMask;
+    CurPressure.inc(D.RegUnit, PrevMask, LiveMask, *MRI);
+  }
+  for (const auto &U : Regs.Uses) {
+    auto &LiveMask = LiveRegs[U.RegUnit];
+    auto PrevMask = LiveMask;
+    LiveMask |= U.LaneMask;
+    CurPressure.inc(U.RegUnit, PrevMask, LiveMask, *MRI);
+  }
+  assert(CurPressure == getRegPressure(*MRI, LiveRegs));
 }

 bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
                                  const LiveRegSet *LiveRegsCopy) {
   MRI = &MI.getParent()->getParent()->getRegInfo();
   LastTrackedMI = nullptr;
   MBBEnd = MI.getParent()->end();
   NextMI = &MI;
 (156 unchanged lines not shown)