This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed hazard recognizer to walk predecessors
Closed, Public

Authored by rampitec on Jan 18 2019, 11:09 AM.

Details

Reviewers

critson
nhaehnle
msearles
vpykhtin
arsenm

Commits

rGf92ed6966eb8: [AMDGPU] Fixed hazard recognizer to walk predecessors
rL351759: [AMDGPU] Fixed hazard recognizer to walk predecessors

Summary

Fixes two problems with GCNHazardRecognizer:

1. It only scans up to 5 instructions emitted earlier.
2. It does not take control flow into account. An earlier instruction
   from the previous basic block is not necessarily a predecessor.
   At the same time, a real predecessor block is not scanned.

The patch provides a way to distinguish between scheduler and
hazard recognizer mode. It is OK to work with emitted instructions
in the scheduler because we do not really know what will be emitted
later or in what order. However, when the pass works as a hazard recognizer
the schedule is already finalized, and we have full access to the
instructions for the whole function, so we can properly traverse
predecessors and their instructions.
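
The predecessor traversal described in the summary can be sketched outside of LLVM as follows. This is a minimal illustration, not the code from the patch: `Instr`, `Block`, `waitStatesSince` and `NoHazard` are hypothetical stand-ins for `MachineInstr`, `MachineBasicBlock` and the recognizer's helpers, and the sketch leaves out the skipping of inline asm, implicit defs and debug instructions as well as the early-exit probe that the real helper performs.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

// Hypothetical stand-in for MachineInstr.
struct Instr {
  bool IsHazard;  // does this instruction conflict with the queried one?
  int WaitStates; // wait states this instruction contributes
};

// Hypothetical stand-in for MachineBasicBlock.
struct Block {
  std::vector<Instr> Instrs;
  std::vector<const Block *> Preds;
};

constexpr int NoHazard = std::numeric_limits<int>::max();

// Scans Instrs[0, End) of B backwards, then recurses into predecessors.
// Returns the minimum number of wait states between the query point and a
// hazard, or NoHazard if every path accumulates Limit wait states (or runs
// out of predecessors) first.
int waitStatesSince(const Block &B, std::size_t End, int WaitStates,
                    int Limit, std::set<const Block *> &Visited) {
  for (std::size_t I = End; I-- > 0;) {
    if (B.Instrs[I].IsHazard)
      return WaitStates;
    WaitStates += B.Instrs[I].WaitStates;
    if (WaitStates >= Limit)
      return NoHazard; // far enough away, stop scanning this path
  }

  int Min = NoHazard;
  for (const Block *Pred : B.Preds) {
    if (!Visited.insert(Pred).second)
      continue; // this predecessor was already scanned
    Min = std::min(Min, waitStatesSince(*Pred, Pred->Instrs.size(),
                                        WaitStates, Limit, Visited));
  }
  return Min;
}
```

A caller would subtract the returned value from the required number of wait states and clamp the result at zero, which is the pattern the `check*Hazards` functions in the diff follow.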

Event Timeline

rampitec created this revision. Jan 18 2019, 11:09 AM

Herald added subscribers: t-tye, tpr, dstuttard and 5 others. Jan 18 2019, 11:09 AM

arsenm added inline comments. Jan 18 2019, 12:16 PM

lib/Target/AMDGPU/GCNHazardRecognizer.cpp
288	llvm:: not necessary
295–296	isInlineAsm/isImplicitDef
334	These are used a bunch of places, so could use a typedef
lib/Target/AMDGPU/GCNHazardRecognizer.h
34–35	I'm surprised this is necessary. To clarify did you somehow only see these problems with -O0? As far as I know we don't run the standalone hazard recognizer when the scheduler is in use, so you should end up with the same issues either way.

rampitec marked an inline comment as done. Jan 18 2019, 12:20 PM

rampitec added inline comments.

lib/Target/AMDGPU/GCNHazardRecognizer.h
34–35	The problem exists regardless of the optimization level. We do run standalone recognizer even when scheduler is in use. We add it from GCNPassConfig::addPreEmitPass(). In fact that is wrong to call it regardless of the scheduler, as scheduler sends regions in a bottom-up order. To ensure all hazards are correctly checked a target must run standalone recognizer, there is even a comment there about it.

Addressed review comments.

rampitec marked an inline comment as done. Jan 18 2019, 12:38 PM

rampitec marked an inline comment as done. Jan 19 2019, 12:17 AM

rampitec added inline comments.

lib/Target/AMDGPU/GCNHazardRecognizer.h
34–35	It is wrong NOT to call it of course.

LGTM

This revision is now accepted and ready to land. Jan 19 2019, 2:27 PM

LGTM, thanks!

Closed by commit rL351759: [AMDGPU] Fixed hazard recognizer to walk predecessors (authored by rampitec). Jan 21 2019, 11:11 AM

This revision was automatically updated to reflect the committed changes.

foad added a subscriber: foad. Apr 27 2021, 9:09 AM

foad added inline comments.

llvm/trunk/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
317 ↗	(On Diff #182814)	I think this early return is broken, because there might be another predecessor that has a smaller value of MinWaitStates which would not satisfy the IsExpired test. Do you agree? (Sorry for reopening such an old review.)

Herald added a project: Restricted Project. Apr 27 2021, 9:09 AM

Herald added a subscriber: kerbowa.

rampitec added inline comments. Apr 27 2021, 11:22 AM

llvm/trunk/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
317 ↗	(On Diff #182814)	Hm... Probably. Did you try to exploit it?

critson added inline comments. Apr 27 2021, 7:27 PM

llvm/trunk/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
317 ↗	(On Diff #182814)	I do not think it is broken per se, but it is definitely confusing. This allows the IsExpired function to trigger an early exit given a specific wait state count. The early-exit probe is detectable, as it occurs with the MachineInstr* set to null, and is hence avoidable by returning false if !MI. Looking at all the IsExpired implementations, none of them use this functionality (or at least not correctly). So I'll prepare a patch to remove it and tidy up.

critson added inline comments. Apr 28 2021, 1:10 AM

llvm/trunk/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
317 ↗	(On Diff #182814)	See D101430.
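
The concern about the early return at line 317 can be illustrated with a small model of the loop that merges predecessor results. This is only a sketch under stated assumptions: `mergeWithEarlyExit`, `mergeAll` and `NoHazard` are hypothetical names, the predecessor results are supplied directly as integers, and the predicate is assumed to be able to report expiry for a value that a predecessor actually returned. Whether such inputs can arise with the predicates used in the patch is not established here.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

constexpr int NoHazard = std::numeric_limits<int>::max();

// Hypothetical model of the merge loop with the early return. PredResults
// holds the value each predecessor walk returned, in visiting order.
int mergeWithEarlyExit(const std::vector<int> &PredResults,
                       const std::function<bool(int)> &IsExpired) {
  int Min = NoHazard;
  bool Found = false;
  for (int W : PredResults) {
    if (W == NoHazard)
      continue; // no hazard on this path
    Min = Found ? std::min(Min, W) : W;
    if (IsExpired(Min))
      return Min; // early exit: later predecessors are never consulted
    Found = true;
  }
  return Found ? Min : NoHazard;
}

// The same merge without the early return: a plain minimum over all results.
int mergeAll(const std::vector<int> &PredResults) {
  int Min = NoHazard;
  for (int W : PredResults)
    Min = std::min(Min, W);
  return Min;
}
```

With a predicate that expires at 5 wait states and predecessor results of 6 and 3, the early-exit version returns 6 when the predecessor yielding 6 is visited first, while the plain minimum is 3. The result then depends on the visiting order, which is the behaviour questioned in the comment above.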

foad mentioned this in D101430: [AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn. Apr 28 2021, 1:21 AM

Revision Contents

Path                                          Size
lib/Target/AMDGPU/GCNHazardRecognizer.h       16 lines
lib/Target/AMDGPU/GCNHazardRecognizer.cpp     138 lines
lib/Target/AMDGPU/SIInstrInfo.h               2 lines
lib/Target/AMDGPU/SIInstrInfo.cpp             2 lines
test/CodeGen/AMDGPU/vmem-vcc-hazard.mir       230 lines

Diff 182587

lib/Target/AMDGPU/GCNHazardRecognizer.h

[25 lines not shown]
class MachineOperand;		class MachineOperand;
class MachineRegisterInfo;		class MachineRegisterInfo;
class ScheduleDAG;		class ScheduleDAG;
class SIInstrInfo;		class SIInstrInfo;
class SIRegisterInfo;		class SIRegisterInfo;
class GCNSubtarget;		class GCNSubtarget;

class GCNHazardRecognizer final : public ScheduleHazardRecognizer {		class GCNHazardRecognizer final : public ScheduleHazardRecognizer {
		public:
		typedef function_ref<bool(MachineInstr *)> IsHazardFn;

		private:
		// Distinguish if we are called from scheduler or hazard recognizer
		bool IsHazardRecognizerMode;

// This variable stores the instruction that has been emitted this cycle. It		// This variable stores the instruction that has been emitted this cycle. It
// will be added to EmittedInstrs, when AdvanceCycle() or RecedeCycle() is		// will be added to EmittedInstrs, when AdvanceCycle() or RecedeCycle() is
// called.		// called.
MachineInstr *CurrCycleInstr;		MachineInstr *CurrCycleInstr;
std::list<MachineInstr*> EmittedInstrs;		std::list<MachineInstr*> EmittedInstrs;
const MachineFunction &MF;		const MachineFunction &MF;
const GCNSubtarget &ST;		const GCNSubtarget &ST;
const SIInstrInfo &TII;		const SIInstrInfo &TII;
const SIRegisterInfo &TRI;		const SIRegisterInfo &TRI;

/// RegUnits of uses in the current soft memory clause.		/// RegUnits of uses in the current soft memory clause.
BitVector ClauseUses;		BitVector ClauseUses;

/// RegUnits of defs in the current soft memory clause.		/// RegUnits of defs in the current soft memory clause.
BitVector ClauseDefs;		BitVector ClauseDefs;

void resetClause() {		void resetClause() {
ClauseUses.reset();		ClauseUses.reset();
ClauseDefs.reset();		ClauseDefs.reset();
}		}

void addClauseInst(const MachineInstr &MI);		void addClauseInst(const MachineInstr &MI);

int getWaitStatesSince(function_ref<bool(MachineInstr *)> IsHazard);		int getWaitStatesSince(IsHazardFn IsHazard, int Limit);
int getWaitStatesSinceDef(unsigned Reg,		int getWaitStatesSinceDef(unsigned Reg, IsHazardFn IsHazardDef, int Limit);
function_ref<bool(MachineInstr *)> IsHazardDef =		int getWaitStatesSinceSetReg(IsHazardFn IsHazard, int Limit);
[](MachineInstr *) { return true; });
int getWaitStatesSinceSetReg(function_ref<bool(MachineInstr *)> IsHazard);

int checkSoftClauseHazards(MachineInstr *SMEM);		int checkSoftClauseHazards(MachineInstr *SMEM);
int checkSMRDHazards(MachineInstr *SMRD);		int checkSMRDHazards(MachineInstr *SMRD);
int checkVMEMHazards(MachineInstr* VMEM);		int checkVMEMHazards(MachineInstr* VMEM);
int checkDPPHazards(MachineInstr *DPP);		int checkDPPHazards(MachineInstr *DPP);
int checkDivFMasHazards(MachineInstr *DivFMas);		int checkDivFMasHazards(MachineInstr *DivFMas);
int checkGetRegHazards(MachineInstr *GetRegInstr);		int checkGetRegHazards(MachineInstr *GetRegInstr);
int checkSetRegHazards(MachineInstr *SetRegInstr);		int checkSetRegHazards(MachineInstr *SetRegInstr);
[10 lines not shown]	public:
// We can only issue one instruction per cycle.		// We can only issue one instruction per cycle.
bool atIssueLimit() const override { return true; }		bool atIssueLimit() const override { return true; }
void EmitInstruction(SUnit *SU) override;		void EmitInstruction(SUnit *SU) override;
void EmitInstruction(MachineInstr *MI) override;		void EmitInstruction(MachineInstr *MI) override;
HazardType getHazardType(SUnit *SU, int Stalls) override;		HazardType getHazardType(SUnit *SU, int Stalls) override;
void EmitNoop() override;		void EmitNoop() override;
unsigned PreEmitNoops(SUnit *SU) override;		unsigned PreEmitNoops(SUnit *SU) override;
unsigned PreEmitNoops(MachineInstr *) override;		unsigned PreEmitNoops(MachineInstr *) override;
		unsigned PreEmitNoopsCommon(MachineInstr *);
void AdvanceCycle() override;		void AdvanceCycle() override;
void RecedeCycle() override;		void RecedeCycle() override;
};		};

} // end namespace llvm		} // end namespace llvm

#endif //LLVM_LIB_TARGET_AMDGPUHAZARDRECOGNIZERS_H		#endif //LLVM_LIB_TARGET_AMDGPUHAZARDRECOGNIZERS_H

lib/Target/AMDGPU/GCNHazardRecognizer.cpp

[32 lines not shown]

using namespace llvm;		using namespace llvm;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Hazard Recoginizer Implementation		// Hazard Recoginizer Implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

GCNHazardRecognizer::GCNHazardRecognizer(const MachineFunction &MF) :		GCNHazardRecognizer::GCNHazardRecognizer(const MachineFunction &MF) :
		IsHazardRecognizerMode(false),
CurrCycleInstr(nullptr),		CurrCycleInstr(nullptr),
MF(MF),		MF(MF),
ST(MF.getSubtarget<GCNSubtarget>()),		ST(MF.getSubtarget<GCNSubtarget>()),
TII(*ST.getInstrInfo()),		TII(*ST.getInstrInfo()),
TRI(TII.getRegisterInfo()),		TRI(TII.getRegisterInfo()),
ClauseUses(TRI.getNumRegUnits()),		ClauseUses(TRI.getNumRegUnits()),
ClauseDefs(TRI.getNumRegUnits()) {		ClauseDefs(TRI.getNumRegUnits()) {
MaxLookAhead = 5;		MaxLookAhead = 5;
[119 lines not shown]	GCNHazardRecognizer::getHazardType(SUnit *SU, int Stalls) {

if (checkAnyInstHazards(MI) > 0)		if (checkAnyInstHazards(MI) > 0)
return NoopHazard;		return NoopHazard;

return NoHazard;		return NoHazard;
}		}

unsigned GCNHazardRecognizer::PreEmitNoops(SUnit *SU) {		unsigned GCNHazardRecognizer::PreEmitNoops(SUnit *SU) {
return PreEmitNoops(SU->getInstr());		IsHazardRecognizerMode = false;
		return PreEmitNoopsCommon(SU->getInstr());
}		}

unsigned GCNHazardRecognizer::PreEmitNoops(MachineInstr *MI) {		unsigned GCNHazardRecognizer::PreEmitNoops(MachineInstr *MI) {
		IsHazardRecognizerMode = true;
		CurrCycleInstr = MI;
		unsigned W = PreEmitNoopsCommon(MI);
		CurrCycleInstr = nullptr;
		return W;
		}

		unsigned GCNHazardRecognizer::PreEmitNoopsCommon(MachineInstr *MI) {
int WaitStates = std::max(0, checkAnyInstHazards(MI));		int WaitStates = std::max(0, checkAnyInstHazards(MI));

if (SIInstrInfo::isSMRD(*MI))		if (SIInstrInfo::isSMRD(*MI))
return std::max(WaitStates, checkSMRDHazards(MI));		return std::max(WaitStates, checkSMRDHazards(MI));

if (SIInstrInfo::isVALU(*MI))		if (SIInstrInfo::isVALU(*MI))
WaitStates = std::max(WaitStates, checkVALUHazards(MI));		WaitStates = std::max(WaitStates, checkVALUHazards(MI));

[39 lines not shown]	void GCNHazardRecognizer::AdvanceCycle() {
// When the scheduler detects a stall, it will call AdvanceCycle() without		// When the scheduler detects a stall, it will call AdvanceCycle() without
// emitting any instructions.		// emitting any instructions.
if (!CurrCycleInstr)		if (!CurrCycleInstr)
return;		return;

// Do not track non-instructions which do not affect the wait states.		// Do not track non-instructions which do not affect the wait states.
// If included, these instructions can lead to buffer overflow such that		// If included, these instructions can lead to buffer overflow such that
// detectable hazards are missed.		// detectable hazards are missed.
if (CurrCycleInstr->getOpcode() == AMDGPU::IMPLICIT_DEF)		if (CurrCycleInstr->isImplicitDef())
return;		return;
else if (CurrCycleInstr->isDebugInstr())		else if (CurrCycleInstr->isDebugInstr())
return;		return;

unsigned NumWaitStates = TII.getNumWaitStates(*CurrCycleInstr);		unsigned NumWaitStates = TII.getNumWaitStates(*CurrCycleInstr);

// Keep track of emitted instructions		// Keep track of emitted instructions
EmittedInstrs.push_front(CurrCycleInstr);		EmittedInstrs.push_front(CurrCycleInstr);
[17 lines not shown]
void GCNHazardRecognizer::RecedeCycle() {		void GCNHazardRecognizer::RecedeCycle() {
llvm_unreachable("hazard recognizer does not support bottom-up scheduling.");		llvm_unreachable("hazard recognizer does not support bottom-up scheduling.");
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Helper Functions		// Helper Functions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

int GCNHazardRecognizer::getWaitStatesSince(		typedef function_ref<bool(MachineInstr *, int WaitStates)> IsExpiredFn;
function_ref<bool(MachineInstr *)> IsHazard) {
		// Returns a minimum wait states since \p I walking all predecessors.
		// Only scans until \p IsExpired does not return true.
		// Can only be run in a hazard recognizer mode.
		static int getWaitStatesSince(GCNHazardRecognizer::IsHazardFn IsHazard,
		MachineBasicBlock *MBB,
		MachineBasicBlock::reverse_instr_iterator I,
		int WaitStates,
		IsExpiredFn IsExpired,
		DenseSet<const MachineBasicBlock *> &Visited) {

		for (auto E = MBB->rend() ; I != E; ++I) {
		if (IsHazard(&*I))
		return WaitStates;

		if (I->isInlineAsm() || I->isImplicitDef() || I->isDebugInstr())
		continue;

		WaitStates += SIInstrInfo::getNumWaitStates(*I);

		if (IsExpired(&*I, WaitStates))
		return std::numeric_limits<int>::max();
		}

		int MinWaitStates = WaitStates;
		bool Found = false;
		for (MachineBasicBlock *Pred : MBB->predecessors()) {
		if (!Visited.insert(Pred).second)
		continue;

		int W = getWaitStatesSince(IsHazard, Pred, Pred->instr_rbegin(),
		WaitStates, IsExpired, Visited);

		if (W == std::numeric_limits<int>::max())
		continue;

		MinWaitStates = Found ? std::min(MinWaitStates, W) : W;
		if (IsExpired(nullptr, MinWaitStates))
		return MinWaitStates;

		Found = true;
		}

		if (Found)
		return MinWaitStates;

		return std::numeric_limits<int>::max();
		}

		static int getWaitStatesSince(GCNHazardRecognizer::IsHazardFn IsHazard,
		MachineInstr *MI,
		IsExpiredFn IsExpired) {
		DenseSet<const MachineBasicBlock *> Visited;
		return getWaitStatesSince(IsHazard, MI->getParent(),
		std::next(MI->getReverseIterator()),
		0, IsExpired, Visited);
		}

		int GCNHazardRecognizer::getWaitStatesSince(IsHazardFn IsHazard, int Limit) {
		if (IsHazardRecognizerMode) {
		auto IsExpiredFn = [Limit] (MachineInstr *, int WaitStates) {
		return WaitStates >= Limit;
		};
		return ::getWaitStatesSince(IsHazard, CurrCycleInstr, IsExpiredFn);
		}

int WaitStates = 0;		int WaitStates = 0;
for (MachineInstr *MI : EmittedInstrs) {		for (MachineInstr *MI : EmittedInstrs) {
if (MI) {		if (MI) {
if (IsHazard(MI))		if (IsHazard(MI))
return WaitStates;		return WaitStates;

unsigned Opcode = MI->getOpcode();		if (MI->isInlineAsm())
if (Opcode == AMDGPU::INLINEASM)
continue;		continue;
}		}
++WaitStates;		++WaitStates;

		if (WaitStates >= Limit)
		break;
}		}
return std::numeric_limits<int>::max();		return std::numeric_limits<int>::max();
}		}

int GCNHazardRecognizer::getWaitStatesSinceDef(		int GCNHazardRecognizer::getWaitStatesSinceDef(unsigned Reg,
unsigned Reg, function_ref<bool(MachineInstr *)> IsHazardDef) {		IsHazardFn IsHazardDef,
		int Limit) {
const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();

auto IsHazardFn = [IsHazardDef, TRI, Reg] (MachineInstr *MI) {		auto IsHazardFn = [IsHazardDef, TRI, Reg] (MachineInstr *MI) {
return IsHazardDef(MI) && MI->modifiesRegister(Reg, TRI);		return IsHazardDef(MI) && MI->modifiesRegister(Reg, TRI);
};		};

return getWaitStatesSince(IsHazardFn);		return getWaitStatesSince(IsHazardFn, Limit);
}		}

int GCNHazardRecognizer::getWaitStatesSinceSetReg(		int GCNHazardRecognizer::getWaitStatesSinceSetReg(IsHazardFn IsHazard,
function_ref<bool(MachineInstr *)> IsHazard) {		int Limit) {
auto IsHazardFn = [IsHazard] (MachineInstr *MI) {		auto IsHazardFn = [IsHazard] (MachineInstr *MI) {
return isSSetReg(MI->getOpcode()) && IsHazard(MI);		return isSSetReg(MI->getOpcode()) && IsHazard(MI);
};		};

return getWaitStatesSince(IsHazardFn);		return getWaitStatesSince(IsHazardFn, Limit);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// No-op Hazard Detection		// No-op Hazard Detection
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static void addRegUnits(const SIRegisterInfo &TRI,		static void addRegUnits(const SIRegisterInfo &TRI,
BitVector &BV, unsigned Reg) {		BitVector &BV, unsigned Reg) {
[81 lines not shown]	int GCNHazardRecognizer::checkSMRDHazards(MachineInstr *SMRD) {
auto IsBufferHazardDefFn = [this] (MachineInstr *MI) { return TII.isSALU(*MI); };		auto IsBufferHazardDefFn = [this] (MachineInstr *MI) { return TII.isSALU(*MI); };

bool IsBufferSMRD = TII.isBufferSMRD(*SMRD);		bool IsBufferSMRD = TII.isBufferSMRD(*SMRD);

for (const MachineOperand &Use : SMRD->uses()) {		for (const MachineOperand &Use : SMRD->uses()) {
if (!Use.isReg())		if (!Use.isReg())
continue;		continue;
int WaitStatesNeededForUse =		int WaitStatesNeededForUse =
SmrdSgprWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardDefFn);		SmrdSgprWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardDefFn,
		SmrdSgprWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);

// This fixes what appears to be undocumented hardware behavior in SI where		// This fixes what appears to be undocumented hardware behavior in SI where
// s_mov writing a descriptor and s_buffer_load_dword reading the descriptor		// s_mov writing a descriptor and s_buffer_load_dword reading the descriptor
// needs some number of nops in between. We don't know how many we need, but		// needs some number of nops in between. We don't know how many we need, but
// let's use 4. This wasn't discovered before probably because the only		// let's use 4. This wasn't discovered before probably because the only
// case when this happens is when we expand a 64-bit pointer into a full		// case when this happens is when we expand a 64-bit pointer into a full
// descriptor and use s_buffer_load_dword instead of s_load_dword, which was		// descriptor and use s_buffer_load_dword instead of s_load_dword, which was
// probably never encountered in the closed-source land.		// probably never encountered in the closed-source land.
if (IsBufferSMRD) {		if (IsBufferSMRD) {
int WaitStatesNeededForUse =		int WaitStatesNeededForUse =
SmrdSgprWaitStates - getWaitStatesSinceDef(Use.getReg(),		SmrdSgprWaitStates - getWaitStatesSinceDef(Use.getReg(),
IsBufferHazardDefFn);		IsBufferHazardDefFn,
		SmrdSgprWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);
}		}
}		}

return WaitStatesNeeded;		return WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkVMEMHazards(MachineInstr* VMEM) {		int GCNHazardRecognizer::checkVMEMHazards(MachineInstr* VMEM) {
if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)		if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
return 0;		return 0;

int WaitStatesNeeded = checkSoftClauseHazards(VMEM);		int WaitStatesNeeded = checkSoftClauseHazards(VMEM);

// A read of an SGPR by a VMEM instruction requires 5 wait states when the		// A read of an SGPR by a VMEM instruction requires 5 wait states when the
// SGPR was written by a VALU Instruction.		// SGPR was written by a VALU Instruction.
const int VmemSgprWaitStates = 5;		const int VmemSgprWaitStates = 5;
auto IsHazardDefFn = [this] (MachineInstr *MI) { return TII.isVALU(*MI); };		auto IsHazardDefFn = [this] (MachineInstr *MI) { return TII.isVALU(*MI); };

for (const MachineOperand &Use : VMEM->uses()) {		for (const MachineOperand &Use : VMEM->uses()) {
if (!Use.isReg() || TRI.isVGPR(MF.getRegInfo(), Use.getReg()))		if (!Use.isReg() || TRI.isVGPR(MF.getRegInfo(), Use.getReg()))
continue;		continue;

int WaitStatesNeededForUse =		int WaitStatesNeededForUse =
VmemSgprWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardDefFn);		VmemSgprWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardDefFn,
		VmemSgprWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);
}		}
return WaitStatesNeeded;		return WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkDPPHazards(MachineInstr *DPP) {		int GCNHazardRecognizer::checkDPPHazards(MachineInstr *DPP) {
const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();

// Check for DPP VGPR read after VALU VGPR write and EXEC write.		// Check for DPP VGPR read after VALU VGPR write and EXEC write.
int DppVgprWaitStates = 2;		int DppVgprWaitStates = 2;
int DppExecWaitStates = 5;		int DppExecWaitStates = 5;
int WaitStatesNeeded = 0;		int WaitStatesNeeded = 0;
auto IsHazardDefFn = [TII] (MachineInstr *MI) { return TII->isVALU(*MI); };		auto IsHazardDefFn = [TII] (MachineInstr *MI) { return TII->isVALU(*MI); };

for (const MachineOperand &Use : DPP->uses()) {		for (const MachineOperand &Use : DPP->uses()) {
if (!Use.isReg() || !TRI->isVGPR(MF.getRegInfo(), Use.getReg()))		if (!Use.isReg() || !TRI->isVGPR(MF.getRegInfo(), Use.getReg()))
continue;		continue;
int WaitStatesNeededForUse =		int WaitStatesNeededForUse =
DppVgprWaitStates - getWaitStatesSinceDef(Use.getReg());		DppVgprWaitStates - getWaitStatesSinceDef(Use.getReg(),
		[](MachineInstr *) { return true; },
		DppVgprWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);
}		}

WaitStatesNeeded = std::max(		WaitStatesNeeded = std::max(
WaitStatesNeeded,		WaitStatesNeeded,
DppExecWaitStates - getWaitStatesSinceDef(AMDGPU::EXEC, IsHazardDefFn));		DppExecWaitStates - getWaitStatesSinceDef(AMDGPU::EXEC, IsHazardDefFn,
		DppExecWaitStates));

return WaitStatesNeeded;		return WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkDivFMasHazards(MachineInstr *DivFMas) {		int GCNHazardRecognizer::checkDivFMasHazards(MachineInstr *DivFMas) {
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();

// v_div_fmas requires 4 wait states after a write to vcc from a VALU		// v_div_fmas requires 4 wait states after a write to vcc from a VALU
// instruction.		// instruction.
const int DivFMasWaitStates = 4;		const int DivFMasWaitStates = 4;
auto IsHazardDefFn = [TII] (MachineInstr *MI) { return TII->isVALU(*MI); };		auto IsHazardDefFn = [TII] (MachineInstr *MI) { return TII->isVALU(*MI); };
int WaitStatesNeeded = getWaitStatesSinceDef(AMDGPU::VCC, IsHazardDefFn);		int WaitStatesNeeded = getWaitStatesSinceDef(AMDGPU::VCC, IsHazardDefFn,
		DivFMasWaitStates);

return DivFMasWaitStates - WaitStatesNeeded;		return DivFMasWaitStates - WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkGetRegHazards(MachineInstr *GetRegInstr) {		int GCNHazardRecognizer::checkGetRegHazards(MachineInstr *GetRegInstr) {
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
unsigned GetRegHWReg = getHWReg(TII, *GetRegInstr);		unsigned GetRegHWReg = getHWReg(TII, *GetRegInstr);

const int GetRegWaitStates = 2;		const int GetRegWaitStates = 2;
auto IsHazardFn = [TII, GetRegHWReg] (MachineInstr *MI) {		auto IsHazardFn = [TII, GetRegHWReg] (MachineInstr *MI) {
return GetRegHWReg == getHWReg(TII, *MI);		return GetRegHWReg == getHWReg(TII, *MI);
};		};
int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn);		int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn, GetRegWaitStates);

return GetRegWaitStates - WaitStatesNeeded;		return GetRegWaitStates - WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkSetRegHazards(MachineInstr *SetRegInstr) {		int GCNHazardRecognizer::checkSetRegHazards(MachineInstr *SetRegInstr) {
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
unsigned HWReg = getHWReg(TII, *SetRegInstr);		unsigned HWReg = getHWReg(TII, *SetRegInstr);

const int SetRegWaitStates =		const int SetRegWaitStates =
ST.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ? 1 : 2;		ST.getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS ? 1 : 2;
auto IsHazardFn = [TII, HWReg] (MachineInstr *MI) {		auto IsHazardFn = [TII, HWReg] (MachineInstr *MI) {
return HWReg == getHWReg(TII, *MI);		return HWReg == getHWReg(TII, *MI);
};		};
int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn);		int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn, SetRegWaitStates);
return SetRegWaitStates - WaitStatesNeeded;		return SetRegWaitStates - WaitStatesNeeded;
}		}

int GCNHazardRecognizer::createsVALUHazard(const MachineInstr &MI) {		int GCNHazardRecognizer::createsVALUHazard(const MachineInstr &MI) {
if (!MI.mayStore())		if (!MI.mayStore())
return -1;		return -1;

const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
[54 lines not shown]	if (!TRI->isVGPR(MRI, Def.getReg()))
return WaitStatesNeeded;		return WaitStatesNeeded;
unsigned Reg = Def.getReg();		unsigned Reg = Def.getReg();
auto IsHazardFn = [this, Reg, TRI] (MachineInstr *MI) {		auto IsHazardFn = [this, Reg, TRI] (MachineInstr *MI) {
int DataIdx = createsVALUHazard(*MI);		int DataIdx = createsVALUHazard(*MI);
return DataIdx >= 0 &&		return DataIdx >= 0 &&
TRI->regsOverlap(MI->getOperand(DataIdx).getReg(), Reg);		TRI->regsOverlap(MI->getOperand(DataIdx).getReg(), Reg);
};		};
int WaitStatesNeededForDef =		int WaitStatesNeededForDef =
VALUWaitStates - getWaitStatesSince(IsHazardFn);		VALUWaitStates - getWaitStatesSince(IsHazardFn, VALUWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForDef);

return WaitStatesNeeded;		return WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkVALUHazards(MachineInstr *VALU) {		int GCNHazardRecognizer::checkVALUHazards(MachineInstr *VALU) {
// This checks for the hazard where VMEM instructions that store more than		// This checks for the hazard where VMEM instructions that store more than
// 8 bytes can have there store data over written by the next instruction.		// 8 bytes can have there store data over written by the next instruction.
[48 lines not shown]	if (!LaneSelectOp->isReg() || !TRI->isSGPRReg(MRI, LaneSelectOp->getReg()))
return 0;		return 0;

unsigned LaneSelectReg = LaneSelectOp->getReg();		unsigned LaneSelectReg = LaneSelectOp->getReg();
auto IsHazardFn = [TII] (MachineInstr *MI) {		auto IsHazardFn = [TII] (MachineInstr *MI) {
return TII->isVALU(*MI);		return TII->isVALU(*MI);
};		};

const int RWLaneWaitStates = 4;		const int RWLaneWaitStates = 4;
int WaitStatesSince = getWaitStatesSinceDef(LaneSelectReg, IsHazardFn);		int WaitStatesSince = getWaitStatesSinceDef(LaneSelectReg, IsHazardFn,
		RWLaneWaitStates);
return RWLaneWaitStates - WaitStatesSince;		return RWLaneWaitStates - WaitStatesSince;
}		}

int GCNHazardRecognizer::checkRFEHazards(MachineInstr *RFE) {		int GCNHazardRecognizer::checkRFEHazards(MachineInstr *RFE) {
if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)		if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
return 0;		return 0;

const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();

const int RFEWaitStates = 1;		const int RFEWaitStates = 1;

auto IsHazardFn = [TII] (MachineInstr *MI) {		auto IsHazardFn = [TII] (MachineInstr *MI) {
return getHWReg(TII, *MI) == AMDGPU::Hwreg::ID_TRAPSTS;		return getHWReg(TII, *MI) == AMDGPU::Hwreg::ID_TRAPSTS;
};		};
int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn);		int WaitStatesNeeded = getWaitStatesSinceSetReg(IsHazardFn, RFEWaitStates);
return RFEWaitStates - WaitStatesNeeded;		return RFEWaitStates - WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkAnyInstHazards(MachineInstr *MI) {		int GCNHazardRecognizer::checkAnyInstHazards(MachineInstr *MI) {
if (MI->isDebugInstr())		if (MI->isDebugInstr())
return 0;		return 0;

const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
if (!ST.hasSMovFedHazard())		if (!ST.hasSMovFedHazard())
return 0;		return 0;

// Check for any instruction reading an SGPR after a write from		// Check for any instruction reading an SGPR after a write from
// s_mov_fed_b32.		// s_mov_fed_b32.
int MovFedWaitStates = 1;		int MovFedWaitStates = 1;
int WaitStatesNeeded = 0;		int WaitStatesNeeded = 0;

for (const MachineOperand &Use : MI->uses()) {		for (const MachineOperand &Use : MI->uses()) {
if (!Use.isReg() || TRI->isVGPR(MF.getRegInfo(), Use.getReg()))		if (!Use.isReg() || TRI->isVGPR(MF.getRegInfo(), Use.getReg()))
continue;		continue;
auto IsHazardFn = [] (MachineInstr *MI) {		auto IsHazardFn = [] (MachineInstr *MI) {
return MI->getOpcode() == AMDGPU::S_MOV_FED_B32;		return MI->getOpcode() == AMDGPU::S_MOV_FED_B32;
};		};
int WaitStatesNeededForUse =		int WaitStatesNeededForUse =
MovFedWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardFn);		MovFedWaitStates - getWaitStatesSinceDef(Use.getReg(), IsHazardFn,
		MovFedWaitStates);
WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);		WaitStatesNeeded = std::max(WaitStatesNeeded, WaitStatesNeededForUse);
}		}

return WaitStatesNeeded;		return WaitStatesNeeded;
}		}

int GCNHazardRecognizer::checkReadM0Hazards(MachineInstr *MI) {		int GCNHazardRecognizer::checkReadM0Hazards(MachineInstr *MI) {
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const int SMovRelWaitStates = 1;		const int SMovRelWaitStates = 1;
auto IsHazardFn = [TII] (MachineInstr *MI) {		auto IsHazardFn = [TII] (MachineInstr *MI) {
return TII->isSALU(*MI);		return TII->isSALU(*MI);
};		};
return SMovRelWaitStates - getWaitStatesSinceDef(AMDGPU::M0, IsHazardFn);		return SMovRelWaitStates - getWaitStatesSinceDef(AMDGPU::M0, IsHazardFn,
		SMovRelWaitStates);
}		}

lib/Target/AMDGPU/SIInstrInfo.h

void insertWaitStates(MachineBasicBlock &MBB,MachineBasicBlock::iterator MI,		void insertWaitStates(MachineBasicBlock &MBB,MachineBasicBlock::iterator MI,
int Count) const;		int Count) const;

void insertNoop(MachineBasicBlock &MBB,		void insertNoop(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const override;		MachineBasicBlock::iterator MI) const override;

void insertReturn(MachineBasicBlock &MBB) const;		void insertReturn(MachineBasicBlock &MBB) const;
/// Return the number of wait states that result from executing this		/// Return the number of wait states that result from executing this
/// instruction.		/// instruction.
unsigned getNumWaitStates(const MachineInstr &MI) const;		static unsigned getNumWaitStates(const MachineInstr &MI);

/// Returns the operand named \p Op. If \p MI does not have an		/// Returns the operand named \p Op. If \p MI does not have an
/// operand named \c Op, this function returns nullptr.		/// operand named \c Op, this function returns nullptr.
LLVM_READONLY		LLVM_READONLY
MachineOperand *getNamedOperand(MachineInstr &MI, unsigned OperandName) const;		MachineOperand *getNamedOperand(MachineInstr &MI, unsigned OperandName) const;

LLVM_READONLY		LLVM_READONLY
const MachineOperand *getNamedOperand(const MachineInstr &MI,		const MachineOperand *getNamedOperand(const MachineInstr &MI,

lib/Target/AMDGPU/SIInstrInfo.cpp

void SIInstrInfo::insertReturn(MachineBasicBlock &MBB) const {		void SIInstrInfo::insertReturn(MachineBasicBlock &MBB) const {
if (MBB.succ_empty()) {		if (MBB.succ_empty()) {
bool HasNoTerminator = MBB.getFirstTerminator() == MBB.end();		bool HasNoTerminator = MBB.getFirstTerminator() == MBB.end();
if (HasNoTerminator)		if (HasNoTerminator)
BuildMI(MBB, MBB.end(), DebugLoc(),		BuildMI(MBB, MBB.end(), DebugLoc(),
get(Info->returnsVoid() ? AMDGPU::S_ENDPGM : AMDGPU::SI_RETURN_TO_EPILOG));		get(Info->returnsVoid() ? AMDGPU::S_ENDPGM : AMDGPU::SI_RETURN_TO_EPILOG));
}		}
}		}

unsigned SIInstrInfo::getNumWaitStates(const MachineInstr &MI) const {		unsigned SIInstrInfo::getNumWaitStates(const MachineInstr &MI) {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return 1; // FIXME: Do wait states equal cycles?		default: return 1; // FIXME: Do wait states equal cycles?

case AMDGPU::S_NOP:		case AMDGPU::S_NOP:
return MI.getOperand(0).getImm() + 1;		return MI.getOperand(0).getImm() + 1;
}		}
}		}
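
The wait-state accounting above can be illustrated outside LLVM. The following is only a minimal sketch: `FakeInst`, `numWaitStates` and `totalWaitStates` are hypothetical names, not part of SIInstrInfo, and a non-negative S_NOP immediate is assumed. It shows why a single `S_NOP 4` covers five wait states, which is what the vmem_vcc_fallthrough_no_hazard_nops test relies on.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for MachineInstr; not an LLVM type.
struct FakeInst {
  bool IsNop; // true for S_NOP
  int Imm;    // S_NOP immediate operand; ignored for other instructions
};

// Mirrors the switch above: S_NOP N covers N + 1 wait states,
// every other instruction is assumed to take one.
unsigned numWaitStates(const FakeInst &MI) {
  if (!MI.IsNop)
    return 1;
  // Assumption of this sketch: the immediate is never negative.
  assert(MI.Imm >= 0 && "S_NOP immediate is expected to be non-negative");
  return static_cast<unsigned>(MI.Imm) + 1;
}

// Total wait states covered by a straight-line sequence of instructions.
unsigned totalWaitStates(const FakeInst *Insts, unsigned Count) {
  unsigned Total = 0;
  for (unsigned I = 0; I != Count; ++I)
    Total += numWaitStates(Insts[I]);
  return Total;
}
```

Under these assumptions, five `S_NOP 0` instructions and one `S_NOP 4` cover the same number of wait states.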


test/CodeGen/AMDGPU/vmem-vcc-hazard.mir

This file was added.

				# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -run-pass post-RA-hazard-rec -o - %s | FileCheck -check-prefix=GCN %s

				# GCN-LABEL: name: vmem_vcc_fallthrough
				# GCN: bb.1:
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_fallthrough
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec

				bb.1:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_branch_to_next
				# GCN: bb.1:
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_branch_to_next
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.1

				bb.1:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_fallthrough_no_hazard_too_far
				# GCN: bb.1:
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_fallthrough_no_hazard_too_far
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				$sgpr0 = S_MOV_B32 0
				$sgpr0 = S_MOV_B32 0
				$sgpr0 = S_MOV_B32 0
				$sgpr0 = S_MOV_B32 0
				$sgpr0 = S_MOV_B32 0

				bb.1:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_fallthrough_no_hazard_nops
				# GCN: bb.1:
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_fallthrough_no_hazard_nops
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_NOP 4

				bb.1:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_branch_around
				# GCN: bb.2:
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_branch_around
				body: |
				bb.0:
				successors: %bb.2

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.2

				bb.1:
				successors: %bb.2

				S_NOP 0
				S_NOP 0
				S_NOP 0
				S_NOP 0

				bb.2:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_branch_backedge
				# GCN: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_branch_backedge
				body: |
				bb.0:
				successors: %bb.1

				$vgpr0 = IMPLICIT_DEF
				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec

				bb.1:
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.0
				...
				# GCN-LABEL: name: vmem_vcc_min_of_two
				# GCN: bb.2:
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_min_of_two
				body: |
				bb.0:
				successors: %bb.2

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_NOP 0
				S_BRANCH %bb.2

				bb.1:
				successors: %bb.2

				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec

				bb.2:
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				...
				# GCN-LABEL: name: vmem_vcc_self_loop
				# GCN: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_self_loop
				body: |
				bb.0:
				successors: %bb.0

				$vgpr0 = IMPLICIT_DEF
				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.0
				...
				# GCN-LABEL: name: vmem_vcc_min_of_two_self_loop1
				# GCN: bb.1:
				# GCN: $sgpr0 = S_MOV_B32 0
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_min_of_two_self_loop1
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec

				bb.1:
				successors: %bb.1

				$sgpr0 = S_MOV_B32 0
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				$vgpr1 = V_ADDC_U32_e32 $vgpr1, $vgpr1, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.1
				...
				# GCN-LABEL: name: vmem_vcc_min_of_two_self_loop2
				# GCN: bb.1:
				# GCN: $sgpr0 = S_MOV_B32 0
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: S_NOP
				# GCN-NEXT: BUFFER_LOAD_DWORD_OFFEN
				---
				name: vmem_vcc_min_of_two_self_loop2
				body: |
				bb.0:
				successors: %bb.1

				$sgpr0_sgpr1_sgpr2_sgpr3 = IMPLICIT_DEF
				$vgpr0 = IMPLICIT_DEF
				$vgpr1 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				S_NOP 0

				bb.1:
				successors: %bb.1

				$sgpr0 = S_MOV_B32 0
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $vcc_lo, 0, 0, 0, 0, implicit $exec
				$vgpr1 = V_ADDC_U32_e32 $vgpr1, $vgpr1, implicit-def $vcc, implicit $vcc, implicit $exec
				S_BRANCH %bb.1
				...
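
The tests above exercise a backward walk that crosses block boundaries and takes the minimum distance over all predecessors. The sketch below is not the code from this patch. It is a simplified model of the idea, and `Inst`, `Block` and `waitStatesSince` are hypothetical names. It assumes a hazard expires once `Limit` wait states have elapsed, and it scans each predecessor block at most once, which also makes self-loops and back edges terminate.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for MachineInstr and MachineBasicBlock.
struct Inst {
  bool IsHazard;  // true if this instruction triggers the hazard being checked
  int WaitStates; // wait states this instruction itself accounts for
};

struct Block {
  std::vector<Inst> Insts;
  std::vector<const Block *> Preds;
};

// Walk backwards from position End (exclusive) in BB and return the wait
// states elapsed since the nearest hazard on any path, or INT_MAX if no
// hazard is found before Limit wait states have passed.
int waitStatesSince(const Block *BB, std::size_t End, int Elapsed, int Limit,
                    std::set<const Block *> &Visited) {
  for (std::size_t I = End; I > 0; --I) {
    const Inst &MI = BB->Insts[I - 1];
    if (MI.IsHazard)
      return Elapsed;
    Elapsed += MI.WaitStates;
    if (Elapsed >= Limit)
      return INT_MAX; // Hazard, if any, has expired on this path.
  }

  // Reached the top of the block: continue into every unvisited predecessor
  // and keep the smallest distance, since the closest hazard decides.
  int Min = INT_MAX;
  for (const Block *Pred : BB->Preds) {
    if (!Visited.insert(Pred).second)
      continue;
    Min = std::min(Min, waitStatesSince(Pred, Pred->Insts.size(), Elapsed,
                                        Limit, Visited));
  }
  return Min;
}
```

With a limit of 5, a layout modeled on vmem_vcc_min_of_two yields a distance of 0 (the hazard in bb.1 immediately precedes the load), so five wait states are still needed. A layout modeled on vmem_vcc_self_loop yields 1 (only the branch separates the hazard from the load), leaving four. Both match the S_NOP counts the tests expect.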