This is an archive of the discontinued LLVM Phabricator instance.

[RFC] AMDGPU: Add MachineInstr::Initiator and ::Terminator flags
Abandoned · Public

Authored by nhaehnle on Sep 3 2016, 8:40 AM.


Details

Reviewers

tstellarAMD
arsenm

Summary

Control flow in AMDGPU is lowered not just via branches, but also via bit
manipulation of the EXEC mask. This means every basic block might have
some ALU instructions at the beginning and end to set up the EXEC mask
correctly. Miscompiles can result when target-independent passes like
scheduling or (especially) register allocation move or insert code within
these parts. This has become more of an issue now that control flow pseudo
instructions are lowered earlier (which is how things should be).

For example, a conditional branch from an if-statement might be lowered as

s_and_saveexec_b64 s[10:11], vcc
s_xor_b64 s[10:11], exec, s[10:11]
; mask branch BB5_2

As long as only the ; mask branch pseudo instruction is a terminator, the
register allocator might decide to spill registers after the s_and_saveexec
instruction. For a vector register, the spill then executes under the
already narrowed EXEC mask, so the register will not be spilled correctly.
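
To make the failure mode concrete, here is a toy C++ model. It is not AMDGPU or LLVM code: the 8-lane wave and the names Wave, sAndSaveExec, spillVector, and lanesSavedBySpill are invented for illustration. It assumes a vector spill behaves like a vector store that writes only the lanes enabled in EXEC, and it compares spilling before and after the s_and_saveexec step.

```cpp
#include <array>
#include <cstdint>

// Toy wave with 8 lanes (real waves are wider); every name here is hypothetical.
constexpr int kLanes = 8;
using VReg = std::array<int, kLanes>;

struct Wave {
  uint8_t Exec = 0xFF; // one bit per lane, all lanes active initially
};

// Models s_and_saveexec: returns the old EXEC and narrows EXEC to Cond & EXEC.
uint8_t sAndSaveExec(Wave &W, uint8_t Cond) {
  uint8_t Saved = W.Exec;
  W.Exec = Cond & W.Exec;
  return Saved;
}

// Models a vector spill: only lanes enabled in EXEC reach the stack slot.
void spillVector(const Wave &W, const VReg &Src, VReg &Slot) {
  for (int L = 0; L < kLanes; ++L)
    if (W.Exec & (1u << L))
      Slot[L] = Src[L];
}

// Returns how many lanes of a live vector register end up in the spill slot.
int lanesSavedBySpill(bool SpillAfterSaveExec, uint8_t Cond) {
  Wave W;
  VReg Reg{}, Slot{};
  for (int L = 0; L < kLanes; ++L) {
    Reg[L] = 100 + L;
    Slot[L] = -1; // marker for "never written"
  }
  if (!SpillAfterSaveExec)
    spillVector(W, Reg, Slot);
  sAndSaveExec(W, Cond);
  if (SpillAfterSaveExec)
    spillVector(W, Reg, Slot);

  int Saved = 0;
  for (int L = 0; L < kLanes; ++L)
    Saved += (Slot[L] == Reg[L]);
  return Saved;
}
```

In this model, a condition mask of 0x0F saves all 8 lanes when the spill comes first, but only 4 lanes when the spill lands after the EXEC update. The other 4 lanes are lost even though they become active again once the saved mask is restored.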

Another problem is that we want to move the AMDGPU-specific Whole Quad Mode
pass to after machine scheduling. That pass needs to introduce its own
exec-manipulating instructions while being aware of the meaning of the
already existing exec-instructions. That meaning is currently lost.

So here's a suggestion for dealing with all that: Allow arbitrary
instructions to be marked as Terminator (end of BB) or Initiator (beginning
of BB) instructions.

The intention is that target-independent passes will touch those
instructions as little as possible: no scheduling changes, and register
spilling and restoring happen outside those regions whenever possible.
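
One way to picture the proposed rule is as a window inside each block where generic passes may insert or reorder code. The sketch below is a standalone model, not LLVM code: Inst, Region, and safeWindow are hypothetical names, and a block is assumed to be laid out as initiators, then ordinary instructions, then terminators.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed per-instruction marking.
enum class Region { Initiator, Body, Terminator };

struct Inst {
  const char *Name;
  Region Kind;
};

// Returns [Begin, End): the index range where spill or reload code could be
// placed without landing inside the initiator or terminator sequences.
std::pair<std::size_t, std::size_t> safeWindow(const std::vector<Inst> &Block) {
  std::size_t Begin = 0;
  while (Begin < Block.size() && Block[Begin].Kind == Region::Initiator)
    ++Begin;
  std::size_t End = Block.size();
  while (End > Begin && Block[End - 1].Kind == Region::Terminator)
    --End;
  return {Begin, End};
}
```

For a block consisting of one initiator, two ordinary instructions, and the three-instruction terminator sequence from the example above, the window covers only the two ordinary instructions, so a spill would be placed before s_and_saveexec at the latest.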

The diff here is not complete, but it shows the direction I want to go in.
It passes all the AMDGPU lit tests except for spill-m0.ll, which needs
RegAllocFast to be fixed to become aware of the Initiator/Terminator
rules.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 70268. Sep 3 2016, 8:40 AM

nhaehnle retitled this revision to [RFC] AMDGPU: Add MachineInstr::Initiator and ::Terminator flags.

nhaehnle updated this object.

nhaehnle added reviewers: arsenm, tstellarAMD.

nhaehnle added a subscriber: llvm-commits.

Herald added subscribers: nhaehnle, wdng, arsenm. Sep 3 2016, 8:40 AM

I have a patch that fixes the control flow spilling problem already. This won't actually solve that part.

I've been moving more in the direction of not considering mask branches as branches at all, but as their own separate concept. My patch adds a handful of terminator instruction aliases that are replaced after register allocation with the regular instructions, since they are only used to get correct spill code placement.

nhaehnle abandoned this revision. Feb 21 2018, 6:55 AM

Herald added subscribers: t-tye, tpr, dstuttard and 2 others. Feb 21 2018, 6:55 AM

Revision Contents

Path                                        Size
include/llvm/CodeGen/MachineInstr.h         15 lines
lib/CodeGen/MachineInstr.cpp                14 lines
lib/CodeGen/MachineVerifier.cpp             9 lines
lib/Target/AMDGPU/SIInsertWaits.cpp         26 lines
lib/Target/AMDGPU/SILowerControlFlow.cpp    27 lines
test/CodeGen/AMDGPU/loop_break.ll           2 lines
test/CodeGen/AMDGPU/valu-i1.ll              2 lines

Diff 70268

include/llvm/CodeGen/MachineInstr.h

[64 lines not shown] public:

enum MIFlag {		enum MIFlag {
NoFlags = 0,		NoFlags = 0,
FrameSetup = 1 << 0, // Instruction is used as a part of		FrameSetup = 1 << 0, // Instruction is used as a part of
// function frame setup code.		// function frame setup code.
FrameDestroy = 1 << 1, // Instruction is used as a part of		FrameDestroy = 1 << 1, // Instruction is used as a part of
// function frame destruction code.		// function frame destruction code.
BundledPred = 1 << 2, // Instruction has bundled predecessors.		BundledPred = 1 << 2, // Instruction has bundled predecessors.
BundledSucc = 1 << 3 // Instruction has bundled successors.		BundledSucc = 1 << 3, // Instruction has bundled successors.
		Initiator = 1 << 4, // Instruction is used as a part of
		// target-specific basic block prolog
		Terminator = 1 << 5 // Instruction is used as a part of
		// target-specific basic block epilog
};		};
private:		private:
const MCInstrDesc *MCID; // Instruction descriptor.		const MCInstrDesc *MCID; // Instruction descriptor.
MachineBasicBlock *Parent; // Pointer to the owning basic block.		MachineBasicBlock *Parent; // Pointer to the owning basic block.

// Operands are allocated by an ArrayRecycler.		// Operands are allocated by an ArrayRecycler.
MachineOperand *Operands; // Pointer to the first operand.		MachineOperand *Operands; // Pointer to the first operand.
unsigned NumOperands; // Number of operands on instruction.		unsigned NumOperands; // Number of operands on instruction.
[358 lines not shown] public:

/// Returns true if the specified instruction stops control flow		/// Returns true if the specified instruction stops control flow
/// from executing the instruction immediately following it. Examples include		/// from executing the instruction immediately following it. Examples include
/// unconditional branches and return instructions.		/// unconditional branches and return instructions.
bool isBarrier(QueryType Type = AnyInBundle) const {		bool isBarrier(QueryType Type = AnyInBundle) const {
return hasProperty(MCID::Barrier, Type);		return hasProperty(MCID::Barrier, Type);
}		}

		/// Returns true if this instruction is part of the initiator for a basic
		/// block. This can be used by targets that have non-uniform control flow
		/// to set up execution masks.
		bool isInitiator() const {
		return getFlag(Initiator); // TODO: QueryType?
		}

/// Returns true if this instruction part of the terminator for a basic block.		/// Returns true if this instruction part of the terminator for a basic block.
/// Typically this is things like return and branch instructions.		/// Typically this is things like return and branch instructions.
///		///
/// Various passes use this to insert code into the bottom of a basic block,		/// Various passes use this to insert code into the bottom of a basic block,
/// but before control flow occurs.		/// but before control flow occurs.
bool isTerminator(QueryType Type = AnyInBundle) const {		bool isTerminator(QueryType Type = AnyInBundle) const {
return hasProperty(MCID::Terminator, Type);		return hasProperty(MCID::Terminator, Type) || getFlag(Terminator); // TODO: QueryType?
}		}

/// Returns true if this is a conditional, unconditional, or indirect branch.		/// Returns true if this is a conditional, unconditional, or indirect branch.
/// Predicates below can be used to discriminate between		/// Predicates below can be used to discriminate between
/// these cases, and the TargetInstrInfo::AnalyzeBranch method can be used to		/// these cases, and the TargetInstrInfo::AnalyzeBranch method can be used to
/// get more information.		/// get more information.
bool isBranch(QueryType Type = AnyInBundle) const {		bool isBranch(QueryType Type = AnyInBundle) const {
return hasProperty(MCID::Branch, Type);		return hasProperty(MCID::Branch, Type);
[853 lines not shown]
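
The header change above combines two sources of information: a per-opcode property from the instruction descriptor and a per-instruction flag bit. A minimal standalone sketch of that idea follows. ToyInstr and DescIsTerminator are hypothetical names that only mirror the shape of the patch; the enum values are copied from the diff.

```cpp
#include <cstdint>

// Flag values as they appear in the diff.
enum MIFlag : uint8_t {
  NoFlags = 0,
  FrameSetup = 1 << 0,
  FrameDestroy = 1 << 1,
  BundledPred = 1 << 2,
  BundledSucc = 1 << 3,
  Initiator = 1 << 4,
  Terminator = 1 << 5,
};

// Hypothetical stand-in for MachineInstr.
struct ToyInstr {
  bool DescIsTerminator;   // stands in for the per-opcode descriptor property
  uint8_t Flags = NoFlags; // per-instruction flag bits

  explicit ToyInstr(bool DescTerm) : DescIsTerminator(DescTerm) {}

  void setFlag(MIFlag F) { Flags |= F; }
  bool getFlag(MIFlag F) const { return (Flags & F) != 0; }

  bool isInitiator() const { return getFlag(Initiator); }
  // An instruction counts as a terminator if either its opcode says so
  // or this particular instance has been flagged.
  bool isTerminator() const { return DescIsTerminator || getFlag(Terminator); }
};
```

With this shape, an ordinary ALU opcode such as the s_xor_b64 in the summary can be treated as a terminator for one specific instance without changing how other uses of the same opcode are classified.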

lib/CodeGen/MachineInstr.cpp

[1,923 lines not shown] void MachineInstr::print(raw_ostream &OS, ModuleSlotTracker &MST,

// Briefly indicate whether any call clobbers were omitted.		// Briefly indicate whether any call clobbers were omitted.
if (OmittedAnyCallClobbers) {		if (OmittedAnyCallClobbers) {
if (!FirstOp) OS << ",";		if (!FirstOp) OS << ",";
OS << " ...";		OS << " ...";
}		}

bool HaveSemi = false;		bool HaveSemi = false;
const unsigned PrintableFlags = FrameSetup | FrameDestroy;		const unsigned PrintableFlags = FrameSetup | FrameDestroy | Initiator | Terminator;
if (Flags & PrintableFlags) {		if (Flags & PrintableFlags) {
if (!HaveSemi) {		if (!HaveSemi) {
OS << ";";		OS << ";";
HaveSemi = true;		HaveSemi = true;
}		}
OS << " flags: ";		OS << " flags:";

if (Flags & FrameSetup)		if (Flags & FrameSetup)
OS << "FrameSetup";		OS << " FrameSetup";

if (Flags & FrameDestroy)		if (Flags & FrameDestroy)
OS << "FrameDestroy";		OS << " FrameDestroy";

		if (Flags & Initiator)
		OS << " Initiator";

		if (Flags & Terminator)
		OS << " Terminator";
}		}

if (!memoperands_empty()) {		if (!memoperands_empty()) {
if (!HaveSemi) {		if (!HaveSemi) {
OS << ";";		OS << ";";
HaveSemi = true;		HaveSemi = true;
}		}

[330 lines not shown]

lib/CodeGen/MachineVerifier.cpp

[75 lines not shown] struct MachineVerifier {

typedef SmallVector<unsigned, 16> RegVector;		typedef SmallVector<unsigned, 16> RegVector;
typedef SmallVector<const uint32_t*, 4> RegMaskVector;		typedef SmallVector<const uint32_t*, 4> RegMaskVector;
typedef DenseSet<unsigned> RegSet;		typedef DenseSet<unsigned> RegSet;
typedef DenseMap<unsigned, const MachineInstr*> RegMap;		typedef DenseMap<unsigned, const MachineInstr*> RegMap;
typedef SmallPtrSet<const MachineBasicBlock*, 8> BlockSet;		typedef SmallPtrSet<const MachineBasicBlock*, 8> BlockSet;

const MachineInstr *FirstTerminator;		const MachineInstr *FirstTerminator;
		bool SeenNonInitiator;
BlockSet FunctionBlocks;		BlockSet FunctionBlocks;

BitVector regsReserved;		BitVector regsReserved;
RegSet regsLive;		RegSet regsLive;
RegVector regsDefined, regsDead, regsKilled;		RegVector regsDefined, regsDead, regsKilled;
RegMaskVector regMasks;		RegMaskVector regMasks;
RegSet regsLiveInButUnused;		RegSet regsLiveInButUnused;

[476 lines not shown] static bool matchPair(MachineBasicBlock::const_succ_iterator i,
if (*i == b)		if (*i == b)
return *++i == a;		return *++i == a;
return false;		return false;
}		}

void		void
MachineVerifier::visitMachineBasicBlockBefore(const MachineBasicBlock *MBB) {		MachineVerifier::visitMachineBasicBlockBefore(const MachineBasicBlock *MBB) {
FirstTerminator = nullptr;		FirstTerminator = nullptr;
		SeenNonInitiator = false;

if (!MF->getProperties().hasProperty(		if (!MF->getProperties().hasProperty(
MachineFunctionProperties::Property::NoPHIs)) {		MachineFunctionProperties::Property::NoPHIs)) {
// If this block has allocatable physical registers live-in, check that		// If this block has allocatable physical registers live-in, check that
// it is an entry block or landing pad.		// it is an entry block or landing pad.
for (const auto &LI : MBB->liveins()) {		for (const auto &LI : MBB->liveins()) {
if (isAllocatable(LI.PhysReg) && !MBB->isEHPad() &&		if (isAllocatable(LI.PhysReg) && !MBB->isEHPad() &&
MBB->getIterator() != MBB->getParent()->begin()) {		MBB->getIterator() != MBB->getParent()->begin()) {
[199 lines not shown] if (Indexes && Indexes->hasIndex(*MI)) {
SlotIndex idx = Indexes->getInstructionIndex(*MI);		SlotIndex idx = Indexes->getInstructionIndex(*MI);
if (!(idx > lastIndex)) {		if (!(idx > lastIndex)) {
report("Instruction index out of order", MI);		report("Instruction index out of order", MI);
errs() << "Last instruction was at " << lastIndex << '\n';		errs() << "Last instruction was at " << lastIndex << '\n';
}		}
lastIndex = idx;		lastIndex = idx;
}		}

		// Ensure initiators don't follow non-initiators.
		if (!MI->isInitiator()) {
		SeenNonInitiator = true;
		} else if (SeenNonInitiator) {
		report("Initiator instruction after a non-initiator", MI);
		}

// Ensure non-terminators don't follow terminators.		// Ensure non-terminators don't follow terminators.
// Ignore predicated terminators formed by if conversion.		// Ignore predicated terminators formed by if conversion.
// FIXME: If conversion shouldn't need to violate this rule.		// FIXME: If conversion shouldn't need to violate this rule.
if (MI->isTerminator() && !TII->isPredicated(*MI)) {		if (MI->isTerminator() && !TII->isPredicated(*MI)) {
if (!FirstTerminator)		if (!FirstTerminator)
FirstTerminator = MI;		FirstTerminator = MI;
} else if (FirstTerminator) {		} else if (FirstTerminator) {
report("Non-terminator instruction after the first terminator", MI);		report("Non-terminator instruction after the first terminator", MI);
[1,304 lines not shown]
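
The two ordering checks in the verifier hunk above can be exercised in isolation. The following is a simplified standalone sketch, not the MachineVerifier itself: VInstr and verifyBlock are hypothetical names, diagnostics are returned as strings, and the exemption for predicated terminators is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical instruction record carrying only what the checks need.
struct VInstr {
  std::string Name;
  bool IsInitiator;
  bool IsTerminator;
};

// Walks one block in order and collects ordering violations.
std::vector<std::string> verifyBlock(const std::vector<VInstr> &Block) {
  std::vector<std::string> Errors;
  bool SeenNonInitiator = false;
  bool SeenTerminator = false;
  for (const VInstr &MI : Block) {
    // Initiators must form a prefix of the block.
    if (!MI.IsInitiator)
      SeenNonInitiator = true;
    else if (SeenNonInitiator)
      Errors.push_back("Initiator instruction after a non-initiator: " +
                       MI.Name);

    // Terminators must form a suffix of the block.
    if (MI.IsTerminator)
      SeenTerminator = true;
    else if (SeenTerminator)
      Errors.push_back(
          "Non-terminator instruction after the first terminator: " + MI.Name);
  }
  return Errors;
}
```

A block laid out as initiators, body, terminators produces no diagnostics, while an initiator placed after an ordinary instruction produces exactly one.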

lib/Target/AMDGPU/SIInsertWaits.cpp

[351 lines not shown] for (unsigned i = 0, e = I->getNumOperands(); i != e; ++i) {
}		}
}		}
}		}

bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,		bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
const Counters &Required) {		const Counters &Required) {

// End of program? No need to wait on anything
// A function not returning void needs to wait, because other bytecode will
// be appended after it and we don't know what it will be.
if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM && ReturnsVoid)
return false;

// Figure out if the async instructions execute in order		// Figure out if the async instructions execute in order
bool Ordered[3];		bool Ordered[3];

// VM_CNT is always ordered		// VM_CNT is always ordered
Ordered[0] = true;		Ordered[0] = true;

// EXP_CNT is unordered if we have both EXP & VM-writes		// EXP_CNT is unordered if we have both EXP & VM-writes
Ordered[1] = ExpInstrTypesSeen == 3;		Ordered[1] = ExpInstrTypesSeen == 3;
[30 lines not shown] bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
if (!NeedWait)		if (!NeedWait)
return false;		return false;

// Reset EXP_CNT instruction types		// Reset EXP_CNT instruction types
if (Counts.Named.EXP == 0)		if (Counts.Named.EXP == 0)
ExpInstrTypesSeen = 0;		ExpInstrTypesSeen = 0;

// Build the wait instruction		// Build the wait instruction
		MachineInstr *Wait =
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
.addImm((Counts.Named.VM & 0xF) |		.addImm((Counts.Named.VM & 0xF) |
((Counts.Named.EXP & 0x7) << 4) |		((Counts.Named.EXP & 0x7) << 4) |
((Counts.Named.LGKM & 0xF) << 8));		((Counts.Named.LGKM & 0xF) << 8));

		if (MachineInstr *Prev = Wait->getPrevNode()) {
		if (Prev->isTerminator())
		Wait->setFlag(MachineInstr::Terminator);
		}

LastOpcodeType = OTHER;		LastOpcodeType = OTHER;
LastInstWritesM0 = false;		LastInstWritesM0 = false;
return true;		return true;
}		}

/// \brief helper function for handleOperands		/// \brief helper function for handleOperands
static void increaseCounters(Counters &Dst, const Counters &Src) {		static void increaseCounters(Counters &Dst, const Counters &Src) {

[155 lines not shown] for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
if (I->getOpcode() == AMDGPU::S_WAITCNT) {		if (I->getOpcode() == AMDGPU::S_WAITCNT) {
handleExistingWait(*I);		handleExistingWait(*I);
RemoveMI.push_back(&*I);		RemoveMI.push_back(&*I);
continue;		continue;
}		}

Counters Required;		Counters Required;

// Wait for everything before a barrier.		// Wait for everything before a branch or barrier.
//		//
// S_SENDMSG implicitly waits for all outstanding LGKM transfers to finish,		// S_SENDMSG implicitly waits for all outstanding LGKM transfers to finish,
// but we also want to wait for any other outstanding transfers before		// but we also want to wait for any other outstanding transfers before
// signalling other hardware blocks		// signalling other hardware blocks
if (I->getOpcode() == AMDGPU::S_BARRIER ||		if (I->isBranch() || I->getOpcode() == AMDGPU::SI_MASK_BRANCH ||
		I->getOpcode() == AMDGPU::S_BARRIER \|\|
I->getOpcode() == AMDGPU::S_SENDMSG)		I->getOpcode() == AMDGPU::S_SENDMSG)
Required = LastIssued;		Required = LastIssued;
else		else
Required = handleOperands(*I);		Required = handleOperands(*I);

Counters Increment = getHwCounts(*I);		Counters Increment = getHwCounts(*I);

if (countersNonZero(Required) || countersNonZero(Increment))		if (countersNonZero(Required) || countersNonZero(Increment))
increaseCounters(Required, DelayedWaitOn);		increaseCounters(Required, DelayedWaitOn);

Changes |= insertWait(MBB, I, Required);		Changes |= insertWait(MBB, I, Required);

pushInstruction(MBB, I, Increment);		pushInstruction(MBB, I, Increment);
handleSendMsg(MBB, I);		handleSendMsg(MBB, I);
}		}

// Wait for everything at the end of the MBB		// Wait for everything at the end of the MBB, in case there are no
Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);		// branches. No need to wait at the end of the (void-returning) program,
		// since the hardware does so automatically.
		if (!MBB.empty() && MBB.back().getOpcode() != AMDGPU::S_ENDPGM)
		Changes |= insertWait(MBB, MBB.end(), LastIssued);
}		}

for (MachineInstr *I : RemoveMI)		for (MachineInstr *I : RemoveMI)
I->eraseFromParent();		I->eraseFromParent();

return Changes;		return Changes;
}		}
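
The immediate operand built for S_WAITCNT in insertWait packs three counters into one value. A small standalone sketch of that packing is shown below. WaitCounts, encodeWaitcnt, and decodeWaitcnt are hypothetical helper names; the field widths and shifts are taken from the diff and may differ on other hardware generations.

```cpp
#include <cstdint>

// Hypothetical plain struct mirroring the Counts.Named fields in the diff.
struct WaitCounts {
  unsigned VM;
  unsigned EXP;
  unsigned LGKM;
};

// Packs the counters using the layout from the diff:
// bits 0-3 VM, bits 4-6 EXP, bits 8-11 LGKM.
uint32_t encodeWaitcnt(const WaitCounts &C) {
  return (C.VM & 0xF) | ((C.EXP & 0x7) << 4) | ((C.LGKM & 0xF) << 8);
}

// Inverse of encodeWaitcnt for values that fit their fields.
WaitCounts decodeWaitcnt(uint32_t Imm) {
  return {Imm & 0xF, (Imm >> 4) & 0x7, (Imm >> 8) & 0xF};
}
```

Under this layout, counters of VM = 1, EXP = 2, LGKM = 3 encode to 0x321, and waiting for all three counters to reach zero encodes to 0.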

lib/Target/AMDGPU/SILowerControlFlow.cpp

[119 lines not shown] void SILowerControlFlow::emitIf(MachineInstr &MI) {
MachineOperand &Cond = MI.getOperand(1);		MachineOperand &Cond = MI.getOperand(1);
assert(SaveExec.getSubReg() == AMDGPU::NoSubRegister &&		assert(SaveExec.getSubReg() == AMDGPU::NoSubRegister &&
Cond.getSubReg() == AMDGPU::NoSubRegister);		Cond.getSubReg() == AMDGPU::NoSubRegister);

unsigned SaveExecReg = SaveExec.getReg();		unsigned SaveExecReg = SaveExec.getReg();

MachineInstr *AndSaveExec =		MachineInstr *AndSaveExec =
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_AND_SAVEEXEC_B64), SaveExecReg)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_AND_SAVEEXEC_B64), SaveExecReg)
.addOperand(Cond);		.addOperand(Cond)
		.setMIFlag(MachineInstr::Terminator);

MachineInstr *Xor =		MachineInstr *Xor =
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_XOR_B64), SaveExecReg)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_XOR_B64), SaveExecReg)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addReg(SaveExecReg);		.addReg(SaveExecReg)
		.setMIFlag(MachineInstr::Terminator);

// Insert a pseudo terminator to help keep the verifier happy. This will also		// Insert a pseudo terminator to help keep the verifier happy. This will also
// be used later when inserting skips.		// be used later when inserting skips.
MachineInstr *NewBr =		MachineInstr *NewBr =
BuildMI(MBB, I, DL, TII->get(AMDGPU::SI_MASK_BRANCH))		BuildMI(MBB, I, DL, TII->get(AMDGPU::SI_MASK_BRANCH))
.addOperand(MI.getOperand(2));		.addOperand(MI.getOperand(2));

if (!LIS) {		if (!LIS) {
[24 lines not shown] void SILowerControlFlow::emitElse(MachineInstr &MI) {

bool ExecModified = MI.getOperand(3).getImm() != 0;		bool ExecModified = MI.getOperand(3).getImm() != 0;
MachineBasicBlock::iterator Start = MBB.begin();		MachineBasicBlock::iterator Start = MBB.begin();

// This must be inserted before phis and any spill code inserted before the		// This must be inserted before phis and any spill code inserted before the
// else.		// else.
MachineInstr *OrSaveExec =		MachineInstr *OrSaveExec =
BuildMI(MBB, Start, DL, TII->get(AMDGPU::S_OR_SAVEEXEC_B64), DstReg)		BuildMI(MBB, Start, DL, TII->get(AMDGPU::S_OR_SAVEEXEC_B64), DstReg)
.addOperand(MI.getOperand(1)); // Saved EXEC		.addOperand(MI.getOperand(1)) // Saved EXEC
		.setMIFlag(MachineInstr::Initiator);
MachineBasicBlock *DestBB = MI.getOperand(2).getMBB();		MachineBasicBlock *DestBB = MI.getOperand(2).getMBB();

MachineBasicBlock::iterator ElsePt(MI);		MachineBasicBlock::iterator ElsePt(MI);

if (ExecModified) {		if (ExecModified) {
MachineInstr *And =		MachineInstr *And =
BuildMI(MBB, ElsePt, DL, TII->get(AMDGPU::S_AND_B64), DstReg)		BuildMI(MBB, ElsePt, DL, TII->get(AMDGPU::S_AND_B64), DstReg)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addReg(DstReg);		.addReg(DstReg)
		.setMIFlag(MachineInstr::Terminator);

if (LIS)		if (LIS)
LIS->InsertMachineInstrInMaps(*And);		LIS->InsertMachineInstrInMaps(*And);
}		}

MachineInstr *Xor =		MachineInstr *Xor =
BuildMI(MBB, ElsePt, DL, TII->get(AMDGPU::S_XOR_B64), AMDGPU::EXEC)		BuildMI(MBB, ElsePt, DL, TII->get(AMDGPU::S_XOR_B64), AMDGPU::EXEC)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addReg(DstReg);		.addReg(DstReg)
		.setMIFlag(MachineInstr::Terminator);

MachineBasicBlock::iterator Term = MBB.getFirstTerminator();		// Insert an additional pseudo terminator to help keep the verifier happy
// Insert a pseudo terminator to help keep the verifier happy.		// and mark the location for skips to be inserted later.
MachineInstr *Branch =		MachineInstr *Branch =
BuildMI(MBB, Term, DL, TII->get(AMDGPU::SI_MASK_BRANCH))		BuildMI(MBB, ElsePt, DL, TII->get(AMDGPU::SI_MASK_BRANCH))
.addMBB(DestBB);		.addMBB(DestBB);

if (!LIS) {		if (!LIS) {
MI.eraseFromParent();		MI.eraseFromParent();
return;		return;
}		}

LIS->RemoveMachineInstrFromMaps(MI);		LIS->RemoveMachineInstrFromMaps(MI);
[37 lines not shown]

void SILowerControlFlow::emitLoop(MachineInstr &MI) {		void SILowerControlFlow::emitLoop(MachineInstr &MI) {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
const DebugLoc &DL = MI.getDebugLoc();		const DebugLoc &DL = MI.getDebugLoc();

MachineInstr *AndN2 =		MachineInstr *AndN2 =
BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ANDN2_B64), AMDGPU::EXEC)		BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ANDN2_B64), AMDGPU::EXEC)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addOperand(MI.getOperand(0));		.addOperand(MI.getOperand(0))
		.setMIFlag(MachineInstr::Terminator);

MachineInstr *Branch =		MachineInstr *Branch =
BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))		BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
.addOperand(MI.getOperand(1));		.addOperand(MI.getOperand(1));

if (LIS) {		if (LIS) {
LIS->ReplaceMachineInstrInMaps(MI, *AndN2);		LIS->ReplaceMachineInstrInMaps(MI, *AndN2);
LIS->InsertMachineInstrInMaps(*Branch);		LIS->InsertMachineInstrInMaps(*Branch);
}		}

MI.eraseFromParent();		MI.eraseFromParent();
}		}

void SILowerControlFlow::emitEndCf(MachineInstr &MI) {		void SILowerControlFlow::emitEndCf(MachineInstr &MI) {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
const DebugLoc &DL = MI.getDebugLoc();		const DebugLoc &DL = MI.getDebugLoc();

MachineBasicBlock::iterator InsPt = MBB.begin();		MachineBasicBlock::iterator InsPt = MBB.begin();
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(MBB, InsPt, DL, TII->get(AMDGPU::S_OR_B64), AMDGPU::EXEC)		BuildMI(MBB, InsPt, DL, TII->get(AMDGPU::S_OR_B64), AMDGPU::EXEC)
.addReg(AMDGPU::EXEC)		.addReg(AMDGPU::EXEC)
.addOperand(MI.getOperand(0));		.addOperand(MI.getOperand(0))
		.setMIFlag(MachineInstr::Initiator);

if (LIS)		if (LIS)
LIS->ReplaceMachineInstrInMaps(MI, *NewMI);		LIS->ReplaceMachineInstrInMaps(MI, *NewMI);

MI.eraseFromParent();		MI.eraseFromParent();

if (LIS)		if (LIS)
LIS->handleMove(*NewMI);		LIS->handleMove(*NewMI);
[59 lines not shown]

test/CodeGen/AMDGPU/loop_break.ll

[35 lines not shown]
	; GCN: v_cmp_ge_i32_e32 vcc,			; GCN: v_cmp_ge_i32_e32 vcc,
	; GCN: s_or_b64 [[MASK]], vcc, [[INITMASK]]			; GCN: s_or_b64 [[MASK]], vcc, [[INITMASK]]

	; GCN: [[FLOW]]:			; GCN: [[FLOW]]:
	; GCN: s_mov_b64 [[INITMASK]], [[MASK]]			; GCN: s_mov_b64 [[INITMASK]], [[MASK]]
	; GCN: s_andn2_b64 exec, exec, [[MASK]]			; GCN: s_andn2_b64 exec, exec, [[MASK]]
	; GCN-NEXT: s_cbranch_execnz [[LOOP_ENTRY]]			; GCN-NEXT: s_cbranch_execnz [[LOOP_ENTRY]]

	; GCN: ; BB#4: ; %bb9			; GCN-NEXT: BB0_4: ; %bb9
	; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]			; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @break_loop(i32 %arg) #0 {			define void @break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

[19 lines not shown]

test/CodeGen/AMDGPU/valu-i1.ll

[161 lines not shown]

	; SI: [[LABEL_FLOW]]:			; SI: [[LABEL_FLOW]]:
	; SI-NEXT: ; in Loop: Header=[[LABEL_LOOP]]			; SI-NEXT: ; in Loop: Header=[[LABEL_LOOP]]
	; SI-NEXT: s_or_b64 exec, exec, [[ORNEG2]]			; SI-NEXT: s_or_b64 exec, exec, [[ORNEG2]]
	; SI-NEXT: s_or_b64 [[COND_STATE]], [[ORNEG2]], [[TMP]]			; SI-NEXT: s_or_b64 [[COND_STATE]], [[ORNEG2]], [[TMP]]
	; SI-NEXT: s_andn2_b64 exec, exec, [[COND_STATE]]			; SI-NEXT: s_andn2_b64 exec, exec, [[COND_STATE]]
	; SI-NEXT: s_cbranch_execnz [[LABEL_LOOP]]			; SI-NEXT: s_cbranch_execnz [[LABEL_LOOP]]

	; SI: BB#5			; SI: BB{{[0-9]+_[0-9]+}}: ; %Flow8
	; SI: s_or_b64 exec, exec, [[COND_STATE]]			; SI: s_or_b64 exec, exec, [[COND_STATE]]

	; SI: [[LABEL_EXIT]]:			; SI: [[LABEL_EXIT]]:
	; SI-NOT: [[COND_STATE]]			; SI-NOT: [[COND_STATE]]
	; SI: s_endpgm			; SI: s_endpgm

	define void @multi_vcond_loop(i32 addrspace(1)* noalias nocapture %arg, i32 addrspace(1)* noalias nocapture readonly %arg1, i32 addrspace(1)* noalias nocapture readonly %arg2, i32 addrspace(1)* noalias nocapture readonly %arg3) #1 {			define void @multi_vcond_loop(i32 addrspace(1)* noalias nocapture %arg, i32 addrspace(1)* noalias nocapture readonly %arg1, i32 addrspace(1)* noalias nocapture readonly %arg2, i32 addrspace(1)* noalias nocapture readonly %arg3) #1 {
	bb:			bb:
[34 lines not shown]
