This is an archive of the discontinued LLVM Phabricator instance.

include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗	(On Diff #117674)	This should be convergent (and no mem?)
lib/Target/AMDGPU/AMDGPUInstructions.td
170–175 ↗	(On Diff #117674)	I'm not sure what this means by NONANS. I think this is just doing the same thing as the existing COND_O* patleafs by accepting the ordered and unspecified compares as ordered.
lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗	(On Diff #117674)	I don't think we should have a family of different kill opcodes just for the different compare types. Can we just have SI_KILL with the condition register input? We could then have an optimization pass look for the %cond = V_CMP_* SI_KILL %cond -> V_CMPX_*_term pattern. We would probably have to introduce the new _term variants of the CMPX instructions.
lib/Target/AMDGPU/SIInsertSkips.cpp
204–206 ↗	(On Diff #117674)	If the condition is an SGPR this will violate the operand constraints, so this should be creating the _e64 version. The problem with that is this pass runs after SIShrinkInstructions, so this won't be optimized down in the common case which is part of why this expansion should probably be done earlier.
lib/Transforms/InstCombine/InstCombineCalls.cpp
3543–3550 ↗	(On Diff #117674)	Test missing for this part
test/CodeGen/AMDGPU/kill.ll
1 ↗	(On Diff #117674)	This should just run llc. The instcombine test should be separate, in test/Transforms/InstCombine/AMDGPU. It should also use -enable-var-scope for FileCheck.
23 ↗	(On Diff #117674)	Delete unused arguments (opt -deadarghaX0r should be able to do this for you)
26 ↗	(On Diff #117674)	It would be better to put the new intrinsic tests in a new llvm.amdgcn.kill file, and leave the legacy versions in a separate test file

arsenm added a subscriber: llvm-commits. Oct 4 2017, 10:26 AM

mareko added inline comments. Oct 4 2017, 3:15 PM

include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗	(On Diff #117674)	amdgcn.kill shouldn't be moved across ds.bpermute, ds.swizzle, and image_sample opcodes. Does IntrNoMem or IntrConvergent assure that?
lib/Target/AMDGPU/AMDGPUInstructions.td
170–175 ↗	(On Diff #117674)	It means I'm too lazy to handle OGE and !OGE separately. I'd still like to handle !OGE in a simple way and not care about NaNs. Alternatively, I can add 2 patterns, one for COND_OGE and one for COND_UGE, both mapping to SI_KILL_F32_GE_0.
lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗	(On Diff #117674)	I tried to do that but it's too much work. Eventually we'd like all flavors of V_CMPX, but it's not a high priority right now.
lib/Target/AMDGPU/SIInsertSkips.cpp
204–206 ↗	(On Diff #117674)	This is not new code. It's the previous code moved by a few lines.
lib/Transforms/InstCombine/InstCombineCalls.cpp
3543–3550 ↗	(On Diff #117674)	It's the "kill_true" test.
test/CodeGen/AMDGPU/kill.ll
1 ↗	(On Diff #117674)	What's the deal with not running -instcombine as part of AMDGCN tests? It seems like it would be convenient everywhere.

A bunch of inline comments, but also some higher level things that don't really fit anywhere. This is a useful feature, but I don't think we've ever gotten the design of kill just right, because kill is really an implicit control flow intrinsic.

So, for example, if you have

%v = llvm.amdgcn.icmp(...) ; ballot-type instruction
kill(...)
use %v

then LLVM is free to move the ballot-type instruction to after the kill according to the LLVM IR semantics, even though that is incorrect.
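
To make the hazard concrete, here is a minimal C++ sketch that models a wavefront as a 64-bit lane mask. It is only an illustration: `ballot`, `kill`, `ballotBeforeKill`, and `ballotAfterKill` are hypothetical helpers standing in for a ballot-type intrinsic and for kill, not LLVM or hardware APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: one bit per lane of a 64-wide wavefront.
using LaneMask = std::uint64_t;

// A ballot-type operation only sees lanes that are still active in exec.
LaneMask ballot(LaneMask Exec, LaneMask Cond) { return Exec & Cond; }

// kill(Keep): lanes whose Keep bit is 0 are removed from exec.
LaneMask kill(LaneMask Exec, LaneMask Keep) { return Exec & Keep; }

// Program order: ballot first, then kill, then use the ballot result.
LaneMask ballotBeforeKill(LaneMask Exec, LaneMask Cond, LaneMask Keep) {
  LaneMask V = ballot(Exec, Cond);
  Exec = kill(Exec, Keep); // exec shrinks only after the ballot was taken
  (void)Exec;
  return V;
}

// Reordered: the ballot has been sunk below the kill.
LaneMask ballotAfterKill(LaneMask Exec, LaneMask Cond, LaneMask Keep) {
  Exec = kill(Exec, Keep);
  return ballot(Exec, Cond); // killed lanes no longer contribute
}
```

With lanes 0 to 3 active (`0xF`), a condition that is true in lanes 1 and 2 (`0x6`), and a kill that keeps only lanes 0 and 1 (`0x3`), the first order yields `0x6` and the second yields `0x2`, so the move changes the observable result.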

This isn't a problem in practice yet, because the instructions most likely to be affected by this are image sample intrinsics. Those are IntrReadMem, and kill itself has arbitrary side effects today, so sample intrinsics cannot be moved past a kill. Still, it might lead to problems in the future with shaders that use ballot and DPP / reduction intrinsics. So I've been wondering if we couldn't perhaps use kill like this:

  br i1 %cond, label %kill_bb, label %cont
kill_bb:
  call noret void @llvm.amdgcn.kill()
  ret undef
cont:
  ...

or perhaps better:

  %kill = call i1 @llvm.amdgcn.ps.kill(%cond)
  br i1 %kill, label %kill_bb, label %cont
kill_bb:
  call noret void @llvm.amdgcn.kill()
  ret undef
cont:
  ...

In this second variant, the ps.kill intrinsic would update the live mask and return true for all threads that can exit, i.e. ps.kill would internally do the WQM vote.

The advantage is that the control flow aspect of kill is properly modeled at LLVM IR level and so we can't run into issues with convergent intrinsics moving past it. I'd feel much more comfortable with an approach like that.
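
One possible reading of the second variant, sketched in C++ on lane masks. Everything here is an assumption made for illustration: the names `psKill`, `wholeQuad`, and `PsKillResult`, the choice that a true condition bit means "kill this lane", and the rule that a lane may exit only once no lane of its quad is live. The proposal above does not spell out these details.

```cpp
#include <cassert>
#include <cstdint>

using LaneMask = std::uint64_t;

// Whole-quad expansion: if any lane of a quad (4 consecutive lanes) is set,
// set all four lanes of that quad.
LaneMask wholeQuad(LaneMask M) {
  LaneMask Out = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    LaneMask QuadBits = LaneMask{0xF} << (4 * Quad);
    if (M & QuadBits)
      Out |= QuadBits;
  }
  return Out;
}

struct PsKillResult {
  LaneMask Live;    // updated live mask
  LaneMask CanExit; // lanes for which the hypothetical ps.kill returns true
};

// Assumed semantics: KillCond has a 1 bit for every lane that asks to be killed.
PsKillResult psKill(LaneMask Exec, LaneMask Live, LaneMask KillCond) {
  LaneMask NewLive = Live & ~KillCond;
  // A lane can leave only if its whole quad has no live lane left; otherwise
  // it has to keep running as a helper lane.
  LaneMask CanExit = Exec & ~wholeQuad(NewLive);
  return {NewLive, CanExit};
}
```

For example, with eight active lanes (`0xFF`) and a kill request for lanes 0 to 4 (`0x1F`), the live mask becomes `0xE0` and only the first quad (`0x0F`) may branch to the kill block; lane 4 stays because its quad still has live lanes.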

include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗	(On Diff #117674)	If we do stick with the simple intrinsic-based approach, I think we should keep the attributes as they are right now, i.e. keep them identical to AMDGPU.kill. That will give us fewer surprises...
lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗	(On Diff #117674)	I think Matt is right. It would be cleaner and give us the benefit of optimizing more cases.
test/CodeGen/AMDGPU/kill.ll
1 ↗	(On Diff #117674)	Fewer moving parts, I think. If instcombine is run on the test as well, there's a higher chance of tests being "broken" (in a false positive way) by random unrelated changes.

Address feedback.

Nicolai, that's a good point, though let's just merge this intrinsic replacement for now.

You're right, on second thought the existing intrinsic has the same problem. So this is still a strict improvement.

This revision is now accepted and ready to land. Oct 23 2017, 2:59 AM

Closed by commit rL316427: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) (authored by mareko). Oct 24 2017, 3:27 AM

This revision was automatically updated to reflect the committed changes.

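Revision Contents

Path	Size
llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td	3 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td	1 line
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	7 lines
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	111 lines
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	3 lines
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	21 lines
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	4 lines
llvm/trunk/lib/Target/AMDGPU/ (file name missing)	50 lines
llvm/trunk/lib/Target/AMDGPU/SILowerControlFlow.cpp	9 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	8 lines
llvm/trunk/test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir	2 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll	241 lines
llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	15 lines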

Diff 120033

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

 >;

 // Return true if at least one thread within the pixel quad passes true into
 // the function.
 def int_amdgcn_wqm_vote : Intrinsic<[llvm_i1_ty],
   [llvm_i1_ty], [IntrNoMem, IntrConvergent]
 >;

+// If false, set EXEC=0 for the current thread until the end of program.
+def int_amdgcn_kill : Intrinsic<[], [llvm_i1_ty], []>;
+
 // Copies the active channels of the source value to the destination value,
 // with the guarantee that the source value is computed as if the entire
 // program were executed in Whole Wavefront Mode, i.e. with all channels
 // enabled, with a few exceptions: - Phi nodes with require WWM return an
 // undefined value.
 def int_amdgcn_wwm : Intrinsic<[llvm_any_ty],
   [LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable]
 >;

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td

@@ def COND_OLT : PatLeaf <
   [{return N->get() == ISD::SETOLT || N->get() == ISD::SETLT;}]
 >;

 def COND_OLE : PatLeaf <
   (cond),
   [{return N->get() == ISD::SETOLE || N->get() == ISD::SETLE;}]
 >;


 def COND_O : PatLeaf <(cond), [{return N->get() == ISD::SETO;}]>;
 def COND_UO : PatLeaf <(cond), [{return N->get() == ISD::SETUO;}]>;

 //===----------------------------------------------------------------------===//
 // PatLeafs for unsigned / unordered comparisons
 //===----------------------------------------------------------------------===//

 def COND_UEQ : PatLeaf <(cond), [{return N->get() == ISD::SETUEQ;}]>;

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

@@ MachineBasicBlock *SITargetLowering::splitKillBlock(MachineInstr &MI,
                                                     MachineBasicBlock *BB) const {
   const SIInstrInfo *TII = getSubtarget()->getInstrInfo();

   MachineBasicBlock::iterator SplitPoint(&MI);
   ++SplitPoint;

   if (SplitPoint == BB->end()) {
     // Don't bother with a new block.
-    MI.setDesc(TII->get(AMDGPU::SI_KILL_TERMINATOR));
+    MI.setDesc(TII->getKillTerminatorFromPseudo(MI.getOpcode()));
     return BB;
   }

   MachineFunction *MF = BB->getParent();
   MachineBasicBlock *SplitBB
     = MF->CreateMachineBasicBlock(BB->getBasicBlock());

   MF->insert(++MachineFunction::iterator(BB), SplitBB);
   SplitBB->splice(SplitBB->begin(), BB, SplitPoint, BB->end());

   SplitBB->transferSuccessorsAndUpdatePHIs(BB);
   BB->addSuccessor(SplitBB);

-  MI.setDesc(TII->get(AMDGPU::SI_KILL_TERMINATOR));
+  MI.setDesc(TII->getKillTerminatorFromPseudo(MI.getOpcode()));
   return SplitBB;
 }

 // Do a v_movrels_b32 or v_movreld_b32 for each unique value of \p IdxReg in the
 // wavefront. If the value is uniform and just happens to be in a VGPR, this
 // will only do one iteration. In the worst case, this will loop 64 times.
 //
 // TODO: Just use v_readlane_b32 if we know the VGPR has a uniform value.
@@ MachineBasicBlock *SITargetLowering::EmitInstrWithCustomInserter(
   case AMDGPU::SI_INDIRECT_SRC_V16:
     return emitIndirectSrc(MI, BB, getSubtarget());
   case AMDGPU::SI_INDIRECT_DST_V1:
   case AMDGPU::SI_INDIRECT_DST_V2:
   case AMDGPU::SI_INDIRECT_DST_V4:
   case AMDGPU::SI_INDIRECT_DST_V8:
   case AMDGPU::SI_INDIRECT_DST_V16:
     return emitIndirectDst(MI, BB, getSubtarget());
-  case AMDGPU::SI_KILL:
+  case AMDGPU::SI_KILL_F32_COND_IMM_PSEUDO:
+  case AMDGPU::SI_KILL_I1_PSEUDO:
     return splitKillBlock(MI, BB);
   case AMDGPU::V_CNDMASK_B64_PSEUDO: {
     MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();

     unsigned Dst = MI.getOperand(0).getReg();
     unsigned Src0 = MI.getOperand(1).getReg();
     unsigned Src1 = MI.getOperand(2).getReg();
     const DebugLoc &DL = MI.getDebugLoc();

llvm/trunk/lib/Target/AMDGPU/SIInsertSkips.cpp

@@ bool SIInsertSkips::skipIfDead(MachineInstr &MI, MachineBasicBlock &NextBB) {
   BuildMI(*SkipBB, Insert, DL, TII->get(AMDGPU::S_ENDPGM));

   return true;
 }

 void SIInsertSkips::kill(MachineInstr &MI) {
   MachineBasicBlock &MBB = *MI.getParent();
   DebugLoc DL = MI.getDebugLoc();

+  switch (MI.getOpcode()) {
+  case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR: {
+    unsigned Opcode = 0;
+
+    // The opcodes are inverted because the inline immediate has to be
+    // the first operand, e.g. from "x < imm" to "imm > x"
+    switch (MI.getOperand(2).getImm()) {
+    case ISD::SETOEQ:
+    case ISD::SETEQ:
+      Opcode = AMDGPU::V_CMPX_EQ_F32_e32;
+      break;
+    case ISD::SETOGT:
+    case ISD::SETGT:
+      Opcode = AMDGPU::V_CMPX_LT_F32_e32;
+      break;
+    case ISD::SETOGE:
+    case ISD::SETGE:
+      Opcode = AMDGPU::V_CMPX_LE_F32_e32;
+      break;
+    case ISD::SETOLT:
+    case ISD::SETLT:
+      Opcode = AMDGPU::V_CMPX_GT_F32_e32;
+      break;
+    case ISD::SETOLE:
+    case ISD::SETLE:
+      Opcode = AMDGPU::V_CMPX_GE_F32_e32;
+      break;
+    case ISD::SETONE:
+    case ISD::SETNE:
+      Opcode = AMDGPU::V_CMPX_LG_F32_e32;
+      break;
+    case ISD::SETO:
+      Opcode = AMDGPU::V_CMPX_O_F32_e32;
+      break;
+    case ISD::SETUO:
+      Opcode = AMDGPU::V_CMPX_U_F32_e32;
+      break;
+    case ISD::SETUEQ:
+      Opcode = AMDGPU::V_CMPX_NLG_F32_e32;
+      break;
+    case ISD::SETUGT:
+      Opcode = AMDGPU::V_CMPX_NGE_F32_e32;
+      break;
+    case ISD::SETUGE:
+      Opcode = AMDGPU::V_CMPX_NGT_F32_e32;
+      break;
+    case ISD::SETULT:
+      Opcode = AMDGPU::V_CMPX_NLE_F32_e32;
+      break;
+    case ISD::SETULE:
+      Opcode = AMDGPU::V_CMPX_NLT_F32_e32;
+      break;
+    case ISD::SETUNE:
+      Opcode = AMDGPU::V_CMPX_NEQ_F32_e32;
+      break;
+    default:
+      llvm_unreachable("invalid ISD:SET cond code");
+    }
+
+    // TODO: Allow this:
+    if (!MI.getOperand(0).isReg() ||
+        !TRI->isVGPR(MBB.getParent()->getRegInfo(),
+                     MI.getOperand(0).getReg()))
+      llvm_unreachable("SI_KILL operand should be a VGPR");
+
+    BuildMI(MBB, &MI, DL, TII->get(Opcode))
+        .add(MI.getOperand(1))
+        .add(MI.getOperand(0));
+    break;
+  }
+  case AMDGPU::SI_KILL_I1_TERMINATOR: {
-  const MachineOperand &Op = MI.getOperand(0);
+    const MachineOperand &Op = MI.getOperand(0);
+    int64_t KillVal = MI.getOperand(1).getImm();
+    assert(KillVal == 0 || KillVal == -1);

-#ifndef NDEBUG
-  CallingConv::ID CallConv = MBB.getParent()->getFunction()->getCallingConv();
-  // Kill is only allowed in pixel / geometry shaders.
-  assert(CallConv == CallingConv::AMDGPU_PS ||
-         CallConv == CallingConv::AMDGPU_GS);
-#endif
-  // Clear this thread from the exec mask if the operand is negative.
-  if (Op.isImm()) {
-    // Constant operand: Set exec mask to 0 or do nothing
-    if (Op.getImm() & 0x80000000) {
-      BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
-        .addImm(0);
-    }
-  } else {
-    BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32))
-      .addImm(0)
-      .add(Op);
+    // Kill all threads if Op0 is an immediate and equal to the Kill value.
+    if (Op.isImm()) {
+      int64_t Imm = Op.getImm();
+      assert(Imm == 0 || Imm == -1);
+
+      if (Imm == KillVal)
+        BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
+          .addImm(0);
+      break;
+    }
+
+    unsigned Opcode = KillVal ? AMDGPU::S_ANDN2_B64 : AMDGPU::S_AND_B64;
+    BuildMI(MBB, &MI, DL, TII->get(Opcode), AMDGPU::EXEC)
+        .addReg(AMDGPU::EXEC)
+        .add(Op);
+    break;
+  }
+  default:
+    llvm_unreachable("invalid opcode, expected SI_KILL_*_TERMINATOR");
   }
 }

 MachineBasicBlock *SIInsertSkips::insertSkipBlock(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const {
   MachineFunction *MF = MBB.getParent();

   MachineBasicBlock *SkipBB = MF->CreateMachineBasicBlock();
@@ for (I = MBB.begin(); I != MBB.end(); I = Next) {
         } else if (HaveSkipBlock) {
           // Remove the given unconditional branch when a skip block has been
           // inserted after the current one and let skip the two instructions
           // performing the kill if the exec mask is non-zero.
           MI.eraseFromParent();
         }
         break;

-      case AMDGPU::SI_KILL_TERMINATOR:
+      case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
+      case AMDGPU::SI_KILL_I1_TERMINATOR:
         MadeChange = true;
         kill(MI);

         if (ExecBranchStack.empty()) {
           if (skipIfDead(MI, *NextBB)) {
             HaveSkipBlock = true;
             NextBB = std::next(BI);
             BE = MF.end();
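
The comment in `SIInsertSkips::kill` notes that the compare opcodes are inverted because the inline immediate has to be the first operand. The following self-contained C++ sketch illustrates why swapping the operands requires mirroring the predicate. `Pred`, `eval`, `swapOperands`, and `swapIsEquivalent` are hypothetical names introduced for this example; they are not part of LLVM, and `Pred` covers only a subset of the `ISD::SET*` condition codes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for a subset of the ISD::SET* condition codes.
enum class Pred { OEQ, OGT, OGE, OLT, OLE, ONE, UEQ, UGT, UGE, ULT, ULE, UNE };

// Evaluate "A <pred> B". Ordered predicates are false when either operand is
// NaN; unordered predicates are true in that case.
bool eval(Pred P, float A, float B) {
  bool Unordered = std::isnan(A) || std::isnan(B);
  switch (P) {
  case Pred::OEQ: return !Unordered && A == B;
  case Pred::OGT: return !Unordered && A > B;
  case Pred::OGE: return !Unordered && A >= B;
  case Pred::OLT: return !Unordered && A < B;
  case Pred::OLE: return !Unordered && A <= B;
  case Pred::ONE: return !Unordered && A != B;
  case Pred::UEQ: return Unordered || A == B;
  case Pred::UGT: return Unordered || A > B;
  case Pred::UGE: return Unordered || A >= B;
  case Pred::ULT: return Unordered || A < B;
  case Pred::ULE: return Unordered || A <= B;
  case Pred::UNE: return Unordered || A != B;
  }
  return false;
}

// Return the predicate Q such that "B Q A" is equivalent to "A P B".
Pred swapOperands(Pred P) {
  switch (P) {
  case Pred::OGT: return Pred::OLT;
  case Pred::OGE: return Pred::OLE;
  case Pred::OLT: return Pred::OGT;
  case Pred::OLE: return Pred::OGE;
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  default: return P; // The EQ and NE forms are symmetric.
  }
}

// Check the equivalence for one value X and one immediate Imm.
bool swapIsEquivalent(Pred P, float X, float Imm) {
  return eval(P, X, Imm) == eval(swapOperands(P), Imm, X);
}
```

For instance, `swapOperands(Pred::OGT)` yields `Pred::OLT`, mirroring how the `ISD::SETOGT` case selects `V_CMPX_LT_F32_e32`. The unordered cases follow the same rule: "not greater-or-equal" with swapped operands is true exactly when the original operands compare greater-than or are unordered, which matches the `ISD::SETUGT` to `V_CMPX_NGE_F32_e32` mapping.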

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h

@@ public:
   /// \brief Return a partially built integer add instruction without carry.
   /// Caller must add source operands.
   /// For pre-GFX9 it will generate unused carry destination operand.
   /// TODO: After GFX9 it should return a no-carry operation.
   MachineInstrBuilder getAddNoCarry(MachineBasicBlock &MBB,
                                     MachineBasicBlock::iterator I,
                                     const DebugLoc &DL,
                                     unsigned DestReg) const;
+
+  static bool isKillTerminator(unsigned Opcode);
+  const MCInstrDesc &getKillTerminatorFromPseudo(unsigned Opcode) const;
 };

 namespace AMDGPU {

   LLVM_READONLY
   int getVOPe64(uint16_t Opcode);

   LLVM_READONLY

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp

@@ SIInstrInfo::getAddNoCarry(MachineBasicBlock &MBB,
                            unsigned DestReg) const {
   MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();

   unsigned UnusedCarry = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);

   return BuildMI(MBB, I, DL, get(AMDGPU::V_ADD_I32_e64), DestReg)
            .addReg(UnusedCarry, RegState::Define | RegState::Dead);
 }
+
+bool SIInstrInfo::isKillTerminator(unsigned Opcode) {
+  switch (Opcode) {
+  case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
+  case AMDGPU::SI_KILL_I1_TERMINATOR:
+    return true;
+  default:
+    return false;
+  }
+}
+
+const MCInstrDesc &SIInstrInfo::getKillTerminatorFromPseudo(unsigned Opcode) const {
+  switch (Opcode) {
+  case AMDGPU::SI_KILL_F32_COND_IMM_PSEUDO:
+    return get(AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR);
+  case AMDGPU::SI_KILL_I1_PSEUDO:
+    return get(AMDGPU::SI_KILL_I1_TERMINATOR);
+  default:
+    llvm_unreachable("invalid opcode, expected SI_KILL_*_PSEUDO");
+  }
+}

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td

 def as_i32imm: SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant(N->getSExtValue(), SDLoc(N), MVT::i32);
 }]>;

 def as_i64imm: SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant(N->getSExtValue(), SDLoc(N), MVT::i64);
 }]>;

+def cond_as_i32imm: SDNodeXForm<cond, [{
+  return CurDAG->getTargetConstant(N->get(), SDLoc(N), MVT::i32);
+}]>;
+
 // Copied from the AArch64 backend:
 def bitcast_fpimm_to_i32 : SDNodeXForm<fpimm, [{
 return CurDAG->getTargetConstant(
   N->getValueAPF().bitcastToAPInt().getZExtValue(), SDLoc(N), MVT::i32);
 }]>;

 def frameindex_to_targetframeindex : SDNodeXForm<frameindex, [{
   auto FI = cast<FrameIndexSDNode>(N);

llvm/trunk/lib/Target/AMDGPU/SIInstructions.td

@@ def SI_ELSE_BREAK : CFPseudoInstSI <
   (outs SReg_64:$dst), (ins SReg_64:$src0, SReg_64:$src1),
   [(set i64:$dst, (int_amdgcn_else_break i64:$src0, i64:$src1))]> {
   let Size = 4;
   let isAsCheapAsAMove = 1;
   let isReMaterializable = 1;
 }

 let Uses = [EXEC], Defs = [EXEC,VCC] in {
-def SI_KILL : PseudoInstSI <
-  (outs), (ins VSrc_b32:$src),
-  [(AMDGPUkill i32:$src)]> {
-  let isConvergent = 1;
-  let usesCustomInserter = 1;
-}
-
-def SI_KILL_TERMINATOR : SPseudoInstSI <
-  (outs), (ins VSrc_b32:$src)> {
-  let isTerminator = 1;
-}
+multiclass PseudoInstKill <dag ins> {
+  def _PSEUDO : PseudoInstSI <(outs), ins> {
+    let isConvergent = 1;
+    let usesCustomInserter = 1;
+  }
+
+  def _TERMINATOR : SPseudoInstSI <(outs), ins> {
+    let isTerminator = 1;
+  }
+}
+
+defm SI_KILL_I1 : PseudoInstKill <(ins SSrc_b64:$src, i1imm:$killvalue)>;
+defm SI_KILL_F32_COND_IMM : PseudoInstKill <(ins VSrc_b32:$src0, i32imm:$src1, i32imm:$cond)>;

 def SI_ILLEGAL_COPY : SPseudoInstSI <
   (outs unknown:$dst), (ins unknown:$src),
   [], " ; illegal copy $src to $dst">;

 } // End Uses = [EXEC], Defs = [EXEC,VCC]

 // Branch on undef scc. Used to avoid intermediate copy from
@@

 def : GCNPat<
   (AMDGPUelse i64:$src, bb:$target),
   (SI_ELSE $src, $target, 0)
 >;

 def : GCNPat <
   (int_AMDGPU_kilp),
-  (SI_KILL (i32 0xbf800000))
+  (SI_KILL_I1_PSEUDO (i1 0), 0)
+>;
+
+def : Pat <
+  // -1.0 as i32 (LowerINTRINSIC_VOID converts all other constants to -1.0)
+  (AMDGPUkill (i32 -1082130432)),
+  (SI_KILL_I1_PSEUDO (i1 0), 0)
+>;
+
+def : Pat <
+  (int_amdgcn_kill i1:$src),
+  (SI_KILL_I1_PSEUDO $src, 0)
+>;
+
+def : Pat <
+  (int_amdgcn_kill (i1 (not i1:$src))),
+  (SI_KILL_I1_PSEUDO $src, -1)
+>;
+
+def : Pat <
+  (AMDGPUkill i32:$src),
+  (SI_KILL_F32_COND_IMM_PSEUDO $src, 0, 3) // 3 means SETOGE
+>;
+
+def : Pat <
+  (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
+  (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
 >;
+// TODO: we could add more variants for other types of conditionals

 //===----------------------------------------------------------------------===//
 // VOP1 Patterns
 //===----------------------------------------------------------------------===//

 let SubtargetPredicate = isGCN, OtherPredicates = [UnsafeFPMath] in {

 //def : RcpPat<V_RCP_F64_e32, f64>;

llvm/trunk/lib/Target/AMDGPU/SILowerControlFlow.cpp

@@ static void setImpSCCDefDead(MachineInstr &MI, bool IsDead) {
   MachineOperand &ImpDefSCC = MI.getOperand(3);
   assert(ImpDefSCC.getReg() == AMDGPU::SCC && ImpDefSCC.isDef());

   ImpDefSCC.setIsDead(IsDead);
 }

 char &llvm::SILowerControlFlowID = SILowerControlFlow::ID;

-static bool isSimpleIf(const MachineInstr &MI, const MachineRegisterInfo *MRI) {
+static bool isSimpleIf(const MachineInstr &MI, const MachineRegisterInfo *MRI,
+                       const SIInstrInfo *TII) {
   unsigned SaveExecReg = MI.getOperand(0).getReg();
   auto U = MRI->use_instr_nodbg_begin(SaveExecReg);

   if (U == MRI->use_instr_nodbg_end() ||
       std::next(U) != MRI->use_instr_nodbg_end() ||
       U->getOpcode() != AMDGPU::SI_END_CF)
     return false;

-  // Check for SI_KILL_TERMINATOR on path from if to endif.
+  // Check for SI_KILL_*_TERMINATOR on path from if to endif.
   // if there is any such terminator simplififcations are not safe.
   auto SMBB = MI.getParent();
   auto EMBB = U->getParent();
   DenseSet<const MachineBasicBlock*> Visited;
   SmallVector<MachineBasicBlock*, 4> Worklist(SMBB->succ_begin(),
                                               SMBB->succ_end());

   while (!Worklist.empty()) {
     MachineBasicBlock *MBB = Worklist.pop_back_val();

     if (MBB == EMBB || !Visited.insert(MBB).second)
       continue;
     for(auto &Term : MBB->terminators())
-      if (Term.getOpcode() == AMDGPU::SI_KILL_TERMINATOR)
+      if (TII->isKillTerminator(Term.getOpcode()))
         return false;

     Worklist.append(MBB->succ_begin(), MBB->succ_end());
   }

   return true;
 }

@@ void SILowerControlFlow::emitIf(MachineInstr &MI) {
   unsigned SaveExecReg = SaveExec.getReg();

   MachineOperand &ImpDefSCC = MI.getOperand(4);
   assert(ImpDefSCC.getReg() == AMDGPU::SCC && ImpDefSCC.isDef());

   // If there is only one use of save exec register and that use is SI_END_CF,
   // we can optimize SI_IF by returning the full saved exec mask instead of
   // just cleared bits.
-  bool SimpleIf = isSimpleIf(MI, MRI);
+  bool SimpleIf = isSimpleIf(MI, MRI, TII);

   // Add an implicit def of exec to discourage scheduling VALU after this which
   // will interfere with trying to form s_and_saveexec_b64 later.
   unsigned CopyReg = SimpleIf ? SaveExecReg
                        : MRI->createVirtualRegister(&AMDGPU::SReg_64RegClass);
   MachineInstr *CopyExec =
     BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), CopyReg)
     .addReg(AMDGPU::EXEC)

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

   }
   case Intrinsic::amdgcn_wqm_vote: {
     // wqm_vote is identity when the argument is constant.
     if (!isa<Constant>(II->getArgOperand(0)))
       break;

     return replaceInstUsesWith(*II, II->getArgOperand(0));
   }
+  case Intrinsic::amdgcn_kill: {
+    const ConstantInt *C = dyn_cast<ConstantInt>(II->getArgOperand(0));
+    if (!C || !C->getZExtValue())
+      break;
+
+    // amdgcn.kill(i1 1) is a no-op
+    return eraseInstFromFunction(CI);
+  }
   case Intrinsic::stackrestore: {
     // If the save is right next to the restore, remove the restore. This can
     // happen when variable allocas are DCE'd.
     if (IntrinsicInst *SS = dyn_cast<IntrinsicInst>(II->getArgOperand(0))) {
       if (SS->getIntrinsicID() == Intrinsic::stacksave) {
         if (&*++SS->getIterator() == II)
           return eraseInstFromFunction(CI);
       }

llvm/trunk/test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir

 body: |
   bb.0:
     successors: %bb.1
     S_CBRANCH_VCCNZ %bb.1, implicit %vcc

   bb.1:
     successors: %bb.2
     %vgpr0 = V_MOV_B32_e32 0, implicit %exec
-    SI_KILL_TERMINATOR %vgpr0, implicit-def %exec, implicit-def %vcc, implicit %exec
+    SI_KILL_F32_COND_IMM_TERMINATOR %vgpr0, 0, 3, implicit-def %exec, implicit-def %vcc, implicit %exec
     S_BRANCH %bb.2

   bb.2:
     S_ENDPGM

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll

				; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI %s

				; SI-LABEL: {{^}}gs_const:
				; SI-NOT: v_cmpx
				; SI: s_mov_b64 exec, 0
				define amdgpu_gs void @gs_const() {
				%tmp = icmp ule i32 0, 3
				%tmp1 = select i1 %tmp, float 1.000000e+00, float -1.000000e+00
				%c1 = fcmp oge float %tmp1, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				%tmp2 = icmp ule i32 3, 0
				%tmp3 = select i1 %tmp2, float 1.000000e+00, float -1.000000e+00
				%c2 = fcmp oge float %tmp3, 0.0
				call void @llvm.amdgcn.kill(i1 %c2)
				ret void
				}

				; SI-LABEL: {{^}}vcc_implicit_def:
				; SI-NOT: v_cmp_gt_f32_e32 vcc,
				; SI: v_cmp_gt_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], 0, v{{[0-9]+}}
				; SI: v_cmpx_le_f32_e32 vcc, 0, v{{[0-9]+}}
				; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, 1.0, [[CMP]]
				define amdgpu_ps void @vcc_implicit_def(float %arg13, float %arg14) {
				%tmp0 = fcmp olt float %arg13, 0.000000e+00
				%c1 = fcmp oge float %arg14, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				%tmp1 = select i1 %tmp0, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}

				; SI-LABEL: {{^}}true:
				; SI-NEXT: BB#
				; SI-NEXT: BB#
				; SI-NEXT: s_endpgm
				define amdgpu_gs void @true() {
				call void @llvm.amdgcn.kill(i1 true)
				ret void
				}

				; SI-LABEL: {{^}}false:
				; SI-NOT: v_cmpx
				; SI: s_mov_b64 exec, 0
				define amdgpu_gs void @false() {
				call void @llvm.amdgcn.kill(i1 false)
				ret void
				}

				; SI-LABEL: {{^}}and:
				; SI: v_cmp_lt_i32
				; SI: v_cmp_lt_i32
				; SI: s_or_b64 s[0:1]
				; SI: s_and_b64 exec, exec, s[0:1]
				define amdgpu_gs void @and(i32 %a, i32 %b, i32 %c, i32 %d) {
				%c1 = icmp slt i32 %a, %b
				%c2 = icmp slt i32 %c, %d
				%x = or i1 %c1, %c2
				call void @llvm.amdgcn.kill(i1 %x)
				ret void
				}

				; SI-LABEL: {{^}}andn2:
				; SI: v_cmp_lt_i32
				; SI: v_cmp_lt_i32
				; SI: s_xor_b64 s[0:1]
				; SI: s_andn2_b64 exec, exec, s[0:1]
				define amdgpu_gs void @andn2(i32 %a, i32 %b, i32 %c, i32 %d) {
				%c1 = icmp slt i32 %a, %b
				%c2 = icmp slt i32 %c, %d
				%x = xor i1 %c1, %c2
				%y = xor i1 %x, 1
				call void @llvm.amdgcn.kill(i1 %y)
				ret void
				}

				; SI-LABEL: {{^}}oeq:
				; SI: v_cmpx_eq_f32
				; SI-NOT: s_and
				define amdgpu_gs void @oeq(float %a) {
				%c1 = fcmp oeq float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ogt:
				; SI: v_cmpx_lt_f32
				; SI-NOT: s_and
				define amdgpu_gs void @ogt(float %a) {
				%c1 = fcmp ogt float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}oge:
				; SI: v_cmpx_le_f32
				; SI-NOT: s_and
				define amdgpu_gs void @oge(float %a) {
				%c1 = fcmp oge float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}olt:
				; SI: v_cmpx_gt_f32
				; SI-NOT: s_and
				define amdgpu_gs void @olt(float %a) {
				%c1 = fcmp olt float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ole:
				; SI: v_cmpx_ge_f32
				; SI-NOT: s_and
				define amdgpu_gs void @ole(float %a) {
				%c1 = fcmp ole float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}one:
				; SI: v_cmpx_lg_f32
				; SI-NOT: s_and
				define amdgpu_gs void @one(float %a) {
				%c1 = fcmp one float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ord:
				; FIXME: This is absolutely unimportant, but we could use the cmpx variant here.
				; SI: v_cmp_o_f32
				define amdgpu_gs void @ord(float %a) {
				%c1 = fcmp ord float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}uno:
				; FIXME: This is absolutely unimportant, but we could use the cmpx variant here.
				; SI: v_cmp_u_f32
				define amdgpu_gs void @uno(float %a) {
				%c1 = fcmp uno float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ueq:
				; SI: v_cmpx_nlg_f32
				; SI-NOT: s_and
				define amdgpu_gs void @ueq(float %a) {
				%c1 = fcmp ueq float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ugt:
				; SI: v_cmpx_nge_f32
				; SI-NOT: s_and
				define amdgpu_gs void @ugt(float %a) {
				%c1 = fcmp ugt float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}uge:
				; SI: v_cmpx_ngt_f32_e32 vcc, -1.0
				; SI-NOT: s_and
				define amdgpu_gs void @uge(float %a) {
				%c1 = fcmp uge float %a, -1.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ult:
				; SI: v_cmpx_nle_f32_e32 vcc, -2.0
				; SI-NOT: s_and
				define amdgpu_gs void @ult(float %a) {
				%c1 = fcmp ult float %a, -2.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}ule:
				; SI: v_cmpx_nlt_f32_e32 vcc, 2.0
				; SI-NOT: s_and
				define amdgpu_gs void @ule(float %a) {
				%c1 = fcmp ule float %a, 2.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}une:
				; SI: v_cmpx_neq_f32_e32 vcc, 0
				; SI-NOT: s_and
				define amdgpu_gs void @une(float %a) {
				%c1 = fcmp une float %a, 0.0
				call void @llvm.amdgcn.kill(i1 %c1)
				ret void
				}

				; SI-LABEL: {{^}}neg_olt:
				; SI: v_cmpx_ngt_f32_e32 vcc, 1.0
				; SI-NOT: s_and
				define amdgpu_gs void @neg_olt(float %a) {
				%c1 = fcmp olt float %a, 1.0
				%c2 = xor i1 %c1, 1
				call void @llvm.amdgcn.kill(i1 %c2)
				ret void
				}

				; SI-LABEL: {{^}}fcmp_x2:
				; FIXME: LLVM should be able to combine these fcmp opcodes.
				; SI: v_cmp_gt_f32
				; SI: v_cndmask_b32
				; SI: v_cmpx_le_f32
				define amdgpu_ps void @fcmp_x2(float %a) #0 {
				%ogt = fcmp nsz ogt float %a, 2.500000e-01
				%k = select i1 %ogt, float -1.000000e+00, float 0.000000e+00
				%c = fcmp nsz oge float %k, 0.000000e+00
				call void @llvm.amdgcn.kill(i1 %c) #1
				ret void
				}

				; SI-LABEL: {{^}}wqm:
				; SI: v_cmp_neq_f32_e32 vcc, 0
				; SI: s_wqm_b64 s[0:1], vcc
				; SI: s_and_b64 exec, exec, s[0:1]
				define amdgpu_ps void @wqm(float %a) {
				%c1 = fcmp une float %a, 0.0
				%c2 = call i1 @llvm.amdgcn.wqm.vote(i1 %c1)
				call void @llvm.amdgcn.kill(i1 %c2)
				ret void
				}

				declare void @llvm.amdgcn.kill(i1) #0
				declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
				declare i1 @llvm.amdgcn.wqm.vote(i1)

				attributes #0 = { nounwind }
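In the check lines of this test the constant is printed as the first source operand (for example `v_cmpx_le_f32_e32 vcc, 0, v...` for `fcmp oge float %x, 0.0`), so the expected mnemonic is the operand-swapped form of the IR predicate. The table below is only a transcription of those check lines into C++ for quick reference; `expectedCmpx` is a hypothetical helper, not part of the backend, and it says nothing about how the selection is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps an fcmp predicate to the v_cmpx mnemonic the test above expects.
// Returns an empty string for predicates the test does not expect to be
// folded into a cmpx (ord and uno, see the FIXME comments) or unknown names.
std::string expectedCmpx(const std::string &FcmpPred) {
  static const std::map<std::string, std::string> Table = {
      {"oeq", "v_cmpx_eq_f32"},  {"ogt", "v_cmpx_lt_f32"},
      {"oge", "v_cmpx_le_f32"},  {"olt", "v_cmpx_gt_f32"},
      {"ole", "v_cmpx_ge_f32"},  {"one", "v_cmpx_lg_f32"},
      {"ueq", "v_cmpx_nlg_f32"}, {"ugt", "v_cmpx_nge_f32"},
      {"uge", "v_cmpx_ngt_f32"}, {"ult", "v_cmpx_nle_f32"},
      {"ule", "v_cmpx_nlt_f32"}, {"une", "v_cmpx_neq_f32"},
  };
  auto It = Table.find(FcmpPred);
  return It == Table.end() ? std::string() : It->second;
}

// Small usage example.
void checkExpectedCmpx() {
  assert(expectedCmpx("oge") == "v_cmpx_le_f32"); // a >= 0 printed as 0 <= a
  assert(expectedCmpx("ord").empty());            // stays a plain v_cmp_o_f32
}
```

The `neg_olt` case fits the same table: the negation of `olt` is `uge`, and the test expects `v_cmpx_ngt_f32`.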

llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	; CHECK: ret float 0.000000e+00
	define float @wqm_vote_undef() {
	main_body:
	  %w = call i1 @llvm.amdgcn.wqm.vote(i1 undef)
	  %r = select i1 %w, float 1.0, float 0.0
	  ret float %r
	}

				; --------------------------------------------------------------------
				; llvm.amdgcn.kill
				; --------------------------------------------------------------------

				declare void @llvm.amdgcn.kill(i1)

				; CHECK-LABEL: @kill_true() {
				; CHECK-NEXT: ret void
				; CHECK-NEXT: }
				define void @kill_true() {
				call void @llvm.amdgcn.kill(i1 true)
				ret void
				}


	; CHECK: attributes #5 = { convergent }
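The tests in this revision suggest a simple reading of `llvm.amdgcn.kill(i1)`: a lane keeps running only where the condition is true. `kill(i1 true)` is folded away by InstCombine, `kill(i1 false)` is expected to produce `s_mov_b64 exec, 0`, and a general condition is expected to produce `s_and_b64 exec, exec, <cond>`. A minimal sketch of that reading as a bitmask model follows; `LaneMask` and `applyKill` are hypothetical names, and the 64-lane width is an assumption taken from the `_b64` instructions in the check lines.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane; models the 64-bit exec mask (assumed width).
using LaneMask = std::uint64_t;

// A lane stays active only if it was active and its condition bit is set.
LaneMask applyKill(LaneMask Exec, LaneMask Cond) { return Exec & Cond; }

// Small usage example covering the constant cases from the tests.
void checkKillModel() {
  const LaneMask Exec = 0x00FF00FF00FF00FFull;
  assert(applyKill(Exec, ~LaneMask(0)) == Exec); // kill(true): no effect
  assert(applyKill(Exec, 0) == 0);               // kill(false): exec = 0
}
```

Under this model, removing `kill(i1 true)` never changes the mask, which is consistent with the `kill_true` test expecting a bare `ret void`. A negated condition corresponds to `Exec & ~S`, matching the `s_andn2_b64` expectation in the `andn2` test.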


AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) (Closed, Public)


Diff 120033

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/trunk/lib/Target/AMDGPU/SIInsertSkips.cpp

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td

llvm/trunk/lib/Target/AMDGPU/SIInstructions.td

llvm/trunk/lib/Target/AMDGPU/SILowerControlFlow.cpp

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/trunk/test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll

llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
