Diff 315328

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

[...]

// Pixel shaders only: whether the current pixel is live (i.e. not a helper		// Pixel shaders only: whether the current pixel is live (i.e. not a helper
// invocation for derivative computation).		// invocation for derivative computation).
def int_amdgcn_ps_live : Intrinsic <		def int_amdgcn_ps_live : Intrinsic <
[llvm_i1_ty],		[llvm_i1_ty],
[],		[],
[IntrNoMem, IntrWillReturn]>;		[IntrNoMem, IntrWillReturn]>;

		// Similar to int_amdgcn_ps_live, but cannot be moved by LICM.
		// Returns true if lane is not a helper.
		arsenm: I assume this needs to be convergent
		critson: Could be. My understanding is that without flags the intrinsic is marked "has side effects", which is correct, as it then will not be moved by LICM or removed by CSE.
		nhaehnle: Rethinking this, convergent isn't correct here, because there is no implied cross-thread communication. Rather, the semantics are that amdgcn.wqm.helper reads from some hidden memory that is written to by wqm.demote. So by that logic, this should arguably be ReadInaccessibleMemOnly.
		def int_amdgcn_live_mask : Intrinsic <[llvm_i1_ty],
		[], [IntrReadMem, IntrInaccessibleMemOnly]
		>;

def int_amdgcn_mbcnt_lo :		def int_amdgcn_mbcnt_lo :
GCCBuiltin<"__builtin_amdgcn_mbcnt_lo">,		GCCBuiltin<"__builtin_amdgcn_mbcnt_lo">,
Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
[IntrNoMem, IntrWillReturn]>;		[IntrNoMem, IntrWillReturn]>;

def int_amdgcn_mbcnt_hi :		def int_amdgcn_mbcnt_hi :
GCCBuiltin<"__builtin_amdgcn_mbcnt_hi">,		GCCBuiltin<"__builtin_amdgcn_mbcnt_hi">,
Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
[...]	def int_amdgcn_wqm_vote : Intrinsic<[llvm_i1_ty],
[llvm_i1_ty], [IntrNoMem, IntrConvergent, IntrWillReturn]		[llvm_i1_ty], [IntrNoMem, IntrConvergent, IntrWillReturn]
>;		>;

// If false, set EXEC=0 for the current thread until the end of program.		// If false, set EXEC=0 for the current thread until the end of program.
// FIXME: Should this be IntrNoMem, IntrHasSideEffects, or IntrWillReturn?		// FIXME: Should this be IntrNoMem, IntrHasSideEffects, or IntrWillReturn?
def int_amdgcn_kill : Intrinsic<[], [llvm_i1_ty], []>;		def int_amdgcn_kill : Intrinsic<[], [llvm_i1_ty], []>;

def int_amdgcn_endpgm : GCCBuiltin<"__builtin_amdgcn_endpgm">,		def int_amdgcn_endpgm : GCCBuiltin<"__builtin_amdgcn_endpgm">,
Intrinsic<[], [], [IntrNoReturn, IntrCold, IntrNoMem, IntrHasSideEffects]		Intrinsic<[], [], [IntrNoReturn, IntrCold, IntrNoMem, IntrHasSideEffects]
		arsenm: Ditto, and nomem
		critson: Convergent maybe (as above), but not nomem. If this is marked nomem then it will be eaten by early CSE. Since this was modelled on kill, is there a reason we don't mark kill Convergent?
		nhaehnle: Following the logic above, this should not be convergent but WritesInaccessibleMemOnly. At least I think that captures the semantics most accurately.
>;		>;

		// If false, mark all active lanes as helper lanes until the end of program.
		def int_amdgcn_wqm_demote : Intrinsic<[],
		[llvm_i1_ty], [IntrWriteMem, IntrInaccessibleMemOnly]
		>;
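
The "hidden memory" reading in nhaehnle's comments can be illustrated with a toy lane-mask model. This is only a sketch and not LLVM code: `WaveState`, `demote` and `liveMask` are hypothetical names, and the per-lane interpretation of the demote condition is an assumption. It shows why two reads of the live mask separated by a demote cannot be merged by CSE or hoisted by LICM: the second read observes state written by the demote.

```cpp
#include <cstdint>

// Toy model of one wave of up to 64 lanes (hypothetical, not LLVM code).
struct WaveState {
  uint64_t Exec; // lanes that are currently executing
  uint64_t Live; // hidden state: lanes that are not helper lanes
};

// Models llvm.amdgcn.wqm.demote under the assumption that it acts per lane:
// every active lane whose condition bit is 0 becomes a helper lane.
// Only the hidden Live mask is written; Exec is left untouched.
inline void demote(WaveState &W, uint64_t CondMask) {
  W.Live &= ~(W.Exec & ~CondMask);
}

// Models llvm.amdgcn.live.mask: reads the hidden state and returns, for each
// active lane, whether that lane is still a non-helper lane.
inline uint64_t liveMask(const WaveState &W) {
  return W.Exec & W.Live;
}
```

In this model `demote` only writes the hidden mask and `liveMask` only reads it, which mirrors the `IntrWriteMem`/`IntrReadMem` plus `IntrInaccessibleMemOnly` flags chosen in the diff.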

// Copies the active channels of the source value to the destination value,		// Copies the active channels of the source value to the destination value,
// with the guarantee that the source value is computed as if the entire		// with the guarantee that the source value is computed as if the entire
// program were executed in Whole Wavefront Mode, i.e. with all channels		// program were executed in Whole Wavefront Mode, i.e. with all channels
// enabled, with a few exceptions: - Phi nodes with require WWM return an		// enabled, with a few exceptions: - Phi nodes with require WWM return an
// undefined value.		// undefined value.
def int_amdgcn_wwm : Intrinsic<[llvm_any_ty],		def int_amdgcn_wwm : Intrinsic<[llvm_any_ty],
[LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable,		[LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable,
IntrConvergent, IntrWillReturn]		IntrConvergent, IntrWillReturn]
[...]

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

[...]	case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS: {
}		}
case Intrinsic::amdgcn_else: {		case Intrinsic::amdgcn_else: {
unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);		unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);		OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);		OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);		OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
break;		break;
}		}
		case Intrinsic::amdgcn_live_mask: {
		OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
		break;
		}
		case Intrinsic::amdgcn_wqm_demote:
case Intrinsic::amdgcn_kill: {		case Intrinsic::amdgcn_kill: {
OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);		OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
break;		break;
}		}
case Intrinsic::amdgcn_raw_buffer_load:		case Intrinsic::amdgcn_raw_buffer_load:
case Intrinsic::amdgcn_raw_tbuffer_load: {		case Intrinsic::amdgcn_raw_tbuffer_load: {
// FIXME: Should make intrinsic ID the last operand of the instruction,		// FIXME: Should make intrinsic ID the last operand of the instruction,
// then this would be the same as store		// then this would be the same as store
[...]

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

[...]
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_or>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_or>;
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_xor>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_xor>;
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_inc>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_inc>;
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_dec>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_dec>;
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_fadd>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_fadd>;
	def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_cmpswap>;			def : SourceOfDivergence<int_amdgcn_struct_buffer_atomic_cmpswap>;
	def : SourceOfDivergence<int_amdgcn_buffer_atomic_csub>;			def : SourceOfDivergence<int_amdgcn_buffer_atomic_csub>;
	def : SourceOfDivergence<int_amdgcn_ps_live>;			def : SourceOfDivergence<int_amdgcn_ps_live>;
				def : SourceOfDivergence<int_amdgcn_live_mask>;
				arsenm: Missing DivergenceAnalysis test
	def : SourceOfDivergence<int_amdgcn_ds_swizzle>;			def : SourceOfDivergence<int_amdgcn_ds_swizzle>;
	def : SourceOfDivergence<int_amdgcn_ds_ordered_add>;			def : SourceOfDivergence<int_amdgcn_ds_ordered_add>;
	def : SourceOfDivergence<int_amdgcn_ds_ordered_swap>;			def : SourceOfDivergence<int_amdgcn_ds_ordered_swap>;
	def : SourceOfDivergence<int_amdgcn_permlane16>;			def : SourceOfDivergence<int_amdgcn_permlane16>;
	def : SourceOfDivergence<int_amdgcn_permlanex16>;			def : SourceOfDivergence<int_amdgcn_permlanex16>;
	def : SourceOfDivergence<int_amdgcn_mov_dpp>;			def : SourceOfDivergence<int_amdgcn_mov_dpp>;
	def : SourceOfDivergence<int_amdgcn_mov_dpp8>;			def : SourceOfDivergence<int_amdgcn_mov_dpp8>;
	def : SourceOfDivergence<int_amdgcn_update_dpp>;			def : SourceOfDivergence<int_amdgcn_update_dpp>;
[...]

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp

[...]	private:
MachineDominatorTree *MDT = nullptr;		MachineDominatorTree *MDT = nullptr;

MachineBasicBlock *EarlyExitBlock = nullptr;		MachineBasicBlock *EarlyExitBlock = nullptr;
bool EarlyExitClearsExec = false;		bool EarlyExitClearsExec = false;

bool shouldSkip(const MachineBasicBlock &From,		bool shouldSkip(const MachineBasicBlock &From,
const MachineBasicBlock &To) const;		const MachineBasicBlock &To) const;

bool dominatesAllReachable(MachineBasicBlock &MBB);
void ensureEarlyExitBlock(MachineBasicBlock &MBB, bool ClearExec);		void ensureEarlyExitBlock(MachineBasicBlock &MBB, bool ClearExec);
void skipIfDead(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
DebugLoc DL);

bool kill(MachineInstr &MI);		bool tidySCCDef(MachineInstr &MI);
void earlyTerm(MachineInstr &MI);		void earlyTerm(MachineInstr &MI);

bool skipMaskBranch(MachineInstr &MI, MachineBasicBlock &MBB);		bool skipMaskBranch(MachineInstr &MI, MachineBasicBlock &MBB);

public:		public:
static char ID;		static char ID;

		unsigned MovOpc;
		Register ExecReg;

SIInsertSkips() : MachineFunctionPass(ID) {}		SIInsertSkips() : MachineFunctionPass(ID) {}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override {		StringRef getPassName() const override {
return "SI insert s_cbranch_execz instructions";		return "SI insert s_cbranch_execz instructions";
}		}

[...]	for (MachineBasicBlock::const_iterator I = MBB.begin(), E = MBB.end();
if (NumInstr >= SkipThreshold)		if (NumInstr >= SkipThreshold)
return true;		return true;
}		}
}		}

return false;		return false;
}		}

/// Check whether \p MBB dominates all blocks that are reachable from it.
bool SIInsertSkips::dominatesAllReachable(MachineBasicBlock &MBB) {
for (MachineBasicBlock *Other : depth_first(&MBB)) {
if (!MDT->dominates(&MBB, Other))
return false;
}
return true;
}
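
The check performed by `dominatesAllReachable` can be sketched without LLVM on a toy CFG. This is a minimal illustration under stated assumptions, not the pass's implementation: `Graph`, `reach` and the integer node numbering are hypothetical, node 0 is taken to be the entry block, and dominance is tested by the definition (a node is dominated by `A` if it becomes unreachable from the entry once `A` is removed) rather than through a dominator tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency-list CFG; node 0 is assumed to be the entry block.
using Graph = std::vector<std::vector<int>>;

// Marks every node reachable from Start without ever entering Blocked
// (pass -1 to block nothing).
static std::vector<bool> reach(const Graph &G, int Start, int Blocked) {
  std::vector<bool> Seen(G.size(), false);
  if (Start == Blocked)
    return Seen;
  std::vector<int> Work{Start};
  Seen[Start] = true;
  while (!Work.empty()) {
    int N = Work.back();
    Work.pop_back();
    for (int S : G[N]) {
      if (S != Blocked && !Seen[S]) {
        Seen[S] = true;
        Work.push_back(S);
      }
    }
  }
  return Seen;
}

// Returns true if A dominates every node reachable from A, i.e. no such node
// can also be reached from the entry along a path that avoids A.
bool dominatesAllReachable(const Graph &G, int A) {
  assert(A >= 0 && static_cast<std::size_t>(A) < G.size());
  std::vector<bool> FromA = reach(G, A, -1);
  std::vector<bool> AvoidingA = reach(G, 0, A);
  for (std::size_t N = 0; N < G.size(); ++N)
    if (FromA[N] && static_cast<int>(N) != A && AvoidingA[N])
      return false;
  return true;
}
```

On a diamond-shaped CFG (entry, two arms, join) the entry and the join pass this check while either arm fails it, which is why the pass calls the check conservative for kills placed inside branches.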

static void generateEndPgm(MachineBasicBlock &MBB,		static void generateEndPgm(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I, DebugLoc DL,		MachineBasicBlock::iterator I, DebugLoc DL,
const SIInstrInfo *TII, bool IsPS) {		const SIInstrInfo *TII, bool IsPS) {
// "null export"		// "null export"
if (IsPS) {		if (IsPS) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::EXP_DONE))		BuildMI(MBB, I, DL, TII->get(AMDGPU::EXP_DONE))
.addImm(AMDGPU::Exp::ET_NULL)		.addImm(AMDGPU::Exp::ET_NULL)
.addReg(AMDGPU::VGPR0, RegState::Undef)		.addReg(AMDGPU::VGPR0, RegState::Undef)
[...]	if (!EarlyExitBlock) {
MF->insert(MF->end(), EarlyExitBlock);		MF->insert(MF->end(), EarlyExitBlock);
generateEndPgm(*EarlyExitBlock, EarlyExitBlock->end(), DL, TII,		generateEndPgm(*EarlyExitBlock, EarlyExitBlock->end(), DL, TII,
MF->getFunction().getCallingConv() ==		MF->getFunction().getCallingConv() ==
CallingConv::AMDGPU_PS);		CallingConv::AMDGPU_PS);
EarlyExitClearsExec = false;		EarlyExitClearsExec = false;
}		}

if (ClearExec && !EarlyExitClearsExec) {		if (ClearExec && !EarlyExitClearsExec) {
const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();
unsigned Mov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
Register Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
auto ExitI = EarlyExitBlock->getFirstNonPHI();		auto ExitI = EarlyExitBlock->getFirstNonPHI();
BuildMI(*EarlyExitBlock, ExitI, DL, TII->get(Mov), Exec).addImm(0);		BuildMI(*EarlyExitBlock, ExitI, DL, TII->get(MovOpc), ExecReg).addImm(0);
EarlyExitClearsExec = true;		EarlyExitClearsExec = true;
}		}
}		}

static void splitBlock(MachineBasicBlock &MBB, MachineInstr &MI,		static void splitBlock(MachineBasicBlock &MBB, MachineInstr &MI,
MachineDominatorTree *MDT) {		MachineDominatorTree *MDT) {
MachineBasicBlock *SplitBB = MBB.splitAt(MI, /*UpdateLiveIns*/ true);		MachineBasicBlock *SplitBB = MBB.splitAt(MI, /*UpdateLiveIns*/ true);

// Update dominator tree		// Update dominator tree
using DomTreeT = DomTreeBase<MachineBasicBlock>;		using DomTreeT = DomTreeBase<MachineBasicBlock>;
SmallVector<DomTreeT::UpdateType, 16> DTUpdates;		SmallVector<DomTreeT::UpdateType, 16> DTUpdates;
for (MachineBasicBlock *Succ : SplitBB->successors()) {		for (MachineBasicBlock *Succ : SplitBB->successors()) {
DTUpdates.push_back({DomTreeT::Insert, SplitBB, Succ});		DTUpdates.push_back({DomTreeT::Insert, SplitBB, Succ});
DTUpdates.push_back({DomTreeT::Delete, &MBB, Succ});		DTUpdates.push_back({DomTreeT::Delete, &MBB, Succ});
}		}
DTUpdates.push_back({DomTreeT::Insert, &MBB, SplitBB});		DTUpdates.push_back({DomTreeT::Insert, &MBB, SplitBB});
MDT->getBase().applyUpdates(DTUpdates);		MDT->getBase().applyUpdates(DTUpdates);
}		}

/// Insert an "if exec=0 { null export; s_endpgm }" sequence before the given		bool SIInsertSkips::tidySCCDef(MachineInstr &MI) {
/// iterator. Only applies to pixel shaders.
void SIInsertSkips::skipIfDead(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I, DebugLoc DL) {
MachineFunction *MF = MBB.getParent();
(void)MF;
assert(MF->getFunction().getCallingConv() == CallingConv::AMDGPU_PS);

// It is possible for an SI_KILL_*_TERMINATOR to sit at the bottom of a
// basic block that has no further successors (e.g., there was an
// `unreachable` there in IR). This can happen with original source of the
// form:
//
// if (uniform_condition) {
// write_to_memory();
// discard;
// }
//
// In this case, we write the "null_export; s_endpgm" skip code in the
// already-existing basic block.
auto NextBBI = std::next(MBB.getIterator());
bool NoSuccessor =
I == MBB.end() && !llvm::is_contained(MBB.successors(), &*NextBBI);

if (NoSuccessor) {
generateEndPgm(MBB, I, DL, TII, true);
} else {
ensureEarlyExitBlock(MBB, false);

MachineInstr *BranchMI =
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_CBRANCH_EXECZ))
.addMBB(EarlyExitBlock);

// Split the block if the branch will not come at the end.
auto Next = std::next(BranchMI->getIterator());
if (Next != MBB.end() && !Next->isTerminator())
splitBlock(MBB, *BranchMI, MDT);

MBB.addSuccessor(EarlyExitBlock);
MDT->getBase().insertEdge(&MBB, EarlyExitBlock);
}
}

/// Translate a SI_KILL_*_TERMINATOR into exec-manipulating instructions.
/// Return true unless the terminator is a no-op.
bool SIInsertSkips::kill(MachineInstr &MI) {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
DebugLoc DL = MI.getDebugLoc();

switch (MI.getOpcode()) {
case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR: {
unsigned Opcode = 0;

// The opcodes are inverted because the inline immediate has to be		// Peek at the previous instruction in case this can be unconditional
// the first operand, e.g. from "x < imm" to "imm > x"		assert(MI.getIterator() != MBB.begin());
switch (MI.getOperand(2).getImm()) {		auto Prev = std::prev(MI.getIterator());
case ISD::SETOEQ:		if (Prev->getOpcode() == AMDGPU::S_ANDN2_B32 ||
case ISD::SETEQ:		Prev->getOpcode() == AMDGPU::S_ANDN2_B64) {
Opcode = AMDGPU::V_CMPX_EQ_F32_e64;		auto Src0 = Prev->getOperand(1);
break;		auto Src1 = Prev->getOperand(2);
case ISD::SETOGT:		if (Src0.isReg() && Src0.getReg() == ExecReg && Src1.isReg() &&
case ISD::SETGT:		Src1.getReg() == ExecReg) {
Opcode = AMDGPU::V_CMPX_LT_F32_e64;		// SCC will always be 0; use unconditional branch
break;		Register Dst = Prev->getOperand(0).getReg();
case ISD::SETOGE:		// Simplify S_ANDN2, remove entirely for exec, as it is set in exit block
case ISD::SETGE:		if (Dst != ExecReg) {
Opcode = AMDGPU::V_CMPX_LE_F32_e64;		BuildMI(MBB, Prev, Prev->getDebugLoc(), TII->get(MovOpc), Dst)
break;
case ISD::SETOLT:
case ISD::SETLT:
Opcode = AMDGPU::V_CMPX_GT_F32_e64;
break;
case ISD::SETOLE:
case ISD::SETLE:
Opcode = AMDGPU::V_CMPX_GE_F32_e64;
break;
case ISD::SETONE:
case ISD::SETNE:
Opcode = AMDGPU::V_CMPX_LG_F32_e64;
break;
case ISD::SETO:
Opcode = AMDGPU::V_CMPX_O_F32_e64;
break;
case ISD::SETUO:
Opcode = AMDGPU::V_CMPX_U_F32_e64;
break;
case ISD::SETUEQ:
Opcode = AMDGPU::V_CMPX_NLG_F32_e64;
break;
case ISD::SETUGT:
Opcode = AMDGPU::V_CMPX_NGE_F32_e64;
break;
case ISD::SETUGE:
Opcode = AMDGPU::V_CMPX_NGT_F32_e64;
break;
case ISD::SETULT:
Opcode = AMDGPU::V_CMPX_NLE_F32_e64;
break;
case ISD::SETULE:
Opcode = AMDGPU::V_CMPX_NLT_F32_e64;
break;
case ISD::SETUNE:
Opcode = AMDGPU::V_CMPX_NEQ_F32_e64;
break;
default:
llvm_unreachable("invalid ISD:SET cond code");
}

const GCNSubtarget &ST = MBB.getParent()->getSubtarget<GCNSubtarget>();
if (ST.hasNoSdstCMPX())
Opcode = AMDGPU::getVCMPXNoSDstOp(Opcode);

assert(MI.getOperand(0).isReg());

if (TRI->isVGPR(MBB.getParent()->getRegInfo(),
MI.getOperand(0).getReg())) {
Opcode = AMDGPU::getVOPe32(Opcode);
BuildMI(MBB, &MI, DL, TII->get(Opcode))
.add(MI.getOperand(1))
.add(MI.getOperand(0));
} else {
auto I = BuildMI(MBB, &MI, DL, TII->get(Opcode));
if (!ST.hasNoSdstCMPX())
I.addReg(AMDGPU::VCC, RegState::Define);

I.addImm(0) // src0 modifiers
.add(MI.getOperand(1))
.addImm(0) // src1 modifiers
.add(MI.getOperand(0));

I.addImm(0); // omod
}
return true;
}
case AMDGPU::SI_KILL_I1_TERMINATOR: {
const MachineFunction *MF = MI.getParent()->getParent();
const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();
unsigned Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
const MachineOperand &Op = MI.getOperand(0);
int64_t KillVal = MI.getOperand(1).getImm();
assert(KillVal == 0 || KillVal == -1);

// Kill all threads if Op0 is an immediate and equal to the Kill value.
if (Op.isImm()) {
int64_t Imm = Op.getImm();
assert(Imm == 0 || Imm == -1);

if (Imm == KillVal) {
BuildMI(MBB, &MI, DL, TII->get(ST.isWave32() ? AMDGPU::S_MOV_B32
: AMDGPU::S_MOV_B64), Exec)
.addImm(0);		.addImm(0);
return true;
}		}
return false;		Prev->eraseFromParent();
}

unsigned Opcode = KillVal ? AMDGPU::S_ANDN2_B64 : AMDGPU::S_AND_B64;
if (ST.isWave32())
Opcode = KillVal ? AMDGPU::S_ANDN2_B32 : AMDGPU::S_AND_B32;
BuildMI(MBB, &MI, DL, TII->get(Opcode), Exec)
.addReg(Exec)
.add(Op);
return true;		return true;
}		}
default:
llvm_unreachable("invalid opcode, expected SI_KILL_*_TERMINATOR");
}		}
		return false;
}		}
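
The reasoning behind `tidySCCDef` ("SCC will always be 0; use unconditional branch") follows from the arithmetic of `S_ANDN2` when both sources are the same register. A minimal scalar model, assuming the usual definition of the instruction (destination is `Src0 & ~Src1`, SCC is set when the result is non-zero); `AndN2Result` and `sAndN2` are hypothetical names used only for illustration:

```cpp
#include <cstdint>

struct AndN2Result {
  uint64_t Dst; // value written to the destination register
  bool SCC;     // scalar condition code: result is non-zero
};

// Scalar model of S_ANDN2_B64 (assumed semantics, see lead-in).
inline AndN2Result sAndN2(uint64_t Src0, uint64_t Src1) {
  uint64_t Dst = Src0 & ~Src1;
  return {Dst, Dst != 0};
}
```

With both sources equal to EXEC the result is zero for every mask value, so SCC is always 0 and an `S_CBRANCH_SCC0` on it is always taken, which is what lets the new code emit a plain `S_BRANCH` instead.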

void SIInsertSkips::earlyTerm(MachineInstr &MI) {		void SIInsertSkips::earlyTerm(MachineInstr &MI) {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
const DebugLoc DL = MI.getDebugLoc();		const DebugLoc DL = MI.getDebugLoc();

ensureEarlyExitBlock(MBB, true);		ensureEarlyExitBlock(MBB, true);

auto BranchMI = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_CBRANCH_SCC0))		// Can we make branch unconditional?
		bool ReplaceSuccessor = MBB.succ_size() <= 1;
		if (ReplaceSuccessor)
		ReplaceSuccessor = tidySCCDef(MI);

		MachineInstr *BranchMI = nullptr;
		if (ReplaceSuccessor) {
		// Branch is always taken
		BranchMI =
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_BRANCH)).addMBB(EarlyExitBlock);
		} else {
		BranchMI = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_CBRANCH_SCC0))
.addMBB(EarlyExitBlock);		.addMBB(EarlyExitBlock);
auto Next = std::next(MI.getIterator());		}

		auto Next = std::next(MI.getIterator());
if (Next != MBB.end() && !Next->isTerminator())		if (Next != MBB.end() && !Next->isTerminator())
splitBlock(MBB, *BranchMI, MDT);		splitBlock(MBB, *BranchMI, MDT);

		MachineBasicBlock *OldSuccessor = nullptr;
		if (ReplaceSuccessor && !MBB.succ_empty()) {
		OldSuccessor = *MBB.succ_begin();
		MBB.replaceSuccessor(OldSuccessor, EarlyExitBlock);
		} else {
MBB.addSuccessor(EarlyExitBlock);		MBB.addSuccessor(EarlyExitBlock);
		}

		// Update MDT
MDT->getBase().insertEdge(&MBB, EarlyExitBlock);		MDT->getBase().insertEdge(&MBB, EarlyExitBlock);
		if (OldSuccessor)
		MDT->getBase().deleteEdge(&MBB, OldSuccessor);

MI.eraseFromParent();		MI.eraseFromParent();
}		}

// Returns true if a branch over the block was inserted.		// Returns true if a branch over the block was inserted.
bool SIInsertSkips::skipMaskBranch(MachineInstr &MI,		bool SIInsertSkips::skipMaskBranch(MachineInstr &MI,
MachineBasicBlock &SrcMBB) {		MachineBasicBlock &SrcMBB) {
MachineBasicBlock *DestBB = MI.getOperand(0).getMBB();		MachineBasicBlock *DestBB = MI.getOperand(0).getMBB();
[...]

bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {		bool SIInsertSkips::runOnMachineFunction(MachineFunction &MF) {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
SkipThreshold = SkipThresholdFlag;		SkipThreshold = SkipThresholdFlag;

SmallVector<MachineInstr *, 4> KillInstrs;		MovOpc = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
		ExecReg = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;

SmallVector<MachineInstr *, 4> EarlyTermInstrs;		SmallVector<MachineInstr *, 4> EarlyTermInstrs;
bool MadeChange = false;		bool MadeChange = false;

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
MachineBasicBlock::iterator I, Next;		MachineBasicBlock::iterator I, Next;
for (I = MBB.begin(); I != MBB.end(); I = Next) {		for (I = MBB.begin(); I != MBB.end(); I = Next) {
Next = std::next(I);		Next = std::next(I);
MachineInstr &MI = *I;		MachineInstr &MI = *I;

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::SI_MASK_BRANCH:		case AMDGPU::SI_MASK_BRANCH:
MadeChange |= skipMaskBranch(MI, MBB);		MadeChange |= skipMaskBranch(MI, MBB);
break;		break;

case AMDGPU::S_BRANCH:		case AMDGPU::S_BRANCH:
// Optimize out branches to the next block.		// Optimize out branches to the next block.
// FIXME: Shouldn't this be handled by BranchFolding?		// FIXME: Shouldn't this be handled by BranchFolding?
if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB())) {		if (MBB.isLayoutSuccessor(MI.getOperand(0).getMBB())) {
assert(&MI == &MBB.back());		assert(&MI == &MBB.back());
MI.eraseFromParent();		MI.eraseFromParent();
MadeChange = true;		MadeChange = true;
}		}
break;		break;

case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
case AMDGPU::SI_KILL_I1_TERMINATOR: {
MadeChange = true;
bool CanKill = kill(MI);

// Check if we can add an early "if exec=0 { end shader }".
//
// Note that we _always_ do this if it is correct, even if the kill
// happens fairly late in the shader, because the null export should
// generally still be cheaper than normal export(s).
//
// TODO: The dominatesAllReachable check is conservative: if the
// dominance is only missing due to _uniform_ branches, we could
// in fact insert the early-exit as well.
if (CanKill &&
MF.getFunction().getCallingConv() == CallingConv::AMDGPU_PS &&
dominatesAllReachable(MBB)) {
// Mark the instruction for kill-if-dead insertion. We delay this
// change because it modifies the CFG.
KillInstrs.push_back(&MI);
} else {
MI.eraseFromParent();
}
break;
}

case AMDGPU::SI_KILL_CLEANUP:
if (MF.getFunction().getCallingConv() == CallingConv::AMDGPU_PS &&
dominatesAllReachable(MBB)) {
KillInstrs.push_back(&MI);
} else {
MI.eraseFromParent();
}
break;

case AMDGPU::SI_EARLY_TERMINATE_SCC0:		case AMDGPU::SI_EARLY_TERMINATE_SCC0:
EarlyTermInstrs.push_back(&MI);		EarlyTermInstrs.push_back(&MI);
break;		break;

default:		default:
break;		break;
}		}
}		}
}		}

for (MachineInstr *Instr : EarlyTermInstrs) {		for (MachineInstr *Instr : EarlyTermInstrs) {
// Early termination in GS does nothing		// Early termination in GS does nothing
if (MF.getFunction().getCallingConv() != CallingConv::AMDGPU_GS)		if (MF.getFunction().getCallingConv() != CallingConv::AMDGPU_GS) {
earlyTerm(*Instr);		earlyTerm(*Instr);
else		} else {
		tidySCCDef(*Instr);
Instr->eraseFromParent();		Instr->eraseFromParent();
}		}
for (MachineInstr *Kill : KillInstrs) {
skipIfDead(*Kill->getParent(), std::next(Kill->getIterator()),
Kill->getDebugLoc());
Kill->eraseFromParent();
}		}
KillInstrs.clear();
EarlyTermInstrs.clear();		EarlyTermInstrs.clear();
EarlyExitBlock = nullptr;		EarlyExitBlock = nullptr;

return MadeChange;		return MadeChange;
}		}

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

[...]	case AMDGPU::S_ANDN2_B64_term:
break;		break;

case AMDGPU::S_ANDN2_B32_term:		case AMDGPU::S_ANDN2_B32_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
// register allocation.		// register allocation.
MI.setDesc(get(AMDGPU::S_ANDN2_B32));		MI.setDesc(get(AMDGPU::S_ANDN2_B32));
break;		break;

		case AMDGPU::S_AND_B64_term:
		// This is only a terminator to get the correct spill code placement during
		// register allocation.
		MI.setDesc(get(AMDGPU::S_AND_B64));
		break;

		case AMDGPU::S_AND_B32_term:
		// This is only a terminator to get the correct spill code placement during
		// register allocation.
		MI.setDesc(get(AMDGPU::S_AND_B32));
		break;

case AMDGPU::V_MOV_B64_PSEUDO: {		case AMDGPU::V_MOV_B64_PSEUDO: {
Register Dst = MI.getOperand(0).getReg();		Register Dst = MI.getOperand(0).getReg();
Register DstLo = RI.getSubReg(Dst, AMDGPU::sub0);		Register DstLo = RI.getSubReg(Dst, AMDGPU::sub0);
Register DstHi = RI.getSubReg(Dst, AMDGPU::sub1);		Register DstHi = RI.getSubReg(Dst, AMDGPU::sub1);

const MachineOperand &SrcOp = MI.getOperand(1);		const MachineOperand &SrcOp = MI.getOperand(1);
// FIXME: Will this work for 64-bit floating point immediates?		// FIXME: Will this work for 64-bit floating point immediates?
assert(!SrcOp.isFPImm());		assert(!SrcOp.isFPImm());
[...]	bool SIInstrInfo::analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
while (I != E && !I->isBranch() && !I->isReturn() &&		while (I != E && !I->isBranch() && !I->isReturn() &&
I->getOpcode() != AMDGPU::SI_MASK_BRANCH) {		I->getOpcode() != AMDGPU::SI_MASK_BRANCH) {
switch (I->getOpcode()) {		switch (I->getOpcode()) {
case AMDGPU::SI_MASK_BRANCH:		case AMDGPU::SI_MASK_BRANCH:
case AMDGPU::S_MOV_B64_term:		case AMDGPU::S_MOV_B64_term:
case AMDGPU::S_XOR_B64_term:		case AMDGPU::S_XOR_B64_term:
case AMDGPU::S_OR_B64_term:		case AMDGPU::S_OR_B64_term:
case AMDGPU::S_ANDN2_B64_term:		case AMDGPU::S_ANDN2_B64_term:
		case AMDGPU::S_AND_B64_term:
case AMDGPU::S_MOV_B32_term:		case AMDGPU::S_MOV_B32_term:
case AMDGPU::S_XOR_B32_term:		case AMDGPU::S_XOR_B32_term:
case AMDGPU::S_OR_B32_term:		case AMDGPU::S_OR_B32_term:
case AMDGPU::S_ANDN2_B32_term:		case AMDGPU::S_ANDN2_B32_term:
		case AMDGPU::S_AND_B32_term:
break;		break;
case AMDGPU::SI_IF:		case AMDGPU::SI_IF:
case AMDGPU::SI_ELSE:		case AMDGPU::SI_ELSE:
case AMDGPU::SI_KILL_I1_TERMINATOR:		case AMDGPU::SI_KILL_I1_TERMINATOR:
case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:		case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
// FIXME: It's messy that these need to be considered here at all.		// FIXME: It's messy that these need to be considered here at all.
return true;		return true;
default:		default:
[...]

llvm/lib/Target/AMDGPU/SIInstructions.td

[...]	class WrapTerminatorInst<SOP_Pseudo base_inst> : SPseudoInstSI<
let SchedRW = base_inst.SchedRW;		let SchedRW = base_inst.SchedRW;
}		}

let WaveSizePredicate = isWave64 in {		let WaveSizePredicate = isWave64 in {
def S_MOV_B64_term : WrapTerminatorInst<S_MOV_B64>;		def S_MOV_B64_term : WrapTerminatorInst<S_MOV_B64>;
def S_XOR_B64_term : WrapTerminatorInst<S_XOR_B64>;		def S_XOR_B64_term : WrapTerminatorInst<S_XOR_B64>;
def S_OR_B64_term : WrapTerminatorInst<S_OR_B64>;		def S_OR_B64_term : WrapTerminatorInst<S_OR_B64>;
def S_ANDN2_B64_term : WrapTerminatorInst<S_ANDN2_B64>;		def S_ANDN2_B64_term : WrapTerminatorInst<S_ANDN2_B64>;
		def S_AND_B64_term : WrapTerminatorInst<S_AND_B64>;
}		}

let WaveSizePredicate = isWave32 in {		let WaveSizePredicate = isWave32 in {
def S_MOV_B32_term : WrapTerminatorInst<S_MOV_B32>;		def S_MOV_B32_term : WrapTerminatorInst<S_MOV_B32>;
def S_XOR_B32_term : WrapTerminatorInst<S_XOR_B32>;		def S_XOR_B32_term : WrapTerminatorInst<S_XOR_B32>;
def S_OR_B32_term : WrapTerminatorInst<S_OR_B32>;		def S_OR_B32_term : WrapTerminatorInst<S_OR_B32>;
def S_ANDN2_B32_term : WrapTerminatorInst<S_ANDN2_B32>;		def S_ANDN2_B32_term : WrapTerminatorInst<S_ANDN2_B32>;
		def S_AND_B32_term : WrapTerminatorInst<S_AND_B32>;
}		}


def WAVE_BARRIER : SPseudoInstSI<(outs), (ins),		def WAVE_BARRIER : SPseudoInstSI<(outs), (ins),
[(int_amdgcn_wave_barrier)]> {		[(int_amdgcn_wave_barrier)]> {
let SchedRW = [];		let SchedRW = [];
let hasNoSchedulingInfo = 1;		let hasNoSchedulingInfo = 1;
let hasSideEffects = 1;		let hasSideEffects = 1;
[...]	multiclass PseudoInstKill <dag ins> {
def _TERMINATOR : SPseudoInstSI <(outs), ins> {		def _TERMINATOR : SPseudoInstSI <(outs), ins> {
let isTerminator = 1;		let isTerminator = 1;
}		}
}		}

defm SI_KILL_I1 : PseudoInstKill <(ins SCSrc_i1:$src, i1imm:$killvalue)>;		defm SI_KILL_I1 : PseudoInstKill <(ins SCSrc_i1:$src, i1imm:$killvalue)>;
defm SI_KILL_F32_COND_IMM : PseudoInstKill <(ins VSrc_b32:$src0, i32imm:$src1, i32imm:$cond)>;		defm SI_KILL_F32_COND_IMM : PseudoInstKill <(ins VSrc_b32:$src0, i32imm:$src1, i32imm:$cond)>;

let Defs = [EXEC] in
def SI_KILL_CLEANUP : SPseudoInstSI <(outs), (ins)>;

let Defs = [EXEC,VCC] in		let Defs = [EXEC,VCC] in
def SI_ILLEGAL_COPY : SPseudoInstSI <		def SI_ILLEGAL_COPY : SPseudoInstSI <
(outs unknown:$dst), (ins unknown:$src),		(outs unknown:$dst), (ins unknown:$src),
[], " ; illegal copy $src to $dst">;		[], " ; illegal copy $src to $dst">;

} // End Uses = [EXEC], Defs = [EXEC,VCC]		} // End Uses = [EXEC], Defs = [EXEC,VCC]

// Branch on undef scc. Used to avoid intermediate copy from		// Branch on undef scc. Used to avoid intermediate copy from
// IMPLICIT_DEF to SCC.		// IMPLICIT_DEF to SCC.
def SI_BR_UNDEF : SPseudoInstSI <(outs), (ins sopp_brtarget:$simm16)> {		def SI_BR_UNDEF : SPseudoInstSI <(outs), (ins sopp_brtarget:$simm16)> {
let isTerminator = 1;		let isTerminator = 1;
let usesCustomInserter = 1;		let usesCustomInserter = 1;
let isBranch = 1;		let isBranch = 1;
}		}

def SI_PS_LIVE : PseudoInstSI <		def SI_PS_LIVE : PseudoInstSI <
(outs SReg_1:$dst), (ins),		(outs SReg_1:$dst), (ins),
[(set i1:$dst, (int_amdgcn_ps_live))]> {		[(set i1:$dst, (int_amdgcn_ps_live))]> {
let SALU = 1;		let SALU = 1;
}		}

		let Uses = [EXEC] in {
		def SI_LIVE_MASK : PseudoInstSI <
		(outs SReg_1:$dst), (ins),
		[(set i1:$dst, (int_amdgcn_live_mask))]> {
		let SALU = 1;
		}
		let Defs = [EXEC,SCC] in {
		// Demote: Turn a pixel shader thread into a helper lane.
		def SI_DEMOTE_I1 : SPseudoInstSI <(outs), (ins SCSrc_i1:$src, i1imm:$killvalue)> {
		}
		} // End Defs = [EXEC,SCC]
		foad: EXEC,SCC
		critson: Thanks!
		} // End Uses = [EXEC]

def SI_MASKED_UNREACHABLE : SPseudoInstSI <(outs), (ins),		def SI_MASKED_UNREACHABLE : SPseudoInstSI <(outs), (ins),
[(int_amdgcn_unreachable)],		[(int_amdgcn_unreachable)],
"; divergent unreachable"> {		"; divergent unreachable"> {
let Size = 0;		let Size = 0;
let hasNoSchedulingInfo = 1;		let hasNoSchedulingInfo = 1;
let FixedSize = 1;		let FixedSize = 1;
}		}

▲ Show 20 Lines • Show All 376 Lines • ▼ Show 20 Lines	def : Pat <
(SI_KILL_I1_PSEUDO SCSrc_i1:$src, -1)		(SI_KILL_I1_PSEUDO SCSrc_i1:$src, -1)
>;		>;

def : Pat <		def : Pat <
(int_amdgcn_kill (i1 (setcc f32:$src, InlineImmFP32:$imm, cond:$cond))),		(int_amdgcn_kill (i1 (setcc f32:$src, InlineImmFP32:$imm, cond:$cond))),
(SI_KILL_F32_COND_IMM_PSEUDO VSrc_b32:$src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))		(SI_KILL_F32_COND_IMM_PSEUDO VSrc_b32:$src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
>;		>;

		def : Pat <
		(int_amdgcn_wqm_demote i1:$src),
		(SI_DEMOTE_I1 SCSrc_i1:$src, 0)
		>;

		def : Pat <
		(int_amdgcn_wqm_demote (i1 (not i1:$src))),
		(SI_DEMOTE_I1 SCSrc_i1:$src, -1)
		>;

// TODO: we could add more variants for other types of conditionals		// TODO: we could add more variants for other types of conditionals

def : Pat <		def : Pat <
(i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),		(i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
(COPY $src) // Return the SGPRs representing i1 src		(COPY $src) // Return the SGPRs representing i1 src
>;		>;

def : Pat <		def : Pat <
▲ Show 20 Lines • Show All 1,910 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines
class SILowerControlFlow : public MachineFunctionPass {		class SILowerControlFlow : public MachineFunctionPass {
private:		private:
const SIRegisterInfo *TRI = nullptr;		const SIRegisterInfo *TRI = nullptr;
const SIInstrInfo *TII = nullptr;		const SIInstrInfo *TII = nullptr;
LiveIntervals *LIS = nullptr;		LiveIntervals *LIS = nullptr;
MachineRegisterInfo *MRI = nullptr;		MachineRegisterInfo *MRI = nullptr;
SetVector<MachineInstr*> LoweredEndCf;		SetVector<MachineInstr*> LoweredEndCf;
DenseSet<Register> LoweredIf;		DenseSet<Register> LoweredIf;
SmallSet<MachineInstr *, 16> NeedsKillCleanup;

const TargetRegisterClass *BoolRC = nullptr;		const TargetRegisterClass *BoolRC = nullptr;
bool InsertKillCleanups;
unsigned AndOpc;		unsigned AndOpc;
unsigned OrOpc;		unsigned OrOpc;
unsigned XorOpc;		unsigned XorOpc;
unsigned MovTermOpc;		unsigned MovTermOpc;
unsigned Andn2TermOpc;		unsigned Andn2TermOpc;
unsigned XorTermrOpc;		unsigned XorTermrOpc;
unsigned OrTermrOpc;		unsigned OrTermrOpc;
unsigned OrSaveExecOpc;		unsigned OrSaveExecOpc;
▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	void SILowerControlFlow::emitIf(MachineInstr &MI) {
MachineOperand &ImpDefSCC = MI.getOperand(4);		MachineOperand &ImpDefSCC = MI.getOperand(4);
assert(ImpDefSCC.getReg() == AMDGPU::SCC && ImpDefSCC.isDef());		assert(ImpDefSCC.getReg() == AMDGPU::SCC && ImpDefSCC.isDef());

// If there is only one use of save exec register and that use is SI_END_CF,		// If there is only one use of save exec register and that use is SI_END_CF,
// we can optimize SI_IF by returning the full saved exec mask instead of		// we can optimize SI_IF by returning the full saved exec mask instead of
// just cleared bits.		// just cleared bits.
bool SimpleIf = isSimpleIf(MI, MRI);		bool SimpleIf = isSimpleIf(MI, MRI);

if (InsertKillCleanups) {		if (SimpleIf) {
// Check for SI_KILL_*_TERMINATOR on full path of control flow and
// flag the associated SI_END_CF for insertion of a kill cleanup.
auto UseMI = MRI->use_instr_nodbg_begin(SaveExecReg);
while (UseMI->getOpcode() != AMDGPU::SI_END_CF) {
assert(std::next(UseMI) == MRI->use_instr_nodbg_end());
assert(UseMI->getOpcode() == AMDGPU::SI_ELSE);
MachineOperand &NextExec = UseMI->getOperand(0);
Register NextExecReg = NextExec.getReg();
if (NextExec.isDead()) {
assert(!SimpleIf);
break;
}
UseMI = MRI->use_instr_nodbg_begin(NextExecReg);
}
if (UseMI->getOpcode() == AMDGPU::SI_END_CF) {
if (hasKill(MI.getParent(), UseMI->getParent(), TII)) {
NeedsKillCleanup.insert(&*UseMI);
SimpleIf = false;
}
}
} else if (SimpleIf) {
// Check for SI_KILL_*_TERMINATOR on path from if to endif.		// Check for SI_KILL_*_TERMINATOR on path from if to endif.
// if there is any such terminator simplifications are not safe.		// if there is any such terminator simplifications are not safe.
auto UseMI = MRI->use_instr_nodbg_begin(SaveExecReg);		auto UseMI = MRI->use_instr_nodbg_begin(SaveExecReg);
SimpleIf = !hasKill(MI.getParent(), UseMI->getParent(), TII);		SimpleIf = !hasKill(MI.getParent(), UseMI->getParent(), TII);
}		}

// Add an implicit def of exec to discourage scheduling VALU after this which		// Add an implicit def of exec to discourage scheduling VALU after this which
// will interfere with trying to form s_and_saveexec_b64 later.		// will interfere with trying to form s_and_saveexec_b64 later.
▲ Show 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	SILowerControlFlow::skipIgnoreExecInstsTrivialSucc(
SmallSet<const MachineBasicBlock *, 4> Visited;		SmallSet<const MachineBasicBlock *, 4> Visited;
MachineBasicBlock *B = &MBB;		MachineBasicBlock *B = &MBB;
do {		do {
if (!Visited.insert(B).second)		if (!Visited.insert(B).second)
return MBB.end();		return MBB.end();

auto E = B->end();		auto E = B->end();
for ( ; It != E; ++It) {		for ( ; It != E; ++It) {
if (It->getOpcode() == AMDGPU::SI_KILL_CLEANUP)
continue;
if (TII->mayReadEXEC(MRI, It))		if (TII->mayReadEXEC(MRI, It))
break;		break;
}		}

if (It != E)		if (It != E)
return It;		return It;

if (B->succ_size() != 1)		if (B->succ_size() != 1)
Show All 36 Lines	MachineBasicBlock *SILowerControlFlow::emitEndCf(MachineInstr &MI) {

MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(MBB, InsPt, DL, TII->get(Opcode), Exec)		BuildMI(MBB, InsPt, DL, TII->get(Opcode), Exec)
.addReg(Exec)		.addReg(Exec)
.add(MI.getOperand(0));		.add(MI.getOperand(0));

LoweredEndCf.insert(NewMI);		LoweredEndCf.insert(NewMI);

// If this ends control flow which contains kills (as flagged in emitIf)		if (LIS)
// then insert an SI_KILL_CLEANUP immediately following the exec mask
// manipulation. This can be lowered to early termination if appropriate.
MachineInstr *CleanUpMI = nullptr;
if (NeedsKillCleanup.count(&MI))
CleanUpMI = BuildMI(MBB, InsPt, DL, TII->get(AMDGPU::SI_KILL_CLEANUP));

if (LIS) {
LIS->ReplaceMachineInstrInMaps(MI, *NewMI);		LIS->ReplaceMachineInstrInMaps(MI, *NewMI);
if (CleanUpMI)
LIS->InsertMachineInstrInMaps(*CleanUpMI);
}

MI.eraseFromParent();		MI.eraseFromParent();

if (LIS)		if (LIS)
LIS->handleMove(*NewMI);		LIS->handleMove(*NewMI);
return SplitBB;		return SplitBB;
}		}

▲ Show 20 Lines • Show All 194 Lines • ▼ Show 20 Lines	bool SILowerControlFlow::runOnMachineFunction(MachineFunction &MF) {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();

// This doesn't actually need LiveIntervals, but we can preserve them.		// This doesn't actually need LiveIntervals, but we can preserve them.
LIS = getAnalysisIfAvailable<LiveIntervals>();		LIS = getAnalysisIfAvailable<LiveIntervals>();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
BoolRC = TRI->getBoolRC();		BoolRC = TRI->getBoolRC();
InsertKillCleanups =
MF.getFunction().getCallingConv() == CallingConv::AMDGPU_PS;

if (ST.isWave32()) {		if (ST.isWave32()) {
AndOpc = AMDGPU::S_AND_B32;		AndOpc = AMDGPU::S_AND_B32;
OrOpc = AMDGPU::S_OR_B32;		OrOpc = AMDGPU::S_OR_B32;
XorOpc = AMDGPU::S_XOR_B32;		XorOpc = AMDGPU::S_XOR_B32;
MovTermOpc = AMDGPU::S_MOV_B32_term;		MovTermOpc = AMDGPU::S_MOV_B32_term;
Andn2TermOpc = AMDGPU::S_ANDN2_B32_term;		Andn2TermOpc = AMDGPU::S_ANDN2_B32_term;
XorTermrOpc = AMDGPU::S_XOR_B32_term;		XorTermrOpc = AMDGPU::S_XOR_B32_term;
Show All 32 Lines	for (I = MBB->begin(); I != E; I = Next) {
SplitMBB = process(MI);		SplitMBB = process(MI);
break;		break;

case AMDGPU::SI_ELSE:		case AMDGPU::SI_ELSE:
case AMDGPU::SI_IF_BREAK:		case AMDGPU::SI_IF_BREAK:
case AMDGPU::SI_LOOP:		case AMDGPU::SI_LOOP:
case AMDGPU::SI_END_CF:		case AMDGPU::SI_END_CF:
// Only build worklist if SI_IF instructions must be processed first.		// Only build worklist if SI_IF instructions must be processed first.
if (InsertKillCleanups)
Worklist.push_back(&MI);
else
SplitMBB = process(MI);		SplitMBB = process(MI);
break;		break;

default:		default:
break;		break;
}		}

if (SplitMBB != MBB) {		if (SplitMBB != MBB) {
MBB = Next->getParent();		MBB = Next->getParent();
E = MBB->end();		E = MBB->end();
}		}
}		}
}		}

for (MachineInstr *MI : Worklist)		for (MachineInstr *MI : Worklist)
process(*MI);		process(*MI);

optimizeEndCf();		optimizeEndCf();

LoweredEndCf.clear();		LoweredEndCf.clear();
LoweredIf.clear();		LoweredIf.clear();
NeedsKillCleanup.clear();

return true;		return true;
}		}

llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp

Show First 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	case AMDGPU::S_ANDN2_B64_term: {
return true;		return true;
}		}
case AMDGPU::S_ANDN2_B32_term: {		case AMDGPU::S_ANDN2_B32_term: {
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
// register allocation.		// register allocation.
MI.setDesc(TII.get(AMDGPU::S_ANDN2_B32));		MI.setDesc(TII.get(AMDGPU::S_ANDN2_B32));
return true;		return true;
}		}
		case AMDGPU::S_AND_B64_term: {
		// This is only a terminator to get the correct spill code placement during
		// register allocation.
		MI.setDesc(TII.get(AMDGPU::S_AND_B64));
		return true;
		}
		case AMDGPU::S_AND_B32_term: {
		// This is only a terminator to get the correct spill code placement during
		// register allocation.
		MI.setDesc(TII.get(AMDGPU::S_AND_B32));
		return true;
		}
default:		default:
return false;		return false;
}		}
}		}

// Turn all pseudoterminators in the block into their equivalent non-terminator		// Turn all pseudoterminators in the block into their equivalent non-terminator
// instructions. Returns the reverse iterator to the first non-terminator		// instructions. Returns the reverse iterator to the first non-terminator
// instruction in the block.		// instruction in the block.
▲ Show 20 Lines • Show All 221 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

//===-- SIWholeQuadMode.cpp - enter and suspend whole quad mode -----------===//		//===-- SIWholeQuadMode.cpp - enter and suspend whole quad mode -----------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// This pass adds instructions to enable whole quad mode for pixel		/// This pass adds instructions to enable whole quad mode for pixel
/// shaders, and whole wavefront mode for all programs.		/// shaders, and whole wavefront mode for all programs.
///		///
/// Whole quad mode is required for derivative computations, but it interferes		/// Whole quad mode is required for derivative computations, but it interferes
/// with shader side effects (stores and atomics). This pass is run on the		/// with shader side effects (stores and atomics). It ensures that WQM is
/// scheduled machine IR but before register coalescing, so that machine SSA is		/// enabled when necessary, but disabled around stores and atomics.
/// available for analysis. It ensures that WQM is enabled when necessary, but
/// disabled around stores and atomics.
///		///
/// When necessary, this pass creates a function prolog		/// When necessary, this pass creates a function prolog
///		///
/// S_MOV_B64 LiveMask, EXEC		/// S_MOV_B64 LiveMask, EXEC
/// S_WQM_B64 EXEC, EXEC		/// S_WQM_B64 EXEC, EXEC
///		///
/// to enter WQM at the top of the function and surrounds blocks of Exact		/// to enter WQM at the top of the function and surrounds blocks of Exact
/// instructions by		/// instructions by
Show All 30 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/CodeGen/LiveIntervals.h"		#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
		#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "si-wqm"		#define DEBUG_TYPE "si-wqm"

Show All 36 Lines	struct InstrInfo {
char Disabled = 0;		char Disabled = 0;
char OutNeeds = 0;		char OutNeeds = 0;
};		};

struct BlockInfo {		struct BlockInfo {
char Needs = 0;		char Needs = 0;
char InNeeds = 0;		char InNeeds = 0;
char OutNeeds = 0;		char OutNeeds = 0;
		char InitialState = 0;
		bool NeedsLowering = false;
};		};

struct WorkItem {		struct WorkItem {
MachineBasicBlock *MBB = nullptr;		MachineBasicBlock *MBB = nullptr;
MachineInstr *MI = nullptr;		MachineInstr *MI = nullptr;

WorkItem() = default;		WorkItem() = default;
WorkItem(MachineBasicBlock *MBB) : MBB(MBB) {}		WorkItem(MachineBasicBlock *MBB) : MBB(MBB) {}
WorkItem(MachineInstr *MI) : MI(MI) {}		WorkItem(MachineInstr *MI) : MI(MI) {}
};		};

class SIWholeQuadMode : public MachineFunctionPass {		class SIWholeQuadMode : public MachineFunctionPass {
private:		private:
CallingConv::ID CallingConv;
const SIInstrInfo *TII;		const SIInstrInfo *TII;
const SIRegisterInfo *TRI;		const SIRegisterInfo *TRI;
const GCNSubtarget *ST;		const GCNSubtarget *ST;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
LiveIntervals *LIS;		LiveIntervals *LIS;
		MachineDominatorTree *MDT;
		MachinePostDominatorTree *PDT;

unsigned AndOpc;		unsigned AndOpc;
unsigned XorTermrOpc;		unsigned AndN2Opc;
		unsigned XorOpc;
		unsigned AndSaveExecOpc;
unsigned OrSaveExecOpc;		unsigned OrSaveExecOpc;
unsigned Exec;		unsigned WQMOpc;
		Register Exec;
		Register LiveMaskReg;

DenseMap<const MachineInstr *, InstrInfo> Instructions;		DenseMap<const MachineInstr *, InstrInfo> Instructions;
MapVector<MachineBasicBlock *, BlockInfo> Blocks;		DenseMap<MachineBasicBlock *, BlockInfo> Blocks;
SmallVector<MachineInstr *, 1> LiveMaskQueries;
		// Tracks state (WQM/WWM/Exact) after a given instruction
		DenseMap<const MachineInstr *, char> StateTransition;

		SmallVector<MachineInstr *, 2> LiveMaskQueries;
SmallVector<MachineInstr *, 4> LowerToMovInstrs;		SmallVector<MachineInstr *, 4> LowerToMovInstrs;
SmallVector<MachineInstr *, 4> LowerToCopyInstrs;		SmallVector<MachineInstr *, 4> LowerToCopyInstrs;
		SmallVector<MachineInstr *, 4> KillInstrs;

void printInfo();		void printInfo();

void markInstruction(MachineInstr &MI, char Flag,		void markInstruction(MachineInstr &MI, char Flag,
std::vector<WorkItem> &Worklist);		std::vector<WorkItem> &Worklist);
void markDefs(const MachineInstr &UseMI, LiveRange &LR, Register Reg,		void markDefs(const MachineInstr &UseMI, LiveRange &LR, Register Reg,
unsigned SubReg, char Flag, std::vector<WorkItem> &Worklist);		unsigned SubReg, char Flag, std::vector<WorkItem> &Worklist);
void markInstructionUses(const MachineInstr &MI, char Flag,		void markInstructionUses(const MachineInstr &MI, char Flag,
std::vector<WorkItem> &Worklist);		std::vector<WorkItem> &Worklist);
char scanInstructions(MachineFunction &MF, std::vector<WorkItem> &Worklist);		char scanInstructions(MachineFunction &MF, std::vector<WorkItem> &Worklist);
void propagateInstruction(MachineInstr &MI, std::vector<WorkItem> &Worklist);		void propagateInstruction(MachineInstr &MI, std::vector<WorkItem> &Worklist);
void propagateBlock(MachineBasicBlock &MBB, std::vector<WorkItem> &Worklist);		void propagateBlock(MachineBasicBlock &MBB, std::vector<WorkItem> &Worklist);
char analyzeFunction(MachineFunction &MF);		char analyzeFunction(MachineFunction &MF);

MachineBasicBlock::iterator saveSCC(MachineBasicBlock &MBB,		MachineBasicBlock::iterator saveSCC(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before);		MachineBasicBlock::iterator Before);
MachineBasicBlock::iterator		MachineBasicBlock::iterator
prepareInsertion(MachineBasicBlock &MBB, MachineBasicBlock::iterator First,		prepareInsertion(MachineBasicBlock &MBB, MachineBasicBlock::iterator First,
MachineBasicBlock::iterator Last, bool PreferLast,		MachineBasicBlock::iterator Last, bool PreferLast,
bool SaveSCC);		bool SaveSCC);
void toExact(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,		void toExact(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,
unsigned SaveWQM, unsigned LiveMaskReg);		Register SaveWQM);
void toWQM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,		void toWQM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,
unsigned SavedWQM);		Register SavedWQM);
void toWWM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,		void toWWM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,
unsigned SaveOrig);		Register SaveOrig);
void fromWWM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,		void fromWWM(MachineBasicBlock &MBB, MachineBasicBlock::iterator Before,
unsigned SavedOrig);		Register SavedOrig, char NonWWMState);
void processBlock(MachineBasicBlock &MBB, unsigned LiveMaskReg, bool isEntry);
		bool canSplitBlockAt(MachineBasicBlock *BB, MachineInstr *MI);
		MachineBasicBlock *splitBlock(MachineBasicBlock *BB, MachineInstr *TermMI);

		MachineInstr *lowerKillI1(MachineBasicBlock &MBB, MachineInstr &MI,
		bool isDemote);
		MachineInstr *lowerKillF32(MachineBasicBlock &MBB, MachineInstr &MI);

void lowerLiveMaskQueries(unsigned LiveMaskReg);		void lowerBlock(MachineBasicBlock &MBB);
		void processBlock(MachineBasicBlock &MBB, bool isEntry);

		void lowerLiveMaskQueries();
void lowerCopyInstrs();		void lowerCopyInstrs();
		void lowerKillInstrs();

public:		public:
static char ID;		static char ID;

SIWholeQuadMode() :		SIWholeQuadMode() :
MachineFunctionPass(ID) { }		MachineFunctionPass(ID) { }

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override { return "SI Whole Quad Mode"; }		StringRef getPassName() const override { return "SI Whole Quad Mode"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequired<SlotIndexes>();
AU.addRequired<LiveIntervals>();		AU.addRequired<LiveIntervals>();
AU.addPreserved<SlotIndexes>();		AU.addPreserved<SlotIndexes>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
AU.setPreservesCFG();		AU.addRequired<MachineDominatorTree>();
		AU.addPreserved<MachineDominatorTree>();
		AU.addRequired<MachinePostDominatorTree>();
		AU.addPreserved<MachinePostDominatorTree>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

char SIWholeQuadMode::ID = 0;		char SIWholeQuadMode::ID = 0;

INITIALIZE_PASS_BEGIN(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false,		INITIALIZE_PASS_BEGIN(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(LiveIntervals)		INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
INITIALIZE_PASS_END(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false,		INITIALIZE_PASS_END(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false,
false)		false)

char &llvm::SIWholeQuadModeID = SIWholeQuadMode::ID;		char &llvm::SIWholeQuadModeID = SIWholeQuadMode::ID;

FunctionPass *llvm::createSIWholeQuadModePass() {		FunctionPass *llvm::createSIWholeQuadModePass() {
return new SIWholeQuadMode;		return new SIWholeQuadMode;
}		}
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	void SIWholeQuadMode::markInstruction(MachineInstr &MI, char Flag,
II.Needs |= Flag;		II.Needs |= Flag;
Worklist.push_back(&MI);		Worklist.push_back(&MI);
}		}

/// Mark all relevant definitions of register \p Reg in usage \p UseMI.		/// Mark all relevant definitions of register \p Reg in usage \p UseMI.
void SIWholeQuadMode::markDefs(const MachineInstr &UseMI, LiveRange &LR,		void SIWholeQuadMode::markDefs(const MachineInstr &UseMI, LiveRange &LR,
Register Reg, unsigned SubReg, char Flag,		Register Reg, unsigned SubReg, char Flag,
std::vector<WorkItem> &Worklist) {		std::vector<WorkItem> &Worklist) {
assert(!MRI->isSSA());

LLVM_DEBUG(dbgs() << "markDefs " << PrintState(Flag) << ": " << UseMI);		LLVM_DEBUG(dbgs() << "markDefs " << PrintState(Flag) << ": " << UseMI);

LiveQueryResult UseLRQ = LR.Query(LIS->getInstructionIndex(UseMI));		LiveQueryResult UseLRQ = LR.Query(LIS->getInstructionIndex(UseMI));
if (!UseLRQ.valueIn())		if (!UseLRQ.valueIn())
return;		return;

SmallPtrSet<const VNInfo *, 4> Visited;		SmallPtrSet<const VNInfo *, 4> Visited;
SmallVector<const VNInfo *, 4> ToProcess;		SmallVector<const VNInfo *, 4> ToProcess;
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	if (!Reg.isVirtual()) {

for (MCRegUnitIterator RegUnit(Reg.asMCReg(), TRI); RegUnit.isValid();		for (MCRegUnitIterator RegUnit(Reg.asMCReg(), TRI); RegUnit.isValid();
++RegUnit) {		++RegUnit) {
LiveRange &LR = LIS->getRegUnit(*RegUnit);		LiveRange &LR = LIS->getRegUnit(*RegUnit);
const VNInfo *Value = LR.Query(LIS->getInstructionIndex(MI)).valueIn();		const VNInfo *Value = LR.Query(LIS->getInstructionIndex(MI)).valueIn();
if (!Value)		if (!Value)
continue;		continue;

if (MRI->isSSA()) {
// Since we're in machine SSA, we do not need to track physical
// registers across basic blocks.
if (Value->isPHIDef())
continue;
markInstruction(*LIS->getInstructionFromIndex(Value->def), Flag,
Worklist);
} else {
markDefs(MI, LR, *RegUnit, AMDGPU::NoSubRegister, Flag, Worklist);		markDefs(MI, LR, *RegUnit, AMDGPU::NoSubRegister, Flag, Worklist);
}		}
}

continue;		continue;
}		}

if (MRI->isSSA()) {
for (MachineInstr &DefMI : MRI->def_instructions(Use.getReg()))
markInstruction(DefMI, Flag, Worklist);
} else {
LiveRange &LR = LIS->getInterval(Reg);		LiveRange &LR = LIS->getInterval(Reg);
markDefs(MI, LR, Reg, Use.getSubReg(), Flag, Worklist);		markDefs(MI, LR, Reg, Use.getSubReg(), Flag, Worklist);
}		}
}		}
}

// Scan instructions to determine which ones require an Exact execmask and		// Scan instructions to determine which ones require an Exact execmask and
// which ones seed WQM requirements.		// which ones seed WQM requirements.
char SIWholeQuadMode::scanInstructions(MachineFunction &MF,		char SIWholeQuadMode::scanInstructions(MachineFunction &MF,
std::vector<WorkItem> &Worklist) {		std::vector<WorkItem> &Worklist) {
char GlobalFlags = 0;		char GlobalFlags = 0;
bool WQMOutputs = MF.getFunction().hasFnAttribute("amdgpu-ps-wqm-outputs");		bool WQMOutputs = MF.getFunction().hasFnAttribute("amdgpu-ps-wqm-outputs");
SmallVector<MachineInstr *, 4> SetInactiveInstrs;		SmallVector<MachineInstr *, 4> SetInactiveInstrs;
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
if (!(BBI.InNeeds & StateExact)) {		if (!(BBI.InNeeds & StateExact)) {
BBI.InNeeds |= StateExact;		BBI.InNeeds |= StateExact;
Worklist.push_back(&MBB);		Worklist.push_back(&MBB);
}		}
GlobalFlags |= StateExact;		GlobalFlags |= StateExact;
III.Disabled = StateWQM | StateWWM;		III.Disabled = StateWQM | StateWWM;
continue;		continue;
} else {		} else {
if (Opcode == AMDGPU::SI_PS_LIVE) {		if (Opcode == AMDGPU::SI_PS_LIVE || Opcode == AMDGPU::SI_LIVE_MASK) {
LiveMaskQueries.push_back(&MI);		LiveMaskQueries.push_back(&MI);
		} else if (Opcode == AMDGPU::SI_KILL_I1_TERMINATOR ||
		Opcode == AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR ||
		Opcode == AMDGPU::SI_DEMOTE_I1) {
		KillInstrs.push_back(&MI);
		BBI.NeedsLowering = true;
} else if (WQMOutputs) {		} else if (WQMOutputs) {
// The function is in machine SSA form, which means that physical		// The function is in machine SSA form, which means that physical
// VGPRs correspond to shader inputs and outputs. Inputs are		// VGPRs correspond to shader inputs and outputs. Inputs are
// only used, outputs are only defined.		// only used, outputs are only defined.
		// FIXME: is this still valid?
for (const MachineOperand &MO : MI.defs()) {		for (const MachineOperand &MO : MI.defs()) {
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;

Register Reg = MO.getReg();		Register Reg = MO.getReg();

if (!Reg.isVirtual() &&		if (!Reg.isVirtual() &&
TRI->hasVectorRegisters(TRI->getPhysRegClass(Reg))) {		TRI->hasVectorRegisters(TRI->getPhysRegClass(Reg))) {
▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	SIWholeQuadMode::saveSCC(MachineBasicBlock &MBB,

LIS->InsertMachineInstrInMaps(*Save);		LIS->InsertMachineInstrInMaps(*Save);
LIS->InsertMachineInstrInMaps(*Restore);		LIS->InsertMachineInstrInMaps(*Restore);
LIS->createAndComputeVirtRegInterval(SaveReg);		LIS->createAndComputeVirtRegInterval(SaveReg);

return Restore;		return Restore;
}		}

		bool SIWholeQuadMode::canSplitBlockAt(MachineBasicBlock *BB, MachineInstr *MI) {
		// Cannot split immediately before the epilog
		// because there are values in physical registers
		if (MI->getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG) {
		return false;
		}

		return true;
		}

		MachineBasicBlock *SIWholeQuadMode::splitBlock(MachineBasicBlock *BB,
		MachineInstr *TermMI) {
		LLVM_DEBUG(dbgs() << "Split block " << printMBBReference(*BB) << " @ "
		<< *TermMI << "\n");

		MachineBasicBlock *SplitBB =
		BB->splitAt(*TermMI, /*UpdateLiveIns*/ true, LIS);

		// Convert last instruction into a terminator.
		// Note: this only covers the expected patterns.
		switch (TermMI->getOpcode()) {
		case AMDGPU::S_AND_B32:
		TermMI->setDesc(TII->get(AMDGPU::S_AND_B32_term));
		break;
		case AMDGPU::S_AND_B64:
		TermMI->setDesc(TII->get(AMDGPU::S_AND_B64_term));
		break;
		case AMDGPU::S_MOV_B32:
		TermMI->setDesc(TII->get(AMDGPU::S_MOV_B32_term));
		break;
		case AMDGPU::S_MOV_B64:
		TermMI->setDesc(TII->get(AMDGPU::S_MOV_B64_term));
		break;
		default:
		break;
		}

		if (SplitBB != BB) {
		// Update dominator trees
		using DomTreeT = DomTreeBase<MachineBasicBlock>;
		SmallVector<DomTreeT::UpdateType, 16> DTUpdates;
		for (MachineBasicBlock *Succ : SplitBB->successors()) {
		DTUpdates.push_back({DomTreeT::Insert, SplitBB, Succ});
		DTUpdates.push_back({DomTreeT::Delete, BB, Succ});
		}
		DTUpdates.push_back({DomTreeT::Insert, BB, SplitBB});
		if (MDT)
		MDT->getBase().applyUpdates(DTUpdates);
		if (PDT)
		PDT->getBase().applyUpdates(DTUpdates);

		// Link blocks
		MachineInstr *MI =
		BuildMI(*BB, BB->end(), DebugLoc(), TII->get(AMDGPU::S_BRANCH))
		.addMBB(SplitBB);
		LIS->InsertMachineInstrInMaps(*MI);
		}

		return SplitBB;
		}

		MachineInstr *SIWholeQuadMode::lowerKillF32(MachineBasicBlock &MBB,
		MachineInstr &MI) {
		const DebugLoc &DL = MI.getDebugLoc();
		unsigned Opcode = 0;

		assert(MI.getOperand(0).isReg());

		// Operands are reversed for the comparison, as the inline immediate must be
		// the first argument. However, the comparison is for live lanes, and here we
		// compute killed lanes.
		switch (MI.getOperand(2).getImm()) {
		case ISD::SETOEQ:
		case ISD::SETEQ:
		Opcode = AMDGPU::V_CMP_LG_F32_e64;
		break;
		case ISD::SETOGT:
		case ISD::SETGT:
		Opcode = AMDGPU::V_CMP_GT_F32_e64;
		break;
		case ISD::SETOGE:
		case ISD::SETGE:
		Opcode = AMDGPU::V_CMP_GE_F32_e64;
		break;
		case ISD::SETOLT:
		case ISD::SETLT:
		Opcode = AMDGPU::V_CMP_LT_F32_e64;
		break;
		case ISD::SETOLE:
		case ISD::SETLE:
		Opcode = AMDGPU::V_CMP_LE_F32_e64;
		break;
		case ISD::SETONE:
		case ISD::SETNE:
		Opcode = AMDGPU::V_CMP_EQ_F32_e64;
		break;
		case ISD::SETO:
		Opcode = AMDGPU::V_CMP_O_F32_e64;
		break;
		case ISD::SETUO:
		Opcode = AMDGPU::V_CMP_U_F32_e64;
		break;
		case ISD::SETUEQ:
		Opcode = AMDGPU::V_CMP_NEQ_F32_e64;
		break;
		case ISD::SETUGT:
		Opcode = AMDGPU::V_CMP_NLE_F32_e64;
		break;
		case ISD::SETUGE:
		Opcode = AMDGPU::V_CMP_NLT_F32_e64;
		break;
		case ISD::SETULT:
		Opcode = AMDGPU::V_CMP_NGE_F32_e64;
		break;
		case ISD::SETULE:
		Opcode = AMDGPU::V_CMP_NGT_F32_e64;
		break;
		case ISD::SETUNE:
		Opcode = AMDGPU::V_CMP_NLG_F32_e64;
		break;
		default:
		llvm_unreachable("invalid ISD:SET cond code");
		}

		// Pick opcode based on comparison type.
		MachineInstr *VcmpMI;
		const MachineOperand &Op0 = MI.getOperand(0);
		const MachineOperand &Op1 = MI.getOperand(1);
		if (TRI->isVGPR(*MRI, Op0.getReg())) {
		Opcode = AMDGPU::getVOPe32(Opcode);
		VcmpMI = BuildMI(MBB, &MI, DL, TII->get(Opcode)).add(Op1).add(Op0);
		} else {
		VcmpMI = BuildMI(MBB, &MI, DL, TII->get(Opcode))
		.addReg(AMDGPU::VCC, RegState::Define)
		.addImm(0) // src0 modifiers
		.add(Op1)
		.addImm(0) // src1 modifiers
		.add(Op0)
		.addImm(0); // omod
		}

		// VCC represents lanes killed.
		Register VCC = ST->isWave32() ? AMDGPU::VCC_LO : AMDGPU::VCC;

		MachineInstr *MaskUpdateMI =
		BuildMI(MBB, MI, DL, TII->get(AndN2Opc), LiveMaskReg)
		.addReg(LiveMaskReg)
		.addReg(VCC);

		// State of SCC represents whether any lanes are live in mask,
		// if SCC is 0 then no lanes will be alive anymore.
		MachineInstr *EarlyTermMI =
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::SI_EARLY_TERMINATE_SCC0));

		MachineInstr *ExecMaskMI =
		BuildMI(MBB, MI, DL, TII->get(AndN2Opc), Exec).addReg(Exec).addReg(VCC);

		assert(MBB.succ_size() == 1);
		MachineInstr *NewTerm = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_BRANCH))
		.addMBB(*MBB.succ_begin());

		// Update live intervals
		LIS->ReplaceMachineInstrInMaps(MI, *VcmpMI);
		MBB.remove(&MI);

		LIS->InsertMachineInstrInMaps(*MaskUpdateMI);
		LIS->InsertMachineInstrInMaps(*ExecMaskMI);
		LIS->InsertMachineInstrInMaps(*EarlyTermMI);
		LIS->InsertMachineInstrInMaps(*NewTerm);

		return NewTerm;
		}
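
The mask arithmetic emitted by the kill lowering above can be modeled on plain 64-bit integers. The following is only an illustrative sketch and is not part of the patch: `LaneState`, `wqm`, `applyKill`, and `applyDemote` are hypothetical names, wave64 is assumed, and `wqm` assumes that S_WQM_B64 sets all four bits of any quad that has at least one bit set. `Killed` plays the role of VCC (or of the killed-lane mask computed in lowerKillI1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-wave state touched by the lowered kill/demote.
struct LaneState {
  uint64_t LiveMask; // lanes that are still real (non-helper) pixels
  uint64_t Exec;     // lanes currently executing
  bool Terminated;   // models SI_EARLY_TERMINATE_SCC0 taking effect
};

// Assumed S_WQM_B64 behavior: a quad (4 adjacent lanes) with any bit set
// becomes fully set.
inline uint64_t wqm(uint64_t Mask) {
  uint64_t Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    uint64_t QuadBits = 0xFull << (Quad * 4);
    if (Mask & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

// Kill: remove lanes from the live mask, terminate if none remain,
// otherwise remove the same lanes from EXEC.
inline LaneState applyKill(LaneState S, uint64_t Killed) {
  S.LiveMask &= ~Killed;      // S_ANDN2 LiveMask, LiveMask, Killed
  bool SCC = S.LiveMask != 0; // SCC is set when the result is non-zero
  if (!SCC) {                 // SI_EARLY_TERMINATE_SCC0
    S.Terminated = true;
    return S;
  }
  S.Exec &= ~Killed;          // S_ANDN2 EXEC, EXEC, Killed
  return S;
}

// Demote: same live-mask update, but EXEC keeps helper lanes in any quad
// that still contains a live lane.
inline LaneState applyDemote(LaneState S, uint64_t Killed) {
  S.LiveMask &= ~Killed;
  if (S.LiveMask == 0) {
    S.Terminated = true;
    return S;
  }
  S.Exec &= wqm(S.LiveMask);  // S_WQM tmp, LiveMask; S_AND EXEC, EXEC, tmp
  return S;
}
```

In this model a demote of lane 0 in a quad whose lane 1 is still live leaves EXEC untouched for that quad, while a kill of the same lane clears its EXEC bit; that difference is the point of the separate SI_DEMOTE_I1 pseudo.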

		MachineInstr *SIWholeQuadMode::lowerKillI1(MachineBasicBlock &MBB,
		MachineInstr &MI, bool isDemote) {
		const DebugLoc &DL = MI.getDebugLoc();
		MachineInstr *MaskUpdateMI = nullptr;

		const MachineOperand &Op = MI.getOperand(0);
		int64_t KillVal = MI.getOperand(1).getImm();
		MachineInstr *ComputeKilledMaskMI = nullptr;
		Register CndReg = !Op.isImm() ? Op.getReg() : Register();
		Register TmpReg;

		// Is this a static or dynamic kill?
		if (Op.isImm()) {
		if (Op.getImm() == KillVal) {
		// Static: all active lanes are killed
		MaskUpdateMI = BuildMI(MBB, MI, DL, TII->get(AndN2Opc), LiveMaskReg)
		.addReg(LiveMaskReg)
		.addReg(Exec);
		} else {
		// Static: kill does nothing
		MachineInstr *NewTerm = nullptr;
		if (!isDemote) {
		assert(MBB.succ_size() == 1);
		NewTerm = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_BRANCH))
		.addMBB(*MBB.succ_begin());
		LIS->ReplaceMachineInstrInMaps(MI, *NewTerm);
		} else {
		LIS->RemoveMachineInstrFromMaps(MI);
		}
		MBB.remove(&MI);
		return NewTerm;
		}
		} else {
		if (!KillVal) {
		// Op represents live lanes after kill,
		// so exec mask needs to be factored in.
		TmpReg = MRI->createVirtualRegister(TRI->getBoolRC());
		ComputeKilledMaskMI =
		BuildMI(MBB, MI, DL, TII->get(XorOpc), TmpReg).add(Op).addReg(Exec);
		MaskUpdateMI = BuildMI(MBB, MI, DL, TII->get(AndN2Opc), LiveMaskReg)
		.addReg(LiveMaskReg)
		.addReg(TmpReg);
		} else {
		// Op represents lanes to kill
		MaskUpdateMI = BuildMI(MBB, MI, DL, TII->get(AndN2Opc), LiveMaskReg)
		.addReg(LiveMaskReg)
		.add(Op);
		}
		}

		// State of SCC represents whether any lanes are live in mask,
		// if SCC is 0 then no lanes will be alive anymore.
		MachineInstr *EarlyTermMI =
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::SI_EARLY_TERMINATE_SCC0));

		// In the case we got this far some lanes are still live,
		// update EXEC to deactivate lanes as appropriate.
		MachineInstr *NewTerm;
		MachineInstr *WQMMaskMI = nullptr;
		Register LiveMaskWQM;
		if (isDemote) {
		// Demotes deactivate quads with only helper lanes
		LiveMaskWQM = MRI->createVirtualRegister(TRI->getBoolRC());
		WQMMaskMI =
		BuildMI(MBB, MI, DL, TII->get(WQMOpc), LiveMaskWQM).addReg(LiveMaskReg);
		NewTerm = BuildMI(MBB, MI, DL, TII->get(AndOpc), Exec)
		.addReg(Exec)
		.addReg(LiveMaskWQM);
		} else {
		// Kills deactivate lanes
		if (Op.isImm()) {
		unsigned MovOpc = ST->isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
		NewTerm = BuildMI(MBB, &MI, DL, TII->get(MovOpc), Exec).addImm(0);
		} else {
		NewTerm = BuildMI(MBB, &MI, DL, TII->get(AndOpc), Exec)
		.addReg(Exec)
		.addReg(LiveMaskReg);
		}
		}

		// Update live intervals
		LIS->RemoveMachineInstrFromMaps(MI);
		MBB.remove(&MI);
		assert(EarlyTermMI);
		assert(MaskUpdateMI);
		assert(NewTerm);
		if (ComputeKilledMaskMI)
		LIS->InsertMachineInstrInMaps(*ComputeKilledMaskMI);
		LIS->InsertMachineInstrInMaps(*MaskUpdateMI);
		LIS->InsertMachineInstrInMaps(*EarlyTermMI);
		if (WQMMaskMI)
		LIS->InsertMachineInstrInMaps(*WQMMaskMI);
		LIS->InsertMachineInstrInMaps(*NewTerm);

		if (CndReg) {
		LIS->removeInterval(CndReg);
		LIS->createAndComputeVirtRegInterval(CndReg);
		}
		if (TmpReg)
		LIS->createAndComputeVirtRegInterval(TmpReg);
		if (LiveMaskWQM)
		LIS->createAndComputeVirtRegInterval(LiveMaskWQM);

		return NewTerm;
		}
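The mask arithmetic in lowerKillI1 can be sketched outside of LLVM as plain 64-bit integer operations. This is only an illustrative model, not the pass itself: `LaneState`, `applyKill`, and `wholeQuadMask` are hypothetical names, wave64 is assumed, and the static (immediate operand) cases are not modeled separately.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of S_WQM_B64: any quad (4 adjacent lanes) with at least
// one set bit becomes fully set.
std::uint64_t wholeQuadMask(std::uint64_t Mask) {
  std::uint64_t Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    const std::uint64_t QuadBits = std::uint64_t{0xF} << (Quad * 4);
    if (Mask & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

// Hypothetical container for the values the lowering manipulates.
struct LaneState {
  std::uint64_t LiveMask; // lanes that are still live (not helpers)
  std::uint64_t Exec;     // lanes currently executing
  bool Terminated;        // set when the early-terminate path is taken
};

// Model of a dynamic kill/demote. Cond is the condition lane mask and is
// assumed to have no bits set outside Exec.
LaneState applyKill(LaneState S, std::uint64_t Cond, bool KillVal,
                    bool IsDemote) {
  // With KillVal == false, Cond holds the lanes that stay live, so the killed
  // lanes are the active lanes missing from Cond (the XOR with Exec).
  const std::uint64_t Killed = KillVal ? Cond : (Cond ^ S.Exec);
  S.LiveMask &= ~Killed; // the ANDN2 update of the live mask
  if (S.LiveMask == 0) { // SCC == 0: no live lanes remain, terminate early
    S.Exec = 0;
    S.Terminated = true;
    return S;
  }
  // Kills drop dead lanes from Exec; demotes keep helper lanes alive in any
  // quad that still contains a live lane.
  S.Exec &= IsDemote ? wholeQuadMask(S.LiveMask) : S.LiveMask;
  return S;
}
```

For example, starting from eight live, active lanes (0xFF) and a condition that keeps only lane 0, a kill leaves Exec at 0x01 while a demote leaves it at 0x0F, since lanes 1 to 3 continue as helpers.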

		// Replace (or supplement) instructions accessing live mask.
		// This can only happen once all the live mask registers have been created
		// and the execute state (WQM/WWM/Exact) of instructions is known.
		void SIWholeQuadMode::lowerBlock(MachineBasicBlock &MBB) {
		auto BII = Blocks.find(&MBB);
		if (BII == Blocks.end())
		return;

		const BlockInfo &BI = BII->second;
		if (!BI.NeedsLowering)
		return;

		LLVM_DEBUG(dbgs() << "\nLowering block " << printMBBReference(MBB) << ":\n");

		SmallVector<MachineInstr *, 4> SplitPoints;
		char State = BI.InitialState;

		auto II = MBB.getFirstNonPHI(), IE = MBB.end();
		while (II != IE) {
		auto Next = std::next(II);
		MachineInstr &MI = *II;

		if (StateTransition.count(&MI))
		State = StateTransition[&MI];

		MachineInstr *SplitPoint = nullptr;
		switch (MI.getOpcode()) {
		case AMDGPU::SI_DEMOTE_I1:
		SplitPoint = lowerKillI1(MBB, MI, State == StateWQM);
		break;
		case AMDGPU::SI_KILL_I1_TERMINATOR:
		SplitPoint = lowerKillI1(MBB, MI, false);
		break;
		case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
		SplitPoint = lowerKillF32(MBB, MI);
		break;
		default:
		break;
		}
		if (SplitPoint)
		SplitPoints.push_back(SplitPoint);

		II = Next;
		}

		// Perform splitting after instruction scan to simplify iteration.
		if (!SplitPoints.empty()) {
		MachineBasicBlock *BB = &MBB;
		for (MachineInstr *MI : SplitPoints) {
		BB = splitBlock(BB, MI);
		}
		}
		}
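The state tracking in lowerBlock can be illustrated with a small standalone model. This is a sketch under assumptions: the instruction names are made-up strings, transitions are keyed by index rather than by MachineInstr pointer, and the state constants are assumed values chosen for illustration. It shows how a demote is only treated as a demote while the tracked state is WQM, and otherwise falls back to kill behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed state encoding, for illustration only.
constexpr char StateWQM = 0x1;
constexpr char StateWWM = 0x2;
constexpr char StateExact = 0x4;

// For each "demote" in Instrs (in order), report whether it would be lowered
// as a demote (true: tracked state is WQM) or as a plain kill (false).
// Transitions maps an instruction index to the state it switches to,
// standing in for the StateTransition map filled by toExact/toWQM/toWWM.
std::vector<bool>
classifyDemotes(const std::vector<std::string> &Instrs,
                const std::map<std::size_t, char> &Transitions,
                char InitialState) {
  // A block is assumed to start in either WQM or Exact.
  assert(InitialState == StateWQM || InitialState == StateExact);
  std::vector<bool> Result;
  char State = InitialState;
  for (std::size_t I = 0; I < Instrs.size(); ++I) {
    auto It = Transitions.find(I);
    if (It != Transitions.end())
      State = It->second; // a recorded transition takes effect here
    if (Instrs[I] == "demote")
      Result.push_back(State == StateWQM);
  }
  return Result;
}
```

Recording the transitions first and replaying them in a later scan is what lets the lowering run as a second pass, after every block has had its state changes inserted.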

// Return an iterator in the (inclusive) range [First, Last] at which		// Return an iterator in the (inclusive) range [First, Last] at which
// instructions can be safely inserted, keeping in mind that some of the		// instructions can be safely inserted, keeping in mind that some of the
// instructions we want to add necessarily clobber SCC.		// instructions we want to add necessarily clobber SCC.
MachineBasicBlock::iterator SIWholeQuadMode::prepareInsertion(		MachineBasicBlock::iterator SIWholeQuadMode::prepareInsertion(
MachineBasicBlock &MBB, MachineBasicBlock::iterator First,		MachineBasicBlock &MBB, MachineBasicBlock::iterator First,
MachineBasicBlock::iterator Last, bool PreferLast, bool SaveSCC) {		MachineBasicBlock::iterator Last, bool PreferLast, bool SaveSCC) {
if (!SaveSCC)		if (!SaveSCC)
return PreferLast ? Last : First;		return PreferLast ? Last : First;
[... 38 lines not shown; context: MachineBasicBlock::iterator SIWholeQuadMode::prepareInsertion( ...]
else {		else {
assert(Idx == LIS->getMBBEndIdx(&MBB));		assert(Idx == LIS->getMBBEndIdx(&MBB));
MBBI = MBB.end();		MBBI = MBB.end();
}		}

// Move insertion point past any operations modifying EXEC.		// Move insertion point past any operations modifying EXEC.
// This assumes that the value of SCC defined by any of these operations		// This assumes that the value of SCC defined by any of these operations
// does not need to be preserved.		// does not need to be preserved.
while (MBBI != Last) {		while (MBBI != Last) {
bool IsExecDef = false;		bool IsExecDef = false;
		arsenm: use Register
		critson (author): I assume you mean the variable name Src -> Register?
		foad: I'm pretty sure he meant s/unsigned Src/Register Src/. clang-tidy thinks the same.
for (const MachineOperand &MO : MBBI->operands()) {		for (const MachineOperand &MO : MBBI->operands()) {
if (MO.isReg() && MO.isDef()) {		if (MO.isReg() && MO.isDef()) {
IsExecDef |=		IsExecDef |=
MO.getReg() == AMDGPU::EXEC_LO || MO.getReg() == AMDGPU::EXEC;		MO.getReg() == AMDGPU::EXEC_LO || MO.getReg() == AMDGPU::EXEC;
}		}
}		}
if (!IsExecDef)		if (!IsExecDef)
break;		break;
MBBI++;		MBBI++;
S = nullptr;		S = nullptr;
}		}

if (S)		if (S)
MBBI = saveSCC(MBB, MBBI);		MBBI = saveSCC(MBB, MBBI);

return MBBI;		return MBBI;
}		}
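The EXEC-skipping loop at the end of prepareInsertion can be modelled on a toy instruction list. This is a minimal sketch, assuming a hypothetical `Instr` type that only records the names of the registers it defines; `skipExecDefs` is not an LLVM function, and the SCC-saving part of the real code is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: only its defined
// registers matter for this sketch.
struct Instr {
  std::vector<std::string> Defs;
};

// Advance from Begin towards Last, stepping over every instruction that
// defines EXEC or EXEC_LO, and stop at the first one that does not (or at
// Last). Returns the index of the resulting insertion point.
std::size_t skipExecDefs(const std::vector<Instr> &Block, std::size_t Begin,
                         std::size_t Last) {
  assert(Begin <= Last && Last <= Block.size());
  std::size_t I = Begin;
  while (I != Last) {
    bool IsExecDef = false;
    for (const std::string &Reg : Block[I].Defs)
      IsExecDef |= Reg == "EXEC_LO" || Reg == "EXEC";
    if (!IsExecDef)
      break;
    ++I;
  }
  return I;
}
```

As in the original loop, the scan never moves past Last, so an EXEC definition located after the permitted range does not affect the result.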

void SIWholeQuadMode::toExact(MachineBasicBlock &MBB,		void SIWholeQuadMode::toExact(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before,		MachineBasicBlock::iterator Before,
unsigned SaveWQM, unsigned LiveMaskReg) {		Register SaveWQM) {
MachineInstr *MI;		MachineInstr *MI;

if (SaveWQM) {		if (SaveWQM) {
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(ST->isWave32() ?		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AndSaveExecOpc), SaveWQM)
AMDGPU::S_AND_SAVEEXEC_B32 : AMDGPU::S_AND_SAVEEXEC_B64),
SaveWQM)
.addReg(LiveMaskReg);		.addReg(LiveMaskReg);
} else {		} else {
unsigned Exec = ST->isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AndOpc), Exec)
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(ST->isWave32() ?
AMDGPU::S_AND_B32 : AMDGPU::S_AND_B64),
Exec)
.addReg(Exec)		.addReg(Exec)
.addReg(LiveMaskReg);		.addReg(LiveMaskReg);
}		}

LIS->InsertMachineInstrInMaps(*MI);		LIS->InsertMachineInstrInMaps(*MI);
		StateTransition[MI] = StateExact;
}		}

void SIWholeQuadMode::toWQM(MachineBasicBlock &MBB,		void SIWholeQuadMode::toWQM(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before,		MachineBasicBlock::iterator Before,
unsigned SavedWQM) {		Register SavedWQM) {
MachineInstr *MI;		MachineInstr *MI;

unsigned Exec = ST->isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
if (SavedWQM) {		if (SavedWQM) {
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::COPY), Exec)		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::COPY), Exec)
.addReg(SavedWQM);		.addReg(SavedWQM);
} else {		} else {
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(ST->isWave32() ?		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(WQMOpc), Exec).addReg(Exec);
AMDGPU::S_WQM_B32 : AMDGPU::S_WQM_B64),
Exec)
.addReg(Exec);
}		}

LIS->InsertMachineInstrInMaps(*MI);		LIS->InsertMachineInstrInMaps(*MI);
		StateTransition[MI] = StateWQM;
}		}

void SIWholeQuadMode::toWWM(MachineBasicBlock &MBB,		void SIWholeQuadMode::toWWM(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before,		MachineBasicBlock::iterator Before,
unsigned SaveOrig) {		Register SaveOrig) {
MachineInstr *MI;		MachineInstr *MI;

assert(SaveOrig);		assert(SaveOrig);
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::ENTER_WWM), SaveOrig)		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::ENTER_WWM), SaveOrig)
.addImm(-1);		.addImm(-1);
LIS->InsertMachineInstrInMaps(*MI);		LIS->InsertMachineInstrInMaps(*MI);
		StateTransition[MI] = StateWWM;
}		}

void SIWholeQuadMode::fromWWM(MachineBasicBlock &MBB,		void SIWholeQuadMode::fromWWM(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before,		MachineBasicBlock::iterator Before,
unsigned SavedOrig) {		Register SavedOrig, char NonWWMState) {
MachineInstr *MI;		MachineInstr *MI;

assert(SavedOrig);		assert(SavedOrig);
MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::EXIT_WWM),		MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::EXIT_WWM), Exec)
ST->isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC)
.addReg(SavedOrig);		.addReg(SavedOrig);
LIS->InsertMachineInstrInMaps(*MI);		LIS->InsertMachineInstrInMaps(*MI);
		StateTransition[MI] = NonWWMState;
}		}

void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, unsigned LiveMaskReg,		void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, bool isEntry) {
bool isEntry) {
auto BII = Blocks.find(&MBB);		auto BII = Blocks.find(&MBB);
if (BII == Blocks.end())		if (BII == Blocks.end())
return;		return;

const BlockInfo &BI = BII->second;		BlockInfo &BI = BII->second;

// This is a non-entry block that is WQM throughout, so no need to do		// This is a non-entry block that is WQM throughout, so no need to do
// anything.		// anything.
if (!isEntry && BI.Needs == StateWQM && BI.OutNeeds != StateExact)		if (!isEntry && BI.Needs == StateWQM && BI.OutNeeds != StateExact) {
		BI.InitialState = StateWQM;
return;		return;
		}

LLVM_DEBUG(dbgs() << "\nProcessing block " << printMBBReference(MBB)		LLVM_DEBUG(dbgs() << "\nProcessing block " << printMBBReference(MBB)
<< ":\n");		<< ":\n");

unsigned SavedWQMReg = 0;		Register SavedWQMReg;
unsigned SavedNonWWMReg = 0;		Register SavedNonWWMReg;
bool WQMFromExec = isEntry;		bool WQMFromExec = isEntry;
char State = (isEntry || !(BI.InNeeds & StateWQM)) ? StateExact : StateWQM;		char State = (isEntry || !(BI.InNeeds & StateWQM)) ? StateExact : StateWQM;
char NonWWMState = 0;		char NonWWMState = 0;
const TargetRegisterClass *BoolRC = TRI->getBoolRC();		const TargetRegisterClass *BoolRC = TRI->getBoolRC();

auto II = MBB.getFirstNonPHI(), IE = MBB.end();		auto II = MBB.getFirstNonPHI(), IE = MBB.end();
if (isEntry) {		if (isEntry) {
// Skip the instruction that saves LiveMask		// Skip the instruction that saves LiveMask
if (II != IE && II->getOpcode() == AMDGPU::COPY)		if (II != IE && II->getOpcode() == AMDGPU::COPY)
++II;		++II;
}		}

// This stores the first instruction where it's safe to switch from WQM to		// This stores the first instruction where it's safe to switch from WQM to
// Exact or vice versa.		// Exact or vice versa.
MachineBasicBlock::iterator FirstWQM = IE;		MachineBasicBlock::iterator FirstWQM = IE;

// This stores the first instruction where it's safe to switch from WWM to		// This stores the first instruction where it's safe to switch from WWM to
// Exact/WQM or to switch to WWM. It must always be the same as, or after,		// Exact/WQM or to switch to WWM. It must always be the same as, or after,
// FirstWQM since if it's safe to switch to/from WWM, it must be safe to		// FirstWQM since if it's safe to switch to/from WWM, it must be safe to
// switch to/from WQM as well.		// switch to/from WQM as well.
MachineBasicBlock::iterator FirstWWM = IE;		MachineBasicBlock::iterator FirstWWM = IE;

		// Record initial state in block information.
		BI.InitialState = State;

for (;;) {		for (;;) {
MachineBasicBlock::iterator Next = II;		MachineBasicBlock::iterator Next = II;
char Needs = StateExact | StateWQM; // WWM is disabled by default		char Needs = StateExact | StateWQM; // WWM is disabled by default
char OutNeeds = 0;		char OutNeeds = 0;

if (FirstWQM == IE)		if (FirstWQM == IE)
FirstWQM = II;		FirstWQM = II;

[... 48 lines not shown; context: if (!(Needs & State)) { ...]
}		}

MachineBasicBlock::iterator Before =		MachineBasicBlock::iterator Before =
prepareInsertion(MBB, First, II, Needs == StateWQM,		prepareInsertion(MBB, First, II, Needs == StateWQM,
Needs == StateExact || WQMFromExec);		Needs == StateExact || WQMFromExec);

if (State == StateWWM) {		if (State == StateWWM) {
assert(SavedNonWWMReg);		assert(SavedNonWWMReg);
fromWWM(MBB, Before, SavedNonWWMReg);		fromWWM(MBB, Before, SavedNonWWMReg, NonWWMState);
LIS->createAndComputeVirtRegInterval(SavedNonWWMReg);		LIS->createAndComputeVirtRegInterval(SavedNonWWMReg);
SavedNonWWMReg = 0;		SavedNonWWMReg = 0;
State = NonWWMState;		State = NonWWMState;
}		}

if (Needs == StateWWM) {		if (Needs == StateWWM) {
NonWWMState = State;		NonWWMState = State;
assert(!SavedNonWWMReg);		assert(!SavedNonWWMReg);
SavedNonWWMReg = MRI->createVirtualRegister(BoolRC);		SavedNonWWMReg = MRI->createVirtualRegister(BoolRC);
toWWM(MBB, Before, SavedNonWWMReg);		toWWM(MBB, Before, SavedNonWWMReg);
State = StateWWM;		State = StateWWM;
} else {		} else {
if (State == StateWQM && (Needs & StateExact) && !(Needs & StateWQM)) {		if (State == StateWQM && (Needs & StateExact) && !(Needs & StateWQM)) {
if (!WQMFromExec && (OutNeeds & StateWQM)) {		if (!WQMFromExec && (OutNeeds & StateWQM)) {
assert(!SavedWQMReg);		assert(!SavedWQMReg);
SavedWQMReg = MRI->createVirtualRegister(BoolRC);		SavedWQMReg = MRI->createVirtualRegister(BoolRC);
}		}

toExact(MBB, Before, SavedWQMReg, LiveMaskReg);		toExact(MBB, Before, SavedWQMReg);
State = StateExact;		State = StateExact;
} else if (State == StateExact && (Needs & StateWQM) &&		} else if (State == StateExact && (Needs & StateWQM) &&
!(Needs & StateExact)) {		!(Needs & StateExact)) {
assert(WQMFromExec == (SavedWQMReg == 0));		assert(WQMFromExec == (SavedWQMReg == 0));

toWQM(MBB, Before, SavedWQMReg);		toWQM(MBB, Before, SavedWQMReg);

if (SavedWQMReg) {		if (SavedWQMReg) {
[... 19 lines not shown; context: if (II == IE) ...]
break;		break;

II = Next;		II = Next;
}		}
assert(!SavedWQMReg);		assert(!SavedWQMReg);
assert(!SavedNonWWMReg);		assert(!SavedNonWWMReg);
}		}

void SIWholeQuadMode::lowerLiveMaskQueries(unsigned LiveMaskReg) {		void SIWholeQuadMode::lowerLiveMaskQueries() {
for (MachineInstr *MI : LiveMaskQueries) {		for (MachineInstr *MI : LiveMaskQueries) {
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();
Register Dest = MI->getOperand(0).getReg();		Register Dest = MI->getOperand(0).getReg();

MachineInstr *Copy =		MachineInstr *Copy =
BuildMI(*MI->getParent(), MI, DL, TII->get(AMDGPU::COPY), Dest)		BuildMI(*MI->getParent(), MI, DL, TII->get(AMDGPU::COPY), Dest)
.addReg(LiveMaskReg);		.addReg(LiveMaskReg);

[... 15 lines not shown; context: if (TRI->isVGPR(*MRI, Reg)) { ...]
if (SubReg)		if (SubReg)
regClass = TRI->getSubRegClass(regClass, SubReg);		regClass = TRI->getSubRegClass(regClass, SubReg);

const unsigned MovOp = TII->getMovOpcode(regClass);		const unsigned MovOp = TII->getMovOpcode(regClass);
MI->setDesc(TII->get(MovOp));		MI->setDesc(TII->get(MovOp));

// And make it implicitly depend on exec (like all VALU movs should do).		// And make it implicitly depend on exec (like all VALU movs should do).
MI->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true));		MI->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true));
} else if (!MRI->isSSA()) {		} else {
// Remove early-clobber and exec dependency from simple SGPR copies.		// Remove early-clobber and exec dependency from simple SGPR copies.
// This allows some to be eliminated during/post RA.		// This allows some to be eliminated during/post RA.
LLVM_DEBUG(dbgs() << "simplify SGPR copy: " << *MI);		LLVM_DEBUG(dbgs() << "simplify SGPR copy: " << *MI);
if (MI->getOperand(0).isEarlyClobber()) {		if (MI->getOperand(0).isEarlyClobber()) {
LIS->removeInterval(Reg);		LIS->removeInterval(Reg);
MI->getOperand(0).setIsEarlyClobber(false);		MI->getOperand(0).setIsEarlyClobber(false);
LIS->createAndComputeVirtRegInterval(Reg);		LIS->createAndComputeVirtRegInterval(Reg);
}		}
[... 19 lines not shown; context: for (MachineInstr *MI : LowerToCopyInstrs) { ...]
} else {		} else {
assert(MI->getNumExplicitOperands() == 2);		assert(MI->getNumExplicitOperands() == 2);
}		}

MI->setDesc(TII->get(AMDGPU::COPY));		MI->setDesc(TII->get(AMDGPU::COPY));
}		}
}		}

		void SIWholeQuadMode::lowerKillInstrs() {
		for (MachineInstr *MI : KillInstrs) {
		MachineBasicBlock *MBB = MI->getParent();
		MachineInstr *SplitPoint = nullptr;
		switch (MI->getOpcode()) {
		case AMDGPU::SI_DEMOTE_I1:
		SplitPoint = lowerKillI1(*MBB, *MI, true);
		break;
		case AMDGPU::SI_KILL_I1_TERMINATOR:
		SplitPoint = lowerKillI1(*MBB, *MI, false);
		break;
		case AMDGPU::SI_KILL_F32_COND_IMM_TERMINATOR:
		SplitPoint = lowerKillF32(*MBB, *MI);
		break;
		default:
		continue;
		}
		if (SplitPoint)
		splitBlock(MBB, SplitPoint);
		}
		}

bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {		bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
Instructions.clear();		Instructions.clear();
Blocks.clear();		Blocks.clear();
LiveMaskQueries.clear();		LiveMaskQueries.clear();
LowerToCopyInstrs.clear();		LowerToCopyInstrs.clear();
LowerToMovInstrs.clear();		LowerToMovInstrs.clear();
CallingConv = MF.getFunction().getCallingConv();		KillInstrs.clear();
		StateTransition.clear();

ST = &MF.getSubtarget<GCNSubtarget>();		ST = &MF.getSubtarget<GCNSubtarget>();

TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
LIS = &getAnalysis<LiveIntervals>();		LIS = &getAnalysis<LiveIntervals>();
		MDT = &getAnalysis<MachineDominatorTree>();
		PDT = &getAnalysis<MachinePostDominatorTree>();

if (ST->isWave32()) {		if (ST->isWave32()) {
AndOpc = AMDGPU::S_AND_B32;		AndOpc = AMDGPU::S_AND_B32;
XorTermrOpc = AMDGPU::S_XOR_B32_term;		AndN2Opc = AMDGPU::S_ANDN2_B32;
		XorOpc = AMDGPU::S_XOR_B32;
		AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B32;
OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B32;		OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B32;
		WQMOpc = AMDGPU::S_WQM_B32;
Exec = AMDGPU::EXEC_LO;		Exec = AMDGPU::EXEC_LO;
} else {		} else {
AndOpc = AMDGPU::S_AND_B64;		AndOpc = AMDGPU::S_AND_B64;
XorTermrOpc = AMDGPU::S_XOR_B64_term;		AndN2Opc = AMDGPU::S_ANDN2_B64;
		XorOpc = AMDGPU::S_XOR_B64;
		AndSaveExecOpc = AMDGPU::S_AND_SAVEEXEC_B64;
OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B64;		OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B64;
		WQMOpc = AMDGPU::S_WQM_B64;
Exec = AMDGPU::EXEC;		Exec = AMDGPU::EXEC;
}		}

char GlobalFlags = analyzeFunction(MF);		const char GlobalFlags = analyzeFunction(MF);
unsigned LiveMaskReg = 0;		const bool NeedsLiveMask = !(KillInstrs.empty() && LiveMaskQueries.empty());
if (!(GlobalFlags & StateWQM)) {
lowerLiveMaskQueries(Exec);		LiveMaskReg = Exec;
if (!(GlobalFlags & StateWWM) && LowerToCopyInstrs.empty() && LowerToMovInstrs.empty())
		// Shader is simple: it needs neither WQM nor WWM, and no copy, mov, or kill lowering
		if (!(GlobalFlags & (StateWQM | StateWWM)) && LowerToCopyInstrs.empty() &&
		LowerToMovInstrs.empty() && KillInstrs.empty()) {
		lowerLiveMaskQueries();
return !LiveMaskQueries.empty();		return !LiveMaskQueries.empty();
} else {		}
// Store a copy of the original live mask when required
MachineBasicBlock &Entry = MF.front();		MachineBasicBlock &Entry = MF.front();
MachineBasicBlock::iterator EntryMI = Entry.getFirstNonPHI();		MachineBasicBlock::iterator EntryMI = Entry.getFirstNonPHI();

if (GlobalFlags & StateExact || !LiveMaskQueries.empty()) {		// Store a copy of the original live mask when required
		if (NeedsLiveMask || (GlobalFlags & StateWQM)) {
LiveMaskReg = MRI->createVirtualRegister(TRI->getBoolRC());		LiveMaskReg = MRI->createVirtualRegister(TRI->getBoolRC());
MachineInstr *MI = BuildMI(Entry, EntryMI, DebugLoc(),		MachineInstr *MI =
TII->get(AMDGPU::COPY), LiveMaskReg)		BuildMI(Entry, EntryMI, DebugLoc(), TII->get(AMDGPU::COPY), LiveMaskReg)
.addReg(Exec);		.addReg(Exec);
LIS->InsertMachineInstrInMaps(*MI);		LIS->InsertMachineInstrInMaps(*MI);
}		}

lowerLiveMaskQueries(LiveMaskReg);

if (GlobalFlags == StateWQM) {
// For a shader that needs only WQM, we can just set it once.
auto MI = BuildMI(Entry, EntryMI, DebugLoc(),
TII->get(ST->isWave32() ? AMDGPU::S_WQM_B32
: AMDGPU::S_WQM_B64),
Exec)
.addReg(Exec);
LIS->InsertMachineInstrInMaps(*MI);

lowerCopyInstrs();
// EntryMI may become invalid here
return true;
}
}

LLVM_DEBUG(printInfo());		LLVM_DEBUG(printInfo());

		lowerLiveMaskQueries();
lowerCopyInstrs();		lowerCopyInstrs();

// Handle the general case		// Shader only needs WQM
		if (GlobalFlags == StateWQM) {
		auto MI = BuildMI(Entry, EntryMI, DebugLoc(), TII->get(WQMOpc), Exec)
		.addReg(Exec);
		LIS->InsertMachineInstrInMaps(*MI);
		lowerKillInstrs();
		} else {
for (auto BII : Blocks)		for (auto BII : Blocks)
processBlock(BII.first, LiveMaskReg, BII.first == &MF.begin());		processBlock(*BII.first, BII.first == &Entry);
		// Lowering blocks causes block splitting so perform as a second pass.
		for (auto BII : Blocks)
		lowerBlock(*BII.first);
		}

if (LiveMaskReg)		// Compute live range for live mask
		if (LiveMaskReg != Exec)
LIS->createAndComputeVirtRegInterval(LiveMaskReg);		LIS->createAndComputeVirtRegInterval(LiveMaskReg);

// Physical registers like SCC aren't tracked by default anyway, so just		// Physical registers like SCC aren't tracked by default anyway, so just
// removing the ranges we computed is the simplest option for maintaining		// removing the ranges we computed is the simplest option for maintaining
// the analysis results.		// the analysis results.
LIS->removeRegUnit(*MCRegUnitIterator(MCRegister::from(AMDGPU::SCC), TRI));		LIS->removeRegUnit(*MCRegUnitIterator(MCRegister::from(AMDGPU::SCC), TRI));

		// If we performed any kills then recompute EXEC
		if (!KillInstrs.empty())
		LIS->removeRegUnit(*MCRegUnitIterator(AMDGPU::EXEC, TRI));

return true;		return true;
}		}

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wqm.demote.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -global-isel -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX9 %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10-32 %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10-64 %s

				define amdgpu_ps void @static_exact(float %arg0, float %arg1) {
				; SI-LABEL: static_exact:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 0
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: s_xor_b64 s[2:3], s[2:3], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[2:3]
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_cbranch_scc0 BB0_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB0_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: static_exact:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 0
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: s_xor_b64 s[2:3], s[2:3], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[2:3]
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_cbranch_scc0 BB0_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB0_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: static_exact:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s1, 0, 0
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s1, s1, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s1
				; GFX10-32-NEXT: s_cbranch_scc0 BB0_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB0_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: static_exact:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 0
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[2:3], s[2:3], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[2:3]
				; GFX10-64-NEXT: s_cbranch_scc0 BB0_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB0_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%c0 = fcmp olt float %arg0, 0.000000e+00
				%c1 = fcmp oge float %arg1, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 false)
				%tmp1 = select i1 %c0, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}

				define amdgpu_ps void @dynamic_exact(float %arg0, float %arg1) {
				; SI-LABEL: dynamic_exact:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; SI-NEXT: s_mov_b64 s[2:3], exec
				; SI-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_cbranch_scc0 BB1_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_and_b64 exec, exec, s[2:3]
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB1_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: dynamic_exact:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; GFX9-NEXT: s_mov_b64 s[2:3], exec
				; GFX9-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_cbranch_scc0 BB1_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_and_b64 exec, exec, s[2:3]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB1_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: dynamic_exact:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cmp_le_f32_e64 s0, 0, v1
				; GFX10-32-NEXT: s_mov_b32 s1, exec_lo
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s1, s1, s0
				; GFX10-32-NEXT: s_cbranch_scc0 BB1_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB1_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: dynamic_exact:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; GFX10-64-NEXT: s_mov_b64 s[2:3], exec
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; GFX10-64-NEXT: s_cbranch_scc0 BB1_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[2:3]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB1_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%c0 = fcmp olt float %arg0, 0.000000e+00
				%c1 = fcmp oge float %arg1, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 %c1)
				%tmp1 = select i1 %c0, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}

				define amdgpu_ps void @branch(float %arg0, float %arg1) {
				; SI-LABEL: branch:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: v_cvt_i32_f32_e32 v1, v1
				; SI-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 1
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: v_or_b32_e32 v0, v0, v1
				; SI-NEXT: v_and_b32_e32 v0, 1, v0
				; SI-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; SI-NEXT: s_xor_b64 s[4:5], vcc, s[2:3]
				; SI-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, 0
				; SI-NEXT: s_xor_b64 s[4:5], s[4:5], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[4:5]
				; SI-NEXT: s_cbranch_scc0 BB2_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[2:3]
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB2_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: branch:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 1
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX9-NEXT: s_xor_b64 s[4:5], vcc, s[2:3]
				; GFX9-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, 0
				; GFX9-NEXT: s_xor_b64 s[4:5], s[4:5], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[4:5]
				; GFX9-NEXT: s_cbranch_scc0 BB2_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB2_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: branch:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s1, 0, 1
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX10-32-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX10-32-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s2, vcc_lo, s1
				; GFX10-32-NEXT: s_and_saveexec_b32 s1, s2
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s2, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s2, s2, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s2
				; GFX10-32-NEXT: s_cbranch_scc0 BB2_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB2_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: branch:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, 1
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX10-64-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX10-64-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[4:5], vcc, s[2:3]
				; GFX10-64-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[4:5], s[4:5], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[4:5]
				; GFX10-64-NEXT: s_cbranch_scc0 BB2_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB2_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%i0 = fptosi float %arg0 to i32
				%i1 = fptosi float %arg1 to i32
				%c0 = or i32 %i0, %i1
				%c1 = and i32 %c0, 1
				%c2 = icmp eq i32 %c1, 0
				br i1 %c2, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%tmp1 = select i1 %c2, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}

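				; Demote behind a branch on %z. Both image samples follow the demote, and the
				; second sample's coordinate depends on the result of the first.
				define amdgpu_ps <4 x float> @wqm_demote_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {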
				; SI-LABEL: wqm_demote_1:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; SI-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; SI-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; SI-NEXT: s_cbranch_scc0 BB3_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[16:17]
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[14:15]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB3_5
				; SI-NEXT: BB3_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB3_5:
				;
				; GFX9-LABEL: wqm_demote_1:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; GFX9-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; GFX9-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; GFX9-NEXT: s_cbranch_scc0 BB3_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[14:15]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB3_5
				; GFX9-NEXT: BB3_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB3_5:
				;
				; GFX10-32-LABEL: wqm_demote_1:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cmp_ngt_f32_e32 vcc_lo, 0, v1
				; GFX10-32-NEXT: s_and_saveexec_b32 s13, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s28, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s14, s28, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, s14
				; GFX10-32-NEXT: s_cbranch_scc0 BB3_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_wqm_b32 s28, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s28
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB3_5
				; GFX10-32-NEXT: BB3_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB3_5:
				;
				; GFX10-64-LABEL: wqm_demote_1:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; GFX10-64-NEXT: s_and_saveexec_b64 s[28:29], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; GFX10-64-NEXT: s_cbranch_scc0 BB3_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB3_5
				; GFX10-64-NEXT: BB3_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB3_5:
				.entry:
				%z.cmp = fcmp olt float %z, 0.0
				br i1 %z.cmp, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}

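				; Like wqm_demote_1, but the branch condition is derived from the first sample,
				; so the demote sits between the two image samples.
				define amdgpu_ps <4 x float> @wqm_demote_2(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {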
				; SI-LABEL: wqm_demote_2:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; SI-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; SI-NEXT: s_cbranch_scc0 BB4_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[16:17]
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[14:15]
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB4_5
				; SI-NEXT: BB4_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB4_5:
				;
				; GFX9-LABEL: wqm_demote_2:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; GFX9-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; GFX9-NEXT: s_cbranch_scc0 BB4_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[14:15]
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB4_5
				; GFX9-NEXT: BB4_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB4_5:
				;
				; GFX10-32-LABEL: wqm_demote_2:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_cmp_ngt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s13, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s28, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s14, s28, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, s14
				; GFX10-32-NEXT: s_cbranch_scc0 BB4_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_wqm_b32 s28, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s28
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB4_5
				; GFX10-32-NEXT: BB4_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB4_5:
				;
				; GFX10-64-LABEL: wqm_demote_2:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[28:29], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[16:17], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[16:17], s[16:17], exec
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], s[16:17]
				; GFX10-64-NEXT: s_cbranch_scc0 BB4_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB4_5
				; GFX10-64-NEXT: BB4_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB4_5:
				.entry:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%z.cmp = fcmp olt float %tex0, 0.0
				br i1 %z.cmp, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}

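				; Like wqm_demote_2, but without control flow: the sample-derived condition is
				; passed directly to the demote as a dynamic i1 argument.
				define amdgpu_ps <4 x float> @wqm_demote_dynamic(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {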
				; SI-LABEL: wqm_demote_dynamic:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_xor_b64 s[14:15], vcc, exec
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; SI-NEXT: s_cbranch_scc0 BB5_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_wqm_b64 s[14:15], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[14:15]
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB5_3
				; SI-NEXT: BB5_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB5_3:
				;
				; GFX9-LABEL: wqm_demote_dynamic:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_xor_b64 s[14:15], vcc, exec
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; GFX9-NEXT: s_cbranch_scc0 BB5_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_wqm_b64 s[14:15], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB5_3
				; GFX9-NEXT: BB5_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB5_3:
				;
				; GFX10-32-LABEL: wqm_demote_dynamic:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s13, vcc_lo, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, s13
				; GFX10-32-NEXT: s_cbranch_scc0 BB5_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_wqm_b32 s13, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB5_3
				; GFX10-32-NEXT: BB5_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB5_3:
				;
				; GFX10-64-LABEL: wqm_demote_dynamic:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[14:15], vcc, exec
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; GFX10-64-NEXT: s_cbranch_scc0 BB5_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_wqm_b64 s[28:29], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB5_3
				; GFX10-64-NEXT: BB5_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB5_3:
				.entry:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%z.cmp = fcmp olt float %tex0, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 %z.cmp)
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}

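				; Demote behind a branch, then read llvm.amdgcn.live.mask, difference it across
				; quad lanes with DPP moves, and use the result to guard a second demote.
				define amdgpu_ps void @wqm_deriv(<2 x float> %input, float %arg, i32 %index) {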
				; SI-LABEL: wqm_deriv:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: s_movk_i32 s2, 0x3c00
				; SI-NEXT: s_bfe_u32 s4, 0, 0x100000
				; SI-NEXT: s_bfe_u32 s3, s2, 0x100000
				; SI-NEXT: s_lshl_b32 s2, s4, 16
				; SI-NEXT: s_or_b32 s2, s3, s2
				; SI-NEXT: s_lshl_b32 s3, s3, 16
				; SI-NEXT: s_or_b32 s3, s4, s3
				; SI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; SI-NEXT: ; %bb.1: ; %.demote0
				; SI-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; SI-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; SI-NEXT: s_cbranch_scc0 BB6_7
				; SI-NEXT: ; %bb.2: ; %.demote0
				; SI-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[6:7]
				; SI-NEXT: ; %bb.3: ; %.continue0
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: s_mov_b64 s[4:5], s[0:1]
				; SI-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[4:5]
				; SI-NEXT: v_mov_b32_e32 v1, v0
				; SI-NEXT: s_nop 1
				; SI-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: s_nop 1
				; SI-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0
				; SI-NEXT: s_and_b64 s[4:5], s[0:1], vcc
				; SI-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 1
				; SI-NEXT: s_xor_b64 s[6:7], s[4:5], s[6:7]
				; SI-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]
				; SI-NEXT: ; %bb.4: ; %.demote1
				; SI-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; SI-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; SI-NEXT: s_cbranch_scc0 BB6_7
				; SI-NEXT: ; %bb.5: ; %.demote1
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: ; %bb.6: ; %.continue1
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: v_mov_b32_e32 v0, s2
				; SI-NEXT: v_mov_b32_e32 v1, s3
				; SI-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB6_7:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: wqm_deriv:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: s_movk_i32 s3, 0x3c00
				; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote0
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; GFX9-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; GFX9-NEXT: s_cbranch_scc0 BB6_7
				; GFX9-NEXT: ; %bb.2: ; %.demote0
				; GFX9-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[6:7]
				; GFX9-NEXT: ; %bb.3: ; %.continue0
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_mov_b64 s[4:5], s[0:1]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[4:5]
				; GFX9-NEXT: v_mov_b32_e32 v1, v0
				; GFX9-NEXT: s_pack_ll_b32_b16 s2, s3, 0
				; GFX9-NEXT: s_pack_ll_b32_b16 s3, 0, s3
				; GFX9-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: s_nop 1
				; GFX9-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_b64 s[4:5], s[0:1], vcc
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 1
				; GFX9-NEXT: s_xor_b64 s[6:7], s[4:5], s[6:7]
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]
				; GFX9-NEXT: ; %bb.4: ; %.demote1
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; GFX9-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; GFX9-NEXT: s_cbranch_scc0 BB6_7
				; GFX9-NEXT: ; %bb.5: ; %.demote1
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: ; %bb.6: ; %.continue1
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: v_mov_b32_e32 v0, s2
				; GFX9-NEXT: v_mov_b32_e32 v1, s3
				; GFX9-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB6_7:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: wqm_deriv:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: s_movk_i32 s1, 0x3c00
				; GFX10-32-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s2, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote0
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s3, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s3, s3, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s3
				; GFX10-32-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-32-NEXT: ; %bb.2: ; %.demote0
				; GFX10-32-NEXT: s_wqm_b32 s3, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s3
				; GFX10-32-NEXT: ; %bb.3: ; %.continue0
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s2
				; GFX10-32-NEXT: s_mov_b32 s3, s0
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s3
				; GFX10-32-NEXT: v_mov_b32_e32 v1, v0
				; GFX10-32-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: v_cmp_eq_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s4, 0, 1
				; GFX10-32-NEXT: s_pack_ll_b32_b16 s2, s1, 0
				; GFX10-32-NEXT: s_pack_ll_b32_b16 s1, 0, s1
				; GFX10-32-NEXT: s_and_b32 s3, s0, vcc_lo
				; GFX10-32-NEXT: s_xor_b32 s4, s3, s4
				; GFX10-32-NEXT: s_and_saveexec_b32 s3, s4
				; GFX10-32-NEXT: ; %bb.4: ; %.demote1
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s4, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s4, s4, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s4
				; GFX10-32-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-32-NEXT: ; %bb.5: ; %.demote1
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: ; %bb.6: ; %.continue1
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s3
				; GFX10-32-NEXT: v_mov_b32_e32 v0, s2
				; GFX10-32-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-32-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB6_7:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: wqm_deriv:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: s_movk_i32 s2, 0x3c00
				; GFX10-64-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote0
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; GFX10-64-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-64-NEXT: ; %bb.2: ; %.demote0
				; GFX10-64-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[6:7]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue0
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_mov_b64 s[4:5], s[0:1]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[4:5]
				; GFX10-64-NEXT: v_mov_b32_e32 v1, v0
				; GFX10-64-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 1
				; GFX10-64-NEXT: s_pack_ll_b32_b16 s3, s2, 0
				; GFX10-64-NEXT: s_pack_ll_b32_b16 s2, 0, s2
				; GFX10-64-NEXT: s_and_b64 s[4:5], s[0:1], vcc
				; GFX10-64-NEXT: s_xor_b64 s[6:7], s[4:5], s[6:7]
				; GFX10-64-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]
				; GFX10-64-NEXT: ; %bb.4: ; %.demote1
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; GFX10-64-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-64-NEXT: ; %bb.5: ; %.demote1
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: ; %bb.6: ; %.continue1
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: v_mov_b32_e32 v0, s3
				; GFX10-64-NEXT: v_mov_b32_e32 v1, s2
				; GFX10-64-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB6_7:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%p0 = extractelement <2 x float> %input, i32 0
				%p1 = extractelement <2 x float> %input, i32 1
				%x0 = call float @llvm.amdgcn.interp.p1(float %p0, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%x1 = call float @llvm.amdgcn.interp.p2(float %x0, float %p1, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%argi = fptosi float %arg to i32
				%cond0 = icmp eq i32 %argi, 0
				br i1 %cond0, label %.continue0, label %.demote0

				.demote0:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue0

				.continue0:
				%live = call i1 @llvm.amdgcn.live.mask()
				%live.cond = select i1 %live, i32 0, i32 1065353216
				%live.v0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 85, i32 15, i32 15, i1 true)
				%live.v0f = bitcast i32 %live.v0 to float
				%live.v1 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 0, i32 15, i32 15, i1 true)
				%live.v1f = bitcast i32 %live.v1 to float
				%v0 = fsub float %live.v0f, %live.v1f
				%v0.wqm = call float @llvm.amdgcn.wqm.f32(float %v0)
				%cond1 = fcmp oeq float %v0.wqm, 0.000000e+00
				%cond2 = and i1 %live, %cond1
				br i1 %cond2, label %.continue1, label %.demote1

				.demote1:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue1

				.continue1:
				call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> <half 0xH3C00, half 0xH0000>, <2 x half> <half 0xH0000, half 0xH3C00>, i1 immarg true, i1 immarg true) #3
				ret void
				}

				define amdgpu_ps void @wqm_deriv_loop(<2 x float> %input, float %arg, i32 %index, i32 %limit) {
				; SI-LABEL: wqm_deriv_loop:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: s_movk_i32 s2, 0x3c00
				; SI-NEXT: s_bfe_u32 s4, 0, 0x100000
				; SI-NEXT: s_bfe_u32 s3, s2, 0x100000
				; SI-NEXT: s_lshl_b32 s2, s4, 16
				; SI-NEXT: s_or_b32 s2, s3, s2
				; SI-NEXT: s_lshl_b32 s3, s3, 16
				; SI-NEXT: s_or_b32 s3, s4, s3
				; SI-NEXT: s_mov_b32 s6, 0
				; SI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; SI-NEXT: ; %bb.1: ; %.demote0
				; SI-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 0
				; SI-NEXT: s_xor_b64 s[8:9], s[8:9], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[8:9]
				; SI-NEXT: s_cbranch_scc0 BB7_9
				; SI-NEXT: ; %bb.2: ; %.demote0
				; SI-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[8:9]
				; SI-NEXT: ; %bb.3: ; %.continue0.preheader
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: s_mov_b64 s[4:5], 0
				; SI-NEXT: v_mov_b32_e32 v0, s6
				; SI-NEXT: s_branch BB7_5
				; SI-NEXT: BB7_4: ; %.continue1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: s_or_b64 exec, exec, s[6:7]
				; SI-NEXT: v_add_u32_e32 v0, vcc, 1, v0
				; SI-NEXT: v_cmp_ge_i32_e32 vcc, v0, v1
				; SI-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; SI-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; SI-NEXT: s_cbranch_execz BB7_8
				; SI-NEXT: BB7_5: ; %.continue0
				; SI-NEXT: ; =>This Inner Loop Header: Depth=1
				; SI-NEXT: s_mov_b64 s[6:7], s[0:1]
				; SI-NEXT: v_cndmask_b32_e64 v2, v0, 0, s[6:7]
				; SI-NEXT: v_mov_b32_e32 v3, v2
				; SI-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 1
				; SI-NEXT: s_nop 0
				; SI-NEXT: v_mov_b32_dpp v3, v3 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: s_nop 1
				; SI-NEXT: v_subrev_f32_dpp v2, v2, v3 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec
				; SI-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
				; SI-NEXT: s_and_b64 s[6:7], s[0:1], vcc
				; SI-NEXT: s_xor_b64 s[8:9], s[6:7], s[8:9]
				; SI-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; SI-NEXT: s_cbranch_execz BB7_4
				; SI-NEXT: ; %bb.6: ; %.demote1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 0
				; SI-NEXT: s_xor_b64 s[8:9], s[8:9], exec
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], s[8:9]
				; SI-NEXT: s_cbranch_scc0 BB7_9
				; SI-NEXT: ; %bb.7: ; %.demote1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[8:9]
				; SI-NEXT: s_branch BB7_4
				; SI-NEXT: BB7_8: ; %.return
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: v_mov_b32_e32 v0, s2
				; SI-NEXT: v_mov_b32_e32 v1, s3
				; SI-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB7_9:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: wqm_deriv_loop:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: s_movk_i32 s3, 0x3c00
				; GFX9-NEXT: s_mov_b32 s6, 0
				; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote0
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 0
				; GFX9-NEXT: s_xor_b64 s[8:9], s[8:9], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[8:9]
				; GFX9-NEXT: s_cbranch_scc0 BB7_9
				; GFX9-NEXT: ; %bb.2: ; %.demote0
				; GFX9-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[8:9]
				; GFX9-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_pack_ll_b32_b16 s2, s3, 0
				; GFX9-NEXT: s_pack_ll_b32_b16 s3, 0, s3
				; GFX9-NEXT: s_mov_b64 s[4:5], 0
				; GFX9-NEXT: v_mov_b32_e32 v0, s6
				; GFX9-NEXT: s_branch BB7_5
				; GFX9-NEXT: BB7_4: ; %.continue1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: s_or_b64 exec, exec, s[6:7]
				; GFX9-NEXT: v_add_u32_e32 v0, 1, v0
				; GFX9-NEXT: v_cmp_ge_i32_e32 vcc, v0, v1
				; GFX9-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; GFX9-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_cbranch_execz BB7_8
				; GFX9-NEXT: BB7_5: ; %.continue0
				; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX9-NEXT: s_mov_b64 s[6:7], s[0:1]
				; GFX9-NEXT: v_cndmask_b32_e64 v2, v0, 0, s[6:7]
				; GFX9-NEXT: v_mov_b32_e32 v3, v2
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 1
				; GFX9-NEXT: s_nop 0
				; GFX9-NEXT: v_mov_b32_dpp v3, v3 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: s_nop 1
				; GFX9-NEXT: v_subrev_f32_dpp v2, v2, v3 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec
				; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
				; GFX9-NEXT: s_and_b64 s[6:7], s[0:1], vcc
				; GFX9-NEXT: s_xor_b64 s[8:9], s[6:7], s[8:9]
				; GFX9-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; GFX9-NEXT: s_cbranch_execz BB7_4
				; GFX9-NEXT: ; %bb.6: ; %.demote1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 0
				; GFX9-NEXT: s_xor_b64 s[8:9], s[8:9], exec
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], s[8:9]
				; GFX9-NEXT: s_cbranch_scc0 BB7_9
				; GFX9-NEXT: ; %bb.7: ; %.demote1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[8:9]
				; GFX9-NEXT: s_branch BB7_4
				; GFX9-NEXT: BB7_8: ; %.return
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: v_mov_b32_e32 v0, s2
				; GFX9-NEXT: v_mov_b32_e32 v1, s3
				; GFX9-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB7_9:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: wqm_deriv_loop:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: s_movk_i32 s2, 0x3c00
				; GFX10-32-NEXT: s_mov_b32 s1, 0
				; GFX10-32-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s3, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote0
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s4, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s4, s4, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s4
				; GFX10-32-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-32-NEXT: ; %bb.2: ; %.demote0
				; GFX10-32-NEXT: s_wqm_b32 s4, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s4
				; GFX10-32-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s3
				; GFX10-32-NEXT: v_mov_b32_e32 v0, s1
				; GFX10-32-NEXT: s_pack_ll_b32_b16 s3, s2, 0
				; GFX10-32-NEXT: s_pack_ll_b32_b16 s2, 0, s2
				; GFX10-32-NEXT: s_branch BB7_5
				; GFX10-32-NEXT: BB7_4: ; %.continue1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s4
				; GFX10-32-NEXT: v_add_nc_u32_e32 v0, 1, v0
				; GFX10-32-NEXT: v_cmp_ge_i32_e32 vcc_lo, v0, v1
				; GFX10-32-NEXT: s_or_b32 s1, vcc_lo, s1
				; GFX10-32-NEXT: s_andn2_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: s_cbranch_execz BB7_8
				; GFX10-32-NEXT: BB7_5: ; %.continue0
				; GFX10-32-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX10-32-NEXT: s_mov_b32 s4, s0
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s5, 0, 1
				; GFX10-32-NEXT: v_cndmask_b32_e64 v2, v0, 0, s4
				; GFX10-32-NEXT: v_mov_b32_e32 v3, v2
				; GFX10-32-NEXT: v_mov_b32_dpp v3, v3 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: v_subrev_f32_dpp v2, v2, v3 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec
				; GFX10-32-NEXT: v_cmp_eq_f32_e32 vcc_lo, 0, v2
				; GFX10-32-NEXT: s_and_b32 s4, s0, vcc_lo
				; GFX10-32-NEXT: s_xor_b32 s5, s4, s5
				; GFX10-32-NEXT: s_and_saveexec_b32 s4, s5
				; GFX10-32-NEXT: s_cbranch_execz BB7_4
				; GFX10-32-NEXT: ; %bb.6: ; %.demote1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: v_cmp_ne_u32_e64 s5, 0, 0
				; GFX10-32-NEXT: s_xor_b32 s5, s5, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, s5
				; GFX10-32-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-32-NEXT: ; %bb.7: ; %.demote1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: s_wqm_b32 s5, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s5
				; GFX10-32-NEXT: s_branch BB7_4
				; GFX10-32-NEXT: BB7_8: ; %.return
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: v_mov_b32_e32 v0, s3
				; GFX10-32-NEXT: v_mov_b32_e32 v1, s2
				; GFX10-32-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB7_9:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: wqm_deriv_loop:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: s_movk_i32 s2, 0x3c00
				; GFX10-64-NEXT: s_mov_b32 s3, 0
				; GFX10-64-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote0
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[6:7], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[6:7], s[6:7], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[6:7]
				; GFX10-64-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-64-NEXT: ; %bb.2: ; %.demote0
				; GFX10-64-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[6:7]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: v_mov_b32_e32 v0, s3
				; GFX10-64-NEXT: s_pack_ll_b32_b16 s3, s2, 0
				; GFX10-64-NEXT: s_pack_ll_b32_b16 s2, 0, s2
				; GFX10-64-NEXT: s_mov_b64 s[4:5], 0
				; GFX10-64-NEXT: s_branch BB7_5
				; GFX10-64-NEXT: BB7_4: ; %.continue1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[6:7]
				; GFX10-64-NEXT: v_add_nc_u32_e32 v0, 1, v0
				; GFX10-64-NEXT: v_cmp_ge_i32_e32 vcc, v0, v1
				; GFX10-64-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; GFX10-64-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_cbranch_execz BB7_8
				; GFX10-64-NEXT: BB7_5: ; %.continue0
				; GFX10-64-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX10-64-NEXT: s_mov_b64 s[6:7], s[0:1]
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 1
				; GFX10-64-NEXT: v_cndmask_b32_e64 v2, v0, 0, s[6:7]
				; GFX10-64-NEXT: v_mov_b32_e32 v3, v2
				; GFX10-64-NEXT: v_mov_b32_dpp v3, v3 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: v_subrev_f32_dpp v2, v2, v3 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec
				; GFX10-64-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
				; GFX10-64-NEXT: s_and_b64 s[6:7], s[0:1], vcc
				; GFX10-64-NEXT: s_xor_b64 s[8:9], s[6:7], s[8:9]
				; GFX10-64-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; GFX10-64-NEXT: s_cbranch_execz BB7_4
				; GFX10-64-NEXT: ; %bb.6: ; %.demote1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: v_cmp_ne_u32_e64 s[8:9], 0, 0
				; GFX10-64-NEXT: s_xor_b64 s[8:9], s[8:9], exec
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], s[8:9]
				; GFX10-64-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-64-NEXT: ; %bb.7: ; %.demote1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[8:9]
				; GFX10-64-NEXT: s_branch BB7_4
				; GFX10-64-NEXT: BB7_8: ; %.return
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: v_mov_b32_e32 v0, s3
				; GFX10-64-NEXT: v_mov_b32_e32 v1, s2
				; GFX10-64-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB7_9:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%p0 = extractelement <2 x float> %input, i32 0
				%p1 = extractelement <2 x float> %input, i32 1
				%x0 = call float @llvm.amdgcn.interp.p1(float %p0, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%x1 = call float @llvm.amdgcn.interp.p2(float %x0, float %p1, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%argi = fptosi float %arg to i32
				%cond0 = icmp eq i32 %argi, 0
				br i1 %cond0, label %.continue0, label %.demote0

				.demote0:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue0

				.continue0:
				%count = phi i32 [ 0, %.entry ], [ 0, %.demote0 ], [ %next, %.continue1 ]
				%live = call i1 @llvm.amdgcn.live.mask()
				%live.cond = select i1 %live, i32 0, i32 %count
				%live.v0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 85, i32 15, i32 15, i1 true)
				%live.v0f = bitcast i32 %live.v0 to float
				%live.v1 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 0, i32 15, i32 15, i1 true)
				%live.v1f = bitcast i32 %live.v1 to float
				%v0 = fsub float %live.v0f, %live.v1f
				%v0.wqm = call float @llvm.amdgcn.wqm.f32(float %v0)
				%cond1 = fcmp oeq float %v0.wqm, 0.000000e+00
				%cond2 = and i1 %live, %cond1
				br i1 %cond2, label %.continue1, label %.demote1

				.demote1:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue1

				.continue1:
				%next = add i32 %count, 1
				%loop.cond = icmp slt i32 %next, %limit
				br i1 %loop.cond, label %.continue0, label %.return

				.return:
				call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> <half 0xH3C00, half 0xH0000>, <2 x half> <half 0xH0000, half 0xH3C00>, i1 immarg true, i1 immarg true) #3
				ret void
				}

				declare void @llvm.amdgcn.wqm.demote(i1) #0
				declare i1 @llvm.amdgcn.live.mask() #0
				declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
				declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare float @llvm.amdgcn.wqm.f32(float) #1
				declare float @llvm.amdgcn.interp.p1(float, i32 immarg, i32 immarg, i32) #2
				declare float @llvm.amdgcn.interp.p2(float, float, i32 immarg, i32 immarg, i32) #2
				declare void @llvm.amdgcn.exp.compr.v2f16(i32 immarg, i32 immarg, <2 x half>, <2 x half>, i1 immarg, i1 immarg) #3
				declare i32 @llvm.amdgcn.mov.dpp.i32(i32, i32 immarg, i32 immarg, i32 immarg, i1 immarg) #4

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind readnone speculatable }
				attributes #3 = { inaccessiblememonly nounwind }
				attributes #4 = { convergent nounwind readnone }
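
As a reading aid for the CHECK lines above (not part of the patch), the demote sequence they show (`s_xor`, `s_andn2`, `s_cbranch_scc0`, `s_wqm`, `s_and`) can be modelled on plain integers. This is a minimal sketch assuming a 64-lane wave; the names `wqm` and `demote` are hypothetical helpers chosen for illustration and do not correspond to functions in the compiler.

```python
WAVE_SIZE = 64                      # assumption: wave64, as in the SI/GFX9 checks
FULL = (1 << WAVE_SIZE) - 1


def wqm(mask, wave_size=WAVE_SIZE):
    """Model of s_wqm: if any lane of a quad is set, set all four lanes."""
    out = 0
    for quad in range(0, wave_size, 4):
        if (mask >> quad) & 0xF:
            out |= 0xF << quad
    return out


def demote(exec_mask, live_mask, cond_mask):
    """Model of the demote sequence seen in the CHECK lines.

    Returns (new_exec, new_live, terminated).
    """
    # s_xor: active lanes whose condition is false are the lanes to demote
    kill = (cond_mask ^ exec_mask) & FULL
    # s_andn2: remove the demoted lanes from the live mask
    live_mask &= ~kill & FULL
    # s_cbranch_scc0: no live lanes remain, so take the early-exit path
    if live_mask == 0:
        return 0, 0, True
    # s_wqm + s_and: keep executing only quads that still contain a live lane
    exec_mask &= wqm(live_mask)
    return exec_mask, live_mask, False
```

In this model a demoted lane stays in `exec` as a helper while its quad still has a live lane, which is why a query of the live mask (as `llvm.amdgcn.live.mask` does in the test) can differ from `exec`.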

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.live.mask.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
				# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

				---
				name: live_mask
				legalized: true

				body: |
				bb.0:
				; CHECK-LABEL: name: live_mask
				; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.live.mask)
				; CHECK: S_ENDPGM 0, implicit [[INT]](s1)
				%0:_(s1) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.live.mask)
				S_ENDPGM 0, implicit %0
				...

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.wqm.demote.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
				# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

				---
				name: wqm_demote_scc
				legalized: true

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1
				; CHECK-LABEL: name: wqm_demote_scc
				; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
				; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
				; CHECK: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
				; CHECK: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
				; CHECK: [[COPY2:%[0-9]+]]:vcc(s1) = COPY [[TRUNC]](s1)
				; CHECK: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), [[COPY2]](s1)
				%0:_(s32) = COPY $sgpr0
				%1:_(s32) = COPY $sgpr1
				%2:_(s1) = G_ICMP intpred(eq), %0, %1
				G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), %2
				...

				---
				name: wqm_demote_vcc
				legalized: true

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1
				; CHECK-LABEL: name: wqm_demote_vcc
				; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
				; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
				; CHECK: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
				; CHECK: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), [[ICMP]](s1)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s1) = G_ICMP intpred(eq), %0, %1
				G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), %2
				...

				---
				name: wqm_demote_constant_true
				legalized: true

				body: |
				bb.0:
				; CHECK-LABEL: name: wqm_demote_constant_true
				; CHECK: [[C:%[0-9]+]]:sgpr(s1) = G_CONSTANT i1 true
				; CHECK: [[COPY:%[0-9]+]]:vcc(s1) = COPY [[C]](s1)
				; CHECK: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), [[COPY]](s1)
				%0:_(s1) = G_CONSTANT i1 true
				G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), %0
				...

				---
				name: wqm_demote_constant_false
				legalized: true

				body: |
				bb.0:
				; CHECK-LABEL: name: wqm_demote_constant_false
				; CHECK: [[C:%[0-9]+]]:sgpr(s1) = G_CONSTANT i1 false
				; CHECK: [[COPY:%[0-9]+]]:vcc(s1) = COPY [[C]](s1)
				; CHECK: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), [[COPY]](s1)
				%0:_(s1) = G_CONSTANT i1 false
				G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.wqm.demote), %0
				...

llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll

	; GFX7-NEXT: ; %bb.1: ; %if			; GFX7-NEXT: ; %bb.1: ; %if
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: BB1_2: ; %else			; GFX7-NEXT: BB1_2: ; %else
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_varying:			; GFX8-LABEL: add_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_mov_b64 s[10:11], exec			; GFX8-NEXT: s_mov_b64 s[8:9], exec
				; GFX8-NEXT: s_mov_b64 s[10:11], s[8:9]
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				foad: Why the extra move here?
				critson (author): This is unfortunate cruft from moving the WQM pass later in the compiler, where there is less COPY elimination. I have been thinking about putting some additional COPY clean-up optimisations in another pass. I had originally put some in the WQM pass, but agreed with Nicolai that this was probably not appropriate, as the pass is already getting quite complicated. In this specific case, the first copy saves the live lane mask at the beginning of the shader, and the second copy is PS_LIVE, which copies the live lane mask. In practice these are very unlikely to be next to each other, and determining that they are trivial copies of each other will need to consider control flow.
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX8-NEXT: s_cbranch_execz BB1_4			; GFX8-NEXT: s_cbranch_execz BB1_4
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[10:11]			; GFX8-NEXT: s_mov_b64 exec, s[10:11]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: s_cbranch_vccnz BB1_6			; GFX8-NEXT: s_cbranch_vccnz BB1_6
	; GFX8-NEXT: ; %bb.5: ; %if			; GFX8-NEXT: ; %bb.5: ; %if
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX8-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_varying:			; GFX9-LABEL: add_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_mov_b64 s[10:11], exec			; GFX9-NEXT: s_mov_b64 s[8:9], exec
				; GFX9-NEXT: s_mov_b64 s[10:11], s[8:9]
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX9-NEXT: s_cbranch_execz BB1_4			; GFX9-NEXT: s_cbranch_execz BB1_4
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-NEXT: s_cbranch_vccnz BB1_6			; GFX9-NEXT: s_cbranch_vccnz BB1_6
	; GFX9-NEXT: ; %bb.5: ; %if			; GFX9-NEXT: ; %bb.5: ; %if
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX9-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_varying:			; GFX1064-LABEL: add_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[10:11], exec			; GFX1064-NEXT: s_mov_b64 s[8:9], exec
	; GFX1064-NEXT: v_mov_b32_e32 v1, v0			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
				; GFX1064-NEXT: s_mov_b64 s[10:11], s[8:9]
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX1064-NEXT: s_cbranch_execz BB1_4			; GFX1064-NEXT: s_cbranch_execz BB1_4
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX1064-NEXT: s_cbranch_vccnz BB1_6			; GFX1064-NEXT: s_cbranch_vccnz BB1_6
	; GFX1064-NEXT: ; %bb.5: ; %if			; GFX1064-NEXT: ; %bb.5: ; %if
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX1064-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_varying:			; GFX1032-LABEL: add_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_mov_b32 s9, exec_lo			; GFX1032-NEXT: s_mov_b32 s8, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v1, v0			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
				; GFX1032-NEXT: s_mov_b32 s9, s8
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s8, s9			; GFX1032-NEXT: s_and_saveexec_b32 s8, s9
	; GFX1032-NEXT: s_cbranch_execz BB1_4			; GFX1032-NEXT: s_cbranch_execz BB1_4
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s9, -1			; GFX1032-NEXT: s_or_saveexec_b32 s9, -1

llvm/test/CodeGen/AMDGPU/early-term.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=si-insert-skips -verify-machineinstrs %s -o - | FileCheck %s		# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=si-insert-skips -verify-machineinstrs %s -o - | FileCheck %s

--- |		--- |
define amdgpu_ps void @early_term_scc0_end_block() {		define amdgpu_ps void @early_term_scc0_end_block() {
ret void		ret void
}		}

define amdgpu_ps void @early_term_scc0_next_terminator() {		define amdgpu_ps void @early_term_scc0_next_terminator() {
ret void		ret void
}		}

define amdgpu_ps void @early_term_scc0_in_block() {		define amdgpu_ps void @early_term_scc0_in_block() {
ret void		ret void
}		}

define amdgpu_ps void @early_term_scc0_with_kill() {
ret void
}

define amdgpu_gs void @early_term_scc0_gs() {		define amdgpu_gs void @early_term_scc0_gs() {
ret void		ret void
}		}

define amdgpu_cs void @early_term_scc0_cs() {		define amdgpu_cs void @early_term_scc0_cs() {
ret void		ret void
}		}
...		...
body: |
bb.1:		bb.1:
liveins: $vgpr0, $vgpr1		liveins: $vgpr0, $vgpr1
EXP 1, $vgpr1, $vgpr1, $vgpr1, $vgpr1, -1, -1, 15, implicit $exec		EXP 1, $vgpr1, $vgpr1, $vgpr1, $vgpr1, -1, -1, 15, implicit $exec
EXP_DONE 0, $vgpr0, $vgpr0, $vgpr0, $vgpr0, -1, -1, 15, implicit $exec		EXP_DONE 0, $vgpr0, $vgpr0, $vgpr0, $vgpr0, -1, -1, 15, implicit $exec
S_ENDPGM 0		S_ENDPGM 0
...		...

---		---
name: early_term_scc0_with_kill
tracksRegLiveness: true
liveins:
- { reg: '$sgpr0' }
- { reg: '$vgpr2' }
body: |
; CHECK-LABEL: name: early_term_scc0_with_kill
; CHECK: bb.0:
; CHECK: successors: %bb.1(0x80000000), %bb.3(0x00000000)
; CHECK: liveins: $sgpr0, $vgpr2
; CHECK: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
; CHECK: V_CMPX_LE_F32_nosdst_e32 0, killed $vgpr2, implicit-def $exec, implicit $mode, implicit $exec
; CHECK: S_CBRANCH_EXECZ %bb.3, implicit $exec
; CHECK: bb.1:
; CHECK: successors: %bb.4(0x40000000), %bb.3(0x40000000)
; CHECK: liveins: $sgpr0, $vgpr0
; CHECK: S_CMP_EQ_U32 killed $sgpr0, 0, implicit-def $scc
; CHECK: S_CBRANCH_SCC0 %bb.3, implicit $scc
; CHECK: bb.4:
; CHECK: successors: %bb.2(0x80000000)
; CHECK: liveins: $vgpr0, $scc
; CHECK: $vgpr1 = V_MOV_B32_e32 1, implicit $exec
; CHECK: bb.2:
; CHECK: liveins: $vgpr0, $vgpr1
; CHECK: EXP 1, $vgpr1, $vgpr1, $vgpr1, $vgpr1, -1, -1, 15, implicit $exec
; CHECK: EXP_DONE 0, $vgpr0, $vgpr0, $vgpr0, $vgpr0, -1, -1, 15, implicit $exec
; CHECK: S_ENDPGM 0
; CHECK: bb.3:
; CHECK: $exec_lo = S_MOV_B32 0
; CHECK: EXP_DONE 9, undef $vgpr0, undef $vgpr0, undef $vgpr0, undef $vgpr0, 1, 0, 0, implicit $exec
; CHECK: S_ENDPGM 0
bb.0:
liveins: $sgpr0, $vgpr2
successors: %bb.1
$vgpr0 = V_MOV_B32_e32 0, implicit $exec
SI_KILL_F32_COND_IMM_TERMINATOR killed $vgpr2, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec

bb.1:
liveins: $sgpr0, $vgpr0
successors: %bb.2
S_CMP_EQ_U32 killed $sgpr0, 0, implicit-def $scc
SI_EARLY_TERMINATE_SCC0 implicit $scc, implicit $exec
$vgpr1 = V_MOV_B32_e32 1, implicit $exec

bb.2:
liveins: $vgpr0, $vgpr1
EXP 1, $vgpr1, $vgpr1, $vgpr1, $vgpr1, -1, -1, 15, implicit $exec
EXP_DONE 0, $vgpr0, $vgpr0, $vgpr0, $vgpr0, -1, -1, 15, implicit $exec
S_ENDPGM 0
...

---
name: early_term_scc0_gs		name: early_term_scc0_gs
tracksRegLiveness: true		tracksRegLiveness: true
liveins:		liveins:
- { reg: '$sgpr0' }		- { reg: '$sgpr0' }
body: |		body: |
; CHECK-LABEL: name: early_term_scc0_gs		; CHECK-LABEL: name: early_term_scc0_gs
; CHECK: bb.0:		; CHECK: bb.0:
; CHECK: successors: %bb.1(0x80000000)		; CHECK: successors: %bb.1(0x80000000)

llvm/test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir

This file was deleted.

	# RUN: llc -march=amdgcn -mcpu=polaris10 -run-pass si-insert-skips -amdgpu-skip-threshold-legacy=1 %s -o - | FileCheck %s
	# https://bugs.freedesktop.org/show_bug.cgi?id=99019
	--- |
	define amdgpu_ps void @kill_uncond_branch() {
	ret void
	}
	...
	---

	# CHECK-LABEL: name: kill_uncond_branch

	# CHECK: bb.0:
	# CHECK: S_CBRANCH_VCCNZ %bb.1, implicit $vcc

	# CHECK: bb.1:
	# CHECK: V_CMPX_LE_F32_e32
	# CHECK-NEXT: S_CBRANCH_EXECZ %bb.3, implicit $exec

	# CHECK: bb.2:
	# CHECK: S_ENDPGM 0

	# CHECK: bb.3:
	# CHECK-NEXT: EXP_DONE
	# CHECK: S_ENDPGM 0

	name: kill_uncond_branch

	body: |
	bb.0:
	successors: %bb.1
	S_CBRANCH_VCCNZ %bb.1, implicit $vcc

	bb.1:
	successors: %bb.2
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec
	S_BRANCH %bb.2

	bb.2:
	S_ENDPGM 0

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll

; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI %s		; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI %s
; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX10 %s		; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX10 %s

; GCN-LABEL: {{^}}gs_const:		; GCN-LABEL: {{^}}gs_const:
; GCN-NOT: v_cmpx		; GCN-NOT: v_cmpx
; GCN: s_mov_b64 exec, 0		; GCN: s_mov_b64 exec, 0
define amdgpu_gs void @gs_const() {		define amdgpu_gs void @gs_const() {
%tmp = icmp ule i32 0, 3		%tmp = icmp ule i32 0, 3
%tmp1 = select i1 %tmp, float 1.000000e+00, float -1.000000e+00		%tmp1 = select i1 %tmp, float 1.000000e+00, float -1.000000e+00
%c1 = fcmp oge float %tmp1, 0.0		%c1 = fcmp oge float %tmp1, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
%tmp2 = icmp ule i32 3, 0		%tmp2 = icmp ule i32 3, 0
%tmp3 = select i1 %tmp2, float 1.000000e+00, float -1.000000e+00		%tmp3 = select i1 %tmp2, float 1.000000e+00, float -1.000000e+00
%c2 = fcmp oge float %tmp3, 0.0		%c2 = fcmp oge float %tmp3, 0.0
call void @llvm.amdgcn.kill(i1 %c2)		call void @llvm.amdgcn.kill(i1 %c2)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}vcc_implicit_def:		; GCN-LABEL: {{^}}vcc_implicit_def:
; GCN-NOT: v_cmp_gt_f32_e32 vcc,		; GCN: v_cmp_ge_f32_e32 vcc, 0, v{{[0-9]+}}
; GCN: v_cmp_gt_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], 0, v{{[0-9]+}}		; GCN: v_cmp_gt_f32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], 0, v{{[0-9]+}}
; SI: v_cmpx_le_f32_e32 vcc, 0, v{{[0-9]+}}		; GCN: s_andn2_b64 exec, exec, vcc
; GFX10: v_cmpx_le_f32_e32 0, v{{[0-9]+}}
; GCN: v_cndmask_b32_e64 v{{[0-9]+}}, 0, 1.0, [[CMP]]		; GCN: v_cndmask_b32_e64 v{{[0-9]+}}, 0, 1.0, [[CMP]]
define amdgpu_ps void @vcc_implicit_def(float %arg13, float %arg14) {		define amdgpu_ps void @vcc_implicit_def(float %arg13, float %arg14) {
%tmp0 = fcmp olt float %arg13, 0.000000e+00		%tmp0 = fcmp olt float %arg13, 0.000000e+00
%c1 = fcmp oge float %arg14, 0.0		%c1 = fcmp oge float %arg14, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
%tmp1 = select i1 %tmp0, float 1.000000e+00, float 0.000000e+00		%tmp1 = select i1 %tmp0, float 1.000000e+00, float 0.000000e+00
call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0		call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}true:		; GCN-LABEL: {{^}}true:
; GCN-NEXT: %bb.		; GCN-NEXT: %bb.
; GCN-NEXT: %bb.
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_gs void @true() {		define amdgpu_gs void @true() {
call void @llvm.amdgcn.kill(i1 true)		call void @llvm.amdgcn.kill(i1 true)
ret void		ret void
}		}

; GCN-LABEL: {{^}}false:		; GCN-LABEL: {{^}}false:
; GCN-NOT: v_cmpx		; GCN-NOT: v_cmpx
; GCN: s_mov_b64 exec, 0		; GCN: s_mov_b64 exec, 0
define amdgpu_gs void @false() {		define amdgpu_gs void @false() {
call void @llvm.amdgcn.kill(i1 false)		call void @llvm.amdgcn.kill(i1 false)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}and:		; GCN-LABEL: {{^}}and:
; GCN: v_cmp_lt_i32		; GCN: v_cmp_lt_i32
; GCN: v_cmp_lt_i32		; GCN: v_cmp_lt_i32
; GCN: s_or_b64 s[0:1]		; GCN: s_or_b64 s[0:1]
; GCN: s_and_b64 exec, exec, s[0:1]		; GCN: s_xor_b64 s[0:1], s[0:1], exec
		; GCN: s_andn2_b64 s[2:3], s[2:3], s[0:1]
		; GCN: s_and_b64 exec, exec, s[2:3]
define amdgpu_gs void @and(i32 %a, i32 %b, i32 %c, i32 %d) {		define amdgpu_gs void @and(i32 %a, i32 %b, i32 %c, i32 %d) {
%c1 = icmp slt i32 %a, %b		%c1 = icmp slt i32 %a, %b
%c2 = icmp slt i32 %c, %d		%c2 = icmp slt i32 %c, %d
%x = or i1 %c1, %c2		%x = or i1 %c1, %c2
call void @llvm.amdgcn.kill(i1 %x)		call void @llvm.amdgcn.kill(i1 %x)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}andn2:		; GCN-LABEL: {{^}}andn2:
; GCN: v_cmp_lt_i32		; GCN: v_cmp_lt_i32
; GCN: v_cmp_lt_i32		; GCN: v_cmp_lt_i32
; GCN: s_xor_b64 s[0:1]		; GCN: s_xor_b64 s[0:1]
; GCN: s_andn2_b64 exec, exec, s[0:1]		; GCN: s_andn2_b64 s[2:3], s[2:3], s[0:1]
		; GCN: s_and_b64 exec, exec, s[2:3]
define amdgpu_gs void @andn2(i32 %a, i32 %b, i32 %c, i32 %d) {		define amdgpu_gs void @andn2(i32 %a, i32 %b, i32 %c, i32 %d) {
%c1 = icmp slt i32 %a, %b		%c1 = icmp slt i32 %a, %b
%c2 = icmp slt i32 %c, %d		%c2 = icmp slt i32 %c, %d
%x = xor i1 %c1, %c2		%x = xor i1 %c1, %c2
%y = xor i1 %x, 1		%y = xor i1 %x, 1
call void @llvm.amdgcn.kill(i1 %y)		call void @llvm.amdgcn.kill(i1 %y)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}oeq:		; GCN-LABEL: {{^}}oeq:
; GCN: v_cmpx_eq_f32		; GCN: v_cmp_lg_f32
; GCN-NOT: s_and
define amdgpu_gs void @oeq(float %a) {		define amdgpu_gs void @oeq(float %a) {
%c1 = fcmp oeq float %a, 0.0		%c1 = fcmp oeq float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ogt:		; GCN-LABEL: {{^}}ogt:
; GCN: v_cmpx_lt_f32		; GCN: v_cmp_gt_f32
; GCN-NOT: s_and
define amdgpu_gs void @ogt(float %a) {		define amdgpu_gs void @ogt(float %a) {
%c1 = fcmp ogt float %a, 0.0		%c1 = fcmp ogt float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}oge:		; GCN-LABEL: {{^}}oge:
; GCN: v_cmpx_le_f32		; GCN: v_cmp_ge_f32
; GCN-NOT: s_and
define amdgpu_gs void @oge(float %a) {		define amdgpu_gs void @oge(float %a) {
%c1 = fcmp oge float %a, 0.0		%c1 = fcmp oge float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}olt:		; GCN-LABEL: {{^}}olt:
; GCN: v_cmpx_gt_f32		; GCN: v_cmp_lt_f32
; GCN-NOT: s_and
define amdgpu_gs void @olt(float %a) {		define amdgpu_gs void @olt(float %a) {
%c1 = fcmp olt float %a, 0.0		%c1 = fcmp olt float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ole:		; GCN-LABEL: {{^}}ole:
; GCN: v_cmpx_ge_f32		; GCN: v_cmp_le_f32
; GCN-NOT: s_and
define amdgpu_gs void @ole(float %a) {		define amdgpu_gs void @ole(float %a) {
%c1 = fcmp ole float %a, 0.0		%c1 = fcmp ole float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}one:		; GCN-LABEL: {{^}}one:
; GCN: v_cmpx_lg_f32		; GCN: v_cmp_eq_f32
; GCN-NOT: s_and
define amdgpu_gs void @one(float %a) {		define amdgpu_gs void @one(float %a) {
%c1 = fcmp one float %a, 0.0		%c1 = fcmp one float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ord:		; GCN-LABEL: {{^}}ord:
; FIXME: This is absolutely unimportant, but we could use the cmpx variant here.
; GCN: v_cmp_o_f32		; GCN: v_cmp_o_f32
define amdgpu_gs void @ord(float %a) {		define amdgpu_gs void @ord(float %a) {
%c1 = fcmp ord float %a, 0.0		%c1 = fcmp ord float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}uno:		; GCN-LABEL: {{^}}uno:
; FIXME: This is absolutely unimportant, but we could use the cmpx variant here.
; GCN: v_cmp_u_f32		; GCN: v_cmp_u_f32
define amdgpu_gs void @uno(float %a) {		define amdgpu_gs void @uno(float %a) {
%c1 = fcmp uno float %a, 0.0		%c1 = fcmp uno float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ueq:		; GCN-LABEL: {{^}}ueq:
; GCN: v_cmpx_nlg_f32		; GCN: v_cmp_neq_f32
; GCN-NOT: s_and
define amdgpu_gs void @ueq(float %a) {		define amdgpu_gs void @ueq(float %a) {
%c1 = fcmp ueq float %a, 0.0		%c1 = fcmp ueq float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ugt:		; GCN-LABEL: {{^}}ugt:
; GCN: v_cmpx_nge_f32		; GCN: v_cmp_nle_f32
; GCN-NOT: s_and
define amdgpu_gs void @ugt(float %a) {		define amdgpu_gs void @ugt(float %a) {
%c1 = fcmp ugt float %a, 0.0		%c1 = fcmp ugt float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}uge:		; GCN-LABEL: {{^}}uge:
; SI: v_cmpx_ngt_f32_e32 vcc, -1.0		; GCN: v_cmp_nlt_f32_e32 vcc, -1.0
; GFX10: v_cmpx_ngt_f32_e32 -1.0
; GCN-NOT: s_and
define amdgpu_gs void @uge(float %a) {		define amdgpu_gs void @uge(float %a) {
%c1 = fcmp uge float %a, -1.0		%c1 = fcmp uge float %a, -1.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ult:		; GCN-LABEL: {{^}}ult:
; SI: v_cmpx_nle_f32_e32 vcc, -2.0		; GCN: v_cmp_nge_f32_e32 vcc, -2.0
; GFX10: v_cmpx_nle_f32_e32 -2.0
; GCN-NOT: s_and
define amdgpu_gs void @ult(float %a) {		define amdgpu_gs void @ult(float %a) {
%c1 = fcmp ult float %a, -2.0		%c1 = fcmp ult float %a, -2.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}ule:		; GCN-LABEL: {{^}}ule:
; SI: v_cmpx_nlt_f32_e32 vcc, 2.0		; GCN: v_cmp_ngt_f32_e32 vcc, 2.0
; GFX10: v_cmpx_nlt_f32_e32 2.0
; GCN-NOT: s_and
define amdgpu_gs void @ule(float %a) {		define amdgpu_gs void @ule(float %a) {
%c1 = fcmp ule float %a, 2.0		%c1 = fcmp ule float %a, 2.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}une:		; GCN-LABEL: {{^}}une:
; SI: v_cmpx_neq_f32_e32 vcc, 0		; GCN: v_cmp_nlg_f32_e32 vcc, 0
; GFX10: v_cmpx_neq_f32_e32 0
; GCN-NOT: s_and
define amdgpu_gs void @une(float %a) {		define amdgpu_gs void @une(float %a) {
%c1 = fcmp une float %a, 0.0		%c1 = fcmp une float %a, 0.0
call void @llvm.amdgcn.kill(i1 %c1)		call void @llvm.amdgcn.kill(i1 %c1)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}neg_olt:		; GCN-LABEL: {{^}}neg_olt:
; SI: v_cmpx_ngt_f32_e32 vcc, 1.0		; GCN: v_cmp_nlt_f32_e32 vcc, 1.0
; GFX10: v_cmpx_ngt_f32_e32 1.0
; GCN-NOT: s_and
define amdgpu_gs void @neg_olt(float %a) {		define amdgpu_gs void @neg_olt(float %a) {
%c1 = fcmp olt float %a, 1.0		%c1 = fcmp olt float %a, 1.0
%c2 = xor i1 %c1, 1		%c2 = xor i1 %c1, 1
call void @llvm.amdgcn.kill(i1 %c2)		call void @llvm.amdgcn.kill(i1 %c2)
		call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
ret void		ret void
}		}

; GCN-LABEL: {{^}}fcmp_x2:		; GCN-LABEL: {{^}}fcmp_x2:
; FIXME: LLVM should be able to combine these fcmp opcodes.		; FIXME: LLVM should be able to combine these fcmp opcodes.
; SI: v_cmp_lt_f32_e32 vcc, s{{[0-9]+}}, v0		; SI: v_cmp_lt_f32_e32 vcc, s{{[0-9]+}}, v0
; GFX10: v_cmp_lt_f32_e32 vcc, 0x3e800000, v0		; GFX10: v_cmp_lt_f32_e32 vcc, 0x3e800000, v0
; GCN: v_cndmask_b32		; GCN: v_cndmask_b32
; GCN: v_cmpx_le_f32		; GCN: v_cmp_ge_f32
define amdgpu_ps void @fcmp_x2(float %a) #0 {		define amdgpu_ps void @fcmp_x2(float %a) #0 {
%ogt = fcmp nsz ogt float %a, 2.500000e-01		%ogt = fcmp nsz ogt float %a, 2.500000e-01
%k = select i1 %ogt, float -1.000000e+00, float 0.000000e+00		%k = select i1 %ogt, float -1.000000e+00, float 0.000000e+00
%c = fcmp nsz oge float %k, 0.000000e+00		%c = fcmp nsz oge float %k, 0.000000e+00
call void @llvm.amdgcn.kill(i1 %c) #1		call void @llvm.amdgcn.kill(i1 %c) #1
ret void		ret void
}		}

		; Note: an almost identical test for this exists in llvm.amdgcn.wqm.vote.ll
; GCN-LABEL: {{^}}wqm:		; GCN-LABEL: {{^}}wqm:
; GCN: v_cmp_neq_f32_e32 vcc, 0		; GCN: v_cmp_neq_f32_e32 vcc, 0
; GCN: s_wqm_b64 s[0:1], vcc		; GCN-DAG: s_wqm_b64 s[2:3], vcc
		; GCN-DAG: s_mov_b64 s[0:1], exec
		; GCN: s_xor_b64 s[2:3], s[2:3], exec
		; GCN: s_andn2_b64 s[0:1], s[0:1], s[2:3]
; GCN: s_and_b64 exec, exec, s[0:1]		; GCN: s_and_b64 exec, exec, s[0:1]
define amdgpu_ps void @wqm(float %a) {		define amdgpu_ps float @wqm(float %a) {
%c1 = fcmp une float %a, 0.0		%c1 = fcmp une float %a, 0.0
%c2 = call i1 @llvm.amdgcn.wqm.vote(i1 %c1)		%c2 = call i1 @llvm.amdgcn.wqm.vote(i1 %c1)
call void @llvm.amdgcn.kill(i1 %c2)		call void @llvm.amdgcn.kill(i1 %c2)
ret void		ret float 0.0
}		}

; This checks that we use the 64-bit encoding when the operand is a SGPR.		; This checks that we use the 64-bit encoding when the operand is a SGPR.
; GCN-LABEL: {{^}}test_sgpr:		; GCN-LABEL: {{^}}test_sgpr:
; GCN: v_cmpx_ge_f32_e64		; GCN: v_cmp_ge_f32_e64
define amdgpu_ps void @test_sgpr(float inreg %a) #0 {		define amdgpu_ps void @test_sgpr(float inreg %a) #0 {
%c = fcmp ole float %a, 1.000000e+00		%c = fcmp ole float %a, 1.000000e+00
call void @llvm.amdgcn.kill(i1 %c) #1		call void @llvm.amdgcn.kill(i1 %c) #1
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_non_inline_imm_sgpr:		; GCN-LABEL: {{^}}test_non_inline_imm_sgpr:
; GCN-NOT: v_cmpx_ge_f32_e64		; GCN-NOT: v_cmp_le_f32_e64
define amdgpu_ps void @test_non_inline_imm_sgpr(float inreg %a) #0 {		define amdgpu_ps void @test_non_inline_imm_sgpr(float inreg %a) #0 {
%c = fcmp ole float %a, 1.500000e+00		%c = fcmp ole float %a, 1.500000e+00
call void @llvm.amdgcn.kill(i1 %c) #1		call void @llvm.amdgcn.kill(i1 %c) #1
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_scc_liveness:		; GCN-LABEL: {{^}}test_scc_liveness:
; GCN: v_cmp		; GCN: v_cmp
[12 lines not shown]	loop3: ; preds = %loop3, %main_body
br i1 %tmp1, label %endloop15, label %loop3		br i1 %tmp1, label %endloop15, label %loop3

endloop15: ; preds = %loop3		endloop15: ; preds = %loop3
ret void		ret void
}		}

declare void @llvm.amdgcn.kill(i1) #0		declare void @llvm.amdgcn.kill(i1) #0
declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0		declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
		declare void @llvm.amdgcn.s.sendmsg(i32, i32) #0
declare i1 @llvm.amdgcn.wqm.vote(i1)		declare i1 @llvm.amdgcn.wqm.vote(i1)

attributes #0 = { nounwind }		attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-32 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-64 %s

				define amdgpu_ps void @static_exact(float %arg0, float %arg1) {
				; SI-LABEL: static_exact:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_branch BB0_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB0_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: static_exact:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_branch BB0_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB0_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: static_exact:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_branch BB0_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB0_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: static_exact:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_branch BB0_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB0_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%c0 = fcmp olt float %arg0, 0.000000e+00
				%c1 = fcmp oge float %arg1, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 false)
				%tmp1 = select i1 %c0, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}

				define amdgpu_ps void @dynamic_exact(float %arg0, float %arg1) {
				; SI-LABEL: dynamic_exact:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; SI-NEXT: s_mov_b64 s[2:3], exec
				; SI-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_cbranch_scc0 BB1_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_and_b64 exec, exec, s[2:3]
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB1_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: dynamic_exact:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; GFX9-NEXT: s_mov_b64 s[2:3], exec
				; GFX9-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_cbranch_scc0 BB1_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_and_b64 exec, exec, s[2:3]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB1_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: dynamic_exact:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cmp_le_f32_e64 s0, 0, v1
				; GFX10-32-NEXT: s_mov_b32 s1, exec_lo
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s1, s1, s0
				; GFX10-32-NEXT: s_cbranch_scc0 BB1_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB1_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: dynamic_exact:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cmp_le_f32_e64 s[0:1], 0, v1
				; GFX10-64-NEXT: s_mov_b64 s[2:3], exec
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_andn2_b64 s[2:3], s[2:3], s[0:1]
				; GFX10-64-NEXT: s_cbranch_scc0 BB1_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[2:3]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB1_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%c0 = fcmp olt float %arg0, 0.000000e+00
				%c1 = fcmp oge float %arg1, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 %c1)
				%tmp1 = select i1 %c0, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}
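
Reading aid, not part of the patch: the dynamic_exact checks above boil down to three scalar mask operations (s_xor, s_andn2, s_and) plus an SCC-based early exit. The Python sketch below is a hypothetical model of that sequence, meant only to help read the expected assembly. The function name `demote_exact` and its parameters are invented for illustration, and it assumes the compare writes 0 for lanes that are not in exec.

```python
def demote_exact(exec_mask, live_mask, cond_mask, wave_size=64):
    """Hypothetical model of the exact-mode demote sequence in dynamic_exact.

    cond_mask has a bit set for each lane whose demote condition is true
    (the lane stays live). Returns (new_exec, new_live, scc).
    """
    full = (1 << wave_size) - 1
    # Assumption: v_cmp writes 0 for lanes that are not in exec.
    cond_mask &= exec_mask
    # s_xor_b64 s[0:1], s[0:1], exec: active lanes whose condition is false.
    kill = (cond_mask ^ exec_mask) & full
    # s_andn2_b64 s[2:3], s[2:3], s[0:1]: drop those lanes from the live mask.
    new_live = live_mask & ~kill & full
    # SCC is set when the scalar result is non-zero.
    scc = new_live != 0
    if not scc:
        # s_cbranch_scc0 taken: s_mov_b64 exec, 0 followed by a null export.
        return 0, new_live, scc
    # s_and_b64 exec, exec, s[2:3]
    return exec_mask & new_live, new_live, scc
```

With exec = live = 0b1111 and cond = 0b0101, the model keeps lanes 0 and 2 running; with cond = 0 no lane survives and the early-exit path is taken, which corresponds to the BB1_2 block in the checks.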

				define amdgpu_ps void @branch(float %arg0, float %arg1) {
				; SI-LABEL: branch:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: v_cvt_i32_f32_e32 v1, v1
				; SI-NEXT: s_mov_b64 s[2:3], exec
				; SI-NEXT: v_or_b32_e32 v0, v0, v1
				; SI-NEXT: v_and_b32_e32 v1, 1, v0
				; SI-NEXT: v_and_b32_e32 v0, 1, v0
				; SI-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
				; SI-NEXT: v_cmp_eq_u32_e64 s[0:1], 1, v0
				; SI-NEXT: s_and_saveexec_b64 s[4:5], s[0:1]
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
				; SI-NEXT: s_cbranch_scc0 BB2_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; SI-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB2_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: branch:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX9-NEXT: s_mov_b64 s[2:3], exec
				; GFX9-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX9-NEXT: v_and_b32_e32 v1, 1, v0
				; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
				; GFX9-NEXT: v_cmp_eq_u32_e64 s[0:1], 1, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], s[0:1]
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
				; GFX9-NEXT: s_cbranch_scc0 BB2_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX9-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB2_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: branch:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX10-32-NEXT: s_mov_b32 s1, exec_lo
				; GFX10-32-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX10-32-NEXT: v_and_b32_e32 v1, 1, v0
				; GFX10-32-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX10-32-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v1
				; GFX10-32-NEXT: v_cmp_eq_u32_e64 s0, 1, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s2, s0
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: s_andn2_b32 s1, s1, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB2_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s2
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
				; GFX10-32-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB2_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: branch:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v1, v1
				; GFX10-64-NEXT: s_mov_b64 s[2:3], exec
				; GFX10-64-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX10-64-NEXT: v_and_b32_e32 v1, 1, v0
				; GFX10-64-NEXT: v_and_b32_e32 v0, 1, v0
				; GFX10-64-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
				; GFX10-64-NEXT: v_cmp_eq_u32_e64 s[0:1], 1, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[4:5], s[0:1]
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB2_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 0, 1.0, vcc
				; GFX10-64-NEXT: exp mrt1 v0, v0, v0, v0 done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB2_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%i0 = fptosi float %arg0 to i32
				%i1 = fptosi float %arg1 to i32
				%c0 = or i32 %i0, %i1
				%c1 = and i32 %c0, 1
				%c2 = icmp eq i32 %c1, 0
				br i1 %c2, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%tmp1 = select i1 %c2, float 1.000000e+00, float 0.000000e+00
				call void @llvm.amdgcn.exp.f32(i32 1, i32 15, float %tmp1, float %tmp1, float %tmp1, float %tmp1, i1 true, i1 true) #0
				ret void
				}


				define amdgpu_ps <4 x float> @wqm_demote_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {
				; SI-LABEL: wqm_demote_1:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; SI-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; SI-NEXT: s_cbranch_scc0 BB3_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[16:17]
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[14:15]
				; SI-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB3_5
				; SI-NEXT: BB3_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB3_5:
				;
				; GFX9-LABEL: wqm_demote_1:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; GFX9-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; GFX9-NEXT: s_cbranch_scc0 BB3_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[14:15]
				; GFX9-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB3_5
				; GFX9-NEXT: BB3_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB3_5:
				;
				; GFX10-32-LABEL: wqm_demote_1:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cmp_ngt_f32_e32 vcc_lo, 0, v1
				; GFX10-32-NEXT: s_and_saveexec_b32 s13, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB3_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_wqm_b32 s28, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s28
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB3_5
				; GFX10-32-NEXT: BB3_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB3_5:
				;
				; GFX10-64-LABEL: wqm_demote_1:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v1
				; GFX10-64-NEXT: s_and_saveexec_b64 s[28:29], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB3_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB3_5
				; GFX10-64-NEXT: BB3_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB3_5:
				.entry:
				%z.cmp = fcmp olt float %z, 0.0
				br i1 %z.cmp, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}
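
Reading aid, not part of the patch: in the .demote block of wqm_demote_1 the expected code first clears the demoted lanes from the live mask (s_andn2), then narrows exec to the whole-quad expansion of what is left (s_wqm followed by s_and). The Python sketch below is a hypothetical model of just that block. `wqm` and `demote_in_wqm` are names invented here, and the model covers neither the later s_or that restores the saved exec nor the final s_and with the live mask.

```python
def wqm(mask, wave_size=64):
    """Whole-quad expansion: a quad with any bit set gets all four bits set."""
    result = 0
    for quad in range(0, wave_size, 4):
        if (mask >> quad) & 0xF:
            result |= 0xF << quad
    return result


def demote_in_wqm(exec_mask, live_mask):
    """Hypothetical model of the .demote block; returns (new_exec, new_live)."""
    # s_andn2_b64 s[12:13], s[12:13], exec: every active lane is demoted.
    new_live = live_mask & ~exec_mask
    if new_live == 0:
        # s_cbranch_scc0 taken: no live lanes remain, the wave terminates.
        return 0, new_live
    # s_wqm_b64 s[16:17], s[12:13] then s_and_b64 exec, exec, s[16:17]:
    # demoted lanes keep running only while their quad still has a live lane.
    return exec_mask & wqm(new_live), new_live
```

For example, with live = 0xFF and lanes 0 to 2 entering the block (exec = 0x07), lane 3 keeps quad 0 alive, so the three demoted lanes stay enabled as helpers. If all of quad 0 enters the block (exec = 0x0F), the quad has no live lane left and is disabled.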

				define amdgpu_ps <4 x float> @wqm_demote_2(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {
				; SI-LABEL: wqm_demote_2:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; SI-NEXT: ; %bb.1: ; %.demote
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; SI-NEXT: s_cbranch_scc0 BB4_4
				; SI-NEXT: ; %bb.2: ; %.demote
				; SI-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[16:17]
				; SI-NEXT: ; %bb.3: ; %.continue
				; SI-NEXT: s_or_b64 exec, exec, s[14:15]
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB4_5
				; SI-NEXT: BB4_4:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB4_5:
				;
				; GFX9-LABEL: wqm_demote_2:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[14:15], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; GFX9-NEXT: s_cbranch_scc0 BB4_4
				; GFX9-NEXT: ; %bb.2: ; %.demote
				; GFX9-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX9-NEXT: ; %bb.3: ; %.continue
				; GFX9-NEXT: s_or_b64 exec, exec, s[14:15]
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB4_5
				; GFX9-NEXT: BB4_4:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB4_5:
				;
				; GFX10-32-LABEL: wqm_demote_2:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_cmp_ngt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s13, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB4_4
				; GFX10-32-NEXT: ; %bb.2: ; %.demote
				; GFX10-32-NEXT: s_wqm_b32 s28, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s28
				; GFX10-32-NEXT: ; %bb.3: ; %.continue
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB4_5
				; GFX10-32-NEXT: BB4_4:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB4_5:
				;
				; GFX10-64-LABEL: wqm_demote_2:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_cmp_ngt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[28:29], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB4_4
				; GFX10-64-NEXT: ; %bb.2: ; %.demote
				; GFX10-64-NEXT: s_wqm_b64 s[16:17], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[16:17]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB4_5
				; GFX10-64-NEXT: BB4_4:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB4_5:
				.entry:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%z.cmp = fcmp olt float %tex0, 0.0
				br i1 %z.cmp, label %.continue, label %.demote

				.demote:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue

				.continue:
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}

				define amdgpu_ps <4 x float> @wqm_demote_dynamic(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {
				; SI-LABEL: wqm_demote_dynamic:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[12:13], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; SI-NEXT: s_xor_b64 s[14:15], vcc, exec
				; SI-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; SI-NEXT: s_cbranch_scc0 BB5_2
				; SI-NEXT: ; %bb.1: ; %.entry
				; SI-NEXT: s_wqm_b64 s[14:15], s[12:13]
				; SI-NEXT: s_and_b64 exec, exec, s[14:15]
				; SI-NEXT: v_add_f32_e32 v0, v0, v0
				; SI-NEXT: s_and_b64 exec, exec, s[12:13]
				; SI-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: s_branch BB5_3
				; SI-NEXT: BB5_2:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB5_3:
				;
				; GFX9-LABEL: wqm_demote_dynamic:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[12:13], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_xor_b64 s[14:15], vcc, exec
				; GFX9-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; GFX9-NEXT: s_cbranch_scc0 BB5_2
				; GFX9-NEXT: ; %bb.1: ; %.entry
				; GFX9-NEXT: s_wqm_b64 s[14:15], s[12:13]
				; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
				; GFX9-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_branch BB5_3
				; GFX9-NEXT: BB5_2:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB5_3:
				;
				; GFX10-32-LABEL: wqm_demote_dynamic:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: v_cmp_gt_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s13, vcc_lo, exec_lo
				; GFX10-32-NEXT: s_andn2_b32 s12, s12, s13
				; GFX10-32-NEXT: s_cbranch_scc0 BB5_2
				; GFX10-32-NEXT: ; %bb.1: ; %.entry
				; GFX10-32-NEXT: s_wqm_b32 s13, s12
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s13
				; GFX10-32-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-32-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-32-NEXT: s_waitcnt vmcnt(0)
				; GFX10-32-NEXT: s_branch BB5_3
				; GFX10-32-NEXT: BB5_2:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB5_3:
				;
				; GFX10-64-LABEL: wqm_demote_dynamic:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[12:13], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: v_cmp_gt_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[14:15], vcc, exec
				; GFX10-64-NEXT: s_andn2_b64 s[12:13], s[12:13], s[14:15]
				; GFX10-64-NEXT: s_cbranch_scc0 BB5_2
				; GFX10-64-NEXT: ; %bb.1: ; %.entry
				; GFX10-64-NEXT: s_wqm_b64 s[28:29], s[12:13]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[28:29]
				; GFX10-64-NEXT: v_add_f32_e32 v0, v0, v0
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[12:13]
				; GFX10-64-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
				; GFX10-64-NEXT: s_waitcnt vmcnt(0)
				; GFX10-64-NEXT: s_branch BB5_3
				; GFX10-64-NEXT: BB5_2:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB5_3:
				.entry:
				%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0
				%tex0 = extractelement <4 x float> %tex, i32 0
				%tex1 = extractelement <4 x float> %tex, i32 0
				%z.cmp = fcmp olt float %tex0, 0.0
				call void @llvm.amdgcn.wqm.demote(i1 %z.cmp)
				%coord1 = fadd float %tex0, %tex1
				%rtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord1, <8 x i32> %rsrc, <4 x i32> %sampler, i1 0, i32 0, i32 0) #0

				ret <4 x float> %rtex
				}


				define amdgpu_ps void @wqm_deriv(<2 x float> %input, float %arg, i32 %index) {
				; SI-LABEL: wqm_deriv:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[2:3], vcc
				; SI-NEXT: ; %bb.1: ; %.demote0
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_cbranch_scc0 BB6_7
				; SI-NEXT: ; %bb.2: ; %.demote0
				; SI-NEXT: s_wqm_b64 s[4:5], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[4:5]
				; SI-NEXT: ; %bb.3: ; %.continue0
				; SI-NEXT: s_or_b64 exec, exec, s[2:3]
				; SI-NEXT: s_mov_b64 s[2:3], s[0:1]
				; SI-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[2:3]
				; SI-NEXT: v_mov_b32_e32 v1, v0
				; SI-NEXT: s_xor_b64 s[2:3], s[0:1], -1
				; SI-NEXT: s_nop 0
				; SI-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: s_nop 1
				; SI-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; SI-NEXT: s_or_b64 s[4:5], s[2:3], vcc
				; SI-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; SI-NEXT: ; %bb.4: ; %.demote1
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_cbranch_scc0 BB6_7
				; SI-NEXT: ; %bb.5: ; %.demote1
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: ; %bb.6: ; %.continue1
				; SI-NEXT: s_or_b64 exec, exec, s[2:3]
				; SI-NEXT: v_bfrev_b32_e32 v0, 60
				; SI-NEXT: v_mov_b32_e32 v1, 0x3c00
				; SI-NEXT: exp mrt0 v1, v1, v0, v0 done compr vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB6_7:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: wqm_deriv:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote0
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_cbranch_scc0 BB6_7
				; GFX9-NEXT: ; %bb.2: ; %.demote0
				; GFX9-NEXT: s_wqm_b64 s[4:5], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[4:5]
				; GFX9-NEXT: ; %bb.3: ; %.continue0
				; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX9-NEXT: s_mov_b64 s[2:3], s[0:1]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[2:3]
				; GFX9-NEXT: v_mov_b32_e32 v1, v0
				; GFX9-NEXT: s_xor_b64 s[2:3], s[0:1], -1
				; GFX9-NEXT: s_nop 0
				; GFX9-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: s_nop 1
				; GFX9-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_or_b64 s[4:5], s[2:3], vcc
				; GFX9-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; GFX9-NEXT: ; %bb.4: ; %.demote1
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_cbranch_scc0 BB6_7
				; GFX9-NEXT: ; %bb.5: ; %.demote1
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: ; %bb.6: ; %.continue1
				; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX9-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX9-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX9-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB6_7:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: wqm_deriv:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s1, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote0
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-32-NEXT: ; %bb.2: ; %.demote0
				; GFX10-32-NEXT: s_wqm_b32 s2, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s2
				; GFX10-32-NEXT: ; %bb.3: ; %.continue0
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: s_mov_b32 s1, s0
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s1
				; GFX10-32-NEXT: v_mov_b32_e32 v1, v0
				; GFX10-32-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: v_cmp_neq_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_xor_b32 s1, s0, -1
				; GFX10-32-NEXT: s_or_b32 s2, s1, vcc_lo
				; GFX10-32-NEXT: s_and_saveexec_b32 s1, s2
				; GFX10-32-NEXT: ; %bb.4: ; %.demote1
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-32-NEXT: ; %bb.5: ; %.demote1
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: ; %bb.6: ; %.continue1
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX10-32-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX10-32-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB6_7:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: wqm_deriv:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[2:3], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote0
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-64-NEXT: ; %bb.2: ; %.demote0
				; GFX10-64-NEXT: s_wqm_b64 s[4:5], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue0
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX10-64-NEXT: s_mov_b64 s[2:3], s[0:1]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[2:3]
				; GFX10-64-NEXT: v_mov_b32_e32 v1, v0
				; GFX10-64-NEXT: v_mov_b32_dpp v1, v1 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: v_subrev_f32_dpp v0, v0, v1 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_xor_b64 s[2:3], s[0:1], -1
				; GFX10-64-NEXT: s_or_b64 s[4:5], s[2:3], vcc
				; GFX10-64-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
				; GFX10-64-NEXT: ; %bb.4: ; %.demote1
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB6_7
				; GFX10-64-NEXT: ; %bb.5: ; %.demote1
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: ; %bb.6: ; %.continue1
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX10-64-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX10-64-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX10-64-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB6_7:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%p0 = extractelement <2 x float> %input, i32 0
				%p1 = extractelement <2 x float> %input, i32 1
				%x0 = call float @llvm.amdgcn.interp.p1(float %p0, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%x1 = call float @llvm.amdgcn.interp.p2(float %x0, float %p1, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%argi = fptosi float %arg to i32
				%cond0 = icmp eq i32 %argi, 0
				br i1 %cond0, label %.continue0, label %.demote0

				.demote0:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue0

				.continue0:
				%live = call i1 @llvm.amdgcn.live.mask()
				%live.cond = select i1 %live, i32 0, i32 1065353216
				%live.v0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 85, i32 15, i32 15, i1 true)
				%live.v0f = bitcast i32 %live.v0 to float
				%live.v1 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 0, i32 15, i32 15, i1 true)
				%live.v1f = bitcast i32 %live.v1 to float
				%v0 = fsub float %live.v0f, %live.v1f
				%v0.wqm = call float @llvm.amdgcn.wqm.f32(float %v0)
				%cond1 = fcmp oeq float %v0.wqm, 0.000000e+00
				%cond2 = and i1 %live, %cond1
				br i1 %cond2, label %.continue1, label %.demote1

				.demote1:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue1

				.continue1:
				call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> <half 0xH3C00, half 0xH0000>, <2 x half> <half 0xH0000, half 0xH3C00>, i1 immarg true, i1 immarg true) #3
				ret void
				}

				define amdgpu_ps void @wqm_deriv_loop(<2 x float> %input, float %arg, i32 %index, i32 %limit) {
				; SI-LABEL: wqm_deriv_loop:
				; SI: ; %bb.0: ; %.entry
				; SI-NEXT: s_mov_b64 s[0:1], exec
				; SI-NEXT: s_wqm_b64 exec, exec
				; SI-NEXT: v_cvt_i32_f32_e32 v0, v0
				; SI-NEXT: s_mov_b32 s2, 0
				; SI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; SI-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; SI-NEXT: ; %bb.1: ; %.demote0
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_cbranch_scc0 BB7_9
				; SI-NEXT: ; %bb.2: ; %.demote0
				; SI-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[6:7]
				; SI-NEXT: ; %bb.3: ; %.continue0.preheader
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: s_mov_b64 s[4:5], 0
				; SI-NEXT: s_branch BB7_5
				; SI-NEXT: BB7_4: ; %.continue1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: s_or_b64 exec, exec, s[6:7]
				; SI-NEXT: s_add_i32 s2, s2, 1
				; SI-NEXT: v_cmp_ge_i32_e32 vcc, s2, v1
				; SI-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; SI-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; SI-NEXT: s_cbranch_execz BB7_8
				; SI-NEXT: BB7_5: ; %.continue0
				; SI-NEXT: ; =>This Inner Loop Header: Depth=1
				; SI-NEXT: v_mov_b32_e32 v0, s2
				; SI-NEXT: s_mov_b64 s[6:7], s[0:1]
				; SI-NEXT: v_cndmask_b32_e64 v0, v0, 0, s[6:7]
				; SI-NEXT: v_mov_b32_e32 v2, v0
				; SI-NEXT: s_xor_b64 s[6:7], s[0:1], -1
				; SI-NEXT: s_nop 0
				; SI-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: s_nop 1
				; SI-NEXT: v_subrev_f32_dpp v0, v0, v2 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; SI-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; SI-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; SI-NEXT: s_or_b64 s[8:9], s[6:7], vcc
				; SI-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; SI-NEXT: s_cbranch_execz BB7_4
				; SI-NEXT: ; %bb.6: ; %.demote1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; SI-NEXT: s_cbranch_scc0 BB7_9
				; SI-NEXT: ; %bb.7: ; %.demote1
				; SI-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; SI-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; SI-NEXT: s_and_b64 exec, exec, s[8:9]
				; SI-NEXT: s_branch BB7_4
				; SI-NEXT: BB7_8: ; %.return
				; SI-NEXT: s_or_b64 exec, exec, s[4:5]
				; SI-NEXT: s_and_b64 exec, exec, s[0:1]
				; SI-NEXT: v_bfrev_b32_e32 v0, 60
				; SI-NEXT: v_mov_b32_e32 v1, 0x3c00
				; SI-NEXT: exp mrt0 v1, v1, v0, v0 done compr vm
				; SI-NEXT: s_endpgm
				; SI-NEXT: BB7_9:
				; SI-NEXT: s_mov_b64 exec, 0
				; SI-NEXT: exp null off, off, off, off done vm
				; SI-NEXT: s_endpgm
				;
				; GFX9-LABEL: wqm_deriv_loop:
				; GFX9: ; %bb.0: ; %.entry
				; GFX9-NEXT: s_mov_b64 s[0:1], exec
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX9-NEXT: s_mov_b32 s2, 0
				; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX9-NEXT: ; %bb.1: ; %.demote0
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_cbranch_scc0 BB7_9
				; GFX9-NEXT: ; %bb.2: ; %.demote0
				; GFX9-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[6:7]
				; GFX9-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_mov_b64 s[4:5], 0
				; GFX9-NEXT: s_branch BB7_5
				; GFX9-NEXT: BB7_4: ; %.continue1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: s_or_b64 exec, exec, s[6:7]
				; GFX9-NEXT: s_add_i32 s2, s2, 1
				; GFX9-NEXT: v_cmp_ge_i32_e32 vcc, s2, v1
				; GFX9-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; GFX9-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_cbranch_execz BB7_8
				; GFX9-NEXT: BB7_5: ; %.continue0
				; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX9-NEXT: v_mov_b32_e32 v0, s2
				; GFX9-NEXT: s_mov_b64 s[6:7], s[0:1]
				; GFX9-NEXT: v_cndmask_b32_e64 v0, v0, 0, s[6:7]
				; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: s_xor_b64 s[6:7], s[0:1], -1
				; GFX9-NEXT: s_nop 0
				; GFX9-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: s_nop 1
				; GFX9-NEXT: v_subrev_f32_dpp v0, v0, v2 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX9-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX9-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; GFX9-NEXT: s_or_b64 s[8:9], s[6:7], vcc
				; GFX9-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; GFX9-NEXT: s_cbranch_execz BB7_4
				; GFX9-NEXT: ; %bb.6: ; %.demote1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX9-NEXT: s_cbranch_scc0 BB7_9
				; GFX9-NEXT: ; %bb.7: ; %.demote1
				; GFX9-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX9-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; GFX9-NEXT: s_and_b64 exec, exec, s[8:9]
				; GFX9-NEXT: s_branch BB7_4
				; GFX9-NEXT: BB7_8: ; %.return
				; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX9-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX9-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX9-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX9-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX9-NEXT: s_endpgm
				; GFX9-NEXT: BB7_9:
				; GFX9-NEXT: s_mov_b64 exec, 0
				; GFX9-NEXT: exp null off, off, off, off done vm
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-32-LABEL: wqm_deriv_loop:
				; GFX10-32: ; %bb.0: ; %.entry
				; GFX10-32-NEXT: s_mov_b32 s0, exec_lo
				; GFX10-32-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-32-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-32-NEXT: s_mov_b32 s1, 0
				; GFX10-32-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_and_saveexec_b32 s2, vcc_lo
				; GFX10-32-NEXT: ; %bb.1: ; %.demote0
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-32-NEXT: ; %bb.2: ; %.demote0
				; GFX10-32-NEXT: s_wqm_b32 s3, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s3
				; GFX10-32-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s2
				; GFX10-32-NEXT: s_mov_b32 s2, 0
				; GFX10-32-NEXT: s_branch BB7_5
				; GFX10-32-NEXT: BB7_4: ; %.continue1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s3
				; GFX10-32-NEXT: s_add_i32 s2, s2, 1
				; GFX10-32-NEXT: v_cmp_ge_i32_e32 vcc_lo, s2, v1
				; GFX10-32-NEXT: s_or_b32 s1, vcc_lo, s1
				; GFX10-32-NEXT: s_andn2_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: s_cbranch_execz BB7_8
				; GFX10-32-NEXT: BB7_5: ; %.continue0
				; GFX10-32-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX10-32-NEXT: s_mov_b32 s3, s0
				; GFX10-32-NEXT: v_cndmask_b32_e64 v0, s2, 0, s3
				; GFX10-32-NEXT: s_xor_b32 s3, s0, -1
				; GFX10-32-NEXT: v_mov_b32_e32 v2, v0
				; GFX10-32-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: v_subrev_f32_dpp v0, v0, v2 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-32-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-32-NEXT: v_cmp_neq_f32_e32 vcc_lo, 0, v0
				; GFX10-32-NEXT: s_or_b32 s4, s3, vcc_lo
				; GFX10-32-NEXT: s_and_saveexec_b32 s3, s4
				; GFX10-32-NEXT: s_cbranch_execz BB7_4
				; GFX10-32-NEXT: ; %bb.6: ; %.demote1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: s_andn2_b32 s0, s0, exec_lo
				; GFX10-32-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-32-NEXT: ; %bb.7: ; %.demote1
				; GFX10-32-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-32-NEXT: s_wqm_b32 s4, s0
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s4
				; GFX10-32-NEXT: s_branch BB7_4
				; GFX10-32-NEXT: BB7_8: ; %.return
				; GFX10-32-NEXT: s_or_b32 exec_lo, exec_lo, s1
				; GFX10-32-NEXT: s_and_b32 exec_lo, exec_lo, s0
				; GFX10-32-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX10-32-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX10-32-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-32-NEXT: s_endpgm
				; GFX10-32-NEXT: BB7_9:
				; GFX10-32-NEXT: s_mov_b32 exec_lo, 0
				; GFX10-32-NEXT: exp null off, off, off, off done vm
				; GFX10-32-NEXT: s_endpgm
				;
				; GFX10-64-LABEL: wqm_deriv_loop:
				; GFX10-64: ; %bb.0: ; %.entry
				; GFX10-64-NEXT: s_mov_b64 s[0:1], exec
				; GFX10-64-NEXT: s_wqm_b64 exec, exec
				; GFX10-64-NEXT: v_cvt_i32_f32_e32 v0, v0
				; GFX10-64-NEXT: s_mov_b32 s2, 0
				; GFX10-64-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_and_saveexec_b64 s[4:5], vcc
				; GFX10-64-NEXT: ; %bb.1: ; %.demote0
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-64-NEXT: ; %bb.2: ; %.demote0
				; GFX10-64-NEXT: s_wqm_b64 s[6:7], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[6:7]
				; GFX10-64-NEXT: ; %bb.3: ; %.continue0.preheader
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_mov_b64 s[4:5], 0
				; GFX10-64-NEXT: s_branch BB7_5
				; GFX10-64-NEXT: BB7_4: ; %.continue1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[6:7]
				; GFX10-64-NEXT: s_add_i32 s2, s2, 1
				; GFX10-64-NEXT: v_cmp_ge_i32_e32 vcc, s2, v1
				; GFX10-64-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
				; GFX10-64-NEXT: s_andn2_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_cbranch_execz BB7_8
				; GFX10-64-NEXT: BB7_5: ; %.continue0
				; GFX10-64-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX10-64-NEXT: s_mov_b64 s[6:7], s[0:1]
				; GFX10-64-NEXT: v_cndmask_b32_e64 v0, s2, 0, s[6:7]
				; GFX10-64-NEXT: s_xor_b64 s[6:7], s[0:1], -1
				; GFX10-64-NEXT: v_mov_b32_e32 v2, v0
				; GFX10-64-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: v_subrev_f32_dpp v0, v0, v2 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX10-64-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
				; GFX10-64-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
				; GFX10-64-NEXT: s_or_b64 s[8:9], s[6:7], vcc
				; GFX10-64-NEXT: s_and_saveexec_b64 s[6:7], s[8:9]
				; GFX10-64-NEXT: s_cbranch_execz BB7_4
				; GFX10-64-NEXT: ; %bb.6: ; %.demote1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
				; GFX10-64-NEXT: s_cbranch_scc0 BB7_9
				; GFX10-64-NEXT: ; %bb.7: ; %.demote1
				; GFX10-64-NEXT: ; in Loop: Header=BB7_5 Depth=1
				; GFX10-64-NEXT: s_wqm_b64 s[8:9], s[0:1]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[8:9]
				; GFX10-64-NEXT: s_branch BB7_4
				; GFX10-64-NEXT: BB7_8: ; %.return
				; GFX10-64-NEXT: s_or_b64 exec, exec, s[4:5]
				; GFX10-64-NEXT: s_and_b64 exec, exec, s[0:1]
				; GFX10-64-NEXT: v_mov_b32_e32 v0, 0x3c00
				; GFX10-64-NEXT: v_bfrev_b32_e32 v1, 60
				; GFX10-64-NEXT: exp mrt0 v0, v0, v1, v1 done compr vm
				; GFX10-64-NEXT: s_endpgm
				; GFX10-64-NEXT: BB7_9:
				; GFX10-64-NEXT: s_mov_b64 exec, 0
				; GFX10-64-NEXT: exp null off, off, off, off done vm
				; GFX10-64-NEXT: s_endpgm
				.entry:
				%p0 = extractelement <2 x float> %input, i32 0
				%p1 = extractelement <2 x float> %input, i32 1
				%x0 = call float @llvm.amdgcn.interp.p1(float %p0, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%x1 = call float @llvm.amdgcn.interp.p2(float %x0, float %p1, i32 immarg 0, i32 immarg 0, i32 %index) #2
				%argi = fptosi float %arg to i32
				%cond0 = icmp eq i32 %argi, 0
				br i1 %cond0, label %.continue0, label %.demote0

				.demote0:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue0

				.continue0:
				%count = phi i32 [ 0, %.entry ], [ 0, %.demote0 ], [ %next, %.continue1 ]
				%live = call i1 @llvm.amdgcn.live.mask()
				%live.cond = select i1 %live, i32 0, i32 %count
				%live.v0 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 85, i32 15, i32 15, i1 true)
				%live.v0f = bitcast i32 %live.v0 to float
				%live.v1 = call i32 @llvm.amdgcn.mov.dpp.i32(i32 %live.cond, i32 0, i32 15, i32 15, i1 true)
				%live.v1f = bitcast i32 %live.v1 to float
				%v0 = fsub float %live.v0f, %live.v1f
				%v0.wqm = call float @llvm.amdgcn.wqm.f32(float %v0)
				%cond1 = fcmp oeq float %v0.wqm, 0.000000e+00
				%cond2 = and i1 %live, %cond1
				br i1 %cond2, label %.continue1, label %.demote1

				.demote1:
				call void @llvm.amdgcn.wqm.demote(i1 false)
				br label %.continue1

				.continue1:
				%next = add i32 %count, 1
				%loop.cond = icmp slt i32 %next, %limit
				br i1 %loop.cond, label %.continue0, label %.return

				.return:
				call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> <half 0xH3C00, half 0xH0000>, <2 x half> <half 0xH0000, half 0xH3C00>, i1 immarg true, i1 immarg true) #3
				ret void
				}

				declare void @llvm.amdgcn.wqm.demote(i1) #0
				declare i1 @llvm.amdgcn.live.mask() #0
				declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
				declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare float @llvm.amdgcn.wqm.f32(float) #1
				declare float @llvm.amdgcn.interp.p1(float, i32 immarg, i32 immarg, i32) #2
				declare float @llvm.amdgcn.interp.p2(float, float, i32 immarg, i32 immarg, i32) #2
				declare void @llvm.amdgcn.exp.compr.v2f16(i32 immarg, i32 immarg, <2 x half>, <2 x half>, i1 immarg, i1 immarg) #3
				declare i32 @llvm.amdgcn.mov.dpp.i32(i32, i32 immarg, i32 immarg, i32 immarg, i1 immarg) #4

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind readnone speculatable }
				attributes #3 = { inaccessiblememonly nounwind }
				attributes #4 = { convergent nounwind readnone }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.vote.ll

	;WAVE32: s_wqm_b32			;WAVE32: s_wqm_b32
	define amdgpu_ps float @false() #1 {			define amdgpu_ps float @false() #1 {
	main_body:			main_body:
	%w = call i1 @llvm.amdgcn.wqm.vote(i1 false)			%w = call i1 @llvm.amdgcn.wqm.vote(i1 false)
	%r = select i1 %w, float 1.0, float 0.0			%r = select i1 %w, float 1.0, float 0.0
	ret float %r			ret float %r
	}			}

				; Note: an almost identical test for this exists in llvm.amdgcn.kill.ll
	;CHECK-LABEL: {{^}}kill:			;CHECK-LABEL: {{^}}kill:
	;CHECK: v_cmp_eq_u32_e32 [[CMP:[^,]+]], v0, v1			;CHECK: v_cmp_eq_u32_e32 [[CMP:[^,]+]], v0, v1

	;WAVE64: s_wqm_b64 [[WQM:[^,]+]], [[CMP]]			;WAVE64: s_wqm_b64 [[WQM:[^,]+]], [[CMP]]
	;WAVE64: s_and_b64 exec, exec, [[WQM]]			;WAVE64: s_xor_b64 [[KILL:[^,]+]], [[WQM]], exec
				;WAVE64: s_andn2_b64 [[MASK:[^,]+]], [[EXEC:[^,]+]], [[KILL]]
				;WAVE64: s_and_b64 exec, exec, [[MASK]]

	;WAVE32: s_wqm_b32 [[WQM:[^,]+]], [[CMP]]			;WAVE32: s_wqm_b32 [[WQM:[^,]+]], [[CMP]]
	;WAVE32: s_and_b32 exec_lo, exec_lo, [[WQM]]			;WAVE32: s_xor_b32 [[KILL:[^,]+]], [[WQM]], exec
				;WAVE32: s_andn2_b32 [[MASK:[^,]+]], [[EXEC:[^,]+]], [[KILL]]
				;WAVE32: s_and_b32 exec_lo, exec_lo, [[MASK]]

	;CHECK: s_endpgm			;CHECK: s_endpgm
	define amdgpu_ps void @kill(i32 %v0, i32 %v1) #1 {			define amdgpu_ps float @kill(i32 %v0, i32 %v1) #1 {
	main_body:			main_body:
	%c = icmp eq i32 %v0, %v1			%c = icmp eq i32 %v0, %v1
	%w = call i1 @llvm.amdgcn.wqm.vote(i1 %c)			%w = call i1 @llvm.amdgcn.wqm.vote(i1 %c)
	call void @llvm.amdgcn.kill(i1 %w)			call void @llvm.amdgcn.kill(i1 %w)
	ret void			ret float 0.0
	}			}

	declare void @llvm.amdgcn.kill(i1) #1			declare void @llvm.amdgcn.kill(i1) #1
	declare i1 @llvm.amdgcn.wqm.vote(i1)			declare i1 @llvm.amdgcn.wqm.vote(i1)

	attributes #1 = { nounwind }			attributes #1 = { nounwind }

llvm/test/CodeGen/AMDGPU/skip-if-dead.ll

; RUN: llc -march=amdgcn -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s		; RUN: llc -march=amdgcn -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s

; CHECK-LABEL: {{^}}test_kill_depth_0_imm_pos:		; CHECK-LABEL: {{^}}test_kill_depth_0_imm_pos:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: ; %bb.1:
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_0_imm_pos() #0 {		define amdgpu_ps void @test_kill_depth_0_imm_pos() #0 {
call void @llvm.amdgcn.kill(i1 true)		call void @llvm.amdgcn.kill(i1 true)
ret void		ret void
}		}

; CHECK-LABEL: {{^}}test_kill_depth_0_imm_neg:		; CHECK-LABEL: {{^}}test_kill_depth_0_imm_neg:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_mov_b64 exec, 0		; CHECK-NEXT: s_branch [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: s_cbranch_execz BB1_2
; CHECK-NEXT: ; %bb.1:
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB1_2:		; CHECK-NEXT: [[EXIT_BB]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
; CHECK-NEXT: exp null off, off, off, off done vm		; CHECK-NEXT: exp null off, off, off, off done vm
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_0_imm_neg() #0 {		define amdgpu_ps void @test_kill_depth_0_imm_neg() #0 {
call void @llvm.amdgcn.kill(i1 false)		call void @llvm.amdgcn.kill(i1 false)
ret void		ret void
}		}

; FIXME: Ideally only one would be emitted		; FIXME: Ideally only one would be emitted
; CHECK-LABEL: {{^}}test_kill_depth_0_imm_neg_x2:		; CHECK-LABEL: {{^}}test_kill_depth_0_imm_neg_x2:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: s_mov_b64 exec, 0		; CHECK-NEXT: s_mov_b64 s[0:1], exec
; CHECK-NEXT: s_cbranch_execz BB2_3		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: ; %bb.1:		; CHECK-NEXT: ; %bb.1:
; CHECK-NEXT: s_mov_b64 exec, 0		; CHECK-NEXT: s_mov_b64 exec, 0
; CHECK-NEXT: s_cbranch_execz BB2_3		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
; CHECK-NEXT: ; %bb.2:		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB]]
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB2_3:		; CHECK-NEXT: [[EXIT_BB]]:
; CHECK: exp null		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_0_imm_neg_x2() #0 {		define amdgpu_ps void @test_kill_depth_0_imm_neg_x2() #0 {
call void @llvm.amdgcn.kill(i1 false)		call void @llvm.amdgcn.kill(i1 false)
call void @llvm.amdgcn.kill(i1 false)		call void @llvm.amdgcn.kill(i1 false)
ret void		ret void
}		}

; CHECK-LABEL: {{^}}test_kill_depth_var:		; CHECK-LABEL: {{^}}test_kill_depth_var:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v0		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
; CHECK-NEXT: s_cbranch_execz BB3_2		; CHECK-NEXT: s_andn2_b64 exec, exec, vcc
; CHECK-NEXT: ; %bb.1:		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB3_2:		; CHECK-NEXT: [[EXIT_BB]]:
; CHECK: exp null		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_var(float %x) #0 {		define amdgpu_ps void @test_kill_depth_var(float %x) #0 {
%cmp = fcmp olt float %x, 0.0		%cmp = fcmp olt float %x, 0.0
call void @llvm.amdgcn.kill(i1 %cmp)		call void @llvm.amdgcn.kill(i1 %cmp)
ret void		ret void
}		}

; FIXME: Ideally only one would be emitted		; FIXME: Ideally only one would be emitted
; CHECK-LABEL: {{^}}test_kill_depth_var_x2_same:		; CHECK-LABEL: {{^}}test_kill_depth_var_x2_same:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v0		; CHECK-NEXT: s_mov_b64 s[0:1], exec
; CHECK-NEXT: s_cbranch_execz BB4_3		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: ; %bb.1:		; CHECK-NEXT: ; %bb.1:
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v0		; CHECK-NEXT: s_andn2_b64 exec, exec, vcc
; CHECK-NEXT: s_cbranch_execz BB4_3		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
; CHECK-NEXT: ; %bb.2:		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB]]
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB4_3:		; CHECK-NEXT: [[EXIT_BB]]:
; CHECK: exp null		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_var_x2_same(float %x) #0 {		define amdgpu_ps void @test_kill_depth_var_x2_same(float %x) #0 {
%cmp = fcmp olt float %x, 0.0		%cmp = fcmp olt float %x, 0.0
call void @llvm.amdgcn.kill(i1 %cmp)		call void @llvm.amdgcn.kill(i1 %cmp)
call void @llvm.amdgcn.kill(i1 %cmp)		call void @llvm.amdgcn.kill(i1 %cmp)
ret void		ret void
}		}

; FIXME: Ideally only one early-exit would be emitted		; FIXME: Ideally only one early-exit would be emitted
; CHECK-LABEL: {{^}}test_kill_depth_var_x2:		; CHECK-LABEL: {{^}}test_kill_depth_var_x2:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v0		; CHECK-NEXT: s_mov_b64 s[0:1], exec
; CHECK-NEXT: s_cbranch_execz BB5_3		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
; CHECK-NEXT: ; %bb.1		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v1		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: s_cbranch_execz BB5_3		; CHECK-NEXT: ; %bb.1:
; CHECK-NEXT: ; %bb.2		; CHECK-NEXT: s_andn2_b64 exec, exec, vcc
		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v1
		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB]]
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB5_3:		; CHECK-NEXT: [[EXIT_BB]]:
; CHECK: exp null		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_var_x2(float %x, float %y) #0 {		define amdgpu_ps void @test_kill_depth_var_x2(float %x, float %y) #0 {
%cmp.x = fcmp olt float %x, 0.0		%cmp.x = fcmp olt float %x, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.x)		call void @llvm.amdgcn.kill(i1 %cmp.x)
%cmp.y = fcmp olt float %y, 0.0		%cmp.y = fcmp olt float %y, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.y)		call void @llvm.amdgcn.kill(i1 %cmp.y)
ret void		ret void
}		}

; CHECK-LABEL: {{^}}test_kill_depth_var_x2_instructions:		; CHECK-LABEL: {{^}}test_kill_depth_var_x2_instructions:
; CHECK-NEXT: ; %bb.0:		; CHECK-NEXT: ; %bb.0:
; CHECK-NEXT: v_cmpx_gt_f32_e32 vcc, 0, v0		; CHECK-NEXT: s_mov_b64 s[0:1], exec
; CHECK-NEXT: s_cbranch_execz BB6_3		; CHECK-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NEXT: ; %bb.1:		; CHECK-NEXT: ; %bb.1:
		; CHECK-NEXT: s_andn2_b64 exec, exec, vcc
; CHECK: v_mov_b32_e64 v7, -1		; CHECK: v_mov_b32_e64 v7, -1
; CHECK: v_cmpx_gt_f32_e32 vcc, 0, v7		; CHECK: v_cmp_lt_f32_e32 vcc, 0, v7
; CHECK-NEXT: s_cbranch_execz BB6_3		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
; CHECK-NEXT: ; %bb.2:		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB]]
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
; CHECK-NEXT: BB6_3:		; CHECK-NEXT: [[EXIT_BB]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
; CHECK-NEXT: exp null		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_depth_var_x2_instructions(float %x) #0 {		define amdgpu_ps void @test_kill_depth_var_x2_instructions(float %x) #0 {
%cmp.x = fcmp olt float %x, 0.0		%cmp.x = fcmp olt float %x, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.x)		call void @llvm.amdgcn.kill(i1 %cmp.x)
%y = call float asm sideeffect "v_mov_b32_e64 v7, -1", "={v7}"()		%y = call float asm sideeffect "v_mov_b32_e64 v7, -1", "={v7}"()
%cmp.y = fcmp olt float %y, 0.0		%cmp.y = fcmp olt float %y, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.y)		call void @llvm.amdgcn.kill(i1 %cmp.y)
ret void		ret void
}		}

; FIXME: why does the skip depend on the asm length in the same block?		; FIXME: why does the skip depend on the asm length in the same block?

; CHECK-LABEL: {{^}}test_kill_control_flow:		; CHECK-LABEL: {{^}}test_kill_control_flow:
; CHECK: s_cmp_lg_u32 s{{[0-9]+}}, 0		; CHECK: s_cmp_lg_u32 s{{[0-9]+}}, 0
; CHECK: s_cbranch_scc1 [[RETURN_BB:BB[0-9]+_[0-9]+]]		; CHECK: s_cbranch_scc0 [[BODY_BB:BB[0-9]+_[0-9]+]]

; CHECK-NEXT: ; %bb.1:		; CHECK: v_mov_b32_e32 v0, 1.0
		; CHECK: s_branch [[RETURN_BB:BB[0-9]+_[0-9]+]]

		; [[BODY_BB]]:
; CHECK: v_mov_b32_e64 v7, -1		; CHECK: v_mov_b32_e64 v7, -1
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64

; CHECK: v_cmpx_gt_f32_e32 vcc, 0, v7		; CHECK: v_cmp_lt_f32_e32 vcc, 0, v7
		; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], vcc
; TODO: We could do an early-exit here (the branch above is uniform!)		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NOT: exp null

		; CHECK: s_andn2_b64 exec, exec, vcc
; CHECK: v_mov_b32_e32 v0, 1.0		; CHECK: v_mov_b32_e32 v0, 1.0

		; CHECK: [[EXIT_BB]]
		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
		; CHECK-NEXT: s_endpgm
define amdgpu_ps float @test_kill_control_flow(i32 inreg %arg) #0 {		define amdgpu_ps float @test_kill_control_flow(i32 inreg %arg) #0 {
entry:		entry:
%cmp = icmp eq i32 %arg, 0		%cmp = icmp eq i32 %arg, 0
br i1 %cmp, label %bb, label %exit		br i1 %cmp, label %bb, label %exit

bb:		bb:
%var = call float asm sideeffect "		%var = call float asm sideeffect "
v_mov_b32_e64 v7, -1		v_mov_b32_e64 v7, -1
[28 lines not shown]
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: ;;#ASMEND		; CHECK: ;;#ASMEND
; CHECK: v_mov_b32_e64 v8, -1		; CHECK: v_mov_b32_e64 v8, -1
; CHECK: ;;#ASMEND		; CHECK: ;;#ASMEND
; CHECK: v_cmpx_gt_f32_e32 vcc, 0, v7		; CHECK: v_cmp_lt_f32_e32 vcc, 0, v7
		; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], vcc
; TODO: We could do an early-exit here (the branch above is uniform!)		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
; CHECK-NOT: exp null

; CHECK: buffer_store_dword v8		; CHECK: buffer_store_dword v8
; CHECK: v_mov_b32_e64 v9, -2		; CHECK: v_mov_b32_e64 v9, -2

; CHECK: {{^}}BB{{[0-9]+_[0-9]+}}:		; CHECK: {{^}}BB{{[0-9]+_[0-9]+}}:
; CHECK: buffer_store_dword v9		; CHECK: buffer_store_dword v9
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm

		; CHECK: [[EXIT_BB]]
		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_control_flow_remainder(i32 inreg %arg) #0 {		define amdgpu_ps void @test_kill_control_flow_remainder(i32 inreg %arg) #0 {
entry:		entry:
%cmp = icmp eq i32 %arg, 0		%cmp = icmp eq i32 %arg, 0
br i1 %cmp, label %bb, label %exit		br i1 %cmp, label %bb, label %exit

bb:		bb:
%var = call float asm sideeffect "		%var = call float asm sideeffect "
v_mov_b32_e64 v7, -1		v_mov_b32_e64 v7, -1
[18 lines not shown]
exit:		exit:
%phi = phi float [ 0.0, %entry ], [ %live.out, %bb ]		%phi = phi float [ 0.0, %entry ], [ %live.out, %bb ]
store float %phi, float addrspace(1)* undef		store float %phi, float addrspace(1)* undef
ret void		ret void
}		}

; CHECK-LABEL: {{^}}test_kill_control_flow_return:		; CHECK-LABEL: {{^}}test_kill_control_flow_return:

		; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec
; CHECK: v_cmp_eq_u32_e64 [[KILL_CC:s\[[0-9]+:[0-9]+\]]], s0, 1		; CHECK: v_cmp_eq_u32_e64 [[KILL_CC:s\[[0-9]+:[0-9]+\]]], s0, 1
; CHECK: s_and_b64 exec, exec, s[2:3]		; CHECK: s_xor_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], [[KILL_CC]], exec
; CHECK-NEXT: s_cbranch_execz [[EXIT_BB:BB[0-9]+_[0-9]+]]		; CHECK: s_andn2_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], [[LIVE]], [[TMP]]
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
		; CHECK: s_and_b64 exec, exec, [[MASK]]

; CHECK: s_cmp_lg_u32 s{{[0-9]+}}, 0		; CHECK: s_cmp_lg_u32 s{{[0-9]+}}, 0
; CHECK: s_cbranch_scc0 [[COND_BB:BB[0-9]+_[0-9]+]]		; CHECK: s_cbranch_scc0 [[COND_BB:BB[0-9]+_[0-9]+]]
; CHECK: s_branch [[RETURN_BB:BB[0-9]+_[0-9]+]]		; CHECK: s_branch [[RETURN_BB:BB[0-9]+_[0-9]+]]

; CHECK: [[COND_BB]]:		; CHECK: [[COND_BB]]:
; CHECK: v_mov_b32_e64 v7, -1		; CHECK: v_mov_b32_e64 v7, -1
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_mov_b32_e32 v0, v7		; CHECK: v_mov_b32_e32 v0, v7

; CHECK: [[EXIT_BB]]:		; CHECK: [[EXIT_BB]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
; CHECK-NEXT: exp null		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm

; CHECK: [[RETURN_BB]]:		; CHECK: [[RETURN_BB]]:
define amdgpu_ps float @test_kill_control_flow_return(i32 inreg %arg) #0 {		define amdgpu_ps float @test_kill_control_flow_return(i32 inreg %arg) #0 {
entry:		entry:
%kill = icmp eq i32 %arg, 1		%kill = icmp eq i32 %arg, 1
%cmp = icmp eq i32 %arg, 0		%cmp = icmp eq i32 %arg, 0
[28 lines not shown]

; CHECK: ; %bb.{{[0-9]+}}: ; %bb.preheader		; CHECK: ; %bb.{{[0-9]+}}: ; %bb.preheader
; CHECK: s_mov_b32		; CHECK: s_mov_b32

; CHECK: [[LOOP_BB:BB[0-9]+_[0-9]+]]:		; CHECK: [[LOOP_BB:BB[0-9]+_[0-9]+]]:

; CHECK: v_mov_b32_e64 v7, -1		; CHECK: v_mov_b32_e64 v7, -1
; CHECK: v_nop_e64		; CHECK: v_nop_e64
; CHECK: v_cmpx_gt_f32_e32 vcc, 0, v7		; CHECK: v_cmp_lt_f32_e32 vcc, 0, v7
		; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]

; CHECK-NEXT: ; %bb.3:		; CHECK-NEXT: ; %bb.3:
; CHECK: buffer_load_dword [[LOAD:v[0-9]+]]		; CHECK: buffer_load_dword [[LOAD:v[0-9]+]]
; CHECK: v_cmp_eq_u32_e32 vcc, 0, [[LOAD]]		; CHECK: v_cmp_eq_u32_e32 vcc, 0, [[LOAD]]
; CHECK-NEXT: s_and_b64 vcc, exec, vcc		; CHECK-NEXT: s_and_b64 vcc, exec, vcc
; CHECK-NEXT: s_cbranch_vccnz [[LOOP_BB]]		; CHECK-NEXT: s_cbranch_vccnz [[LOOP_BB]]

; CHECK-NEXT: {{^}}[[EXIT]]:		; CHECK-NEXT: {{^}}[[EXIT]]:
; CHECK: s_or_b64 exec, exec, [[SAVEEXEC]]		; CHECK: s_or_b64 exec, exec, [[SAVEEXEC]]
; CHECK: buffer_store_dword		; CHECK: buffer_store_dword
; CHECK: s_endpgm		; CHECK: s_endpgm

		; CHECK: [[EXIT_BB]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @test_kill_divergent_loop(i32 %arg) #0 {		define amdgpu_ps void @test_kill_divergent_loop(i32 %arg) #0 {
entry:		entry:
%cmp = icmp eq i32 %arg, 0		%cmp = icmp eq i32 %arg, 0
br i1 %cmp, label %bb, label %exit		br i1 %cmp, label %bb, label %exit

bb:		bb:
%var = call float asm sideeffect "		%var = call float asm sideeffect "
v_mov_b32_e64 v7, -1		v_mov_b32_e64 v7, -1
[16 lines not shown]
exit:		exit:
store volatile i32 8, i32 addrspace(1)* undef		store volatile i32 8, i32 addrspace(1)* undef
ret void		ret void
}		}

; bug 28550		; bug 28550
; CHECK-LABEL: {{^}}phi_use_def_before_kill:		; CHECK-LABEL: {{^}}phi_use_def_before_kill:
; CHECK: v_cndmask_b32_e64 [[PHIREG:v[0-9]+]], 0, -1.0,		; CHECK: v_cndmask_b32_e64 [[PHIREG:v[0-9]+]], 0, -1.0,
; CHECK: v_cmpx_lt_f32_e32 vcc, 0,		; CHECK: v_cmp_gt_f32_e32 vcc, 0,
; CHECK-NEXT: s_cbranch_execz [[EXITBB:BB[0-9]+_[0-9]+]]		; CHECK-NEXT: s_andn2_b64 exec, exec, vcc
		; CHECK-NEXT: s_cbranch_scc0 [[EXITBB:BB[0-9]+_[0-9]+]]

; CHECK: ; %[[KILLBB:bb.[0-9]+]]:		; CHECK: ; %[[KILLBB:bb.[0-9]+]]:
		; CHECK-NEXT: s_andn2_b64
; CHECK-NEXT: s_cbranch_scc0 [[PHIBB:BB[0-9]+_[0-9]+]]		; CHECK-NEXT: s_cbranch_scc0 [[PHIBB:BB[0-9]+_[0-9]+]]

; CHECK: [[PHIBB]]:		; CHECK: [[PHIBB]]:
; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]		; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]
; CHECK: s_cbranch_vccz [[ENDBB:BB[0-9]+_[0-9]+]]		; CHECK: s_cbranch_vccz [[ENDBB:BB[0-9]+_[0-9]+]]

; CHECK: ; %bb10		; CHECK: ; %bb10
; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 9		; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 9
; CHECK: buffer_store_dword		; CHECK: buffer_store_dword

; CHECK: [[ENDBB]]:		; CHECK: [[ENDBB]]:
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm

; CHECK: [[EXITBB]]:		; CHECK: [[EXITBB]]:
; CHECK: exp null		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
; CHECK-NEXT: s_endpgm		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @phi_use_def_before_kill(float inreg %x) #0 {		define amdgpu_ps void @phi_use_def_before_kill(float inreg %x) #0 {
bb:		bb:
%tmp = fadd float %x, 1.000000e+00		%tmp = fadd float %x, 1.000000e+00
%tmp1 = fcmp olt float 0.000000e+00, %tmp		%tmp1 = fcmp olt float 0.000000e+00, %tmp
%tmp2 = select i1 %tmp1, float -1.000000e+00, float 0.000000e+00		%tmp2 = select i1 %tmp1, float -1.000000e+00, float 0.000000e+00
%cmp.tmp2 = fcmp olt float %tmp2, 0.0		%cmp.tmp2 = fcmp olt float %tmp2, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.tmp2)		call void @llvm.amdgcn.kill(i1 %cmp.tmp2)
[15 lines not shown]
end:		end:
ret void		ret void
}		}

; CHECK-LABEL: {{^}}no_skip_no_successors:		; CHECK-LABEL: {{^}}no_skip_no_successors:
; CHECK: v_cmp_nge_f32		; CHECK: v_cmp_nge_f32
; CHECK: s_cbranch_vccz [[SKIPKILL:BB[0-9]+_[0-9]+]]		; CHECK: s_cbranch_vccz [[SKIPKILL:BB[0-9]+_[0-9]+]]

; CHECK: ; %bb6		; FIXME: ideally this should just be a s_branch
; CHECK: s_mov_b64 exec, 0		; CHECK: s_mov_b64 s[2:3], exec
		; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]
		; CHECK-NEXT: ; %bb6
		; CHECK-NEXT: s_mov_b64 exec, 0

; CHECK: [[SKIPKILL]]:		; CHECK: [[SKIPKILL]]:
; CHECK: v_cmp_nge_f32_e32 vcc		; CHECK: v_cmp_nge_f32_e32 vcc
; CHECK: %bb.3: ; %bb5		; CHECK: %bb.4: ; %bb5
; CHECK-NEXT: .Lfunc_end{{[0-9]+}}
		; CHECK: [[EXIT_BB]]
		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
		; CHECK-NEXT: s_endpgm
define amdgpu_ps void @no_skip_no_successors(float inreg %arg, float inreg %arg1) #0 {		define amdgpu_ps void @no_skip_no_successors(float inreg %arg, float inreg %arg1) #0 {
bb:		bb:
%tmp = fcmp ult float %arg1, 0.000000e+00		%tmp = fcmp ult float %arg1, 0.000000e+00
%tmp2 = fcmp ult float %arg, 0x3FCF5C2900000000		%tmp2 = fcmp ult float %arg, 0x3FCF5C2900000000
br i1 %tmp, label %bb6, label %bb3		br i1 %tmp, label %bb6, label %bb3

bb3: ; preds = %bb		bb3: ; preds = %bb
br i1 %tmp2, label %bb5, label %bb4		br i1 %tmp2, label %bb5, label %bb4
[12 lines not shown]	bb7: ; preds = %bb4
ret void		ret void
}		}

; CHECK-LABEL: {{^}}if_after_kill_block:		; CHECK-LABEL: {{^}}if_after_kill_block:
; CHECK: ; %bb.0:		; CHECK: ; %bb.0:
; CHECK: s_and_saveexec_b64		; CHECK: s_and_saveexec_b64
; CHECK: s_xor_b64		; CHECK: s_xor_b64

; CHECK: v_cmpx_gt_f32_e32 vcc, 0,		; CHECK: v_cmp_lt_f32_e32 vcc, 0,
; CHECK: BB{{[0-9]+_[0-9]+}}:		; CHECK: s_cbranch_scc0 [[EXIT_BB:BB[0-9]+_[0-9]+]]

; CHECK: s_or_b64 exec, exec		; CHECK: s_or_b64 exec, exec
; CHECK: image_sample_c		; CHECK: image_sample_c

; CHECK: v_cmp_neq_f32_e32 vcc, 0,		; CHECK: v_cmp_neq_f32_e32 vcc, 0,
; CHECK: s_and_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, vcc		; CHECK: s_and_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, vcc
; CHECK-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]		; CHECK-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]
; CHECK-NOT: branch		; CHECK-NOT: branch

; CHECK: ; %bb.{{[0-9]+}}: ; %bb8		; CHECK: ; %bb.{{[0-9]+}}: ; %bb8
; CHECK: buffer_store_dword		; CHECK: buffer_store_dword

; CHECK: [[END]]:		; CHECK: [[END]]:
; CHECK: s_endpgm		; CHECK: s_endpgm

		; CHECK: [[EXIT_BB]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
		; CHECK-NEXT: exp null
		; CHECK-NEXT: s_endpgm

define amdgpu_ps void @if_after_kill_block(float %arg, float %arg1, float %arg2, float %arg3) #0 {		define amdgpu_ps void @if_after_kill_block(float %arg, float %arg1, float %arg2, float %arg3) #0 {
bb:		bb:
%tmp = fcmp ult float %arg1, 0.000000e+00		%tmp = fcmp ult float %arg1, 0.000000e+00
br i1 %tmp, label %bb3, label %bb4		br i1 %tmp, label %bb3, label %bb4

bb3: ; preds = %bb		bb3: ; preds = %bb
%cmp.arg = fcmp olt float %arg, 0.0		%cmp.arg = fcmp olt float %arg, 0.0
call void @llvm.amdgcn.kill(i1 %cmp.arg)		call void @llvm.amdgcn.kill(i1 %cmp.arg)
[9 lines not shown]	bb8: ; preds = %bb9, %bb4
store volatile i32 9, i32 addrspace(1)* undef		store volatile i32 9, i32 addrspace(1)* undef
ret void		ret void

bb9: ; preds = %bb4		bb9: ; preds = %bb4
ret void		ret void
}		}

; CHECK-LABEL: {{^}}cbranch_kill:		; CHECK-LABEL: {{^}}cbranch_kill:
		; CHECK: ; %bb.{{[0-9]+}}: ; %kill
		; CHECK-NEXT: s_andn2
		; CHECK-NEXT: s_cbranch_scc0 [[EXIT:BB[0-9]+_[0-9]+]]
; CHECK: ; %bb.{{[0-9]+}}: ; %export		; CHECK: ; %bb.{{[0-9]+}}: ; %export
; CHECK-NEXT: s_or_b64		; CHECK-NEXT: s_or_b64
; CHECK-NEXT: s_cbranch_execz [[EXIT:BB[0-9]+_[0-9]+]]
; CHECK: [[EXIT]]:		; CHECK: [[EXIT]]:
		; CHECK-NEXT: s_mov_b64 exec, 0
; CHECK-NEXT: exp null off, off, off, off done vm		; CHECK-NEXT: exp null off, off, off, off done vm
define amdgpu_ps void @cbranch_kill(i32 inreg %0, <2 x float> %1) {		define amdgpu_ps void @cbranch_kill(i32 inreg %0, <2 x float> %1) {
.entry:		.entry:
%val0 = extractelement <2 x float> %1, i32 0		%val0 = extractelement <2 x float> %1, i32 0
%val1 = extractelement <2 x float> %1, i32 1		%val1 = extractelement <2 x float> %1, i32 1
%p0 = call float @llvm.amdgcn.interp.p1(float %val0, i32 immarg 0, i32 immarg 1, i32 %0) #2		%p0 = call float @llvm.amdgcn.interp.p1(float %val0, i32 immarg 0, i32 immarg 1, i32 %0) #2
%sample = call float @llvm.amdgcn.image.sample.l.2darray.f32.f32(i32 1, float %p0, float %p0, float %p0, float 0.000000e+00, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)		%sample = call float @llvm.amdgcn.image.sample.l.2darray.f32.f32(i32 1, float %p0, float %p0, float %p0, float 0.000000e+00, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
%cond0 = fcmp ugt float %sample, 0.000000e+00		%cond0 = fcmp ugt float %sample, 0.000000e+00
[22 lines not shown]	export:
%out.0 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %proxy.0.0, float %proxy.0.1) #2		%out.0 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %proxy.0.0, float %proxy.0.1) #2
%out.1 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %proxy.0.2, float %proxy.0.3) #2		%out.1 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %proxy.0.2, float %proxy.0.3) #2
call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> %out.0, <2 x half> %out.1, i1 immarg true, i1 immarg true) #3		call void @llvm.amdgcn.exp.compr.v2f16(i32 immarg 0, i32 immarg 15, <2 x half> %out.0, <2 x half> %out.1, i1 immarg true, i1 immarg true) #3
ret void		ret void
}		}

; CHECK-LABEL: {{^}}complex_loop:		; CHECK-LABEL: {{^}}complex_loop:
; CHECK: s_mov_b64 exec, 0		; CHECK: s_mov_b64 exec, 0
; CHECK-NOT: exp null		; CHECK: exp null
define amdgpu_ps void @complex_loop(i32 inreg %cmpa, i32 %cmpb, i32 %cmpc) {		define amdgpu_ps void @complex_loop(i32 inreg %cmpa, i32 %cmpb, i32 %cmpc) {
.entry:		.entry:
%flaga = icmp sgt i32 %cmpa, 0		%flaga = icmp sgt i32 %cmpa, 0
br i1 %flaga, label %.lr.ph, label %._crit_edge		br i1 %flaga, label %.lr.ph, label %._crit_edge

.lr.ph:		.lr.ph:
br label %hdr		br label %hdr

[53 lines not shown]

llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll

	[78 lines not shown]
	}			}

	define amdgpu_ps { <4 x float> } @test_return_to_epilog_with_optimized_kill(float %val) #0 {			define amdgpu_ps { <4 x float> } @test_return_to_epilog_with_optimized_kill(float %val) #0 {
	; GCN-LABEL: name: test_return_to_epilog_with_optimized_kill			; GCN-LABEL: name: test_return_to_epilog_with_optimized_kill
	; GCN: bb.0.entry:			; GCN: bb.0.entry:
	; GCN: successors: %bb.1(0x40000000), %bb.4(0x40000000)			; GCN: successors: %bb.1(0x40000000), %bb.4(0x40000000)
	; GCN: liveins: $vgpr0			; GCN: liveins: $vgpr0
	; GCN: renamable $vgpr1 = nofpexcept V_RCP_F32_e32 $vgpr0, implicit $mode, implicit $exec			; GCN: renamable $vgpr1 = nofpexcept V_RCP_F32_e32 $vgpr0, implicit $mode, implicit $exec
				; GCN: $sgpr0_sgpr1 = S_MOV_B64 $exec
	; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr1, implicit-def $vcc, implicit $mode, implicit $exec			; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr1, implicit-def $vcc, implicit $mode, implicit $exec
	; GCN: $sgpr0_sgpr1 = S_AND_SAVEEXEC_B64 killed $vcc, implicit-def $exec, implicit-def $scc, implicit $exec			; GCN: $sgpr2_sgpr3 = S_AND_SAVEEXEC_B64 killed $vcc, implicit-def $exec, implicit-def $scc, implicit $exec
	; GCN: renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc			; GCN: renamable $sgpr2_sgpr3 = S_XOR_B64 $exec, killed renamable $sgpr2_sgpr3, implicit-def dead $scc
	; GCN: S_CBRANCH_EXECZ %bb.4, implicit $exec			; GCN: S_CBRANCH_EXECZ %bb.4, implicit $exec
	; GCN: bb.1.flow.preheader:			; GCN: bb.1.flow.preheader:
	; GCN: successors: %bb.2(0x80000000)			; GCN: successors: %bb.2(0x80000000)
	; GCN: liveins: $vgpr0, $sgpr0_sgpr1			; GCN: liveins: $vgpr0, $sgpr0_sgpr1, $sgpr2_sgpr3
	; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $mode, implicit $exec			; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $mode, implicit $exec
	; GCN: renamable $sgpr2_sgpr3 = S_MOV_B64 0			; GCN: renamable $sgpr4_sgpr5 = S_MOV_B64 0
	; GCN: bb.2.flow:			; GCN: bb.2.flow:
	; GCN: successors: %bb.3(0x04000000), %bb.2(0x7c000000)			; GCN: successors: %bb.3(0x04000000), %bb.2(0x7c000000)
	; GCN: liveins: $vcc, $sgpr0_sgpr1, $sgpr2_sgpr3			; GCN: liveins: $vcc, $sgpr0_sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
	; GCN: renamable $sgpr4_sgpr5 = S_AND_B64 $exec, renamable $vcc, implicit-def $scc			; GCN: renamable $sgpr6_sgpr7 = S_AND_B64 $exec, renamable $vcc, implicit-def $scc
	; GCN: renamable $sgpr2_sgpr3 = S_OR_B64 killed renamable $sgpr4_sgpr5, killed renamable $sgpr2_sgpr3, implicit-def $scc			; GCN: renamable $sgpr4_sgpr5 = S_OR_B64 killed renamable $sgpr6_sgpr7, killed renamable $sgpr4_sgpr5, implicit-def $scc
	; GCN: $exec = S_ANDN2_B64 $exec, renamable $sgpr2_sgpr3, implicit-def $scc			; GCN: $exec = S_ANDN2_B64 $exec, renamable $sgpr4_sgpr5, implicit-def $scc
	; GCN: S_CBRANCH_EXECNZ %bb.2, implicit $exec			; GCN: S_CBRANCH_EXECNZ %bb.2, implicit $exec
	; GCN: bb.3.Flow:			; GCN: bb.3.Flow:
	; GCN: successors: %bb.4(0x80000000)			; GCN: successors: %bb.4(0x80000000)
	; GCN: liveins: $sgpr0_sgpr1, $sgpr2_sgpr3			; GCN: liveins: $sgpr0_sgpr1, $sgpr2_sgpr3, $sgpr4_sgpr5
	; GCN: $exec = S_OR_B64 $exec, killed renamable $sgpr2_sgpr3, implicit-def $scc			; GCN: $exec = S_OR_B64 $exec, killed renamable $sgpr4_sgpr5, implicit-def $scc
	; GCN: bb.4.Flow1:			; GCN: bb.4.Flow1:
	; GCN: successors: %bb.5(0x40000000), %bb.6(0x40000000)			; GCN: successors: %bb.5(0x40000000)
	; GCN: liveins: $sgpr0_sgpr1			; GCN: liveins: $sgpr0_sgpr1, $sgpr2_sgpr3
	; GCN: renamable $sgpr0_sgpr1 = S_OR_SAVEEXEC_B64 killed renamable $sgpr0_sgpr1, implicit-def $exec, implicit-def $scc, implicit $exec			; GCN: renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 killed renamable $sgpr2_sgpr3, implicit-def $exec, implicit-def $scc, implicit $exec
	; GCN: $exec = S_XOR_B64 $exec, renamable $sgpr0_sgpr1, implicit-def $scc			; GCN: $exec = S_XOR_B64 $exec, renamable $sgpr2_sgpr3, implicit-def $scc
	; GCN: S_CBRANCH_EXECZ %bb.6, implicit $exec
	; GCN: bb.5.kill0:			; GCN: bb.5.kill0:
				; GCN: successors: %bb.8(0x40000000), %bb.7(0x40000000)
				; GCN: liveins: $sgpr0_sgpr1, $sgpr2_sgpr3
				; GCN: dead renamable $sgpr0_sgpr1 = S_ANDN2_B64 killed renamable $sgpr0_sgpr1, $exec, implicit-def $scc
				; GCN: S_CBRANCH_SCC0 %bb.7, implicit $scc
				; GCN: bb.8.kill0:
	; GCN: successors: %bb.6(0x80000000)			; GCN: successors: %bb.6(0x80000000)
	; GCN: liveins: $sgpr0_sgpr1			; GCN: liveins: $sgpr2_sgpr3, $scc
	; GCN: $exec = S_MOV_B64 0			; GCN: $exec = S_MOV_B64 0
	; GCN: bb.6.end:			; GCN: bb.6.end:
	; GCN: successors: %bb.7(0x40000000), %bb.8(0x40000000)			; GCN: successors: %bb.9(0x80000000)
	; GCN: liveins: $sgpr0_sgpr1			; GCN: liveins: $sgpr2_sgpr3
	; GCN: $exec = S_OR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def $scc			; GCN: $exec = S_OR_B64 $exec, killed renamable $sgpr2_sgpr3, implicit-def $scc
	; GCN: S_CBRANCH_EXECZ %bb.7, implicit $exec			; GCN: S_BRANCH %bb.9
	; GCN: S_BRANCH %bb.8
	; GCN: bb.7:			; GCN: bb.7:
				; GCN: $exec = S_MOV_B64 0
	; GCN: EXP_DONE 9, undef $vgpr0, undef $vgpr0, undef $vgpr0, undef $vgpr0, 1, 0, 0, implicit $exec			; GCN: EXP_DONE 9, undef $vgpr0, undef $vgpr0, undef $vgpr0, undef $vgpr0, 1, 0, 0, implicit $exec
	; GCN: S_ENDPGM 0			; GCN: S_ENDPGM 0
	; GCN: bb.8:			; GCN: bb.9:
	entry:			entry:
	%.i0 = fdiv reassoc nnan nsz arcp contract afn float 1.000000e+00, %val			%.i0 = fdiv reassoc nnan nsz arcp contract afn float 1.000000e+00, %val
	%cmp0 = fcmp olt float %.i0, 0.000000e+00			%cmp0 = fcmp olt float %.i0, 0.000000e+00
	br i1 %cmp0, label %kill0, label %flow			br i1 %cmp0, label %kill0, label %flow

	kill0: ; preds = %entry			kill0: ; preds = %entry
	call void @llvm.amdgcn.kill(i1 false)			call void @llvm.amdgcn.kill(i1 false)
	br label %end			br label %end
	[16 lines not shown]

llvm/test/CodeGen/AMDGPU/vcmpx-exec-war-hazard.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -run-pass si-insert-skips,post-RA-hazard-rec -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -run-pass post-RA-hazard-rec -o - %s | FileCheck -check-prefix=GCN %s

	# GCN-LABEL: name: hazard_vcmpx_smov_exec_lo			# GCN-LABEL: name: hazard_vcmpx_smov_exec_lo
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN-NEXT: S_WAITCNT_DEPCTR 65534			# GCN-NEXT: S_WAITCNT_DEPCTR 65534
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: hazard_vcmpx_smov_exec_lo			name: hazard_vcmpx_smov_exec_lo
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_smov_exec			# GCN-LABEL: name: hazard_vcmpx_smov_exec
	# GCN: $sgpr0_sgpr1 = S_MOV_B64 $exec			# GCN: $sgpr0_sgpr1 = S_MOV_B64 $exec
	# GCN-NEXT: S_WAITCNT_DEPCTR 65534			# GCN-NEXT: S_WAITCNT_DEPCTR 65534
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: hazard_vcmpx_smov_exec			name: hazard_vcmpx_smov_exec
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0_sgpr1 = S_MOV_B64 $exec			$sgpr0_sgpr1 = S_MOV_B64 $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_vmov_exec_lo			# GCN-LABEL: name: no_hazard_vcmpx_vmov_exec_lo
	# GCN: $vgpr0 = V_MOV_B32_e32 $exec_lo, implicit $exec			# GCN: $vgpr0 = V_MOV_B32_e32 $exec_lo, implicit $exec
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_vmov_exec_lo			name: no_hazard_vcmpx_vmov_exec_lo
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 $exec_lo, implicit $exec			$vgpr0 = V_MOV_B32_e32 $exec_lo, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_valu_impuse_exec			# GCN-LABEL: name: no_hazard_vcmpx_valu_impuse_exec
	# GCN: $vgpr0 = V_MOV_B32_e32 0, implicit $exec			# GCN: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_valu_impuse_exec			name: no_hazard_vcmpx_valu_impuse_exec
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_imp			# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_imp
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN-NEXT: $vgpr0 = V_ADDC_U32_e32 0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec			# GCN-NEXT: $vgpr0 = V_ADDC_U32_e32 0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_imp			name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_imp
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	$vgpr0 = V_ADDC_U32_e32 0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec			$vgpr0 = V_ADDC_U32_e32 0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_exp			# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_exp
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN-NEXT: $sgpr0_sgpr1 = V_CMP_EQ_U32_e64 $vgpr0, 0, implicit $exec			# GCN-NEXT: $sgpr0_sgpr1 = V_CMP_EQ_U32_e64 $vgpr0, 0, implicit $exec
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_exp			name: no_hazard_vcmpx_smov_exec_lo_valu_writes_sgpr_exp
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	$sgpr0_sgpr1 = V_CMP_EQ_U32_e64 $vgpr0, 0, implicit $exec			$sgpr0_sgpr1 = V_CMP_EQ_U32_e64 $vgpr0, 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_depctr_fffe			# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_depctr_fffe
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN-NEXT: S_WAITCNT_DEPCTR 65534			# GCN-NEXT: S_WAITCNT_DEPCTR 65534
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_smov_exec_lo_depctr_fffe			name: no_hazard_vcmpx_smov_exec_lo_depctr_fffe
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	S_WAITCNT_DEPCTR 65534			S_WAITCNT_DEPCTR 65534
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_depctr_ffff			# GCN-LABEL: name: no_hazard_vcmpx_smov_exec_lo_depctr_ffff
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN-NEXT: S_WAITCNT_DEPCTR 65535			# GCN-NEXT: S_WAITCNT_DEPCTR 65535
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: no_hazard_vcmpx_smov_exec_lo_depctr_ffff			name: no_hazard_vcmpx_smov_exec_lo_depctr_ffff
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	S_WAITCNT_DEPCTR 65535			S_WAITCNT_DEPCTR 65535
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_smov_exec_lo_depctr_effe			# GCN-LABEL: name: hazard_vcmpx_smov_exec_lo_depctr_effe
	# GCN: $sgpr0 = S_MOV_B32 $exec_lo			# GCN: $sgpr0 = S_MOV_B32 $exec_lo
	# GCN: S_WAITCNT_DEPCTR 65534			# GCN: S_WAITCNT_DEPCTR 65534
	# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32			# GCN-NEXT: V_CMPX_LE_F32_nosdst_e32
	---			---
	name: hazard_vcmpx_smov_exec_lo_depctr_effe			name: hazard_vcmpx_smov_exec_lo_depctr_effe
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	$sgpr0 = S_MOV_B32 $exec_lo			$sgpr0 = S_MOV_B32 $exec_lo
	S_WAITCNT_DEPCTR 61438			S_WAITCNT_DEPCTR 61438
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	S_ENDPGM 0			S_ENDPGM 0
	...			...
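
The three `depctr` tests above differ only in the `S_WAITCNT_DEPCTR` immediate, and each test name appears to carry that immediate as a hex suffix (`fffe`, `ffff`, `effe`). A minimal Python sketch that checks this reading against the decimal values in the MIR is given here. The dictionary and helper names are hypothetical, and which counter fields the bits select is not stated in this diff, so that is not modeled.

```python
# Decimal immediates copied from the S_WAITCNT_DEPCTR lines in the test inputs,
# keyed by the hex suffix of the test name that uses them.
DEPCTR_IMMEDIATES = {
    "fffe": 65534,  # no_hazard_vcmpx_smov_exec_lo_depctr_fffe
    "ffff": 65535,  # no_hazard_vcmpx_smov_exec_lo_depctr_ffff
    "effe": 61438,  # hazard_vcmpx_smov_exec_lo_depctr_effe
}


def suffix_matches(suffix: str, immediate: int) -> bool:
    """Return True if the hex suffix spells the same 16-bit value."""
    return int(suffix, 16) == immediate


for suffix, imm in DEPCTR_IMMEDIATES.items():
    print(f"depctr_{suffix}: {imm} = 0x{imm:04x}, matches: {suffix_matches(suffix, imm)}")
```

Note that the `effe` test feeds 61438 as input but expects `S_WAITCNT_DEPCTR 65534` immediately before the `V_CMPX`, which is consistent with the hazard recognizer inserting its own wait in that case.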

llvm/test/CodeGen/AMDGPU/vcmpx-permlane-hazard.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass si-insert-skips,post-RA-hazard-rec -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass post-RA-hazard-rec -o - %s | FileCheck -check-prefix=GCN %s

	# GCN-LABEL: name: hazard_vcmpx_permlane16			# GCN-LABEL: name: hazard_vcmpx_permlane16
	# GCN: V_CMPX_LE_F32_nosdst_e32			# GCN: V_CMPX_LE_F32_nosdst_e32
	# GCN: S_ADD_U32			# GCN: S_ADD_U32
	# GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec			# GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec
	# GCN-NEXT: V_PERMLANE16_B32			# GCN-NEXT: V_PERMLANE16_B32
	---			---
	name: hazard_vcmpx_permlane16			name: hazard_vcmpx_permlane16
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr1 = IMPLICIT_DEF			$vgpr1 = IMPLICIT_DEF
	$vgpr2 = IMPLICIT_DEF			$vgpr2 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = S_ADD_U32 $sgpr0, 0, implicit-def $scc			$sgpr1 = S_ADD_U32 $sgpr0, 0, implicit-def $scc
	$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec			$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_permlanex16			# GCN-LABEL: name: hazard_vcmpx_permlanex16
	# GCN: V_CMPX_LE_F32_nosdst_e32			# GCN: V_CMPX_LE_F32_nosdst_e32
	# GCN: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec			# GCN: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec
	# GCN-NEXT: V_PERMLANEX16_B32			# GCN-NEXT: V_PERMLANEX16_B32
	---			---
	name: hazard_vcmpx_permlanex16			name: hazard_vcmpx_permlanex16
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr1 = IMPLICIT_DEF			$vgpr1 = IMPLICIT_DEF
	$vgpr2 = IMPLICIT_DEF			$vgpr2 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = IMPLICIT_DEF			$sgpr1 = IMPLICIT_DEF
	$vgpr1 = V_PERMLANEX16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec			$vgpr1 = V_PERMLANEX16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_permlane16_v_nop			# GCN-LABEL: name: hazard_vcmpx_permlane16_v_nop
	# GCN: V_CMPX_LE_F32_nosdst_e32			# GCN: V_CMPX_LE_F32_nosdst_e32
	# GCN: V_NOP			# GCN: V_NOP
	# GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec			# GCN-NEXT: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec
	# GCN-NEXT: V_PERMLANE16_B32			# GCN-NEXT: V_PERMLANE16_B32
	---			---
	name: hazard_vcmpx_permlane16_v_nop			name: hazard_vcmpx_permlane16_v_nop
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr1 = IMPLICIT_DEF			$vgpr1 = IMPLICIT_DEF
	$vgpr2 = IMPLICIT_DEF			$vgpr2 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = IMPLICIT_DEF			$sgpr1 = IMPLICIT_DEF
	V_NOP_e32 implicit $exec			V_NOP_e32 implicit $exec
	$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec			$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_permlane16_far			# GCN-LABEL: name: hazard_vcmpx_permlane16_far
	# GCN: V_CMPX_LE_F32_nosdst_e32			# GCN: V_CMPX_LE_F32_nosdst_e32
	# GCN: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec			# GCN: $vgpr1 = V_MOV_B32_e32 killed $vgpr1, implicit $exec
	# GCN-NEXT: V_PERMLANE16_B32			# GCN-NEXT: V_PERMLANE16_B32
	---			---
	name: hazard_vcmpx_permlane16_far			name: hazard_vcmpx_permlane16_far
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr1 = IMPLICIT_DEF			$vgpr1 = IMPLICIT_DEF
	$vgpr2 = IMPLICIT_DEF			$vgpr2 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = IMPLICIT_DEF			$sgpr1 = IMPLICIT_DEF
	V_NOP_e32 implicit $exec			V_NOP_e32 implicit $exec
	[... 14 lines not shown ...]
	# GCN: V_ADD_F32			# GCN: V_ADD_F32
	# GCN-NEXT: V_PERMLANE16_B32			# GCN-NEXT: V_PERMLANE16_B32
	---			---
	name: hazard_vcmpx_permlane16_no_hazard			name: hazard_vcmpx_permlane16_no_hazard
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr1 = IMPLICIT_DEF			$vgpr1 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = IMPLICIT_DEF			$sgpr1 = IMPLICIT_DEF
	$vgpr2 = V_ADD_F32_e32 $vgpr1, $vgpr1, implicit $mode, implicit $exec			$vgpr2 = V_ADD_F32_e32 $vgpr1, $vgpr1, implicit $mode, implicit $exec
	$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec			$vgpr1 = V_PERMLANE16_B32 0, killed $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, $vgpr1, 0, implicit $exec
	S_ENDPGM 0			S_ENDPGM 0
	...			...

	# GCN-LABEL: name: hazard_vcmpx_permlane16_undef_src			# GCN-LABEL: name: hazard_vcmpx_permlane16_undef_src
	# GCN: V_CMPX_LE_F32_nosdst_e32			# GCN: V_CMPX_LE_F32_nosdst_e32
	# GCN: S_ADD_U32			# GCN: S_ADD_U32
	# GCN-NEXT: dead $vgpr1 = V_MOV_B32_e32 undef $vgpr1, implicit $exec			# GCN-NEXT: dead $vgpr1 = V_MOV_B32_e32 undef $vgpr1, implicit $exec
	# GCN-NEXT: V_PERMLANE16_B32			# GCN-NEXT: V_PERMLANE16_B32
	---			---
	name: hazard_vcmpx_permlane16_undef_src			name: hazard_vcmpx_permlane16_undef_src
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	$vgpr0 = V_MOV_B32_e32 0, implicit $exec			$vgpr0 = V_MOV_B32_e32 0, implicit $exec
	SI_KILL_F32_COND_IMM_TERMINATOR $vgpr0, 0, 3, implicit-def $exec, implicit-def $vcc, implicit-def $scc, implicit $exec			V_CMPX_LE_F32_nosdst_e32 0, $vgpr0, implicit-def $exec, implicit $mode, implicit $exec
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr2 = IMPLICIT_DEF			$vgpr2 = IMPLICIT_DEF
	$sgpr0 = IMPLICIT_DEF			$sgpr0 = IMPLICIT_DEF
	$sgpr1 = S_ADD_U32 $sgpr0, 0, implicit-def $scc			$sgpr1 = S_ADD_U32 $sgpr0, 0, implicit-def $scc
	$vgpr1 = V_PERMLANE16_B32 0, undef $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, undef $vgpr1, 0, implicit $exec			$vgpr1 = V_PERMLANE16_B32 0, undef $vgpr1, 0, killed $sgpr1, 0, killed $sgpr0, undef $vgpr1, 0, implicit $exec
	S_ENDPGM 0			S_ENDPGM 0
	...			...

llvm/test/CodeGen/AMDGPU/wave32.ll

[... 28 lines not shown ...]	define amdgpu_kernel void @test_vopc_f32(float addrspace(1)* %arg) {
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %lid		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %lid
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%cmp = fcmp ugt float %load, 0.0		%cmp = fcmp ugt float %load, 0.0
%sel = select i1 %cmp, float 1.0, float 2.0		%sel = select i1 %cmp, float 1.0, float 2.0
store float %sel, float addrspace(1)* %gep, align 4		store float %sel, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_vopc_vcmpx:		; GCN-LABEL: {{^}}test_vopc_vcmp:
; GFX1032: v_cmpx_le_f32_e32 0, v{{[0-9]+}}		; GFX1032: v_cmp_ge_f32_e32 vcc_lo, 0, v{{[0-9]+}}
; GFX1064: v_cmpx_le_f32_e32 0, v{{[0-9]+}}		; GFX1064: v_cmp_ge_f32_e32 vcc, 0, v{{[0-9]+}}
define amdgpu_ps void @test_vopc_vcmpx(float %x) {		define amdgpu_ps void @test_vopc_vcmp(float %x) {
%cmp = fcmp oge float %x, 0.0		%cmp = fcmp oge float %x, 0.0
call void @llvm.amdgcn.kill(i1 %cmp)		call void @llvm.amdgcn.kill(i1 %cmp)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_vopc_2xf16:		; GCN-LABEL: {{^}}test_vopc_2xf16:
; GFX1032: v_cmp_le_f16_sdwa [[SC:vcc_lo|s[0-9]+]], {{[vs][0-9]+}}, v{{[0-9]+}} src0_sel:WORD_1 src1_sel:DWORD		; GFX1032: v_cmp_le_f16_sdwa [[SC:vcc_lo|s[0-9]+]], {{[vs][0-9]+}}, v{{[0-9]+}} src0_sel:WORD_1 src1_sel:DWORD
; GFX1032: v_cndmask_b32_e32 v{{[0-9]+}}, 0x3c003c00, v{{[0-9]+}}, [[SC]]		; GFX1032: v_cndmask_b32_e32 v{{[0-9]+}}, 0x3c003c00, v{{[0-9]+}}, [[SC]]
[... 604 lines not shown ...]
; GFX1032: s_mov_b32 exec_lo, 0		; GFX1032: s_mov_b32 exec_lo, 0
; GFX1064: s_mov_b64 exec, 0		; GFX1064: s_mov_b64 exec, 0
define amdgpu_ps void @test_kill_i1_terminator_float() #0 {		define amdgpu_ps void @test_kill_i1_terminator_float() #0 {
call void @llvm.amdgcn.kill(i1 false)		call void @llvm.amdgcn.kill(i1 false)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_kill_i1_terminator_i1:		; GCN-LABEL: {{^}}test_kill_i1_terminator_i1:
		; GFX1032: s_mov_b32 [[LIVE:s[0-9]+]], exec_lo
; GFX1032: s_or_b32 [[OR:s[0-9]+]],		; GFX1032: s_or_b32 [[OR:s[0-9]+]],
; GFX1032: s_and_b32 exec_lo, exec_lo, [[OR]]		; GFX1032: s_xor_b32 [[KILL:s[0-9]+]], [[OR]], exec_lo
		; GFX1032: s_andn2_b32 [[MASK:s[0-9]+]], [[LIVE]], [[KILL]]
		; GFX1032: s_and_b32 exec_lo, exec_lo, [[MASK]]
		; GFX1064: s_mov_b64 [[LIVE:s\[[0-9:]+\]]], exec
; GFX1064: s_or_b64 [[OR:s\[[0-9:]+\]]],		; GFX1064: s_or_b64 [[OR:s\[[0-9:]+\]]],
; GFX1064: s_and_b64 exec, exec, [[OR]]		; GFX1064: s_xor_b64 [[KILL:s\[[0-9:]+\]]], [[OR]], exec
		; GFX1064: s_andn2_b64 [[MASK:s\[[0-9:]+\]]], [[LIVE]], [[KILL]]
		; GFX1064: s_and_b64 exec, exec, [[MASK]]
define amdgpu_gs void @test_kill_i1_terminator_i1(i32 %a, i32 %b, i32 %c, i32 %d) #0 {		define amdgpu_gs void @test_kill_i1_terminator_i1(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
%c1 = icmp slt i32 %a, %b		%c1 = icmp slt i32 %a, %b
%c2 = icmp slt i32 %c, %d		%c2 = icmp slt i32 %c, %d
%x = or i1 %c1, %c2		%x = or i1 %c1, %c2
call void @llvm.amdgcn.kill(i1 %x)		call void @llvm.amdgcn.kill(i1 %x)
		call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float 0.0, float 0.0, float 0.0, float 0.0, i1 false, i1 false)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_loop_vcc:		; GCN-LABEL: {{^}}test_loop_vcc:
; GFX1032: v_cmp_lt_f32_e32 vcc_lo,		; GFX1032: v_cmp_lt_f32_e32 vcc_lo,
; GFX1064: v_cmp_lt_f32_e32 vcc,		; GFX1064: v_cmp_lt_f32_e32 vcc,
; GCN: s_cbranch_vccz		; GCN: s_cbranch_vccz
define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) #0 {		define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) #0 {
[... 145 lines not shown ...]
define amdgpu_kernel void @test_intr_icmp_i32(i32 addrspace(1)* %out, i32 %src) {		define amdgpu_kernel void @test_intr_icmp_i32(i32 addrspace(1)* %out, i32 %src) {
%result = call i32 @llvm.amdgcn.icmp.i32.i32(i32 %src, i32 100, i32 32)		%result = call i32 @llvm.amdgcn.icmp.i32.i32(i32 %src, i32 100, i32 32)
store i32 %result, i32 addrspace(1)* %out		store i32 %result, i32 addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_wqm_vote:		; GCN-LABEL: {{^}}test_wqm_vote:
; GFX1032: v_cmp_neq_f32_e32 vcc_lo, 0		; GFX1032: v_cmp_neq_f32_e32 vcc_lo, 0
		; GFX1032: s_mov_b32 [[LIVE:s[0-9]+]], exec_lo
; GFX1032: s_wqm_b32 [[WQM:s[0-9]+]], vcc_lo		; GFX1032: s_wqm_b32 [[WQM:s[0-9]+]], vcc_lo
; GFX1032: s_and_b32 exec_lo, exec_lo, [[WQM]]		; GFX1032: s_xor_b32 [[KILL:s[0-9]+]], [[WQM]], exec_lo
		; GFX1032: s_andn2_b32 [[MASK:s[0-9]+]], [[LIVE]], [[KILL]]
		; GFX1032: s_and_b32 exec_lo, exec_lo, [[MASK]]
; GFX1064: v_cmp_neq_f32_e32 vcc, 0		; GFX1064: v_cmp_neq_f32_e32 vcc, 0
; GFX1064: s_wqm_b64 [[WQM:s\[[0-9:]+\]]], vcc{{$}}		; GFX1064: s_mov_b64 [[LIVE:s\[[0-9:]+\]]], exec
; GFX1064: s_and_b64 exec, exec, [[WQM]]		; GFX1064: s_wqm_b64 [[WQM:s\[[0-9:]+\]]], vcc
		; GFX1064: s_xor_b64 [[KILL:s\[[0-9:]+\]]], [[WQM]], exec
		; GFX1064: s_andn2_b64 [[MASK:s\[[0-9:]+\]]], [[LIVE]], [[KILL]]
		; GFX1064: s_and_b64 exec, exec, [[MASK]]
define amdgpu_ps void @test_wqm_vote(float %a) {		define amdgpu_ps void @test_wqm_vote(float %a) {
%c1 = fcmp une float %a, 0.0		%c1 = fcmp une float %a, 0.0
%c2 = call i1 @llvm.amdgcn.wqm.vote(i1 %c1)		%c2 = call i1 @llvm.amdgcn.wqm.vote(i1 %c1)
call void @llvm.amdgcn.kill(i1 %c2)		call void @llvm.amdgcn.kill(i1 %c2)
		call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float 0.0, float 0.0, float 0.0, float 0.0, i1 false, i1 false)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_branch_true:		; GCN-LABEL: {{^}}test_branch_true:
; GFX1032: s_mov_b32 vcc_lo, exec_lo		; GFX1032: s_mov_b32 vcc_lo, exec_lo
; GFX1064: s_mov_b64 vcc, exec		; GFX1064: s_mov_b64 vcc, exec
define amdgpu_kernel void @test_branch_true() #2 {		define amdgpu_kernel void @test_branch_true() #2 {
entry:		entry:
[... 272 lines not shown ...]
declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32)		declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32)
declare i32 @llvm.amdgcn.fcmp.i32.f32(float, float, i32)		declare i32 @llvm.amdgcn.fcmp.i32.f32(float, float, i32)
declare i32 @llvm.amdgcn.icmp.i32.i32(i32, i32, i32)		declare i32 @llvm.amdgcn.icmp.i32.i32(i32, i32, i32)
declare void @llvm.amdgcn.kill(i1)		declare void @llvm.amdgcn.kill(i1)
declare i1 @llvm.amdgcn.wqm.vote(i1)		declare i1 @llvm.amdgcn.wqm.vote(i1)
declare i1 @llvm.amdgcn.ps.live()		declare i1 @llvm.amdgcn.ps.live()
declare i64 @llvm.cttz.i64(i64, i1)		declare i64 @llvm.cttz.i64(i64, i1)
declare i32 @llvm.cttz.i32(i32, i1)		declare i32 @llvm.cttz.i32(i32, i1)
		declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #5

attributes #0 = { nounwind readnone speculatable }		attributes #0 = { nounwind readnone speculatable }
attributes #1 = { nounwind }		attributes #1 = { nounwind }
attributes #2 = { nounwind readnone optnone noinline }		attributes #2 = { nounwind readnone optnone noinline }
attributes #3 = { "target-features"="+wavefrontsize32" }		attributes #3 = { "target-features"="+wavefrontsize32" }
attributes #4 = { "target-features"="+wavefrontsize64" }		attributes #4 = { "target-features"="+wavefrontsize64" }
		attributes #5 = { inaccessiblememonly nounwind }
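
The updated GFX1032 CHECK lines for `test_kill_i1_terminator_i1` and `test_wqm_vote` replace a single `s_and_b32 exec_lo, exec_lo, <cond>` with a four-instruction sequence that goes through a saved live mask. A rough Python model of that sequence as plain bit arithmetic is sketched here. It assumes each register is a 32-bit per-lane mask, that `s_andn2_b32 D, A, B` computes `A & ~B`, and that `LIVE` still equals `exec_lo` when the kill runs; the CHECK lines do not require adjacency, so the real live mask may have been captured earlier and may differ. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # wave32: one bit per lane


def new_kill_sequence(exec_lo: int, cond: int) -> int:
    """Model the four checked instructions; returns the new exec_lo."""
    live = exec_lo                     # s_mov_b32   LIVE, exec_lo
    kill = (cond ^ exec_lo) & MASK32   # s_xor_b32   KILL, cond, exec_lo
    mask = live & ~kill & MASK32       # s_andn2_b32 MASK, LIVE, KILL
    return exec_lo & mask              # s_and_b32   exec_lo, exec_lo, MASK


def old_kill_sequence(exec_lo: int, cond: int) -> int:
    """Model the single instruction on the left-hand side of the diff."""
    return exec_lo & cond              # s_and_b32   exec_lo, exec_lo, cond


# Under the stated assumptions both forms leave the same lanes enabled.
for exec_lo, cond in [(0b1111, 0b0101), (0b0011, 0b0110), (MASK32, 0), (0, MASK32)]:
    assert new_kill_sequence(exec_lo, cond) == old_kill_sequence(exec_lo, cond)
```

Under these assumptions the final `exec_lo` is unchanged from the old lowering; the extra instructions only matter once `LIVE` is a separately tracked mask, which is what the revision title describes.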

llvm/test/CodeGen/AMDGPU/wqm.ll

	[... 570 lines not shown ...]
	;CHECK-LABEL: {{^}}test_kill_0:			;CHECK-LABEL: {{^}}test_kill_0:
	;CHECK-NEXT: ; %main_body			;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec			;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec			;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: s_and_b64 exec, exec, [[ORIG]]			;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: image_sample			;CHECK: image_sample
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK: s_wqm_b64 exec, exec			;CHECK: s_wqm_b64 exec, exec
	;CHECK: v_cmpx_			;CHECK: v_cmp_
	;CHECK: image_sample			;CHECK: image_sample
	;CHECK: s_and_b64 exec, exec, [[ORIG]]			;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: image_sample			;CHECK: image_sample
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, float %coord, float %coord2, float %z) {			define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, float %coord, float %coord2, float %z) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0			%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0
	%idx.0 = extractelement <2 x i32> %idx, i32 0			%idx.0 = extractelement <2 x i32> %idx, i32 0
	[... 18 lines not shown ...]
	;			;
	; CHECK-LABEL: {{^}}test_kill_1:			; CHECK-LABEL: {{^}}test_kill_1:
	; CHECK-NEXT: ; %main_body			; CHECK-NEXT: ; %main_body
	; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec			; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec
	; CHECK: image_sample			; CHECK: image_sample
	; CHECK: s_and_b64 exec, exec, [[ORIG]]			; CHECK: s_and_b64 exec, exec, [[ORIG]]
	; CHECK: image_sample			; CHECK: image_sample
	; CHECK: buffer_store_dword
	; CHECK-NOT: wqm			; CHECK-NOT: wqm
	; CHECK: v_cmpx_			; CHECK-DAG: buffer_store_dword
				; CHECK-DAG: v_cmp_
	define amdgpu_ps <4 x float> @test_kill_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {			define amdgpu_ps <4 x float> @test_kill_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, i32 %idx, float %data, float %coord, float %coord2, float %z) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0			%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0
	%tex0 = extractelement <4 x float> %tex, i32 0			%tex0 = extractelement <4 x float> %tex, i32 0
	%dtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %tex0, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0			%dtex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %tex0, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0

	call void @llvm.amdgcn.raw.buffer.store.f32(float %data, <4 x i32> undef, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.raw.buffer.store.f32(float %data, <4 x i32> undef, i32 0, i32 0, i32 0)

	[... 233 lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic and live mask tracking
Abandoned, Public

Revision Contents

Diff 315328

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

llvm/lib/Target/AMDGPU/SIInsertSkips.cpp

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp

llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wqm.demote.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.live.mask.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.wqm.demote.mir

llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll

llvm/test/CodeGen/AMDGPU/early-term.mir

llvm/test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.vote.ll

llvm/test/CodeGen/AMDGPU/skip-if-dead.ll

llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll

llvm/test/CodeGen/AMDGPU/vcmpx-exec-war-hazard.mir

llvm/test/CodeGen/AMDGPU/vcmpx-permlane-hazard.mir

llvm/test/CodeGen/AMDGPU/wave32.ll

llvm/test/CodeGen/AMDGPU/wqm.ll
